Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
@ 2025-03-13 12:05 Gerd Hoffman
  2025-03-13 13:31 ` Jörg Rödel
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-13 12:05 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Ani Sinha, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

> The devil lies in "same order of setup calls". Without a way to define
> this order through the vmfwupdate interface there is a lot of implicit
> knowledge required about how KVM/QEMU setup the TEE context to calculate
> the expected after-reset launch measurement. Even worse, the exact way
> this setup is done then becomes ABI, because any change in this process
> will lead to a different launch measurement.
> 
> Some examples of initial memory layout calls which influence the launch
> measurment:
> 
> 	1) Launch VMSA(s) (SEV-SNP only, though I believe on TDX the
> 	   initial register state can also be changed to some
> 	   degree).
> 	2) Pre-Validated/Accepted memory regions (TDX and SEV-SNP). This
> 	   is especially important, as different FWs have different
> 	   requirements on what memory is pre-validated, zeroed, etc.
> 	3) Zero-pages, measured and unmeasured (TDX and SEV-SNP).
> 	4) Position of the CPUID page, secrets page, and id-blob
> 	   (SEV-SNP).
> 	5) Pre-populated data (TDX and SEV-SNP).
> 
> If I understand the vmfwupdate interface correctly, and please let me
> know if I am wrong here, it only allows to specify a call for part 5) of
> the above list. Some of the other parts can be specified in architecture
> dependent ways in the FW image itself, but not all of them.

v6 patch is outdated, latest approach is to pass a list of regions:

// regions_addr points to an array of this structure
struct vmfwupdate_regions {
    uint64_t size;
    uint64_t src_addr;       // source address (before update)
    uint64_t dst_addr;       // destination address (after update)
    uint64_t flags;          // control bits
};

// flags
#define VMFWUPDATE_REGION_FLAG_COPY     // data must be copied
#define VMFWUPDATE_REGION_FLAG_ZERO     // dest must be cleared
#define VMFWUPDATE_REGION_FLAG_MEASURE  // data must be measured

(1)   is still not covered.
(2+3) can be handled with FLAG_ZERO regions, with and without
      FLAG_MEASURE.
(4)   Alex already pointed that the cpuid page is special, guess we
      need additional flags for those oages.

Open question is what we do about IGVM.

One option would be the guest vmfwupdate tool loading and parsing igvm,
preparing the region list, then invoke the update.  Problem is that some
igvm feaures such as initial register state can not be easily supported
that way.

We could also expect the hypervisor support igvm, so the guest can
simply load the file into memory and pass address and size to the
hypervisor.  Either as option, say via VMFWUPDATE_REGION_FLAG_IGVM, or
mandatory, i.e. scratch the region list and use IGVM exclusively.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 12:05 [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support Gerd Hoffman
@ 2025-03-13 13:31 ` Jörg Rödel
  2025-03-13 14:06   ` Ani Sinha
  0 siblings, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 13:31 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

Hi Gerd,

On Thu, Mar 13, 2025 at 01:05:13PM +0100, Gerd Hoffman wrote:
> // regions_addr points to an array of this structure
> struct vmfwupdate_regions {
>     uint64_t size;
>     uint64_t src_addr;       // source address (before update)
>     uint64_t dst_addr;       // destination address (after update)
>     uint64_t flags;          // control bits
> };
> 
> // flags
> #define VMFWUPDATE_REGION_FLAG_COPY     // data must be copied
> #define VMFWUPDATE_REGION_FLAG_ZERO     // dest must be cleared
> #define VMFWUPDATE_REGION_FLAG_MEASURE  // data must be measured
> 
> (1)   is still not covered.
> (2+3) can be handled with FLAG_ZERO regions, with and without
>       FLAG_MEASURE.
> (4)   Alex already pointed that the cpuid page is special, guess we
>       need additional flags for those oages.

That looks better, when the host VMM guarantees the order in which it
translates these regions into VM setup calls, then it is a step
forward. Although there more things to keep in mind, like the guest
policy and SEV status parameters.

> Open question is what we do about IGVM.
> 
> One option would be the guest vmfwupdate tool loading and parsing igvm,
> preparing the region list, then invoke the update.  Problem is that some
> igvm feaures such as initial register state can not be easily supported
> that way.
> 
> We could also expect the hypervisor support igvm, so the guest can
> simply load the file into memory and pass address and size to the
> hypervisor.  Either as option, say via VMFWUPDATE_REGION_FLAG_IGVM, or
> mandatory, i.e. scratch the region list and use IGVM exclusively.

This is of course up to the QEMU maintainers to decide, but I want to
highlight that IGVM already solves all the problems mentioned above,
including setting multiple memory regions of different type, special
data pages (cpuid, secrets, id-blob, vmsa) and more. It defines the
order of setup calls the VMM has to invoke for the new context and also
works for multiple platforms like TDX, SNP, non-coco, and in the future
ARM as well.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 13:31 ` Jörg Rödel
@ 2025-03-13 14:06   ` Ani Sinha
  2025-03-14 11:27     ` Gerd Hoffman
  0 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-13 14:06 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Gerd Hoffman, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Thu, Mar 13, 2025 at 7:01 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> Hi Gerd,
>
> On Thu, Mar 13, 2025 at 01:05:13PM +0100, Gerd Hoffman wrote:
> > // regions_addr points to an array of this structure
> > struct vmfwupdate_regions {
> >     uint64_t size;
> >     uint64_t src_addr;       // source address (before update)
> >     uint64_t dst_addr;       // destination address (after update)
> >     uint64_t flags;          // control bits
> > };
> >
> > // flags
> > #define VMFWUPDATE_REGION_FLAG_COPY     // data must be copied
> > #define VMFWUPDATE_REGION_FLAG_ZERO     // dest must be cleared
> > #define VMFWUPDATE_REGION_FLAG_MEASURE  // data must be measured
> >
> > (1)   is still not covered.
> > (2+3) can be handled with FLAG_ZERO regions, with and without
> >       FLAG_MEASURE.
> > (4)   Alex already pointed that the cpuid page is special, guess we
> >       need additional flags for those oages.
>
> That looks better, when the host VMM guarantees the order in which it
> translates these regions into VM setup calls, then it is a step
> forward. Although there more things to keep in mind, like the guest
> policy and SEV status parameters.
>
> > Open question is what we do about IGVM.
> >
> > One option would be the guest vmfwupdate tool loading and parsing igvm,
> > preparing the region list, then invoke the update.  Problem is that some
> > igvm feaures such as initial register state can not be easily supported
> > that way.
> >
> > We could also expect the hypervisor support igvm, so the guest can
> > simply load the file into memory and pass address and size to the
> > hypervisor.  Either as option, say via VMFWUPDATE_REGION_FLAG_IGVM, or
> > mandatory, i.e. scratch the region list and use IGVM exclusively.
>
> This is of course up to the QEMU maintainers to decide, but I want to
> highlight that IGVM already solves all the problems mentioned above,
> including setting multiple memory regions of different type, special
> data pages (cpuid, secrets, id-blob, vmsa) and more. It defines the
> order of setup calls the VMM has to invoke for the new context and also
> works for multiple platforms like TDX, SNP, non-coco, and in the future
> ARM as well.

My take on this is that instead of making the interface more complex,
let's empower the guest to use IGVM if they think they need more
guarantees around those things that IGVM solves very well today. QEMU
already has IGVM backend support. To that end, we can make small
changes to the hypervisor side so that instructions can be executed by
IGVM loaded in guest memory. My talk in the KVM Forum was "empowering
the guest" :-)

>
> Regards,
>
>         Joerg
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 14:06   ` Ani Sinha
@ 2025-03-14 11:27     ` Gerd Hoffman
  2025-03-14 12:47       ` Alexander Graf
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-14 11:27 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Jörg Rödel, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> > > Open question is what we do about IGVM.
> > >
> > > One option would be the guest vmfwupdate tool loading and parsing igvm,
> > > preparing the region list, then invoke the update.  Problem is that some
> > > igvm feaures such as initial register state can not be easily supported
> > > that way.
> > >
> > > We could also expect the hypervisor support igvm, so the guest can
> > > simply load the file into memory and pass address and size to the
> > > hypervisor.  Either as option, say via VMFWUPDATE_REGION_FLAG_IGVM, or
> > > mandatory, i.e. scratch the region list and use IGVM exclusively.
> >
> > This is of course up to the QEMU maintainers to decide, but I want to
> > highlight that IGVM already solves all the problems mentioned above,
> > including setting multiple memory regions of different type, special
> > data pages (cpuid, secrets, id-blob, vmsa) and more. It defines the
> > order of setup calls the VMM has to invoke for the new context and also
> > works for multiple platforms like TDX, SNP, non-coco, and in the future
> > ARM as well.
> 
> My take on this is that instead of making the interface more complex,
> let's empower the guest to use IGVM if they think they need more
> guarantees around those things that IGVM solves very well today. QEMU
> already has IGVM backend support.

Ok, assuming we allow the guest submit a IGVM image (which makes sense
indeed, otherwise we'll probably end up re-inventing IGVM).  How will
the kernel hashes be handled then?  I assume they will not be part of
the igvm image, but they must be part of the launch measurement ...

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-14 11:27     ` Gerd Hoffman
@ 2025-03-14 12:47       ` Alexander Graf
  2025-03-14 14:08         ` Gerd Hoffman
  0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2025-03-14 12:47 UTC (permalink / raw)
  To: Gerd Hoffman, Ani Sinha
  Cc: Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel


On 14.03.25 12:27, Gerd Hoffman wrote:
>    Hi,
>
>>>> Open question is what we do about IGVM.
>>>>
>>>> One option would be the guest vmfwupdate tool loading and parsing igvm,
>>>> preparing the region list, then invoke the update.  Problem is that some
>>>> igvm feaures such as initial register state can not be easily supported
>>>> that way.
>>>>
>>>> We could also expect the hypervisor support igvm, so the guest can
>>>> simply load the file into memory and pass address and size to the
>>>> hypervisor.  Either as option, say via VMFWUPDATE_REGION_FLAG_IGVM, or
>>>> mandatory, i.e. scratch the region list and use IGVM exclusively.
>>> This is of course up to the QEMU maintainers to decide, but I want to
>>> highlight that IGVM already solves all the problems mentioned above,
>>> including setting multiple memory regions of different type, special
>>> data pages (cpuid, secrets, id-blob, vmsa) and more. It defines the
>>> order of setup calls the VMM has to invoke for the new context and also
>>> works for multiple platforms like TDX, SNP, non-coco, and in the future
>>> ARM as well.
>> My take on this is that instead of making the interface more complex,
>> let's empower the guest to use IGVM if they think they need more
>> guarantees around those things that IGVM solves very well today. QEMU
>> already has IGVM backend support.
> Ok, assuming we allow the guest submit a IGVM image (which makes sense
> indeed, otherwise we'll probably end up re-inventing IGVM).  How will
> the kernel hashes be handled then?  I assume they will not be part of
> the igvm image, but they must be part of the launch measurement ...


The kernel hashes must be embedded in the IGVM image by the time you 
invoke vmfwupdate. That means when you generate the FUKI, you take 4 
inputs: Generic firmware image, kernel, initramfs, cmdline. Out of 
those, you generate and embed an IGVM image that consists of the 
firmware image as well as the kernel hash page.


Alex



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-14 12:47       ` Alexander Graf
@ 2025-03-14 14:08         ` Gerd Hoffman
  2025-03-14 14:50           ` Alexander Graf
  2025-03-14 15:16           ` Jörg Rödel
  0 siblings, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-14 14:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> > Ok, assuming we allow the guest submit a IGVM image (which makes sense
> > indeed, otherwise we'll probably end up re-inventing IGVM).  How will
> > the kernel hashes be handled then?  I assume they will not be part of
> > the igvm image, but they must be part of the launch measurement ...
> 
> The kernel hashes must be embedded in the IGVM image by the time you invoke
> vmfwupdate. That means when you generate the FUKI, you take 4 inputs:
> Generic firmware image, kernel, initramfs, cmdline. Out of those, you
> generate and embed an IGVM image that consists of the firmware image as well
> as the kernel hash page.

If your input firmware image already is an IGVM (say coconut), what is
supposed to happen?

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-14 14:08         ` Gerd Hoffman
@ 2025-03-14 14:50           ` Alexander Graf
  2025-03-17  9:56             ` Gerd Hoffman
  2025-03-14 15:16           ` Jörg Rödel
  1 sibling, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2025-03-14 14:50 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel


On 14.03.25 15:08, Gerd Hoffman wrote:
>    Hi,
>
>>> Ok, assuming we allow the guest submit a IGVM image (which makes sense
>>> indeed, otherwise we'll probably end up re-inventing IGVM).  How will
>>> the kernel hashes be handled then?  I assume they will not be part of
>>> the igvm image, but they must be part of the launch measurement ...
>> The kernel hashes must be embedded in the IGVM image by the time you invoke
>> vmfwupdate. That means when you generate the FUKI, you take 4 inputs:
>> Generic firmware image, kernel, initramfs, cmdline. Out of those, you
>> generate and embed an IGVM image that consists of the firmware image as well
>> as the kernel hash page.
> If your input firmware image already is an IGVM (say coconut), what is
> supposed to happen?


I'll leave the details to Jörg on how he envisions it, but IIUC the flow 
for a "readily assembled IGVM" is different. In case of a COCONUT-SVSM 
IGVM, you expect chaining of trust. So the SVSM implements a TPM which 
then the OS would use with measured boot etc etc.

It's a fundamentally different concept from FUKI. But it could share the 
same vmfwupdate mechanism to load.


Alex



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-14 14:50           ` Alexander Graf
@ 2025-03-17  9:56             ` Gerd Hoffman
  2025-03-17 17:29               ` Alexander Graf
  2025-03-20  8:31               ` Jörg Rödel
  0 siblings, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-17  9:56 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Fri, Mar 14, 2025 at 03:50:19PM +0100, Alexander Graf wrote:
> 
> On 14.03.25 15:08, Gerd Hoffman wrote:
> >    Hi,
> > 
> > > > Ok, assuming we allow the guest submit a IGVM image (which makes sense
> > > > indeed, otherwise we'll probably end up re-inventing IGVM).  How will
> > > > the kernel hashes be handled then?  I assume they will not be part of
> > > > the igvm image, but they must be part of the launch measurement ...
> > > The kernel hashes must be embedded in the IGVM image by the time you invoke
> > > vmfwupdate. That means when you generate the FUKI, you take 4 inputs:
> > > Generic firmware image, kernel, initramfs, cmdline. Out of those, you
> > > generate and embed an IGVM image that consists of the firmware image as well
> > > as the kernel hash page.
> > If your input firmware image already is an IGVM (say coconut), what is
> > supposed to happen?
> 
> I'll leave the details to Jörg on how he envisions it, but IIUC the flow for
> a "readily assembled IGVM" is different. In case of a COCONUT-SVSM IGVM, you
> expect chaining of trust. So the SVSM implements a TPM which then the OS
> would use with measured boot etc etc.

Well, I don't consider igvm being useful for svsm only.  Shipping
standard edk2 as igvm looks useful to me.  Main benefit: pre-calculate
launch measurements without having to know qemu internals.

> It's a fundamentally different concept from FUKI.

Hmm?  IGVM is just a different way to ship the firmware (and optionally
more).

> But it could share the same vmfwupdate mechanism to load.

Yep.  But we have to sort the details.

 (1) How we are going to load kernel + initrd in case the firmware is
     igvm?  Just update the igvm to also include linux kernel and
     initrd (see parallel reply from Jörg)?  If so, how does the
     launched firmware find the kernel + initrd?
     While digging around in the igvm spec I've seen there is the
     concept of 'parameters'.  Can this be used to pass on the memory
     location of kernel + initrd + cmdline?  Maybe the kernel hashes too?

 (2) Will the igvm be generated on the fly from FUKI data?  Or should
     the distros ship igvm images with firmware + kernel + initrd?

 (3) How we are going to handle uki add-ons?

 (4) Do we want keep the region list?  Or declare that igvm should be
     used in case a single region is not enough?  Or even go all-in and
     use IGVM exclusively?

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-17  9:56             ` Gerd Hoffman
@ 2025-03-17 17:29               ` Alexander Graf
  2025-03-18  7:00                 ` Gerd Hoffman
  2025-03-20  8:31               ` Jörg Rödel
  1 sibling, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2025-03-17 17:29 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel


On 17.03.25 10:56, Gerd Hoffman wrote:
> On Fri, Mar 14, 2025 at 03:50:19PM +0100, Alexander Graf wrote:
>> On 14.03.25 15:08, Gerd Hoffman wrote:
>>>     Hi,
>>>
>>>>> Ok, assuming we allow the guest submit a IGVM image (which makes sense
>>>>> indeed, otherwise we'll probably end up re-inventing IGVM).  How will
>>>>> the kernel hashes be handled then?  I assume they will not be part of
>>>>> the igvm image, but they must be part of the launch measurement ...
>>>> The kernel hashes must be embedded in the IGVM image by the time you invoke
>>>> vmfwupdate. That means when you generate the FUKI, you take 4 inputs:
>>>> Generic firmware image, kernel, initramfs, cmdline. Out of those, you
>>>> generate and embed an IGVM image that consists of the firmware image as well
>>>> as the kernel hash page.
>>> If your input firmware image already is an IGVM (say coconut), what is
>>> supposed to happen?
>> I'll leave the details to Jörg on how he envisions it, but IIUC the flow for
>> a "readily assembled IGVM" is different. In case of a COCONUT-SVSM IGVM, you
>> expect chaining of trust. So the SVSM implements a TPM which then the OS
>> would use with measured boot etc etc.
> Well, I don't consider igvm being useful for svsm only.  Shipping
> standard edk2 as igvm looks useful to me.  Main benefit: pre-calculate
> launch measurements without having to know qemu internals.
>
>> It's a fundamentally different concept from FUKI.
> Hmm?  IGVM is just a different way to ship the firmware (and optionally
> more).
>
>> But it could share the same vmfwupdate mechanism to load.
> Yep.  But we have to sort the details.
>
>   (1) How we are going to load kernel + initrd in case the firmware is
>       igvm?  Just update the igvm to also include linux kernel and
>       initrd (see parallel reply from Jörg)?  If so, how does the
>       launched firmware find the kernel + initrd?
>       While digging around in the igvm spec I've seen there is the
>       concept of 'parameters'.  Can this be used to pass on the memory
>       location of kernel + initrd + cmdline?  Maybe the kernel hashes too?


I don't know. But even if not, the only thing customers can actually 
reason about is a combined hash of firmware + kernel + initrd + cmdline. 
Whether we assemble that using an edk2 IGVM plus parameters or by 
generating an IGVM from a "proprietary format" such as the edk2 BIOS 
blob plus a generated kernel hash page does not really make a difference 
from the user's point of view.

The flow will be the same: You use ukify to generate a FUKI from source 
files and get a FUKI plus precalculated hashes.


>   (2) Will the igvm be generated on the fly from FUKI data?  Or should
>       the distros ship igvm images with firmware + kernel + initrd?


If your distro embraces UKIs today, they should IMHO also embrace 
prebuilt FUKIs. Whether they are building their IGVM on the fly but 
precalculate the hash for that on-the-fly generated image ahead of time 
or whether they embed an IGVM is not terribly important I think.

The main difference between the 2 approaches is that dynamic generation 
allows you to have multiple different environment builds in a single 
FUKI. That way you can for example support UKI addons [1]


>   (3) How we are going to handle uki add-ons?


You can only (sensibly) handle them with on the fly generation.


>   (4) Do we want keep the region list?  Or declare that igvm should be
>       used in case a single region is not enough?  Or even go all-in and
>       use IGVM exclusively?


IGVM would replace the region list I think. Before making that jump, I 
would like to see it actually work though.


Alex


[1] 
https://archive.fosdem.org/2024/events/attachments/fosdem-2024-2024-uki-addons-and-extensions-safely-extending-ukis-kernel-command-line-and-initrd/slides/22125/Uki_addons_O97iYns.pdf


>
> take care,
>    Gerd
>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-17 17:29               ` Alexander Graf
@ 2025-03-18  7:00                 ` Gerd Hoffman
  2025-03-18 11:11                   ` Gerd Hoffman
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-18  7:00 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> >   (1) How we are going to load kernel + initrd in case the firmware is
> >       igvm?  Just update the igvm to also include linux kernel and
> >       initrd (see parallel reply from Jörg)?  If so, how does the
> >       launched firmware find the kernel + initrd?
> >       While digging around in the igvm spec I've seen there is the
> >       concept of 'parameters'.  Can this be used to pass on the memory
> >       location of kernel + initrd + cmdline?  Maybe the kernel hashes too?
> 
> I don't know. But even if not, the only thing customers can actually reason
> about is a combined hash of firmware + kernel + initrd + cmdline. Whether we
> assemble that using an edk2 IGVM plus parameters or by generating an IGVM
> from a "proprietary format" such as the edk2 BIOS blob plus a generated
> kernel hash page does not really make a difference from the user's point of
> view.

Maybe not from the user's point of view, but surely for the vmfwupdate
interface design and for the launch measurement calculations.

When using igvm parameters for the kernel hashes we need to pass on (at
least) two items via vmfwupdate API:  The igvm image itself and the
kernel hashes, so the VMM can fill the parameters for launch.

I tend to think it makes sense to keep the region list, so we can
actually pass on multiple items if needed, and simply add region flags
to declare that a region is an IGVM image.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-18  7:00                 ` Gerd Hoffman
@ 2025-03-18 11:11                   ` Gerd Hoffman
  2025-03-20  8:34                     ` Jörg Rödel
  2025-03-20 13:53                     ` Alexander Graf
  0 siblings, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-18 11:11 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> Maybe not from the user's point of view, but surely for the vmfwupdate
> interface design and for the launch measurement calculations.
> 
> When using igvm parameters for the kernel hashes we need to pass on (at
> least) two items via vmfwupdate API:  The igvm image itself and the
> kernel hashes, so the VMM can fill the parameters for launch.
> 
> I tend to think it makes sense to keep the region list, so we can
> actually pass on multiple items if needed, and simply add region flags
> to declare that a region is an IGVM image.

Went over the interface spec today, here it is.  Changes:

 - Moved descriptions into source code comments.
 - Added leftovers noticed in recent discussions, such as cpuid page.
 - Added capability flags and region flags for IGVM.

Open questions:

 - Does the idea to use igvm parameters for the kernel hashes makes
   sense?  Are parameters part of the launch measurement?
 - Do we want actually keep the complete interface (and the functional
   overlap with igvm)?

take care,
  Gerd

------------------------- cut here ---------------------------------

/*
 * Mar 2025 vmfwupdate interface rewrite
 */

struct vmfwupdate {
    // VMM capabilities, see VMFWUPDATE_CAP_*, read-only.
    uint64_t capabilities;
    // firmware storage size (below 4G on x86), read-only.
    uint64_t firmware_size;

    // address of opaque blob, the guest can use this to pass on information,
    // for example which memory region the linux kernel has been loaded to.
    // writable, will be kept intact on firmware update.
    uint64_t opaque_addr;

    // regions (see vmfwupdate_regions struct), memory location and length of
    // the list.  writable, will be cleared on firmware update and reset.
    uint64_t regions_addr;
    uint16_t regions_count;

    // control bits, see VMFWUPDATE_CTL_*
    // - disable bit can be set by the guest.
    // - disable bit can only be cleared by reset.
    uint16_t control;
};

// --- 'capabilities' field bits ---
// vmm supports resizing of firmware memory
#define VMFWUPDATE_CAP_BIOS_RESIZE    (1 << 0)
// vmm supports loading igvm images
#define VMFWUPDATE_CAP_IGVM_IMAGES    (2 << 0)

// --- 'control' field bits ---
// disable vmfwupdate interface
#define VMFWUPDATE_CTL_DISABLE        (1 << 0)

// 'regions_addr' field points to an array of this structure
struct vmfwupdate_regions {
    uint64_t size;           // size of the region
    uint64_t src_addr;       // source address (before update)
    uint64_t dst_addr;       // destination address (after update)
    uint64_t flags;          // control bits
};

// --- 'flags' field bits ---
// data must be copied
#define VMFWUPDATE_REGION_FLAG_COPY          (1 << 0)
// dest must be filled with zeros (src is not used)
#define VMFWUPDATE_REGION_FLAG_ZERO          (1 << 1)
// region must be measured
#define VMFWUPDATE_REGION_FLAG_MEASURE       (1 << 2)
// region must be (pre-)validated
#define VMFWUPDATE_REGION_FLAG_VALIDATE      (1 << 3)

// region contains igvm image
#define VMFWUPDATE_REGION_FLAG_IGVM_IMAGE    (1 << 8)
// region contains igvm parameters (TODO: details)
#define VMFWUPDATE_REGION_FLAG_IGVM_PARAM    (1 << 9)

// region is sev cpuid page
#define VMFWUPDATE_REGION_FLAG_SEV_CPUID     (1 << 16)
// region is sev secrets page
#define VMFWUPDATE_REGION_FLAG_SEV_SECRETS   (1 << 17)



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-18 11:11                   ` Gerd Hoffman
@ 2025-03-20  8:34                     ` Jörg Rödel
  2025-03-21  8:22                       ` Gerd Hoffman
  2025-03-20 13:53                     ` Alexander Graf
  1 sibling, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-20  8:34 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alexander Graf, Ani Sinha, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Tue, Mar 18, 2025 at 12:11:02PM +0100, Gerd Hoffman wrote:
> Open questions:
> 
>  - Does the idea to use igvm parameters for the kernel hashes makes
>    sense?  Are parameters part of the launch measurement?

Parameters itself are fully measured, their presence is, but not their
data. This is to keep the same launch measurements across different
platform configurations.

So for hashes it is best to put some on some measured page and let the
parameters point to it.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-20  8:34                     ` Jörg Rödel
@ 2025-03-21  8:22                       ` Gerd Hoffman
  2025-03-24 16:08                         ` Daniel P. Berrangé
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-21  8:22 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alexander Graf, Ani Sinha, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Thu, Mar 20, 2025 at 09:34:26AM +0100, Jörg Rödel wrote:
> On Tue, Mar 18, 2025 at 12:11:02PM +0100, Gerd Hoffman wrote:
> > Open questions:
> > 
> >  - Does the idea to use igvm parameters for the kernel hashes makes
> >    sense?  Are parameters part of the launch measurement?
> 
> Parameters itself are fully measured, their presence is, but not their
> data. This is to keep the same launch measurements across different
> platform configurations.
> 
> So for hashes it is best to put some on some measured page and let the
> parameters point to it.

Had a look at the kernel hashes details this week.

So, the story is this: It's essentially a private arrangement between
ovmf (the amdsev build variant only) and qemu.  The hashes are placed in
a specific page, together with "launch secrets" (that is not the sev-snp
"secrets" page).  That page is part of the lanuch measurement.  That
effectively makes the kernel + initrd + cmdline part of the launch
measurement too (ovmf verifies the hashes), but without the relatively
slow secure processor hashing kernel + initrd + cmdline, which reduces
the time needed to launch a VM.

The "launch secret" is intended to hold things like a luks secret to
unlock the root filesystem.  OVMF doesn't touch it but reserves the page
and registers a EFI table for it so the linux kernel can find it.

As far I know these are more experimental bits than something actually
used in production.  It's also clearly a pre-UKI design.  That IMHO
opens up the question whenever we actually want carry forward with that,
or if we better check out what alternatives we have.  We'll have a
signed UKI after all, so going for secure boot and/or measured boot for
the UKI verification looks attractive compared to passing around hashes
for the elements inside the UKI.

Not fully sure what to do about the "launch secrets".  IIRC the initial
design of this is for sev-es, i.e. pre-snp, so maybe the sev-snp secrets
page can be used instead.  I see the spec has 0x60 bytes (offset 0xa0)
reserved for guest os usage.  In any case this probably is only needed
as temporary stopgap until we have a complete vTPM implementation for
the svsm.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-21  8:22                       ` Gerd Hoffman
@ 2025-03-24 16:08                         ` Daniel P. Berrangé
  0 siblings, 0 replies; 68+ messages in thread
From: Daniel P. Berrangé @ 2025-03-24 16:08 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Jörg Rödel, Alexander Graf, Ani Sinha, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Fri, Mar 21, 2025 at 09:22:59AM +0100, Gerd Hoffman wrote:
> On Thu, Mar 20, 2025 at 09:34:26AM +0100, Jörg Rödel wrote:
> > On Tue, Mar 18, 2025 at 12:11:02PM +0100, Gerd Hoffman wrote:
> > > Open questions:
> > > 
> > >  - Does the idea to use igvm parameters for the kernel hashes makes
> > >    sense?  Are parameters part of the launch measurement?
> > 
> > Parameters itself are fully measured, their presence is, but not their
> > data. This is to keep the same launch measurements across different
> > platform configurations.
> > 
> > So for hashes it is best to put some on some measured page and let the
> > parameters point to it.
> 
> Had a look at the kernel hashes details this week.
> 
> So, the story is this: It's essentially a private arrangement between
> ovmf (the amdsev build variant only) and qemu.  The hashes are placed in
> a specific page, together with "launch secrets" (that is not the sev-snp
> "secrets" page).  That page is part of the lanuch measurement.  That
> effectively makes the kernel + initrd + cmdline part of the launch
> measurement too (ovmf verifies the hashes), but without the relatively
> slow secure processor hashing kernel + initrd + cmdline, which reduces
> the time needed to launch a VM.
> 
> The "launch secret" is intended to hold things like a luks secret to
> unlock the root filesystem.  OVMF doesn't touch it but reserves the page
> and registers a EFI table for it so the linux kernel can find it.
> 
> As far I know these are more experimental bits than something actually
> used in production.  It's also clearly a pre-UKI design.  That IMHO
> opens up the question whenever we actually want carry forward with that,
> or if we better check out what alternatives we have.  We'll have a
> signed UKI after all, so going for secure boot and/or measured boot for
> the UKI verification looks attractive compared to passing around hashes
> for the elements inside the UKI.

Although it has been extended to SNP in QEMU, personally I'd consider
the QEMU 'kernel hashes' functionality to be an niche feature.

From a virt mgmt POV we know in general that direct kernel boot while
useful, is somewhat limited in usage. Most of the time it is saner to
have the guest decide what kernel to boot without interaction of the
host. So by extension, the kernel hashes feature is also going to be
a similarly (or even more) niche use case. The UKI with system extensions
and secureboot model gives much greater flexibility for defining policy
around valid configurations than using fixed hashes - especially when
it comes to kernel command line which has potentially many valid configs.


> Not fully sure what to do about the "launch secrets".  IIRC the initial
> design of this is for sev-es, i.e. pre-snp, so maybe the sev-snp secrets
> page can be used instead.  I see the spec has 0x60 bytes (offset 0xa0)
> reserved for guest os usage.  In any case this probably is only needed
> as temporary stopgap until we have a complete vTPM implementation for
> the svsm.

I view the launch secrets feature as a crutch to cope with the HW
limitations of SEV/SEV-ES. You have the host initiated attestation
dance and after verification can pass data back to the host, which
will inject it into the guest. The inability to have a secure TPM
in SEV/SEV-ES forces you down this road. It is highly undesirable
as a feature going forward because we're better served by having
a boot workflow that is common to traditional virt, confidential
virt, and bare metal, and TPM delivers on that.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-18 11:11                   ` Gerd Hoffman
  2025-03-20  8:34                     ` Jörg Rödel
@ 2025-03-20 13:53                     ` Alexander Graf
  2025-03-21  3:36                       ` Ani Sinha
  2025-03-21  9:14                       ` Gerd Hoffman
  1 sibling, 2 replies; 68+ messages in thread
From: Alexander Graf @ 2025-03-20 13:53 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

Hey Gerd,

On 18.03.25 12:11, Gerd Hoffman wrote:
>    Hi,
>
>> Maybe not from the user's point of view, but surely for the vmfwupdate
>> interface design and for the launch measurement calculations.
>>
>> When using igvm parameters for the kernel hashes we need to pass on (at
>> least) two items via vmfwupdate API:  The igvm image itself and the
>> kernel hashes, so the VMM can fill the parameters for launch.
>>
>> I tend to think it makes sense to keep the region list, so we can
>> actually pass on multiple items if needed, and simply add region flags
>> to declare that a region is an IGVM image.
> Went over the interface spec today, here it is.  Changes:
>
>   - Moved descriptions into source code comments.
>   - Added leftovers noticed in recent discussions, such as cpuid page.
>   - Added capability flags and region flags for IGVM.
>
> Open questions:
>
>   - Does the idea to use igvm parameters for the kernel hashes makes
>     sense?  Are parameters part of the launch measurement?
>   - Do we want actually keep the complete interface (and the functional
>     overlap with igvm)?


I think if we want to embrace IGVM, we should embrace it fully and make 
it replace the region list. At the end of the day, IGVM is effectively a 
region list plus data.

How difficult would it be to put up a prototype that uses only IGVM as 
vmfwupdate payload? We can definitely assemble that IGVM in ukify.py or 
as part of the boot stub. Or for the prototype even pre-assemble by hand.


Alex



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-20 13:53                     ` Alexander Graf
@ 2025-03-21  3:36                       ` Ani Sinha
  2025-03-21  8:09                         ` Alexander Graf
  2025-03-21  9:14                       ` Gerd Hoffman
  1 sibling, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-21  3:36 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gerd Hoffman, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 20, 2025 at 7:24 PM Alexander Graf <graf@amazon.com> wrote:
>
> Hey Gerd,
>
> On 18.03.25 12:11, Gerd Hoffman wrote:
> >    Hi,
> >
> >> Maybe not from the user's point of view, but surely for the vmfwupdate
> >> interface design and for the launch measurement calculations.
> >>
> >> When using igvm parameters for the kernel hashes we need to pass on (at
> >> least) two items via vmfwupdate API:  The igvm image itself and the
> >> kernel hashes, so the VMM can fill the parameters for launch.
> >>
> >> I tend to think it makes sense to keep the region list, so we can
> >> actually pass on multiple items if needed, and simply add region flags
> >> to declare that a region is an IGVM image.
> > Went over the interface spec today, here it is.  Changes:
> >
> >   - Moved descriptions into source code comments.
> >   - Added leftovers noticed in recent discussions, such as cpuid page.
> >   - Added capability flags and region flags for IGVM.
> >
> > Open questions:
> >
> >   - Does the idea to use igvm parameters for the kernel hashes makes
> >     sense?  Are parameters part of the launch measurement?
> >   - Do we want actually keep the complete interface (and the functional
> >     overlap with igvm)?
>
>
> I think if we want to embrace IGVM, we should embrace it fully and make
> it replace the region list. At the end of the day, IGVM is effectively a
> region list plus data.

Are you suggesting that vmfwupdate only accept IGVM as payload? I am
not sure if I like that idea.

>
> How difficult would it be to put up a prototype that uses only IGVM as
> vmfwupdate payload? We can definitely assemble that IGVM in ukify.py or
> as part of the boot stub. Or for the prototype even pre-assemble by hand.
>
>
> Alex
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-21  3:36                       ` Ani Sinha
@ 2025-03-21  8:09                         ` Alexander Graf
  0 siblings, 0 replies; 68+ messages in thread
From: Alexander Graf @ 2025-03-21  8:09 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Gerd Hoffman, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel


On 21.03.25 04:36, Ani Sinha wrote:
> On Thu, Mar 20, 2025 at 7:24 PM Alexander Graf <graf@amazon.com> wrote:
>> Hey Gerd,
>>
>> On 18.03.25 12:11, Gerd Hoffman wrote:
>>>     Hi,
>>>
>>>> Maybe not from the user's point of view, but surely for the vmfwupdate
>>>> interface design and for the launch measurement calculations.
>>>>
>>>> When using igvm parameters for the kernel hashes we need to pass on (at
>>>> least) two items via vmfwupdate API:  The igvm image itself and the
>>>> kernel hashes, so the VMM can fill the parameters for launch.
>>>>
>>>> I tend to think it makes sense to keep the region list, so we can
>>>> actually pass on multiple items if needed, and simply add region flags
>>>> to declare that a region is an IGVM image.
>>> Went over the interface spec today, here it is.  Changes:
>>>
>>>    - Moved descriptions into source code comments.
>>>    - Added leftovers noticed in recent discussions, such as cpuid page.
>>>    - Added capability flags and region flags for IGVM.
>>>
>>> Open questions:
>>>
>>>    - Does the idea to use igvm parameters for the kernel hashes makes
>>>      sense?  Are parameters part of the launch measurement?
>>>    - Do we want actually keep the complete interface (and the functional
>>>      overlap with igvm)?
>>
>> I think if we want to embrace IGVM, we should embrace it fully and make
>> it replace the region list. At the end of the day, IGVM is effectively a
>> region list plus data.
> Are you suggesting that vmfwupdate only accept IGVM as payload? I am
> not sure if I like that idea.


I don't like it either, but I don't see a good alternative. If the spec 
allowed both IGVM and the home grown region list, hypervisors would 
still need to implement both ways to be compatible with all payloads. So 
by adding one more path, we're only adding complexity without helping 
anyone.


Alex



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-20 13:53                     ` Alexander Graf
  2025-03-21  3:36                       ` Ani Sinha
@ 2025-03-21  9:14                       ` Gerd Hoffman
  1 sibling, 0 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-21  9:14 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> I think if we want to embrace IGVM, we should embrace it fully and make it
> replace the region list. At the end of the day, IGVM is effectively a region
> list plus data.
> 
> How difficult would it be to put up a prototype that uses only IGVM as
> vmfwupdate payload? We can definitely assemble that IGVM in ukify.py or as
> part of the boot stub. Or for the prototype even pre-assemble by hand.

Played around with igvm yesterday (no vmfwupdate yet), with traditional
firmware wrapped into an igvm.

igvm support in qemu needs a fix for that to work properly.
https://github.com/coconut-svsm/qemu/pull/18

Launching a VM using igvm works.  seabios+ovmf images created using this:
https://gitlab.com/kraxel/virt-firmware-rs/-/blob/main/efi-tools/src/bin/igvm-wrap.rs

There are some noteworthy differences between igvm and non-igvm mode
though:

 * When launching without igvm qemu goes create the pc bios rom memory
   region, and the alias to map the top 128k of that below 1G ('pc.bios'
   and 'isa-bios' in 'info mtree').

 * When launching with igvm qemu doesn't do that (at least with the
   current, not-yet upstream patches).  One can compensate for that by
   creating two regions in the igvm (below 4G and below 1M).  There are
   still some small differences, the memory regions are ram not rom,
   also the chipset emulation for rom shadowing has no effect.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-17  9:56             ` Gerd Hoffman
  2025-03-17 17:29               ` Alexander Graf
@ 2025-03-20  8:31               ` Jörg Rödel
  2025-03-21 10:08                 ` Gerd Hoffman
  1 sibling, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-20  8:31 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alexander Graf, Ani Sinha, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

(Excuse my delayed reply, still suffering from some flu or whatever...)

On Mon, Mar 17, 2025 at 10:56:04AM +0100, Gerd Hoffman wrote:
> Yep.  But we have to sort the details.
> 
>  (1) How we are going to load kernel + initrd in case the firmware is
>      igvm?  Just update the igvm to also include linux kernel and
>      initrd (see parallel reply from Jörg)?  If so, how does the
>      launched firmware find the kernel + initrd?
>      While digging around in the igvm spec I've seen there is the
>      concept of 'parameters'.  Can this be used to pass on the memory
>      location of kernel + initrd + cmdline?  Maybe the kernel hashes too?

The find the locations of the kernel, initrd, cmdline, ... I think IGVM
parameters, either directly or (preferably indirectly) are a good
solution. By "indirectly" I mean to put these regions on a separate and
measured page which also contains the region hashes.

This allows to put these regions as unmeasured memory into the IGVM and
speed up launching on some platforms.

>  (2) Will the igvm be generated on the fly from FUKI data?  Or should
>      the distros ship igvm images with firmware + kernel + initrd?

My preference would be that distros ship the tooling and components to
build IGVM files and build them during kernel update. If a distro comes
up with a generic initrd+cmdline it can also ship pre-built IGVM files.

From that IGVM-file the tooling can also extract expected measurements
and place them on some trusted verifier (e.g. remote or TPM).

>  (3) How we are going to handle uki add-ons?
> 
>  (4) Do we want keep the region list?  Or declare that igvm should be
>      used in case a single region is not enough?  Or even go all-in and
>      use IGVM exclusively?

No strong opinion here, but I just want to mention again that by
allowing a workflow with a pre-defined setup order for CVMs, the
implemented order in QEMU/KVM can not be changed afterwards without
breaking measurements for users.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-20  8:31               ` Jörg Rödel
@ 2025-03-21 10:08                 ` Gerd Hoffman
  2025-03-21 12:44                   ` Ani Sinha
  2025-03-21 12:45                   ` Daniel P. Berrangé
  0 siblings, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-21 10:08 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alexander Graf, Ani Sinha, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> >      While digging around in the igvm spec I've seen there is the
> >      concept of 'parameters'.  Can this be used to pass on the memory
> >      location of kernel + initrd + cmdline?  Maybe the kernel hashes too?
> 
> The find the locations of the kernel, initrd, cmdline, ... I think IGVM
> parameters, either directly or (preferably indirectly) are a good
> solution. By "indirectly" I mean to put these regions on a separate and
> measured page which also contains the region hashes.

regions and hashes should be separate I think.  The regions are not
necessarily fixed, the physical memory location where things have been
loaded to can change.

But also see my other reply.  I'm not convinced any more carrying
forward the hashes logic makes much sense.

> >  (2) Will the igvm be generated on the fly from FUKI data?  Or should
> >      the distros ship igvm images with firmware + kernel + initrd?
> 
> My preference would be that distros ship the tooling and components to
> build IGVM files and build them during kernel update. If a distro comes
> up with a generic initrd+cmdline it can also ship pre-built IGVM files.

Working with the (F)UKI instead has the advantage that we can make use
of the secure boot signature, with a workflow along these lines:

(distro) build:
 * Embed the firmware as igvm inside the UKI.
 * Sign the UKI.

first vm launch:
 * Load the complete UKI.
 * Pass on (a) the igvm section and (b) the UKI (including signature)
   to the vmfwupdate device.

vmwupdate device:
 * loads the igvm image.
 * passes on the UKI location (as igvm parameter?).

second vm launch:
 * firmware checks UKI signature.
 * firmware (optionally) measures UKI into vTPM.
 * firmware launches UKI.

Going ship the distro kernel as igvm image would work too.  Will
simplify the measurement pre-calculation.  Also there is no need to pass
around any parameters, everything (how the firmware finds the UKI etc)
can be arranged at igvm build time then.  Disadvantage: This introduces
a completely new boot workflow.  Will probably need a new set of cloud
images exclusively for the BYOF case.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-21 10:08                 ` Gerd Hoffman
@ 2025-03-21 12:44                   ` Ani Sinha
  2025-03-24  7:43                     ` Gerd Hoffman
  2025-03-21 12:45                   ` Daniel P. Berrangé
  1 sibling, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-21 12:44 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Jörg Rödel, Alexander Graf, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]

On Fri, 21 Mar, 2025, 3:38 pm Gerd Hoffman, <kraxel@redhat.com> wrote:

>   Hi,
>
> > >      While digging around in the igvm spec I've seen there is the
> > >      concept of 'parameters'.  Can this be used to pass on the memory
> > >      location of kernel + initrd + cmdline?  Maybe the kernel hashes
> too?
> >
> > The find the locations of the kernel, initrd, cmdline, ... I think IGVM
> > parameters, either directly or (preferably indirectly) are a good
> > solution. By "indirectly" I mean to put these regions on a separate and
> > measured page which also contains the region hashes.
>
> regions and hashes should be separate I think.  The regions are not
> necessarily fixed, the physical memory location where things have been
> loaded to can change.
>
> But also see my other reply.  I'm not convinced any more carrying
> forward the hashes logic makes much sense.
>
> > >  (2) Will the igvm be generated on the fly from FUKI data?  Or should
> > >      the distros ship igvm images with firmware + kernel + initrd?
> >
> > My preference would be that distros ship the tooling and components to
> > build IGVM files and build them during kernel update. If a distro comes
> > up with a generic initrd+cmdline it can also ship pre-built IGVM files.
>
> Working with the (F)UKI instead has the advantage that we can make use
> of the secure boot signature, with a workflow along these lines:
>
> (distro) build:
>  * Embed the firmware as igvm inside the UKI.
>  * Sign the UKI.
>
> first vm launch:
>  * Load the complete UKI.
>  * Pass on (a) the igvm section and (b) the UKI (including signature)
>    to the vmfwupdate device.
>
> vmwupdate device:
>  * loads the igvm image.
>  * passes on the UKI location (as igvm parameter?).
>
> second vm launch:
>  * firmware checks UKI signature.
>  * firmware (optionally) measures UKI into vTPM.
>  * firmware launches UKI.
>
> Going ship the distro kernel as igvm image would work too.  Will
> simplify the measurement pre-calculation.  Also there is no need to pass
> around any parameters, everything (how the firmware finds the UKI etc)
> can be arranged at igvm build time then.  Disadvantage: This introduces
> a completely new boot workflow.  Will probably need a new set of cloud
> images exclusively for the BYOF case.
>

What does all this mean for the hypervisor interface ?

>

[-- Attachment #2: Type: text/html, Size: 3157 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-21 12:44                   ` Ani Sinha
@ 2025-03-24  7:43                     ` Gerd Hoffman
  2025-03-24 11:12                       ` Ani Sinha
  2025-04-08  8:11                       ` Gerd Hoffman
  0 siblings, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-24  7:43 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Jörg Rödel, Alexander Graf, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

  Hi,

> > Going ship the distro kernel as igvm image would work too.  Will
> > simplify the measurement pre-calculation.  Also there is no need to pass
> > around any parameters, everything (how the firmware finds the UKI etc)
> > can be arranged at igvm build time then.  Disadvantage: This introduces
> > a completely new boot workflow.  Will probably need a new set of cloud
> > images exclusively for the BYOF case.
> 
> What does all this mean for the hypervisor interface ?

That means we'll go scratch the region list idea and depend on igvm
instead.

Which means we are back to the single firmware image.  I think it makes
sense to continue supporting classic rom images (which can also be
loaded via -bios).  Any use case which needs more fine-grained control
must use igvm.  We can use format bits in both capabilities and control
fields to indicate what the hypervisor supports and what has been
uploaded to the firmware image region.  See interface header file draft
below.

'opaque' exists as before, even though I think it makes sense to also
specify the size for the opaque blob.  This gives the guest a bit more
flexibility in how this is used, for example it could pass the complete
UKI as opaque blob.

take care,
  Gerd

------------------------------ cut here -----------------------------
struct vmfwupdate {
    // VMM capabilities, see VMFWUPDATE_CAP_*, read-only.
    uint64_t capabilities;
    // control bits, see VMFWUPDATE_CTL_*
    uint64_t control;

    // firmware rom/flash storage size, read-only.
    uint64_t fw_rom_size;

    // address and size of the firmware update image.  Will be cleared on
    // firmware update and reset.
    uint64_t fw_image_addr;
    uint16_t fw_image_size;

    // address + size of opaque blob.  The guest can use this to pass on
    // information, for example which memory region the linux kernel has been
    // loaded to.  writable, will be kept intact on firmware update.
    uint64_t opaque_addr;
    uint64_t opaque_size;
};

// --- format bits, used by both 'capabilities' and 'control' ---
// igvm
#define VMFWUPDATE_FORMAT_IGVM           (1 << 32)
// rom/flash on platform-specific location
//  - x86: below 4G + 1G
//  - arm: 0
#define VMFWUPDATE_FORMAT_PLATFORM_ROM   (1 << 33)

// --- 'capabilities' field bits ---
// vmm supports resizing of x86 firmware memory
#define VMFWUPDATE_CAP_X86_RESIZE         (1 << 0)

// --- 'control' field bits ---
// disable vmfwupdate interface
#define VMFWUPDATE_CTL_DISABLE            (1 << 0)



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24  7:43                     ` Gerd Hoffman
@ 2025-03-24 11:12                       ` Ani Sinha
  2025-03-24 15:48                         ` Gerd Hoffman
  2025-04-08  8:11                       ` Gerd Hoffman
  1 sibling, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-24 11:12 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Jörg Rödel, Alexander Graf, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Mon, Mar 24, 2025 at 1:13 PM Gerd Hoffman <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > > Going ship the distro kernel as igvm image would work too.  Will
> > > simplify the measurement pre-calculation.  Also there is no need to pass
> > > around any parameters, everything (how the firmware finds the UKI etc)
> > > can be arranged at igvm build time then.  Disadvantage: This introduces
> > > a completely new boot workflow.  Will probably need a new set of cloud
> > > images exclusively for the BYOF case.
> >
> > What does all this mean for the hypervisor interface ?
>
> That means we'll go scratch the region list idea and depend on igvm
> instead.
>
> Which means we are back to the single firmware image.

So in this model, how are we passing the hashes of kernel, initrd and cmdline?
Are they going to be part of the firmware image as was previously thought?

  I think it makes
> sense to continue supporting classic rom images (which can also be
> loaded via -bios).  Any use case which needs more fine-grained control
> must use igvm.  We can use format bits in both capabilities and control
> fields to indicate what the hypervisor supports and what has been
> uploaded to the firmware image region.  See interface header file draft
> below.
>
> 'opaque' exists as before, even though I think it makes sense to also
> specify the size for the opaque blob.  This gives the guest a bit more
> flexibility in how this is used, for example it could pass the complete
> UKI as opaque blob.
>
> take care,
>   Gerd
>
> ------------------------------ cut here -----------------------------
> struct vmfwupdate {
>     // VMM capabilities, see VMFWUPDATE_CAP_*, read-only.
>     uint64_t capabilities;
>     // control bits, see VMFWUPDATE_CTL_*
>     uint64_t control;
>
>     // firmware rom/flash storage size, read-only.
>     uint64_t fw_rom_size;
>
>     // address and size of the firmware update image.  Will be cleared on
>     // firmware update and reset.
>     uint64_t fw_image_addr;
>     uint16_t fw_image_size;
>
>     // address + size of opaque blob.  The guest can use this to pass on
>     // information, for example which memory region the linux kernel has been
>     // loaded to.  writable, will be kept intact on firmware update.
>     uint64_t opaque_addr;
>     uint64_t opaque_size;
> };
>
> // --- format bits, used by both 'capabilities' and 'control' ---
> // igvm
> #define VMFWUPDATE_FORMAT_IGVM           (1 << 32)
> // rom/flash on platform-specific location
> //  - x86: below 4G + 1G
> //  - arm: 0
> #define VMFWUPDATE_FORMAT_PLATFORM_ROM   (1 << 33)
>
> // --- 'capabilities' field bits ---
> // vmm supports resizing of x86 firmware memory
> #define VMFWUPDATE_CAP_X86_RESIZE         (1 << 0)
>
> // --- 'control' field bits ---
> // disable vmfwupdate interface
> #define VMFWUPDATE_CTL_DISABLE            (1 << 0)
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24 11:12                       ` Ani Sinha
@ 2025-03-24 15:48                         ` Gerd Hoffman
  2025-03-24 16:31                           ` Alexander Graf
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-24 15:48 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Jörg Rödel, Alexander Graf, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Mon, Mar 24, 2025 at 04:42:28PM +0530, Ani Sinha wrote:
> On Mon, Mar 24, 2025 at 1:13 PM Gerd Hoffman <kraxel@redhat.com> wrote:
> >
> >   Hi,
> >
> > > > Going ship the distro kernel as igvm image would work too.  Will
> > > > simplify the measurement pre-calculation.  Also there is no need to pass
> > > > around any parameters, everything (how the firmware finds the UKI etc)
> > > > can be arranged at igvm build time then.  Disadvantage: This introduces
> > > > a completely new boot workflow.  Will probably need a new set of cloud
> > > > images exclusively for the BYOF case.
> > >
> > > What does all this mean for the hypervisor interface ?
> >
> > That means we'll go scratch the region list idea and depend on igvm
> > instead.
> >
> > Which means we are back to the single firmware image.
> 
> So in this model, how are we passing the hashes of kernel, initrd and cmdline?
> Are they going to be part of the firmware image as was previously thought?

Either scratch the idea of using hashes to verify kernel + initrd +
cmdline and use a secure boot signed UKI instead, or use IGVM firmware
images and add the kernel hashes page to the igvm.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24 15:48                         ` Gerd Hoffman
@ 2025-03-24 16:31                           ` Alexander Graf
  2025-03-24 17:53                             ` Gerd Hoffman
  0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2025-03-24 16:31 UTC (permalink / raw)
  To: Gerd Hoffman, Ani Sinha
  Cc: Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel


On 24.03.25 16:48, Gerd Hoffman wrote:
> On Mon, Mar 24, 2025 at 04:42:28PM +0530, Ani Sinha wrote:
>> On Mon, Mar 24, 2025 at 1:13 PM Gerd Hoffman <kraxel@redhat.com> wrote:
>>>    Hi,
>>>
>>>>> Going ship the distro kernel as igvm image would work too.  Will
>>>>> simplify the measurement pre-calculation.  Also there is no need to pass
>>>>> around any parameters, everything (how the firmware finds the UKI etc)
>>>>> can be arranged at igvm build time then.  Disadvantage: This introduces
>>>>> a completely new boot workflow.  Will probably need a new set of cloud
>>>>> images exclusively for the BYOF case.
>>>> What does all this mean for the hypervisor interface ?
>>> That means we'll go scratch the region list idea and depend on igvm
>>> instead.
>>>
>>> Which means we are back to the single firmware image.
>> So in this model, how are we passing the hashes of kernel, initrd and cmdline?
>> Are they going to be part of the firmware image as was previously thought?
> Either scratch the idea of using hashes to verify kernel + initrd +
> cmdline and use a secure boot signed UKI instead, or use IGVM firmware
> images and add the kernel hashes page to the igvm.


It's a nice idea to have a firmware IGVM file that contains a signature, 
so you can for example have a RHEL provided IGVM that only runs UKIs 
that are signed by Red Hat.

Unfortunately, it does not address some of the other interesting use cases:

   - Attestable Bootable Containers. In that case, the command line 
contains a hash of the target fs-verity protected partition. Red Hat can 
not be the entity signing that UKI. It must be the user.
   - Add-ons. How do you provide a "debug add-on" and prevent it to gain 
any access to confidential data?

So we need some equivalent of a hash page. That can absolutely integrate 
more deeply into UEFI and be actual PE hashes rather than the icky thing 
the OVMF code does today. But we need to support a way to embed the hash 
in the ukify.py generated IGVM on the fly. With add-ons, maybe even 
multiple ones.


Alex



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24 16:31                           ` Alexander Graf
@ 2025-03-24 17:53                             ` Gerd Hoffman
  2025-03-24 18:07                               ` Daniel P. Berrangé
  2025-03-25  8:04                               ` Alexander Graf
  0 siblings, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-24 17:53 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Mon, Mar 24, 2025 at 05:31:30PM +0100, Alexander Graf wrote:
> 
> > > > > What does all this mean for the hypervisor interface ?

> > > > That means we'll go scratch the region list idea and depend on igvm
> > > > instead.

> > > > Which means we are back to the single firmware image.
> > > So in this model, how are we passing the hashes of kernel, initrd and cmdline?
> > > Are they going to be part of the firmware image as was previously thought?

> > Either scratch the idea of using hashes to verify kernel + initrd +
> > cmdline and use a secure boot signed UKI instead, or use IGVM firmware
> > images and add the kernel hashes page to the igvm.

> It's a nice idea to have a firmware IGVM file that contains a signature, so
> you can for example have a RHEL provided IGVM that only runs UKIs that are
> signed by Red Hat.

Yep, that is what should be possible when all this is ready.

>   - Attestable Bootable Containers. In that case, the command line contains
> a hash of the target fs-verity protected partition. Red Hat can not be the
> entity signing that UKI. It must be the user.

Well, add-ons have been created to address exactly that:  Allow the UKI
being configured without changing it, so the UKI can be signed by redhat.
You'll go add user-signed add-on for the command line.

>   - Add-ons. How do you provide a "debug add-on" and prevent it to gain any
> access to confidential data?

Not sure I understand the point you are trying to raise.  Add-ons
signatures are checked too.

> So we need some equivalent of a hash page.

Ok.  So two opaque blobs, one which is measured into the launch
measurement and one which is not?  That gives you the option to pass
around hashes (or any other data) and leave a trace in the launch
measurement should you need that.

> That can absolutely integrate more deeply into UEFI and be actual PE
> hashes rather than the icky thing the OVMF code does today.

How the measured opaque blob is actually used is not something
vmfwupdate needs to care about.

> But we need to support a way to embed the hash in the ukify.py
> generated IGVM on the fly. With add-ons, maybe even multiple ones.

I'd prefer not patch the IGVM on the fly at boot time.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24 17:53                             ` Gerd Hoffman
@ 2025-03-24 18:07                               ` Daniel P. Berrangé
  2025-03-25  8:04                               ` Alexander Graf
  1 sibling, 0 replies; 68+ messages in thread
From: Daniel P. Berrangé @ 2025-03-24 18:07 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alexander Graf, Ani Sinha, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Mon, Mar 24, 2025 at 06:53:59PM +0100, Gerd Hoffman wrote:
> On Mon, Mar 24, 2025 at 05:31:30PM +0100, Alexander Graf wrote:
> > 
> > > > > > What does all this mean for the hypervisor interface ?
> 
> > > > > That means we'll go scratch the region list idea and depend on igvm
> > > > > instead.
> 
> > > > > Which means we are back to the single firmware image.
> > > > So in this model, how are we passing the hashes of kernel, initrd and cmdline?
> > > > Are they going to be part of the firmware image as was previously thought?
> 
> > > Either scratch the idea of using hashes to verify kernel + initrd +
> > > cmdline and use a secure boot signed UKI instead, or use IGVM firmware
> > > images and add the kernel hashes page to the igvm.
> 
> > It's a nice idea to have a firmware IGVM file that contains a signature, so
> > you can for example have a RHEL provided IGVM that only runs UKIs that are
> > signed by Red Hat.
> 
> Yep, that is what should be possible when all this is ready.
> 
> >   - Attestable Bootable Containers. In that case, the command line contains
> > a hash of the target fs-verity protected partition. Red Hat can not be the
> > entity signing that UKI. It must be the user.
> 
> Well, add-ons have been created to address exactly that:  Allow the UKI
> being configured without changing it, so the UKI can be signed by redhat.
> You'll go add user-signed add-on for the command line.
> 
> >   - Add-ons. How do you provide a "debug add-on" and prevent it to gain any
> > access to confidential data?
> 
> Not sure I understand the point you are trying to raise.  Add-ons
> signatures are checked too.

"Debug add-on" is quite a broad concept - some things might be safe
for the vendor to ship as pre-built signed add-ons by default, others
things may not be safe & require user opt-in, by signing an add-on
locally with their MOK and enrolling with it shim.

Also UKIs semi-recently gained support for multiple profiles avoiding
the need for addons in some cases[1]. A simple use of this might be two
cmdlines - one quiet by default and one with verbose kernel messages.
A more advanced use would be one "normal" profile, and one "factory reset"
profile

With regards,
Daniel

[1] https://github.com/uapi-group/specifications/blob/main/specs/unified_kernel_image.md#multi-profile-ukis
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24 17:53                             ` Gerd Hoffman
  2025-03-24 18:07                               ` Daniel P. Berrangé
@ 2025-03-25  8:04                               ` Alexander Graf
  2025-03-26 12:27                                 ` Gerd Hoffman
  1 sibling, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2025-03-25  8:04 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On 24.03.25 18:53, Gerd Hoffman wrote:
> On Mon, Mar 24, 2025 at 05:31:30PM +0100, Alexander Graf wrote:
>>>>>> What does all this mean for the hypervisor interface ?
>>>>> That means we'll go scratch the region list idea and depend on igvm
>>>>> instead.
>>>>> Which means we are back to the single firmware image.
>>>> So in this model, how are we passing the hashes of kernel, initrd and cmdline?
>>>> Are they going to be part of the firmware image as was previously thought?
>>> Either scratch the idea of using hashes to verify kernel + initrd +
>>> cmdline and use a secure boot signed UKI instead, or use IGVM firmware
>>> images and add the kernel hashes page to the igvm.
>> It's a nice idea to have a firmware IGVM file that contains a signature, so
>> you can for example have a RHEL provided IGVM that only runs UKIs that are
>> signed by Red Hat.
> Yep, that is what should be possible when all this is ready.
>
>>    - Attestable Bootable Containers. In that case, the command line contains
>> a hash of the target fs-verity protected partition. Red Hat can not be the
>> entity signing that UKI. It must be the user.
> Well, add-ons have been created to address exactly that:  Allow the UKI
> being configured without changing it, so the UKI can be signed by redhat.
> You'll go add user-signed add-on for the command line.

The problem is that add-ons are

   1) Separate binaries. So you need to match multiple files.
   2) In this case, get generated out of the vendor (RH)'s control in a 
one-off fashion.

I don't think "signing" is the correct way to address the latter. It's 
rather hashing. But I agree with you that it could (should?) be hashing 
at the PE loader level similar or identical to Secure Boot rather than a 
separate hashing mechanism.

>
>>    - Add-ons. How do you provide a "debug add-on" and prevent it to gain any
>> access to confidential data?
> Not sure I understand the point you are trying to raise.  Add-ons
> signatures are checked too.

2 points here here:

   1) With add-ons, there are multiple binaries. We can't only pass a 
single one.
   2) The security posture of the system may be different between 2 
validly signed images. Think of Daniel's example of verbose kernel 
output. Maybe I consider verbose kernel output already inacceptable, 
while RH thinks it's an ok posture to have. The user needs to have the 
chance to differentiate between a system booted with or without verbose 
kernel output.

>> So we need some equivalent of a hash page.
> Ok.  So two opaque blobs, one which is measured into the launch
> measurement and one which is not?  That gives you the option to pass
> around hashes (or any other data) and leave a trace in the launch
> measurement should you need that.

Yes, I think you want to have one or multiple pages with what 
effectively is a db append variable update blob. Or even a full variable 
store blob. Then a built-in always-on UEFI Secure Boot that runs in 
kernel mode rather than SMM. The IGVM should dictate the physical 
location of that blob so that you can precalculate the launch digest 
with the blob included. Ukify.py then generates the blob along with the 
FUKI.

>> That can absolutely integrate more deeply into UEFI and be actual PE
>> hashes rather than the icky thing the OVMF code does today.
> How the measured opaque blob is actually used is not something
> vmfwupdate needs to care about.

I agree, from a vmfwupdate point of view, we would basically have an 
IGVM. The IGVM describes 2 special page (ranges?): One for the CPUID 
special page, one for the variable store seed. To precalculate the 
launch digest you would need the firmware IGVM, and the seed blob.

All unauthenticated data, such as locations of the UKI and its add-ons 
gets passed as parameter to the firmware IGVM.

>> But we need to support a way to embed the hash in the ukify.py
>> generated IGVM on the fly. With add-ons, maybe even multiple ones.
> I'd prefer not patch the IGVM on the fly at boot time.

Does the flow above align with your preferences? :)

Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-25  8:04                               ` Alexander Graf
@ 2025-03-26 12:27                                 ` Gerd Hoffman
  2025-03-26 15:22                                   ` Alexander Graf
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-26 12:27 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> The problem is that add-ons are
> 
>   1) Separate binaries. So you need to match multiple files.
>   2) In this case, get generated out of the vendor (RH)'s control in a
> one-off fashion.
> 
> I don't think "signing" is the correct way to address the latter. It's
> rather hashing. But I agree with you that it could (should?) be hashing at
> the PE loader level similar or identical to Secure Boot rather than a
> separate hashing mechanism.

Secure boot offers signing and hashing.  You can add both signing
certificates and authenticode hashes to 'db'.

> > Not sure I understand the point you are trying to raise.  Add-ons
> > signatures are checked too.
> 
> 2 points here here:
> 
>   1) With add-ons, there are multiple binaries. We can't only pass a single
> one.

Sure.  You'll go need some container, say a cpio archive, but that is
something vmfwupdate loader and loaded firmware have to agree on and not
something we have to worry about too much for the vmfwupdate interface
design.

>   2) The security posture of the system may be different between 2 validly
> signed images. Think of Daniel's example of verbose kernel output. Maybe I
> consider verbose kernel output already inacceptable, while RH thinks it's an
> ok posture to have. The user needs to have the chance to differentiate
> between a system booted with or without verbose kernel output.

You easily get that by looking at the vTPM measurements.  So not sure
what you are asking for, do you want be able to also do that without a
vTPM?

> > > So we need some equivalent of a hash page.
> > Ok.  So two opaque blobs, one which is measured into the launch
> > measurement and one which is not?  That gives you the option to pass
> > around hashes (or any other data) and leave a trace in the launch
> > measurement should you need that.
> 
> Yes, I think you want to have one or multiple pages with what effectively is
> a db append variable update blob.

Makes sense to me.

> Or even a full variable store blob.

Hmm, not sure.  I'd rather go for passing efi signature database
appendix, probably for both allow ('db') and deny ('dbx').  That is
going to work better I think, variable store format is an
firmware-internal implementation detail I'd avoid encoding that in
some interface.

> The IGVM should dictate the physical location of that blob so that you
> can precalculate the launch digest with the blob included.

Hmm, right.  Physical location matters for the launch measurement, so
adding opaque_measured_addr to struct vmfwupdate isn't going to work
very well.

> Ukify.py then generates the blob along with the FUKI.

Doesn't fly from a distro point of view.  The UKI is signed, so RH ships
that and the customer can't modify it to update the blob, say with some
additional hashes for 'db'.

Effectively we need something roughly equivalent to shim's MokList.  The
distro ships a default configuration (like the cert compiled into the
shim binary) which works fine for most people.  For IGVM that would be a
default efi variable store compiled into the firmware binary.

If you need additional stuff (like the signer cert for the nvidia
driver) there must be some way to add hashes and certificates to db/dbx.

Does it make sense to simply move the firmware update sections from the
main UKI to an add-on?  That would allow customers to easily use their
own if they wish, without breaking the RH signature on the UKI.

> I agree, from a vmfwupdate point of view, we would basically have an IGVM.
> The IGVM describes 2 special page (ranges?): One for the CPUID special page,
> one for the variable store seed. To precalculate the launch digest you would
> need the firmware IGVM, and the seed blob.

Guess we'll have to update the IGVM format spec for that, so we have
page types for 'db' and 'dbx'.  With that it should be possible to load
the igvm, find + update the pages + write it out again.  Then wrap it
into an UKI add-on + copy to ESP.

RH default igvm/add-on would simply have empty 'db' and 'dbx' pages.

Looks workable to me.

> All unauthenticated data, such as locations of the UKI and its add-ons gets
> passed as parameter to the firmware IGVM.

i.e. have a IGVM parameter for opaque_addr + opaque_size instead of
placing this in the vmfwupdate struct?

take care,
  Gerd

PS: in Berlin tomorrow?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-26 12:27                                 ` Gerd Hoffman
@ 2025-03-26 15:22                                   ` Alexander Graf
  2025-03-26 21:51                                     ` Gerd Hoffman
  2025-03-27 12:12                                     ` Ani Sinha
  0 siblings, 2 replies; 68+ messages in thread
From: Alexander Graf @ 2025-03-26 15:22 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On 26.03.25 13:27, Gerd Hoffman wrote:
>    Hi,
>
>> The problem is that add-ons are
>>
>>    1) Separate binaries. So you need to match multiple files.
>>    2) In this case, get generated out of the vendor (RH)'s control in a
>> one-off fashion.
>>
>> I don't think "signing" is the correct way to address the latter. It's
>> rather hashing. But I agree with you that it could (should?) be hashing at
>> the PE loader level similar or identical to Secure Boot rather than a
>> separate hashing mechanism.
> Secure boot offers signing and hashing.  You can add both signing
> certificates and authenticode hashes to 'db'.
>
>>> Not sure I understand the point you are trying to raise.  Add-ons
>>> signatures are checked too.
>> 2 points here here:
>>
>>    1) With add-ons, there are multiple binaries. We can't only pass a single
>> one.
> Sure.  You'll go need some container, say a cpio archive, but that is
> something vmfwupdate loader and loaded firmware have to agree on and not
> something we have to worry about too much for the vmfwupdate interface
> design.
>
>>    2) The security posture of the system may be different between 2 validly
>> signed images. Think of Daniel's example of verbose kernel output. Maybe I
>> consider verbose kernel output already inacceptable, while RH thinks it's an
>> ok posture to have. The user needs to have the chance to differentiate
>> between a system booted with or without verbose kernel output.
> You easily get that by looking at the vTPM measurements.  So not sure
> what you are asking for, do you want be able to also do that without a
> vTPM?

All of this should work without vTPMs. Why add complexity when you don't 
need it? The industry is already struggling to deal with TPMs alone. 
There are way too many potential holes in a solution where you first 
have to reason about the integrity of your TPM before you can trust it. 
All of the vTPM in SEV-SNP talk is a house of cards I'm happy to push 
(blow to keep the analogy?) against as hard as I can.

>
>>>> So we need some equivalent of a hash page.
>>> Ok.  So two opaque blobs, one which is measured into the launch
>>> measurement and one which is not?  That gives you the option to pass
>>> around hashes (or any other data) and leave a trace in the launch
>>> measurement should you need that.
>> Yes, I think you want to have one or multiple pages with what effectively is
>> a db append variable update blob.
> Makes sense to me.
>
>> Or even a full variable store blob.
> Hmm, not sure.  I'd rather go for passing efi signature database
> appendix, probably for both allow ('db') and deny ('dbx').  That is
> going to work better I think, variable store format is an
> firmware-internal implementation detail I'd avoid encoding that in
> some interface.
>
>> The IGVM should dictate the physical location of that blob so that you
>> can precalculate the launch digest with the blob included.
> Hmm, right.  Physical location matters for the launch measurement, so
> adding opaque_measured_addr to struct vmfwupdate isn't going to work
> very well.
>
>> Ukify.py then generates the blob along with the FUKI.
> Doesn't fly from a distro point of view.  The UKI is signed, so RH ships
> that and the customer can't modify it to update the blob, say with some
> additional hashes for 'db'.

I don't follow. Is RH going to ship a UKI or a FUKI? And if RH ships the 
UKI, ukify could still take a UKI as input and generate a FUKI based on 
it, no? During that process, it would generate a db which gets put at a 
fixed location in RAM so the (RH provided) firmware picks it up and 
validates the UKI it loads is exactly that one UKI.

We can extend that with an additional signature flow, where the ukify 
generated db contains a signature for the UKI instead. But I would 
generally advise against optimizing for certificate based flows until 
revocation is sorted out. And since revocation requires a new dbx in 
this scheme and that means a different launch digest per revocation, you 
may as well just directly only use hashes.

> Effectively we need something roughly equivalent to shim's MokList.  The
> distro ships a default configuration (like the cert compiled into the
> shim binary) which works fine for most people.  For IGVM that would be a
> default efi variable store compiled into the firmware binary.

This only makes sense in a world where RH ships an SVSM and that's all 
you want to attest. But that's a different flow from what we describe 
here. To actually get workload attestation, you need to include your 
rootfs in the attestation. And the only entity that can do that is the 
end customer. And they will typically do that through something like 
dm-verity or fs-verity and a hash provided on the kernel command line.

>
> If you need additional stuff (like the signer cert for the nvidia
> driver) there must be some way to add hashes and certificates to db/dbx.
>
> Does it make sense to simply move the firmware update sections from the
> main UKI to an add-on?  That would allow customers to easily use their
> own if they wish, without breaking the RH signature on the UKI.

I'm still not convinced "RH signature" is a useful marker at execution 
time of confidential workloads. It may be an interesting one at assembly 
time, but after that we're folding everything into a single launch 
digest anyway, for good reasons.

>
>> I agree, from a vmfwupdate point of view, we would basically have an IGVM.
>> The IGVM describes 2 special page (ranges?): One for the CPUID special page,
>> one for the variable store seed. To precalculate the launch digest you would
>> need the firmware IGVM, and the seed blob.
> Guess we'll have to update the IGVM format spec for that, so we have
> page types for 'db' and 'dbx'.  With that it should be possible to load
> the igvm, find + update the pages + write it out again.  Then wrap it
> into an UKI add-on + copy to ESP.
>
> RH default igvm/add-on would simply have empty 'db' and 'dbx' pages.
>
> Looks workable to me.

I would personally consider the "RH binary adds RH signature into binary 
by default" a backdoor, but that's you call :).

I agree with the plan to amend the IGVM spec with the UEFI variable 
update page. I don't think it should be "db" and "dbx" pages. It should 
be a more generic. In fact, why not make it a loader UEFI (PE) binary? 
The binary installs the db/dbx updates and chain loads the actual 
payload UKIs using a file format that's private between the loader 
binary and the sd-boot stub.

>
>> All unauthenticated data, such as locations of the UKI and its add-ons gets
>> passed as parameter to the firmware IGVM.
> i.e. have a IGVM parameter for opaque_addr + opaque_size instead of
> placing this in the vmfwupdate struct?

It would be a pretty natural fit for it, no?

>
> take care,
>    Gerd
>
> PS: in Berlin tomorrow?

Yup, let's sync up! :)

Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-26 15:22                                   ` Alexander Graf
@ 2025-03-26 21:51                                     ` Gerd Hoffman
  2025-04-07 16:21                                       ` Dionna Amalie Glaze
  2025-03-27 12:12                                     ` Ani Sinha
  1 sibling, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-03-26 21:51 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Jörg Rödel, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> > >    2) The security posture of the system may be different between 2 validly
> > > signed images. Think of Daniel's example of verbose kernel output. Maybe I
> > > consider verbose kernel output already inacceptable, while RH thinks it's an
> > > ok posture to have. The user needs to have the chance to differentiate
> > > between a system booted with or without verbose kernel output.
> > You easily get that by looking at the vTPM measurements.  So not sure
> > what you are asking for, do you want be able to also do that without a
> > vTPM?
> 
> All of this should work without vTPMs. Why add complexity when you don't
> need it? The industry is already struggling to deal with TPMs alone. There
> are way too many potential holes in a solution where you first have to
> reason about the integrity of your TPM before you can trust it. All of the
> vTPM in SEV-SNP talk is a house of cards I'm happy to push (blow to keep the
> analogy?) against as hard as I can.

TPM is the only thing we have outside of confidential computing.  So
naturally there is alot of code and infrastructure for that despite the
complexity of TPMs.  So having vTPMs in CVMs allows to reuse that
infrastructure and it totally makes sense to support that.

That doesn't imply this is the only option to handle things.  Having the
option to tie everything to the launch measurement makes sense too.
Fair point.

> > > Ukify.py then generates the blob along with the FUKI.
> > Doesn't fly from a distro point of view.  The UKI is signed, so RH ships
> > that and the customer can't modify it to update the blob, say with some
> > additional hashes for 'db'.
> 
> I don't follow. Is RH going to ship a UKI or a FUKI?

How should I answer that one while we are still discussing the design?

> And if RH ships the UKI, ukify could still take a UKI as input and
> generate a FUKI based on it, no?

Of course you can, but it'll break the RH signature.

> During that process, it would generate a db which gets put at a fixed
> location in RAM so the (RH provided) firmware picks it up and validates the
> UKI it loads is exactly that one UKI.

Well.  If you want put the db into the igvm and the igvm into the uki
you've got a chicken-and-egg problem.  Moving the firmware from the main
UKI to UKI add-on would solve that.

> We can extend that with an additional signature flow, where the ukify
> generated db contains a signature for the UKI instead.

certificate, not signature.  But, yes, that would work.

> But I would generally advise against optimizing for certificate based
> flows until revocation is sorted out.

Well, if at the end of the day we'll go pass on a 'db' both signatures
and hashes can be used.

> > Effectively we need something roughly equivalent to shim's MokList.  The
> > distro ships a default configuration (like the cert compiled into the
> > shim binary) which works fine for most people.  For IGVM that would be a
> > default efi variable store compiled into the firmware binary.
> 
> This only makes sense in a world where RH ships an SVSM and that's all you
> want to attest. But that's a different flow from what we describe here. To
> actually get workload attestation, you need to include your rootfs in the
> attestation. And the only entity that can do that is the end customer. And
> they will typically do that through something like dm-verity or fs-verity
> and a hash provided on the kernel command line.

So you'll have the RH UKI + customer add-on with the kernel command
line.  And you need a customized igvm containing a 'db' which will allow
both, either via certificate+signature or via hash, whatever you prefer.

> > Does it make sense to simply move the firmware update sections from the
> > main UKI to an add-on?  That would allow customers to easily use their
> > own if they wish, without breaking the RH signature on the UKI.
> 
> I'm still not convinced "RH signature" is a useful marker at execution time
> of confidential workloads.

If the customer agrees they can go for hashes.  If not they should be
able to use certificates.  So patching the RH shipped UKI is not an
option.

> > RH default igvm/add-on would simply have empty 'db' and 'dbx' pages.
> > 
> > Looks workable to me.
> 
> I would personally consider the "RH binary adds RH signature into binary by
> default" a backdoor, but that's you call :).

It's surely possible to ship two variants, one with the RH certificates
and one blank, so customers can pick the latter if they want allow
specific hashes only.

> I agree with the plan to amend the IGVM spec with the UEFI variable update
> page. I don't think it should be "db" and "dbx" pages. It should be a more
> generic. In fact, why not make it a loader UEFI (PE) binary?

Not convinced a PE binary is a good idea.  Adds one more indirection for
no obvious gain.

> > > All unauthenticated data, such as locations of the UKI and its add-ons gets
> > > passed as parameter to the firmware IGVM.
> > i.e. have a IGVM parameter for opaque_addr + opaque_size instead of
> > placing this in the vmfwupdate struct?
> 
> It would be a pretty natural fit for it, no?

Absolutely.  Just wanted to make sure we are talking about the same thing.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-26 21:51                                     ` Gerd Hoffman
@ 2025-04-07 16:21                                       ` Dionna Amalie Glaze
  2025-04-08  8:33                                         ` Gerd Hoffman
  0 siblings, 1 reply; 68+ messages in thread
From: Dionna Amalie Glaze @ 2025-04-07 16:21 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alexander Graf, Ani Sinha, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Wed, Mar 26, 2025 at 2:53 PM Gerd Hoffman <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > > >    2) The security posture of the system may be different between 2 validly
> > > > signed images. Think of Daniel's example of verbose kernel output. Maybe I
> > > > consider verbose kernel output already inacceptable, while RH thinks it's an
> > > > ok posture to have. The user needs to have the chance to differentiate
> > > > between a system booted with or without verbose kernel output.
> > > You easily get that by looking at the vTPM measurements.  So not sure
> > > what you are asking for, do you want be able to also do that without a
> > > vTPM?
> >
> > All of this should work without vTPMs. Why add complexity when you don't
> > need it? The industry is already struggling to deal with TPMs alone. There
> > are way too many potential holes in a solution where you first have to
> > reason about the integrity of your TPM before you can trust it. All of the
> > vTPM in SEV-SNP talk is a house of cards I'm happy to push (blow to keep the
> > analogy?) against as hard as I can.
>
> TPM is the only thing we have outside of confidential computing.  So
> naturally there is alot of code and infrastructure for that despite the
> complexity of TPMs.  So having vTPMs in CVMs allows to reuse that
> infrastructure and it totally makes sense to support that.
>
> That doesn't imply this is the only option to handle things.  Having the
> option to tie everything to the launch measurement makes sense too.
> Fair point.
>
> > > > Ukify.py then generates the blob along with the FUKI.
> > > Doesn't fly from a distro point of view.  The UKI is signed, so RH ships
> > > that and the customer can't modify it to update the blob, say with some
> > > additional hashes for 'db'.
> >
> > I don't follow. Is RH going to ship a UKI or a FUKI?
>
> How should I answer that one while we are still discussing the design?
>
> > And if RH ships the UKI, ukify could still take a UKI as input and
> > generate a FUKI based on it, no?
>
> Of course you can, but it'll break the RH signature.
>
> > During that process, it would generate a db which gets put at a fixed
> > location in RAM so the (RH provided) firmware picks it up and validates the
> > UKI it loads is exactly that one UKI.
>
> Well.  If you want put the db into the igvm and the igvm into the uki
> you've got a chicken-and-egg problem.  Moving the firmware from the main
> UKI to UKI add-on would solve that.

Why is embedding a public key that will sign the IGVM in the IGVM a
chicken-and-egg problem? It's only that if db were a list of
acceptable measurements, which it isn't.
I'm not sure why relying on secure boot makes any sense for
confidential computing. I still think that tearing down the VM context
and rebuilding it is more secure, given
the need for an honest launch measurement/MRTD.

For Cloud Computing, which is the only place that FUKI makes sense as
a delivery mechanism, it's more important to rely on attestation
policy that has more awareness of the current security posture of
measurements.
Secure boot conflates authentic with secure. With a supply chain
distribution standard like CoRIM, we can ensure that individual
binaries have not only authentic measurements, but also security
version numbers (SVNs) that can be compared against a shorter-lived
endorsement of what is the most up-to-date and which CVEs are known
for lower SVNs.
Revocation is just not a real thing that works. Short-lived policy is.
Policy services can be updated more simply than the UEFI variables of
every node in the fleet.



>
> > We can extend that with an additional signature flow, where the ukify
> > generated db contains a signature for the UKI instead.
>
> certificate, not signature.  But, yes, that would work.
>
> > But I would generally advise against optimizing for certificate based
> > flows until revocation is sorted out.
>
> Well, if at the end of the day we'll go pass on a 'db' both signatures
> and hashes can be used.
>
> > > Effectively we need something roughly equivalent to shim's MokList.  The
> > > distro ships a default configuration (like the cert compiled into the
> > > shim binary) which works fine for most people.  For IGVM that would be a
> > > default efi variable store compiled into the firmware binary.
> >
> > This only makes sense in a world where RH ships an SVSM and that's all you
> > want to attest. But that's a different flow from what we describe here. To
> > actually get workload attestation, you need to include your rootfs in the
> > attestation. And the only entity that can do that is the end customer. And
> > they will typically do that through something like dm-verity or fs-verity
> > and a hash provided on the kernel command line.
>
> So you'll have the RH UKI + customer add-on with the kernel command
> line.  And you need a customized igvm containing a 'db' which will allow
> both, either via certificate+signature or via hash, whatever you prefer.
>
> > > Does it make sense to simply move the firmware update sections from the
> > > main UKI to an add-on?  That would allow customers to easily use their
> > > own if they wish, without breaking the RH signature on the UKI.
> >
> > I'm still not convinced "RH signature" is a useful marker at execution time
> > of confidential workloads.
>
> If the customer agrees they can go for hashes.  If not they should be
> able to use certificates.  So patching the RH shipped UKI is not an
> option.
>
> > > RH default igvm/add-on would simply have empty 'db' and 'dbx' pages.
> > >
> > > Looks workable to me.
> >
> > I would personally consider the "RH binary adds RH signature into binary by
> > default" a backdoor, but that's you call :).
>
> It's surely possible to ship two variants, one with the RH certificates
> and one blank, so customers can pick the latter if they want allow
> specific hashes only.
>
> > I agree with the plan to amend the IGVM spec with the UEFI variable update
> > page. I don't think it should be "db" and "dbx" pages. It should be a more
> > generic. In fact, why not make it a loader UEFI (PE) binary?
>
> Not convinced a PE binary is a good idea.  Adds one more indirection for
> no obvious gain.
>
> > > > All unauthenticated data, such as locations of the UKI and its add-ons gets
> > > > passed as parameter to the firmware IGVM.
> > > i.e. have a IGVM parameter for opaque_addr + opaque_size instead of
> > > placing this in the vmfwupdate struct?
> >
> > It would be a pretty natural fit for it, no?
>
> Absolutely.  Just wanted to make sure we are talking about the same thing.
>
> take care,
>   Gerd
>
>


--
-Dionna Glaze, PhD, CISSP, CCSP (she/her)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-07 16:21                                       ` Dionna Amalie Glaze
@ 2025-04-08  8:33                                         ` Gerd Hoffman
  2025-04-08 21:42                                           ` Dionna Amalie Glaze
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-04-08  8:33 UTC (permalink / raw)
  To: Dionna Amalie Glaze
  Cc: Alexander Graf, Ani Sinha, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

  Hi,

> > Well.  If you want put the db into the igvm and the igvm into the uki
> > you've got a chicken-and-egg problem.  Moving the firmware from the main
> > UKI to UKI add-on would solve that.
> 
> Why is embedding a public key that will sign the IGVM in the IGVM a
> chicken-and-egg problem? It's only that if db were a list of
> acceptable measurements, which it isn't.
> I'm not sure why relying on secure boot makes any sense for
> confidential computing. I still think that tearing down the VM context
> and rebuilding it is more secure, given
> the need for an honest launch measurement/MRTD.

Current idea is to allow passing EFI signature databases for db/dbx to
the firmware getting loaded.  The signature databases must be part of
the launch measurement.  Not clear yet how exactly to handle that, one
idea is to add a special page type to the igvm spec, so a igvm parser
can easily find and update db/dbx.

So, the VM context will be rebuild, the igvm (including db/dbx) will be
measured, the firmware can verify the payload using db/dbx and standard
secure boot hash/signature.

This allows to use both signing (pass CA certificate in db) and hashing
(pass authenticode hash(es) in db) for payload verification.

The chicken-and-egg problem arises if you go for hashing and want embed
the igvm file in the UKI.

> Revocation is just not a real thing that works. Short-lived policy is.
> Policy services can be updated more simply than the UEFI variables of
> every node in the fleet.

In the model outlined above you'll go ship db/dbx in the igvm, so the
launch measurement will tell you what is allowed and what is not.  Which
in turn can be used in attestation server policies.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-08  8:33                                         ` Gerd Hoffman
@ 2025-04-08 21:42                                           ` Dionna Amalie Glaze
  2025-04-09  6:21                                             ` Gerd Hoffman
  2025-04-09 11:59                                             ` Ani Sinha
  0 siblings, 2 replies; 68+ messages in thread
From: Dionna Amalie Glaze @ 2025-04-08 21:42 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alexander Graf, Ani Sinha, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Tue, Apr 8, 2025 at 1:33 AM Gerd Hoffman <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > > Well.  If you want put the db into the igvm and the igvm into the uki
> > > you've got a chicken-and-egg problem.  Moving the firmware from the main
> > > UKI to UKI add-on would solve that.
> >
> > Why is embedding a public key that will sign the IGVM in the IGVM a
> > chicken-and-egg problem? It's only that if db were a list of
> > acceptable measurements, which it isn't.
> > I'm not sure why relying on secure boot makes any sense for
> > confidential computing. I still think that tearing down the VM context
> > and rebuilding it is more secure, given
> > the need for an honest launch measurement/MRTD.
>
> Current idea is to allow passing EFI signature databases for db/dbx to
> the firmware getting loaded.  The signature databases must be part of
> the launch measurement.  Not clear yet how exactly to handle that, one
> idea is to add a special page type to the igvm spec, so a igvm parser
> can easily find and update db/dbx.
>
> So, the VM context will be rebuild, the igvm (including db/dbx) will be
> measured, the firmware can verify the payload using db/dbx and standard
> secure boot hash/signature.
>
> This allows to use both signing (pass CA certificate in db) and hashing
> (pass authenticode hash(es) in db) for payload verification.
>
> The chicken-and-egg problem arises if you go for hashing and want embed
> the igvm file in the UKI.

I don't really see how signing the IGVM file for secure boot helps anything.
The VM context will be reconstructed by the hypervisor, and anything
checked or stored before loading
the new firmware is suspect.
SEV-SNP has IDBLOCK_AUTH to have the SNP firmware check a signature,
but TDX has nothing of the sort.
Your initial measurement is checked by nothing trusted.

Do you need the UEFI_APPLICATION that uses the vmfwupdate interface to
be signed for secure boot? Seems unnecessary.

>
> > Revocation is just not a real thing that works. Short-lived policy is.
> > Policy services can be updated more simply than the UEFI variables of
> > every node in the fleet.
>
> In the model outlined above you'll go ship db/dbx in the igvm, so the
> launch measurement will tell you what is allowed and what is not.  Which
> in turn can be used in attestation server policies.
>
> take care,
>   Gerd
>

--
-Dionna Glaze, PhD, CISSP, CCSP (she/her)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-08 21:42                                           ` Dionna Amalie Glaze
@ 2025-04-09  6:21                                             ` Gerd Hoffman
  2025-04-10  6:31                                               ` Ani Sinha
  2025-04-09 11:59                                             ` Ani Sinha
  1 sibling, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-04-09  6:21 UTC (permalink / raw)
  To: Dionna Amalie Glaze
  Cc: Alexander Graf, Ani Sinha, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

  Hi,

> > The chicken-and-egg problem arises if you go for hashing and want embed
> > the igvm file in the UKI.
> 
> I don't really see how signing the IGVM file for secure boot helps anything.

It doesn't help indeed.  This comes from the original idea by Alex to
simply add a firmware image to the UKI.  In that case the firmware is
covered by the signature / hash, even though it is not needed.  Quite
the contrary, it complicates things when we want ship db/dbx in the
firmware image.

So most likely the firmware will not be part of the main UKI.  Options
for alternatives are using UKI add-ons, or simply ship a plain igvm
file.  Details need to be sorted out (but they don't matter for the
vmfwupdate interface design).

> Do you need the UEFI_APPLICATION that uses the vmfwupdate interface to
> be signed for secure boot? Seems unnecessary.

Agree.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-09  6:21                                             ` Gerd Hoffman
@ 2025-04-10  6:31                                               ` Ani Sinha
  2025-04-10 10:44                                                 ` Gerd Hoffmann
  0 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-04-10  6:31 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Dionna Amalie Glaze, Alexander Graf, Jörg Rödel,
	Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael Tsirkin, Fabiano Rosas, Laurent Vivier,
	Igor Mammedov, Vitaly Kuznetsov, qemu-devel



> On 9 Apr 2025, at 11:51 AM, Gerd Hoffman <kraxel@redhat.com> wrote:
> 
>  Hi,
> 
>>> The chicken-and-egg problem arises if you go for hashing and want embed
>>> the igvm file in the UKI.
>> 
>> I don't really see how signing the IGVM file for secure boot helps anything.
> 
> It doesn't help indeed.  This comes from the original idea by Alex to
> simply add a firmware image to the UKI.  In that case the firmware is
> covered by the signature / hash, even though it is not needed.  Quite
> the contrary, it complicates things when we want ship db/dbx in the
> firmware image.
> 
> So most likely the firmware will not be part of the main UKI.  Options
> for alternatives are using UKI add-ons,

But add-ons are also subjected to signature verification. How does not using the main UKI help?

> or simply ship a plain igvm
> file.  Details need to be sorted out (but they don't matter for the
> vmfwupdate interface design).
> 
>> Do you need the UEFI_APPLICATION that uses the vmfwupdate interface to
>> be signed for secure boot? Seems unnecessary.
> 
> Agree.
> 
> take care,
>  Gerd
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-10  6:31                                               ` Ani Sinha
@ 2025-04-10 10:44                                                 ` Gerd Hoffmann
  2025-04-16 11:40                                                   ` Ani Sinha
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffmann @ 2025-04-10 10:44 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Dionna Amalie Glaze, Alexander Graf, Jörg Rödel,
	Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael Tsirkin, Fabiano Rosas, Laurent Vivier,
	Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Thu, Apr 10, 2025 at 12:01:18PM +0530, Ani Sinha wrote:
> 
> 
> > On 9 Apr 2025, at 11:51 AM, Gerd Hoffman <kraxel@redhat.com> wrote:
> > 
> >  Hi,
> > 
> >>> The chicken-and-egg problem arises if you go for hashing and want embed
> >>> the igvm file in the UKI.
> >> 
> >> I don't really see how signing the IGVM file for secure boot helps anything.
> > 
> > It doesn't help indeed.  This comes from the original idea by Alex to
> > simply add a firmware image to the UKI.  In that case the firmware is
> > covered by the signature / hash, even though it is not needed.  Quite
> > the contrary, it complicates things when we want ship db/dbx in the
> > firmware image.
> > 
> > So most likely the firmware will not be part of the main UKI.  Options
> > for alternatives are using UKI add-ons,
> 
> But add-ons are also subjected to signature verification. How does not using the main UKI help?

For the first boot using secure boot doesn't help much, the trust
in the firmware being loaded for the second boot is established via
launch measurement not secure boot signature.

For the second boot (with new firmware) you don't need the add-on any
more.

The main advantage of wrapping the igvm into a uki add-on is that we
can easily use the hwid matching support of systemd-stub when packaging
multiple firmware variants (aws, azure, gcp, qemu, ...).  Not sure this
will actually matter in practice though.

take care,
  Gersd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-10 10:44                                                 ` Gerd Hoffmann
@ 2025-04-16 11:40                                                   ` Ani Sinha
  0 siblings, 0 replies; 68+ messages in thread
From: Ani Sinha @ 2025-04-16 11:40 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Dionna Amalie Glaze, Alexander Graf, Jörg Rödel,
	Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael Tsirkin, Fabiano Rosas, Laurent Vivier,
	Igor Mammedov, Vitaly Kuznetsov, qemu-devel



> On 10 Apr 2025, at 4:14 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> 
> On Thu, Apr 10, 2025 at 12:01:18PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 9 Apr 2025, at 11:51 AM, Gerd Hoffman <kraxel@redhat.com> wrote:
>>> 
>>> Hi,
>>> 
>>>>> The chicken-and-egg problem arises if you go for hashing and want embed
>>>>> the igvm file in the UKI.
>>>> 
>>>> I don't really see how signing the IGVM file for secure boot helps anything.
>>> 
>>> It doesn't help indeed.  This comes from the original idea by Alex to
>>> simply add a firmware image to the UKI.  In that case the firmware is
>>> covered by the signature / hash, even though it is not needed.  Quite
>>> the contrary, it complicates things when we want ship db/dbx in the
>>> firmware image.
>>> 
>>> So most likely the firmware will not be part of the main UKI.  Options
>>> for alternatives are using UKI add-ons,
>> 
>> But add-ons are also subjected to signature verification. How does not using the main UKI help?
> 
> For the first boot using secure boot doesn't help much, the trust
> in the firmware being loaded for the second boot is established via
> launch measurement not secure boot signature.
> 
> For the second boot (with new firmware) you don't need the add-on any
> more.
> 
> The main advantage of wrapping the igvm into a uki add-on is that we
> can easily use the hwid matching support of systemd-stub when packaging
> multiple firmware variants (aws, azure, gcp, qemu, ...).  Not sure this
> will actually matter in practice though.

This makes sense.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-08 21:42                                           ` Dionna Amalie Glaze
  2025-04-09  6:21                                             ` Gerd Hoffman
@ 2025-04-09 11:59                                             ` Ani Sinha
  1 sibling, 0 replies; 68+ messages in thread
From: Ani Sinha @ 2025-04-09 11:59 UTC (permalink / raw)
  To: Dionna Amalie Glaze
  Cc: Gerd Hoffman, Alexander Graf, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel



> On Apr 9, 2025, at 03:12, Dionna Amalie Glaze <dionnaglaze@google.com> wrote:
> 
> On Tue, Apr 8, 2025 at 1:33 AM Gerd Hoffman <kraxel@redhat.com> wrote:
>> 
>>  Hi,
>> 
>>>> Well.  If you want put the db into the igvm and the igvm into the uki
>>>> you've got a chicken-and-egg problem.  Moving the firmware from the main
>>>> UKI to UKI add-on would solve that.
>>> 
>>> Why is embedding a public key that will sign the IGVM in the IGVM a
>>> chicken-and-egg problem? It's only that if db were a list of
>>> acceptable measurements, which it isn't.
>>> I'm not sure why relying on secure boot makes any sense for
>>> confidential computing. I still think that tearing down the VM context
>>> and rebuilding it is more secure, given
>>> the need for an honest launch measurement/MRTD.
>> 
>> Current idea is to allow passing EFI signature databases for db/dbx to
>> the firmware getting loaded.  The signature databases must be part of
>> the launch measurement.  Not clear yet how exactly to handle that, one
>> idea is to add a special page type to the igvm spec, so a igvm parser
>> can easily find and update db/dbx.
>> 
>> So, the VM context will be rebuild, the igvm (including db/dbx) will be
>> measured, the firmware can verify the payload using db/dbx and standard
>> secure boot hash/signature.
>> 
>> This allows to use both signing (pass CA certificate in db) and hashing
>> (pass authenticode hash(es) in db) for payload verification.
>> 
>> The chicken-and-egg problem arises if you go for hashing and want embed
>> the igvm file in the UKI.
> 
> I don't really see how signing the IGVM file for secure boot helps anything.
> The VM context will be reconstructed by the hypervisor, and anything
> checked or stored before loading
> the new firmware is suspect.
> SEV-SNP has IDBLOCK_AUTH to have the SNP firmware check a signature,
> but TDX has nothing of the sort.
> Your initial measurement is checked by nothing trusted.

By "initial measurement" you mean before the VM context is reconstructed, that is in the first stage boot correct?

> 
> Do you need the UEFI_APPLICATION that uses the vmfwupdate interface to
> be signed for secure boot? Seems unnecessary.
> 
>> 
>>> Revocation is just not a real thing that works. Short-lived policy is.
>>> Policy services can be updated more simply than the UEFI variables of
>>> every node in the fleet.
>> 
>> In the model outlined above you'll go ship db/dbx in the igvm, so the
>> launch measurement will tell you what is allowed and what is not.  Which
>> in turn can be used in attestation server policies.
>> 
>> take care,
>>  Gerd
>> 
> 
> --
> -Dionna Glaze, PhD, CISSP, CCSP (she/her)




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-26 15:22                                   ` Alexander Graf
  2025-03-26 21:51                                     ` Gerd Hoffman
@ 2025-03-27 12:12                                     ` Ani Sinha
  1 sibling, 0 replies; 68+ messages in thread
From: Ani Sinha @ 2025-03-27 12:12 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gerd Hoffman, Jörg Rödel, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Wed, Mar 26, 2025 at 8:52 PM Alexander Graf <graf@amazon.com> wrote:
>
>
> On 26.03.25 13:27, Gerd Hoffman wrote:
> >    Hi,
> >
> >> The problem is that add-ons are
> >>
> >>    1) Separate binaries. So you need to match multiple files.
> >>    2) In this case, get generated out of the vendor (RH)'s control in a
> >> one-off fashion.
> >>
> >> I don't think "signing" is the correct way to address the latter. It's
> >> rather hashing. But I agree with you that it could (should?) be hashing at
> >> the PE loader level similar or identical to Secure Boot rather than a
> >> separate hashing mechanism.
> > Secure boot offers signing and hashing.  You can add both signing
> > certificates and authenticode hashes to 'db'.
> >
> >>> Not sure I understand the point you are trying to raise.  Add-ons
> >>> signatures are checked too.
> >> 2 points here here:
> >>
> >>    1) With add-ons, there are multiple binaries. We can't only pass a single
> >> one.
> > Sure.  You'll go need some container, say a cpio archive, but that is
> > something vmfwupdate loader and loaded firmware have to agree on and not
> > something we have to worry about too much for the vmfwupdate interface
> > design.
> >
> >>    2) The security posture of the system may be different between 2 validly
> >> signed images. Think of Daniel's example of verbose kernel output. Maybe I
> >> consider verbose kernel output already inacceptable, while RH thinks it's an
> >> ok posture to have. The user needs to have the chance to differentiate
> >> between a system booted with or without verbose kernel output.
> > You easily get that by looking at the vTPM measurements.  So not sure
> > what you are asking for, do you want be able to also do that without a
> > vTPM?
>
>
> All of this should work without vTPMs. Why add complexity when you don't
> need it? The industry is already struggling to deal with TPMs alone.
> There are way too many potential holes in a solution where you first
> have to reason about the integrity of your TPM before you can trust it.
> All of the vTPM in SEV-SNP talk is a house of cards I'm happy to push
> (blow to keep the analogy?) against as hard as I can.
>
>
> >
> >>>> So we need some equivalent of a hash page.
> >>> Ok.  So two opaque blobs, one which is measured into the launch
> >>> measurement and one which is not?  That gives you the option to pass
> >>> around hashes (or any other data) and leave a trace in the launch
> >>> measurement should you need that.
> >> Yes, I think you want to have one or multiple pages with what effectively is
> >> a db append variable update blob.
> > Makes sense to me.
> >
> >> Or even a full variable store blob.
> > Hmm, not sure.  I'd rather go for passing efi signature database
> > appendix, probably for both allow ('db') and deny ('dbx').  That is
> > going to work better I think, variable store format is an
> > firmware-internal implementation detail I'd avoid encoding that in
> > some interface.
> >
> >> The IGVM should dictate the physical location of that blob so that you
> >> can precalculate the launch digest with the blob included.
> > Hmm, right.  Physical location matters for the launch measurement, so
> > adding opaque_measured_addr to struct vmfwupdate isn't going to work
> > very well.
> >
> >> Ukify.py then generates the blob along with the FUKI.
> > Doesn't fly from a distro point of view.  The UKI is signed, so RH ships
> > that and the customer can't modify it to update the blob, say with some
> > additional hashes for 'db'.
>
>
> I don't follow. Is RH going to ship a UKI or a FUKI? And if RH ships the
> UKI, ukify could still take a UKI as input and generate a FUKI based on
> it, no? During that process, it would generate a db which gets put at a
> fixed location in RAM so the (RH provided) firmware picks it up and
> validates the UKI it loads is exactly that one UKI.
>
> We can extend that with an additional signature flow, where the ukify
> generated db contains a signature for the UKI instead. But I would
> generally advise against optimizing for certificate based flows until
> revocation is sorted out. And since revocation requires a new dbx in
> this scheme and that means a different launch digest per revocation, you
> may as well just directly only use hashes.
>
>
> > Effectively we need something roughly equivalent to shim's MokList.  The
> > distro ships a default configuration (like the cert compiled into the
> > shim binary) which works fine for most people.  For IGVM that would be a
> > default efi variable store compiled into the firmware binary.
>
>
> This only makes sense in a world where RH ships an SVSM and that's all
> you want to attest. But that's a different flow from what we describe
> here. To actually get workload attestation, you need to include your
> rootfs in the attestation. And the only entity that can do that is the
> end customer. And they will typically do that through something like
> dm-verity or fs-verity and a hash provided on the kernel command line.
>
>
> >
> > If you need additional stuff (like the signer cert for the nvidia
> > driver) there must be some way to add hashes and certificates to db/dbx.
> >
> > Does it make sense to simply move the firmware update sections from the
> > main UKI to an add-on?  That would allow customers to easily use their
> > own if they wish, without breaking the RH signature on the UKI.
>
>
> I'm still not convinced "RH signature" is a useful marker at execution
> time of confidential workloads. It may be an interesting one at assembly
> time, but after that we're folding everything into a single launch
> digest anyway, for good reasons.
>
>
> >
> >> I agree, from a vmfwupdate point of view, we would basically have an IGVM.
> >> The IGVM describes 2 special page (ranges?): One for the CPUID special page,
> >> one for the variable store seed. To precalculate the launch digest you would
> >> need the firmware IGVM, and the seed blob.
> > Guess we'll have to update the IGVM format spec for that, so we have
> > page types for 'db' and 'dbx'.  With that it should be possible to load
> > the igvm, find + update the pages + write it out again.  Then wrap it
> > into an UKI add-on + copy to ESP.
> >
> > RH default igvm/add-on would simply have empty 'db' and 'dbx' pages.
> >
> > Looks workable to me.
>
>
> I would personally consider the "RH binary adds RH signature into binary
> by default" a backdoor, but that's you call :).
>
> I agree with the plan to amend the IGVM spec with the UEFI variable
> update page. I don't think it should be "db" and "dbx" pages. It should
> be a more generic. In fact, why not make it a loader UEFI (PE) binary?
> The binary installs the db/dbx updates and chain loads the actual
> payload UKIs using a file format that's private between the loader
> binary and the sd-boot stub.
>
>
> >
> >> All unauthenticated data, such as locations of the UKI and its add-ons gets
> >> passed as parameter to the firmware IGVM.
> > i.e. have a IGVM parameter for opaque_addr + opaque_size instead of
> > placing this in the vmfwupdate struct?
>
>
> It would be a pretty natural fit for it, no?
>
>
> >
> > take care,
> >    Gerd
> >
> > PS: in Berlin tomorrow?
>
>
> Yup, let's sync up! :)

Unfortunately I won't be there. Vitaly will be. So you three can sync
along with anyone else who is there. We should have a call after for
those who are not lucky to be attending.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-24  7:43                     ` Gerd Hoffman
  2025-03-24 11:12                       ` Ani Sinha
@ 2025-04-08  8:11                       ` Gerd Hoffman
  2025-05-21  7:50                         ` Ani Sinha
  1 sibling, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-04-08  8:11 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Jörg Rödel, Alexander Graf, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

  Hi,

> Which means we are back to the single firmware image.  I think it makes
> sense to continue supporting classic rom images (which can also be
> loaded via -bios).  Any use case which needs more fine-grained control
> must use igvm.  We can use format bits in both capabilities and control
> fields to indicate what the hypervisor supports and what has been
> uploaded to the firmware image region.  See interface header file draft
> below.

Updated draft.  Idea is to go all-in on IGVM and support IGVM only.  We
keep the format bit, but more to make things future-proof (have the
option to support other formats should the need arise at some point in
the future) and not because we plan to support multiple formats today.

So we are down to this:

--------------------------- cut here ------------------------

/*
 * igvm only vmfwupdate interface rewrite
 */

struct vmfwupdate {
    // VMM capabilities, see VMFWUPDATE_CAP_*, read-only.
    uint64_t capabilities;
    // control bits, see VMFWUPDATE_CTL_*
    uint64_t control;

    // address and size of the firmware update image.  Will be cleared on
    // firmware update and reset.
    uint64_t fw_image_addr;
    uint16_t fw_image_size;
};

// --- format bits, used by both 'capabilities' and 'control' ---
// igvm
#define VMFWUPDATE_FORMAT_IGVM           (1 << 32)

// --- 'control' field bits ---
// disable vmfwupdate interface
#define VMFWUPDATE_CTL_DISABLE            (1 << 0)

--------------------------- cut here ------------------------

All other details will be offloaded to IGVM.  We will need some IGVM
format updates for that:

 * Add a parameter to specify the location of the payload (i.e.
   the UKI, or some container format in case we want pass on more
   than just one efi binary).
 * Add a page types for db/dbx signature databases where we can
   store either the signing key or the authenticode hash of the
   payload.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-04-08  8:11                       ` Gerd Hoffman
@ 2025-05-21  7:50                         ` Ani Sinha
  0 siblings, 0 replies; 68+ messages in thread
From: Ani Sinha @ 2025-05-21  7:50 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jörg Rödel, Alexander Graf, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel, Dionna Amalie Glaze



> On 8 Apr 2025, at 1:41 PM, Gerd Hoffman <kraxel@redhat.com> wrote:
> 
>  Hi,
> 
>> Which means we are back to the single firmware image.  I think it makes
>> sense to continue supporting classic rom images (which can also be
>> loaded via -bios).  Any use case which needs more fine-grained control
>> must use igvm.  We can use format bits in both capabilities and control
>> fields to indicate what the hypervisor supports and what has been
>> uploaded to the firmware image region.  See interface header file draft
>> below.
> 
> Updated draft.  

Is everyone OK with this latest draft version of the interface? Can we assume that this is more or less how the final hypervisor interface would look on QEMU? 

> Idea is to go all-in on IGVM and support IGVM only.  We
> keep the format bit, but more to make things future-proof (have the
> option to support other formats should the need arise at some point in
> the future) and not because we plan to support multiple formats today.
> 
> So we are down to this:
> 
> --------------------------- cut here ------------------------
> 
> /*
> * igvm only vmfwupdate interface rewrite
> */
> 
> struct vmfwupdate {
>    // VMM capabilities, see VMFWUPDATE_CAP_*, read-only.
>    uint64_t capabilities;
>    // control bits, see VMFWUPDATE_CTL_*
>    uint64_t control;
> 
>    // address and size of the firmware update image.  Will be cleared on
>    // firmware update and reset.
>    uint64_t fw_image_addr;
>    uint16_t fw_image_size;
> };
> 
> // --- format bits, used by both 'capabilities' and 'control' ---
> // igvm
> #define VMFWUPDATE_FORMAT_IGVM           (1 << 32)
> 
> // --- 'control' field bits ---
> // disable vmfwupdate interface
> #define VMFWUPDATE_CTL_DISABLE            (1 << 0)
> 
> --------------------------- cut here ------------------------
> 
> All other details will be offloaded to IGVM.  We will need some IGVM
> format updates for that:
> 
> * Add a parameter to specify the location of the payload (i.e.
>   the UKI, or some container format in case we want pass on more
>   than just one efi binary).
> * Add a page types for db/dbx signature databases where we can
>   store either the signing key or the authenticode hash of the
>   payload.
> 
> take care,
>  Gerd
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-21 10:08                 ` Gerd Hoffman
  2025-03-21 12:44                   ` Ani Sinha
@ 2025-03-21 12:45                   ` Daniel P. Berrangé
  1 sibling, 0 replies; 68+ messages in thread
From: Daniel P. Berrangé @ 2025-03-21 12:45 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Jörg Rödel, Alexander Graf, Ani Sinha, Paolo Bonzini,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Richard Henderson, Michael S. Tsirkin,
	Fabiano Rosas, Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Fri, Mar 21, 2025 at 11:08:11AM +0100, Gerd Hoffman wrote:
>   Hi,
> 
> > >      While digging around in the igvm spec I've seen there is the
> > >      concept of 'parameters'.  Can this be used to pass on the memory
> > >      location of kernel + initrd + cmdline?  Maybe the kernel hashes too?
> > 
> > The find the locations of the kernel, initrd, cmdline, ... I think IGVM
> > parameters, either directly or (preferably indirectly) are a good
> > solution. By "indirectly" I mean to put these regions on a separate and
> > measured page which also contains the region hashes.
> 
> regions and hashes should be separate I think.  The regions are not
> necessarily fixed, the physical memory location where things have been
> loaded to can change.
> 
> But also see my other reply.  I'm not convinced any more carrying
> forward the hashes logic makes much sense.
> 
> > >  (2) Will the igvm be generated on the fly from FUKI data?  Or should
> > >      the distros ship igvm images with firmware + kernel + initrd?
> > 
> > My preference would be that distros ship the tooling and components to
> > build IGVM files and build them during kernel update. If a distro comes
> > up with a generic initrd+cmdline it can also ship pre-built IGVM files.
> 
> Working with the (F)UKI instead has the advantage that we can make use
> of the secure boot signature, with a workflow along these lines:
> 
> (distro) build:
>  * Embed the firmware as igvm inside the UKI.
>  * Sign the UKI.
> 
> first vm launch:
>  * Load the complete UKI.
>  * Pass on (a) the igvm section and (b) the UKI (including signature)
>    to the vmfwupdate device.
> 
> vmwupdate device:
>  * loads the igvm image.
>  * passes on the UKI location (as igvm parameter?).
> 
> second vm launch:
>  * firmware checks UKI signature.
>  * firmware (optionally) measures UKI into vTPM.
>  * firmware launches UKI.

> 
> Going ship the distro kernel as igvm image would work too.  Will
> simplify the measurement pre-calculation.  Also there is no need to pass
> around any parameters, everything (how the firmware finds the UKI etc)
> can be arranged at igvm build time then.  Disadvantage: This introduces
> a completely new boot workflow.  Will probably need a new set of cloud
> images exclusively for the BYOF case.

Having a completely different boot workflow & images for CVM BYOF that
does not involve/consume existing UKIs & addons feels pretty undesirable
to me.

If the baseline is that the distro/user is using UKIs for everything,
and then they want to do BYOF, that change should involve as little
difference as possible to pull firmware into the mix. Your first
workflow example with FUKI looks like it would satisfy that, but
switching to bundle the kernel/initrd/cmdline directly in the IGVM
would not.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-14 14:08         ` Gerd Hoffman
  2025-03-14 14:50           ` Alexander Graf
@ 2025-03-14 15:16           ` Jörg Rödel
  2025-03-15  6:08             ` Ani Sinha
  1 sibling, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-14 15:16 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alexander Graf, Ani Sinha, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Fri, Mar 14, 2025 at 03:08:43PM +0100, Gerd Hoffman wrote:
> If your input firmware image already is an IGVM (say coconut), what is
> supposed to happen?

The COCONUT igvmbuilder has the ability to take another IGVM file as
input and add its directive and contents to the output file. This is
needed for the Hyper-V firmware, which is also provided as an IGVM file.
I think it can also make changes when processing an input IGVM, like
changing the VMPL of a VMSA.

This can be extended to cover more use-cases, e.g. like direct-boot of a
linux kernel with initrd and command-line.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-14 15:16           ` Jörg Rödel
@ 2025-03-15  6:08             ` Ani Sinha
  0 siblings, 0 replies; 68+ messages in thread
From: Ani Sinha @ 2025-03-15  6:08 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Gerd Hoffman, Alexander Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Fri, Mar 14, 2025 at 8:47 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> On Fri, Mar 14, 2025 at 03:08:43PM +0100, Gerd Hoffman wrote:
> > If your input firmware image already is an IGVM (say coconut), what is
> > supposed to happen?
>
> The COCONUT igvmbuilder has the ability to take another IGVM file as
> input and add its directive and contents to the output file. This is
> needed for the Hyper-V firmware, which is also provided as an IGVM file.
> I think it can also make changes when processing an input IGVM, like
> changing the VMPL of a VMSA.
>
> This can be extended to cover more use-cases, e.g. like direct-boot of a
> linux kernel with initrd and command-line.

Have we considered how IGVM will be deployed in the FUKI case? I have
always assumed that UKI would be the right means for this. For edk2
for example, we are implementing support in UKI. Hopefully the same
can apply for IGVM.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
@ 2025-02-14 15:34 Ani Sinha
  2025-02-24 15:47 ` Gerd Hoffman
  2025-03-13  9:02 ` Jörg Rödel
  0 siblings, 2 replies; 68+ messages in thread
From: Ani Sinha @ 2025-02-14 15:34 UTC (permalink / raw)
  To: Ani Sinha, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier
  Cc: Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 32537 bytes --]

VM firmware update is a mechanism where the virtual machines can use their
preferred and trusted firmware image in their execution environment without
having to depend on a untrusted party to provide the firmware bundle. This is
particularly useful for confidential virtual machines that are deployed in the
cloud where the tenant and the cloud provider are two different entities. In
this scenario, virtual machines can bring their own trusted firmware image
bundled as a part of their filesystem (using UKIs for example[1]) and then use
this hypervisor interface to update to their trusted firmware image. This also
allows the guests to have a consistent measurements on the firmware image.

This change introduces support for the fw-cfg based hypervisor interface
and the corresponding device.The change also includes the
specification document for this interface. The interface is made generic
enough so that guests are free to use their own ABI to pass required
information between initial and trusted execution contexts (where they are
running their own trusted firmware image) without the hypervisor getting
involved in between. For pc machines, it implements support for
copying firmware image from the guest source physical address specifiied
by the guest where the guest loaded the next stage firmware.

Currently, this device is only supported for pc machines. Hence, the device
is not initialized for other machine types. Trying to initialize it
for arm for example will lead to failure:

$ ./qemu-system-arm -device vmfwupdate -machine virt
qemu-system-arm: -device vmfwupdate: This machine does not support vmfwupdate device

Functional and qtests has been added to test basic device operations and fw-cfg
files.

[1] See systemd pull requests https://github.com/systemd/systemd/pull/35091
and https://github.com/systemd/systemd/pull/35281 for some discussions on
how we can bundle firmware image within an UKI.

CC: Alex Graf <graf@amazon.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gerd Hoffman <kraxel@redhat.com>
CC: Igor Mammedov <imammedo@redhat.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 MAINTAINERS                         |  11 ++
 docs/specs/index.rst                |   1 +
 docs/specs/vmfwupdate.rst           | 119 ++++++++++++++++
 hw/core/machine.c                   |   2 +
 hw/i386/pc.c                        |  55 ++++++++
 hw/i386/x86.c                       |   8 ++
 hw/misc/meson.build                 |   2 +
 hw/misc/vmfwupdate.c                | 208 ++++++++++++++++++++++++++++
 include/hw/boards.h                 |   1 +
 include/hw/misc/vmfwupdate.h        | 101 ++++++++++++++
 tests/functional/meson.build        |   2 +
 tests/functional/test_vmfwupdate.py |  82 +++++++++++
 tests/qtest/meson.build             |   1 +
 tests/qtest/vmfwupdate-test.c       |  67 +++++++++
 14 files changed, 660 insertions(+)
 create mode 100644 docs/specs/vmfwupdate.rst
 create mode 100644 hw/misc/vmfwupdate.c
 create mode 100644 include/hw/misc/vmfwupdate.h
 create mode 100644 tests/functional/test_vmfwupdate.py
 create mode 100644 tests/qtest/vmfwupdate-test.c

changelogs:
v6: Alex's inputs on v5 taken into consideration.
Pipeline is still green:
https://gitlab.com/anisinha/qemu/-/pipelines/1671565480
v5: Alex's input, add qtest and functional test. guest reset full
support for x86. More testing is required but will need support of
DMA based fw-cfw file write capability. This has been added in the
patchset https://patchwork.ozlabs.org/project/qemu-devel/list/?series=441003
Cover letter: 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20250120043847.954881-1-anisinha@redhat.com/
 .
For more comprehensive testing of this patch, I am requesting inputs and
suggestions.

CI pipeline is green - so no regressions.
https://gitlab.com/anisinha/qemu/-/pipelines/1646036807

v4: remove delay in functional test. Not needed now.
v3: inputs from Gerd and Phil taken into account. One basic functional
test added. Spec doc updated as per Gerd's suggestions.
v2: do not allow changing bios region if advertized size is 0 (non-pc
platforms).

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38..4eff7f3c20 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2564,6 +2564,17 @@ F: include/hw/acpi/vmgenid.h
 F: docs/specs/vmgenid.rst
 F: tests/qtest/vmgenid-test.c
 
+VM Firmware Update
+M: Ani Sinha <anisinha@redhat.com>
+M: Alex Graf <graf@amazon.com>
+M: Paolo Bonzini <pbonzini@redhat.com>
+S: Maintained
+F: hw/misc/vmfwupdate.c
+F: include/hw/misc/vmfwupdate.h
+F: docs/specs/vmfwupdate.rst
+F: tests/qtest/vmfwupdate-test.c
+F: tests/functional/test_vmfwupdate.py
+
 LED
 M: Philippe Mathieu-Daudé <philmd@linaro.org>
 S: Maintained
diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index d7675cebc2..8d78b647f2 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -34,6 +34,7 @@ guest hardware that is specific to QEMU.
    virt-ctlr
    vmcoreinfo
    vmgenid
+   vmfwupdate
    rapl-msr
    rocker
    riscv-iommu
diff --git a/docs/specs/vmfwupdate.rst b/docs/specs/vmfwupdate.rst
new file mode 100644
index 0000000000..6e8a8ae32b
--- /dev/null
+++ b/docs/specs/vmfwupdate.rst
@@ -0,0 +1,119 @@
+VMFWUPDATE INTERFACE SPECIFICATION
+##################################
+
+Introduction
+************
+
+``Vmfwupdate`` is an extension to ``fw-cfg`` that allows guests to replace early boot
+code in their virtual machine. Through a combination of vmfwupdate and
+hypervisor stack knowledge, guests can deterministically replace the launch
+payload for guests. This is useful for environments like SEV-SNP where the
+launch payload becomes the launch digest. Guests can use vmfwupdate to provide
+a measured, full guest payload (BIOS image, kernel, initramfs, kernel
+command line) to the virtual machine which enables them to easily reason about
+integrity of the resulting system.
+For more information, please see the `KVM Forum 2024 presentation <KVMFORUM_>`__
+about this work from the authors [1]_.
+
+
+.. _KVMFORUM: https://www.youtube.com/watch?v=VCMBxU6tAto
+
+Base Requirements
+*****************
+
+#. **fw-cfg**:
+     The target system must provide a ``fw-cfg`` interface. For x86 based
+     environments, this ``fw-cfg`` interface must be accessible through PIO ports
+     0x510 and 0x511. The ``fw-cfg`` interface does not need to be announced as part
+     of system device tables such as DSDT. The ``fw-cfg`` interface must support the
+     DMA interface. It may only support the DMA interface for write operations.
+
+#. **BIOS region**:
+     The hypervisor must provide a BIOS region which may be
+     statically sized. Through vmfwupdate, the guest is able to atomically replace
+     its contents. The BIOS region must be mapped as read-write memory. In a
+     SEV-SNP environment, the BIOS region must be mapped as private memory at
+     launch time.
+
+Fw-cfg Files
+************
+
+Guests drive vmfwupdate through special ``fw-cfg`` files that control its flow
+followed by a standard system reset operation. When vmfwupdate is available,
+it provides the following ``fw-cfg`` files:
+
+* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of additional
+  capabilities the interface supports. List of available capabilities:
+
+     ``VMFWUPDATE_CAP_BIOS_RESIZE        0x0000000000000001``
+
+* ``vmfwupdate/bios-size`` (``u64``) - Little Endian encoded size of the BIOS region.
+  Read-only by default. Optionally Read-write if ``vmfwupdate/cap`` contains
+  ``VMFWUPDATE_CAP_BIOS_RESIZE``. On write, the BIOS region may resize. Guests are
+  required to read the value after writing and compare it with the requested size
+  to determine whether the resize was successful. Note, x86 BIOS regions always
+  start at 4GiB - bios-size.
+
+* ``vmfwupdate/opaque`` (``4096 bytes``) - A 4 KiB buffer that survives the BIOS replacement
+  flow. Can be used by the guest to propagate guest physical addresses of payloads
+  to its BIOS stage. It is recommended to make the new BIOS clear this file on boot
+  if it exists. Contents of this file are under control by the hypervisor. In an
+  environment that considers the hypervisor outside of its trust boundary, guests
+  are advised to validate its contents before consumption.
+
+* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is disabled.
+  Returns 0 for enabled, 1 for disabled. Writing any value disables it. Writing is
+  only allowed if the value is 0. When the interface is disabled, the replace file
+  is ignored on reset. This value resets to 0 on system reset.
+
+* ``vmfwupdate/bios-addr`` (``u64``) - A 64bit Little Endian encoded guest physical address
+  at the beginning of the replacement BIOS region. The provided payload must reside
+  in shared memory. 0 on system reset.
+
+
+Triggering the Firmware Update
+******************************
+
+To initiate the firmware update process, the guest issues a standard system reset
+operation through any of the means implemented by the machine model.
+
+On reset, the hypervisor evaluates whether ``vmfwupdate/disable`` is ``1``. If it is, it ignores
+any other vmfwupdate values and performs a standard system reset.
+
+If ``vmfwupdate/disable`` is ``0``, the hypervisor checks if bios-addr is ``0``. If it is, it
+performs a standard system reset.
+
+If ``vmfwupdate/bios-addr`` is ``non-0``, the hypervisor replaces the contents of the system’s
+BIOS region with the guest physically contiguous ``vmfwupdate/bios-size`` sized payload at the
+guest physical address address vmfwupdate/bios-addr.
+
+The firmware update mechanism works both for confidential and non-confidential
+guests. In confidential guests, as a part of the reset operation, all existing
+guest shared memory (shared with the hypervisor) as well as the ``vmfwupdate/opaque`` file
+are preserved. The reset causes recreation of the VM context which triggers a fresh
+measurement of the replaced BIOS region and reset CPU state [2]_ .
+For non-confidential guests, there is no concept of guest private memory and all the existing
+guest memory is preserved (this is the default behavior today - QEMU does not reset/clear
+guest memory upon reset).
+
+In both confidential and non-confidential cases, CPU and device state are reset to
+the default hypervisor specific reset states. In confidential environments, the guest
+always resumes operation in the highest privileged mode available to it (VMPL0 in SEV-SNP).
+
+Closing Remarks
+***************
+The handover protocol (format of the ``vmwupdate/opaque`` file etc.) will be implemented by
+the firmware loader and firmware image, both provided by the guest.  The hypervisor does
+not need to know these details, so it is not included in this specification.
+
+
+
+Footnotes:
+^^^^^^^^^^
+.. [1] Original author of the specification: *Alex Graf <graf@amazon.com>*,
+       converted to re-structured-text (rst format) and slightly edited
+       by *Ani Sinha <anisinha@redhat.com>*.
+.. [2] Currently SEV-SNP guests do not support reset. Upon reset, the instance is
+       terminated and a new instance must be created with new VM confidential context.
+       Work is being done currently to support resetting SEV-SNP guests with a new
+       confidential/SEV context after reset.
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 254cc20c4c..35b56916fb 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -34,6 +34,7 @@
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/misc/vmfwupdate.h"
 #include "audio/audio.h"
 
 GlobalProperty hw_compat_9_2[] = {
@@ -256,6 +257,7 @@ GlobalProperty hw_compat_2_8[] = {
     { "virtio-pci", "x-pcie-pm-init", "off" },
     { "cirrus-vga", "vgamem_mb", "8" },
     { "isa-cirrus-vga", "vgamem_mb", "8" },
+    {TYPE_VMFWUPDATE, "disable", "1"},
 };
 const size_t hw_compat_2_8_len = G_N_ELEMENTS(hw_compat_2_8);
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0eb52d315b..db8792b431 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -53,6 +53,7 @@
 #include "hw/usb.h"
 #include "hw/i386/intel_iommu.h"
 #include "hw/net/ne2000-isa.h"
+#include "hw/misc/vmfwupdate.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/virtio/virtio-md-pci.h"
 #include "hw/i386/kvm/xen_overlay.h"
@@ -1719,10 +1720,64 @@ static void pc_machine_initfn(Object *obj)
     qemu_add_machine_init_done_notifier(&pcms->machine_done);
 }
 
+static void handle_vmfwupd_reset(MachineState *machine,
+                                 ResetType type, VMFwUpdateState *vmfw)
+{
+    X86MachineState *x86ms = X86_MACHINE(machine);
+    void *biosmem = memory_region_get_ram_ptr(&x86ms->bios);
+    uint64_t bios_size = memory_region_size(&x86ms->bios);
+
+    if (type != RESET_TYPE_COLD) {
+        return;
+    }
+
+    if (vmfw->disable) {
+        return;
+    }
+
+    if (!vmfw->fw_blob.bios_paddr) {
+        return;
+    }
+
+    if (!vmfw->fw_blob.bios_size) {
+        return;
+    }
+
+    g_assert(!(vmfw->fw_blob.bios_size % 65536));
+    g_assert(vmfw->fw_blob.bios_size <= vmfw->plat_bios_size);
+
+    /*
+     * bios memory region initialization will need to be performed here
+     * if bios_size < vfw->plat_bios_size. We may need to call
+     * memory_region_init_ram() or memory_region_init_ram_guest_memfd()
+     * to initialize a new bios memory region.
+     */
+
+    /*
+     * Read new BIOS from guest RAM into the BIOS region.
+     */
+    cpu_physical_memory_read(vmfw->fw_blob.bios_paddr,
+                             biosmem + bios_size - vmfw->fw_blob.bios_size,
+                             vmfw->fw_blob.bios_size);
+    x86_firmware_configure(0x100000000ULL - vmfw->fw_blob.bios_size,
+                           biosmem, vmfw->fw_blob.bios_size);
+}
+
 static void pc_machine_reset(MachineState *machine, ResetType type)
 {
     CPUState *cs;
     X86CPU *cpu;
+    VMFwUpdateState *vmfw = vmfwupdate_find();
+
+    /*
+     * When vmfwupdate device is present, handle reset actions for
+     * this firmware update device. The reset operations are
+     * defined in the device specification document. See
+     * docs/specs/vmfwupdate.rst.
+     */
+    if (vmfw) {
+        handle_vmfwupd_reset(machine, type, vmfw);
+    }
 
     qemu_devices_reset(type);
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 69bfc00b9a..c00fafab04 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -357,10 +357,18 @@ static int x86_kvm_type(MachineState *ms, const char *vm_type)
     return kvm_enabled() ? kvm_get_vm_type(ms) : 0;
 }
 
+static uint64_t mr_bios_size_x86(MachineState *ms)
+{
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    return memory_region_size(&x86ms->bios);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
+    MachineState *ms = MACHINE(obj);
     X86MachineState *x86ms = X86_MACHINE(obj);
 
+    ms->mr_bios_size = mr_bios_size_x86;
     x86ms->smm = ON_OFF_AUTO_AUTO;
     x86ms->acpi = ON_OFF_AUTO_AUTO;
     x86ms->pit = ON_OFF_AUTO_AUTO;
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 55f493521b..e806bf48f5 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -150,6 +150,8 @@ specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
 specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
 specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))
 
+specific_ss.add(when: 'CONFIG_FW_CFG_DMA', if_true: files('vmfwupdate.c'))
+
 system_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
 
 # HPPA devices
diff --git a/hw/misc/vmfwupdate.c b/hw/misc/vmfwupdate.c
new file mode 100644
index 0000000000..5007139f8c
--- /dev/null
+++ b/hw/misc/vmfwupdate.c
@@ -0,0 +1,208 @@
+/*
+ * Guest driven VM boot component update device
+ * For details and specification, please look at docs/specs/vmfwupdate.rst.
+ *
+ * Copyright (C) 2025 Red Hat, Inc.
+ *
+ * Authors: Ani Sinha <anisinha@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "system/reset.h"
+#include "hw/nvram/fw_cfg.h"
+#include "hw/i386/pc.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/vmfwupdate.h"
+#include "qemu/error-report.h"
+
+/*
+ * the following is the list of machines currently
+ * supporting this device.
+ * If a new machine is added in this list, the
+ * corresponding vm/machine reset operations must also
+ * be implemented. Please see pc_machine_reset() ->
+ * handle_vmfwupd_reset() as an example. The reset
+ * implementation must adhere to the device spec.
+ */
+static const char *supported_machines[] = {
+    TYPE_X86_MACHINE,
+    NULL,
+};
+
+static const char *vmfwupdate_supported(void)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    const char **machine = supported_machines;
+    while (*machine) {
+        if (object_dynamic_cast(OBJECT(ms), *machine)) {
+            return *machine;
+        }
+        machine++;
+    }
+    return NULL;
+}
+
+static uint64_t get_bios_size(void)
+{
+    Object *m_obj = qdev_get_machine();
+    MachineState *ms = MACHINE(m_obj);
+
+    if (ms->mr_bios_size) {
+        return ms->mr_bios_size(ms);
+    } else {
+        /*
+         * for machine types and platforms where the call is not implemented,
+         * return 0 for now.
+         */
+        return 0;
+    }
+}
+
+static void fw_blob_write(void *dev, off_t offset, size_t len)
+{
+    VMFwUpdateState *s = VMFWUPDATE(dev);
+
+    /* for non-pc platform, we do not allow changing bios_size yet */
+    if (!s->plat_bios_size) {
+        return;
+    }
+
+    /*
+     * in order to change the bios size, appropriate capability
+     * must be enabled
+     */
+    if (s->fw_blob.bios_size &&
+        !(s->capability & VMFWUPDATE_CAP_BIOS_RESIZE)) {
+        warn_report("vmfwupdate: VMFWUPDATE_CAP_BIOS_RESIZE not enabled");
+        return;
+    }
+
+    /*
+     * For now, we do not let the guest resize the bios size to a value
+     * larger than the size of the memory region that holds the current image.
+     * If the size is larger, we may have to reinitialize the bios
+     * memory region. For pc, see x86_bios_rom_init().
+     */
+    if (s->fw_blob.bios_size > get_bios_size()) {
+        warn_report("vmfwupdate: bios size cannot be larger than %" PRIu64,
+                    get_bios_size());
+        return;
+    }
+
+    s->plat_bios_size = s->fw_blob.bios_size;
+
+    return;
+}
+
+static void vmfwupdate_realize(DeviceState *dev, Error **errp)
+{
+    VMFwUpdateState *s = VMFWUPDATE(dev);
+    FWCfgState *fw_cfg = fw_cfg_find();
+
+    /* multiple devices are not supported */
+    if (!vmfwupdate_find()) {
+        error_setg(errp, "at most one %s device is permitted",
+                   TYPE_VMFWUPDATE);
+        return;
+    }
+
+    /* if current machine is not supported, do not initialize */
+    if (!vmfwupdate_supported()) {
+        error_setg(errp, "This machine does not support vmfwupdate device");
+        return;
+    }
+
+    /* fw_cfg with DMA support is necessary to support this device */
+    if (!fw_cfg || !fw_cfg_dma_enabled(fw_cfg)) {
+        error_setg(errp, "%s device requires fw_cfg",
+                   TYPE_VMFWUPDATE);
+        return;
+    }
+
+    /*
+     * If the device is disabled on purpose, do not initialize.
+     * Old machines like pc-i440fx-2.8 does not have enough fw-cfg slots
+     * and hence this device is disabled for those machines.
+     */
+    if (s->disable) {
+        info_report("vmfwupdate device is disabled on the command-line");
+        return;
+    }
+
+    memset(&s->fw_blob, 0, sizeof(s->fw_blob));
+    memset(&s->opaque_blobs, 0, sizeof(s->opaque_blobs));
+
+    fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_OBLOB,
+                             NULL, NULL, s,
+                             &s->opaque_blobs,
+                             sizeof(s->opaque_blobs),
+                             false);
+
+    fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_FWBLOB,
+                             NULL, fw_blob_write, s,
+                             &s->fw_blob,
+                             sizeof(s->fw_blob),
+                             false);
+
+    /*
+     * Add global capability fw_cfg file. This will be used by the guest to
+     * check capability of the hypervisor.
+     * We do not allow the guest to change bios size for now.
+     * Effectively all capability bits are cleared for now.
+     */
+    memset(&s->capability, 0, sizeof(s->capability));
+
+    fw_cfg_add_file(fw_cfg, FILE_VMFWUPDATE_CAP,
+                    &s->capability, sizeof(s->capability));
+
+    s->plat_bios_size = get_bios_size(); /* for non-pc, this is 0 */
+    /* size of bios region for the platform - read only by the guest */
+    fw_cfg_add_file(fw_cfg, FILE_VMFWUPDATE_BIOS_SIZE,
+                    &s->plat_bios_size, sizeof(s->plat_bios_size));
+    /*
+     * add fw cfg control file to disable the hypervisor interface.
+     */
+    fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_CONTROL,
+                             NULL, NULL, s,
+                             &s->disable,
+                             sizeof(s->disable),
+                             false);
+    /*
+     * This device requires to register a global reset because it is
+     * not plugged to a bus (which, as its QOM parent, would reset it).
+     */
+    qemu_register_resettable(OBJECT(s));
+}
+
+static const Property vmfwupdate_properties[] = {
+    DEFINE_PROP_UINT8("disable", VMFwUpdateState, disable, 0),
+};
+
+static void vmfwupdate_device_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    /* we are not interested in migration - so no need to populate dc->vmsd */
+    dc->desc = "VM firmware update device";
+    dc->realize = vmfwupdate_realize;
+    dc->hotpluggable = false;
+    device_class_set_props(dc, vmfwupdate_properties);
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo vmfwupdate_device_types[] = {
+    {
+        .name          = TYPE_VMFWUPDATE,
+        .parent        = TYPE_DEVICE,
+        .instance_size = sizeof(VMFwUpdateState),
+        .class_init    = vmfwupdate_device_class_init,
+    },
+};
+
+DEFINE_TYPES(vmfwupdate_device_types)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index e1f41b2a53..eaba4aedf9 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -441,6 +441,7 @@ struct MachineState {
     SmpCache smp_cache;
     struct NVDIMMState *nvdimms_state;
     struct NumaState *numa_state;
+    uint64_t (*mr_bios_size)(MachineState *ms);
 };
 
 /*
diff --git a/include/hw/misc/vmfwupdate.h b/include/hw/misc/vmfwupdate.h
new file mode 100644
index 0000000000..2e1b9b9ba1
--- /dev/null
+++ b/include/hw/misc/vmfwupdate.h
@@ -0,0 +1,101 @@
+/*
+ * Guest driven VM boot component update device
+ * For details and specification, please look at docs/specs/vmfwupdate.rst.
+ *
+ * Copyright (C) 2025 Red Hat, Inc.
+ *
+ * Authors: Ani Sinha <anisinha@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef VMFWUPDATE_H
+#define VMFWUPDATE_H
+
+#include "hw/qdev-core.h"
+#include "qom/object.h"
+#include "qemu/units.h"
+
+#define TYPE_VMFWUPDATE "vmfwupdate"
+
+#define VMFWUPDATE_CAP_BIOS_RESIZE 0x01 /* guests may resize bios region */
+
+#define VMFWUPDATE_OPAQUE_SIZE (4 * KiB) /* PAGE_SIZE */
+
+/* fw_cfg file definitions */
+#define FILE_VMFWUPDATE_OBLOB "etc/vmfwupdate/opaque-blob"
+#define FILE_VMFWUPDATE_FWBLOB "etc/vmfwupdate/fw-blob"
+#define FILE_VMFWUPDATE_CAP "etc/vmfwupdate/cap"
+#define FILE_VMFWUPDATE_BIOS_SIZE "etc/vmfwupdate/bios-size"
+#define FILE_VMFWUPDATE_CONTROL "etc/vmfwupdate/disable"
+
+/*
+ * Address and length of the guest provided firmware blob.
+ * The blob itself is passed using the guest shared memory to QEMU.
+ * This is then copied to the guest private memeory in the secure vm
+ * by the hypervisor.
+ */
+typedef struct {
+    uint64_t bios_size; /*
+                         * this is used by the guest to update plat_bios_size
+                         * when VMFWUPDATE_CAP_BIOS_RESIZE is set.
+                         */
+    uint64_t bios_paddr; /*
+                          * starting gpa where the blob is in shared guest
+                          * memory. Cleared upon system reset.
+                          */
+} VMFwUpdateFwBlob;
+
+typedef struct VMFwUpdateState {
+    DeviceState parent_obj;
+
+    /*
+     * capabilities - 64 bits.
+     * Little endian format.
+     */
+    uint64_t capability;
+
+    /*
+     * size of the bios region - architecture dependent.
+     * Read-only by the guest unless VMFWUPDATE_CAP_BIOS_RESIZE
+     * capability is set.
+     */
+    uint64_t plat_bios_size;
+
+    /*
+     * disable - disables the interface when non-zero value is written to it.
+     * Writing 0 to this file enables the interface.
+     */
+    uint8_t disable;
+
+    /*
+     * The first stage boot uses this opaque blob to convey to the next stage
+     * where the next stage components are loaded. The exact structure and
+     * number of entries are unknown to the hypervisor and the hypervisor
+     * does not touch this memory or does any validations. The contents of
+     * this memory survives a vm reset.
+     * The contents of this memory needs to be validated by the guest and
+     * must be ABI compatible between the first and second boot stages of
+     * the guest.
+     */
+    unsigned char opaque_blobs[VMFWUPDATE_OPAQUE_SIZE];
+
+    /*
+     * firmware blob addresses and sizes. These are moved to guest
+     * private memory.
+     */
+    VMFwUpdateFwBlob fw_blob;
+} VMFwUpdateState;
+
+OBJECT_DECLARE_SIMPLE_TYPE(VMFwUpdateState, VMFWUPDATE);
+
+/* returns NULL unless there is exactly one device */
+static inline VMFwUpdateState *vmfwupdate_find(void)
+{
+    Object *o = object_resolve_path_type("", TYPE_VMFWUPDATE, NULL);
+
+    return o ? VMFWUPDATE(o) : NULL;
+}
+
+#endif
diff --git a/tests/functional/meson.build b/tests/functional/meson.build
index b516d21cba..5e455f2e03 100644
--- a/tests/functional/meson.build
+++ b/tests/functional/meson.build
@@ -80,6 +80,7 @@ tests_aarch64_system_thorough = [
   'aarch64_xen',
   'aarch64_xlnx_versal',
   'multiprocess',
+  'vmfwupdate',
 ]
 
 tests_alpha_system_quick = [
@@ -277,6 +278,7 @@ tests_x86_64_system_quick = [
   'virtio_version',
   'x86_cpu_model_versions',
   'vnc',
+  'vmfwupdate',
 ]
 
 tests_x86_64_system_thorough = [
diff --git a/tests/functional/test_vmfwupdate.py b/tests/functional/test_vmfwupdate.py
new file mode 100644
index 0000000000..13eefd3d75
--- /dev/null
+++ b/tests/functional/test_vmfwupdate.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python3
+#
+# Check for vmfwupdate device.
+#
+# Copyright (c) 2025 Red Hat, Inc.
+#
+# Author:
+#  Ani Sinha <anisinha@redhat.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+from qemu_test import QemuSystemTest
+import time
+
+class VmFwUpdateDeviceCheck(QemuSystemTest):
+    DELAY_BOOT_SEQUENCE = 1
+
+    def test_vmfwupdate_pass(self):
+        """
+        Basic test to make sure vmfwupdate device can be instantiated.
+        """
+        if self.arch != 'x86_64':
+            return
+
+        self.vm.add_args('-device', 'vmfwupdate,id=fwupd1')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        time.sleep(self.DELAY_BOOT_SEQUENCE)
+        self.vm.shutdown()
+        self.assertEqual(self.vm.exitcode(), 0, "QEMU exit code should be 0")
+
+    def test_vmfwupdate_disabled(self):
+        """
+        Basic test to make sure vmfwupdate device can be instantiated.
+        """
+        if self.arch != 'x86_64':
+            return
+
+        self.vm.add_args('-device', 'vmfwupdate,id=fwupd,disable=1')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        time.sleep(self.DELAY_BOOT_SEQUENCE)
+        self.vm.shutdown()
+        self.assertRegex(self.vm.get_log(),
+                         r'vmfwupdate device is disabled on the command-line')
+        self.assertEqual(self.vm.exitcode(), 0, "QEMU exit code should be 0")
+
+    def test_multiple_device_fail(self):
+        """
+        Only one vmfwdevice can be instantiated. Ensure failure if
+        user tries to create more than one device.
+        """
+        if self.arch != 'x86_64':
+            return
+
+        self.vm.add_args('-device', 'vmfwupdate,id=fw1',
+                         '-device', 'vmfwupdate,id=fw2')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        self.vm.wait()
+        self.assertEqual(self.vm.exitcode(), 1, "QEMU exit code should be 1")
+        self.assertRegex(self.vm.get_log(),
+                         r'at most one vmfwupdate device is permitted')
+
+    def aarch64_fail_test(self):
+        """
+        Currently the device is only supported for pc platforms.
+        """
+        if self.arch != 'aarch64':
+            return
+
+        self.vm.add_args('-machine', 'virt', '-device',
+                         'vmfwupdate,id=fwupd1')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        self.vm.wait()
+        self.assertEqual(self.vm.exitcode(), 1, "QEMU exit code should be 1")
+        self.assertRegex(self.vm.get_log(),
+                         r'This machine does not support vmfwupdate device')
+
+if __name__ == '__main__':
+    QemuSystemTest.main()
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 68316dbdc1..2b82fb2ea8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -58,6 +58,7 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_FDC_ISA') ? ['fdc-test'] : []) +                      \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['fw_cfg-test'] : []) +                    \
   (config_all_devices.has_key('CONFIG_FW_CFG_DMA') ? ['vmcoreinfo-test'] : []) +            \
+  (config_all_devices.has_key('CONFIG_Q35') ? ['vmfwupdate-test'] : []) +                   \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['i440fx-test'] : []) +                    \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['ide-test'] : []) +                       \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['numa-test'] : []) +                      \
diff --git a/tests/qtest/vmfwupdate-test.c b/tests/qtest/vmfwupdate-test.c
new file mode 100644
index 0000000000..9bbe0bf560
--- /dev/null
+++ b/tests/qtest/vmfwupdate-test.c
@@ -0,0 +1,67 @@
+/*
+ * vmfwupdate device fwcfg test.
+ *
+ * Copyright (c) 2025 Red Hat, Inc.
+ *
+ * Author:
+ *   Ani Sinha <anisinha@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "libqtest.h"
+#include "standard-headers/linux/qemu_fw_cfg.h"
+#include "libqos/fw_cfg.h"
+#include "qemu/bswap.h"
+#include "hw/misc/vmfwupdate.h"
+
+static void test_vmfwupdate_capability(void)
+{
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint64_t capability = 0;
+    size_t filesize;
+
+    s = qtest_init("-device vmfwupdate");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    filesize = qfw_cfg_get_file(fw_cfg, FILE_VMFWUPDATE_CAP,
+                                &capability, sizeof(capability));
+    g_assert_cmpint(filesize, ==, sizeof(capability));
+    capability = le64_to_cpu(capability);
+    g_assert_cmpint(capability, ==, 0);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+}
+
+static void test_vmfwupdate_bios_size(void)
+{
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint64_t bios_size = 0;
+    size_t filesize;
+
+    s = qtest_init("-device vmfwupdate");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    filesize = qfw_cfg_get_file(fw_cfg, FILE_VMFWUPDATE_BIOS_SIZE,
+                                &bios_size, sizeof(bios_size));
+    g_assert_cmpint(filesize, ==, sizeof(bios_size));
+    bios_size = le64_to_cpu(bios_size);
+    fprintf(stderr, "bios_size: %" PRIu64 "\n", bios_size);
+    g_assert_cmpint(bios_size, !=, 0);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_func("vmfwupdate/cap", test_vmfwupdate_capability);
+    qtest_add_func("vmfwupdate/bios_size", test_vmfwupdate_bios_size);
+
+    return g_test_run();
+}
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-14 15:34 Ani Sinha
@ 2025-02-24 15:47 ` Gerd Hoffman
  2025-02-25  5:21   ` Ani Sinha
  2025-02-25 10:28   ` Igor Mammedov
  2025-03-13  9:02 ` Jörg Rödel
  1 sibling, 2 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-02-24 15:47 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Fri, Feb 14, 2025 at 09:04:07PM +0530, Ani Sinha wrote:
> VM firmware update is a mechanism where the virtual machines can use their
> preferred and trusted firmware image in their execution environment without
> having to depend on a untrusted party to provide the firmware bundle. This is
> particularly useful for confidential virtual machines that are deployed in the
> cloud where the tenant and the cloud provider are two different entities. In
> this scenario, virtual machines can bring their own trusted firmware image
> bundled as a part of their filesystem (using UKIs for example[1]) and then use
> this hypervisor interface to update to their trusted firmware image. This also
> allows the guests to have a consistent measurements on the firmware image.

Works nicely for me.  Test case:
  https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi

> +Fw-cfg Files
> +************
> +
> +Guests drive vmfwupdate through special ``fw-cfg`` files that control its flow
> +followed by a standard system reset operation. When vmfwupdate is available,
> +it provides the following ``fw-cfg`` files:
> +
> +* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of additional
> +* ``vmfwupdate/bios-size`` (``u64``) - Little Endian encoded size of the BIOS region.
> +* ``vmfwupdate/opaque`` (``4096 bytes``) - A 4 KiB buffer that survives the BIOS replacement
> +* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is disabled.
> +* ``vmfwupdate/bios-addr`` (``u64``) - A 64bit Little Endian encoded guest physical address

This is out of sync with the actual code (vmfwupdate/bios-addr does not
exist there).

Also this adds five fw_cfg files.  Given that the number of fw_cfg files
we can have is limited I'm not convinced this is the best idea to move
forward.

Alternatives I see:

 (1) Place everything in a single fw_cfg file.  ramfb does this for
     example, the guest writes a struct with a bunch of values.

 (2) Go for a mmio register interface.  The EFI variable store I'm
     working on does this.  fw_cfg is used for hardware discovery,
     via etc/hardware-info file (which can carry entries for multiple
     devices).

See https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kraxel@redhat.com/
and https://lore.kernel.org/qemu-devel/20250219071431.50626-21-kraxel@redhat.com/
(v4 has a double-free bug in patch #1 which will be fixed in v5 of the
series).

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-24 15:47 ` Gerd Hoffman
@ 2025-02-25  5:21   ` Ani Sinha
  2025-02-25  8:39     ` Gerd Hoffman
  2025-02-25 10:28   ` Igor Mammedov
  1 sibling, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-02-25  5:21 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Mon, Feb 24, 2025 at 9:17 PM Gerd Hoffman <kraxel@redhat.com> wrote:
>
> On Fri, Feb 14, 2025 at 09:04:07PM +0530, Ani Sinha wrote:
> > VM firmware update is a mechanism where the virtual machines can use their
> > preferred and trusted firmware image in their execution environment without
> > having to depend on a untrusted party to provide the firmware bundle. This is
> > particularly useful for confidential virtual machines that are deployed in the
> > cloud where the tenant and the cloud provider are two different entities. In
> > this scenario, virtual machines can bring their own trusted firmware image
> > bundled as a part of their filesystem (using UKIs for example[1]) and then use
> > this hypervisor interface to update to their trusted firmware image. This also
> > allows the guests to have a consistent measurements on the firmware image.
>
> Works nicely for me.  Test case:
>   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi

yeah if I can't get my unit test working we can make this an
integration test. or best case scenario, we can have both.

>
> > +Fw-cfg Files
> > +************
> > +
> > +Guests drive vmfwupdate through special ``fw-cfg`` files that control its flow
> > +followed by a standard system reset operation. When vmfwupdate is available,
> > +it provides the following ``fw-cfg`` files:
> > +
> > +* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of additional
> > +* ``vmfwupdate/bios-size`` (``u64``) - Little Endian encoded size of the BIOS region.
> > +* ``vmfwupdate/opaque`` (``4096 bytes``) - A 4 KiB buffer that survives the BIOS replacement
> > +* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is disabled.
> > +* ``vmfwupdate/bios-addr`` (``u64``) - A 64bit Little Endian encoded guest physical address
>
> This is out of sync with the actual code (vmfwupdate/bios-addr does not
> exist there).

The actual implementation detail can vary slightly with the spec no? I
will try to bring them as close as I can.

>
> Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> we can have is limited I'm not convinced this is the best idea to move
> forward.

Right, For implementation, I suggest we combine FILE_VMFWUPDATE_OBLOB
and FILE_VMFWUPDATE_FWBLOB together and also make
FILE_VMFWUPDATE_CONTROL part of the same structure. These are all
writable by the guest so makes sense to have one fwcfg for it. We can
have another read-only fwcfg for FILE_VMFWUPDATE_BIOS_SIZE and
FILE_VMFWUPDATE_CAP.

Alex, what do you think?

>
> Alternatives I see:
>
>  (1) Place everything in a single fw_cfg file.  ramfb does this for
>      example, the guest writes a struct with a bunch of values.
>
>  (2) Go for a mmio register interface.  The EFI variable store I'm
>      working on does this.  fw_cfg is used for hardware discovery,
>      via etc/hardware-info file (which can carry entries for multiple
>      devices).
>
> See https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kraxel@redhat.com/
> and https://lore.kernel.org/qemu-devel/20250219071431.50626-21-kraxel@redhat.com/
> (v4 has a double-free bug in patch #1 which will be fixed in v5 of the
> series).
>
> take care,
>   Gerd
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-25  5:21   ` Ani Sinha
@ 2025-02-25  8:39     ` Gerd Hoffman
  2025-02-25  9:54       ` Ani Sinha
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-02-25  8:39 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Tue, Feb 25, 2025 at 10:51:08AM +0530, Ani Sinha wrote:
> On Mon, Feb 24, 2025 at 9:17 PM Gerd Hoffman <kraxel@redhat.com> wrote:
> >
> > Works nicely for me.  Test case:
> >   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi
> 
> yeah if I can't get my unit test working we can make this an
> integration test. or best case scenario, we can have both.

Do you have a branch with your unit test somewhere?

> > Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> > we can have is limited I'm not convinced this is the best idea to move
> > forward.
> 
> Right, For implementation, I suggest we combine FILE_VMFWUPDATE_OBLOB
> and FILE_VMFWUPDATE_FWBLOB together and also make
> FILE_VMFWUPDATE_CONTROL part of the same structure. These are all
> writable by the guest so makes sense to have one fwcfg for it. We can
> have another read-only fwcfg for FILE_VMFWUPDATE_BIOS_SIZE and
> FILE_VMFWUPDATE_CAP.

I'd prefer to put everything into one file.  Maybe except the opaque
blob page.  A struct along the lines of:

struct vmfwupdate {
    u64 capabilities;   // 'cap' file
    u64 firmware_size;  // 'bios-size' file
    u64 control;        // disable bit etc. 
    u64 update_paddr;   // 'fw-blob' file, paddr field
    u64 update_size;    // 'fw-blob' file, size field
}

On the host side you'll need two copies of the struct then: one where
the guest can read from and write to, and one shadow struct where the
actual values are stored.  The write callback goes sanity-check the
guest-written data, takes everything which passes into the shadow
struct, finally goes copy back the shadow struct to the guest struct
so the guest can see what the host has accepted.

Part of the verification process can be that you already copy the
firmware to a host buffer.  Best using dma_* functions, these return
errors in case something went wrong (say the paddr passed is not backed
by guest ram).  Which gives you the chance to propagate those errors
back to the guest before it'll actually try to launch the updated
firmware via reset.

take care,
  Gerd

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-25  8:39     ` Gerd Hoffman
@ 2025-02-25  9:54       ` Ani Sinha
  2025-02-25 10:23         ` Gerd Hoffman
  0 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-02-25  9:54 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

On Tue, Feb 25, 2025 at 2:09 PM Gerd Hoffman <kraxel@redhat.com> wrote:
>
> On Tue, Feb 25, 2025 at 10:51:08AM +0530, Ani Sinha wrote:
> > On Mon, Feb 24, 2025 at 9:17 PM Gerd Hoffman <kraxel@redhat.com> wrote:
> > >
> > > Works nicely for me.  Test case:
> > >   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi
> >
> > yeah if I can't get my unit test working we can make this an
> > integration test. or best case scenario, we can have both.
>
> Do you have a branch with your unit test somewhere?
>
> > > Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> > > we can have is limited I'm not convinced this is the best idea to move
> > > forward.
> >
> > Right, For implementation, I suggest we combine FILE_VMFWUPDATE_OBLOB
> > and FILE_VMFWUPDATE_FWBLOB together and also make
> > FILE_VMFWUPDATE_CONTROL part of the same structure. These are all
> > writable by the guest so makes sense to have one fwcfg for it. We can
> > have another read-only fwcfg for FILE_VMFWUPDATE_BIOS_SIZE and
> > FILE_VMFWUPDATE_CAP.
>
> I'd prefer to put everything into one file.  Maybe except the opaque
> blob page.  A struct along the lines of:
>
> struct vmfwupdate {
>     u64 capabilities;   // 'cap' file
>     u64 firmware_size;  // 'bios-size' file
>     u64 control;        // disable bit etc.
>     u64 update_paddr;   // 'fw-blob' file, paddr field
>     u64 update_size;    // 'fw-blob' file, size field
> }
>
> On the host side you'll need two copies of the struct then: one where
> the guest can read from and write to, and one shadow struct where the
> actual values are stored.  The write callback goes sanity-check the
> guest-written data, takes everything which passes into the shadow
> struct, finally goes copy back the shadow struct to the guest struct
> so the guest can see what the host has accepted.
>
> Part of the verification process can be that you already copy the
> firmware to a host buffer.

I think we decided early on that we would not want to do that - that
is consume extra memory on the host side for boot components.
right alex?

Best using dma_* functions, these return
> errors in case something went wrong (say the paddr passed is not backed
> by guest ram).  Which gives you the chance to propagate those errors
> back to the guest before it'll actually try to launch the updated
> firmware via reset.
>
> take care,
>   Gerd
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-25  9:54       ` Ani Sinha
@ 2025-02-25 10:23         ` Gerd Hoffman
  0 siblings, 0 replies; 68+ messages in thread
From: Gerd Hoffman @ 2025-02-25 10:23 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Igor Mammedov, Vitaly Kuznetsov, qemu-devel

  Hi,

> > Part of the verification process can be that you already copy the
> > firmware to a host buffer.
> 
> I think we decided early on that we would not want to do that - that
> is consume extra memory on the host side for boot components.

Fine with me, was just an idea, certainly not critical.  Just have qemu
log errors is probably good enough.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-24 15:47 ` Gerd Hoffman
  2025-02-25  5:21   ` Ani Sinha
@ 2025-02-25 10:28   ` Igor Mammedov
  2025-02-25 11:00     ` Gerd Hoffman
  1 sibling, 1 reply; 68+ messages in thread
From: Igor Mammedov @ 2025-02-25 10:28 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Vitaly Kuznetsov, qemu-devel

On Mon, 24 Feb 2025 16:47:31 +0100
Gerd Hoffman <kraxel@redhat.com> wrote:

> On Fri, Feb 14, 2025 at 09:04:07PM +0530, Ani Sinha wrote:
> > VM firmware update is a mechanism where the virtual machines can use their
> > preferred and trusted firmware image in their execution environment without
> > having to depend on a untrusted party to provide the firmware bundle. This is
> > particularly useful for confidential virtual machines that are deployed in the
> > cloud where the tenant and the cloud provider are two different entities. In
> > this scenario, virtual machines can bring their own trusted firmware image
> > bundled as a part of their filesystem (using UKIs for example[1]) and then use
> > this hypervisor interface to update to their trusted firmware image. This also
> > allows the guests to have a consistent measurements on the firmware image.  
> 
> Works nicely for me.  Test case:
>   https://kraxel.gitlab.io/uefi-tools-rs/seabios.efi
> 
> > +Fw-cfg Files
> > +************
> > +
> > +Guests drive vmfwupdate through special ``fw-cfg`` files that control its flow
> > +followed by a standard system reset operation. When vmfwupdate is available,
> > +it provides the following ``fw-cfg`` files:
> > +
> > +* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of additional
> > +* ``vmfwupdate/bios-size`` (``u64``) - Little Endian encoded size of the BIOS region.
> > +* ``vmfwupdate/opaque`` (``4096 bytes``) - A 4 KiB buffer that survives the BIOS replacement
> > +* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is disabled.
> > +* ``vmfwupdate/bios-addr`` (``u64``) - A 64bit Little Endian encoded guest physical address  
> 
> This is out of sync with the actual code (vmfwupdate/bios-addr does not
> exist there).
> 
> Also this adds five fw_cfg files.  Given that the number of fw_cfg files
> we can have is limited I'm not convinced this is the best idea to move
> forward.
> 
> Alternatives I see:
> 
>  (1) Place everything in a single fw_cfg file.  ramfb does this for
>      example, the guest writes a struct with a bunch of values.
> 
>  (2) Go for a mmio register interface.  The EFI variable store I'm
>      working on does this.  fw_cfg is used for hardware discovery,
>      via etc/hardware-info file (which can carry entries for multiple
>      devices).
> 
> See https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kraxel@redhat.com/

After looking at it, it seems to me that data will be in host byte order
and guest has no idea what that is.
Probably it should advertise byteorder as part of the structure,
and guest side should then do conversion if necessary.  

> and https://lore.kernel.org/qemu-devel/20250219071431.50626-21-kraxel@redhat.com/
> (v4 has a double-free bug in patch #1 which will be fixed in v5 of the
> series).
> 
> take care,
>   Gerd
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-25 10:28   ` Igor Mammedov
@ 2025-02-25 11:00     ` Gerd Hoffman
  2025-02-25 11:33       ` Igor Mammedov
  0 siblings, 1 reply; 68+ messages in thread
From: Gerd Hoffman @ 2025-02-25 11:00 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Ani Sinha, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Vitaly Kuznetsov, qemu-devel

  Hi,

> > See https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kraxel@redhat.com/
> 
> After looking at it, it seems to me that data will be in host byte order
> and guest has no idea what that is.
> Probably it should advertise byteorder as part of the structure,
> and guest side should then do conversion if necessary.  

The structure is already defined and in use by edk2, so changing it is
not an option.  Guest is little endian (there is no big endian UEFI).
So simply declaring the struct fields being little endian byte order
and convert them if needed on the host side makes alot more sense here.
I'll fix the patch accordingly for v5.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-25 11:00     ` Gerd Hoffman
@ 2025-02-25 11:33       ` Igor Mammedov
  0 siblings, 0 replies; 68+ messages in thread
From: Igor Mammedov @ 2025-02-25 11:33 UTC (permalink / raw)
  To: Gerd Hoffman
  Cc: Ani Sinha, Alex Graf, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Vitaly Kuznetsov, qemu-devel

On Tue, 25 Feb 2025 12:00:12 +0100
Gerd Hoffman <kraxel@redhat.com> wrote:

>   Hi,
> 
> > > See https://lore.kernel.org/qemu-devel/20250219071431.50626-2-kraxel@redhat.com/  
> > 
> > After looking at it, it seems to me that data will be in host byte order
> > and guest has no idea what that is.
> > Probably it should advertise byteorder as part of the structure,
> > and guest side should then do conversion if necessary.    
> 
> The structure is already defined and in use by edk2, so changing it is
> not an option.  Guest is little endian (there is no big endian UEFI).
> So simply declaring the struct fields being little endian byte order
> and convert them if needed on the host side makes alot more sense here.
> I'll fix the patch accordingly for v5.

given it's memcpy of blob it would be hard to do so in hardware_info_register(),
doing it in caller is error prone/constant source of bugs
(that was the case for ACPI tables until endianness conversion
was hidden behind API).

But if it's small number of places perhaps it's ok to do
conversion in the caller.

or alternatively instead of structure, build data as
a table, similar to aml_append(aml_int()) in ACPI,
which will take care of conversion.
Ugly? yes, but it takes care of endianness bugs,
as practice showed up.

PS:
if there is a doc for etc/hardware-info, then describe it
there as little-endian (so reader doesn't have to deduce
byte-order, it's ABI after all)
  
> 
> take care,
>   Gerd
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-02-14 15:34 Ani Sinha
  2025-02-24 15:47 ` Gerd Hoffman
@ 2025-03-13  9:02 ` Jörg Rödel
  2025-03-13  9:37   ` Ani Sinha
  1 sibling, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13  9:02 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

Hi Ani,

On Fri, Feb 14, 2025 at 09:04:07PM +0530, Ani Sinha wrote:
> VM firmware update is a mechanism where the virtual machines can use their
> preferred and trusted firmware image in their execution environment without
> having to depend on a untrusted party to provide the firmware bundle. This is
> particularly useful for confidential virtual machines that are deployed in the
> cloud where the tenant and the cloud provider are two different entities. In
> this scenario, virtual machines can bring their own trusted firmware image
> bundled as a part of their filesystem (using UKIs for example[1]) and then use
> this hypervisor interface to update to their trusted firmware image. This also
> allows the guests to have a consistent measurements on the firmware image.

We discussed the implications of the vmfwupdate mechanism as currently
proposed in yesterdays SVSM Development Call[1]. The reason this came to my
attention was a request to support a non-IGVM boot protocol in
COCONUT-SVSM, so I started to look into the vmfwupdate.

I do not claim to have a full picture yet, but what I'd like to better
understand is how the proposed update mechanism plans to guarantee
predictable launch measurements for confidential VMs, especially since
the measurements depend on the exact order of setup calls for the TEE
and data the vmfwupdate mechanism can currently not pass on to the
state-after-reset. Can you please share some details on that?

Regards,

	Joerg

[1] https://github.com/joergroedel/svsm-governance/blob/main/Meetings/devel-call-2025-03-12.md

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13  9:02 ` Jörg Rödel
@ 2025-03-13  9:37   ` Ani Sinha
  2025-03-13 10:10     ` Jörg Rödel
  0 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-13  9:37 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 2:40 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> Hi Ani,
>
> On Fri, Feb 14, 2025 at 09:04:07PM +0530, Ani Sinha wrote:
> > VM firmware update is a mechanism where the virtual machines can use their
> > preferred and trusted firmware image in their execution environment without
> > having to depend on a untrusted party to provide the firmware bundle. This is
> > particularly useful for confidential virtual machines that are deployed in the
> > cloud where the tenant and the cloud provider are two different entities. In
> > this scenario, virtual machines can bring their own trusted firmware image
> > bundled as a part of their filesystem (using UKIs for example[1]) and then use
> > this hypervisor interface to update to their trusted firmware image. This also
> > allows the guests to have a consistent measurements on the firmware image.
>
> We discussed the implications of the vmfwupdate mechanism as currently
> proposed in yesterdays SVSM Development Call[1]. The reason this came to my
> attention was a request to support a non-IGVM boot protocol in
> COCONUT-SVSM, so I started to look into the vmfwupdate.
>
> I do not claim to have a full picture yet, but what I'd like to better
> understand is how the proposed update mechanism plans to guarantee
> predictable launch measurements for confidential VMs, especially since
> the measurements depend on the exact order of setup calls for the TEE
> and data the vmfwupdate mechanism can currently not pass on to the
> state-after-reset. Can you please share some details on that?

The state before reset is the state that uses stock firmware from the
hyperscaler. The state after reset is a fresh new state that uses the
"trusted and known firmware" from the end user. So the launch
measurements would not match between the state before reset and the
state after reset and there is no guarantee that there would be
"predictable launch measurements" across the reset.
What we do guarantee is that after reset, the launch measurements that
include the "trusted and known firmware" (whatever that is, not
necessarily edk2), is understood and expected. If you were to
calculate offline the measurements that include this "trusted and
known firmware" using the same order of setup calls as the target
system and then derive the launch digest, it should match that of what
the hardware would produce in the target.
I hope I understood your question correctly. Otherwise Alex can
perhaps explain better.


>
> Regards,
>
>         Joerg
>
> [1] https://github.com/joergroedel/svsm-governance/blob/main/Meetings/devel-call-2025-03-12.md
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13  9:37   ` Ani Sinha
@ 2025-03-13 10:10     ` Jörg Rödel
  2025-03-13 10:32       ` Ani Sinha
  0 siblings, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 10:10 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

Hi Ani,

On Thu, Mar 13, 2025 at 03:07:42PM +0530, Ani Sinha wrote:
> The state before reset is the state that uses stock firmware from the
> hyperscaler. The state after reset is a fresh new state that uses the
> "trusted and known firmware" from the end user. So the launch
> measurements would not match between the state before reset and the
> state after reset and there is no guarantee that there would be
> "predictable launch measurements" across the reset.

Right, I understand that the state before and after reset will have
different launch measurements, that is expected when booting with a
different firmware :)

> What we do guarantee is that after reset, the launch measurements that
> include the "trusted and known firmware" (whatever that is, not
> necessarily edk2), is understood and expected. If you were to
> calculate offline the measurements that include this "trusted and
> known firmware" using the same order of setup calls as the target
> system and then derive the launch digest, it should match that of what
> the hardware would produce in the target.

The devil lies in "same order of setup calls". Without a way to define
this order through the vmfwupdate interface there is a lot of implicit
knowledge required about how KVM/QEMU setup the TEE context to calculate
the expected after-reset launch measurement. Even worse, the exact way
this setup is done then becomes ABI, because any change in this process
will lead to a different launch measurement.

Some examples of initial memory layout calls which influence the launch
measurment:

	1) Launch VMSA(s) (SEV-SNP only, though I believe on TDX the
	   initial register state can also be changed to some
	   degree).
	2) Pre-Validated/Accepted memory regions (TDX and SEV-SNP). This
	   is especially important, as different FWs have different
	   requirements on what memory is pre-validated, zeroed, etc.
	3) Zero-pages, measured and unmeasured (TDX and SEV-SNP).
	4) Position of the CPUID page, secrets page, and id-blob
	   (SEV-SNP).
	5) Pre-populated data (TDX and SEV-SNP).

If I understand the vmfwupdate interface correctly, and please let me
know if I am wrong here, it only allows to specify a call for part 5) of
the above list. Some of the other parts can be specified in architecture
dependent ways in the FW image itself, but not all of them.

So the question is, is the plan to hardcode everything else (including
the order of calls) and make the behavior ABI?

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 10:10     ` Jörg Rödel
@ 2025-03-13 10:32       ` Ani Sinha
  2025-03-13 10:59         ` Jörg Rödel
  0 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-13 10:32 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 3:40 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> Hi Ani,
>
> On Thu, Mar 13, 2025 at 03:07:42PM +0530, Ani Sinha wrote:
> > The state before reset is the state that uses stock firmware from the
> > hyperscaler. The state after reset is a fresh new state that uses the
> > "trusted and known firmware" from the end user. So the launch
> > measurements would not match between the state before reset and the
> > state after reset and there is no guarantee that there would be
> > "predictable launch measurements" across the reset.
>
> Right, I understand that the state before and after reset will have
> different launch measurements, that is expected when booting with a
> different firmware :)
>
> > What we do guarantee is that after reset, the launch measurements that
> > include the "trusted and known firmware" (whatever that is, not
> > necessarily edk2), is understood and expected. If you were to
> > calculate offline the measurements that include this "trusted and
> > known firmware" using the same order of setup calls as the target
> > system and then derive the launch digest, it should match that of what
> > the hardware would produce in the target.
>
> The devil lies in "same order of setup calls". Without a way to define
> this order through the vmfwupdate interface there is a lot of implicit
> knowledge required about how KVM/QEMU setup the TEE context to calculate
> the expected after-reset launch measurement. Even worse, the exact way
> this setup is done then becomes ABI, because any change in this process
> will lead to a different launch measurement.

Forgive my ignorance, today none of the hyperscalers use IGVM to
define this. How then the expected launch measurement is derived?
Our vmfwupdate interface does not make any attempt to define this in
any way or make this an ABI. Its intentionally open ended. It's really
up to the guest to set this up any way they wish. This is the same
when the guest provides the igvm file that defines the above.

>
> Some examples of initial memory layout calls which influence the launch
> measurment:
>
>         1) Launch VMSA(s) (SEV-SNP only, though I believe on TDX the
>            initial register state can also be changed to some
>            degree).
>         2) Pre-Validated/Accepted memory regions (TDX and SEV-SNP). This
>            is especially important, as different FWs have different
>            requirements on what memory is pre-validated, zeroed, etc.
>         3) Zero-pages, measured and unmeasured (TDX and SEV-SNP).
>         4) Position of the CPUID page, secrets page, and id-blob
>            (SEV-SNP).
>         5) Pre-populated data (TDX and SEV-SNP).
>
> If I understand the vmfwupdate interface correctly, and please let me
> know if I am wrong here, it only allows to specify a call for part 5) of
> the above list. Some of the other parts can be specified in architecture
> dependent ways in the FW image itself, but not all of them.
>
> So the question is, is the plan to hardcode everything else (including
> the order of calls) and make the behavior ABI?
>
> Regards,
>
>         Joerg
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 10:32       ` Ani Sinha
@ 2025-03-13 10:59         ` Jörg Rödel
  2025-03-13 11:09           ` Ani Sinha
  0 siblings, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 10:59 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

Hi Ani,

Don't get me wrong, I really like the general idea of vmfwupdate as
a way to implement BYOFW, and for non-confidential VMs it is great. I
just think the interface design is not well suited for confidential VMs
yet and want to discuss how to change that.

On Thu, Mar 13, 2025 at 04:02:11PM +0530, Ani Sinha wrote:
> Forgive my ignorance, today none of the hyperscalers use IGVM to
> define this. How then the expected launch measurement is derived?

There is at least one Hyperscaler which uses IGVM, another implemented
support for it in their hypervisor (although not for a BYOFW feature,
yet). In my view this problem is still widely being worked on today by
the CSPs. If you are lucky, and since the firmware is provided by the
CSP as well, they publish expected launch measurments for each of their
coco platforms.

> Our vmfwupdate interface does not make any attempt to define this in
> any way or make this an ABI. Its intentionally open ended. It's really
> up to the guest to set this up any way they wish. This is the same
> when the guest provides the igvm file that defines the above.

Without a way to supply user-defined setup steps of a confidential VM
the host-VMM has to hardcode them. And that hardcoded behavior becomes
ABI implicitly, because any change to this behavior will also change the
actual launch measurements, potentially breaking existing users.

This happened in the past already when KVM decided to hardcode a
different set of SEV-Features by enabling Debug-Swap. I strongly believe
that with KVM/QEMU we should work in a direction which makes those
breakages unlikely in the future.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 10:59         ` Jörg Rödel
@ 2025-03-13 11:09           ` Ani Sinha
  2025-03-13 11:27             ` Jörg Rödel
  0 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-13 11:09 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 4:29 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> Hi Ani,
>
> Don't get me wrong, I really like the general idea of vmfwupdate as
> a way to implement BYOFW, and for non-confidential VMs it is great. I
> just think the interface design is not well suited for confidential VMs
> yet and want to discuss how to change that.
>
> On Thu, Mar 13, 2025 at 04:02:11PM +0530, Ani Sinha wrote:
> > Forgive my ignorance, today none of the hyperscalers use IGVM to
> > define this. How then the expected launch measurement is derived?
>
> There is at least one Hyperscaler which uses IGVM, another implemented
> support for it in their hypervisor (although not for a BYOFW feature,
> yet). In my view this problem is still widely being worked on today by
> the CSPs. If you are lucky, and since the firmware is provided by the
> CSP as well, they publish expected launch measurments for each of their
> coco platforms.
>
> > Our vmfwupdate interface does not make any attempt to define this in
> > any way or make this an ABI. Its intentionally open ended. It's really
> > up to the guest to set this up any way they wish. This is the same
> > when the guest provides the igvm file that defines the above.
>
> Without a way to supply user-defined setup steps of a confidential VM
> the host-VMM has to hardcode them. And that hardcoded behavior becomes
> ABI implicitly, because any change to this behavior will also change the
> actual launch measurements, potentially breaking existing users.

Right so what we are proposing is generic enough so that if the VM
wants to use an IGVM container as opposed to an actual firmware image
in a FUKI, that is totally possible. Then you need to have that IGVM
setup the memory in a well defined way like you suggest. Sure, the
IGVM is not passed through a command line but it can be loaded by the
guest in a well defined memory location and then its instructions can
be executed.
In our proposal, we do not want to dictate how the guest would want to
do that. So hopefully you see now that IGVM and our approach is not
mutually exclusive but can be complementary to each other.

>
> This happened in the past already when KVM decided to hardcode a
> different set of SEV-Features by enabling Debug-Swap. I strongly believe
> that with KVM/QEMU we should work in a direction which makes those
> breakages unlikely in the future.
>
> Regards,
>
>         Joerg
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 11:09           ` Ani Sinha
@ 2025-03-13 11:27             ` Jörg Rödel
  2025-03-13 11:28               ` Jörg Rödel
                                 ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 11:27 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 04:39:15PM +0530, Ani Sinha wrote:
> Right so what we are proposing is generic enough so that if the VM
> wants to use an IGVM container as opposed to an actual firmware image
> in a FUKI, that is totally possible. Then you need to have that IGVM
> setup the memory in a well defined way like you suggest. Sure, the
> IGVM is not passed through a command line but it can be loaded by the
> guest in a well defined memory location and then its instructions can
> be executed.

That makes sense. In this scenario, how does the VMM detect that it got
an IGVM image instead of a firmware image? As I understood the current
documentation the defined behavior is to copy the guest-provided
FW-image to the BIOS region, no?

> In our proposal, we do not want to dictate how the guest would want to
> do that. So hopefully you see now that IGVM and our approach is not
> mutually exclusive but can be complementary to each other.

Fine with me. Just note that supporting the current non-IGVM process for
confidential guests still causes the implicit ABI issue I mentioned
before. But not being a KVM/QEMU maintainer I can live with that :)

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 11:27             ` Jörg Rödel
@ 2025-03-13 11:28               ` Jörg Rödel
  2025-03-13 11:56               ` Ani Sinha
  2025-03-13 14:53               ` Ani Sinha
  2 siblings, 0 replies; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 11:28 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 12:27:13PM +0100, Jörg Rödel wrote:
> Fine with me. Just note that supporting the current non-IGVM process for
> confidential guests still causes the implicit ABI issue I mentioned
> before. But not being a KVM/QEMU maintainer I can live with that :)

Small addition: Note that IGVM allows for defining multiple platform
targets in one image, among them non-confidential VMs. So IGVM can be
used for those as well.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 11:27             ` Jörg Rödel
  2025-03-13 11:28               ` Jörg Rödel
@ 2025-03-13 11:56               ` Ani Sinha
  2025-03-13 14:53               ` Ani Sinha
  2 siblings, 0 replies; 68+ messages in thread
From: Ani Sinha @ 2025-03-13 11:56 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 4:57 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> On Thu, Mar 13, 2025 at 04:39:15PM +0530, Ani Sinha wrote:
> > Right so what we are proposing is generic enough so that if the VM
> > wants to use an IGVM container as opposed to an actual firmware image
> > in a FUKI, that is totally possible. Then you need to have that IGVM
> > setup the memory in a well defined way like you suggest. Sure, the
> > IGVM is not passed through a command line but it can be loaded by the
> > guest in a well defined memory location and then its instructions can
> > be executed.
>
> That makes sense. In this scenario, how does the VMM detect that it got
> an IGVM image instead of a firmware image?

VMM does not care but UKI which the guest brings in would since a IGVM
container would be part of not .efifw section but .igvm section.

As I understood the current
> documentation the defined behavior is to copy the guest-provided
> FW-image to the BIOS region, no?

In the latest proposal which Gerd has suggested, VMM does not know or
care if its bios region or not or what kind of image it is copying.
The guest provides the target private address at which to copy the
blob from the shared memory region. So the FUKI which bundles a fw
image would provide 4G-size for x86 as the target physical address. A
FUKI that bundles an IGVM can provide something else.

>
> > In our proposal, we do not want to dictate how the guest would want to
> > do that. So hopefully you see now that IGVM and our approach is not
> > mutually exclusive but can be complementary to each other.
>
> Fine with me. Just note that supporting the current non-IGVM process for
> confidential guests still causes the implicit ABI issue I mentioned
> before. But not being a KVM/QEMU maintainer I can live with that :)
>
> Regards,
>
>         Joerg
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 11:27             ` Jörg Rödel
  2025-03-13 11:28               ` Jörg Rödel
  2025-03-13 11:56               ` Ani Sinha
@ 2025-03-13 14:53               ` Ani Sinha
  2025-03-13 15:39                 ` Jörg Rödel
  2 siblings, 1 reply; 68+ messages in thread
From: Ani Sinha @ 2025-03-13 14:53 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 4:57 PM Jörg Rödel <joro@8bytes.org> wrote:
>
> On Thu, Mar 13, 2025 at 04:39:15PM +0530, Ani Sinha wrote:
> > Right so what we are proposing is generic enough so that if the VM
> > wants to use an IGVM container as opposed to an actual firmware image
> > in a FUKI, that is totally possible. Then you need to have that IGVM
> > setup the memory in a well defined way like you suggest. Sure, the
> > IGVM is not passed through a command line but it can be loaded by the
> > guest in a well defined memory location and then its instructions can
> > be executed.
>
> That makes sense. In this scenario, how does the VMM detect that it got
> an IGVM image instead of a firmware image? As I understood the current
> documentation the defined behavior is to copy the guest-provided
> FW-image to the BIOS region, no?

Note that even with this approach where the hypervisor *thinks* it's
dealing with a real firmware, you can imagine a small rust based
firmware image that is loaded by the guest in the firmware region.
This tiny firmware then jumps to a well known address (chosen by the
guest) where IGVM is loaded and then starts executing the IGVM
instructions.

>
> > In our proposal, we do not want to dictate how the guest would want to
> > do that. So hopefully you see now that IGVM and our approach is not
> > mutually exclusive but can be complementary to each other.
>
> Fine with me. Just note that supporting the current non-IGVM process for
> confidential guests still causes the implicit ABI issue I mentioned
> before. But not being a KVM/QEMU maintainer I can live with that :)
>
> Regards,
>
>         Joerg
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 14:53               ` Ani Sinha
@ 2025-03-13 15:39                 ` Jörg Rödel
  2025-03-13 16:30                   ` Alexander Graf
  0 siblings, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 15:39 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Alex Graf, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 08:23:44PM +0530, Ani Sinha wrote:
> Note that even with this approach where the hypervisor *thinks* it's
> dealing with a real firmware, you can imagine a small rust based
> firmware image that is loaded by the guest in the firmware region.
> This tiny firmware then jumps to a well known address (chosen by the
> guest) where IGVM is loaded and then starts executing the IGVM
> instructions.

Yes, but this way the predictable launch measurement property of IGVM
is lost, as the measurement only contains hashes for the actions
which happened before the VM was finalized and launched by the VMM. The
SEV policy can also not be changed anymore when the guest is running.

Anyway, I think it doesn't matter much whether the IGVM is parsed in
guest context or by QEMU, as long as the resulting measurement is the
same as if the file was loaded at initial VM launch.

Given that QEMU will hopefully get IGVM backend support soon, there is
some value and saved effort in just passing the IGVM data to the VMM via
the vmfwupdate interface and let QEMU do the rest.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 15:39                 ` Jörg Rödel
@ 2025-03-13 16:30                   ` Alexander Graf
  2025-03-13 17:38                     ` Jörg Rödel
  0 siblings, 1 reply; 68+ messages in thread
From: Alexander Graf @ 2025-03-13 16:30 UTC (permalink / raw)
  To: Jörg Rödel, Ani Sinha
  Cc: Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

Hi Jörg,

On 13.03.25 16:39, Jörg Rödel wrote:
> On Thu, Mar 13, 2025 at 08:23:44PM +0530, Ani Sinha wrote:
>> Note that even with this approach where the hypervisor *thinks* it's
>> dealing with a real firmware, you can imagine a small rust based
>> firmware image that is loaded by the guest in the firmware region.
>> This tiny firmware then jumps to a well known address (chosen by the
>> guest) where IGVM is loaded and then starts executing the IGVM
>> instructions.
> Yes, but this way the predictable launch measurement property of IGVM
> is lost, as the measurement only contains hashes for the actions
> which happened before the VM was finalized and launched by the VMM. The
> SEV policy can also not be changed anymore when the guest is running.
>
> Anyway, I think it doesn't matter much whether the IGVM is parsed in
> guest context or by QEMU, as long as the resulting measurement is the
> same as if the file was loaded at initial VM launch.
>
> Given that QEMU will hopefully get IGVM backend support soon, there is
> some value and saved effort in just passing the IGVM data to the VMM via
> the vmfwupdate interface and let QEMU do the rest.


I have a few concerns with IGVM:

1) Parsing is non-trivial. Parsing them in QEMU may open security issues.
2) Their data structures are tied to the target CPU structures like VMSA 
which FWIW are not fully owned by QEMU, are they?
3) I don't want to allocate a bounce buffer for an IGVM in the 
hypervisor. So we would need to ensure that the memory allocated by the 
loader for the IGVM does not overlap any memory the IGVM wants to 
consume. If the loader considers the IGVM as opaque, that is difficult 
to achieve.


Alex



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 16:30                   ` Alexander Graf
@ 2025-03-13 17:38                     ` Jörg Rödel
  2025-03-13 17:49                       ` Daniel P. Berrangé
  0 siblings, 1 reply; 68+ messages in thread
From: Jörg Rödel @ 2025-03-13 17:38 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ani Sinha, Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu,
	Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

Hey Alex,

On Thu, Mar 13, 2025 at 05:30:30PM +0100, Alexander Graf wrote:
> I have a few concerns with IGVM:
> 
> 1) Parsing is non-trivial. Parsing them in QEMU may open security issues.

There is an IGVM parsing library under MIT license and written in Rust
with C-bindings. The currently proposed IGVM support patches for
QEMU also make of it as well as (I believe) the implementations in the
two other hypervisors I am aware of.

That it's written in Rust is no guarantee that there are no issues, but
certain classes of common security bugs should already be avoided by
that.

> 2) Their data structures are tied to the target CPU structures like VMSA
> which FWIW are not fully owned by QEMU, are they?

Yes, those data structures are aligned with what the hardware consumes.
That makes it a lot easier to pre-calculate the launch-measurements, as
the tooling just needs to hash what is in the file without constructing
the hardware representation first.

Not sure what you mean by "owned by QEMU", all data in the IGVM file is
at least _consumed_ by QEMU and KVM to build the initial memory image of
the CVM. Once the CVM is launched all of the data belongs to the guest.

> 3) I don't want to allocate a bounce buffer for an IGVM in the hypervisor.
> So we would need to ensure that the memory allocated by the loader for the
> IGVM does not overlap any memory the IGVM wants to consume. If the loader
> considers the IGVM as opaque, that is difficult to achieve.

Right, I think that is a reasonable constraint that should be built into
the vmfwupdate protocol. The placement of the file in guest memory must
not overlap with any memory region that is deployed by the file. That
saves QEMU from the copying and allocating the space on the host side.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
  2025-03-13 17:38                     ` Jörg Rödel
@ 2025-03-13 17:49                       ` Daniel P. Berrangé
  0 siblings, 0 replies; 68+ messages in thread
From: Daniel P. Berrangé @ 2025-03-13 17:49 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Alexander Graf, Ani Sinha, Paolo Bonzini, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Zhao Liu, Richard Henderson, Michael S. Tsirkin, Fabiano Rosas,
	Laurent Vivier, Gerd Hoffman, Igor Mammedov, Vitaly Kuznetsov,
	qemu-devel

On Thu, Mar 13, 2025 at 06:38:44PM +0100, Jörg Rödel wrote:
> Hey Alex,
> 
> On Thu, Mar 13, 2025 at 05:30:30PM +0100, Alexander Graf wrote:
> > I have a few concerns with IGVM:
> > 
> > 1) Parsing is non-trivial. Parsing them in QEMU may open security issues.
> 
> There is an IGVM parsing library under MIT license and written in Rust
> with C-bindings. The currently proposed IGVM support patches for
> QEMU also make of it as well as (I believe) the implementations in the
> two other hypervisors I am aware of.

Yes current patches are using the rust igvm library via its C shim
layer

  https://lists.nongnu.org/archive/html/qemu-devel/2025-02/msg05714.html

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2025-05-21  7:51 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-13 12:05 [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support Gerd Hoffman
2025-03-13 13:31 ` Jörg Rödel
2025-03-13 14:06   ` Ani Sinha
2025-03-14 11:27     ` Gerd Hoffman
2025-03-14 12:47       ` Alexander Graf
2025-03-14 14:08         ` Gerd Hoffman
2025-03-14 14:50           ` Alexander Graf
2025-03-17  9:56             ` Gerd Hoffman
2025-03-17 17:29               ` Alexander Graf
2025-03-18  7:00                 ` Gerd Hoffman
2025-03-18 11:11                   ` Gerd Hoffman
2025-03-20  8:34                     ` Jörg Rödel
2025-03-21  8:22                       ` Gerd Hoffman
2025-03-24 16:08                         ` Daniel P. Berrangé
2025-03-20 13:53                     ` Alexander Graf
2025-03-21  3:36                       ` Ani Sinha
2025-03-21  8:09                         ` Alexander Graf
2025-03-21  9:14                       ` Gerd Hoffman
2025-03-20  8:31               ` Jörg Rödel
2025-03-21 10:08                 ` Gerd Hoffman
2025-03-21 12:44                   ` Ani Sinha
2025-03-24  7:43                     ` Gerd Hoffman
2025-03-24 11:12                       ` Ani Sinha
2025-03-24 15:48                         ` Gerd Hoffman
2025-03-24 16:31                           ` Alexander Graf
2025-03-24 17:53                             ` Gerd Hoffman
2025-03-24 18:07                               ` Daniel P. Berrangé
2025-03-25  8:04                               ` Alexander Graf
2025-03-26 12:27                                 ` Gerd Hoffman
2025-03-26 15:22                                   ` Alexander Graf
2025-03-26 21:51                                     ` Gerd Hoffman
2025-04-07 16:21                                       ` Dionna Amalie Glaze
2025-04-08  8:33                                         ` Gerd Hoffman
2025-04-08 21:42                                           ` Dionna Amalie Glaze
2025-04-09  6:21                                             ` Gerd Hoffman
2025-04-10  6:31                                               ` Ani Sinha
2025-04-10 10:44                                                 ` Gerd Hoffmann
2025-04-16 11:40                                                   ` Ani Sinha
2025-04-09 11:59                                             ` Ani Sinha
2025-03-27 12:12                                     ` Ani Sinha
2025-04-08  8:11                       ` Gerd Hoffman
2025-05-21  7:50                         ` Ani Sinha
2025-03-21 12:45                   ` Daniel P. Berrangé
2025-03-14 15:16           ` Jörg Rödel
2025-03-15  6:08             ` Ani Sinha
  -- strict thread matches above, loose matches on Subject: below --
2025-02-14 15:34 Ani Sinha
2025-02-24 15:47 ` Gerd Hoffman
2025-02-25  5:21   ` Ani Sinha
2025-02-25  8:39     ` Gerd Hoffman
2025-02-25  9:54       ` Ani Sinha
2025-02-25 10:23         ` Gerd Hoffman
2025-02-25 10:28   ` Igor Mammedov
2025-02-25 11:00     ` Gerd Hoffman
2025-02-25 11:33       ` Igor Mammedov
2025-03-13  9:02 ` Jörg Rödel
2025-03-13  9:37   ` Ani Sinha
2025-03-13 10:10     ` Jörg Rödel
2025-03-13 10:32       ` Ani Sinha
2025-03-13 10:59         ` Jörg Rödel
2025-03-13 11:09           ` Ani Sinha
2025-03-13 11:27             ` Jörg Rödel
2025-03-13 11:28               ` Jörg Rödel
2025-03-13 11:56               ` Ani Sinha
2025-03-13 14:53               ` Ani Sinha
2025-03-13 15:39                 ` Jörg Rödel
2025-03-13 16:30                   ` Alexander Graf
2025-03-13 17:38                     ` Jörg Rödel
2025-03-13 17:49                       ` Daniel P. Berrangé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).