TDISP enablement

linux-pci.vger.kernel.org archive mirror

* TDISP enablement
@ 2023-10-31 22:56 Alexey Kardashevskiy
  2023-10-31 23:40 ` Dionna Amalie Glaze
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2023-10-31 22:56 UTC (permalink / raw)
  To: linux-coco; +Cc: kvm, linux-pci

Hi everyone,

Here is a follow-up to Dan's community call that we had weeks ago.

Our (AMD) goal at the moment is to use TDISP to pass SRIOV VFs through to 
confidential VMs without trusting the HV, with IDE (encryption) and 
IOMMU (performance, compared to the current SWIOTLB) enabled. I am 
aware of other uses and vendors, and I spent hours unsuccessfully trying 
to generalize all this in a meaningful way.

The AMD SEV TIO verbs can be simplified as:

- device_connect - starts a CMA/SPDM session, returns measurements/certs, 
runs IDE_KM to program the keys;
- device_reclaim - undoes the connect;
- tdi_bind - transitions the TDI to TDISP's LOCKED and RUN states, 
generates the interface report;
- tdi_unbind - undoes the bind;
- tdi_info - reads measurements/certs/interface report;
- tdi_validate - unlocks the TDI's MMIO and IOMMU (or invalidates, 
depending on the parameters).

The first four are called by the host OS, the last two by the TVM 
("Trusted VM"). These are implemented in the AMD PSP (platform processor).
CMA/SPDM, IDE_KM and TDISP are in use.
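
To make the split between host and TVM concrete, the verb list can be written down as a small C table. This is only a sketch: the enum and helper names are invented for illustration and do not correspond to any existing kernel or PSP interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they only mirror the verb list in this mail. */
enum tio_verb {
	TIO_DEVICE_CONNECT,
	TIO_DEVICE_RECLAIM,
	TIO_TDI_BIND,
	TIO_TDI_UNBIND,
	TIO_TDI_INFO,
	TIO_TDI_VALIDATE,
	TIO_VERB_COUNT
};

enum tio_caller { TIO_CALLER_HOST, TIO_CALLER_TVM };

/* The first four verbs are issued by the host OS, the last two by the TVM. */
static enum tio_caller tio_verb_caller(enum tio_verb verb)
{
	assert(verb < TIO_VERB_COUNT);
	return verb <= TIO_TDI_UNBIND ? TIO_CALLER_HOST : TIO_CALLER_TVM;
}

/* connect and bind each have an "undo" verb; the others have none. */
static bool tio_verb_undo(enum tio_verb verb, enum tio_verb *undo)
{
	assert(undo);
	switch (verb) {
	case TIO_DEVICE_CONNECT:
		*undo = TIO_DEVICE_RECLAIM;
		return true;
	case TIO_TDI_BIND:
		*undo = TIO_TDI_UNBIND;
		return true;
	default:
		return false;
	}
}
```

For example, tio_verb_caller(TIO_TDI_VALIDATE) yields TIO_CALLER_TVM, and tio_verb_undo(TIO_TDI_BIND, &u) stores TIO_TDI_UNBIND in u.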

Now, my strawman code does this on the host (I simplified a bit):
- after PCI discovery but before probing: walk through all TDISP-capable 
(TEE-IO in PCIe caps) endpoint devices and call device_connect;
- when drivers probe - it is all set up and the device measurements are 
visible to the driver;
- when constructing a TVM, tdi_bind is called;

and then in the TVM:
- after PCI discovery but before probing: walk through all TDIs (which 
will have TEE IO bit set) and call tdi_info, verify the report, if ok - 
call tdi_validate;
- when drivers probe - it is all set up and the driver decides if/which 
DMA mode to use (SWIOTLB or direct), or panic().
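
The TVM-side flow above could look roughly like the following C sketch. Everything in it is hypothetical: the ops structure, the function names and the fake backend merely stand in for whatever interface the TSM ends up exposing, and the report check itself is left to a callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical DMA modes a guest driver could end up with. */
enum tdi_dma_mode { TDI_DMA_SWIOTLB, TDI_DMA_DIRECT };

/* Hypothetical hooks standing in for tdi_info, the report check, tdi_validate. */
struct tdi_ops {
	int  (*info)(void *tdi);       /* read certs/measurements/report */
	bool (*report_ok)(void *tdi);  /* guest policy on the report */
	int  (*validate)(void *tdi);   /* unlock MMIO and IOMMU */
};

/*
 * Meant to run after PCI discovery and before drivers probe. The TDI is
 * validated only if the report was read and accepted; otherwise the
 * device stays on bounce buffers.
 */
static enum tdi_dma_mode tdi_guest_setup(const struct tdi_ops *ops, void *tdi)
{
	assert(ops && ops->info && ops->report_ok && ops->validate);

	if (ops->info(tdi) != 0 || !ops->report_ok(tdi))
		return TDI_DMA_SWIOTLB;
	if (ops->validate(tdi) != 0)
		return TDI_DMA_SWIOTLB;
	return TDI_DMA_DIRECT;
}

/* Fake backend, only there to exercise the flow without real hardware. */
struct fake_tdi {
	bool report_good;
	bool unlocked;
};

static int fake_info(void *tdi)
{
	(void)tdi;
	return 0;
}

static bool fake_report_ok(void *tdi)
{
	return ((struct fake_tdi *)tdi)->report_good;
}

static int fake_validate(void *tdi)
{
	((struct fake_tdi *)tdi)->unlocked = true;
	return 0;
}

static const struct tdi_ops fake_ops = {
	.info = fake_info,
	.report_ok = fake_report_ok,
	.validate = fake_validate,
};
```

The point of the ordering is that tdi_validate is never reached when the report is rejected. Falling back to SWIOTLB is one possible choice in this sketch; a driver could just as well refuse to bind or panic().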

Uff. Too long already. Sorry. Now, on to the problems:

If the user wants only CMA/SPDM, Lukas's patches will do that 
without the PSP. This may co-exist with the AMD PSP (if the endpoint 
allows multiple sessions).

If the user wants only IDE, the AMD PSP's device_connect needs to be 
called and the host OS does not get to know the IDE keys. Other vendors 
allow programming IDE keys into the RC on bare metal, and this may also 
co-exist with a TSM running outside of Linux - the host still manages 
traffic classes and streams.

If the user wants TDISP for VMs, this assumes the user does not trust 
the host OS and therefore the TSM (which is trusted) has to do CMA/SPDM 
and IDE.

The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE 
seem capable of co-existing, TDISP does not.

However there are common bits.
- certificates/measurements/reports blobs: storing, presenting to the 
userspace (results of device_connect and tdi_bind);
- place where we want to authenticate the device and enable IDE 
(device_connect);
- place where we want to bind TDI to a TVM (tdi_bind).

I've tried to address this with my (poorly named) 
drivers/pci/pcie/tdisp.ko and a hack for VFIO PCI device to call tdi_bind.

The next steps:
- expose blobs via configfs (like Dan did with configfs-tsm);
- s/tdisp.ko/coco.ko/;
- ask the audience - what is missing to make it reusable for other 
vendors and uses?

Thanks,
-- 
Alexey

* Re: TDISP enablement
  2023-10-31 22:56 TDISP enablement Alexey Kardashevskiy
@ 2023-10-31 23:40 ` Dionna Amalie Glaze
  2023-11-01  7:38   ` Lukas Wunner
  2023-11-01  7:27 ` Lukas Wunner
  2023-11-13  5:43 ` Samuel Ortiz
  2 siblings, 1 reply; 20+ messages in thread
From: Dionna Amalie Glaze @ 2023-10-31 23:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linux-coco, kvm, linux-pci

On Tue, Oct 31, 2023 at 3:56 PM Alexey Kardashevskiy <aik@amd.com> wrote:
>
> Hi everyone,
>
Hi Alexey.

> Here is followup after the Dan's community call we had weeks ago.
>
> Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
> confidential VMs without trusting the HV and with enabled IDE
> (encryption) and IOMMU (performance, compared to current SWIOTLB). I am
> aware of other uses and vendors and I spend hours unsuccessfully trying
> to generalize all this in a meaningful way.
>
> The AMD SEV TIO verbs can be simplified as:
>
> - device_connect - starts CMA/SPDM session, returns measurements/certs,
> runs IDE_KM to program the keys;
> - device_reclaim - undo the connect;
> - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states,
> generates interface report;
> - tdi_unbind - undo the bind;
> - tdi_info - read measurements/certs/interface report;

Only read? Can user space not provide a nonce for replay protection
here, or is that just inherent to the SPDM channel setup, and the
user's replay-protected acceptance of the booted code, including this
SPDM communication logic?
I'm not fully up to speed here.

> - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on
> the parameters).
>
> The first 4 called by the host OS, the last two by the TVM ("Trusted
> VM"). These are implemented in the AMD PSP (platform processor).
> CMA/SPDM, IDE_KM and TDISP are in use.
>
> Now, my strawman code does this on the host (I simplified a bit):
> - after PCI discovery but before probing: walk through all TDISP-capable
> (TEE-IO in PCIe caps) endpoint devices and call device_connect;
> - when drivers probe - it is all set up and the device measurements are
> visible to the driver;
> - when constructing a TVM, tdi_bind is called;
>
> and then in the TVM:
> - after PCI discovery but before probing: walk through all TDIs (which
> will have TEE IO bit set) and call tdi_info, verify the report, if ok -
> call tdi_validate;
> - when drivers probe - it is all set up and the driver decides if/which
> DMA mode to use (SWIOTLB or direct), or panic().
>
>
> Uff. Too long already. Sorry. Now, go to the problems:
>
> If the user wants only CMA/SPDM, Lukas's patches will do that
> without the PSP. This may co-exist with the AMD PSP (if the endpoint
> allows multiple sessions).
>
> If the user wants only IDE, the AMD PSP's device_connect needs to be
> called and the host OS does not get to know the IDE keys. Other vendors
> allow programming IDE keys to the RC on the baremetal, and this also may
> co-exist with a TSM running outside of Linux - the host still manages
> traffic classes and streams.
>
> If the user wants TDISP for VMs, this assumes the user does not trust
> the host OS and therefore the TSM (which is trusted) has to do CMA/SPDM
> and IDE.
>
> The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
> seem capable of co-existing, TDISP does not.
>
> However there are common bits.
> - certificates/measurements/reports blobs: storing, presenting to the
> userspace (results of device_connect and tdi_bind);
> - place where we want to authenticate the device and enable IDE
> (device_connect);
> - place where we want to bind TDI to a TVM (tdi_bind).
>
> I've tried to address this with my (poorly named)
> drivers/pci/pcie/tdisp.ko and a hack for VFIO PCI device to call tdi_bind.
>
> The next steps:
> - expose blobs via configfs (like Dan did configfs-tsm);

I think that the blob interface should be reworked, as Sean and I have
commented on for the SEV-SNP host patch series.
For example, the amount of memory needed for the blob should be
configurable by the host through a proposed size.
These vendored certificates will only grow in size, and they're
device-specific, so it makes sense for machines to have a local cache
of all the provisioned certificates that get forwarded to the guest
through the VMM. I'd like to see this kind of blob reporting as a more
general mechanism, however, so we can get TDX-specific blobs in too
without much fuss.

I'm not _fully_ opposed to ditching this blob idea and just requiring
the guest to contact a RIM service for all these certs, but generally
I think that's a bit of an objectionable barrier to entry. And more
work I'll need to do, lol. Probably still will have to eventually for
short-lived claims.
We then have another API to try to standardize that IETF RATS doesn't
try to touch.

All that aside, it doesn't seem like tsm/report is going to be the
right place to get attestations from various devices. It's only
designed for attesting the CPU. Would the idea be to have a new WO
attribute that is some kind of TDI id, and that multiplexes out to the
different TDI?

> - s/tdisp.ko/coco.ko/;
> - ask the audience - what is missing to make it reusable for other
> vendors and uses?
>
> Thanks,
> --
> Alexey
>
>
>


--
-Dionna Glaze, PhD (she/her)

* Re: TDISP enablement
  2023-10-31 22:56 TDISP enablement Alexey Kardashevskiy
  2023-10-31 23:40 ` Dionna Amalie Glaze
@ 2023-11-01  7:27 ` Lukas Wunner
  2023-11-01 11:05   ` Jonathan Cameron
  2023-11-01 11:43   ` Alexey Kardashevskiy
  2023-11-13  5:43 ` Samuel Ortiz
  2 siblings, 2 replies; 20+ messages in thread
From: Lukas Wunner @ 2023-11-01  7:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linux-coco, kvm, linux-pci, Dan Williams, Jonathan Cameron

On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> - device_connect - starts CMA/SPDM session, returns measurements/certs,
> runs IDE_KM to program the keys;

Does the PSP have a set of trusted root certificates?
If so, where does it get them from?

If not, does the PSP just blindly trust the validity of the cert chain?
Who validates the cert chain, and when?
Which slot do you use?
Do you return only the cert chain of that single slot or of all slots?
Does the PSP read out all measurements available?  This may take a while
if the measurements are large and there are a lot of them.

> - tdi_info - read measurements/certs/interface report;

Does this return cached cert chains and measurements from the device
or does it retrieve them anew?  (Measurements might have changed if
MEAS_FRESH_CAP is supported.)

> If the user wants only CMA/SPDM, Lukas's patches will do that without
> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> sessions).

It can co-exist if the pci_cma_claim_ownership() library call
provided by patch 12/12 is invoked upon device_connect.

It would seem advantageous if you could delay device_connect
until a device is actually passed through.  Then the OS can
initially authenticate and measure devices and the PSP takes
over when needed.

> If the user wants only IDE, the AMD PSP's device_connect needs to be called
> and the host OS does not get to know the IDE keys. Other vendors allow
> programming IDE keys to the RC on the baremetal, and this also may co-exist
> with a TSM running outside of Linux - the host still manages traffic classes
> and streams.

I'm wondering if your implementation is spec compliant:

PCIe r6.1 sec 6.33.3 says that "It is permitted for a Root Complex
to [...] use implementation specific key management."  But "For
Endpoint Functions, [...] Function 0 must implement [...]
the IDE key management (IDE_KM) protocol as a Responder."

So the keys need to be programmed into the endpoint using IDE_KM
but for the Root Port it's permitted to use implementation-specific
means.

The keys for the endpoint and Root Port are the same because this
is symmetric encryption.

If the keys are internal to the PSP, the kernel can't program the
keys into the endpoint using IDE_KM.  So your implementation precludes
IDE setup by the host OS kernel.

device_connect is meant to be used for TDISP, i.e. with devices which
have the TEE-IO Supported bit set in the Device Capabilities Register.

What are you going to do with IDE-capable devices which have that bit
cleared?  Are they unsupported by your implementation?

It seems to me an architecture cannot claim IDE compliance if it's
limited to TEE-IO capable devices, which might only be a subset of
the available products.

> The next steps:
> - expose blobs via configfs (like Dan did configfs-tsm);
> - s/tdisp.ko/coco.ko/;
> - ask the audience - what is missing to make it reusable for other vendors
> and uses?

I intend to expose measurements in sysfs in a measurements/ directory
below each CMA-capable device's directory.  There are products coming
to the market which support only CMA and are not interested in IDE or
TDISP.  When bringing up TDISP, measurements received as part of an
interface report must be exposed in the same way so that user space
tooling which evaluates the measurements works both with TEE-IO capable
and incapable products.  This could be achieved by fetching measurements
from the interface report instead of via SPDM when TDISP is in use.

Thanks,

Lukas

* Re: TDISP enablement
  2023-10-31 23:40 ` Dionna Amalie Glaze
@ 2023-11-01  7:38   ` Lukas Wunner
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Wunner @ 2023-11-01  7:38 UTC (permalink / raw)
  To: Dionna Amalie Glaze
  Cc: Alexey Kardashevskiy, linux-coco, kvm, linux-pci, Dan Williams,
	Jonathan Cameron

On Tue, Oct 31, 2023 at 04:40:56PM -0700, Dionna Amalie Glaze wrote:
> Only read? Can user space not provide a nonce for replay protection
> here, or is that just inherent to the SPDM channel setup, and the

That's internal to SPDM, regardless of whether SPDM is handled by the
TSM or the OS kernel.


> These vendored certificates will only grow in size, and they're

The size of a cert chain is limited to 64 kByte by the SPDM spec.

A device may have 8 slots, each containing a cert chain.
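
For a sense of scale, those two limits give a fixed worst-case bound per device. A small C sketch, with made-up macro and function names and treating "64 kByte" as 64 KiB:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPDM_CERT_SLOTS       8u             /* slots per device, as stated above */
#define SPDM_MAX_CHAIN_BYTES  (64u * 1024u)  /* "64 kByte", taken as 64 KiB */

/* Upper bound, in bytes, for caching every cert chain of n_devices devices. */
static size_t cert_cache_bound(size_t n_devices)
{
	const size_t per_device = (size_t)SPDM_CERT_SLOTS * SPDM_MAX_CHAIN_BYTES;

	assert(n_devices <= SIZE_MAX / per_device);  /* guard against overflow */
	return n_devices * per_device;
}
```

One device therefore needs at most 512 KiB for its cert chains. Measurements are not covered by this bound.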


> device-specific, so it makes sense for machines to have a local cache
> of all the provisioned certificates that get forwarded to the guest
> through the VMM. I'd like to see this kind of blob reporting as a more
> general mechanism, however, so we can get TDX-specific blobs in too
> without much fuss.

Cert chains and measurements from the interface report need to be
exposed as individual sysfs attributes for compatibility with
TEE-IO incapable devices.

Blobs make zero sense here.  Doubly so if they're vendor-specific.

Thanks,

Lukas

* Re: TDISP enablement
  2023-11-01  7:27 ` Lukas Wunner
@ 2023-11-01 11:05   ` Jonathan Cameron
  2023-11-02  2:28     ` Alexey Kardashevskiy
                       ` (2 more replies)
  2023-11-01 11:43   ` Alexey Kardashevskiy
  1 sibling, 3 replies; 20+ messages in thread
From: Jonathan Cameron @ 2023-11-01 11:05 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Alexey Kardashevskiy, linux-coco, kvm, linux-pci, Dan Williams,
	Jonathan Cameron, suzuki.poulose

On Wed, 1 Nov 2023 08:27:17 +0100
Lukas Wunner <lukas@wunner.de> wrote:

Thanks Alexey, this is a great discussion to kick off.

> On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> > - device_connect - starts CMA/SPDM session, returns measurements/certs,
> > runs IDE_KM to program the keys;  
> 
> Does the PSP have a set of trusted root certificates?
> If so, where does it get them from?
> 
> If not, does the PSP just blindly trust the validity of the cert chain?
> Who validates the cert chain, and when?
> Which slot do you use?
> Do you return only the cert chain of that single slot or of all slots?
> Does the PSP read out all measurements available?  This may take a while
> if the measurements are large and there are a lot of them.

I'd definitely like there to be a path for certs and measurements to be
checked by the Host OS (for the non TDISP path). Whether the
policy setup cares about the result is a different question ;)

> 
> 
> > - tdi_info - read measurements/certs/interface report;  
> 
> Does this return cached cert chains and measurements from the device
> or does it retrieve them anew?  (Measurements might have changed if
> MEAS_FRESH_CAP is supported.)
> 
> 
> > If the user wants only CMA/SPDM, Lukas's patches will do that without
> > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > sessions).  
> 
> It can co-exist if the pci_cma_claim_ownership() library call
> provided by patch 12/12 is invoked upon device_connect.
> 
> It would seem advantageous if you could delay device_connect
> until a device is actually passed through.  Then the OS can
> initially authenticate and measure devices and the PSP takes
> over when needed.

Would that delay mean IDE isn't up? I think that wants to be
available whether or not pass through is going on.

Given potential restrictions on IDE resources, I'd expect to see an explicit
opt in from userspace on the host to start that process for a given
device.  (udev rule or similar might kick it off for simple setups).

Would that work for the flows described?  

Next bit probably has holes...  Key is that a lot of the checks
may fail, and it's up to host userspace policy to decide whether
to proceed (other policy in the secure VM side of things obviously)

So my rough thinking is - for the two options (IDE / TDISP)

Comparing with Alexey's flow, I think the only real difference is that
I call out explicit host userspace policy controls. I'd also like
to use similar interfaces to convey state to host userspace as
per Lukas' existing approaches.  Sure, there will also be in-kernel
interfaces for a driver to get data if it knows what to do
with it.  I'd also like to enable the non-TDISP flow to handle
IDE setup 'natively' if that's possible on particular hardware.

1. Host has a go at CMA/SPDM. Policy might say that a failure here is
   a failure in general so reject device - or it might decide it's up to
   the PSP etc.   (userspace can see if it succeeded)
   I'd argue host software can launch this at any time.  It will
   be a denial of service attack but so are many other things the host
   can do.
2. TDISP policy decision from host (userspace policy control)
   Need to know end goal.
3. IDE opt in from userspace.  Policy decision.
  - If not TDISP 
    - device_connect(IDE ONLY) - bunch of proxying in host OS.
    - Cert chain and measurements presented to host, host can then check if
      it is happy and expose for next policy decision.
    - Hooks exposed for host to request more measurements, key refresh etc.
      Idea being that the flow is host driven with PSP providing required
      services.  If host can just do setup directly that's fine too.
  - If TDISP (technically you can run TDISP from the host, but let's assume
    for now no one wants to do that? (yet)).
    - device_connect(TDISP) - bunch of proxying in host OS.
    - Cert chain and measurements presented to host, host can then check if
      it is happy and expose for next policy decision.

4. Flow after this depends on early or late binding (lockdown)
   but could load driver at this point.  Userspace policy.
   tdi-bind etc.
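
One way to read steps 1 to 3 is as a small decision function driven by host userspace policy. The C sketch below is only an interpretation of that flow; the structure, the field names and the action names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs that host userspace would set per device. */
struct host_policy {
	bool cma_failure_is_fatal;  /* step 1: reject the device if host CMA fails */
	bool want_tdisp;            /* step 2: is TDISP the end goal? */
	bool ide_opt_in;            /* step 3: explicit opt-in to use IDE resources */
};

enum host_action {
	HOST_REJECT_DEVICE,
	HOST_NO_CONNECT,        /* leave the device as it is */
	HOST_CONNECT_IDE_ONLY,  /* device_connect(IDE ONLY) */
	HOST_CONNECT_TDISP      /* device_connect(TDISP) */
};

/* host_cma_ok is the outcome of the host's own CMA/SPDM attempt (step 1). */
static enum host_action host_decide(const struct host_policy *p, bool host_cma_ok)
{
	assert(p);

	if (!host_cma_ok && p->cma_failure_is_fatal)
		return HOST_REJECT_DEVICE;
	if (!p->ide_opt_in)
		return HOST_NO_CONNECT;
	return p->want_tdisp ? HOST_CONNECT_TDISP : HOST_CONNECT_IDE_ONLY;
}
```

Step 4 (early or late binding, driver load, tdi_bind) would follow whichever connect action was chosen and is not modelled here.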


> 
> 
> > If the user wants only IDE, the AMD PSP's device_connect needs to be called
> > and the host OS does not get to know the IDE keys. Other vendors allow
> > programming IDE keys to the RC on the baremetal, and this also may co-exist
> > with a TSM running outside of Linux - the host still manages traffic classes
> > and streams.  
> 
> I'm wondering if your implementation is spec compliant:
> 
> PCIe r6.1 sec 6.33.3 says that "It is permitted for a Root Complex
> to [...] use implementation specific key management."  But "For
> Endpoint Functions, [...] Function 0 must implement [...]
> the IDE key management (IDE_KM) protocol as a Responder."
> 
> So the keys need to be programmed into the endpoint using IDE_KM
> but for the Root Port it's permitted to use implementation-specific
> means.
> 
> The keys for the endpoint and Root Port are the same because this
> is symmetric encryption.
> 
> If the keys are internal to the PSP, the kernel can't program the
> keys into the endpoint using IDE_KM.  So your implementation precludes
> IDE setup by the host OS kernel.

Proxy the CMA messages through the host OS. Doesn't mean host has
visibility of the keys or certs.  So indeed, the actual setup isn't being done
by the host kernel, but rather by it requesting the 'blob' to send
to the CMA DOE from PSP.

By my reading that's a bit inelegant but I don't see it being a break
with the specification.

> 
> device_connect is meant to be used for TDISP, i.e. with devices which
> have the TEE-IO Supported bit set in the Device Capabilities Register.
> 
> What are you going to do with IDE-capable devices which have that bit
> cleared?  Are they unsupported by your implementation?
> 
> It seems to me an architecture cannot claim IDE compliance if it's
> limited to TEE-IO capable devices, which might only be a subset of
> the available products.

Agreed.  If we can request that the PSP does a non-TDISP IDE setup, then
I think we are fine.  If not, then indeed the use cases are limited and,
meh, it might be a spec compliance issue, but I suspect not, as
TDISP has a note at the top that says:

"Although it is permitted (and generally expected) that TDIs will
be implemented such that they can be assigned to Legacy VMs, such
use is not the focus of TDISP."

Which rather implies that devices that don't support other usecases
are allowed.

> 
> 
> > The next steps:
> > - expose blobs via configfs (like Dan did configfs-tsm);
> > - s/tdisp.ko/coco.ko/;
> > - ask the audience - what is missing to make it reusable for other vendors
> > and uses?  
> 
> I intend to expose measurements in sysfs in a measurements/ directory
> below each CMA-capable device's directory.  There are products coming
> to the market which support only CMA and are not interested in IDE or
> TDISP.  When bringing up TDISP, measurements received as part of an
> interface report must be exposed in the same way so that user space
> tooling which evaluates the measurements works both with TEE-IO capable
> and incapable products.  This could be achieved by fetching measurements
> from the interface report instead of via SPDM when TDISP is in use.

Absolutely agree on this and superficially it feels like this should not
be hard to hook up.

There will also be paths where a driver wants to see the measurement report
but that should also be easy enough to enable.

Jonathan
> 
> Thanks,
> 
> Lukas
> 


* Re: TDISP enablement
  2023-11-01  7:27 ` Lukas Wunner
  2023-11-01 11:05   ` Jonathan Cameron
@ 2023-11-01 11:43   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2023-11-01 11:43 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-coco, kvm, linux-pci, Dan Williams, Jonathan Cameron


On 1/11/23 18:27, Lukas Wunner wrote:
> On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
>> - device_connect - starts CMA/SPDM session, returns measurements/certs,
>> runs IDE_KM to program the keys;
> 
> Does the PSP have a set of trusted root certificates?
> If so, where does it get them from?
> 
> If not, does the PSP just blindly trust the validity of the cert chain?

The PSP does trust, or "does not care".

> Who validates the cert chain, and when?

The guest validates, before enabling MMIO/IOMMU.

> Which slot do you use?

The slot number is passed to the PSP at the device setup in the PSP 
("device_connect").

> Do you return only the cert chain of that single slot or of all slots?
> Does the PSP read out all measurements available? 

All or a digest (hash).

> This may take a while
> if the measurements are large and there are a lot of them.

Hm. Maybe. The PSP can return either all measurements or just a digest. 
The host is supposed to cache it.

> 
>> - tdi_info - read measurements/certs/interface report;
> 
> Does this return cached cert chains and measurements from the device
> or does it retrieve them anew?  (Measurements might have changed if
> MEAS_FRESH_CAP is supported.)

It returns the digests and a flag saying if these are from before or 
after the device was TDISP-locked (tdi_bind).
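
As an illustration of how a guest might consume such a result, here is a C sketch. The structure layout, the 48-byte digest size and the rule "only accept digests taken after the lock" are assumptions made for the example, not a description of the actual PSP interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TDI_DIGEST_LEN 48  /* assumption: a SHA-384 sized digest */

/* Hypothetical shape of what tdi_info hands back. */
struct tdi_info_result {
	uint8_t meas_digest[TDI_DIGEST_LEN];
	bool    taken_after_lock;  /* were the digests captured after tdi_bind? */
};

/*
 * Example guest policy: the digest must have been taken after the TDI was
 * locked and must match the reference value the guest expects.
 */
static bool tdi_digest_acceptable(const struct tdi_info_result *res,
				  const uint8_t *expected)
{
	assert(res && expected);

	if (!res->taken_after_lock)
		return false;
	return memcmp(res->meas_digest, expected, TDI_DIGEST_LEN) == 0;
}
```

A stricter or looser policy is equally possible; the flag simply lets the guest tell stale pre-lock digests apart from post-lock ones.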


>> If the user wants only CMA/SPDM, Lukas's patches will do that without
>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
>> sessions).
> 
> It can co-exist if the pci_cma_claim_ownership() library call
> provided by patch 12/12 is invoked upon device_connect.
> 
> It would seem advantageous if you could delay device_connect
> until a device is actually passed through. 

It is not exactly a whole device which is passed through but likely just 
a VF, just to clarify.

> Then the OS can
> initially authenticate and measure devices and the PSP takes
> over when needed.

The PSP is going to redo all this anyway so at least in my case it is 
just unwanted duplication. Although I am still not sure if 2 SPDM 
sessions cannot co-exist (not that I want that in particular though).


>> If the user wants only IDE, the AMD PSP's device_connect needs to be called
>> and the host OS does not get to know the IDE keys. Other vendors allow
>> programming IDE keys to the RC on the baremetal, and this also may co-exist
>> with a TSM running outside of Linux - the host still manages traffic classes
>> and streams.
> 
> I'm wondering if your implementation is spec compliant:
> 
> PCIe r6.1 sec 6.33.3 says that "It is permitted for a Root Complex
> to [...] use implementation specific key management."  But "For
> Endpoint Functions, [...] Function 0 must implement [...]
> the IDE key management (IDE_KM) protocol as a Responder."
> 
> So the keys need to be programmed into the endpoint using IDE_KM
> but for the Root Port it's permitted to use implementation-specific
> means.

Correct.
> The keys for the endpoint and Root Port are the same because this
> is symmetric encryption.
> 
> If the keys are internal to the PSP, the kernel can't program the
> keys into the endpoint using IDE_KM.  So your implementation precludes
> IDE setup by the host OS kernel.

Correct.

> device_connect is meant to be used for TDISP, i.e. with devices which
> have the TEE-IO Supported bit set in the Device Capabilities Register.
> 
> What are you going to do with IDE-capable devices which have that bit
> cleared?  Are they unsupported by your implementation?

It should be possible to call just "device_connect" to have IDE set up.

> It seems to me an architecture cannot claim IDE compliance if it's
> limited to TEE-IO capable devices, which might only be a subset of
> the available products.
> 
>> The next steps:
>> - expose blobs via configfs (like Dan did configfs-tsm);
>> - s/tdisp.ko/coco.ko/;
>> - ask the audience - what is missing to make it reusable for other vendors
>> and uses?
> 
> I intend to expose measurements in sysfs in a measurements/ directory
> below each CMA-capable device's directory.  There are products coming
> to the market which support only CMA and are not interested in IDE or
> TDISP.  When bringing up TDISP, measurements received as part of an
> interface report must be exposed in the same way so that user space
> tooling which evaluates the measurements works both with TEE-IO capable
> and incapable products.  This could be achieved by fetching measurements
> from the interface report instead of via SPDM when TDISP is in use.

Out of curiosity - sysfs, not configfs?

> 
> Thanks,
> 
> Lukas

-- 
Alexey



* Re: TDISP enablement
  2023-11-01 11:05   ` Jonathan Cameron
@ 2023-11-02  2:28     ` Alexey Kardashevskiy
  2023-11-03 16:44       ` Jonathan Cameron
  2023-11-10 23:38       ` Dan Williams
  2023-11-10 23:30     ` Dan Williams
  2023-11-13  6:04     ` Samuel Ortiz
  2 siblings, 2 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2023-11-02  2:28 UTC (permalink / raw)
  To: Jonathan Cameron, Lukas Wunner
  Cc: linux-coco, kvm, linux-pci, Dan Williams, Jonathan Cameron,
	suzuki.poulose


On 1/11/23 22:05, Jonathan Cameron wrote:
> On Wed, 1 Nov 2023 08:27:17 +0100
> Lukas Wunner <lukas@wunner.de> wrote:
> 
> Thanks Alexey, this is a great discussion to kick off.
> 
>> On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
>>> - device_connect - starts CMA/SPDM session, returns measurements/certs,
>>> runs IDE_KM to program the keys;
>>
>> Does the PSP have a set of trusted root certificates?
>> If so, where does it get them from?
>>
>> If not, does the PSP just blindly trust the validity of the cert chain?
>> Who validates the cert chain, and when?
>> Which slot do you use?
>> Do you return only the cert chain of that single slot or of all slots?
>> Does the PSP read out all measurements available?  This may take a while
>> if the measurements are large and there are a lot of them.
> 
> I'd definitely like there to be a path for certs and measurements to be
> checked by the Host OS (for the non TDISP path). Whether the
> policy setup cares about the result is a different question ;)

Yup, the PSP returns these to the host OS anyway. That is one of the 
reasons why I wanted the same module in both host and guest for exposing 
these certs/measurements to userspace.

>>
>>> - tdi_info - read measurements/certs/interface report;
>>
>> Does this return cached cert chains and measurements from the device
>> or does it retrieve them anew?  (Measurements might have changed if
>> MEAS_FRESH_CAP is supported.)
>>
>>
>>> If the user wants only CMA/SPDM, Lukas's patches will do that without
>>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
>>> sessions).
>>
>> It can co-exist if the pci_cma_claim_ownership() library call
>> provided by patch 12/12 is invoked upon device_connect.
>>
>> It would seem advantageous if you could delay device_connect
>> until a device is actually passed through.  Then the OS can
>> initially authenticate and measure devices and the PSP takes
>> over when needed.
> 
> Would that delay mean IDE isn't up - I think that wants to be
> available whether or not pass through is going on.
> 
> Given potential restrictions on IDE resources, I'd expect to see an explicit
> opt in from userspace on the host to start that process for a given
> device.  (udev rule or similar might kick it off for simple setups).
> 
> Would that work for the flows described?

This would work, but my (likely wrong) intention was also to run the 
necessary setup in both host and guest at the same time, before drivers 
probe devices. And while delaying it in the host is fine (well, for us 
at AMD, as we are aiming for CoCo/TDISP), in the guest this means less 
flexibility in enlightening the PCI subsystem and the guest driver: 
ideally (or at least initially) the driver is supposed to probe an already 
enabled and verified device, as otherwise it has to do SWIOTLB until 
userspace does the verification and kicks the driver to switch to proper 
direct DMA (or reloads the driver?).

> Next bit probably has holes...  Key is that a lot of the checks
> may fail, and it's up to host userspace policy to decide whether
> to proceed (other policy in the secure VM side of things obviously)
> 
> So my rough thinking is - for the two options (IDE / TDISP)
> 
> Comparing with Alexey's flow I think only real difference is that
> I call out explicit host userspace policy controls. I'd also like

My imagination fails me :) What is the host supposed to do if the device 
verification fails/succeeds later, and how much later, and what if the 
device is a boot disk? Or is this userspace going to be limited to 
initramdisk? What is it that we are protecting against? Or is it for CUDA 
and such (which, yeah, can wait)?

> to use similar interfaces to convey state to host userspace as
> per Lukas' existing approaches.  Sure there will also be in
> kernel interfaces for driver to get data if it knows what to do
> with it.  I'd also like to enable the non tdisp flow to handle
> IDE setup 'natively' if that's possible on particular hardware.
> 
> 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
>     a failure in general so reject device - or it might decide it's up to
>     the PSP etc.   (userspace can see if it succeeded)
>     I'd argue host software can launch this at any time.  It will
>     be a denial of service attack but so are many other things the host
>     can do.

Trying to visualize it in my head - policy is a kernel cmdline or module 
parameter?

> 2. TDISP policy decision from host (userspace policy control)
>     Need to know end goal.

/sys/bus/pci/devices/0000:11:22.3/tdisp ?

> 3. IDE opt in from userspace.  Policy decision.
>    - If not TDISP
>      - device_connect(IDE ONLY) - bunch of proxying in host OS.
>      - Cert chain and measurements presented to host, host can then check if
>        it is happy and expose for next policy decision.
>      - Hooks exposed for host to request more measurements, key refresh etc.
>        Idea being that the flow is host driven with PSP providing required
>        services.  If host can just do setup directly that's fine too.

I'd expect the user to want IDE on from the very beginning, why wait to 
turn it on later? The question is rather if the user wants to panic() or 
warn() or block the device if IDE setup failed.

>    - If TDISP (technically you can run tdisp from host, but lets assume
>      for now no one wants to do that? (yet)).
>      - device_connect(TDISP) - bunch of proxying in host OS.
>      - Cert chain and measurements presented to host, host can then check if
>        it is happy and expose for next policy decision.

On AMD SEV TIO the TDISP setup happens in "tdi_bind" when the device is 
about to be passed through which is when QEMU (==userspace) starts.

> 
> 4. Flow after this depends on early or late binding (lockdown)
>     but could load driver at this point.  Userspace policy.
>     tdi-bind etc.

Not sure I follow this. A host or guest driver?


>>
>>> If the user wants only IDE, the AMD PSP's device_connect needs to be called
>>> and the host OS does not get to know the IDE keys. Other vendors allow
>>> programming IDE keys to the RC on the baremetal, and this also may co-exist
>>> with a TSM running outside of Linux - the host still manages traffic classes
>>> and streams.
>>
>> I'm wondering if your implementation is spec compliant:
>>
>> PCIe r6.1 sec 6.33.3 says that "It is permitted for a Root Complex
>> to [...] use implementation specific key management."  But "For
>> Endpoint Functions, [...] Function 0 must implement [...]
>> the IDE key management (IDE_KM) protocol as a Responder."
>>
>> So the keys need to be programmed into the endpoint using IDE_KM
>> but for the Root Port it's permitted to use implementation-specific
>> means.
>>
>> The keys for the endpoint and Root Port are the same because this
>> is symmetric encryption.
>>
>> If the keys are internal to the PSP, the kernel can't program the
>> keys into the endpoint using IDE_KM.  So your implementation precludes
>> IDE setup by the host OS kernel.
> 
> Proxy the CMA messages through the host OS. Doesn't mean host has
> visibility of the keys or certs.  So indeed, the actual setup isn't being done
> by the host kernel, but rather by it requesting the 'blob' to send
> to the CMA DOE from PSP.
> 
> By my reading that's a bit inelegant but I don't see it being a break
> with the specification.
> 
>>
>> device_connect is meant to be used for TDISP, i.e. with devices which
>> have the TEE-IO Supported bit set in the Device Capabilities Register.
>>
>> What are you going to do with IDE-capable devices which have that bit
>> cleared?  Are they unsupported by your implementation?
>>
>> It seems to me an architecture cannot claim IDE compliance if it's
>> limited to TEE-IO capable devices, which might only be a subset of
>> the available products.
> 
> Agreed.  If can request the PSP does a non TDISP IDE setup then
> I think we are fine.  If not then indeed usecases are limited and
> meh, it might be a spec compliance issue but I suspect not as
> TDISP has a note at the top that says:
> 
> "Although it is permitted (and generally expected) that TDIs will
> be implemented such that they can be assigned to Legacy VMs, such
> use is not the focus of TDISP."
> 
> Which rather implies that devices that don't support other usecases
> are allowed.
> 
>>
>>
>>> The next steps:
>>> - expose blobs via configfs (like Dan did configfs-tsm);
>>> - s/tdisp.ko/coco.ko/;
>>> - ask the audience - what is missing to make it reusable for other vendors
>>> and uses?
>>
>> I intend to expose measurements in sysfs in a measurements/ directory
>> below each CMA-capable device's directory.  There are products coming
>> to the market which support only CMA and are not interested in IDE or
>> TDISP.  When bringing up TDISP, measurements received as part of an
>> interface report must be exposed in the same way so that user space
>> tooling which evaluates the measurement works both with TEE-IO capable
>> and incapable products.  This could be achieved by fetching measurements
>> from the interface report instead of via SPDM when TDISP is in use.
> 
> Absolutely agree on this and superficially it feels like this should not
> be hard to hook up.

True. sysfs it is then. Thanks,

> 
> There will also be paths where a driver wants to see the measurement report
> but that should also be easy enough to enable.
> 
> Jonathan
>>
>> Thanks,
>>
>> Lukas
>>
> 

-- 
Alexey




* Re: TDISP enablement
  2023-11-02  2:28     ` Alexey Kardashevskiy
@ 2023-11-03 16:44       ` Jonathan Cameron
  2023-11-11 22:45         ` Dan Williams
  2023-11-10 23:38       ` Dan Williams
  1 sibling, 1 reply; 20+ messages in thread
From: Jonathan Cameron @ 2023-11-03 16:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Lukas Wunner, linux-coco, kvm, linux-pci, Dan Williams,
	Jonathan Cameron, suzuki.poulose

 
> >>> - tdi_info - read measurements/certs/interface report;  
> >>
> >> Does this return cached cert chains and measurements from the device
> >> or does it retrieve them anew?  (Measurements might have changed if
> >> MEAS_FRESH_CAP is supported.)
> >>
> >>  
> >>> If the user wants only CMA/SPDM, Lukas's patches will do that without
> >>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> >>> sessions).  
> >>
> >> It can co-exist if the pci_cma_claim_ownership() library call
> >> provided by patch 12/12 is invoked upon device_connect.
> >>
> >> It would seem advantageous if you could delay device_connect
> >> until a device is actually passed through.  Then the OS can
> >> initially authenticate and measure devices and the PSP takes
> >> over when needed.  
> > 
> > Would that delay mean IDE isn't up - I think that wants to be
> > available whether or not pass through is going on.
> > 
> > Given potential restrictions on IDE resources, I'd expect to see an explicit
> > opt in from userspace on the host to start that process for a given
> > device.  (udev rule or similar might kick it off for simple setups).
> > 
> > Would that work for the flows described?  
> 
> This would work but my (likely wrong) intention was also to run 
> necessary setup in both host and guest at the same time before drivers 
> probe devices. And while delaying it in the host is fine (well, for us 
> in AMD, as we are aiming for CoCo/TDISP), in the guest this means less 
> flexibility in enlightening the PCI subsystem and the guest driver: 
> ideally (or at least initially) the driver is supposed to probe already 
> enabled and verified device, as otherwise it has to do SWIOTLB until the 
> userspace does the verification and kicks the driver to go proper direct 
> DMA (or reload the driver?).

In the case of a guest getting a VF, there probably won't be any way for
the kernel to run any native attestation anyway, so policy would have to
rely on the CoCo paths. The kernel stuff Lukas has would just not try to
attest or claim anything about it. If a VF has a CMA capable DOE instance,
then that's not there for IDE stuff at all, but for the guest to get
direct measurements etc. without the PSP or anything else getting involved,
in which case the guest using that directly is a reasonable thing to do.

> 
> > Next bit probably has holes...  Key is that a lot of the checks
> > may fail, and it's up to host userspace policy to decide whether
> > to proceed (other policy in the secure VM side of things obviously)
> > 
> > So my rough thinking is - for the two options (IDE / TDISP)
> > 
> > Comparing with Alexey's flow I think only real difference is that
> > I call out explicit host userspace policy controls. I'd also like  
> 
> My imagination fails me :) What is the host supposed to do if the device 
> verification fails/succeeds later, and how much later, and the device is 
> a boot disk? Or is this userspace going to be limited to initramdisk? 
> What is that thing which we are protecting against? Or it is for CUDA 
> and such (which yeah, it can wait)?

There are a bunch of non obvious cases indeed.  Hence make it all policy.
Though if you have a flow where verification is needed for boot disk and
it fails (and policy says that's not acceptable) then bad luck you
probably need to squirt a cert into your ramdisk or UEFI or similar.

> 
> > to use similar interfaces to convey state to host userspace as
> > per Lukas' existing approaches.  Sure there will also be in
> > kernel interfaces for driver to get data if it knows what to do
> > with it.  I'd also like to enable the non tdisp flow to handle
> > IDE setup 'natively' if that's possible on particular hardware.
> > 
> > 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
> >     a failure in general so reject device - or it might decide it's up to
> >     the PSP etc.   (userspace can see if it succeeded)
> >     I'd argue host software can launch this at any time.  It will
> >     be a denial of service attack but so are many other things the host
> >     can do.  
> 
> Trying to visualize it in my head - policy is a kernel cmdline or module 
> parameter?

Neither - it's bind not happening until userspace decides to kick it off.
The module could provide its own policy on top of this - so userspace
could defer to that if it makes sense (so bind but rely on probe failing
if policy is not met).
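
A rough sketch of that layering, as a plain C model rather than real driver
core code. Everything here (dev_state, drv_policy, try_bind) is a made-up
name for illustration, and the choice of errno values is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; not an existing kernel interface. */
struct dev_state {
	bool authenticated;	/* CMA/SPDM succeeded */
	bool bind_requested;	/* userspace asked for the driver to bind */
};

struct drv_policy {
	bool require_auth;	/* the module's own policy on top */
};

/* Returns 0 when the driver binds, a negative errno otherwise. */
static int try_bind(const struct dev_state *dev, const struct drv_policy *pol)
{
	assert(dev && pol);

	/* Layer 1: nothing happens until userspace kicks it off. */
	if (!dev->bind_requested)
		return -EAGAIN;

	/* Layer 2: userspace deferred to the module, probe enforces policy. */
	if (pol->require_auth && !dev->authenticated)
		return -EPERM;

	return 0;
}
```

So userspace can either check the authentication result itself before
requesting the bind, or request it unconditionally and rely on the probe
failure.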

> 
> > 2. TDISP policy decision from host (userspace policy control)
> >     Need to know end goal.  
> 
> /sys/bus/pci/devices/0000:11:22.3/tdisp ?

Maybe - I'm sure we'll bikeshed anything like that :)

> 
> > 3. IDE opt in from userspace.  Policy decision.
> >    - If not TDISP
> >      - device_connect(IDE ONLY) - bunch of proxying in host OS.
> >      - Cert chain and measurements presented to host, host can then check if
> >        it is happy and expose for next policy decision.
> >      - Hooks exposed for host to request more measurements, key refresh etc.
> >        Idea being that the flow is host driven with PSP providing required
> >        services.  If host can just do setup directly that's fine too.  
> 
> I'd expect the user to want IDE on from the very beginning, why wait to 
> turn it on later? The question is rather if the user wants to panic() or 
> warn() or block the device if IDE setup failed.

There are some concerns about being able to support enough selective IDE streams.
Might turn out to be a false concern (I've not yet got visibility of enough
implementations to be able to tell).
Also (as I understand it as a software guy) IDE has a significant performance
and power cost (and for CXL at least there are various trade offs and options
you can enable depending on security model and device features).

There is "talk" of people turning IDE off if they can cope without it and only
enabling it for CoCo (and possibly selectively doing that as well).

> 
> >    - If TDISP (technically you can run tdisp from host, but lets assume
> >      for now no one wants to do that? (yet)).
> >      - device_connect(TDISP) - bunch of proxying in host OS.
> >      - Cert chain and measurements presented to host, host can then check if
> >        it is happy and expose for next policy decision.  
> 
> On AMD SEV TIO the TDISP setup happens in "tdi_bind" when the device is 
> about to be passed through which is when QEMU (==userspace) starts.
Ah. Ok.

> 
> > 
> > 4. Flow after this depends on early or late binding (lockdown)
> >     but could load driver at this point.  Userspace policy.
> >     tdi-bind etc.  
> 
> Not sure I follow this. A host or guest driver?

Hmm - I confess I'm confusing myself now.

At this stage we have just enough info to load a driver for the PF, because
the PF driver may have some configuration to do to reach the state we want
locked prior to VF assignment.

If all that goes well, the TDI can be moved to the locked state and assigned
to a TVM, which then has to decide to issue tdi_validate before binding
the guest driver (I assume that is the TDISP START_INTERFACE_REQUEST
bit of the state machine). Or is the guest driver ever needed before this
transition? (I see you called it out as not, but is it always a one time
thing on driver load or can that decision change without unbind/bind
of driver?)
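
For reference, the TDISP state machine being discussed can be modelled in a
few lines of C. This is a simplified sketch: the state and request names
follow TDISP, but the function tdi_handle() and its return convention are
assumptions of this example, device-detected transitions into ERROR are not
modelled, and how the SEV-TIO verbs map onto these requests is exactly the
open question above.

```c
#include <assert.h>

/* TDI states and requests as named by TDISP. */
enum tdi_state {
	TDI_CONFIG_UNLOCKED,
	TDI_CONFIG_LOCKED,
	TDI_RUN,
	TDI_ERROR,
};

enum tdi_req {
	TDI_LOCK_INTERFACE,	/* LOCK_INTERFACE_REQUEST */
	TDI_START_INTERFACE,	/* START_INTERFACE_REQUEST */
	TDI_STOP_INTERFACE,	/* STOP_INTERFACE_REQUEST */
};

/*
 * Simplified model: returns 0 and updates *s when the request is legal
 * in the current state, -1 (state untouched) otherwise.
 */
static int tdi_handle(enum tdi_state *s, enum tdi_req r)
{
	assert(s);

	switch (r) {
	case TDI_LOCK_INTERFACE:
		if (*s != TDI_CONFIG_UNLOCKED)
			return -1;
		*s = TDI_CONFIG_LOCKED;
		return 0;
	case TDI_START_INTERFACE:
		if (*s != TDI_CONFIG_LOCKED)
			return -1;
		*s = TDI_RUN;
		return 0;
	case TDI_STOP_INTERFACE:
		/* Stop is accepted from any state and unlocks the TDI. */
		*s = TDI_CONFIG_UNLOCKED;
		return 0;
	}
	return -1;
}
```

The point relevant here is that RUN is only reachable through
CONFIG_LOCKED, so any PF configuration has to be finished before the lock.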

I know this gets more complex for the PF pass through cases where the
driver needs to load and do some setup before you can lock down the device,
but do people have that requirement for VFs? If they do, it feels like
the device was designed wrong to me...

Too many specs (some of which provide too many ways you 'could' do it)
so I may well have a bunch of this wrong :(

Jonathan


* Re: TDISP enablement
  2023-11-01 11:05   ` Jonathan Cameron
  2023-11-02  2:28     ` Alexey Kardashevskiy
@ 2023-11-10 23:30     ` Dan Williams
  2023-11-24 16:25       ` Jonathan Cameron
  2023-11-13  6:04     ` Samuel Ortiz
  2 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2023-11-10 23:30 UTC (permalink / raw)
  To: Jonathan Cameron, Lukas Wunner
  Cc: Alexey Kardashevskiy, linux-coco, kvm, linux-pci, Dan Williams,
	Jonathan Cameron, suzuki.poulose

Jonathan Cameron wrote:
> On Wed, 1 Nov 2023 08:27:17 +0100
> Lukas Wunner <lukas@wunner.de> wrote:
> 
> Thanks Alexey, this is a great discussion to kick off.
> 
> > On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> > > - device_connect - starts CMA/SPDM session, returns measurements/certs,
> > > runs IDE_KM to program the keys;  
> > 
> > Does the PSP have a set of trusted root certificates?
> > If so, where does it get them from?
> > 
> > If not, does the PSP just blindly trust the validity of the cert chain?
> > Who validates the cert chain, and when?
> > Which slot do you use?
> > Do you return only the cert chain of that single slot or of all slots?
> > Does the PSP read out all measurements available?  This may take a while
> > if the measurements are large and there are a lot of them.
> 
> I'd definitely like there to be a path for certs and measurement to be
> checked by the Host OS (for the non TDISP path). Whether the
> policy setup cares about result is different question ;)
> 
> > 
> > 
> > > - tdi_info - read measurements/certs/interface report;  
> > 
> > Does this return cached cert chains and measurements from the device
> > or does it retrieve them anew?  (Measurements might have changed if
> > MEAS_FRESH_CAP is supported.)
> > 
> > 
> > > If the user wants only CMA/SPDM, Lukas's patches will do that without
> > > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > sessions).  
> > 
> > It can co-exist if the pci_cma_claim_ownership() library call
> > provided by patch 12/12 is invoked upon device_connect.
> > 
> > It would seem advantageous if you could delay device_connect
> > until a device is actually passed through.  Then the OS can
> > initially authenticate and measure devices and the PSP takes
> > over when needed.
> 
> Would that delay mean IDE isn't up - I think that wants to be
> available whether or not pass through is going on.
> 
> Given potential restrictions on IDE resources, I'd expect to see an explicit
> opt in from userspace on the host to start that process for a given
> device.  (udev rule or similar might kick it off for simple setups).
> 
> Would that work for the flows described?  
> 
> Next bit probably has holes...  Key is that a lot of the checks
> may fail, and it's up to host userspace policy to decide whether
> to proceed (other policy in the secure VM side of things obviously)
> 
> So my rough thinking is - for the two options (IDE / TDISP)
> 
> Comparing with Alexey's flow I think only real difference is that
> I call out explicit host userspace policy controls. I'd also like
> to use similar interfaces to convey state to host userspace as
> per Lukas' existing approaches.  Sure there will also be in
> kernel interfaces for driver to get data if it knows what to do
> with it.  I'd also like to enable the non tdisp flow to handle
> IDE setup 'natively' if that's possible on particular hardware.

Are there any platforms that have IDE host capability that are not also
shipping a TSM? I know that some platforms allow for either the TSM or
the OS to own that setup, but there are no standards there. I am not
opposed to the native path, but given that a cross-vendor "TSM" concept is
needed and that a TSM is likely available on all IDE capable platforms,
it seems reasonable for Linux to rely on TSM managed IDE for the near
term, if not the long term as well.

> 
> 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
>    a failure in general so reject device - or it might decide it's up to
>    the PSP etc.   (userspace can see if it succeeded)
>    I'd argue host software can launch this at any time.  It will
>    be a denial of service attack but so are many other things the host
>    can do.
> 2. TDISP policy decision from host (userspace policy control)
>    Need to know end goal.

If the TSM owns the TDISP state, what this policy decision really comes
down to is IDE stream resource management; I otherwise struggle to
conceptualize "TDISP policy".

The policy is userspace deciding to assign an interface to a TVM, and
that TVM requesting that the assigned interface be allowed to access
private memory. So it's not necessarily TDISP policy; it's whether the
assigned interface is allowed to transition to private operation.

> 3. IDE opt in from userspace.  Policy decision.
>   - If not TDISP 
>     - device_connect(IDE ONLY) - bunch of proxying in host OS.
>     - Cert chain and measurements presented to host, host can then check if
>       it is happy and expose for next policy decision.
>     - Hooks exposed for host to request more measurements, key refresh etc.
>       Idea being that the flow is host driven with PSP providing required
>       services.  If host can just do setup directly that's fine too.
>   - If TDISP (technically you can run tdisp from host, but lets assume
>     for now no one wants to do that? (yet)).
>     - device_connect(TDISP) - bunch of proxying in host OS.
>     - Cert chain and measurements presented to host, host can then check if
>       it is happy and expose for next policy decision.
> 
> 4. Flow after this depends on early or late binding (lockdown)
>    but could load driver at this point.  Userspace policy.
>    tdi-bind etc.

It is valid to load the driver and operate the device in shared mode, so
I am not sure that acceptance should gate driver loading. It also seems
like something that could be managed with module policy if someone
wanted to prevent shared operation before acceptance.

[..]
> > > The next steps:
> > > - expose blobs via configfs (like Dan did configfs-tsm);

I am missing the context here, but for measurements I think those are
better in sysfs. configfs was only to allow for multiple containers to grab
attestation reports; measurements are device local and containers can
all see the same measurements.

> > > - s/tdisp.ko/coco.ko/;

My bikeshed contribution, perhaps tsm.ko? I am still not someone who can
say "coco" for confidential computing with a straight face.

> > > - ask the audience - what is missing to make it reusable for other vendors
> > > and uses?  
> > 
> > I intend to expose measurements in sysfs in a measurements/ directory
> > below each CMA-capable device's directory.  There are products coming
> > to the market which support only CMA and are not interested in IDE or
> > > TDISP.  When bringing up TDISP, measurements received as part of an
> > > interface report must be exposed in the same way so that user space
> > > tooling which evaluates the measurement works both with TEE-IO capable
> > and incapable products.  This could be achieved by fetching measurements
> > from the interface report instead of via SPDM when TDISP is in use.
> 
> Absolutely agree on this and superficially it feels like this should not
> be hard to hook up.
> 
> There will also be paths where a driver wants to see the measurement report
> but that should also be easy enough to enable.

Agree.


* Re: TDISP enablement
  2023-11-02  2:28     ` Alexey Kardashevskiy
  2023-11-03 16:44       ` Jonathan Cameron
@ 2023-11-10 23:38       ` Dan Williams
  1 sibling, 0 replies; 20+ messages in thread
From: Dan Williams @ 2023-11-10 23:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jonathan Cameron, Lukas Wunner
  Cc: linux-coco, kvm, linux-pci, Dan Williams, Jonathan Cameron,
	suzuki.poulose

Alexey Kardashevskiy wrote:
[..]
> > Next bit probably has holes...  Key is that a lot of the checks
> > may fail, and it's up to host userspace policy to decide whether
> > to proceed (other policy in the secure VM side of things obviously)
> > 
> > So my rough thinking is - for the two options (IDE / TDISP)
> > 
> > Comparing with Alexey's flow I think only real difference is that
> > I call out explicit host userspace policy controls. I'd also like
> 
> My imagination fails me :) What is the host supposed to do if the device 
> verification fails/succeeds later, and how much later, and the device is 
> a boot disk? Or is this userspace going to be limited to initramdisk? 
> What is that thing which we are protecting against? Or it is for CUDA 
> and such (which yeah, it can wait)?
> 
> > to use similar interfaces to convey state to host userspace as
> > per Lukas' existing approaches.  Sure there will also be in
> > kernel interfaces for driver to get data if it knows what to do
> > with it.  I'd also like to enable the non tdisp flow to handle
> > IDE setup 'natively' if that's possible on particular hardware.
> > 
> > 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
> >     a failure in general so reject device - or it might decide it's up to
> >     the PSP etc.   (userspace can see if it succeeded)
> >     I'd argue host software can launch this at any time.  It will
> >     be a denial of service attack but so are many other things the host
> >     can do.
> 
> Trying to visualize it in my head - policy is a kernel cmdline or module 
> parameter?
> 
> > 2. TDISP policy decision from host (userspace policy control)
> >     Need to know end goal.
> 
> /sys/bus/pci/devices/0000:11:22.3/tdisp ?
> 
> > 3. IDE opt in from userspace.  Policy decision.
> >    - If not TDISP
> >      - device_connect(IDE ONLY) - bunch of proxying in host OS.
> >      - Cert chain and measurements presented to host, host can then check if
> >        it is happy and expose for next policy decision.
> >      - Hooks exposed for host to request more measurements, key refresh etc.
> >        Idea being that the flow is host driven with PSP providing required
> >        services.  If host can just do setup directly that's fine too.
> 
> I'd expect the user to want IDE on from the very beginning, why wait to 
> turn it on later? The question is rather if the user wants to panic() or 
> warn() or block the device if IDE setup failed.

Right, but when you run out of streams, where is the policy to decide who
wins? That's why I was thinking lazy IDE, when it is explicitly requested.
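
As a strawman for what "lazy" could mean, here is a small C model of a
fixed pool of selective IDE streams that are only claimed on explicit
request. The names (ide_pool, ide_stream_request, ide_stream_release) and
the pool size are hypothetical, and first-come-first-served is just the
simplest possible answer to "who wins", not a proposal.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IDE_STREAMS 2	/* assumed tiny pool, for illustration only */

struct ide_pool {
	int owner[NR_IDE_STREAMS];	/* device id, or -1 when free */
};

static void ide_pool_init(struct ide_pool *p)
{
	for (size_t i = 0; i < NR_IDE_STREAMS; i++)
		p->owner[i] = -1;
}

/*
 * Claim a stream only when a device explicitly asks for one.
 * Returns the stream index, or -1 when the pool is exhausted.
 */
static int ide_stream_request(struct ide_pool *p, int dev_id)
{
	assert(p && dev_id >= 0);

	/* A device that already owns a stream keeps it. */
	for (size_t i = 0; i < NR_IDE_STREAMS; i++)
		if (p->owner[i] == dev_id)
			return (int)i;

	/* Otherwise take the first free slot. */
	for (size_t i = 0; i < NR_IDE_STREAMS; i++) {
		if (p->owner[i] == -1) {
			p->owner[i] = dev_id;
			return (int)i;
		}
	}
	return -1;
}

static void ide_stream_release(struct ide_pool *p, int dev_id)
{
	assert(p);
	for (size_t i = 0; i < NR_IDE_STREAMS; i++)
		if (p->owner[i] == dev_id)
			p->owner[i] = -1;
}
```

With eager setup at enumeration time the first NR_IDE_STREAMS devices
discovered would win by accident; with the lazy variant the devices that
were actually asked for win, and a failed request is visible to whoever
made it.
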
> 
> >    - If TDISP (technically you can run tdisp from host, but lets assume
> >      for now no one wants to do that? (yet)).
> >      - device_connect(TDISP) - bunch of proxying in host OS.
> >      - Cert chain and measurements presented to host, host can then check if
> >        it is happy and expose for next policy decision.
> 
> On AMD SEV TIO the TDISP setup happens in "tdi_bind" when the device is 
> about to be passed through which is when QEMU (==userspace) starts.
> 
> > 
> > 4. Flow after this depends on early or late binding (lockdown)
> >     but could load driver at this point.  Userspace policy.
> >     tdi-bind etc.
> 
> Not sure I follow this. A host or guest driver?
> 
> 
> >>
> >>> If the user wants only IDE, the AMD PSP's device_connect needs to be called
> >>> and the host OS does not get to know the IDE keys. Other vendors allow
> >>> programming IDE keys to the RC on the baremetal, and this also may co-exist
> >>> with a TSM running outside of Linux - the host still manages traffic classes
> >>> and streams.
> >>
> >> I'm wondering if your implementation is spec compliant:
> >>
> >> PCIe r6.1 sec 6.33.3 says that "It is permitted for a Root Complex
> >> to [...] use implementation specific key management."  But "For
> >> Endpoint Functions, [...] Function 0 must implement [...]
> >> the IDE key management (IDE_KM) protocol as a Responder."
> >>
> >> So the keys need to be programmed into the endpoint using IDE_KM
> >> but for the Root Port it's permitted to use implementation-specific
> >> means.
> >>
> >> The keys for the endpoint and Root Port are the same because this
> >> is symmetric encryption.
> >>
> >> If the keys are internal to the PSP, the kernel can't program the
> >> keys into the endpoint using IDE_KM.  So your implementation precludes
> >> IDE setup by the host OS kernel.
> > 
> > Proxy the CMA messages through the host OS. Doesn't mean host has
> > visibility of the keys or certs.  So indeed, the actual setup isn't being done
> > by the host kernel, but rather by it requesting the 'blob' to send
> > to the CMA DOE from PSP.
> > 
> > By my reading that's a bit inelegant but I don't see it being a break
> > with the specification.
> > 
> >>
> >> device_connect is meant to be used for TDISP, i.e. with devices which
> >> have the TEE-IO Supported bit set in the Device Capabilities Register.
> >>
> >> What are you going to do with IDE-capable devices which have that bit
> >> cleared?  Are they unsupported by your implementation?
> >>
> >> It seems to me an architecture cannot claim IDE compliance if it's
> >> limited to TEE-IO capable devices, which might only be a subset of
> >> the available products.
> > 
> > Agreed.  If can request the PSP does a non TDISP IDE setup then
> > I think we are fine.  If not then indeed usecases are limited and
> > meh, it might be a spec compliance issue but I suspect not as
> > TDISP has a note at the top that says:
> > 
> > "Although it is permitted (and generally expected) that TDIs will
> > be implemented such that they can be assigned to Legacy VMs, such
> > use is not the focus of TDISP."
> > 
> > Which rather implies that devices that don't support other usecases
> > are allowed.
> > 
> >>
> >>
> >>> The next steps:
> >>> - expose blobs via configfs (like Dan did configfs-tsm);
> >>> - s/tdisp.ko/coco.ko/;
> >>> - ask the audience - what is missing to make it reusable for other vendors
> >>> and uses?
> >>
> >> I intend to expose measurements in sysfs in a measurements/ directory
> >> below each CMA-capable device's directory.  There are products coming
> >> to the market which support only CMA and are not interested in IDE or
> >> TDISP.  When bringing up TDISP, measurements received as part of an
> >> interface report must be exposed in the same way so that user space
> >> tooling which evaluates the measurement works both with TEE-IO capable
> >> and incapable products.  This could be achieved by fetching measurements
> >> from the interface report instead of via SPDM when TDISP is in use.
> > 
> > Absolutely agree on this and superficially it feels like this should not
> > be hard to hook up.
> 
> True. sysfs it is then. Thanks,
> 
> > 
> > There will also be paths where a driver wants to see the measurement report
> > but that should also be easy enough to enable.
> > 
> > Jonathan
> >>
> >> Thanks,
> >>
> >> Lukas
> >>
> > 
> 
> -- 
> Alexey
> 
> 




* Re: TDISP enablement
  2023-11-03 16:44       ` Jonathan Cameron
@ 2023-11-11 22:45         ` Dan Williams
  2023-11-24 14:52           ` Jonathan Cameron
  0 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2023-11-11 22:45 UTC (permalink / raw)
  To: Jonathan Cameron, Alexey Kardashevskiy
  Cc: Lukas Wunner, linux-coco, kvm, linux-pci, Dan Williams,
	Jonathan Cameron, suzuki.poulose

Jonathan Cameron wrote:
>  
> > >>> - tdi_info - read measurements/certs/interface report;  
> > >>
> > >> Does this return cached cert chains and measurements from the device
> > >> or does it retrieve them anew?  (Measurements might have changed if
> > >> MEAS_FRESH_CAP is supported.)
> > >>
> > >>  
> > >>> If the user wants only CMA/SPDM, Lukas's patches will do that without
> > >>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > >>> sessions).  
> > >>
> > >> It can co-exist if the pci_cma_claim_ownership() library call
> > >> provided by patch 12/12 is invoked upon device_connect.
> > >>
> > >> It would seem advantageous if you could delay device_connect
> > >> until a device is actually passed through.  Then the OS can
> > >> initially authenticate and measure devices and the PSP takes
> > >> over when needed.  
> > > 
> > > Would that delay mean IDE isn't up - I think that wants to be
> > > available whether or not pass through is going on.
> > > 
> > > Given potential restrictions on IDE resources, I'd expect to see an explicit
> > > opt in from userspace on the host to start that process for a given
> > > device.  (udev rule or similar might kick it off for simple setups).
> > > 
> > > Would that work for the flows described?  
> > 
> > This would work but my (likely wrong) intention was also to run 
> > necessary setup in both host and guest at the same time before drivers 
> > probe devices. And while delaying it in the host is fine (well, for us 
> > in AMD, as we are aiming for CoCo/TDISP), in the guest this means less 
> > flexibility in enlightening the PCI subsystem and the guest driver: 
> > ideally (or at least initially) the driver is supposed to probe already 
> > enabled and verified device, as otherwise it has to do SWIOTLB until the 
> > userspace does the verification and kicks the driver to go proper direct 
> > DMA (or reload the driver?).
> 
> In the case of a guest getting a VF, there probably won't be any way for
> the kernel to run any native attestation anyway, so policy would have to
> rely on the CoCo paths. Kernel stuff Lukas has would just not try to attest
> or claim anything about it. If a VF has a CMA capable DOE instance
> then that's not there for IDE stuff at all, but for the guest to get
> direct measurements etc without PSP or anything else getting involved
> in which case the guest using that directly is a reasonable thing to do.

Is it a practical reality that VFs are going to implement CMA? My
expectation is CMA is a PF facility and the TSM retrieves measurements
for TDIs through that. At least that seems to be a fundamental assumption
of the TDISP specification. Given config-cycles are always host-mediated,
I expect guest CMA will always be a proxy whether it is a per-VF CMA
interface or not.

> 
> > 
> > > Next bit probably has holes...  Key is that a lot of the checks
> > > may fail, and it's up to host userspace policy to decide whether
> > > to proceed (other policy in the secure VM side of things obviously)
> > > 
> > > So my rough thinking is - for the two options (IDE / TDISP)
> > > 
> > > Comparing with Alexey's flow I think only real difference is that
> > > I call out explicit host userspace policy controls. I'd also like  
> > 
> > My imagination fails me :) What is the host supposed to do if the device 
> > verification fails/succeeds later, and how much later, and the device is 
> > a boot disk? Or is this userspace going to be limited to initramdisk? 
> > What is that thing which we are protecting against? Or it is for CUDA 
> > and such (which yeah, it can wait)?
> 
> There are a bunch of non obvious cases indeed.  Hence make it all policy.
> Though if you have a flow where verification is needed for boot disk and
> it fails (and policy says that's not acceptable) then bad luck you
> probably need to squirt a cert into your ramdisk or UEFI or similar.

It seems policy mechanisms should be incrementally added as clear need
for policy dictates, because that has ABI implications and
kernel-dependency-on-userspace expectations.

> > > to use similar interfaces to convey state to host userspace as
> > > per Lukas' existing approaches.  Sure there will also be in
> > > kernel interfaces for driver to get data if it knows what to do
> > > with it.  I'd also like to enable the non tdisp flow to handle
> > > IDE setup 'natively' if that's possible on particular hardware.
> > > 
> > > 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
> > >     a failure in general so reject device - or it might decide it's up to
> > >     the PSP etc.   (userspace can see if it succeeded)
> > >     I'd argue host software can launch this at any time.  It will
> > >     be a denial of service attack but so are many other things the host
> > >     can do.  
> > 
> > Trying to visualize it in my head - policy is a kernel cmdline or module 
> > parameter?
> 
> Neither - it's bind not happening until userspace decides to kick it off.
> The module could provide it's own policy on top of this - so userspace
> could defer to that if it makes sense (so bind but rely on probe failing
> if policy not met).

udev module policy can already gate binding, it's not clear a new policy
mechanism is needed here.

> 
> > 
> > > 2. TDISP policy decision from host (userspace policy control)
> > >     Need to know end goal.  
> > 
> > /sys/bus/pci/devices/0000:11:22.3/tdisp ?
> 
> Maybe - I'm sure we'll bikeshed anything like that :)
> 
> > 
> > > 3. IDE opt in from userspace.  Policy decision.
> > >    - If not TDISP
> > >      - device_connect(IDE ONLY) - bunch of proxying in host OS.
> > >      - Cert chain and measurements presented to host, host can then check if
> > >        it is happy and expose for next policy decision.
> > >      - Hooks exposed for host to request more measurements, key refresh etc.
> > >        Idea being that the flow is host driven with PSP providing required
> > >        services.  If host can just do setup directly that's fine too.  
> > 
> > I'd expect the user to want IDE on from the very beginning, why wait to 
> > turn it on later? The question is rather if the user wants to panic() or 
> > warn() or block the device if IDE setup failed.
> 
> There are some concerns about being able to support enough selective IDE streams.
> Might turn out to be a false concern (I've not yet got visibility of enough
> implementations to be able to tell).
> Also (as I understand it as a software guy) IDE has a significant performance
> and power cost (and for CXL at least there are various trade offs and options
> you can enable depending on security model and device features).
> 
> There is "talk" of people turning IDE off if they can cope without it and only
> enabling for CoCo (and possibly selectively doing that as well)

Agree, IDE stream resource allocation is something an admin needs to be
able to reason about.

> > >    - If TDISP (technically you can run tdisp from host, but lets assume
> > >      for now no one wants to do that? (yet)).
> > >      - device_connect(TDISP) - bunch of proxying in host OS.
> > >      - Cert chain and measurements presented to host, host can then check if
> > >        it is happy and expose for next policy decision.  
> > 
> > On AMD SEV TIO the TDISP setup happens in "tdi_bind" when the device is 
> > about to be passed through which is when QEMU (==userspace) starts.
> Ah. Ok.
> 
> > 
> > > 
> > > 4. Flow after this depends on early or late binding (lockdown)
> > >     but could load driver at this point.  Userspace policy.
> > >     tdi-bind etc.  
> > 
> > Not sure I follow this. A host or guest driver?
> 
> Hmm - I confess I'm confusing myself now.
> 
> At this stage we just have enough info to load a driver for the PF because
> to get to state we want locked prior to VF assignment the PF driver may
> have some configuration to do.
> 
> If all that goes well and the TDI can be moved to locked state, and assigned
> to a TVM which then has to decide to issue tdi_validate before binding
> the guest driver (which I assume is the TDISP START_INTERFACE_REQUEST
> bit of the state machine).

Locked before assignment is valid, but lock after assignment (upon guest
wanting to transition a TDI or recover a TDI after an error) is likely
one of the first incremental features to add after a baseline is
established.

> Or is the guest driver ever needed before this
> transition? (I see you called it out as not, but is it always a one time
> thing on driver load or can that decision change without unbind/bind
> of driver?)

For whole device passthrough it may be the case that the guest needs to
do some operations with the device in shared mode before taking it
private.

> I know this gets more complex for the PF pass through cases where the
> driver needs to load and do some setup before you can lock down the device
> but do people have that requirement for VFs? If they do it feels like
> device was designed wrong to me...

Agree, because in the full device passthrough case I expect the PF
driver to just be generic vfio-pci.

> Too many specs (some of which provide too many ways you 'could' do it)
> so I may well have a bunch of this wrong :(

This is why I think we pick one painfully simple use case to enable
first and then incrementally build on it with concrete rationales.


* Re: TDISP enablement
  2023-10-31 22:56 TDISP enablement Alexey Kardashevskiy
  2023-10-31 23:40 ` Dionna Amalie Glaze
  2023-11-01  7:27 ` Lukas Wunner
@ 2023-11-13  5:43 ` Samuel Ortiz
  2023-11-13  6:46   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 20+ messages in thread
From: Samuel Ortiz @ 2023-11-13  5:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linux-coco, kvm, linux-pci

Hi Alexey,

On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> Hi everyone,
> 
> Here is followup after the Dan's community call we had weeks ago.
> 
> Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
> confidential VMs without trusting the HV and with enabled IDE (encryption)
> and IOMMU (performance, compared to current SWIOTLB). I am aware of other
> uses and vendors and I spend hours unsuccessfully trying to generalize all
> this in a meaningful way.
> 
> The AMD SEV TIO verbs can be simplified as:
> 
> - device_connect - starts CMA/SPDM session, returns measurements/certs, runs
> IDE_KM to program the keys;
> - device_reclaim - undo the connect;
> - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states, generates
> interface report;

From a VF to TVM use case, I think tdi_bind should only transition to
LOCKED, but not RUN. RUN should only be reached once the TVM approves
the device, and afaiu this is a host call.

> - tdi_unbind - undo the bind;
> - tdi_info - read measurements/certs/interface report;
> - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on the
> parameters).

That's equivalent to the TVM accepting the TDI, and this should
transition the TDI from LOCKED to RUN.
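
A minimal sketch of the split suggested here, assuming only the TDISP
transitions LOCK_INTERFACE_REQUEST (CONFIG_UNLOCKED to CONFIG_LOCKED),
START_INTERFACE_REQUEST (CONFIG_LOCKED to RUN) and STOP_INTERFACE_REQUEST
(back to CONFIG_UNLOCKED). The class and method names are made up for
illustration; this is not how the PSP or any TSM implements the verbs.

```python
from enum import Enum


class TdiState(Enum):
    CONFIG_UNLOCKED = "CONFIG_UNLOCKED"
    CONFIG_LOCKED = "CONFIG_LOCKED"
    RUN = "RUN"


class Tdi:
    """Hypothetical model of one TDI as tracked by the TSM."""

    def __init__(self):
        self.state = TdiState.CONFIG_UNLOCKED

    def tdi_bind(self):
        # Host call: LOCK_INTERFACE_REQUEST, stops at LOCKED (no RUN yet).
        if self.state is not TdiState.CONFIG_UNLOCKED:
            raise RuntimeError("bind requires CONFIG_UNLOCKED")
        self.state = TdiState.CONFIG_LOCKED

    def tdi_validate(self):
        # TVM call: START_INTERFACE_REQUEST once the TVM accepts the TDI.
        if self.state is not TdiState.CONFIG_LOCKED:
            raise RuntimeError("validate requires CONFIG_LOCKED")
        self.state = TdiState.RUN

    def tdi_unbind(self):
        # Host call: STOP_INTERFACE_REQUEST, modelled as valid from any state.
        self.state = TdiState.CONFIG_UNLOCKED
```

With this mapping the host can never push a TDI into RUN on its own; only
the TVM's validate step does that.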


> The first 4 called by the host OS, the last two by the TVM ("Trusted VM").
> These are implemented in the AMD PSP (platform processor).
> There are CMA/SPDM, IDE_KV, TDISP in use.
> 
> Now, my strawman code does this on the host (I simplified a bit):
> - after PCI discovery but before probing: walk through all TDISP-capable
> (TEE-IO in PCIe caps) endpoint devices and call device_connect;

Would the host call device_connect unconditionally for all TEE-IO devices
probed on the host? Wouldn't you want to do so only before the first
tdi_bind for a TDI that belongs to the physical device?


> - when drivers probe - it is all set up and the device measurements are
> visible to the driver;
> - when constructing a TVM, tdi_bind is called;

Here as well, the tdi_bind could be asynchronous to e.g. support hot
plugging TDIs into TVMs.


> 
> and then in the TVM:
> - after PCI discovery but before probing: walk through all TDIs (which will
> have TEE IO bit set) and call tdi_info, verify the report, if ok - call
> tdi_validate;

By verify you mean verify the reported MMIO ranges? With support from
the TSM?
We discussed that a few times, but the device measurements and
attestation report should also be attested, i.e. run against a relying
party. The kernel may not be the right place for that, and I'm proposing
for the guest kernel to rely on a user space component and offload the
attestation part to it. This userspace component would then
synchronously return to the guest kernel with an attestation result.

> - when drivers probe - it is all set up and the driver decides if/which DMA
> mode to use (SWIOTLB or direct), or panic().
> 

When would it panic?

> Uff. Too long already. Sorry. Now, go to the problems:
> 
> If the user wants only CMA/SPDM, 

By user here, you mean the user controlling the host? Or the TVM
user/owner? I assume the former.

> the Lukas'es patched will do that without
> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> sessions).
> 
> If the user wants only IDE, the AMD PSP's device_connect needs to be called
> and the host OS does not get to know the IDE keys. Other vendors allow
> programming IDE keys to the RC on the baremetal, and this also may co-exist
> with a TSM running outside of Linux - the host still manages trafic classes
> and streams.
> 
> If the user wants TDISP for VMs, this assumes the user does not trust the
> host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
> 
> The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
> seem capable of co-existing, TDISP does not.

Which makes sense, TDISP is not designed to be used outside of the
TEE-IO VFs assigned to TVM use case.

> 
> However there are common bits.
> - certificates/measurements/reports blobs: storing, presenting to the
> userspace (results of device_connect and tdi_bind);
> - place where we want to authenticate the device and enable IDE
> (device_connect);
> - place where we want to bind TDI to a TVM (tdi_bind).
> 
> I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
> and a hack for VFIO PCI device to call tdi_bind.
> 
> The next steps:
> - expose blobs via configfs (like Dan did configfs-tsm);
> - s/tdisp.ko/coco.ko/;
> - ask the audience - what is missing to make it reusable for other vendors
> and uses?

The connect-bind-run flow is similar to the one we have defined for
RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
detail, but nothing there is architecture-specific, so it could apply to
other architectures.

Cheers,
Samuel.

[1] https://github.com/riscv-non-isa/riscv-ap-tee-io/blob/main/specification/07-theory_operations.adoc


* Re: TDISP enablement
  2023-11-01 11:05   ` Jonathan Cameron
  2023-11-02  2:28     ` Alexey Kardashevskiy
  2023-11-10 23:30     ` Dan Williams
@ 2023-11-13  6:04     ` Samuel Ortiz
  2 siblings, 0 replies; 20+ messages in thread
From: Samuel Ortiz @ 2023-11-13  6:04 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Lukas Wunner, Alexey Kardashevskiy, linux-coco, kvm, linux-pci,
	Dan Williams, Jonathan Cameron, suzuki.poulose

On Wed, Nov 01, 2023 at 11:05:51AM +0000, Jonathan Cameron wrote:
> On Wed, 1 Nov 2023 08:27:17 +0100
> Lukas Wunner <lukas@wunner.de> wrote:
> 
> Thanks Alexy, this is a great discussion to kick off.

I'd certainly agree with that.

> > On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> > > - device_connect - starts CMA/SPDM session, returns measurements/certs,
> > > runs IDE_KM to program the keys;  
> > 
> > Does the PSP have a set of trusted root certificates?
> > If so, where does it get them from?
> > 
> > If not, does the PSP just blindly trust the validity of the cert chain?
> > Who validates the cert chain, and when?
> > Which slot do you use?
> > Do you return only the cert chain of that single slot or of all slots?
> > Does the PSP read out all measurements available?  This may take a while
> > if the measurements are large and there are a lot of them.
> 
> I'd definitely like their to be a path for certs and measurement to be
> checked by the Host OS (for the non TDISP path). Whether the
> policy setup cares about result is different question ;)
> 
> > 
> > 
> > > - tdi_info - read measurements/certs/interface report;  
> > 
> > Does this return cached cert chains and measurements from the device
> > or does it retrieve them anew?  (Measurements might have changed if
> > MEAS_FRESH_CAP is supported.)
> > 
> > 
> > > If the user wants only CMA/SPDM, the Lukas'es patched will do that without
> > > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > sessions).  
> > 
> > It can co-exist if the pci_cma_claim_ownership() library call
> > provided by patch 12/12 is invoked upon device_connect.
> > 
> > It would seem advantageous if you could delay device_connect
> > until a device is actually passed through.  Then the OS can
> > initially authenticate and measure devices and the PSP takes
> > over when needed.
> 
> Would that delay mean IDE isn't up - I think that wants to be
> available whether or not pass through is going on.
>
> Given potential restrictions on IDE resources, I'd expect to see an explicit
> opt in from userspace on the host to start that process for a given
> device.  (udev rule or similar might kick it off for simple setups).
> 
> Would that work for the flows described?  
> 
> Next bit probably has holes...  Key is that a lot of the checks
> may fail, and it's up to host userspace policy to decide whether
> to proceed (other policy in the secure VM side of things obviously)
> 
> So my rough thinking is - for the two options (IDE / TDISP)
> 
> Comparing with Alexey's flow I think only real difference is that
> I call out explicit host userspace policy controls. I'd also like
> to use similar interfaces to convey state to host userspace as
> per Lukas' existing approaches.  Sure there will also be in
> kernel interfaces for driver to get data if it knows what to do
> with it.  I'd also like to enable the non tdisp flow to handle
> IDE setup 'natively' if that's possible on particular hardware.
> 
> 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
>    a failure in general so reject device - or it might decide it's up to
>    the PSP etc.   (userspace can see if it succeeded)
>    I'd argue host software can launch this at any time.  It will
>    be a denial of service attack but so are many other things the host
>    can do.
> 2. TDISP policy decision from host (userspace policy control)
>    Need to know end goal.
> 3. IDE opt in from userspace.  Policy decision.
>   - If not TDISP 
>     - device_connect(IDE ONLY) - bunch of proxying in host OS.
>     - Cert chain and measurements presented to host, host can then check if
>       it is happy and expose for next policy decision.
>     - Hooks exposed for host to request more measurements, key refresh etc.
>       Idea being that the flow is host driven with PSP providing required
>       services.  If host can just do setup directly that's fine too.
>   - If TDISP (technically you can run tdisp from host, but lets assume
>     for now no one wants to do that? (yet)).

Yes, I'd say it's a safe assumption.

>     - device_connect(TDISP) - bunch of proxying in host OS.

imho TDISP should be orthogonal to the connect verb. connect is a
PF/Physical device scoped action. TDISP is a VF/TDI state machine, and
the bind verb is meant for that (This is where the TSM should start
moving the TDISP state machine to bind a TDI and a TVM together).

>     - Cert chain and measurements presented to host, host can then check if
>       it is happy and expose for next policy decision.

In the TDISP/VF passthrough case, the device cert chain and its
attestation report will also have to be available to the guest in order
for it to verify and attest to the device.

> 
> 4. Flow after this depends on early or late binding (lockdown)
>    but could load driver at this point.  Userspace policy.
>    tdi-bind etc.
> 
> 
> > 
> > 
> > > If the user wants only IDE, the AMD PSP's device_connect needs to be called
> > > and the host OS does not get to know the IDE keys. Other vendors allow
> > > programming IDE keys to the RC on the baremetal, and this also may co-exist
> > > with a TSM running outside of Linux - the host still manages trafic classes
> > > and streams.  
> > 
> > I'm wondering if your implementation is spec compliant:
> > 
> > PCIe r6.1 sec 6.33.3 says that "It is permitted for a Root Complex
> > to [...] use implementation specific key management."  But "For
> > Endpoint Functions, [...] Function 0 must implement [...]
> > the IDE key management (IDE_KM) protocol as a Responder."
> > 
> > So the keys need to be programmed into the endpoint using IDE_KM
> > but for the Root Port it's permitted to use implementation-specific
> > means.
> > 
> > The keys for the endpoint and Root Port are the same because this
> > is symmetric encryption.
> > 
> > If the keys are internal to the PSP, the kernel can't program the
> > keys into the endpoint using IDE_KM.  So your implementation precludes
> > IDE setup by the host OS kernel.
> 
> Proxy the CMA messages through the host OS. Doesn't mean host has
> visibility of the keys or certs.  So indeed, the actual setup isn't being done
> by the host kernel, but rather by it requesting the 'blob' to send
> to the CMA DOE from PSP.
> 
> By my reading that's a bit inelegant but I don't see it being a break
> with the specification.
> 
> > 
> > device_connect is meant to be used for TDISP, i.e. with devices which
> > have the TEE-IO Supported bit set in the Device Capabilities Register.
> > 
> > What are you going to do with IDE-capable devices which have that bit
> > cleared?  Are they unsupported by your implementation?
> > 
> > It seems to me an architecture cannot claim IDE compliance if it's
> > limited to TEE-IO capable devices, which might only be a subset of
> > the available products.
> 
> Agreed.  If can request the PSP does a non TDISP IDE setup then
> I think we are fine.  

The TSM, upon receiving a connect request from the host should establish
the SPDM+IDE connection. If it never receives a bind request, it should
not do any TDISP action. This way we could have the TSM supporting both
the passthrough and non passthrough use cases.
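
A rough sketch of that separation, with connect scoped to the physical
device and bind scoped to a TDI. The `Tsm` class, its fields and
`uses_tdisp()` are hypothetical names used only for illustration, not any
vendor's interface.

```python
class Tsm:
    """Hypothetical TSM front end; names are illustrative only."""

    def __init__(self):
        self.connected = set()  # devices with an SPDM session and IDE up
        self.bound = {}         # (device, tdi) -> tvm, i.e. TDISP started

    def device_connect(self, device):
        # SPDM + IDE only; no TDISP message is sent here.
        self.connected.add(device)

    def tdi_bind(self, device, tdi, tvm):
        # TDISP starts here and needs the secure session set up by connect.
        if device not in self.connected:
            raise RuntimeError("device_connect must come first")
        self.bound[(device, tdi)] = tvm

    def uses_tdisp(self, device):
        # True only if at least one TDI of this device has been bound.
        return any(dev == device for dev, _ in self.bound)
```

A device that is connected but never bound stays a plain IDE-protected
device for the host, which is the non passthrough case.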

Cheers,
Samuel.


* Re: TDISP enablement
  2023-11-13  5:43 ` Samuel Ortiz
@ 2023-11-13  6:46   ` Alexey Kardashevskiy
  2023-11-13 15:10     ` Samuel Ortiz
  0 siblings, 1 reply; 20+ messages in thread
From: Alexey Kardashevskiy @ 2023-11-13  6:46 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: linux-coco, kvm, linux-pci


On 13/11/23 16:43, Samuel Ortiz wrote:
> Hi Alexey,
> 
> On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
>> Hi everyone,
>>
>> Here is followup after the Dan's community call we had weeks ago.
>>
>> Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
>> confidential VMs without trusting the HV and with enabled IDE (encryption)
>> and IOMMU (performance, compared to current SWIOTLB). I am aware of other
>> uses and vendors and I spend hours unsuccessfully trying to generalize all
>> this in a meaningful way.
>>
>> The AMD SEV TIO verbs can be simplified as:
>>
>> - device_connect - starts CMA/SPDM session, returns measurements/certs, runs
>> IDE_KM to program the keys;
>> - device_reclaim - undo the connect;
>> - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states, generates
>> interface report;
> 
> From a VF to TVM use case, I think tdi_bind should only transition to
> LOCKED, but not RUN. RUN should only be reached once the TVM approves
> the device, and afaiu this is a host call.

What is the point in separating these? What is that thing which requires 
the device to be in LOCKED but not RUN state (besides the obvious 
START_INTERFACE_REQUEST)?

>> - tdi_unbind - undo the bind;
>> - tdi_info - read measurements/certs/interface report;
>> - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on the
>> parameters).
> 
> That's equivalent to the TVM accepting the TDI, and this should
> transition the TDI from LOCKED to RUN.

Even if the device was in RUN, it would not work until the validation is 
done == RMP+IOMMU are updated by the TSM. This may be different for 
other architectures though, dunno. RMP == reverse map table, an SEV SNP 
thing used for verifying memory accesses.


>> The first 4 called by the host OS, the last two by the TVM ("Trusted VM").
>> These are implemented in the AMD PSP (platform processor).
>> There are CMA/SPDM, IDE_KV, TDISP in use.
>>
>> Now, my strawman code does this on the host (I simplified a bit):
>> - after PCI discovery but before probing: walk through all TDISP-capable
>> (TEE-IO in PCIe caps) endpoint devices and call device_connect;
> 
> Would the host call device_connect unconditionally for all TEE-IO device
> probed on the host? Wouldn't you want to do so only before the first
> tdi_bind for a TDI that belongs to the physical device?


Well, in the SEV TIO, device_connect enables IDE which has value for the 
host on its own.


>> - when drivers probe - it is all set up and the device measurements are
>> visible to the driver;
>> - when constructing a TVM, tdi_bind is called;
> 
> Here as well, the tdi_bind could be asynchronous to e.g. support hot
> plugging TDIs into TVMs.


I do not really see a huge difference between starting a VM with an
already bound TDISP device and hotplugging a device. Either way the host
calls tdi_bind and does not really care what the guest is doing at that
moment, and when the guest sees a TDISP device, it is always bound.

>> and then in the TVM:
>> - after PCI discovery but before probing: walk through all TDIs (which will
>> have TEE IO bit set) and call tdi_info, verify the report, if ok - call
>> tdi_validate;
> 
> By verify you mean verify the reported MMIO ranges? With support from
> the TSM?

The tdi_validate call to the PSP FW (==TSM) asks the PSP to validate the 
MMIO values and enable them in the RMP.
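
As a rough illustration of the kind of check meant by validating the MMIO
values: every range in the interface report should fall inside a BAR that
the guest actually sees. This is only a sketch; the real check happens in
the PSP firmware, and the `(start, length)` tuples and the function name
are simplifications assumed here, not the actual report layout.

```python
def ranges_within_bars(reported, bars):
    """Return True if every reported (start, length) lies inside some BAR."""

    def inside(rng, bar):
        start, length = rng
        bar_start, bar_len = bar
        return (
            length > 0
            and bar_start <= start
            and start + length <= bar_start + bar_len
        )

    return all(any(inside(r, b) for b in bars) for r in reported)
```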

> We discussed that a few times, but the device measurements and
> attestation report should also be attested, i.e. run against a relying
> party. The kernel may not be the right place for that, and I'm proposing
> for the guest kernel to rely on a user space component and offload the
> attestation part to it. This userspace component would then
> synchronously return to the guest kernel with an attestation result.

What bothers me here is that userspace only runs after PCI has been
probed, so by the time userspace is called for attestation the device is
already up and running and hosting the rootfs. Userspace will need a knob
which transitions the device into the trusted state (switch SWIOTLB to
direct DMA, for example). I guess if the userspace is an initramdisk, it
could still reload the driver which is not doing useful work just yet...


>> - when drivers probe - it is all set up and the driver decides if/which DMA
>> mode to use (SWIOTLB or direct), or panic().
>>
> 
> When would it panic?

When attestation failed.

>> Uff. Too long already. Sorry. Now, go to the problems:
>>
>> If the user wants only CMA/SPDM,
> 
> By user here, you mean the user controlling the host? Or the TVM
> user/owner? I assume the former.

Yes, the physical host owner.

>> the Lukas'es patched will do that without
>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
>> sessions).
>>
>> If the user wants only IDE, the AMD PSP's device_connect needs to be called
>> and the host OS does not get to know the IDE keys. Other vendors allow
>> programming IDE keys to the RC on the baremetal, and this also may co-exist
>> with a TSM running outside of Linux - the host still manages trafic classes
>> and streams.
>>
>> If the user wants TDISP for VMs, this assumes the user does not trust the
>> host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
>>
>> The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
>> seem capable of co-existing, TDISP does not.
> 
> Which makes sense, TDISP is not designed to be used outside of the
> TEE-IO VFs assigned to TVM use case.
> 
>>
>> However there are common bits.
>> - certificates/measurements/reports blobs: storing, presenting to the
>> userspace (results of device_connect and tdi_bind);
>> - place where we want to authenticate the device and enable IDE
>> (device_connect);
>> - place where we want to bind TDI to a TVM (tdi_bind).
>>
>> I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
>> and a hack for VFIO PCI device to call tdi_bind.
>>
>> The next steps:
>> - expose blobs via configfs (like Dan did configfs-tsm);
>> - s/tdisp.ko/coco.ko/;
>> - ask the audience - what is missing to make it reusable for other vendors
>> and uses?
> 
> The connect-bind-run flow is similar to the one we have defined for
> RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
> details, but nothing there is architectural and could somehow apply to
> other architectures.

Yeah, it is a good one!
I am still missing the need to have sbi_covg_start_interface() as a 
separate step though. Thanks,


> Cheers,
> Samuel.
> 
> [1] https://github.com/riscv-non-isa/riscv-ap-tee-io/blob/main/specification/07-theory_operations.adoc


-- 
Alexey




* Re: TDISP enablement
  2023-11-13  6:46   ` Alexey Kardashevskiy
@ 2023-11-13 15:10     ` Samuel Ortiz
  2023-11-14  0:57       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 20+ messages in thread
From: Samuel Ortiz @ 2023-11-13 15:10 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linux-coco, kvm, linux-pci

On Mon, Nov 13, 2023 at 05:46:35PM +1100, Alexey Kardashevskiy wrote:
> 
> On 13/11/23 16:43, Samuel Ortiz wrote:
> > Hi Alexey,
> > 
> > On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> > > Hi everyone,
> > > 
> > > Here is followup after the Dan's community call we had weeks ago.
> > > 
> > > Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
> > > confidential VMs without trusting the HV and with enabled IDE (encryption)
> > > and IOMMU (performance, compared to current SWIOTLB). I am aware of other
> > > uses and vendors and I spend hours unsuccessfully trying to generalize all
> > > this in a meaningful way.
> > > 
> > > The AMD SEV TIO verbs can be simplified as:
> > > 
> > > - device_connect - starts CMA/SPDM session, returns measurements/certs, runs
> > > IDE_KM to program the keys;
> > > - device_reclaim - undo the connect;
> > > - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states, generates
> > > interface report;
> > 
> > From a VF to TVM use case, I think tdi_bind should only transition to
> > LOCKED, but not RUN. RUN should only be reached once the TVM approves
> > the device, and afaiu this is a host call.
> 
> What is the point in separating these? What is that thing which requires the
> device to be in LOCKED but not RUN state (besides the obvious
> START_INTERFACE_REQUEST)?

Because they're two very different steps of the TDI assignment into a
TVM.
TDISP moves to RUN upon TVM accepting the TDI into its TCB.
LOCKED is typically driven by the host, in order to lock the TDI
configuration while the TVM verifies, attests and accepts or rejects it
from its TCB.

When the TSM moves the TDI to RUN, by TVM request, all IO paths (DMA and
MMIO) are supposed to be functional. I understand most architectures
have ways to prevent TDIs from accessing confidential memory
regardless of their TDISP state, but a TDI in the RUN state should not
be forbidden from DMA'ing the TVM confidential memory. Preventing it
from doing so should be an error case, not the nominal flow.

> > > - tdi_info - read measurements/certs/interface report;
> > > - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on the
> > > parameters).
> > 
> > That's equivalent to the TVM accepting the TDI, and this should
> > transition the TDI from LOCKED to RUN.
> 
> Even if the device was in RUN, it would not work until the validation is
> done == RMP+IOMMU are updated by the TSM. 

Right, and that makes sense from a security perspective. But a device in
the RUN state will expect IO to work, because it's a TDISP semantic for
it being accepted into the TVM and as such the TVM allowed access to its
confidential memory.

> This may be different for other
> architectures though, dunno. RMP == reverse map table, an SEV SNP thing used
> for verifying memory accesses.
> 
> 
> > > The first 4 called by the host OS, the last two by the TVM ("Trusted VM").
> > > These are implemented in the AMD PSP (platform processor).
> > > There are CMA/SPDM, IDE_KV, TDISP in use.
> > > 
> > > Now, my strawman code does this on the host (I simplified a bit):
> > > - after PCI discovery but before probing: walk through all TDISP-capable
> > > (TEE-IO in PCIe caps) endpoint devices and call device_connect;
> > 
> > Would the host call device_connect unconditionally for all TEE-IO device
> > probed on the host? Wouldn't you want to do so only before the first
> > tdi_bind for a TDI that belongs to the physical device?
> 
> 
> Well, in the SEV TIO, device_connect enables IDE which has value for the
> host on its own.

Ok, that makes sense to me. And the TSM would be responsible for
supporting this. Then TDISP is exercised on a particular TDI for the
device when this TDI is passed through to a specific TVM.

> 
> > > - when drivers probe - it is all set up and the device measurements are
> > > visible to the driver;
> > > - when constructing a TVM, tdi_bind is called;
> > 
> > Here as well, the tdi_bind could be asynchronous to e.g. support hot
> > plugging TDIs into TVMs.
> 
> 
> I do not really see a huge difference between starting a VM with already
> bound TDISP device or hotplugging a device - either way the host calls
> tdi_bind and it does not really care about what the guest is doing at that
> moment and when the guest sees a TDISP device - it is always bound.

I agree. What I meant is that bind can be called at TVM construction
time, or asynchronously whenever the host decides to attach a TDI to the
previously constructed TVM.

> > > and then in the TVM:
> > > - after PCI discovery but before probing: walk through all TDIs (which will
> > > have TEE IO bit set) and call tdi_info, verify the report, if ok - call
> > > tdi_validate;
> > 
> > By verify you mean verify the reported MMIO ranges? With support from
> > the TSM?
> 
> The tdi_validate call to the PSP FW (==TSM) asks the PSP to validate the
> MMIO values and enable them in the RMP.

Sounds good.
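
For illustration only, the guest-side sanity check that would precede
tdi_validate could look roughly like the sketch below. The (start, length)
layout and the ranges_covered() helper are assumptions made for this
example, not the SEV TIO ABI or the TDISP interface report format.

```python
from typing import List, Tuple

# Hypothetical representation: (start, length) in guest physical address space.
Range = Tuple[int, int]


def ranges_covered(wanted: List[Range], reported: List[Range]) -> bool:
    """Return True if every wanted range lies fully inside one reported range."""
    def inside(w: Range, r: Range) -> bool:
        return r[0] <= w[0] and w[0] + w[1] <= r[0] + r[1]

    return all(any(inside(w, r) for r in reported) for w in wanted)
```

The guest would only ask the TSM to validate MMIO ranges that pass such a
check; the TSM still does the authoritative validation.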

> > We discussed that a few times, but the device measurements and
> > attestation report should also be attested, i.e. run against a relying
> > party. The kernel may not be the right place for that, and I'm proposing
> > for the guest kernel to rely on a user space component and offload the
> > attestation part to it. This userspace component would then
> > synchronously return to the guest kernel with an attestation result.
> 
> What bothers me here is that the userspace works when PCI is probed so when
> the userspace is called for attestation - the device is up and running and
> hosting the rootfs.

I guess you're talking about a use case where one would pass a storage
device through, and that device would hold the guest rootfs?
With the approach we're proposing, attestation would be optional and
upon the kernel's decision. In that case, the kernel would not require
userspace to run attestation (because there is no userspace...) but the
actual guest attestation would still happen whenever the guest would
want to fetch an attestation gated secret. And that attestation flow
would include the storage device attestation report, because it's part
of the guest TCB. So, eventually, the device would be attested, but not
right when the device is attached to the guest.

> The userspace will need a knob which transitions the
> device into the trusted state (switch SWIOTLB to direct DMA, for example). I
> guess if the userspace is initramdisk, it could still reload the driver
> which is not doing useful work just yet...
> 
> 
> > > - when drivers probe - it is all set up and the driver decides if/which DMA
> > > mode to use (SWIOTLB or direct), or panic().
> > > 
> > 
> > When would it panic?
> 
> When attestation failed.

Attestation failure should only trigger a rejection from the TVM, i.e.
the TDI would not be probed. That should be reported back to the host,
which may decide to call unbind on that TDI (and thus move it back to
UNLOCKED).

> > > Uff. Too long already. Sorry. Now, go to the problems:
> > > 
> > > If the user wants only CMA/SPDM,
> > 
> > By user here, you mean the user controlling the host? Or the TVM
> > user/owner? I assume the former.
> 
> Yes, the physical host owner.
> 
> > > > Lukas's patches will do that without
> > > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > sessions).
> > > 
> > > If the user wants only IDE, the AMD PSP's device_connect needs to be called
> > > and the host OS does not get to know the IDE keys. Other vendors allow
> > > programming IDE keys to the RC on the baremetal, and this also may co-exist
> > > > with a TSM running outside of Linux - the host still manages traffic classes
> > > and streams.
> > > 
> > > If the user wants TDISP for VMs, this assumes the user does not trust the
> > > host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
> > > 
> > > The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
> > > seem capable of co-existing, TDISP does not.
> > 
> > Which makes sense, TDISP is not designed to be used outside of the
> > TEE-IO VFs assigned to TVM use case.
> > 
> > > 
> > > However there are common bits.
> > > - certificates/measurements/reports blobs: storing, presenting to the
> > > userspace (results of device_connect and tdi_bind);
> > > - place where we want to authenticate the device and enable IDE
> > > (device_connect);
> > > - place where we want to bind TDI to a TVM (tdi_bind).
> > > 
> > > I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
> > > and a hack for VFIO PCI device to call tdi_bind.
> > > 
> > > The next steps:
> > > - expose blobs via configfs (like Dan did configfs-tsm);
> > > - s/tdisp.ko/coco.ko/;
> > > - ask the audience - what is missing to make it reusable for other vendors
> > > and uses?
> > 
> > The connect-bind-run flow is similar to the one we have defined for
> > RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
> > details, but nothing there is architectural and could somehow apply to
> > other architectures.
> 
> Yeah, it is good one!

Thanks. Comments and improvement proposals are welcome.

> I am still missing the need to have sbi_covg_start_interface() as a separate
> step though. Thanks,

Just to reiterate: start_interface is a guest call into the TSM, to let
it know that it accepts the TDI. That makes the TSM do two things:

1. Enable the MMIO and DMA mappings.
2. Move the TDI to RUN.

After that call, the TDI is usable from a TVM perspective. Before that
call it is not, but its configuration and state are locked.
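
As a rough illustration of that split (a toy model, not taken from any TSM
implementation; the class and method names are hypothetical), the TDI
lifecycle can be written as a small state machine in which the host drives
bind/unbind and only the TVM can trigger the move to RUN:

```python
from enum import Enum


class TdiState(Enum):
    UNLOCKED = "UNLOCKED"
    LOCKED = "LOCKED"
    RUN = "RUN"


class Tdi:
    """Toy model of one TDI as tracked by a TSM (all names are hypothetical)."""

    def __init__(self):
        self.state = TdiState.UNLOCKED
        self.mappings_enabled = False  # MMIO and DMA mappings for the TVM

    def bind(self, caller):
        # Host-driven: lock the configuration so the TVM can inspect it.
        if caller != "host" or self.state is not TdiState.UNLOCKED:
            raise PermissionError("bind: host only, from UNLOCKED")
        self.state = TdiState.LOCKED

    def start_interface(self, caller):
        # Guest-driven: the TVM accepts the TDI into its TCB.
        if caller != "tvm" or self.state is not TdiState.LOCKED:
            raise PermissionError("start_interface: TVM only, from LOCKED")
        self.mappings_enabled = True  # 1. enable the MMIO and DMA mappings
        self.state = TdiState.RUN     # 2. move the TDI to RUN

    def unbind(self, caller):
        # Host-driven: tear down and return to UNLOCKED from any state.
        if caller != "host":
            raise PermissionError("unbind: host only")
        self.mappings_enabled = False
        self.state = TdiState.UNLOCKED

    def usable_by_tvm(self):
        return self.state is TdiState.RUN and self.mappings_enabled
```

In this model a host call to start_interface() is refused, which is the
point of keeping it a separate, guest-initiated step.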

Cheers,
Samuel.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: TDISP enablement
  2023-11-13 15:10     ` Samuel Ortiz
@ 2023-11-14  0:57       ` Alexey Kardashevskiy
  2023-11-14 15:35         ` Samuel Ortiz
  0 siblings, 1 reply; 20+ messages in thread
From: Alexey Kardashevskiy @ 2023-11-14  0:57 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: linux-coco, kvm, linux-pci


On 14/11/23 02:10, Samuel Ortiz wrote:
> On Mon, Nov 13, 2023 at 05:46:35PM +1100, Alexey Kardashevskiy wrote:
>>
>> On 13/11/23 16:43, Samuel Ortiz wrote:
>>> Hi Alexey,
>>>
>>> On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
>>>> Hi everyone,
>>>>
>>>> Here is followup after the Dan's community call we had weeks ago.
>>>>
>>>> Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
>>>> confidential VMs without trusting the HV and with enabled IDE (encryption)
>>>> and IOMMU (performance, compared to current SWIOTLB). I am aware of other
>>>> uses and vendors and I spend hours unsuccessfully trying to generalize all
>>>> this in a meaningful way.
>>>>
>>>> The AMD SEV TIO verbs can be simplified as:
>>>>
>>>> - device_connect - starts CMA/SPDM session, returns measurements/certs, runs
>>>> IDE_KM to program the keys;
>>>> - device_reclaim - undo the connect;
>>>> - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states, generates
>>>> interface report;
>>>
>>>   From a VF to TVM use case, I think tdi_bind should only transition to
>>> LOCKED, but not RUN. RUN should only be reached once the TVM approves
>>> the device, and afaiu this is a host call.
>>
>> What is the point in separating these? What is that thing which requires the
>> device to be in LOCKED but not RUN state (besides the obvious
>> START_INTERFACE_REQUEST)?
> 
> Because they're two very different steps of the TDI assignment into a
> TVM.
> TDISP moves to RUN upon TVM accepting the TDI into its TCB.
> LOCKED is typically driven by the host, in order to lock the TDI
> configuration while the TVM verifies, attests, and accepts it into or
> rejects it from its TCB.
> 
> When the TSM moves the TDI to RUN, by TVM request, all IO paths (DMA and
> MMIO) are supposed to be functional. I understand most architectures
> have ways to prevent TDIs from accessing confidential memory
> regardless of their TDISP state, but a TDI in the RUN state should not
> be forbidden from DMA'ing the TVM confidential memory. Preventing it
> from doing so should be an error case, not the nominal flow.

There is always a driver which has to enable the device and tell it
where it can DMA to/from anyway, so the RUN state does not really let the
device start doing things once it is moved to RUN (except maybe P2P, but
this is not in our focus atm).


>>>> - tdi_info - read measurements/certs/interface report;
>>>> - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on the
>>>> parameters).
>>>
>>> That's equivalent to the TVM accepting the TDI, and this should
>>> transition the TDI from LOCKED to RUN.
>>
>> Even if the device was in RUN, it would not work until the validation is
>> done == RMP+IOMMU are updated by the TSM.
> 
> Right, and that makes sense from a security perspective. But a device in
> the RUN state will expect IO to work, because it's a TDISP semantic for
> it being accepted into the TVM and as such the TVM allowed access to its
> confidential memory.

I've read about RUN that "TDI resources are operational and permitted to 
be accessed and managed by the TVM". They are, the TDI setup is done at 
this point. It is the TVM's responsibility to request the RC side of 
things to be configured.


>> This may be different for other
>> architectures though, dunno. RMP == reverse map table, an SEV SNP thing used
>> for verifying memory accesses.
>>
>>
>>>> The first 4 called by the host OS, the last two by the TVM ("Trusted VM").
>>>> These are implemented in the AMD PSP (platform processor).
>>>> There are CMA/SPDM, IDE_KM, TDISP in use.
>>>>
>>>> Now, my strawman code does this on the host (I simplified a bit):
>>>> - after PCI discovery but before probing: walk through all TDISP-capable
>>>> (TEE-IO in PCIe caps) endpoint devices and call device_connect;
>>>
>>> Would the host call device_connect unconditionally for all TEE-IO device
>>> probed on the host? Wouldn't you want to do so only before the first
>>> tdi_bind for a TDI that belongs to the physical device?
>>
>>
>> Well, in the SEV TIO, device_connect enables IDE which has value for the
>> host on its own.
> 
> Ok, that makes sense to me. And the TSM would be responsible for
> supporting this. Then TDISP is exercised on a particular TDI for the
> device when this TDI is passed through to a specific TVM.
>
>>
>>>> - when drivers probe - it is all set up and the device measurements are
>>>> visible to the driver;
>>>> - when constructing a TVM, tdi_bind is called;
>>>
>>> Here as well, the tdi_bind could be asynchronous to e.g. support hot
>>> plugging TDIs into TVMs.
>>
>>
>> I do not really see a huge difference between starting a VM with already
>> bound TDISP device or hotplugging a device - either way the host calls
>> tdi_bind and it does not really care about what the guest is doing at that
>> moment and when the guest sees a TDISP device - it is always bound.
> 
> I agree. What I meant is that bind can be called at TVM construction
> time, or asynchronously whenever the host decides to attach a TDI to the
> previously constructed TVM.

+1.

>>>> and then in the TVM:
>>>> - after PCI discovery but before probing: walk through all TDIs (which will
>>>> have TEE IO bit set) and call tdi_info, verify the report, if ok - call
>>>> tdi_validate;
>>>
>>> By verify you mean verify the reported MMIO ranges? With support from
>>> the TSM?
>>
>> The tdi_validate call to the PSP FW (==TSM) asks the PSP to validate the
>> MMIO values and enable them in the RMP.
> 
> Sounds good.
> 
>>> We discussed that a few times, but the device measurements and
>>> attestation report should also be attested, i.e. run against a relying
>>> party. The kernel may not be the right place for that, and I'm proposing
>>> for the guest kernel to rely on a user space component and offload the
>>> attestation part to it. This userspace component would then
>>> synchronously return to the guest kernel with an attestation result.
>>
>> What bothers me here is that the userspace works when PCI is probed so when
>> the userspace is called for attestation - the device is up and running and
>> hosting the rootfs.
> 
> I guess you're talking about a use case where one would pass a storage
> device through, and that device would hold the guest rootfs?
> With the approach we're proposing, attestation would be optional and
> upon the kernel's decision. In that case, the kernel would not require
> userspace to run attestation (because there is no userspace...) but the
> actual guest attestation would still happen whenever the guest would
> want to fetch an attestation gated secret. And that attestation flow
> would include the storage device attestation report, because it's part
> of the guest TCB. So, eventually, the device would be attested, but not
> right when the device is attached to the guest.
> 
>> The userspace will need a knob which transitions the
>> device into the trusted state (switch SWIOTLB to direct DMA, for example). I
>> guess if the userspace is initramdisk, it could still reload the driver
>> which is not doing useful work just yet...
>>
>>
>>>> - when drivers probe - it is all set up and the driver decides if/which DMA
>>>> mode to use (SWIOTLB or direct), or panic().
>>>>
>>>
>>> When would it panic?
>>
>> When attestation failed.
> 
> Attestation failure should only trigger a rejection from the TVM, i.e.
> the TDI would not be probed. That should be reported back to the host,
> which may decide to call unbind on that TDI (and thus move it back to
> UNLOCKED).
> 
>>>> Uff. Too long already. Sorry. Now, go to the problems:
>>>>
>>>> If the user wants only CMA/SPDM,
>>>
>>> By user here, you mean the user controlling the host? Or the TVM
>>> user/owner? I assume the former.
>>
>> Yes, the physical host owner.
>>
>>>> Lukas's patches will do that without
>>>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
>>>> sessions).
>>>>
>>>> If the user wants only IDE, the AMD PSP's device_connect needs to be called
>>>> and the host OS does not get to know the IDE keys. Other vendors allow
>>>> programming IDE keys to the RC on the baremetal, and this also may co-exist
>>>> with a TSM running outside of Linux - the host still manages traffic classes
>>>> and streams.
>>>>
>>>> If the user wants TDISP for VMs, this assumes the user does not trust the
>>>> host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
>>>>
>>>> The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
>>>> seem capable of co-existing, TDISP does not.
>>>
>>> Which makes sense, TDISP is not designed to be used outside of the
>>> TEE-IO VFs assigned to TVM use case.
>>>
>>>>
>>>> However there are common bits.
>>>> - certificates/measurements/reports blobs: storing, presenting to the
>>>> userspace (results of device_connect and tdi_bind);
>>>> - place where we want to authenticate the device and enable IDE
>>>> (device_connect);
>>>> - place where we want to bind TDI to a TVM (tdi_bind).
>>>>
>>>> I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
>>>> and a hack for VFIO PCI device to call tdi_bind.
>>>>
>>>> The next steps:
>>>> - expose blobs via configfs (like Dan did configfs-tsm);
>>>> - s/tdisp.ko/coco.ko/;
>>>> - ask the audience - what is missing to make it reusable for other vendors
>>>> and uses?
>>>
>>> The connect-bind-run flow is similar to the one we have defined for
>>> RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
>>> details, but nothing there is architectural and could somehow apply to
>>> other architectures.
>>
>> Yeah, it is good one!
> 
> Thanks. Comments and improvement proposals are welcome.
> 
>> I am still missing the need to have sbi_covg_start_interface() as a separate
>> step though. Thanks,
> 
> Just to reiterate: start_interface is a guest call into the TSM, to let
> it know that it accepts the TDI. That makes the TSM do two things:
> 
> 1. Enable the MMIO and DMA mappings.
> 2. Move the TDI to RUN.
> 
> After that call, the TDI is usable from a TVM perspective. Before that
> call it is not, but its configuration and state are locked.

Right. I still wonder what bad thing can happen if we move to RUN before
starting the TVM (I suspect there is something), or is it all about
semantics (for the AMD TIO use case, at least)?


> Cheers,
> Samuel.

-- 
Alexey



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: TDISP enablement
  2023-11-14  0:57       ` Alexey Kardashevskiy
@ 2023-11-14 15:35         ` Samuel Ortiz
  2023-12-06  4:43           ` Dan Williams
  0 siblings, 1 reply; 20+ messages in thread
From: Samuel Ortiz @ 2023-11-14 15:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linux-coco, kvm, linux-pci

On Tue, Nov 14, 2023 at 11:57:34AM +1100, Alexey Kardashevskiy wrote:
> 
> On 14/11/23 02:10, Samuel Ortiz wrote:
> > On Mon, Nov 13, 2023 at 05:46:35PM +1100, Alexey Kardashevskiy wrote:
> > > 
> > > On 13/11/23 16:43, Samuel Ortiz wrote:
> > > > Hi Alexey,
> > > > 
> > > > On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> > > > > Hi everyone,
> > > > > 
> > > > > Here is followup after the Dan's community call we had weeks ago.
> > > > > 
> > > > > Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
> > > > > confidential VMs without trusting the HV and with enabled IDE (encryption)
> > > > > and IOMMU (performance, compared to current SWIOTLB). I am aware of other
> > > > > uses and vendors and I spend hours unsuccessfully trying to generalize all
> > > > > this in a meaningful way.
> > > > > 
> > > > > The AMD SEV TIO verbs can be simplified as:
> > > > > 
> > > > > - device_connect - starts CMA/SPDM session, returns measurements/certs, runs
> > > > > IDE_KM to program the keys;
> > > > > - device_reclaim - undo the connect;
> > > > > - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states, generates
> > > > > interface report;
> > > > 
> > > >   From a VF to TVM use case, I think tdi_bind should only transition to
> > > > LOCKED, but not RUN. RUN should only be reached once the TVM approves
> > > > the device, and afaiu this is a host call.
> > > 
> > > What is the point in separating these? What is that thing which requires the
> > > device to be in LOCKED but not RUN state (besides the obvious
> > > START_INTERFACE_REQUEST)?
> > 
> > Because they're two very different steps of the TDI assignment into a
> > TVM.
> > TDISP moves to RUN upon TVM accepting the TDI into its TCB.
> > LOCKED is typically driven by the host, in order to lock the TDI
> > > configuration while the TVM verifies, attests, and accepts it into or
> > > rejects it from its TCB.
> > 
> > When the TSM moves the TDI to RUN, by TVM request, all IO paths (DMA and
> > MMIO) are supposed to be functional. I understand most architectures
> > > have ways to prevent TDIs from accessing confidential memory
> > regardless of their TDISP state, but a TDI in the RUN state should not
> > be forbidden from DMA'ing the TVM confidential memory. Preventing it
> > from doing so should be an error case, not the nominal flow.
> 
> There is always a driver which has to enable the device and tell it where it
> can DMA to/from anyway so the RUN state does not really let the device start
> doing things once it is moved to RUN 

I agree. But setting RUN from the host means that the guest can start
configuring and using that device at any point in time, i.e. even before
any guest component could verify, validate and attest to the TDI. RUN is
precisely defined for that purpose: Telling the TDI that it should now
accept T-bit TLPs, and you want to do that *after* the TVM accepts the
TDI. Here, by having the host move the TDI to RUN, potentially before
the TVM has even booted, you're not giving the guest a chance to explicitly
accept the TDI.

> (except maybe P2P, but this is not in
> our focus atm).
> 
> 
> > > > > - tdi_info - read measurements/certs/interface report;
> > > > > - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on the
> > > > > parameters).
> > > > 
> > > > That's equivalent to the TVM accepting the TDI, and this should
> > > > transition the TDI from LOCKED to RUN.
> > > 
> > > Even if the device was in RUN, it would not work until the validation is
> > > done == RMP+IOMMU are updated by the TSM.
> > 
> > Right, and that makes sense from a security perspective. But a device in
> > the RUN state will expect IO to work, because it's a TDISP semantic for
> > it being accepted into the TVM and as such the TVM allowed access to its
> > confidential memory.
> 
> I've read about RUN that "TDI resources are operational and permitted to be
> accessed and managed by the TVM". They are, the TDI setup is done at this
> point. It is the TVM's responsibility to request the RC side of things to be
> configured.
> 
> 
> > > This may be different for other
> > > architectures though, dunno. RMP == reverse map table, an SEV SNP thing used
> > > for verifying memory accesses.
> > > 
> > > 
> > > > > The first 4 called by the host OS, the last two by the TVM ("Trusted VM").
> > > > > These are implemented in the AMD PSP (platform processor).
> > > > > There are CMA/SPDM, IDE_KM, TDISP in use.
> > > > > 
> > > > > Now, my strawman code does this on the host (I simplified a bit):
> > > > > - after PCI discovery but before probing: walk through all TDISP-capable
> > > > > (TEE-IO in PCIe caps) endpoint devices and call device_connect;
> > > > 
> > > > Would the host call device_connect unconditionally for all TEE-IO device
> > > > probed on the host? Wouldn't you want to do so only before the first
> > > > tdi_bind for a TDI that belongs to the physical device?
> > > 
> > > 
> > > Well, in the SEV TIO, device_connect enables IDE which has value for the
> > > host on its own.
> > 
> > Ok, that makes sense to me. And the TSM would be responsible for
> > supporting this. Then TDISP is exercised on a particular TDI for the
> > device when this TDI is passed through to a specific TVM.
> > 
> > > 
> > > > > - when drivers probe - it is all set up and the device measurements are
> > > > > visible to the driver;
> > > > > - when constructing a TVM, tdi_bind is called;
> > > > 
> > > > Here as well, the tdi_bind could be asynchronous to e.g. support hot
> > > > plugging TDIs into TVMs.
> > > 
> > > 
> > > I do not really see a huge difference between starting a VM with already
> > > bound TDISP device or hotplugging a device - either way the host calls
> > > tdi_bind and it does not really care about what the guest is doing at that
> > > moment and when the guest sees a TDISP device - it is always bound.
> > 
> > I agree. What I meant is that bind can be called at TVM construction
> > time, or asynchronously whenever the host decides to attach a TDI to the
> > previously constructed TVM.
> 
> +1.
> 
> > > > > and then in the TVM:
> > > > > - after PCI discovery but before probing: walk through all TDIs (which will
> > > > > have TEE IO bit set) and call tdi_info, verify the report, if ok - call
> > > > > tdi_validate;
> > > > 
> > > > By verify you mean verify the reported MMIO ranges? With support from
> > > > the TSM?
> > > 
> > > The tdi_validate call to the PSP FW (==TSM) asks the PSP to validate the
> > > MMIO values and enable them in the RMP.
> > 
> > Sounds good.
> > 
> > > > We discussed that a few times, but the device measurements and
> > > > attestation report should also be attested, i.e. run against a relying
> > > > party. The kernel may not be the right place for that, and I'm proposing
> > > > for the guest kernel to rely on a user space component and offload the
> > > > attestation part to it. This userspace component would then
> > > > synchronously return to the guest kernel with an attestation result.
> > > 
> > > What bothers me here is that the userspace works when PCI is probed so when
> > > the userspace is called for attestation - the device is up and running and
> > > hosting the rootfs.
> > 
> > I guess you're talking about a use case where one would pass a storage
> > device through, and that device would hold the guest rootfs?
> > With the approach we're proposing, attestation would be optional and
> > upon the kernel's decision. In that case, the kernel would not require
> > userspace to run attestation (because there is no userspace...) but the
> > actual guest attestation would still happen whenever the guest would
> > want to fetch an attestation gated secret. And that attestation flow
> > would include the storage device attestation report, because it's part
> > of the guest TCB. So, eventually, the device would be attested, but not
> > right when the device is attached to the guest.
> > 
> > > The userspace will need a knob which transitions the
> > > device into the trusted state (switch SWIOTLB to direct DMA, for example). I
> > > guess if the userspace is initramdisk, it could still reload the driver
> > > which is not doing useful work just yet...
> > > 
> > > 
> > > > > - when drivers probe - it is all set up and the driver decides if/which DMA
> > > > > mode to use (SWIOTLB or direct), or panic().
> > > > > 
> > > > 
> > > > When would it panic?
> > > 
> > > When attestation failed.
> > 
> > Attestation failure should only trigger a rejection from the TVM, i.e.
> > the TDI would not be probed. That should be reported back to the host,
> > which may decide to call unbind on that TDI (and thus move it back to
> > UNLOCKED).
> > 
> > > > > Uff. Too long already. Sorry. Now, go to the problems:
> > > > > 
> > > > > If the user wants only CMA/SPDM,
> > > > 
> > > > By user here, you mean the user controlling the host? Or the TVM
> > > > user/owner? I assume the former.
> > > 
> > > Yes, the physical host owner.
> > > 
> > > > > > Lukas's patches will do that without
> > > > > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > > > sessions).
> > > > > 
> > > > > If the user wants only IDE, the AMD PSP's device_connect needs to be called
> > > > > and the host OS does not get to know the IDE keys. Other vendors allow
> > > > > programming IDE keys to the RC on the baremetal, and this also may co-exist
> > > > > > with a TSM running outside of Linux - the host still manages traffic classes
> > > > > and streams.
> > > > > 
> > > > > If the user wants TDISP for VMs, this assumes the user does not trust the
> > > > > host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
> > > > > 
> > > > > The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
> > > > > seem capable of co-existing, TDISP does not.
> > > > 
> > > > Which makes sense, TDISP is not designed to be used outside of the
> > > > TEE-IO VFs assigned to TVM use case.
> > > > 
> > > > > 
> > > > > However there are common bits.
> > > > > - certificates/measurements/reports blobs: storing, presenting to the
> > > > > userspace (results of device_connect and tdi_bind);
> > > > > - place where we want to authenticate the device and enable IDE
> > > > > (device_connect);
> > > > > - place where we want to bind TDI to a TVM (tdi_bind).
> > > > > 
> > > > > I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
> > > > > and a hack for VFIO PCI device to call tdi_bind.
> > > > > 
> > > > > The next steps:
> > > > > - expose blobs via configfs (like Dan did configfs-tsm);
> > > > > - s/tdisp.ko/coco.ko/;
> > > > > - ask the audience - what is missing to make it reusable for other vendors
> > > > > and uses?
> > > > 
> > > > The connect-bind-run flow is similar to the one we have defined for
> > > > RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
> > > > details, but nothing there is architectural and could somehow apply to
> > > > other architectures.
> > > 
> > > Yeah, it is good one!
> > 
> > Thanks. Comments and improvement proposals are welcome.
> > 
> > > I am still missing the need to have sbi_covg_start_interface() as a separate
> > > step though. Thanks,
> > 
> > Just to reiterate: start_interface is a guest call into the TSM, to let
> > it know that it accepts the TDI. That makes the TSM do two things:
> > 
> > 1. Enable the MMIO and DMA mappings.
> > 2. Move the TDI to RUN.
> > 
> > After that call, the TDI is usable from a TVM perspective. Before that
> > call it is not, but its configuration and state are locked.
> Right. I still wonder what bad thing can happen if we move to RUN before
> starting the TVM (I suspect there is something), or is it all about
> semantics (for the AMD TIO use case, at least)?

It's not only about semantics, it's about ownership. By moving to RUN
before the TVM starts, you're basically saying the host decides if the
TDI is acceptable to the TVM or not. The TVM is responsible for making
that decision and does not trust the host VMM to do so on its behalf, at
least in the confidential computing threat model.

Is there any specific reason why you wouldn't move the TDI to RUN when
the SEV guest calls into the validate ABI?

Cheers,
Samuel.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: TDISP enablement
  2023-11-11 22:45         ` Dan Williams
@ 2023-11-24 14:52           ` Jonathan Cameron
  0 siblings, 0 replies; 20+ messages in thread
From: Jonathan Cameron @ 2023-11-24 14:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alexey Kardashevskiy, Lukas Wunner, linux-coco, kvm, linux-pci,
	Jonathan Cameron, suzuki.poulose

On Sat, 11 Nov 2023 14:45:54 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> >    
> > > >>> - tdi_info - read measurements/certs/interface report;    
> > > >>
> > > >> Does this return cached cert chains and measurements from the device
> > > >> or does it retrieve them anew?  (Measurements might have changed if
> > > >> MEAS_FRESH_CAP is supported.)
> > > >>
> > > >>    
> > > > >>> If the user wants only CMA/SPDM, Lukas's patches will do that without
> > > >>> the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > >>> sessions).    
> > > >>
> > > >> It can co-exist if the pci_cma_claim_ownership() library call
> > > >> provided by patch 12/12 is invoked upon device_connect.
> > > >>
> > > >> It would seem advantageous if you could delay device_connect
> > > >> until a device is actually passed through.  Then the OS can
> > > >> initially authenticate and measure devices and the PSP takes
> > > >> over when needed.    
> > > > 
> > > > Would that delay mean IDE isn't up - I think that wants to be
> > > > available whether or not pass through is going on.
> > > > 
> > > > Given potential restrictions on IDE resources, I'd expect to see an explicit
> > > > opt in from userspace on the host to start that process for a given
> > > > device.  (udev rule or similar might kick it off for simple setups).
> > > > 
> > > > Would that work for the flows described?    
> > > 
> > > This would work but my (likely wrong) intention was also to run 
> > > necessary setup in both host and guest at the same time before drivers 
> > > probe devices. And while delaying it in the host is fine (well, for us 
> > > in AMD, as we are aiming for CoCo/TDISP), in the guest this means less 
> > > flexibility in enlightening the PCI subsystem and the guest driver: 
> > > ideally (or at least initially) the driver is supposed to probe already 
> > > enabled and verified device, as otherwise it has to do SWIOTLB until the 
> > > userspace does the verification and kicks the driver to go proper direct 
> > > DMA (or reload the driver?).  
> > 
> > In the case of a guest getting a VF, there probably won't be any way for
> > the kernel to run any native attestation anyway, so policy would have to
> > rely on the CoCo paths. Kernel stuff Lukas has would just not try to attest
> > or claim anything about it. If a VF has a CMA capable DOE instance
> > then that's not there for IDE stuff at all, but for the guest to get
> > direct measurements etc without PSP or anything else getting involved
> > in which case the guest using that directly is a reasonable thing to do.  
> 
> Is that a practical reality that VFs are going to implement CMA?

Maybe?  The CMA definition allows for it.

> My expectation is CMA is a PF facility and the TSM retrieves
> measurements for TDIs through that. At least that seems to be a
> fundamental assumption of the TDISP specification. Given config-cycles
> are always host-mediated I expect guest CMA will always be a proxy
> whether it is a per-VF CMA interface or not.

There's a difference between proxying where we just pass the reads and writes
through unmodified as PCI config accesses and where we provide a userspace
interface to do it because we need to maintain locking etc vs any potential
host accesses. But sure, it's emulated even in this path.


> 
> >   
> > >   
> > > > Next bit probably has holes...  Key is that a lot of the checks
> > > > may fail, and it's up to host userspace policy to decide whether
> > > > to proceed (other policy in the secure VM side of things obviously)
> > > > 
> > > > So my rough thinking is - for the two options (IDE / TDISP)
> > > > 
> > > > Comparing with Alexey's flow I think only real difference is that
> > > > I call out explicit host userspace policy controls. I'd also like    
> > > 
> > > My imagination fails me :) What is the host supposed to do if the device 
> > > verification fails/succeeds later, and how much later, and the device is 
> > > a boot disk? Or is this userspace going to be limited to initramdisk? 
> > > What is that thing which we are protecting against? Or it is for CUDA 
> > > and such (which yeah, it can wait)?  
> > 
> > There are a bunch of non obvious cases indeed.  Hence make it all policy.
> > Though if you have a flow where verification is needed for boot disk and
> > it fails (and policy says that's not acceptable) then bad luck you
> > probably need to squirt a cert into your ramdisk or UEFI or similar.  
> 
> It seems policy mechanisms should be incrementally added as clear need
> for policy dictates, because that has ABI implications and
> kernel-dependency-on-userspace expectations.

Agreed, but I'd expect anything we implement in kernel to at least anticipate
that we may want policy.  If there are multiple possible sources of
verification I'd be very surprised if we didn't need controls on whether
we require all to pass, none to pass, one specific one to pass, or any one to pass.
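
To make those combinations concrete, a minimal sketch (Python for brevity).
Everything here is hypothetical: the Mode names, device_acceptable() and the
source names "host-cma" and "tsm" are made up for illustration and do not
come from any posted patch.

```python
from enum import Enum


class Mode(Enum):
    ALL = "all"            # every verification source must pass
    ANY = "any"            # at least one source must pass
    NONE = "none"          # no source is required to pass
    SPECIFIC = "specific"  # one named source must pass


def device_acceptable(results, mode, required=None):
    """results maps a source name to True (passed) or False (failed)."""
    if mode is Mode.NONE:
        return True
    if mode is Mode.ALL:
        # An empty result set does not count as "all passed".
        return bool(results) and all(results.values())
    if mode is Mode.ANY:
        return any(results.values())
    if mode is Mode.SPECIFIC:
        # A source that never reported is treated as a failure.
        return results.get(required, False)
    raise ValueError(f"unknown mode: {mode}")
```

For example, with results = {"host-cma": True, "tsm": False} the device is
acceptable under Mode.ANY and under Mode.SPECIFIC with required="host-cma",
but not under Mode.ALL.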

> 
> > > > to use similar interfaces to convey state to host userspace as
> > > > per Lukas' existing approaches.  Sure there will also be in
> > > > kernel interfaces for driver to get data if it knows what to do
> > > > with it.  I'd also like to enable the non tdisp flow to handle
> > > > IDE setup 'natively' if that's possible on particular hardware.
> > > > 
> > > > 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
> > > >     a failure in general so reject device - or it might decide it's up to
> > > >     the PSP etc.   (userspace can see if it succeeded)
> > > >     I'd argue host software can launch this at any time.  It will
> > > >     be a denial of service attack but so are many other things the host
> > > >     can do.    
> > > 
> > > Trying to visualize it in my head - policy is a kernel cmdline or module 
> > > parameter?  
> > 
> > Neither - it's bind not happening until userspace decides to kick it off.
> > The module could provide its own policy on top of this - so userspace
> > could defer to that if it makes sense (so bind but rely on probe failing
> > if policy not met).  
> 
> udev module policy can already gate binding, it's not clear new policy
> mechanism is needed here.

Yeah, I think that works - but to be sure definitely want to see a PoC.

...

Jonathan



* Re: TDISP enablement
  2023-11-10 23:30     ` Dan Williams
@ 2023-11-24 16:25       ` Jonathan Cameron
  0 siblings, 0 replies; 20+ messages in thread
From: Jonathan Cameron @ 2023-11-24 16:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Lukas Wunner, Alexey Kardashevskiy, linux-coco, kvm, linux-pci,
	Jonathan Cameron, suzuki.poulose

On Fri, 10 Nov 2023 15:30:57 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Wed, 1 Nov 2023 08:27:17 +0100
> > Lukas Wunner <lukas@wunner.de> wrote:
> > 
> > Thanks Alexey, this is a great discussion to kick off.
> >   
> > > On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:  
> > > > - device_connect - starts CMA/SPDM session, returns measurements/certs,
> > > > runs IDE_KM to program the keys;    
> > > 
> > > Does the PSP have a set of trusted root certificates?
> > > If so, where does it get them from?
> > > 
> > > If not, does the PSP just blindly trust the validity of the cert chain?
> > > Who validates the cert chain, and when?
> > > Which slot do you use?
> > > Do you return only the cert chain of that single slot or of all slots?
> > > Does the PSP read out all measurements available?  This may take a while
> > > if the measurements are large and there are a lot of them.  
> > 
> > I'd definitely like there to be a path for certs and measurement to be
> > checked by the Host OS (for the non TDISP path). Whether the
> > policy setup cares about result is different question ;)
> >   
> > > 
> > >   
> > > > - tdi_info - read measurements/certs/interface report;    
> > > 
> > > Does this return cached cert chains and measurements from the device
> > > or does it retrieve them anew?  (Measurements might have changed if
> > > MEAS_FRESH_CAP is supported.)
> > > 
> > >   
> > > > If the user wants only CMA/SPDM, Lukas' patches will do that without
> > > > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > > sessions).    
> > > 
> > > It can co-exist if the pci_cma_claim_ownership() library call
> > > provided by patch 12/12 is invoked upon device_connect.
> > > 
> > > It would seem advantageous if you could delay device_connect
> > > until a device is actually passed through.  Then the OS can
> > > initially authenticate and measure devices and the PSP takes
> > > over when needed.  
> > 
> > Would that delay mean IDE isn't up - I think that wants to be
> > available whether or not pass through is going on.
> > 
> > Given potential restrictions on IDE resources, I'd expect to see an explicit
> > opt in from userspace on the host to start that process for a given
> > device.  (udev rule or similar might kick it off for simple setups).
> > 
> > Would that work for the flows described?  
> > 
> > Next bit probably has holes...  Key is that a lot of the checks
> > may fail, and it's up to host userspace policy to decide whether
> > to proceed (other policy in the secure VM side of things obviously)
> > 
> > So my rough thinking is - for the two options (IDE / TDISP)
> > 
> > Comparing with Alexey's flow I think only real difference is that
> > I call out explicit host userspace policy controls. I'd also like
> > to use similar interfaces to convey state to host userspace as
> > per Lukas' existing approaches.  Sure there will also be in
> > kernel interfaces for driver to get data if it knows what to do
> > with it.  I'd also like to enable the non tdisp flow to handle
> > IDE setup 'natively' if that's possible on particular hardware.  
> 
> Are there any platforms that have IDE host capability that are not also
> shipping a TSM? I know that some platforms allow for either the TSM or
> the OS to own that setup, but there are no standards there. I am not
> opposed to the native path, but given a cross-vendor "TSM" concept is
> needed and that a TSM is likely available on all IDE capable platforms
> it seems reasonable for Linux to rely on TSM managed IDE for the near
> term if not the long term as well.

Just for completeness, (I mentioned it in the LPC discussion):
IDE might well be link based between a switch inside the chassis and devices
outside the chassis in which case it is all standards defined and the host
isn't involved.  Not TDISP related though in that case.

> 
> > 
> > 1. Host has a go at CMA/SPDM. Policy might say that a failure here is
> >    a failure in general so reject device - or it might decide it's up to
> >    the PSP etc.   (userspace can see if it succeeded)
> >    I'd argue host software can launch this at any time.  It will
> >    be a denial of service attack but so are many other things the host
> >    can do.
> > 2. TDISP policy decision from host (userspace policy control)
> >    Need to know end goal.  
> 
> If the TSM owns the TDISP state what this policy decision really comes
> down to is IDE stream resource management, I otherwise struggle to
> conceptualize "TDISP policy".
> 
> The policy is userspace deciding to assign an interface to a TVM, and
> that TVM requests that the assigned interface be allowed to access
> private memory. So it's not necessarily TDISP policy, it's whether the assigned
> interface is allowed to transition to private operation.
Agreed - that is probably enough. I was avoiding calling out specific
policy method, just don't want it to all flow through in the kernel without
a hook.  If we assume that we do stuff only when allocated to a TVM
then that acts as the gate.

> 
> > 3. IDE opt in from userspace.  Policy decision.
> >   - If not TDISP 
> >     - device_connect(IDE ONLY) - bunch of proxying in host OS.
> >     - Cert chain and measurements presented to host, host can then check if
> >       it is happy and expose for next policy decision.
> >     - Hooks exposed for host to request more measurements, key refresh etc.
> >       Idea being that the flow is host driven with PSP providing required
> >       services.  If host can just do setup directly that's fine too.
> >   - If TDISP (technically you can run tdisp from host, but lets assume
> >     for now no one wants to do that? (yet)).
> >     - device_connect(TDISP) - bunch of proxying in host OS.
> >     - Cert chain and measurements presented to host, host can then check if
> >       it is happy and expose for next policy decision.
> > 
> > 4. Flow after this depends on early or late binding (lockdown)
> >    but could load driver at this point.  Userspace policy.
> >    tdi-bind etc.  
> 
> It is valid to load the driver and operate the device in shared mode, so
> I am not sure that acceptance should gate driver loading. It also seems
> like something that could be managed with module policy if someone
> wanted to prevent shared operation before acceptance.

Indeed that might work.  Depends on device and whether it needs to be exposed
in shared mode (which may well require driver code auditing etc that can be relaxed
if it's up with TDISP and we know it's not a 'fake').

> 
> [..]
> > > > The next steps:
> > > > - expose blobs via configfs (like Dan did configfs-tsm);  
> 
> I am missing the context here, but for measurements I think those are
> better in sysfs. configfs was only to allow for multiple containers to grab
> attestation reports, measurements are device local and containers can
> all see the same measurements.

Ah. Fair point.

> 
> > > > - s/tdisp.ko/coco.ko/;  
> 
> My bikeshed contribution, perhaps tsm.ko? I am still not someone who can
> say "coco" for confidential computing with a straight face.

Then definitely should be coco.ko :)

Jonathan


* Re: TDISP enablement
  2023-11-14 15:35         ` Samuel Ortiz
@ 2023-12-06  4:43           ` Dan Williams
  0 siblings, 0 replies; 20+ messages in thread
From: Dan Williams @ 2023-12-06  4:43 UTC (permalink / raw)
  To: Samuel Ortiz, Alexey Kardashevskiy; +Cc: linux-coco, kvm, linux-pci

Samuel Ortiz wrote:
[..]
> > There is always a driver which has to enable the device and tell it where it
> > can DMA to/from anyway so the RUN state does not really let the device start
> > doing things once it is moved to RUN 
> 
> I agree. But setting RUN from the host means that the guest can start
> configuring and using that device at any point in time, i.e. even before
> any guest component could verify, validate and attest to the TDI. RUN is
> precisely defined for that purpose: Telling the TDI that it should now
> accept T-bit TLPs, and you want to do that *after* the TVM accepts the
> TDI. Here, by having the host move the TDI to RUN, potentially even before
> the TVM has even booted, you're not giving the guest a chance to explicitly
> accept the TDI.

I wanted to circle back to this to agree about allowing the guest to
control the transition from LOCKED to RUN. Recall the Plumbers
conversation where I mentioned TDX moving closer to TIO to streamline
the common TSM interface in Linux, and foreshadowing other vendors
making similar concessions. This is an example where the "as simple as
possible, but no simpler" threshold looks to have been crossed.

TDX, like COVE, allows the guest to trigger the LOCKED to RUN transition. For
vendor alignment purposes this looks like an opportunity for TIO to
enable the same and prevent a vendor-specific semantic difference in the
TSM common infrastructure.
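
As a toy model of that split (Python for brevity): the host stops at
CONFIG_LOCKED and only the guest moves the TDI on to RUN. The state names
follow TDISP; the Tdi class and its host_bind/guest_accept/host_unbind
methods are hypothetical and are not the TIO, TDX or COVE ABI.

```python
from enum import Enum


class TdiState(Enum):
    CONFIG_UNLOCKED = "CONFIG_UNLOCKED"
    CONFIG_LOCKED = "CONFIG_LOCKED"
    RUN = "RUN"
    ERROR = "ERROR"  # part of the TDISP state set, not exercised here


class Tdi:
    """Toy model of one TDI; not a real TSM or PSP interface."""

    def __init__(self):
        self.state = TdiState.CONFIG_UNLOCKED

    def host_bind(self):
        # Host side: lock the configuration and stop there.
        if self.state is not TdiState.CONFIG_UNLOCKED:
            raise RuntimeError(f"cannot bind from {self.state.value}")
        self.state = TdiState.CONFIG_LOCKED

    def guest_accept(self, report_ok):
        # Guest side: only the TVM, after checking the interface report,
        # moves the TDI from CONFIG_LOCKED to RUN.
        if self.state is not TdiState.CONFIG_LOCKED:
            raise RuntimeError(f"cannot accept from {self.state.value}")
        if report_ok:
            self.state = TdiState.RUN
        return self.state is TdiState.RUN

    def host_unbind(self):
        # Tear down: back to CONFIG_UNLOCKED from any state.
        self.state = TdiState.CONFIG_UNLOCKED
```

In this model a TDI that the guest has not accepted stays in CONFIG_LOCKED,
so the host binding it before the TVM boots does not by itself make it usable.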

[..]
[include Samuel's further justification that I also Ack]
> > > After that call, the TDI is usable from a TVM perspective. Before that
> > > call it is not, but its configuration and state are locked.
> > Right. I still wonder what bad thing can happen if we move to RUN before
> > starting the TVM (I suspect there is something), or it is all about
> > semantics (for the AMD TIO usecase, at least)?
> 
> It's not only about semantics, it's about ownership. By moving to RUN
> before the TVM starts, you're basically saying the host decides if the
> TDI is acceptable by the TVM or not. The TVM is responsible for making
> that decision and does not trust the host VMM to do so on its behalf, at
> least in the confidential computing threat model.
> 
> Is there any specific reason why you wouldn't move the TDI to RUN when
> the SEV guest calls into the validate ABI?
> 
> Cheers,
> Samuel.
> 




end of thread, other threads:[~2023-12-06  4:43 UTC | newest]

Thread overview: 20+ messages
2023-10-31 22:56 TDISP enablement Alexey Kardashevskiy
2023-10-31 23:40 ` Dionna Amalie Glaze
2023-11-01  7:38   ` Lukas Wunner
2023-11-01  7:27 ` Lukas Wunner
2023-11-01 11:05   ` Jonathan Cameron
2023-11-02  2:28     ` Alexey Kardashevskiy
2023-11-03 16:44       ` Jonathan Cameron
2023-11-11 22:45         ` Dan Williams
2023-11-24 14:52           ` Jonathan Cameron
2023-11-10 23:38       ` Dan Williams
2023-11-10 23:30     ` Dan Williams
2023-11-24 16:25       ` Jonathan Cameron
2023-11-13  6:04     ` Samuel Ortiz
2023-11-01 11:43   ` Alexey Kardashevskiy
2023-11-13  5:43 ` Samuel Ortiz
2023-11-13  6:46   ` Alexey Kardashevskiy
2023-11-13 15:10     ` Samuel Ortiz
2023-11-14  0:57       ` Alexey Kardashevskiy
2023-11-14 15:35         ` Samuel Ortiz
2023-12-06  4:43           ` Dan Williams
