Linux Confidential Computing Development


* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Jason Gunthorpe @ 2026-05-19 16:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Mostafa Saleh, iommu, linux-arm-kernel, linux-kernel, linux-coco,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5a8q9fs7ud.fsf@kernel.org>

On Tue, May 19, 2026 at 09:35:30PM +0530, Aneesh Kumar K.V wrote:
> Yes, that also resulted in simpler and cleaner code.
> 
> swiotlb_tbl_map_single
> 	/*
> 	 * If the physical address is encrypted but the device requires
> 	 * decrypted DMA, use a decrypted io_tlb_mem and update the
> 	 * attributes so the caller knows that a decrypted io_tlb_mem
> 	 * was used.
> 	 */
> 	if (!(*attrs & DMA_ATTR_CC_SHARED) && force_dma_unencrypted(dev))
> 		*attrs |= DMA_ATTR_CC_SHARED;
> 
> 	if (mem->unencrypted != !!(*attrs & DMA_ATTR_CC_SHARED))
> 		return (phys_addr_t)DMA_MAPPING_ERROR;

Yeah, exactly, that is so much clearer now that mem->unencrypted is
tied directly.

That logic is reversed though, the incoming ATTR_CC doesn't matter for
swiotlb, that is just the source of the memcpy.

/* swiotlb pool is incorrect for this device */
if (mem->unencrypted != force_dma_unencrypted(dev))
    return (phys_addr_t)DMA_MAPPING_ERROR;

/* Force attrs to match the kind of memory in the pool */
if (mem->unencrypted)
     *attrs |= DMA_ATTR_CC_SHARED;
else
     *attrs &= ~DMA_ATTR_CC_SHARED;


Attrs should be forced to whatever memory swiotlb selected.

Jason
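
The ordering suggested above can be modeled in user space. This is only a
sketch under stated assumptions: `struct device`, `struct io_tlb_mem`, the
value of the attribute bit, and the helper name `swiotlb_pool_check()` are
stand-ins invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel definitions; the values are illustrative only. */
typedef unsigned long long phys_addr_t;
#define DMA_ATTR_CC_SHARED	(1UL << 0)	/* hypothetical bit position */
#define DMA_MAPPING_ERROR	(~(phys_addr_t)0)

struct device { bool needs_unencrypted; };	/* mock device */
struct io_tlb_mem { bool unencrypted; };	/* mock pool descriptor */

/* Mock: the real helper is architecture specific. */
static bool force_dma_unencrypted(const struct device *dev)
{
	return dev->needs_unencrypted;
}

/*
 * Hypothetical helper: returns 0 if the pool suits the device, or
 * DMA_MAPPING_ERROR on a mismatch.  The incoming value of the
 * DMA_ATTR_CC_SHARED bit is ignored; on success it is overwritten
 * to describe the pool memory.
 */
static phys_addr_t swiotlb_pool_check(const struct device *dev,
				      const struct io_tlb_mem *mem,
				      unsigned long *attrs)
{
	/* swiotlb pool is incorrect for this device */
	if (mem->unencrypted != force_dma_unencrypted(dev))
		return DMA_MAPPING_ERROR;

	/* Force attrs to match the kind of memory in the pool */
	if (mem->unencrypted)
		*attrs |= DMA_ATTR_CC_SHARED;
	else
		*attrs &= ~DMA_ATTR_CC_SHARED;

	return 0;
}
```

The point of the model is that the decision depends only on the pool and the
device, so a caller passing either value of the attribute bit gets the same
result.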


* Re: [PATCH v9 14/23] x86/virt/seamldr: Shut down the current TDX module
From: Dave Hansen @ 2026-05-19 16:24 UTC (permalink / raw)
  To: Chao Gao, Edgecombe, Rick P
  Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Li, Xiaoyao, Huang, Kai,
	Zhao, Yan Y, dave.hansen@linux.intel.com, kas@kernel.org,
	Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
	binbin.wu@linux.intel.com, Verma, Vishal L, nik.borisov@suse.com,
	mingo@redhat.com, Weiny, Ira, tony.lindgren@linux.intel.com,
	Annapurve, Vishal, Shahar, Sagi, djbw@kernel.org, tglx@kernel.org,
	paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
	yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <agxSAsUvgcHj/Ywl@intel.com>

On 5/19/26 05:05, Chao Gao wrote:
>>  Why not just WARN_ON_ONCE(get_tdx_sys_info_handoff(&handoff));
>>  And we can drop the ret var. Save 2 LOC.
> Dave had a different preference here:
> 
> https://lore.kernel.org/kvm/8b9d7fa7-6534-48e7-a4fa-c21260b1c762@intel.com/

I almost never optimize for lines of code.

The _only_ reason to worry about it is when you have a chunk of logic
that's having issues fitting on a "screen". There, squishing a few lines
together can mean the difference between seeing a whole loop on one
screen or having to page around.

But, at the point you're doing *that*, you probably need to think about
refactoring anyway.


* SVSM Development Call May 20th, 2026
From: Jörg Rödel @ 2026-05-19 18:05 UTC (permalink / raw)
  To: coconut-svsm, linux-coco

Hi,

Here is the call for agenda items for this week's SVSM development call.  Please
send any agenda items you have in mind as a reply to this email or raise them
in the meeting.

We will use the LF Zoom instance. Details of the meeting can be found in our
governance repository at:

	https://github.com/coconut-svsm/governance

The link to the COCONUT-SVSM calendar is:

	https://zoom-lfx.platform.linuxfoundation.org/meetings/coconut-svsm?view=week

The meeting will be recorded and the recording eventually published.

Regards,

	Jörg


* Re: [PATCH v2 15/15] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
From: Huang, Kai @ 2026-05-20  0:59 UTC (permalink / raw)
  To: seanjc@google.com
  Cc: dwmw2@infradead.org, Edgecombe, Rick P, x86@kernel.org,
	kas@kernel.org, binbin.wu@linux.intel.com,
	dave.hansen@linux.intel.com, vkuznets@redhat.com, paul@xen.org,
	yosry@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <agx77QB3UVmJr5xP@google.com>

On Tue, 2026-05-19 at 08:04 -0700, Sean Christopherson wrote:
> On Tue, May 19, 2026, Kai Huang wrote:
> > > @@ -12712,19 +11913,8 @@ static void store_regs(struct kvm_vcpu *vcpu)
> > >  
> > >  static int sync_regs(struct kvm_vcpu *vcpu)
> > >  {
> > > -	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
> > > -		__set_regs(vcpu, &vcpu->run->s.regs.regs);
> > > -		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
> > > -	}
> > > -
> > > -	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
> > > -		struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
> > > -
> > > -		if (__set_sregs(vcpu, &sregs))
> > > -			return -EINVAL;
> > > -
> > > -		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
> > > -	}
> > > +	if (kvm_run_set_regs(vcpu))
> > > +		return -EINVAL;
> > 
> > Nit:
> > 
> > Do you think 'kvm_run_sync_regs()' is better than 'kvm_run_set_regs()'?
> > 
> > Because I think "sync" reflects better that vcpu->run->kvm_dirty_regs is cleared
> > after the "set" operation.
> 
> The problem I have with "sync" is that it doesn't communicate the direction of
> the sync.  What about kvm_run_sync_regs_{to,from}_user()?

Yeah that's better to me too.
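
To make the direction-explicit naming concrete, here is a small user-space
model of the two directions. It is a sketch only: `struct run_model`,
`struct vcpu_model`, the `SYNC_*` bits, and the rule that a negative sregs
value is "invalid" are all made up for illustration and do not reflect KVM's
real structures or validation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SYNC_REGS	(1u << 0)	/* stand-in for KVM_SYNC_X86_REGS */
#define SYNC_SREGS	(1u << 1)	/* stand-in for KVM_SYNC_X86_SREGS */

/* Mock of the shared run area that userspace reads and writes. */
struct run_model {
	uint64_t dirty_regs;	/* set by userspace: "I changed these" */
	uint64_t valid_regs;	/* set by userspace: "please fill these" */
	int regs;
	int sregs;
};

struct vcpu_model {
	struct run_model run;
	int regs;
	int sregs;
};

/* userspace -> vCPU: consume dirty state and clear the dirty bits. */
static int sync_regs_from_user(struct vcpu_model *v)
{
	if (v->run.dirty_regs & SYNC_REGS) {
		v->regs = v->run.regs;
		v->run.dirty_regs &= ~(uint64_t)SYNC_REGS;
	}

	if (v->run.dirty_regs & SYNC_SREGS) {
		if (v->run.sregs < 0)	/* pretend negative means invalid */
			return -EINVAL;
		v->sregs = v->run.sregs;
		v->run.dirty_regs &= ~(uint64_t)SYNC_SREGS;
	}
	return 0;
}

/* vCPU -> userspace: publish only what userspace asked for. */
static void sync_regs_to_user(struct vcpu_model *v)
{
	if (v->run.valid_regs & SYNC_REGS)
		v->run.regs = v->regs;
	if (v->run.valid_regs & SYNC_SREGS)
		v->run.sregs = v->sregs;
}
```

With both suffixes spelled out, a reader of the call site can tell which side
is the source without opening the helper.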

> 
> > >  
> > >  	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
> > >  		struct kvm_vcpu_events events = vcpu->run->s.regs.events;
> > 
> > Also, I wonder whether it's better to add a helper for events so sync_regs() and
> > store_regs() can be simplified to:
> > 
> > static int sync_regs(struct kvm_vcpu *vcpu)
> > {
> > 	if (kvm_run_sync_regs(vcpu))
> > 		return -EINVAL;
> > 	return kvm_run_sync_events(vcpu);
> > }
> > 
> > static void store_regs(struct kvm_vcpu *vcpu)
> > {
> > 	kvm_run_get_regs(vcpu);
> > 	kvm_run_get_events(vcpu);
> > }
> > 
> > And maybe 'kvm_run_get_regs()' could be 'kvm_run_store_regs()' too, so that the
> > store_regs() could be:
> > 
> > static void store_regs(struct kvm_vcpu *vcpu)
> > {
> > 	kvm_run_store_regs(vcpu);
> > 	kvm_run_store_events(vcpu);
> > }
> 
> {store,sync}_regs() look pretty, but IMO the overall code is uglier because we
> end up with super small helpers that have one caller, e.g.
> 
>   static void kvm_run_sync_events_to_user(struct kvm_vcpu *vcpu)
>   {
> 	if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_EVENTS)
> 		kvm_vcpu_ioctl_x86_get_vcpu_events(vcpu, &vcpu->run->s.regs.events);
>   }
> 
>   static void store_regs(struct kvm_vcpu *vcpu)
>   {
> 	BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
> 
> 	kvm_run_sync_regs_to_user(vcpu);
> 	kvm_run_sync_events_to_user(vcpu);
>   }
> 
> For me, the extra "jump" is undesirable, but it allows burying __{g,s}et_{s,}regs()
> in regs.c, and so is a net positive for registers.  But for events, it's pure
> overhead.

Sure.

Just wondering whether we might want to move events handling to some
other C file since you are cleaning up x86.c?  But we can deal with this when it
happens.

> 
> > > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > > index 185062a26924..fd55cd031b1c 100644
> > > --- a/arch/x86/kvm/x86.h
> > > +++ b/arch/x86/kvm/x86.h
> > > @@ -414,6 +414,7 @@ int handle_ud(struct kvm_vcpu *vcpu);
> > >  
> > >  void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
> > >  				   struct kvm_queued_exception *ex);
> > > +void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu);
> > >  
> > >  int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
> > >  int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
> > > @@ -604,6 +605,7 @@ static inline void kvm_machine_check(void)
> > >  int kvm_spec_ctrl_test_value(u64 value);
> > >  int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
> > >  			      struct x86_exception *e);
> > > +void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid);
> > > 
> > 
> > If I read correctly, this is because "regs.c" calls kvm_invalidate_pcid() but you
> > want to keep it in x86.c.  But it seems the "x86.h" isn't included by "regs.c"
> > directly but via other headers ("mmu.h" does include "x86.h").
> > 
> > Should the "regs.c" include "x86.h" directly?
> 
> Oh, yeah, I just goofed that.
> 
> > Btw, I am a bit confused about the relationship between "x86.h" and other headers like
> > "mmu.h" and the new "regs.h".  That is, headers like "mmu.h" include "x86.h",
> > but headers like "regs.h" do not (instead, "x86.h" includes them).
> 
> Heh, don't look for a theme/plan, because there isn't one.  Over the years, x86.h
> and x86.c became dumping grounds for everything that didn't have an obvious home,
> and so there aren't real "rules".

My guess too.

> 
> Hmm, though looking at all of this again, I think we're actually quite close to
> having somewhat sane rules.  Over the past few years, I've tried multiple times
> to move what I felt should be KVM-internal structures from asm/kvm_host.h to x86.h,
> and I've failed miserably every time because inevitably even the most innocuous
> struct manages to have usage that leads to cyclical header dependencies and/or is
> used by arch-neutral KVM code.

The problem is some other kernel code includes <linux/kvm_host.h> (which in turn
includes <asm/kvm_host.h>) but the KVM internal structures have nothing to do
with them.

E.g., some drivers are using <linux/kvm_host.h>:

#$ grep kvm_host.h drivers/ -Rn
drivers/vfio/pci/vfio_pci_zdev.c:14:#include <linux/kvm_host.h>
drivers/vfio/vfio_main.c:20:#include <linux/kvm_host.h>
drivers/firmware/arm_sdei.c:19:#include <linux/kvm_host.h>
drivers/hwtracing/coresight/coresight-trbe.c:20:#include <linux/kvm_host.h>
drivers/hwtracing/coresight/coresight-etm4x-core.c:10:#include <linux/kvm_host.h>
drivers/s390/crypto/vfio_ap_ops.c:17:#include <linux/kvm_host.h>
drivers/s390/crypto/vfio_ap_private.h:20:#include <linux/kvm_host.h>

But looking at them, AFAICT what they need is only some structure declarations
(e.g., 'struct kvm;') for type safety (plus some function declarations), but
don't actually need to see the actual structure.

For x86, AFAICT only "arch/x86/events/intel/core.c" actually uses
'struct kvm_pmu', though.  I haven't checked whether other architectures have
cases that actually need to use any structure.

> 
> I think it's probably time to admit I've been looking at the asm/kvm_host.h vs.
> x86.h split all wrong, i.e. finally give up on moving structures out of kvm_host.h,
> and do the exact opposite: commit to using kvm_host.h to define and declare widely
> used structures.

If the structure(s) are only used within arch/x86/kvm/, it doesn't seem right to
define them in asm/kvm_host.h?

> 
> Because literally the only reason that x86.h doesn't include mmu.h is that mmu.h
> references struct kvm_host, which is currently defined in x86.h.  
> 

Yes. But I wouldn't worry about this too much since it's a small thing we can
always find a way to fix.  E.g., we can move kvm_mmu_max_gfn() out of "mmu.h"
(with a renaming perhaps).

> If we "fix"
> that, then (a) we can make x86.h the "central" include everyone expects it to be,
> and (b) it can be the start of a cleanup of asm/kvm_host.h and a big step towards
> defining maintainable "rules" for what goes where.  E.g. there are a pile of
> functional declarations in asm/kvm_host.h that can live elsewhere; if we trim
> those down, then the rules become:
> 
>   - asm/kvm_host.h holds "common" structure definitions and associated key global
>     variables, and things that are referenced by arch-neutral KVM.

It's a bit weird the arch-neutral KVM code needs to reference variables in
asm/kvm_host.h, and I am afraid the "common" structure definitions will
effectively be a lot of structures only used by arch/x86/kvm/.  

Which isn't necessarily a bad thing, from the perspective we might finally clean
this up by a giant move.

E.g., <linux/kvm_types.h> is already used by other kernel components where they
don't need <linux/kvm_host.h>.  Ideally, maybe eventually we can use
<linux/kvm_types.h> and <asm/kvm_types.h> for things needed by other kernel
components, or keep <linux/kvm_host.h> and <asm/kvm_host.h> minimal after moving
the majority of things to some KVM internal headers.

E.g., maybe:

  virt/kvm/include/kvm_host.h
  arch/x86/kvm/kvm_host.h (can even be merged to x86.h)

I think the problem is that "struct kvm_arch" and "struct kvm_vcpu_arch" are
not pointers but fully embedded structures in "struct kvm" and "struct
kvm_vcpu" respectively.  That means you need to keep the actual structure
definitions of "struct kvm_arch" and "struct kvm_vcpu_arch" in asm/kvm_host.h,
which in turn forces a lot of structures only used by arch/x86/kvm/ to stay in
asm/kvm_host.h.

I am not sure whether there's a mandatory requirement that "struct kvm_arch" and
"struct kvm_vcpu_arch" must be fully embedded, and it would be kinda painful to
convert to a pointer (e.g., there's kvm_x86_ops::vm_size), but perhaps that is
also an option to consider?
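
The difference between the two layouts can be shown with a toy example. This
is a sketch with invented names (`struct arch_private`, `struct vm_by_pointer`,
`struct vm_embedded`, `vm_create()`, `vm_destroy()`); none of them exist in
KVM. A pointer member compiles against a forward declaration, while an
embedded member needs the full definition to be visible first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Forward declaration only: enough for a pointer member. */
struct arch_private;

struct vm_by_pointer {
	int id;
	struct arch_private *arch;	/* incomplete type is fine here */
};

/* An embedded member needs the complete type before it is used. */
struct arch_embedded {
	unsigned long mmu_state;
	unsigned long apic_state;
};

struct vm_embedded {
	int id;
	struct arch_embedded arch;	/* the definition above is mandatory */
};

/* The "private" definition can live after (or away from) its users. */
struct arch_private {
	unsigned long mmu_state;
	unsigned long apic_state;
};

/* The pointer layout costs a second allocation and an extra failure path. */
static struct vm_by_pointer *vm_create(int id)
{
	struct vm_by_pointer *vm = calloc(1, sizeof(*vm));

	if (!vm)
		return NULL;

	vm->arch = calloc(1, sizeof(*vm->arch));
	if (!vm->arch) {
		free(vm);
		return NULL;
	}
	vm->id = id;
	return vm;
}

static void vm_destroy(struct vm_by_pointer *vm)
{
	if (!vm)
		return;
	free(vm->arch);
	free(vm);
}
```

The trade-off in this model is visible in `vm_create()`: hiding the definition
buys header independence at the price of an extra allocation and an extra
pointer dereference on every access.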

>   - <area>.{c,h} holds relevant declarations and definitions.
>   - x86.{c,h} is the kitchen sink for everything else.

Yeah the two are reasonable to me.


* Re: [PATCH v2 15/15] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
From: Sean Christopherson @ 2026-05-20  1:25 UTC (permalink / raw)
  To: Kai Huang
  Cc: dwmw2@infradead.org, Rick P Edgecombe, x86@kernel.org,
	kas@kernel.org, binbin.wu@linux.intel.com,
	dave.hansen@linux.intel.com, vkuznets@redhat.com, paul@xen.org,
	yosry@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <729c4191d16e4c768c231ffb9bb8420306039210.camel@intel.com>

On Wed, May 20, 2026, Kai Huang wrote:
> On Tue, 2026-05-19 at 08:04 -0700, Sean Christopherson wrote:
> > On Tue, May 19, 2026, Kai Huang wrote:
> Just wondering is it possible we might want to move events handling to some
> other C file since you are cleanup x86.c?  But we can deal with this when it
> happens.

Events are a hard one.  There's a decent amount of code, but not _so_ much that
it's a no-brainer to move them out of x86.c.  And there's no super clear cut
boundary, e.g. events can mean exceptions, INIT+SIPI, IRQs, APIC stuff, etc.,
several of which already have substantial amounts of code outside of x86.c.

> > Hmm, though looking at all of this again, I think we're actually quite close to
> > having somewhat sane rules.  Over the past few years, I've tried multiple times
> > to move what I felt should be KVM-internal structures from asm/kvm_host.h to x86.h,
> > and I've failed miserably every time because inevitably even the most innocuous
> > struct manages to have usage that leads to cyclical header dependencies and/or is
> > used by arch-neutral KVM code.
> 
> The problem is some other kernel code includes <linux/kvm_host.h> (which in turn
> includes <asm/kvm_host.h>) but the KVM internal structures have nothing to do
> with them.
> 
> E.g., some drivers are using <linux/kvm_host.h>:
> 
> #$ grep kvm_host.h drivers/ -Rn
> drivers/vfio/pci/vfio_pci_zdev.c:14:#include <linux/kvm_host.h>
> drivers/vfio/vfio_main.c:20:#include <linux/kvm_host.h>
> drivers/firmware/arm_sdei.c:19:#include <linux/kvm_host.h>
> drivers/hwtracing/coresight/coresight-trbe.c:20:#include <linux/kvm_host.h>
> drivers/hwtracing/coresight/coresight-etm4x-core.c:10:#include <linux/kvm_host.h>
> drivers/s390/crypto/vfio_ap_ops.c:17:#include <linux/kvm_host.h>
> drivers/s390/crypto/vfio_ap_private.h:20:#include <linux/kvm_host.h>
> 
> But looking at them, AFAICT what they need is only some structure declarations
> (e.g., 'struct kvm;') for type safety (plus some function declarations), but
> don't actually need to see the actual structure.

Ya.

> For x86, AFAICT there's (only) "arch/x86/events/intel/core.c" actually uses the
> 'struct kvm_pmu', though.

I have a patch to fix that :-)

https://lore.kernel.org/all/20260508231353.406465-7-seanjc@google.com

> I haven't checked other ARCHs whether there's cases actually need to use any
> structure.

PPC, arm64, and IIRC s390 all have assets defined by KVM that are consumed by
the kernel at-large.  E.g. because KVM for arm64 can't be built as a module, the
kernel calls directly into KVM during boot.  IIRC, PPC has similar code.

A few years ago (wow, time flies), I was able to hide KVM internals, using #ifdef
shenanigans to deal with cases where non-KVM really truly needed to get at things
defined in kvm_host.h

https://lore.kernel.org/all/20230916003118.2540661-27-seanjc@google.com

More recently, I tried to standardize KVM arch=>common includes[1], to help pave
the way to splitting up kvm_host.h, but then s390's crazy arm64 support[2] killed
that (at least for now).

[1] https://lore.kernel.org/all/20250611001042.170501-1-seanjc@google.com
[2] https://lore.kernel.org/all/20260428160527.1378085-1-seiden@linux.ibm.com

> > I think it's probably time to admit I've been looking at the asm/kvm_host.h vs.
> > x86.h split all wrong, i.e. finally give up on moving structures out of kvm_host.h,
> > and do the exact opposite: commit to using kvm_host.h to define and declare widely
> > used structures.
> 
> If the structure(s) are only used within arch/x86/kvm/, it doesn't seem right to
> define them in asm/kvm_host.h?

The problem is that anything that feeds into kvm_vcpu_arch needs to be visible
to virt/kvm.  And burying kvm_x86_ops in arch/x86/kvm would mean one-liners like
kvm_arch_vcpu_blocking() couldn't be inlined.

I've looked at this far too many times :-)

> > Because literally the only reason that x86.h doesn't include mmu.h is that mmu.h
> > references struct kvm_host, which is currently defined in x86.h.  
> > 
> 
> Yes. But I wouldn't worry about this too much since it's a small thing we can
> always find a way to fix.  E.g., we can move kvm_mmu_max_gfn() out of "mmu.h"
> (with a renaming perhaps).

I hacked on moving more stuff out of x86.{c,h} and kvm_host.h.  The diff stats
are quite promising :-)

 arch/x86/include/asm/kvm_host.h           |  444 ++-------------
 arch/x86/kvm/x86.c                        | 3784 +++-----------------------------------------------------------------------------------------------------------------------
 arch/x86/kvm/x86.h                        |  474 ++++++++--------

> > If we "fix"
> > that, then (a) we can make x86.h the "central" include everyone expects it to be,
> > and (b) it can be the start of a cleanup of asm/kvm_host.h and a big step towards
> > defining maintainable "rules" for what goes where.  E.g. there are a pile of
> > functional declarations in asm/kvm_host.h that can live elsewhere; if we trim
> > those down, then the rules become:
> > 
> >   - asm/kvm_host.h holds "common" structure definitions and associated key global
> >     variables, and things that are referenced by arch-neutral KVM.
> 
> It's a bit weird the arch-neutral KVM code needs to reference variables in
> asm/kvm_host.h, and I am afraid the "common" structure definitions will
> effectively be a lot of structures only used by arch/x86/kvm/.  
> 
> Which isn't necessarily a bad thing, from the perspective we might finally clean
> this up by a giant move.
> 
> E.g., <linux/kvm_types.h> is already used by other kernel components where they
> don't need <linux/kvm_host.h>.  Ideally, maybe eventually we can use
> <linux/kvm_types.h> and <asm/kvm_types.h> for things needed by other kernel
> components, or keep <linux/kvm_host.h> and <asm/kvm_host.h> minimal after moving
> majority things to some KVM internal headers.
> 
> E.g., maybe:
> 
>   virt/kvm/include/kvm_host.h
>   arch/x86/kvm/kvm_host.h (can even be merged to x86.h)
> 
> I think the problem is "struct kvm_arch" and "struct kvm_vcpu_arch", that they
> are not a pointer but a fully embedded structure in "struct kvm" and "struct
> kvm_vcpu" respectively.  That caused that you need to keep the actual structure
> definition of "struct kvm_arch" and "kvm_vcpu_arch" in asm/kvm_host.h, which in
> turns makes a lot of structures only used by arch/x86/kvm/ need to stay in
> asm/kvm_host.h.
> 
> I am not sure whether there's a mandatory requirement that "struct kvm_arch" and
> "struct kvm_vcpu_arch" must be fully embedded, and it would be kinda painful to
> convert to a pointer (e.g., there's kvm_x86_ops::vm_size), but perhaps that is
> also an option to consider?

The idea I had in the past, and where I was going with things before s390's love
for arm64 came along, was to add a kvm_arch.h in arch/<arch>/kvm, and have virt/kvm
include _that_ instead of kvm_host.h.  That way we don't need to make any fundamental
changes to structures, but we can still significantly cut down on what's exposed
via kvm_host.h.  At some point I'll try to take another look; it's really the
s390+arm64 combo that's problematic :-/


* Re: [PATCH v2 15/15] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
From: Huang, Kai @ 2026-05-20  2:29 UTC (permalink / raw)
  To: seanjc@google.com
  Cc: dwmw2@infradead.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, binbin.wu@linux.intel.com,
	vkuznets@redhat.com, x86@kernel.org, kas@kernel.org, paul@xen.org,
	yosry@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <ag0NlJ2FEIL7GJIj@google.com>

On Tue, 2026-05-19 at 18:25 -0700, Sean Christopherson wrote:
> On Wed, May 20, 2026, Kai Huang wrote:
> > On Tue, 2026-05-19 at 08:04 -0700, Sean Christopherson wrote:
> > > On Tue, May 19, 2026, Kai Huang wrote:
> > Just wondering is it possible we might want to move events handling to some
> > other C file since you are cleanup x86.c?  But we can deal with this when it
> > happens.
> 
> Events are a hard one.  There's a decent amount of code, but not _so_ much that
> it's a no-brainer to move them out of x86.c.  And there's no super clear cut
> boundary, e.g. events can mean exceptions, INIT+SIPI, IRQs, APIC stuff, etc.,
> several of which already have substantial amounts of code outside of x86.c.

Yes agreed.

> 
> > > Hmm, though looking at all of this again, I think we're actually quite close to
> > > having somewhat sane rules.  Over the past few years, I've tried multiple times
> > > to move what I felt should be KVM-internal structures from asm/kvm_host.h to x86.h,
> > > and I've failed miserably every time because inevitably even the most innocuous
> > > struct manages to have usage that leads to cyclical header dependencies and/or is
> > > used by arch-neutral KVM code.
> > 
> > The problem is some other kernel code includes <linux/kvm_host.h> (which in turn
> > includes <asm/kvm_host.h>) but the KVM internal structures have nothing to do
> > with them.
> > 
> > E.g., some drivers are using <linux/kvm_host.h>:
> > 
> > #$ grep kvm_host.h drivers/ -Rn
> > drivers/vfio/pci/vfio_pci_zdev.c:14:#include <linux/kvm_host.h>
> > drivers/vfio/vfio_main.c:20:#include <linux/kvm_host.h>
> > drivers/firmware/arm_sdei.c:19:#include <linux/kvm_host.h>
> > drivers/hwtracing/coresight/coresight-trbe.c:20:#include <linux/kvm_host.h>
> > drivers/hwtracing/coresight/coresight-etm4x-core.c:10:#include <linux/kvm_host.h>
> > drivers/s390/crypto/vfio_ap_ops.c:17:#include <linux/kvm_host.h>
> > drivers/s390/crypto/vfio_ap_private.h:20:#include <linux/kvm_host.h>
> > 
> > But looking at them, AFAICT what they need is only some structure declarations
> > (e.g., 'struct kvm;') for type safety (plus some function declarations), but
> > don't actually need to see the actual structure.
> 
> Ya.
> 
> > For x86, AFAICT there's (only) "arch/x86/events/intel/core.c" actually uses the
> > 'struct kvm_pmu', though.
> 
> I have a patch to fix that :-)
> 
> https://lore.kernel.org/all/20260508231353.406465-7-seanjc@google.com

Oh great!

> 
> > I haven't checked other ARCHs whether there's cases actually need to use any
> > structure.
> 
> PPC, arm64, and IIRC s390 all have assets defined by KVM that are consumed by
> the kernel at-large.  E.g. because KVM for arm64 can't be built as a module, the
> kernel calls directly into KVM during boot.  IIRC, PPC has similar code.
> 
> A few years ago (wow, time flies), I was able to hide KVM internals, using #ifdef
> shenanigans to deal with cases where non-KVM really truly needed to get at things
> defined in kvm_host.h
> 
> https://lore.kernel.org/all/20230916003118.2540661-27-seanjc@google.com

Oh, I never thought about it from this perspective (thanks for the info):

  --
  Hiding KVM details for all architectures will, in the very distant future, 
  allow loading a new (or old) KVM module without needing to rebuild and reboot 
  the entire kernel, or to even allow loading and running multiple versions of 
  KVM simultaneously on a single host.
  --

> 
> More recently, I tried to standardize KVM arch=>common includes[1], to help pave
> the way to splitting up kvm_host.h, but then s390's crazy arm64 support killed
> that (at least for now).
> 
> [1] https://lore.kernel.org/all/20250611001042.170501-1-seanjc@google.com
> [2] https://lore.kernel.org/all/20260428160527.1378085-1-seiden@linux.ibm.com

:-)

> 
> > > I think it's probably time to admit I've been looking at the asm/kvm_host.h vs.
> > > x86.h split all wrong, i.e. finally give up on moving structures out of kvm_host.h,
> > > and do the exact opposite: commit to using kvm_host.h to define and declare widely
> > > used structures.
> > 
> > If the structure(s) are only used within arch/x86/kvm/, it doesn't seem right to
> > define them in asm/kvm_host.h?
> 
> The problem is that anything that feeds into kvm_vcpu_arch needs to be visible
> to virt/kvm.  
> 

Yeah that's the problem.

> And burying kvm_x86_ops in arch/x86/kvm would mean one-liners like
> kvm_arch_vcpu_blocking() couldn't be inlined.

Oh right, sad but acceptable tradeoff I guess.

> 
> I've looked at this far too many times :-)
> 
> > > Because literally the only reason that x86.h doesn't include mmu.h is that mmu.h
> > > references struct kvm_host, which is currently defined in x86.h.  
> > > 
> > 
> > Yes. But I wouldn't worry about this too much since it's a small thing we can
> > always find a way to fix.  E.g., we can move kvm_mmu_max_gfn() out of "mmu.h"
> > (with a renaming perhaps).
> 
> I hacked on moving more stuff out of x86.{c,h} and kvm_host.h.  The diff stats
> are quite promising :-)
> 
>  arch/x86/include/asm/kvm_host.h           |  444 ++-------------
>  arch/x86/kvm/x86.c                        | 3784 +++-----------------------------------------------------------------------------------------------------------------------
>  arch/x86/kvm/x86.h                        |  474 ++++++++--------
> 

Indeed!

> > > If we "fix"
> > > that, then (a) we can make x86.h the "central" include everyone expects it to be,
> > > and (b) it can be the start of a cleanup of asm/kvm_host.h and a big step towards
> > > defining maintainable "rules" for what goes where.  E.g. there are a pile of
> > > functional declarations in asm/kvm_host.h that can live elsewhere; if we trim
> > > those down, then the rules become:
> > > 
> > >   - asm/kvm_host.h holds "common" structure definitions and associated key global
> > >     variables, and things that are referenced by arch-neutral KVM.
> > 
> > It's a bit weird the arch-neutral KVM code needs to reference variables in
> > asm/kvm_host.h, and I am afraid the "common" structure definitions will
> > effectively be a lot of structures only used by arch/x86/kvm/.  
> > 
> > Which isn't necessarily a bad thing, from the perspective we might finally clean
> > this up by a giant move.
> > 
> > E.g., <linux/kvm_types.h> is already used by other kernel components where they
> > don't need <linux/kvm_host.h>.  Ideally, maybe eventually we can use
> > <linux/kvm_types.h> and <asm/kvm_types.h> for things needed by other kernel
> > components, or keep <linux/kvm_host.h> and <asm/kvm_host.h> minimal after moving
> > majority things to some KVM internal headers.
> > 
> > E.g., maybe:
> > 
> >   virt/kvm/include/kvm_host.h
> >   arch/x86/kvm/kvm_host.h (can even be merged to x86.h)
> > 
> > I think the problem is "struct kvm_arch" and "struct kvm_vcpu_arch", that they
> > are not a pointer but a fully embedded structure in "struct kvm" and "struct
> > kvm_vcpu" respectively.  That caused that you need to keep the actual structure
> > definition of "struct kvm_arch" and "kvm_vcpu_arch" in asm/kvm_host.h, which in
> > turns makes a lot of structures only used by arch/x86/kvm/ need to stay in
> > asm/kvm_host.h.
> > 
> > I am not sure whether there's a mandatory requirement that "struct kvm_arch" and
> > "struct kvm_vcpu_arch" must be fully embedded, and it would be kinda painful to
> > convert to a pointer (e.g., there's kvm_x86_ops::vm_size), but perhaps that is
> > also an option to consider?
> 
> The idea I had in the past, and where I was going with things before s390's love
> for arm64 came along, was to add a kvm_arch.h in arch/<arch>/kvm, and have virt/kvm
> include _that_ instead of kvm_host.h.  
> 

Not sure whether there's other code doing so? :-)

> That way we don't need to make any fundamental
> changes to structures, but we can still significantly cut down on what's exposed
> via kvm_host.h.  
> 

Yeah.

I saw below from you in [1]:

  --
  We've explored several alternatives to the #ifdef __KVM__ approach, and
  they all sucked, hard.  What I really wanted (and still want) to do, is to
  bury the bulk of kvm_host.h (and other KVM headers) in virt/kvm, but every
  attempt to do that ended in flames.  Even with the __KVM__ guards in place,
  each architecture's kvm_host.h is too intertwined with the common kvm_host.h,
  and trying to extract small-ish pieces just doesn't work (each patch
  inevitably snowballed into a gigantic beast).

  The other idea we considered (which I thought of, and feel dirty for even
  proposing it internally), is to move all headers under virt/kvm, add
  virt/kvm/include to the global header path, and then have KVM x86 omit
  virt/kvm/include when configured to hide KVM internals.  I hate this idea
  because it sets a bad precedent, and requires a lot of file movement
  without providing any benefit to other architectures.  E.g. I hope that
  guarding KVM internals with #ifdef __KVM__ will allow us to slowly clean
  things up so that some day KVM only exposes a handful of APIs to the rest
  of the kernel (probably a pipe dream).
  --

I haven't looked into the details of your #ifdef __KVM__ approach yet, but it
seems you don't quite like moving KVM internal stuff to virt/kvm/include/?

But if we want to hide KVM internal structures, I don't see any option other
than virt/kvm/include/ as the place to go?
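
For reference, the general shape of an #ifdef guard around internals looks
roughly like the sketch below. This is an assumption-laden toy, not the code
from the series: `struct kvm_model` and `kvm_model_nr_vcpus()` are invented
names, and how `__KVM__` actually gets defined in the real patches is not
shown in this thread (here it is simply defined by hand).

```c
#include <assert.h>

/* Visible to everyone: an opaque handle plus a small accessor API. */
struct kvm_model;
int kvm_model_nr_vcpus(const struct kvm_model *kvm);

/*
 * Assumption: code that is part of KVM is compiled with __KVM__ defined.
 * Defined by hand here so that this file can stand in for "KVM itself".
 */
#define __KVM__

#ifdef __KVM__
/* Internals: only translation units with __KVM__ see the layout. */
struct kvm_model {
	int nr_vcpus;
};
#endif

/* Implemented on the KVM side, where the layout is known. */
int kvm_model_nr_vcpus(const struct kvm_model *kvm)
{
	return kvm->nr_vcpus;
}
```

Code compiled without the define could still pass `struct kvm_model *` around
and call the accessor, but any attempt to touch a field would fail to compile.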

Btw, have you considered reversing the embedding of "struct kvm" and "struct
kvm_arch" (and likewise for the vcpu structures), i.e., to make "struct
kvm_arch" include "struct kvm"?  I don't have any clue whether it is feasible or
how much effort it needs, though -- it's just something that came to mind when
replying.

[1]: https://lore.kernel.org/all/20230916003118.2540661-1-seanjc@google.com/

> At some point I'll try to take another look; it's really the
> s390+arm64 combo that's problematic :-/

If you want, I can take a look.  I think I'll have bandwidth in the near future.

Given you have tried multiple times, I am not sure what I can achieve, though.

Anyway, seems "allow loading a new (or old) KVM module without needing to
rebuild and reboot the entire kernel" is a good reason to do this.

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Aneesh Kumar K.V @ 2026-05-20  3:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mostafa Saleh, iommu, linux-arm-kernel, linux-kernel, linux-coco,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260519161120.GO7702@ziepe.ca>

Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Tue, May 19, 2026 at 09:35:30PM +0530, Aneesh Kumar K.V wrote:
>> Yes, that also resulted in simpler and cleaner code.
>> 
>> swiotlb_tbl_map_single
>> 	/*
>> 	 * If the physical address is encrypted but the device requires
>> 	 * decrypted DMA, use a decrypted io_tlb_mem and update the
>> 	 * attributes so the caller knows that a decrypted io_tlb_mem
>> 	 * was used.
>> 	 */
>> 	if (!(*attrs & DMA_ATTR_CC_SHARED) && force_dma_unencrypted(dev))
>> 		*attrs |= DMA_ATTR_CC_SHARED;
>> 
>> 	if (mem->unencrypted != !!(*attrs & DMA_ATTR_CC_SHARED))
>> 		return (phys_addr_t)DMA_MAPPING_ERROR;
>
> Yeah, exactly that is so much clearer now that the mem->unencrypted is
> tied directly.
>
> That logic is reversed though, the incoming ATTR_CC doesn't matter for
> swiotlb, that is just the source of the memcpy.
>
> /* swiotlb pool is incorrect for this device */
> if (mem->unencrypted != force_dma_unencrypted(dev))
>     return (phys_addr_t)DMA_MAPPING_ERROR;
>
> /* Force attrs to match the kind of memory in the pool */
> if (mem->unencrypted)
>      *attrs |= DMA_ATTR_CC_SHARED;
> else
>      *attrs &= ~DMA_ATTR_CC_SHARED;
>
>
> Attrs should be forced to whatever memory swiotlb selected.
>

But that will not handle a T=1 device that wants to use swiotlb to
bounce unencrypted memory. That is:

force_dma_unencrypted(dev) == 0  /* T=1 device */
attrs = DMA_ATTR_CC_SHARED;

In that case, it should use an unencrypted io_tlb_mem:
mem->unencrypted == 1
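
To make the difference concrete, here is a small stand-alone model of the
two checks. It is a user-space sketch, not kernel code: the
DMA_ATTR_CC_SHARED value and both helper names are placeholders invented
for illustration, and force_dma_unencrypted(dev) is reduced to a plain
boolean argument.

```c
#include <stdbool.h>

/* Placeholder bit value for illustration only; not the kernel's definition. */
#define DMA_ATTR_CC_SHARED (1UL << 0)

/*
 * Check from the quoted suggestion: the pool must match what the device
 * requires, and the incoming attrs are ignored.
 */
static bool pool_ok_device_only(bool pool_unencrypted, bool dev_forces_unencrypted)
{
	return pool_unencrypted == dev_forces_unencrypted;
}

/*
 * Check from the earlier snippet: the device requirement is folded into
 * attrs first, then the pool must match the resulting attrs.
 */
static bool pool_ok_attrs(bool pool_unencrypted, bool dev_forces_unencrypted,
			  unsigned long attrs)
{
	if (!(attrs & DMA_ATTR_CC_SHARED) && dev_forces_unencrypted)
		attrs |= DMA_ATTR_CC_SHARED;

	return pool_unencrypted == !!(attrs & DMA_ATTR_CC_SHARED);
}
```

With a T=1 device (dev_forces_unencrypted == false), attrs ==
DMA_ATTR_CC_SHARED and an unencrypted pool, the first helper rejects the
pool while the second accepts it, which is the case described above.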

-aneesh

^ permalink raw reply

* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: Binbin Wu @ 2026-05-20  5:02 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, Paul Durrant,
	Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang
In-Reply-To: <dc62e58e-b6ee-41e1-84a5-0716822fefc8@linux.intel.com>



On 5/18/2026 5:55 PM, Binbin Wu wrote:
> 
> 
> On 5/18/2026 5:50 PM, David Woodhouse wrote:
>> On Mon, 2026-05-18 at 17:43 +0800, Binbin Wu wrote:
>>>
>>>
>>> On 5/18/2026 3:15 PM, David Woodhouse wrote:
>>>> On Mon, 2026-05-18 at 10:19 +0800, Binbin Wu wrote:
>>>>>  
>>>>>>>>   	longmode = is_64_bit_hypercall(vcpu);
>>>>>>>
>>>>>>> Is the variable name misleading?
>>>>>>
>>>>>> It most definitely is.  However, @longmode is passed around quite a few locations
>>>>>> in xen.c, and so I don't want to opportunistically fix this one variable.  Though
>>>>>> I'm definitely not opposed to a separate patch to rename them all to is_64bit or
>>>>>> something.
>>>>>
>>>>> OK, I can do it.
>>>>
>>>> This one (as shown above) is clearly indicating whether this particular
>>>> vCPU is in 64-bit mode for this particular hypercall. Changing that to
>>>> is_64bit makes sense.
>>>>
>>>> However, there is a separate overall mode for the VM, which is stored
>>>> in 'kvm->arch.xen.long_mode' and accessed by userspace using the
>>>> KVM_XEN_ATTR_TYPE_LONG_MODE attribute. It affects the datatypes used by
>>>> shared memory data structures, and is also latched by the kernel when
>>>> the guest writes the MSR for the hypercall page. That one should
>>>> probably keep its name.
>>>
>>> For this one, I think the current KVM code is consistent.
>>> The format is determined by EFER.LMA; whether the guest is running in 64-bit or
>>> compatibility mode doesn't change the ABI.

I still have a point of confusion.

I noticed a behavioral mismatch between KVM and Xen regarding when they switch
to the standard/compat shared info.
- In Xen: The 32-bit shared info structure is latched if the current vCPU is
  not in 64-bit mode:
  hvm_latch_shinfo_size
      d->arch.has_32bit_shinfo = hvm_guest_x86_mode(current) != X86_MODE_64BIT

- In KVM: It evaluates is_long_mode(vcpu) instead. E.g.,
  kvm_xen_write_hypercall_page
      bool lm = is_long_mode(vcpu);
      ...
      kvm->arch.xen.long_mode = lm;

In theory, these two checks could differ when the guest kernel is running in
a 32-bit compatibility mode. However, I believe this mismatch is fine in
practice for two reasons:
- Mainstream 64-bit OSes don't run in compatibility mode for kernel code after
  the early init.
- By default, HVM guests cannot issue hypercalls from userspace. The only
  exception, HVMOP_guest_request_vm_event, is not related to the shared info.

So the vCPU will never be in compatibility mode when a related hypercall occurs.
In this specific operational context, evaluating is_long_mode() yields the
exact same functional outcome as checking for 64-bit execution mode. Am I
missing anything here?
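
For reference, the distinction between the two checks can be sketched as
below. This is a simplified user-space model, not KVM code: struct
vcpu_model and both helpers are hypothetical names, and the special case
where KVM assumes a 64-bit hypercall for guests with protected state is
deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Long Mode Active is architectural bit 10 of IA32_EFER. */
#define EFER_LMA (1ULL << 10)

/* Hypothetical, minimal vCPU state for this sketch. */
struct vcpu_model {
	uint64_t efer;
	bool cs_l;	/* CS.L: the current code segment is a 64-bit segment */
};

/* Mirrors the idea behind is_long_mode(): only EFER.LMA is consulted. */
static bool model_is_long_mode(const struct vcpu_model *v)
{
	return !!(v->efer & EFER_LMA);
}

/* 64-bit execution additionally requires CS.L to be set. */
static bool model_is_64bit_mode(const struct vcpu_model *v)
{
	return model_is_long_mode(v) && v->cs_l;
}
```

A vCPU in compatibility mode has EFER.LMA set and CS.L clear, so the first
helper returns true and the second returns false. That is the only state
in which the KVM and Xen latching conditions quoted above can disagree.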


>>
>> Agreed. For the hypercall case you're looking at, switching the name to
>> is_64bit makes sense.
>>
>>> struct compat_shared_info is used only when the guest is running natively in a
>>> 32-bit build.
>>
>> The struct compat_shared_info is also used in !kvm->arch.xen.long_mode
>> on a 64-bit host, as that's what means the guest is considered to be a
>> 32-bit guest.
>>
>> It's somewhat orthogonal from whether any given vCPU is making any
>> given hypercall while in 64-bit mode. The 'long_mode' is *latched* at
>> certain specific times which are defined by Xen's historical behaviour.
>>
>> I'm suggesting that you clean up longmode→is_64bit for the *hypercalls*
>> but leave 'long_mode' as is.
>>
> 
> Yes, will only do it for is_64_bit_hypercall().
> 
>>
> 


^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Aneesh Kumar K.V @ 2026-05-20  8:11 UTC (permalink / raw)
  To: Greg KH
  Cc: Suzuki K Poulose, linux-coco, linux-arm-kernel, linux-kernel,
	Catalin Marinas, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <yq5apl2txmav.fsf@kernel.org>


Hi Greg,

Aneesh Kumar K.V <aneesh.kumar@kernel.org> writes:

> Greg KH <gregkh@linuxfoundation.org> writes:
>
>> On Thu, May 14, 2026 at 08:07:27PM +0530, Aneesh Kumar K.V wrote:
>>> Greg KH <gregkh@linuxfoundation.org> writes:
>>> 
>>> > On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
>>> >> Hi Aneesh
>>> >> 
>>> >> On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
>>> >> > Make the SMCCC driver responsible for registering the arm-smccc platform
>>> >> > device and after confirming the relevant SMCCC function IDs, create
>>> >> > the arm_cca_guest auxiliary device.
>>> >> > 
>>> >> 
>>> >> There are a few changes squashed in to this patch. Please could we
>>> >> split the patch in the following order ?
>>> >> 
>>> >> 1. Add platform device for arm-smccc
>>> >
>>> > Do not make any more "fake" platform devices please.
>>> >
>>> >> 2. Move TRNG to Auxiliary Device - (Even though it is a later patch, move
>>> >> it before the RSI changes)
>>> >
>>> > No, move it to the faux api please.
>>> >
>>> 
>>> 
>>> Maybe I was not complete in my previous reply. I did not want to repeat
>>> the entire thread, so I quoted the lore link for more details.
>>> 
>>> 1. We have platform firmware-provided SMCCC interfaces. Based on the
>>> support/availability of these function IDs, we want to load multiple
>>> drivers.
>>> 2. This patch series adds a platform device to represent the
>>> firmware-provided SMCCC resource.
>>> 3. Different SMCCC ranges are now represented as auxiliary devices.
>>> 4. Different subsystems, such as TSM, can autoload their backend drivers
>>> based on the availability of these SMCCC ranges, which are now
>>> represented as auxiliary devices.
>>> 
>>> You had agreed to all of this in the previous discussion here:
>>> https://lore.kernel.org/all/2025101516-handbook-hyphen-62ec@gregkh
>>
>> Then why did someone say "this is a fake platform device with no actual
>> resources"?  That's what I was triggering off of.
>>
>> Again, if you have actual platform resources, GREAT, use a platform
>> device and aux.  If you do not, then do NOT use a platform device.
>>
>> totally confused,
>>
>> greg k-h
>
> I have now rewritten the cover letter as below. Let me know if this
> helps.
>
> Switch Arm SMCCC firmware services to auxiliary devices
>
> As discussed here:
> https://lore.kernel.org/all/20250728135216.48084-12-aneesh.kumar@kernel.org
>
> The earlier CCA guest support used an arm-cca-dev platform device as a pure
> software anchor for the TSM class device. That platform device did not
> correspond to a DT/ACPI described device, MMIO range, interrupt, or other
> platform resource; it existed only to make the CCA guest driver bind and to
> place the resulting TSM device in the driver model. The same pattern also
> exists for smccc_trng. Creating separate platform devices for such
> SMCCC-discovered features is misleading, because those features are not
> independent platform devices.
>
> This series changes the model so that there is a single arm-smccc platform
> device representing the SMCCC firmware interface itself. The firmware
> interface, including its discoverable SMCCC function space, is the
> resource: after PSCI/SMCCC conduit discovery, the kernel can query SMCCC
> function IDs and determine whether optional firmware services are present.
> Services such as SMCCC TRNG and Realm Services Interface (RSI) are
> therefore represented as children of the arm-smccc device, and are created
> only when the required SMCCC function IDs and ABI checks succeed.
>
> The child devices use the auxiliary bus deliberately: they are intended to
> bind independent feature drivers, not just to provide a driverless object for
> sysfs or other class-device anchoring. They are firmware-provided functions
> of the parent SMCCC interface that are consumed by separate kernel drivers
> in different subsystems, such as hwrng and virt/coco/TSM. Those drivers
> need normal driver-core matching, probe/remove lifetime, and module
> autoloading based on the discovered firmware feature. The auxiliary bus
> provides a MODALIAS and id-table based binding model for that case, while
> keeping the feature drivers off the platform bus. A faux device was
> considered, but not used because it is suited for simple software objects
> that do not need independent bus/driver binding. The faux bus has no
> feature-driver id-table or MODALIAS matching, so it would not preserve the
> module-autoload flow that the current platform-device based users rely on.
>
> In other words, the parent arm-smccc device represents the firmware
> resource exposed through the SMCCC conduit, and each auxiliary child
> represents one discovered firmware service of that parent. This removes the
> unnecessary per-feature platform devices while retaining automatic loading
> and independent subsystem drivers for the SMCCC services.
>
> The TSM framework uses the device abstraction to provide cross-architecture
> TSM and TEE I/O functionality, including enumerating available platform TEE
> I/O capabilities and provisioning connections between the platform TSM and
> device DSMs.  For Arm CCA, the RSI auxiliary device continues to provide the
> device anchor used by the CCA guest TSM provider.
>
> For the CCA platform, the resulting device hierarchy appears as follows.
> Note that the auxiliary device is parented by the arm-smccc platform device,
> so the sysfs path remains under /devices/platform/arm-smccc/:
>
> $ cd /sys/class/tsm/
> $ ls -al
> total 0
> drwxr-xr-x    2 root     root             0 Jan  1 00:02 .
> drwxr-xr-x   23 root     root             0 Jan  1 00:00 ..
> lrwxrwxrwx    1 root     root             0 Jan  1 00:03 tsm0 -> ../../devices/platform/arm-smccc/arm_cca_guest.arm-rsi-dev.0/tsm/tsm0
> $
>
> The series also replaces the old arm-cca-dev userspace-visible dummy device
> with /sys/firmware/cca/realm_guest for detecting whether the kernel is
> running in a Realm.  This keeps the guest-state ABI under /sys/firmware and
> separates it from the internal driver-binding device used by the CCA guest
> TSM provider.
>
>
> -aneesh

Gentle ping, could you let me know if the updated cover letter helps
clarify the confusion regarding the platform-device usage here?

-aneesh

^ permalink raw reply

* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: David Woodhouse @ 2026-05-20  8:27 UTC (permalink / raw)
  To: Binbin Wu, Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, Paul Durrant,
	Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang
In-Reply-To: <3beeaf04-e4f9-44cf-a3a3-04fa12912848@linux.intel.com>

On Wed, 2026-05-20 at 13:02 +0800, Binbin Wu wrote:
> > > > For this one, I think the current KVM code is consistent.
> > > > The format is determined by EFER.LMA; whether the guest is running in 64-bit or
> > > > compatibility mode doesn't change the ABI.
> 
> I still have a point of confusion.
> 
> I noticed a behavioral mismatch between KVM and Xen regarding when they switch
> to the standard/compat shared info.
> - In Xen: The 32-bit shared info structure is latched if the current vCPU is
>   not in 64-bit mode:
>   hvm_latch_shinfo_size
>       d->arch.has_32bit_shinfo = hvm_guest_x86_mode(current) != X86_MODE_64BIT
> 
> - In KVM: It evaluates is_long_mode(vcpu) instead. E.g.,
>   kvm_xen_write_hypercall_page
>       bool lm = is_long_mode(vcpu);
>       ...
>       kvm->arch.xen.long_mode = lm;

Nice catch. That should probably use is_64_bit_hypercall() too, yes?

> In theory, these two checks could differ when the guest kernel is running in
> a 32-bit compatibility mode. However, I believe this mismatch is fine in
> practice for two reasons:
> - Mainstream 64-bit OSes don't run in compatibility mode for kernel code after
>   the early init.

Although... some of the early init does involve 32-bit mode and it
wouldn't be impossible for the guest to build the hypercall page from
32-bit mode during startup.

> - By default, HVM guests cannot issue hypercalls from userspace. The only
>   exception, HVMOP_guest_request_vm_event, is not related to the shared info.

A third reason: This is only one of *two* places where the guest mode
gets latched. The guest's mode is also latched when it sets the
HVM_PARAM_CALLBACK_IRQ parameter. When running under KVM, the VMM
handles this and tells the kernel via the KVM_XEN_ATTR_TYPE_LONG_MODE
attribute.

So even if it doesn't latch correctly when setting the hypercall page,
it will later.

And Linux at least will set HVM_PARAM_CALLBACK_IRQ even if it isn't
using it, in xen_set_upcall_vector() with a comment saying 'Trick
toolstack to think we are enlightened'.

> So the vCPU will never be in compatibility mode when a related hypercall occurs.
> In this specific operational context, evaluating is_long_mode() yields the
> exact same functional outcome as checking for 64-bit execution mode. Am I
> missing anything here?

I think you're right. We have thus far launched about 5 billion Xen
guests with this, and I've never heard any report about guests latching
the wrong mode. And there are a *lot* of random home-grown and network
appliance operating systems out there, and basic things like event
channels would fail to work if the shared_info was in the wrong mode.

That said, I would not be averse to fixing it. We could just use
is_64_bit_hypercall() in kvm_xen_write_hypercall_page(), right?

Or at the very least a comment saying that it *should* be doing so, but
that we're too nervous to change it now? 

^ permalink raw reply

* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: David Woodhouse @ 2026-05-20  8:32 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
	Kiryl Shutsemau, Paul Durrant
  Cc: Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang, Binbin Wu
In-Reply-To: <20260514215355.1648463-4-seanjc@google.com>

On Thu, 2026-05-14 at 14:53 -0700, Sean Christopherson wrote:
> Don't truncate RAX when handling a Xen hypercall for a guest with protected
> state, as KVM's ABI is to assume the guest is in 64-bit for such cases
> (the guest leaving garbage in 63:32 after a transition to 32-bit mode is
> far less likely than 63:32 being necessary to complete the hypercall).
> 

The latter isn't likely either. RAX is the hypercall number, and
those numbers are only in double digits; it'll be a while before we
need more than 8 bits for those, let alone 32 :)

But sure, as a cleanup it makes sense. Thanks.
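
As a rough illustration of what the patch changes, the sketch below models
the three ways RAX is consumed. It is stand-alone C rather than KVM code,
and the helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypercall input for a 32-bit caller: only EAX is meaningful. */
static uint64_t input_32bit(uint64_t rax)
{
	return (uint32_t)rax;
}

/* Hypercall input for a 64-bit caller: all of RAX is kept. */
static uint64_t input_64bit(uint64_t rax)
{
	return rax;
}

/* Bit 31 of EAX marks a Hyper-V hypercall; bits 63:32 do not affect the test. */
static bool is_hyperv_style(uint64_t rax)
{
	return !!(rax & 0x80000000ULL);
}
```

The Hyper-V test only looks at bit 31, so it behaves the same whether or
not the upper half of RAX holds stale data. The truncation now happens
only on the 32-bit path.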

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

> Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/xen.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 6d9be74bb673..895095dc684e 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -1678,15 +1678,14 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>  	bool handled = false;
>  	u8 cpl;
>  
> -	input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
> -
>  	/* Hyper-V hypercalls get bit 31 set in EAX */
> -	if ((input & 0x80000000) &&
> +	if ((kvm_rax_read(vcpu) & 0x80000000) &&
>  	    kvm_hv_hypercall_enabled(vcpu))
>  		return kvm_hv_hypercall(vcpu);
>  
>  	longmode = is_64_bit_hypercall(vcpu);
>  	if (!longmode) {
> +		input = (u32)kvm_rax_read(vcpu);
>  		params[0] = (u32)kvm_rbx_read(vcpu);
>  		params[1] = (u32)kvm_rcx_read(vcpu);
>  		params[2] = (u32)kvm_rdx_read(vcpu);
> @@ -1696,6 +1695,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>  	}
>  	else {
>  #ifdef CONFIG_X86_64
> +		input = (u64)kvm_rax_read(vcpu);
>  		params[0] = (u64)kvm_rdi_read(vcpu);
>  		params[1] = (u64)kvm_rsi_read(vcpu);
>  		params[2] = (u64)kvm_rdx_read(vcpu);


^ permalink raw reply

* Re: [PATCH v9 10/23] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates
From: Binbin Wu @ 2026-05-20  9:18 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-coco, linux-kernel, dave.hansen, djbw, ira.weiny,
	kai.huang, kas, nik.borisov, paulmck, pbonzini, reinette.chatre,
	rick.p.edgecombe, sagis, seanjc, tony.lindgren, vannapurve,
	vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-11-chao.gao@intel.com>



On 5/13/2026 11:09 PM, Chao Gao wrote:

> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
> index 7269a239bc22..7b345000d7c3 100644
> --- a/arch/x86/virt/vmx/tdx/seamldr.c
> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
> @@ -6,6 +6,7 @@
>   */
>  #define pr_fmt(fmt)	"seamldr: " fmt
>  
> +#include <linux/mm.h>

This is not needed in this patch.

>  #include <linux/spinlock.h>
>  
>  #include <asm/seamldr.h>
> @@ -41,3 +42,17 @@ int seamldr_get_info(struct seamldr_info *seamldr_info)
>  	return seamldr_call(P_SEAMLDR_INFO, &args);
>  }
>  EXPORT_SYMBOL_FOR_MODULES(seamldr_get_info, "tdx-host");
> +
> +/**
> + * seamldr_install_module - Install a new TDX module.
> + * @data: Pointer to the TDX module image.
> + * @size: Size of the TDX module image.
> + *
> + * Returns 0 on success, negative error code on failure.
> + */
> +int seamldr_install_module(const u8 *data, u32 size)
> +{
> +	/* TODO: Update TDX module here */
> +	return 0;
> +}
> +EXPORT_SYMBOL_FOR_MODULES(seamldr_install_module, "tdx-host");



^ permalink raw reply

* Re: [PATCH v9 10/23] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates
From: Chao Gao @ 2026-05-20 11:23 UTC (permalink / raw)
  To: Binbin Wu
  Cc: kvm, linux-coco, linux-kernel, dave.hansen, djbw, ira.weiny,
	kai.huang, kas, nik.borisov, paulmck, pbonzini, reinette.chatre,
	rick.p.edgecombe, sagis, seanjc, tony.lindgren, vannapurve,
	vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <804827af-f094-41e6-bba9-5f57d0e35cd0@linux.intel.com>

On Wed, May 20, 2026 at 05:18:03PM +0800, Binbin Wu wrote:
>
>
>On 5/13/2026 11:09 PM, Chao Gao wrote:
>
>> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
>> index 7269a239bc22..7b345000d7c3 100644
>> --- a/arch/x86/virt/vmx/tdx/seamldr.c
>> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
>> @@ -6,6 +6,7 @@
>>   */
>>  #define pr_fmt(fmt)	"seamldr: " fmt
>>  
>> +#include <linux/mm.h>
>
>This is not needed in this patch.

Right. linux/mm.h is only needed for vmalloc_to_pfn() in the next
patch, so I will move it there.

^ permalink raw reply

* Re: [PATCH v6 02/43] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES
From: Fuad Tabba @ 2026-05-20 12:08 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-2-91ab5a8b19a4@google.com>

On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Rename the per-VM memory attributes Kconfig to make it explicitly about
> per-VM attributes in anticipation of adding memory attributes support to
> guest_memfd, at which point it will be possible (and desirable) to have
> memory attributes without the per-VM support, even in x86.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/Kconfig            |  6 +++---
>  arch/x86/kvm/mmu/mmu.c          |  2 +-
>  arch/x86/kvm/x86.c              |  2 +-
>  include/linux/kvm_host.h        |  8 ++++----
>  include/trace/events/kvm.h      |  4 ++--
>  virt/kvm/Kconfig                |  2 +-
>  virt/kvm/kvm_main.c             | 14 +++++++-------
>  8 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c470e40a00aa4..60b997764beef 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2369,7 +2369,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>                        int tdp_max_root_level, int tdp_huge_page_level);
>
>
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
>  #endif
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 801bf9e520db3..26f6afd51bbdc 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -84,7 +84,7 @@ config KVM_SW_PROTECTED_VM
>         bool "Enable support for KVM software-protected VMs"
>         depends on EXPERT
>         depends on KVM_X86 && X86_64
> -       select KVM_GENERIC_MEMORY_ATTRIBUTES
> +       select KVM_VM_MEMORY_ATTRIBUTES
>         help
>           Enable support for KVM software-protected VMs.  Currently, software-
>           protected VMs are purely a development and testing vehicle for
> @@ -135,7 +135,7 @@ config KVM_INTEL_TDX
>         bool "Intel Trust Domain Extensions (TDX) support"
>         default y
>         depends on INTEL_TDX_HOST
> -       select KVM_GENERIC_MEMORY_ATTRIBUTES
> +       select KVM_VM_MEMORY_ATTRIBUTES
>         select HAVE_KVM_ARCH_GMEM_POPULATE
>         help
>           Provides support for launching Intel Trust Domain Extensions (TDX)
> @@ -159,7 +159,7 @@ config KVM_AMD_SEV
>         depends on KVM_AMD && X86_64
>         depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
>         select ARCH_HAS_CC_PLATFORM
> -       select KVM_GENERIC_MEMORY_ATTRIBUTES
> +       select KVM_VM_MEMORY_ATTRIBUTES
>         select HAVE_KVM_ARCH_GMEM_PREPARE
>         select HAVE_KVM_ARCH_GMEM_INVALIDATE
>         select HAVE_KVM_ARCH_GMEM_POPULATE
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 892246204435c..a80a876ab4ad6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7899,7 +7899,7 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
>                 vhost_task_stop(kvm->arch.nx_huge_page_recovery_thread);
>  }
>
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
>                                 int level)
>  {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0a1b63c63d1a9..1560de1e95be0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13625,7 +13625,7 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
>                 }
>         }
>
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         kvm_mmu_init_memslot_memory_attributes(kvm, slot);
>  #endif
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 4c14aee1fb063..7b9faa3545300 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -#ifndef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifndef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  {
>         return false;
> @@ -871,7 +871,7 @@ struct kvm {
>  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
>         struct notifier_block pm_notifier;
>  #endif
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         /* Protected by slots_lock (for writes) and RCU (for reads) */
>         struct xarray mem_attr_array;
>  #endif
> @@ -2528,7 +2528,7 @@ static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
>         return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
>  }
>
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>  {
>         return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
> @@ -2550,7 +2550,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return false;
>  }
> -#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> +#endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  #ifdef CONFIG_KVM_GUEST_MEMFD
>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
> index b282e3a867696..1ba72bd73ea2f 100644
> --- a/include/trace/events/kvm.h
> +++ b/include/trace/events/kvm.h
> @@ -358,7 +358,7 @@ TRACE_EVENT(kvm_dirty_ring_exit,
>         TP_printk("vcpu %d", __entry->vcpu_id)
>  );
>
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  /*
>   * @start:     Starting address of guest memory range
>   * @end:       End address of guest memory range
> @@ -383,7 +383,7 @@ TRACE_EVENT(kvm_vm_set_mem_attributes,
>         TP_printk("%#016llx -- %#016llx [0x%lx]",
>                   __entry->start, __entry->end, __entry->attr)
>  );
> -#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> +#endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  TRACE_EVENT(kvm_unmap_hva_range,
>         TP_PROTO(unsigned long start, unsigned long end),
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 794976b88c6f9..5119cb37145fc 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -100,7 +100,7 @@ config KVM_ELIDE_TLB_FLUSH_IF_YOUNG
>  config KVM_MMU_LOCKLESS_AGING
>         bool
>
> -config KVM_GENERIC_MEMORY_ATTRIBUTES
> +config KVM_VM_MEMORY_ATTRIBUTES
>         bool
>
>  config KVM_GUEST_MEMFD
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 89489996fbc1e..306153abbafa5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1115,7 +1115,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>         spin_lock_init(&kvm->mn_invalidate_lock);
>         rcuwait_init(&kvm->mn_memslots_update_rcuwait);
>         xa_init(&kvm->vcpu_array);
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         xa_init(&kvm->mem_attr_array);
>  #endif
>
> @@ -1300,7 +1300,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>         cleanup_srcu_struct(&kvm->irq_srcu);
>         srcu_barrier(&kvm->srcu);
>         cleanup_srcu_struct(&kvm->srcu);
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         xa_destroy(&kvm->mem_attr_array);
>  #endif
>         kvm_arch_free_vm(kvm);
> @@ -2418,7 +2418,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static u64 kvm_supported_mem_attributes(struct kvm *kvm)
>  {
>         if (!kvm || kvm_arch_has_private_mem(kvm))
> @@ -2623,7 +2623,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>
>         return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
>  }
> -#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> +#endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
>  {
> @@ -4921,7 +4921,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>         case KVM_CAP_SYSTEM_EVENT_DATA:
>         case KVM_CAP_DEVICE_CTRL:
>                 return 1;
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         case KVM_CAP_MEMORY_ATTRIBUTES:
>                 return kvm_supported_mem_attributes(kvm);
>  #endif
> @@ -5325,7 +5325,7 @@ static long kvm_vm_ioctl(struct file *filp,
>                 break;
>         }
>  #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         case KVM_SET_MEMORY_ATTRIBUTES: {
>                 struct kvm_memory_attributes attrs;
>
> @@ -5336,7 +5336,7 @@ static long kvm_vm_ioctl(struct file *filp,
>                 r = kvm_vm_ioctl_set_mem_attributes(kvm, &attrs);
>                 break;
>         }
> -#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> +#endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>         case KVM_CREATE_DEVICE: {
>                 struct kvm_create_device cd;
>
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>

* Re: [PATCH v6 03/43] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined
From: Fuad Tabba @ 2026-05-20 12:08 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-3-91ab5a8b19a4@google.com>

On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Explicitly guard reporting support for KVM_MEMORY_ATTRIBUTE_PRIVATE based
> on kvm_arch_has_private_mem being #defined in anticipation of decoupling
> kvm_supported_mem_attributes() from CONFIG_KVM_VM_MEMORY_ATTRIBUTES.
> guest_memfd support for memory attributes will be unconditional to avoid
> yet more macros (all architectures that support guest_memfd are expected to
> use per-gmem attributes at some point), at which point enumerating support
> for KVM_MEMORY_ATTRIBUTE_PRIVATE based solely on memory attributes being
> supported _somewhere_ would result in KVM over-reporting support on arm64.
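
For readers less familiar with the idiom: the changelog relies on a
helper being provided as a macro so that generic code can test for it
with #ifdef. Below is a minimal userspace sketch of that pattern. The
names (struct vm, arch_has_private_mem, ATTR_PRIVATE,
supported_mem_attributes) are illustrative stand-ins and not the kernel's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define ATTR_PRIVATE (1UL << 3)
struct vm { bool has_private_mem; };

/*
 * "Arch header": an architecture with private memory provides the helper
 * as a macro, which lets generic code test for it with #ifdef. An
 * architecture without private memory would simply not define it.
 */
#define arch_has_private_mem(vm) ((vm)->has_private_mem)

/* "Generic header": fallback stub, compiled only if no arch macro exists. */
#ifndef arch_has_private_mem
static inline bool arch_has_private_mem(struct vm *vm)
{
	(void)vm;
	return false;
}
#endif

/* Report PRIVATE only when the architecture opted in via the macro. */
unsigned long supported_mem_attributes(struct vm *vm)
{
#ifdef arch_has_private_mem
	if (!vm || arch_has_private_mem(vm))
		return ATTR_PRIVATE;
#endif
	return 0;
}

void example_usage(void)
{
	struct vm with = { .has_private_mem = true };
	struct vm without = { .has_private_mem = false };

	assert(supported_mem_attributes(&with) == ATTR_PRIVATE);
	assert(supported_mem_attributes(&without) == 0);
	/* A NULL vm models a query that is not tied to a specific VM. */
	assert(supported_mem_attributes(NULL) == ATTR_PRIVATE);
}
```

Without the macro, the #ifdef block drops out and the function reports 0
unconditionally, which is the "no over-reporting" behaviour the changelog
is after.
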
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  include/linux/kvm_host.h | 2 +-
>  virt/kvm/kvm_main.c      | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7b9faa3545300..7d079f9701346 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -#ifndef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +#ifndef kvm_arch_has_private_mem
>  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  {
>         return false;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 306153abbafa5..abb9cfa3eb04d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2421,8 +2421,10 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static u64 kvm_supported_mem_attributes(struct kvm *kvm)
>  {
> +#ifdef kvm_arch_has_private_mem
>         if (!kvm || kvm_arch_has_private_mem(kvm))
>                 return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +#endif
>
>         return 0;
>  }
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>

* Re: [PATCH v6 04/43] KVM: Stub in ability to disable per-VM memory attribute tracking
From: Fuad Tabba @ 2026-05-20 12:08 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-4-91ab5a8b19a4@google.com>

On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Introduce the basic infrastructure to allow per-VM memory attribute
> tracking to be disabled. This will be built upon in a later patch, where a
> module param can disable per-VM memory attribute tracking.
>
> Split the Kconfig option into a base KVM_MEMORY_ATTRIBUTES and the
> existing KVM_VM_MEMORY_ATTRIBUTES. The base option provides the core
> plumbing, while the latter enables the full per-VM tracking via an xarray
> and the associated ioctls.
>
> kvm_get_memory_attributes() now performs a static call that either looks up
> kvm->mem_attr_array when CONFIG_KVM_VM_MEMORY_ATTRIBUTES is enabled, or
> just returns 0 otherwise. The static call can be patched depending on
> whether per-VM tracking is enabled by the CONFIG.
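
As a rough mental model of the dispatch described above, here is a
userspace sketch that uses a plain function pointer in place of a static
call. This is only an approximation: a real static call patches the call
sites rather than going through a pointer. All names below are
hypothetical, and a flat array stands in for kvm->mem_attr_array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long (*get_attrs_fn)(const unsigned long *table, size_t gfn);

/* Default target: no tracking, every gfn reports 0. */
static unsigned long get_attrs_return0(const unsigned long *table, size_t gfn)
{
	(void)table;
	(void)gfn;
	return 0;
}

/* Per-VM lookup; the flat array stands in for the xarray. */
static unsigned long get_attrs_per_vm(const unsigned long *table, size_t gfn)
{
	return table[gfn];
}

/* The "static call" target, chosen once at init time. */
static get_attrs_fn get_attrs_target = get_attrs_return0;

void init_memory_attributes(bool vm_memory_attributes)
{
	get_attrs_target = vm_memory_attributes ? get_attrs_per_vm
						: get_attrs_return0;
}

/* Callers always go through this wrapper and never pick a backend. */
unsigned long get_memory_attributes(const unsigned long *table, size_t gfn)
{
	return get_attrs_target(table, gfn);
}

void example_usage(void)
{
	const unsigned long table[2] = { 0, 8 };

	init_memory_attributes(false);
	assert(get_memory_attributes(table, 1) == 0);

	init_memory_attributes(true);
	assert(get_memory_attributes(table, 1) == 8);
}
```
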
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  include/linux/kvm_host.h        | 23 ++++++++++++---------
>  virt/kvm/Kconfig                |  4 ++++
>  virt/kvm/kvm_main.c             | 44 ++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 62 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 60b997764beef..c9aa50bcdac2d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2369,7 +2369,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>                        int tdp_max_root_level, int tdp_huge_page_level);
>
>
> -#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
>  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
>  #endif
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7d079f9701346..c5ba2cb34e45c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2528,19 +2528,15 @@ static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
>         return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
>  }
>
> -#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> +typedef unsigned long (kvm_get_memory_attributes_t)(struct kvm *kvm, gfn_t gfn);
> +DECLARE_STATIC_CALL(__kvm_get_memory_attributes, kvm_get_memory_attributes_t);
> +
>  static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>  {
> -       return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
> +       return static_call(__kvm_get_memory_attributes)(kvm, gfn);
>  }
>
> -bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> -                                    unsigned long mask, unsigned long attrs);
> -bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> -                                       struct kvm_gfn_range *range);
> -bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> -                                        struct kvm_gfn_range *range);
> -
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> @@ -2550,6 +2546,15 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return false;
>  }
> +#endif
> +
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> +                                    unsigned long mask, unsigned long attrs);
> +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> +                                       struct kvm_gfn_range *range);
> +bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +                                        struct kvm_gfn_range *range);
>  #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  #ifdef CONFIG_KVM_GUEST_MEMFD
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 5119cb37145fc..3fea89c45cfb4 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -100,7 +100,11 @@ config KVM_ELIDE_TLB_FLUSH_IF_YOUNG
>  config KVM_MMU_LOCKLESS_AGING
>         bool
>
> +config KVM_MEMORY_ATTRIBUTES
> +       bool
> +
>  config KVM_VM_MEMORY_ATTRIBUTES
> +       select KVM_MEMORY_ATTRIBUTES
>         bool
>
>  config KVM_GUEST_MEMFD
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index abb9cfa3eb04d..ee26f1d9b5fda 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -101,6 +101,17 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
>  static bool __ro_after_init allow_unsafe_mappings;
>  module_param(allow_unsafe_mappings, bool, 0444);
>
> +#ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +static bool vm_memory_attributes = true;
> +#else
> +#define vm_memory_attributes false
> +#endif
> +DEFINE_STATIC_CALL_RET0(__kvm_get_memory_attributes, kvm_get_memory_attributes_t);
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(STATIC_CALL_KEY(__kvm_get_memory_attributes));
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(STATIC_CALL_TRAMP(__kvm_get_memory_attributes));
> +#endif
> +
>  /*
>   * Ordering of locks:
>   *
> @@ -2418,7 +2429,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> -#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
>  static u64 kvm_supported_mem_attributes(struct kvm *kvm)
>  {
>  #ifdef kvm_arch_has_private_mem
> @@ -2429,6 +2440,12 @@ static u64 kvm_supported_mem_attributes(struct kvm *kvm)
>         return 0;
>  }
>
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> +{
> +       return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
> +}
> +
>  /*
>   * Returns true if _all_ gfns in the range [@start, @end) have attributes
>   * such that the bits in @mask match @attrs.
> @@ -2625,7 +2642,24 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>
>         return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
>  }
> +#else  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
> +static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> +{
> +       BUILD_BUG_ON(1);
> +}
>  #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
> +static void kvm_init_memory_attributes(void)
> +{
> +       if (vm_memory_attributes)
> +               static_call_update(__kvm_get_memory_attributes,
> +                                  kvm_get_vm_memory_attributes);
> +       else
> +               static_call_update(__kvm_get_memory_attributes,
> +                                  (void *)__static_call_return0);
> +}
> +#else  /* CONFIG_KVM_MEMORY_ATTRIBUTES */
> +static void kvm_init_memory_attributes(void) { }
> +#endif /* CONFIG_KVM_MEMORY_ATTRIBUTES */
>
>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
>  {
> @@ -4925,6 +4959,9 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>                 return 1;
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         case KVM_CAP_MEMORY_ATTRIBUTES:
> +               if (!vm_memory_attributes)
> +                       return 0;
> +
>                 return kvm_supported_mem_attributes(kvm);
>  #endif
>  #ifdef CONFIG_KVM_GUEST_MEMFD
> @@ -5331,6 +5368,10 @@ static long kvm_vm_ioctl(struct file *filp,
>         case KVM_SET_MEMORY_ATTRIBUTES: {
>                 struct kvm_memory_attributes attrs;
>
> +               r = -ENOTTY;
> +               if (!vm_memory_attributes)
> +                       goto out;
> +
>                 r = -EFAULT;
>                 if (copy_from_user(&attrs, argp, sizeof(attrs)))
>                         goto out;
> @@ -6527,6 +6568,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
>         kvm_preempt_ops.sched_in = kvm_sched_in;
>         kvm_preempt_ops.sched_out = kvm_sched_out;
>
> +       kvm_init_memory_attributes();
>         kvm_init_debug();
>
>         r = kvm_vfio_ops_init();
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Fuad Tabba @ 2026-05-20 12:08 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-5-91ab5a8b19a4@google.com>

On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Implement kvm_gmem_get_memory_attributes() for guest_memfd to allow the KVM
> core and architecture code to query per-GFN memory attributes.
>
> kvm_gmem_get_memory_attributes() finds the memory slot for a given GFN and
> queries the guest_memfd file's attributes to determine if the page is
> marked as private.
>
> If vm_memory_attributes is not enabled, there is no shared/private tracking
> at the VM level. Install the guest_memfd implementation as long as
> guest_memfd is enabled to give guest_memfd a chance to respond on
> attributes.
>
> guest_memfd should look up attributes regardless of whether this memslot is
> gmem-only since attributes are now tracked by gmem regardless of whether
> mmap() is enabled.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  include/linux/kvm_host.h |  2 ++
>  virt/kvm/guest_memfd.c   | 31 +++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      |  3 +++
>  3 files changed, 36 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c5ba2cb34e45c..28a54298d27db 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2557,6 +2557,8 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>                                          struct kvm_gfn_range *range);
>  #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn);
> +
>  #ifdef CONFIG_KVM_GUEST_MEMFD
>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                      gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 5011d38820d0d..f055e058a3f28 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -509,6 +509,37 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>         return 0;
>  }
>
> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> +{
> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> +       struct inode *inode;
> +
> +       /*
> +        * If this gfn has no associated memslot, there's no chance of the gfn
> +        * being backed by private memory, since guest_memfd must be used for
> +        * private memory, and guest_memfd must be associated with some memslot.
> +        */
> +       if (!slot)
> +               return 0;
> +
> +       CLASS(gmem_get_file, file)(slot);
> +       if (!file)
> +               return 0;
> +
> +       inode = file_inode(file);
> +
> +       /*
> +        * Rely on the maple tree's internal RCU lock to ensure a
> +        * stable result. This result can become stale as soon as the
> +        * lock is dropped, so the caller _must_ still protect
> +        * consumption of private vs. shared by checking
> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
> +        * against ongoing attribute updates.
> +        */
> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
> +}

Doesn't this imply that all consumers of kvm_mem_is_private() should
validate the result using mmu_lock and the invalidation sequence?
sev_handle_rmp_fault() calls kvm_mem_is_private() without holding
mmu_lock and without any retry mechanism. Is that a problem?
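
To make the concern concrete, here is a single-threaded userspace sketch
of the snapshot-then-recheck pattern that the comment asks callers to
follow. It is only loosely modeled on the invalidation sequence; the
struct and function names are hypothetical, and it omits the locking and
memory barriers that real code needs.

```c
#include <assert.h>
#include <stdbool.h>

struct vm_model {
	unsigned long invalidate_seq;	/* bumped by every attribute update */
	bool gfn_private;		/* attribute of the one gfn we model */
};

struct lookup {
	unsigned long seq;		/* sequence seen before the lookup */
	bool is_private;		/* possibly stale lookup result */
};

/* Updater: change the attribute and bump the sequence. */
void set_private(struct vm_model *vm, bool is_private)
{
	vm->gfn_private = is_private;
	vm->invalidate_seq++;
}

/* Step 1: snapshot the sequence first, then do the unlocked lookup. */
struct lookup lookup_private(const struct vm_model *vm)
{
	struct lookup l;

	l.seq = vm->invalidate_seq;
	l.is_private = vm->gfn_private;
	return l;
}

/*
 * Step 2 (done under the lock in real code): the result may be consumed
 * only if no update happened since the snapshot. Otherwise retry.
 */
bool lookup_is_stale(const struct vm_model *vm, const struct lookup *l)
{
	return vm->invalidate_seq != l->seq;
}

void example_usage(void)
{
	struct vm_model vm = { .invalidate_seq = 0, .gfn_private = false };
	struct lookup l = lookup_private(&vm);

	assert(!lookup_is_stale(&vm, &l));

	/* A conversion races in after the lookup: the result is now stale. */
	set_private(&vm, true);
	assert(lookup_is_stale(&vm, &l));

	/* Retrying yields a fresh, usable result. */
	l = lookup_private(&vm);
	assert(!lookup_is_stale(&vm, &l) && l.is_private);
}
```

A caller that skips step 2 has no way to notice the conversion in the
middle of example_usage().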

Cheers,
/fuad


> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_memory_attributes);
> +
>  static struct file_operations kvm_gmem_fops = {
>         .mmap           = kvm_gmem_mmap,
>         .open           = generic_file_open,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ee26f1d9b5fda..4139e903f756a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2653,6 +2653,9 @@ static void kvm_init_memory_attributes(void)
>         if (vm_memory_attributes)
>                 static_call_update(__kvm_get_memory_attributes,
>                                    kvm_get_vm_memory_attributes);
> +       else if (IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
> +               static_call_update(__kvm_get_memory_attributes,
> +                                  kvm_gmem_get_memory_attributes);
>         else
>                 static_call_update(__kvm_get_memory_attributes,
>                                    (void *)__static_call_return0);
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>

* Re: [PATCH v6 06/43] KVM: x86/mmu: Bug the VM if gmem attributes are queried to determine max mapping level
From: Fuad Tabba @ 2026-05-20 13:33 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-6-91ab5a8b19a4@google.com>

On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> When the maximum mapping level is queried, KVM's MMU lock is held, and
> while the MMU lock is held, guest_memfd cannot take the
> filemap_invalidate_lock() to look up the current shared/private state of
> the gfn, for these reasons:
>
> + The MMU lock is a spinlock or rwlock and cannot be held while taking a
>   lock that can sleep.
> + In guest_memfd's code paths (such as truncate), the
>   filemap_invalidate_lock() is held while taking the MMU lock, and taking
>   the locks in reverse order would introduce an AB-BA deadlock.
>
> Currently, the maximum mapping level is only queried from guest_memfd in
> the process of recovering huge pages, if dirty logging is disabled on a
> memslot. Dirty logging is not currently supported for guest_memfd, and
> guest_memfd memslots also cannot be updated.
>
> For now, bug the VM if guest_memfd needs to be queried to determine the
> maximum mapping level. This guard can be removed if/when support is added.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a80a876ab4ad6..153bcc5369985 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3357,6 +3357,15 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>                 max_level = fault->max_level;
>                 is_private = fault->is_private;
>         } else {
> +               /*
> +                * Memory attributes cannot be obtained from guest_memfd while
> +                * the MMU lock is held.
> +                */
> +               if (KVM_BUG_ON(static_call_query(__kvm_get_memory_attributes) ==
> +                              kvm_gmem_get_memory_attributes, kvm)) {
> +                       return 0;
> +               }
> +

This directly takes the address of kvm_gmem_get_memory_attributes,
which is only compiled if CONFIG_KVM_GUEST_MEMFD=y. This breaks
ARCH=i386.
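
One possible shape for a fix, sketched in userspace C, is to keep the
pointer comparison behind the same condition that builds the function.
This is only an illustration and not necessarily what the series should
adopt. HAVE_GUEST_MEMFD is a hypothetical stand-in for
CONFIG_KVM_GUEST_MEMFD, and the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_KVM_GUEST_MEMFD; 0 models it being off. */
#define HAVE_GUEST_MEMFD 0

typedef unsigned long (*get_attrs_fn)(unsigned long gfn);

/* Always built: the per-VM backend. */
unsigned long vm_get_attrs(unsigned long gfn)
{
	(void)gfn;
	return 0;
}

#if HAVE_GUEST_MEMFD
/* Only built when guest_memfd support is configured in. */
unsigned long gmem_get_attrs(unsigned long gfn)
{
	(void)gfn;
	return 8;
}
#endif

/*
 * The symbol is referenced only inside the same guard that defines it, so
 * a build without guest_memfd never needs its address.
 */
bool attrs_come_from_gmem(get_attrs_fn target)
{
#if HAVE_GUEST_MEMFD
	return target == gmem_get_attrs;
#else
	(void)target;
	return false;
#endif
}

void example_usage(void)
{
	assert(!attrs_come_from_gmem(vm_get_attrs));
}
```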

Cheers,
/fuad

>                 max_level = PG_LEVEL_NUM;
>                 is_private = kvm_mem_is_private(kvm, gfn);
>         }
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>

* [PATCH v10 00/25] Runtime TDX module update support
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, x86, linux-kernel, linux-rt-devel, linux-doc
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jonathan Corbet, Shuah Khan

Hi Dave & Rick,

Thanks for your thorough review of v9. This v10 addresses the issues you
pointed out. The main changes in this version are polishing changelogs
and variable renames to improve readability. Specifically:

   - Patches 1-2 (new): Split the original "Consolidate TDX global
     initialization states" into two steps — first move the statics to
     file scope, then clarify the result-caching logic in
     try_init_module_global().
   - Patch 6: Removed user-facing Kconfig help text for TDX_HOST_SERVICES
     (now a silent tristate auto-selected by INTEL_TDX_HOST).
   - Patch 13: Renamed "size" to "data_len" in seamldr_install_module()
     and init_seamldr_params(); renamed "HEADER_SIZE" to
     "TDX_IMAGE_HEADER_SIZE"; renamed "primary" to "is_lead_cpu" in the
     update state machine.
   - Patch 13: Added early data_len validation and explicit bounds checks
     on sigstruct_nr_pages/module_nr_pages against SEAMLDR_MAX_NR_*
     limits, removing the implicit clamping in populate_pa_list().
   - Patch 22: Fixed BIT(16) -> BIT_ULL(16) for
     TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE.
   - Patch 22: Removed unused TDX_FEATURES0_UPDATE_COMPAT definition.
   - Various patches: Shortened sysfs ABI descriptions, tightened
     comments across seamldr.h and seamldr.c, and minor style fixes
     (return 0 -> return false, unfolded conditionals)

Please take a look at this new version. I hope it can still be merged
for 7.2.
---

(For transparency, note that I used AI tools to help proofread this
cover-letter and commit messages)

This series adds support for runtime TDX module updates that preserve
running TDX guests. It is also available at:

  https://github.com/gaochaointel/linux-dev/commits/tdx-module-updates-v10/

== Background ==

Intel TDX isolates Trusted Domains (TDs), or confidential guests, from the
host. A key component of Intel TDX is the TDX module, which enforces
security policies to protect the memory and CPU states of TDs from the
host. However, the TDX module is software that requires updates.

== Problems ==

Currently, the TDX module is loaded by the BIOS at boot time, and the only
way to update it is through a reboot, which results in significant system
downtime. Users expect the TDX module to be updatable at runtime without
disrupting TDX guests.

== Solution ==

On TDX platforms, P-SEAMLDR[1] is a component within the protected SEAM
range. It is loaded by the BIOS and provides the host with functions to
install a TDX module at runtime.

This series implements runtime TDX module updates through the fw_upload
mechanism. That interface is a good fit because TDX module selection is not
a simple "load a known file from disk" problem. The update image to load
depends on module versioning and compatibility rules. fw_upload lets userspace
choose the module explicitly while the kernel provides the update
mechanism.

This design intentionally keeps most update validation/policy in userspace.
The kernel exposes the information userspace needs, such as TDX module
version and P-SEAMLDR information, but userspace is responsible for
understanding the TDX module's versioning and compatibility rules and for
choosing an appropriate update image (see "TDX module versioning" below).

The kernel still enforces the pieces that must be handled in-kernel:

1. Validate the tdx_blob header fields that are not passed through to the
TDX module. These are the standard defensive ABI checks for overflows and
reserved bits.

2. Make sure no non-update SEAMCALLs are called during the update.

3. Make sure SEAMCALLs run on the right CPUs, among those the user has made
available to the kernel.

4. Handle the race between updates and concurrent TD builds by
returning -EBUSY to userspace.

Everything else remains a userspace responsibility.
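
For illustration only, the kind of header checking meant by item 1 can be
sketched in userspace C as below. The header layout is hypothetical and is
NOT the tdx_blob format used by this series. The sketch only shows the
shape of reserved-bit and overflow-safe bounds checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout, for illustration only. */
struct blob_header {
	uint32_t module_offset;	/* offset of the payload from blob start */
	uint32_t module_len;	/* payload length in bytes */
	uint64_t reserved;	/* must be zero */
};

int validate_blob_header(const struct blob_header *h, size_t data_len)
{
	/* The blob must at least contain the header itself. */
	if (data_len < sizeof(*h))
		return -EINVAL;

	/* Reject reserved bits so they can be given meaning later. */
	if (h->reserved)
		return -EINVAL;

	/* The payload must not overlap the header. */
	if (h->module_offset < sizeof(*h))
		return -EINVAL;

	/*
	 * Overflow-safe bounds check: never compute offset + len, compare
	 * len against the space that remains after offset instead.
	 */
	if (h->module_offset > data_len)
		return -EINVAL;
	if (h->module_len > data_len - h->module_offset)
		return -EINVAL;

	return 0;
}

void example_usage(void)
{
	struct blob_header h = {
		.module_offset = sizeof(struct blob_header),
		.module_len = 100,
		.reserved = 0,
	};

	assert(validate_blob_header(&h, sizeof(h) + 100) == 0);

	h.reserved = 1;
	assert(validate_blob_header(&h, sizeof(h) + 100) == -EINVAL);

	h.reserved = 0;
	h.module_len = UINT32_MAX;	/* would overflow a naive sum */
	assert(validate_blob_header(&h, sizeof(h) + 100) == -EINVAL);
}
```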

In the unlikely event the update fails, for example userspace picks an
incompatible update image, or the image is otherwise corrupted, all TDs
will experience SEAMCALL failures and be killed. The recovery of TD
operation from that event requires a reboot.

Given there is no mechanism to quiesce SEAMCALLs, the TDs themselves must
pause execution over an update. The most straightforward way to meet the
'pause TDs while update executes' constraint is to run the update in
stop_machine() context. All other evaluated solutions export more
complexity to KVM, or export more fragility to userspace.

== How to test this series ==

NOTE: This v10 uses a new tdx_blob format. The scripts and module blobs in
https://github.com/intel/tdx-module-binaries have not yet been updated
to match this version. Those updates will be done separately later.

== Other information relevant to Runtime TDX module updates ==

=== TDX module versioning ===

Each TDX module is assigned a version number x.y.z, where x represents the
"major" version, y the "minor" version, and z the "update" version.

Runtime TDX module updates are restricted to Z-stream releases.

Note that Z-stream releases do not necessarily guarantee compatibility. A
new release may not be compatible with all previous versions. To address this,
Intel provides a separate file containing compatibility information, which
specifies the minimum module version required for a particular update. This
information is referenced by the tool to determine if two modules are
compatible.
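
As a sketch of the check a userspace tool might perform under these rules
(the struct, the function name and the way the minimum version is passed
in are assumptions; the real compatibility file format is not described
here):

```c
#include <assert.h>
#include <stdbool.h>

/* x.y.z: major, minor, update. */
struct tdx_version {
	unsigned int major;
	unsigned int minor;
	unsigned int update;
};

/*
 * Assumption: the compatibility information for "next" boils down to a
 * minimum currently-running version in the same x.y stream.
 */
bool update_allowed(const struct tdx_version *cur,
		    const struct tdx_version *next,
		    const struct tdx_version *min_required)
{
	/* Z-stream only: major and minor must not change. */
	if (next->major != cur->major || next->minor != cur->minor)
		return false;

	/* The minimum is only meaningful within the same x.y stream. */
	if (min_required->major != cur->major ||
	    min_required->minor != cur->minor)
		return false;

	/* The running module must be at least the required minimum. */
	return cur->update >= min_required->update;
}

void example_usage(void)
{
	struct tdx_version cur = { 1, 5, 6 };
	struct tdx_version next = { 1, 5, 8 };
	struct tdx_version min_ok = { 1, 5, 4 };
	struct tdx_version min_too_new = { 1, 5, 7 };
	struct tdx_version next_minor = { 1, 6, 0 };

	assert(update_allowed(&cur, &next, &min_ok));
	assert(!update_allowed(&cur, &next, &min_too_new));
	assert(!update_allowed(&cur, &next_minor, &min_ok));
}
```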

=== TCB Stability ===

Updates change the TCB as viewed by attestation reports. In TDX there is
a distinction between "launch-time" version and "current" version where
runtime TDX module updates cause that "current" version number to change,
subject to Z-stream constraints.

The concern that a malicious host may attack confidential VMs by loading
insecure updates was addressed by Alex in [3]. Similarly, the scenario
where some "theoretical paranoid tenant" in the cloud wants to audit
updates and stop trusting the host after updates until audit completion
was also addressed in [4]. Users not in the cloud control the host machine
and can manage updates themselves, so they don't have these concerns.

See more about the implications of current TCB version changes in
attestation as summarized by Dave in [5].

=== TDX module Distribution Model ===

At a high level, Intel publishes all TDX modules on GitHub [2], along
with a mapping_file.json which documents the compatibility information
about each TDX module and a userspace tool to install the TDX module. OS
vendors can package these modules and distribute them. Administrators
install the package and use the tool to select the appropriate TDX module
and install it via the interfaces exposed by this series.

[1]: https://cdrdv2.intel.com/v1/dl/getContent/733584
[2]: https://github.com/intel/tdx-module-binaries
[3]: https://lore.kernel.org/all/665c5ae0-4b7c-4852-8995-255adf7b3a2f@amazon.com/
[4]: https://lore.kernel.org/all/5d1da767-491b-4077-b472-2cc3d73246d6@amazon.com/
[5]: https://lore.kernel.org/all/94d6047e-3b7c-4bc1-819c-85c16ff85abf@intel.com/

Chao Gao (24):
  x86/virt/tdx: Clarify try_init_module_global() result caching
  x86/virt/tdx: Move TDX global initialization states to file scope
  x86/virt/tdx: Consolidate TDX global initialization states
  x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
  coco/tdx-host: Introduce a "tdx_host" device
  coco/tdx-host: Expose TDX module version
  x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs
  x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information
  coco/tdx-host: Expose P-SEAMLDR information via sysfs
  coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
  coco/tdx-host: Implement firmware upload sysfs ABI for TDX module
    updates
  x86/virt/seamldr: Allocate and populate a module update request
  x86/virt/seamldr: Introduce skeleton for TDX module updates
  x86/virt/seamldr: Abort updates after a failed step
  x86/virt/seamldr: Shut down the current TDX module
  x86/virt/tdx: Reset software states during TDX module shutdown
  x86/virt/seamldr: Install a new TDX module
  x86/virt/seamldr: Do TDX global and per-CPU init after module
    installation
  x86/virt/tdx: Restore TDX module state
  x86/virt/tdx: Refresh TDX module version after update
  x86/virt/tdx: Reject updates during compatibility-sensitive operations
  x86/virt/tdx: Enable TDX module runtime updates
  coco/tdx-host: Document TDX module update compatibility criteria
  x86/virt/tdx: Document TDX module update

Kai Huang (1):
  x86/virt/tdx: Move low level SEAMCALL helpers out of <asm/tdx.h>

 .../ABI/testing/sysfs-devices-faux-tdx-host   |  66 ++++
 Documentation/arch/x86/tdx.rst                |  34 ++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/seamldr.h                |  36 ++
 arch/x86/include/asm/tdx.h                    |  66 +---
 arch/x86/include/asm/tdx_global_metadata.h    |   4 +
 arch/x86/include/asm/vmx.h                    |   1 +
 arch/x86/virt/vmx/tdx/Makefile                |   2 +-
 arch/x86/virt/vmx/tdx/seamcall_internal.h     | 109 ++++++
 arch/x86/virt/vmx/tdx/seamldr.c               | 324 ++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.c                   | 169 +++++----
 arch/x86/virt/vmx/tdx/tdx.h                   |   8 +-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c   |  17 +-
 drivers/virt/coco/Kconfig                     |   2 +
 drivers/virt/coco/Makefile                    |   1 +
 drivers/virt/coco/tdx-host/Kconfig            |   6 +
 drivers/virt/coco/tdx-host/Makefile           |   1 +
 drivers/virt/coco/tdx-host/tdx-host.c         | 231 +++++++++++++
 18 files changed, 965 insertions(+), 113 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-devices-faux-tdx-host
 create mode 100644 arch/x86/include/asm/seamldr.h
 create mode 100644 arch/x86/virt/vmx/tdx/seamcall_internal.h
 create mode 100644 arch/x86/virt/vmx/tdx/seamldr.c
 create mode 100644 drivers/virt/coco/tdx-host/Kconfig
 create mode 100644 drivers/virt/coco/tdx-host/Makefile
 create mode 100644 drivers/virt/coco/tdx-host/tdx-host.c

base-commit: 5209e5bfe5cab593476c3e7754e42c5e47ce36de
-- 
2.52.0

^ permalink raw reply

* [PATCH v10 01/25] x86/virt/tdx: Clarify try_init_module_global() result caching
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>

TDX module global initialization is executed only once. The first call
caches both the result and the "done" state, and later callers reuse the
saved result. A lock protects that cached state.

The current code is hard to read because sysinit_done is accessed under
the lock, while sysinit_ret is not.

To improve readability, move sysinit_ret accesses within the lock.

Group sysinit_ret/sysinit_done updates right after initialization so
caching the result is separate from the initialization itself.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
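Illustration only, not part of the change: the run-once pattern with a
cached result can be sketched in userspace C roughly as below. The
pthread mutex stands in for the raw spinlock, and do_init(),
try_init_once() and init_calls are hypothetical names that do not exist
in the kernel.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a mutex replaces the kernel's raw spinlock. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool init_done;
static int init_ret;
static int init_calls;	/* counts how often do_init() really runs */

/* Hypothetical one-time initialization that happens to fail. */
static int do_init(void)
{
	init_calls++;
	return -ENODEV;
}

static int try_init_once(void)
{
	int ret;

	pthread_mutex_lock(&init_lock);

	/* Later callers return the cached result without re-running init. */
	if (init_done) {
		ret = init_ret;
		goto out;
	}

	ret = do_init();

	/* Cache the result; both variables are only touched under the lock. */
	init_done = true;
	init_ret = ret;
out:
	pthread_mutex_unlock(&init_lock);
	return ret;
}
```

Calling try_init_once() twice runs do_init() once; the second call only
reads the cached value, and every access to init_ret happens with the
lock held.
---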
 arch/x86/virt/vmx/tdx/tdx.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c0c6281b08a5..ad56f142dd0b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -115,28 +115,34 @@ static int try_init_module_global(void)
 	static DEFINE_RAW_SPINLOCK(sysinit_lock);
 	static bool sysinit_done;
 	static int sysinit_ret;
+	int ret;
 
 	raw_spin_lock(&sysinit_lock);
 
-	if (sysinit_done)
+	/* Return the "cached" return code. */
+	if (sysinit_done) {
+		ret = sysinit_ret;
 		goto out;
+	}
 
 	/* RCX is module attributes and all bits are reserved */
 	args.rcx = 0;
-	sysinit_ret = seamcall_prerr(TDH_SYS_INIT, &args);
+	ret = seamcall_prerr(TDH_SYS_INIT, &args);
 
 	/*
 	 * The first SEAMCALL also detects the TDX module, thus
 	 * it can fail due to the TDX module is not loaded.
 	 * Dump message to let the user know.
 	 */
-	if (sysinit_ret == -ENODEV)
+	if (ret == -ENODEV)
 		pr_err("module not loaded\n");
 
+	/* Save the return code for later callers. */
 	sysinit_done = true;
+	sysinit_ret = ret;
 out:
 	raw_spin_unlock(&sysinit_lock);
-	return sysinit_ret;
+	return ret;
 }
 
 /**
-- 
2.52.0


^ permalink raw reply related

* [PATCH v10 02/25] x86/virt/tdx: Move TDX global initialization states to file scope
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>

TDX module global initialization is executed only once. The first call
caches both the result and the "done" state, and later callers reuse the
saved result. A lock protects those cached states.

Those states and the lock are currently kept as function-local statics
because they are used only by try_init_module_global().

TDX module updates need to reset the cached states so TDX global
initialization can be run again after an update. That will add another
access site in the same file.

Move the cached states to file scope so they are accessible outside
try_init_module_global(), and move the lock along with the states it
protects.

No functional change intended.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ad56f142dd0b..40444a3c5cdd 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -105,6 +105,10 @@ static __always_inline int sc_retry_prerr(sc_func_t func,
 #define seamcall_prerr_ret(__fn, __args)					\
 	sc_retry_prerr(__seamcall_ret, seamcall_err_ret, (__fn), (__args))
 
+static DEFINE_RAW_SPINLOCK(sysinit_lock);
+static bool sysinit_done;
+static int sysinit_ret;
+
 /*
  * Do the module global initialization once and return its result.
  * It can be done on any cpu, and from task or IRQ context.
@@ -112,9 +116,6 @@ static __always_inline int sc_retry_prerr(sc_func_t func,
 static int try_init_module_global(void)
 {
 	struct tdx_module_args args = {};
-	static DEFINE_RAW_SPINLOCK(sysinit_lock);
-	static bool sysinit_done;
-	static int sysinit_ret;
 	int ret;
 
 	raw_spin_lock(&sysinit_lock);
-- 
2.52.0


^ permalink raw reply related

* [PATCH v10 03/25] x86/virt/tdx: Consolidate TDX global initialization states
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>

The kernel uses several global flags to guard one-time TDX initialization
flows and prevent them from being repeated.

When the TDX module is updated, all of those states must be reset so that
the module can be initialized again. Today those states are kept as
separate global variables, which makes the reset path awkward and easy to
miss when a new state is added.

Group the states into a single structure so they can be reset together, for
example with memset(), and so a newly added state won't be missed.

Drop the __ro_after_init annotation from tdx_module_initialized because
the other two states do not have it. And with TDX module update support,
all the states need to be writable at runtime.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
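Illustration only, not part of the change: a minimal userspace sketch of
why grouping the flags helps a reset path. The struct mirrors the fields
added by this patch; reset_module_state() is a hypothetical helper, and
any locking a real reset would need is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors the fields of struct tdx_module_state from this patch. */
struct module_state {
	bool initialized;
	bool sysinit_done;
	int sysinit_ret;
};

static struct module_state state;

/* Hypothetical reset helper: one memset() clears every field. */
static void reset_module_state(void)
{
	memset(&state, 0, sizeof(state));
}
```

A field added to the struct later is cleared by the same memset() with no
change to the reset helper, which is the property the commit message
relies on.
---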
 arch/x86/virt/vmx/tdx/tdx.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 40444a3c5cdd..71d39a79ef3f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -44,6 +44,13 @@
 #include <asm/virt.h>
 #include "tdx.h"
 
+struct tdx_module_state {
+	bool initialized;
+	bool sysinit_done;
+	int sysinit_ret;
+};
+
+static struct tdx_module_state tdx_module_state;
 static u32 tdx_global_keyid __ro_after_init;
 static u32 tdx_guest_keyid_start __ro_after_init;
 static u32 tdx_nr_guest_keyids __ro_after_init;
@@ -58,7 +65,6 @@ static struct tdmr_info_list tdx_tdmr_list;
 static LIST_HEAD(tdx_memlist);
 
 static struct tdx_sys_info tdx_sysinfo __ro_after_init;
-static bool tdx_module_initialized __ro_after_init;
 
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
@@ -106,8 +112,6 @@ static __always_inline int sc_retry_prerr(sc_func_t func,
 	sc_retry_prerr(__seamcall_ret, seamcall_err_ret, (__fn), (__args))
 
 static DEFINE_RAW_SPINLOCK(sysinit_lock);
-static bool sysinit_done;
-static int sysinit_ret;
 
 /*
  * Do the module global initialization once and return its result.
@@ -121,8 +125,8 @@ static int try_init_module_global(void)
 	raw_spin_lock(&sysinit_lock);
 
 	/* Return the "cached" return code. */
-	if (sysinit_done) {
-		ret = sysinit_ret;
+	if (tdx_module_state.sysinit_done) {
+		ret = tdx_module_state.sysinit_ret;
 		goto out;
 	}
 
@@ -139,8 +143,8 @@ static int try_init_module_global(void)
 		pr_err("module not loaded\n");
 
 	/* Save the return code for later callers. */
-	sysinit_done = true;
-	sysinit_ret = ret;
+	tdx_module_state.sysinit_done = true;
+	tdx_module_state.sysinit_ret = ret;
 out:
 	raw_spin_unlock(&sysinit_lock);
 	return ret;
@@ -1306,7 +1310,7 @@ static __init int tdx_enable(void)
 
 	register_syscore(&tdx_syscore);
 
-	tdx_module_initialized = true;
+	tdx_module_state.initialized = true;
 	pr_info("TDX-Module initialized\n");
 	return 0;
 }
@@ -1561,7 +1565,7 @@ void __init tdx_init(void)
 
 const struct tdx_sys_info *tdx_get_sysinfo(void)
 {
-	if (!tdx_module_initialized)
+	if (!tdx_module_state.initialized)
 		return NULL;
 
 	return (const struct tdx_sys_info *)&tdx_sysinfo;
-- 
2.52.0


^ permalink raw reply related

* [PATCH v10 04/25] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>

Future changes will add support for new TDX features exposed as
TDX_FEATURES0 bits. The presence of these features will need to be checked
outside of arch/x86/virt. So the feature query helpers, and the
TDX_FEATURES0 defines they reference, will need to live in the widely
accessible asm/tdx.h header. Move the existing TDX_FEATURES0 bit
definition to asm/tdx.h so that they can all be kept together.

Opportunistically switch to BIT_ULL() since TDX_FEATURES0 is 64-bit.

No functional change intended.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/kvm/20260427152854.101171-17-chao.gao@intel.com/ # [1]
Link: https://lore.kernel.org/kvm/20251121005125.417831-16-rick.p.edgecombe@intel.com/ # [2]
---
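Illustration only, not part of the change: the kernel's BIT() produces an
unsigned long constant while BIT_ULL() produces an unsigned long long, so
BIT_ULL() matches a 64-bit metadata field regardless of the width of
long. The userspace sketch below redefines BIT_ULL() locally;
EXAMPLE_FEATURES0_HIGH_BIT and features0_has() are hypothetical and do
not correspond to any real TDX feature or kernel helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's BIT_ULL(): always a 64-bit constant. */
#define BIT_ULL(nr)	(1ULL << (nr))

#define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
/* Hypothetical high bit, for illustration only; not a real TDX feature. */
#define EXAMPLE_FEATURES0_HIGH_BIT	BIT_ULL(40)

/* Hypothetical query helper: test one bit of a 64-bit features word. */
static bool features0_has(uint64_t features0, uint64_t bit)
{
	return (features0 & bit) != 0;
}
```

With a 64-bit constant, a bit above position 31 such as the hypothetical
bit 40 is well defined even where long is 32 bits wide.
---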
 arch/x86/include/asm/tdx.h  | 3 +++
 arch/x86/virt/vmx/tdx/tdx.h | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 15eac89b0afb..e2430dd0e4d5 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -32,6 +32,9 @@
 #define TDX_SUCCESS		0ULL
 #define TDX_RND_NO_ENTROPY	0x8000020300000000ULL
 
+/* Bit definitions of TDX_FEATURES0 metadata field */
+#define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
+
 #ifndef __ASSEMBLER__
 
 #include <uapi/asm/mce.h>
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index e2cf2dd48755..76c5fb1e1ffe 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -85,9 +85,6 @@ struct tdmr_info {
 	DECLARE_FLEX_ARRAY(struct tdmr_reserved_area, reserved_areas);
 } __packed __aligned(TDMR_INFO_ALIGNMENT);
 
-/* Bit definitions of TDX_FEATURES0 metadata field */
-#define TDX_FEATURES0_NO_RBP_MOD	BIT(18)
-
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!
-- 
2.52.0


^ permalink raw reply related

* [PATCH v10 05/25] x86/virt/tdx: Move low level SEAMCALL helpers out of <asm/tdx.h>
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Zhenzhong Duan,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>

From: Kai Huang <kai.huang@intel.com>

TDX host core code implements three seamcall*() helpers to make SEAMCALLs
to the TDX module.  Currently, they are implemented in <asm/tdx.h> and
are exposed to other kernel code which includes <asm/tdx.h>.

However, other than the TDX host core, seamcall*() are not expected to
be used by other kernel code directly.  For instance, for all SEAMCALLs
that are used by KVM, the TDX host core exports a wrapper function for
each of them.

Move seamcall*() and related code out of <asm/tdx.h> and make them only
visible to TDX host core.

Since the TDX host core tdx.c is already very heavy, don't put the low
level seamcall*() code there; put it in a new dedicated
"seamcall_internal.h".  tdx.c also currently has seamcall_prerr*()
helpers, which additionally print an error message when a seamcall*()
fails.  Move them to "seamcall_internal.h" as well, so that all low level
SEAMCALL helpers live in one dedicated place, which is much more readable.

Copy the copyright notice from the original files and consolidate the
date ranges to:

	Copyright (C) 2021-2023 Intel Corporation

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vishal Annapurve <vannapurve@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
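Illustration only, not part of the change: the helpers being moved
combine a bounded retry loop with a translation from raw status to errno.
A simplified userspace sketch of that shape follows. All ex_* names are
hypothetical, the retry budget of 10 is an assumption, the status value
is copied from the TDX_RND_NO_ENTROPY define, and the preemption and
cache-state handling of the real helpers is omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical status codes and retry budget; the values are illustrative. */
#define EX_SUCCESS	0ULL
#define EX_NO_ENTROPY	0x8000020300000000ULL
#define EX_RETRY_LOOPS	10

typedef uint64_t (*ex_func_t)(uint64_t fn);

/* Re-issue the call while it reports a transient lack of entropy. */
static uint64_t ex_retry(ex_func_t func, uint64_t fn)
{
	int retry = EX_RETRY_LOOPS;
	uint64_t ret;

	do {
		ret = func(fn);
	} while (ret == EX_NO_ENTROPY && --retry);

	return ret;
}

/* Translate the raw status into an errno-style return value. */
static int ex_retry_errno(ex_func_t func, uint64_t fn)
{
	uint64_t ret = ex_retry(func, fn);

	return ret == EX_SUCCESS ? 0 : -EIO;
}

/* Hypothetical backend: fails twice with "no entropy", then succeeds. */
static int ex_calls;

static uint64_t ex_flaky(uint64_t fn)
{
	(void)fn;
	return ++ex_calls <= 2 ? EX_NO_ENTROPY : EX_SUCCESS;
}

/* Hypothetical backend that never recovers, to exercise the retry limit. */
static uint64_t ex_always_busy(uint64_t fn)
{
	(void)fn;
	return EX_NO_ENTROPY;
}
```

Passing the low-level function as a pointer lets one retry loop serve
several call variants, which is the role sc_func_t plays in the moved
code.
---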
 arch/x86/include/asm/tdx.h                |  47 ----------
 arch/x86/virt/vmx/tdx/seamcall_internal.h | 109 ++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.c               |  47 +---------
 3 files changed, 111 insertions(+), 92 deletions(-)
 create mode 100644 arch/x86/virt/vmx/tdx/seamcall_internal.h

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index e2430dd0e4d5..8b739ac01479 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -100,54 +100,7 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
 
 #ifdef CONFIG_INTEL_TDX_HOST
-u64 __seamcall(u64 fn, struct tdx_module_args *args);
-u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
-u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
 void tdx_init(void);
-
-#include <linux/preempt.h>
-#include <asm/archrandom.h>
-#include <asm/processor.h>
-
-typedef u64 (*sc_func_t)(u64 fn, struct tdx_module_args *args);
-
-static __always_inline u64 __seamcall_dirty_cache(sc_func_t func, u64 fn,
-						  struct tdx_module_args *args)
-{
-	lockdep_assert_preemption_disabled();
-
-	/*
-	 * SEAMCALLs are made to the TDX module and can generate dirty
-	 * cachelines of TDX private memory.  Mark cache state incoherent
-	 * so that the cache can be flushed during kexec.
-	 *
-	 * This needs to be done before actually making the SEAMCALL,
-	 * because kexec-ing CPU could send NMI to stop remote CPUs,
-	 * in which case even disabling IRQ won't help here.
-	 */
-	this_cpu_write(cache_state_incoherent, true);
-
-	return func(fn, args);
-}
-
-static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
-			   struct tdx_module_args *args)
-{
-	int retry = RDRAND_RETRY_LOOPS;
-	u64 ret;
-
-	do {
-		preempt_disable();
-		ret = __seamcall_dirty_cache(func, fn, args);
-		preempt_enable();
-	} while (ret == TDX_RND_NO_ENTROPY && --retry);
-
-	return ret;
-}
-
-#define seamcall(_fn, _args)		sc_retry(__seamcall, (_fn), (_args))
-#define seamcall_ret(_fn, _args)	sc_retry(__seamcall_ret, (_fn), (_args))
-#define seamcall_saved_ret(_fn, _args)	sc_retry(__seamcall_saved_ret, (_fn), (_args))
 const char *tdx_dump_mce_info(struct mce *m);
 const struct tdx_sys_info *tdx_get_sysinfo(void);
 
diff --git a/arch/x86/virt/vmx/tdx/seamcall_internal.h b/arch/x86/virt/vmx/tdx/seamcall_internal.h
new file mode 100644
index 000000000000..be5f446467df
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/seamcall_internal.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SEAMCALL utilities for TDX host-side operations.
+ *
+ * Provides convenient wrappers around SEAMCALL assembly with retry logic,
+ * error reporting and cache coherency tracking.
+ *
+ * Copyright (C) 2021-2023 Intel Corporation
+ */
+
+#ifndef _X86_VIRT_SEAMCALL_INTERNAL_H
+#define _X86_VIRT_SEAMCALL_INTERNAL_H
+
+#include <linux/printk.h>
+#include <linux/types.h>
+#include <asm/archrandom.h>
+#include <asm/processor.h>
+#include <asm/tdx.h>
+
+u64 __seamcall(u64 fn, struct tdx_module_args *args);
+u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
+u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
+
+typedef u64 (*sc_func_t)(u64 fn, struct tdx_module_args *args);
+
+static __always_inline u64 __seamcall_dirty_cache(sc_func_t func, u64 fn,
+						  struct tdx_module_args *args)
+{
+	lockdep_assert_preemption_disabled();
+
+	/*
+	 * SEAMCALLs are made to the TDX module and can generate dirty
+	 * cachelines of TDX private memory.  Mark cache state incoherent
+	 * so that the cache can be flushed during kexec.
+	 *
+	 * This needs to be done before actually making the SEAMCALL,
+	 * because kexec-ing CPU could send NMI to stop remote CPUs,
+	 * in which case even disabling IRQ won't help here.
+	 */
+	this_cpu_write(cache_state_incoherent, true);
+
+	return func(fn, args);
+}
+
+static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
+			   struct tdx_module_args *args)
+{
+	int retry = RDRAND_RETRY_LOOPS;
+	u64 ret;
+
+	do {
+		preempt_disable();
+		ret = __seamcall_dirty_cache(func, fn, args);
+		preempt_enable();
+	} while (ret == TDX_RND_NO_ENTROPY && --retry);
+
+	return ret;
+}
+
+#define seamcall(_fn, _args)		sc_retry(__seamcall, (_fn), (_args))
+#define seamcall_ret(_fn, _args)	sc_retry(__seamcall_ret, (_fn), (_args))
+#define seamcall_saved_ret(_fn, _args)	sc_retry(__seamcall_saved_ret, (_fn), (_args))
+
+typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
+
+static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
+{
+	pr_err("SEAMCALL (0x%016llx) failed: 0x%016llx\n", fn, err);
+}
+
+static inline void seamcall_err_ret(u64 fn, u64 err,
+				    struct tdx_module_args *args)
+{
+	seamcall_err(fn, err, args);
+	pr_err("RCX 0x%016llx RDX 0x%016llx R08 0x%016llx\n",
+			args->rcx, args->rdx, args->r8);
+	pr_err("R09 0x%016llx R10 0x%016llx R11 0x%016llx\n",
+			args->r9, args->r10, args->r11);
+}
+
+static __always_inline int sc_retry_prerr(sc_func_t func,
+					  sc_err_func_t err_func,
+					  u64 fn, struct tdx_module_args *args)
+{
+	u64 sret = sc_retry(func, fn, args);
+
+	if (sret == TDX_SUCCESS)
+		return 0;
+
+	if (sret == TDX_SEAMCALL_VMFAILINVALID)
+		return -ENODEV;
+
+	if (sret == TDX_SEAMCALL_GP)
+		return -EOPNOTSUPP;
+
+	if (sret == TDX_SEAMCALL_UD)
+		return -EACCES;
+
+	err_func(fn, sret, args);
+	return -EIO;
+}
+
+#define seamcall_prerr(__fn, __args)						\
+	sc_retry_prerr(__seamcall, seamcall_err, (__fn), (__args))
+
+#define seamcall_prerr_ret(__fn, __args)					\
+	sc_retry_prerr(__seamcall_ret, seamcall_err_ret, (__fn), (__args))
+
+#endif /* _X86_VIRT_SEAMCALL_INTERNAL_H */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 71d39a79ef3f..b329791db9c2 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -42,6 +42,8 @@
 #include <asm/processor.h>
 #include <asm/mce.h>
 #include <asm/virt.h>
+
+#include "seamcall_internal.h"
 #include "tdx.h"
 
 struct tdx_module_state {
@@ -66,51 +68,6 @@ static LIST_HEAD(tdx_memlist);
 
 static struct tdx_sys_info tdx_sysinfo __ro_after_init;
 
-typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
-
-static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
-{
-	pr_err("SEAMCALL (0x%016llx) failed: 0x%016llx\n", fn, err);
-}
-
-static inline void seamcall_err_ret(u64 fn, u64 err,
-				    struct tdx_module_args *args)
-{
-	seamcall_err(fn, err, args);
-	pr_err("RCX 0x%016llx RDX 0x%016llx R08 0x%016llx\n",
-			args->rcx, args->rdx, args->r8);
-	pr_err("R09 0x%016llx R10 0x%016llx R11 0x%016llx\n",
-			args->r9, args->r10, args->r11);
-}
-
-static __always_inline int sc_retry_prerr(sc_func_t func,
-					  sc_err_func_t err_func,
-					  u64 fn, struct tdx_module_args *args)
-{
-	u64 sret = sc_retry(func, fn, args);
-
-	if (sret == TDX_SUCCESS)
-		return 0;
-
-	if (sret == TDX_SEAMCALL_VMFAILINVALID)
-		return -ENODEV;
-
-	if (sret == TDX_SEAMCALL_GP)
-		return -EOPNOTSUPP;
-
-	if (sret == TDX_SEAMCALL_UD)
-		return -EACCES;
-
-	err_func(fn, sret, args);
-	return -EIO;
-}
-
-#define seamcall_prerr(__fn, __args)						\
-	sc_retry_prerr(__seamcall, seamcall_err, (__fn), (__args))
-
-#define seamcall_prerr_ret(__fn, __args)					\
-	sc_retry_prerr(__seamcall_ret, seamcall_err_ret, (__fn), (__args))
-
 static DEFINE_RAW_SPINLOCK(sysinit_lock);
 
 /*
-- 
2.52.0


^ permalink raw reply related

* [PATCH v10 06/25] coco/tdx-host: Introduce a "tdx_host" device
From: Chao Gao @ 2026-05-20 13:38 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Jonathan Cameron,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>

TDX depends on a platform firmware module that is invoked via instructions
similar to vmenter (i.e. enter into a new privileged "root-mode" context to
manage private memory and private device mechanisms). It is a software
construct that depends on the CPU vmxon state to enable invocation of
TDX module ABIs. Unlike other Trusted Execution Environment (TEE) platform
implementations that employ a firmware module running on a PCI device with
an MMIO mailbox for communication, TDX has no hardware device to point to
as the TEE Secure Manager (TSM).

Create a virtual device not only to align with other implementations but
also to make it easier to

 - expose metadata (e.g., TDX module version, seamldr version, etc.) to
   userspace as device attributes

 - implement firmware uploader APIs which are tied to a device. This is
   needed to support TDX module runtime updates

 - enable TDX Connect which will share a common infrastructure with other
   platform implementations. In the TDX Connect context, every
   architecture has a TSM, represented by a PCIe or virtual device. The
   new "tdx_host" device will serve the TSM role.

A faux device is used for TDX because the TDX module is singular within
the system and lacks associated platform resources. Using a faux device
eliminates the need to create a stub bus.

The call to tdx_get_sysinfo() ensures that the TDX module is ready to
provide services.

Note that AMD has a PCI device for the PSP for SEV and ARM CCA will
likely have a faux device [1].

Thanks to Dan and Yilun for all the help on this one.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/all/2025073035-bulginess-rematch-b92e@gregkh/ # [1]
---
v10:
 - Drop Kconfig prompt [Dave]
---
 arch/x86/virt/vmx/tdx/tdx.c           |  2 +-
 drivers/virt/coco/Kconfig             |  2 ++
 drivers/virt/coco/Makefile            |  1 +
 drivers/virt/coco/tdx-host/Kconfig    |  4 +++
 drivers/virt/coco/tdx-host/Makefile   |  1 +
 drivers/virt/coco/tdx-host/tdx-host.c | 43 +++++++++++++++++++++++++++
 6 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 drivers/virt/coco/tdx-host/Kconfig
 create mode 100644 drivers/virt/coco/tdx-host/Makefile
 create mode 100644 drivers/virt/coco/tdx-host/tdx-host.c

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index b329791db9c2..5fb0441a9ac6 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1527,7 +1527,7 @@ const struct tdx_sys_info *tdx_get_sysinfo(void)
 
 	return (const struct tdx_sys_info *)&tdx_sysinfo;
 }
-EXPORT_SYMBOL_FOR_KVM(tdx_get_sysinfo);
+EXPORT_SYMBOL_FOR_MODULES(tdx_get_sysinfo, "kvm-intel,tdx-host");
 
 u32 tdx_get_nr_guest_keyids(void)
 {
diff --git a/drivers/virt/coco/Kconfig b/drivers/virt/coco/Kconfig
index df1cfaf26c65..f7691f64fbe3 100644
--- a/drivers/virt/coco/Kconfig
+++ b/drivers/virt/coco/Kconfig
@@ -17,5 +17,7 @@ source "drivers/virt/coco/arm-cca-guest/Kconfig"
 source "drivers/virt/coco/guest/Kconfig"
 endif
 
+source "drivers/virt/coco/tdx-host/Kconfig"
+
 config TSM
 	bool
diff --git a/drivers/virt/coco/Makefile b/drivers/virt/coco/Makefile
index cb52021912b3..b323b0ae4f82 100644
--- a/drivers/virt/coco/Makefile
+++ b/drivers/virt/coco/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_EFI_SECRET)	+= efi_secret/
 obj-$(CONFIG_ARM_PKVM_GUEST)	+= pkvm-guest/
 obj-$(CONFIG_SEV_GUEST)		+= sev-guest/
 obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx-guest/
+obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx-host/
 obj-$(CONFIG_ARM_CCA_GUEST)	+= arm-cca-guest/
 obj-$(CONFIG_TSM) 		+= tsm-core.o
 obj-$(CONFIG_TSM_GUEST)		+= guest/
diff --git a/drivers/virt/coco/tdx-host/Kconfig b/drivers/virt/coco/tdx-host/Kconfig
new file mode 100644
index 000000000000..cfe81b9c0364
--- /dev/null
+++ b/drivers/virt/coco/tdx-host/Kconfig
@@ -0,0 +1,4 @@
+config TDX_HOST_SERVICES
+	tristate
+	depends on INTEL_TDX_HOST
+	default m
diff --git a/drivers/virt/coco/tdx-host/Makefile b/drivers/virt/coco/tdx-host/Makefile
new file mode 100644
index 000000000000..e61e749a8dff
--- /dev/null
+++ b/drivers/virt/coco/tdx-host/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_TDX_HOST_SERVICES) += tdx-host.o
diff --git a/drivers/virt/coco/tdx-host/tdx-host.c b/drivers/virt/coco/tdx-host/tdx-host.c
new file mode 100644
index 000000000000..c77885392b09
--- /dev/null
+++ b/drivers/virt/coco/tdx-host/tdx-host.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * TDX host user interface driver
+ *
+ * Copyright (C) 2025 Intel Corporation
+ */
+
+#include <linux/device/faux.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+
+#include <asm/cpu_device_id.h>
+#include <asm/tdx.h>
+
+static const struct x86_cpu_id tdx_host_ids[] = {
+	X86_MATCH_FEATURE(X86_FEATURE_TDX_HOST_PLATFORM, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, tdx_host_ids);
+
+static struct faux_device *fdev;
+
+static int __init tdx_host_init(void)
+{
+	if (!x86_match_cpu(tdx_host_ids) || !tdx_get_sysinfo())
+		return -ENODEV;
+
+	fdev = faux_device_create(KBUILD_MODNAME, NULL, NULL);
+	if (!fdev)
+		return -ENODEV;
+
+	return 0;
+}
+module_init(tdx_host_init);
+
+static void __exit tdx_host_exit(void)
+{
+	faux_device_destroy(fdev);
+}
+module_exit(tdx_host_exit);
+
+MODULE_DESCRIPTION("TDX Host Services");
+MODULE_LICENSE("GPL");
-- 
2.52.0


^ permalink raw reply related
