Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()

From: Sean Christopherson <seanjc@google.com>
To: Michael Roth <michael.roth@amd.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
	kai.huang@intel.com, adrian.hunter@intel.com,
	reinette.chatre@intel.com, xiaoyao.li@intel.com,
	tony.lindgren@intel.com, binbin.wu@linux.intel.com,
	dmatlack@google.com, isaku.yamahata@intel.com,
	ira.weiny@intel.com, vannapurve@google.com, david@redhat.com,
	ackerleytng@google.com, tabba@google.com, chao.p.peng@intel.com
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()
Date: Fri, 11 Jul 2025 08:39:59 -0700
Message-ID: <aHEwT4X0RcfZzHlt@google.com>
In-Reply-To: <20250711151719.goee7eqti4xyhsqr@amd.com>

On Fri, Jul 11, 2025, Michael Roth wrote:
> On Fri, Jul 11, 2025 at 12:36:24PM +0800, Yan Zhao wrote:
> > Besides, it can't address the 2nd AB-BA lock issue as mentioned in the patch
> > log:
> > 
> > Problem
> > ===
> > ...
> > (2)
> > Moreover, in step 2, get_user_pages_fast() may acquire mm->mmap_lock,
> > resulting in the following lock sequence in tdx_vcpu_init_mem_region():
> > - filemap invalidation lock --> mm->mmap_lock
> > 
> > However, in future code, the shared filemap invalidation lock will be held
> > in kvm_gmem_fault_shared() (see [6]), leading to the lock sequence:
> > - mm->mmap_lock --> filemap invalidation lock
> 
> I wouldn't expect kvm_gmem_fault_shared() to trigger for the
> KVM_MEMSLOT_SUPPORTS_GMEM_SHARED case (or whatever we end up naming it).

Irrespective of shared faults, I think the API could do with a bit of cleanup
now that TDX has landed, i.e. now that we can see a bit more of the picture.

As is, I'm pretty sure TDX is broken with respect to hugepage support, because
kvm_gmem_populate() marks an entire folio as prepared, but TDX only ever deals
with one page at a time.  So that needs to be changed.  I assume it's already
addressed in one of the many upcoming series, but it still shows a flaw in the API.
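A toy user-space model of that granularity mismatch (all names are hypothetical; this is not KVM code): a 2MiB folio is treated as 512 4KiB pages, and the vendor hook initializes only one page per call. With a single per-folio prepared flag, populating page 0 makes a later attempt on page 1 fail with -EEXIST even though page 1 was never initialized. A per-page bitmap, shown here as just one possible fix, does not have that problem.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: one 2MiB folio made of 512 4KiB pages (hypothetical). */
#define TOY_FOLIO_PAGES 512

struct toy_folio {
	bool prepared;                                /* one flag per folio */
	uint64_t page_prepared[TOY_FOLIO_PAGES / 64]; /* one bit per page */
};

static void toy_folio_init(struct toy_folio *f)
{
	memset(f, 0, sizeof(*f));
}

/* Per-folio tracking: one page is initialized, but the whole folio is marked. */
static int toy_populate_per_folio(struct toy_folio *f, unsigned int page)
{
	assert(page < TOY_FOLIO_PAGES);
	if (f->prepared)
		return -EEXIST;
	/* ...the vendor hook would initialize only 'page' here... */
	f->prepared = true;
	return 0;
}

/* Per-page tracking: only the page that was actually initialized is marked. */
static int toy_populate_per_page(struct toy_folio *f, unsigned int page)
{
	uint64_t bit = UINT64_C(1) << (page % 64);

	assert(page < TOY_FOLIO_PAGES);
	if (f->page_prepared[page / 64] & bit)
		return -EEXIST;
	/* ...the vendor hook would initialize only 'page' here... */
	f->page_prepared[page / 64] |= bit;
	return 0;
}
```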

Hoisting the retrieval of the source page outside of filemap_invalidate_lock()
seems pretty straightforward, and would provide a consistent ABI for all vendor
flavors.  E.g. as is, non-struct-page memory will work for SNP, but not TDX.  The
obvious downside is that struct-page becomes a requirement for SNP, but that
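As a sanity check on the lock-ordering argument, here is a minimal user-space sketch. The helpers are hypothetical and only a toy stand-in for lockdep, not kernel code: the sketch records which lock was already held whenever another one is taken, with two IDs standing in for mm->mmap_lock and the filemap invalidation lock. Running GUP under the invalidation lock, combined with a fault path that takes mmap_lock first, shows up as an AB-BA pair. Pinning the source before taking the invalidation lock removes the nesting entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IDs standing in for mm->mmap_lock and the invalidation lock. */
enum toy_lock_id { TOY_MMAP_LOCK, TOY_INVALIDATE_LOCK, TOY_NR_LOCKS };

static bool held[TOY_NR_LOCKS];
static bool order_seen[TOY_NR_LOCKS][TOY_NR_LOCKS]; /* [outer][inner] */

static void toy_reset(void)
{
	for (int a = 0; a < TOY_NR_LOCKS; a++) {
		held[a] = false;
		for (int b = 0; b < TOY_NR_LOCKS; b++)
			order_seen[a][b] = false;
	}
}

/* Record "outer held while taking l" for every lock currently held. */
static void toy_lock(enum toy_lock_id l)
{
	assert(!held[l]);
	for (int outer = 0; outer < TOY_NR_LOCKS; outer++)
		if (held[outer])
			order_seen[outer][l] = true;
	held[l] = true;
}

static void toy_unlock(enum toy_lock_id l)
{
	assert(held[l]);
	held[l] = false;
}

/* True if both A->B and B->A nestings were observed (potential deadlock). */
static bool toy_has_inversion(void)
{
	for (int a = 0; a < TOY_NR_LOCKS; a++)
		for (int b = 0; b < TOY_NR_LOCKS; b++)
			if (a != b && order_seen[a][b] && order_seen[b][a])
				return true;
	return false;
}

/* Shared-fault path from the quoted problem: mmap_lock, then invalidation lock. */
static void toy_shared_fault(void)
{
	toy_lock(TOY_MMAP_LOCK);
	toy_lock(TOY_INVALIDATE_LOCK);
	toy_unlock(TOY_INVALIDATE_LOCK);
	toy_unlock(TOY_MMAP_LOCK);
}

/* Current flow: GUP (which may take mmap_lock) runs under the invalidation lock. */
static void toy_populate_gup_inside(void)
{
	toy_lock(TOY_INVALIDATE_LOCK);
	toy_lock(TOY_MMAP_LOCK);	/* pin the source page */
	toy_unlock(TOY_MMAP_LOCK);
	toy_unlock(TOY_INVALIDATE_LOCK);
}

/* Hoisted flow: pin the source page first, then take the invalidation lock. */
static void toy_populate_gup_hoisted(void)
{
	toy_lock(TOY_MMAP_LOCK);	/* pin the source page */
	toy_unlock(TOY_MMAP_LOCK);
	toy_lock(TOY_INVALIDATE_LOCK);
	toy_unlock(TOY_INVALIDATE_LOCK);
}
```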

The code below could be tweaked to batch get_user_pages() into an array of pointers,
but given that both SNP and TDX can only operate on one 4KiB page at a time, and
that hugepage support doesn't yet exist, trying to super-optimize the hugepage
case straightaway doesn't seem like a pressing concern.
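For comparison, the batched variant might look roughly like the following model. It assumes a hypothetical batch size of 16 and a hypothetical stand-in for a multi-page GUP call; no real locking or pinning happens here. Source pages are pinned in batches, but each 4KiB page is still handed to the vendor hook individually, so batching only reduces the number of GUP calls while the per-page work is unchanged.

```c
#include <assert.h>

#define TOY_BATCH 16 /* hypothetical batch size */

struct toy_stats {
	unsigned int pin_calls; /* number of batched "GUP" calls */
	unsigned int populated; /* 4KiB pages handed to the vendor hook */
};

/* Stand-in for a batched GUP: "pins" nr pages starting at index first. */
static long toy_pin_batch(long first, long nr, long *pinned, struct toy_stats *st)
{
	st->pin_calls++;
	for (long j = 0; j < nr; j++)
		pinned[j] = first + j;
	return nr;
}

/* Pin a batch up front, then still populate one 4KiB page at a time. */
static long toy_populate_batched(long npages, struct toy_stats *st)
{
	long pinned[TOY_BATCH];
	long i = 0;

	while (i < npages) {
		long nr = npages - i < TOY_BATCH ? npages - i : TOY_BATCH;
		long got = toy_pin_batch(i, nr, pinned, st);

		assert(got == nr);
		/* Real code would take the invalidation lock only after pinning. */
		for (long j = 0; j < got; j++) {
			assert(pinned[j] == i + j);
			st->populated++; /* one vendor call per 4KiB page */
		}
		/* Real code would drop the lock, then unpin the batch. */
		i += got;
	}
	return i;
}
```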

static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
				struct file *file, gfn_t gfn, void __user *src,
				kvm_gmem_populate_cb post_populate, void *opaque)
{
	pgoff_t index = kvm_gmem_get_index(slot, gfn);
	struct page *src_page = NULL;
	bool is_prepared = false;
	struct folio *folio;
	int ret, max_order;
	kvm_pfn_t pfn;

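	/*
	 * Pin the source page before taking filemap_invalidate_lock() so that
	 * GUP, which may take mm->mmap_lock, never nests inside it.  Plain
	 * get_user_pages() expects the caller to already hold mmap_lock, so
	 * use the _fast() variant, which takes it itself if it must fall back.
	 */
	if (src) {
		ret = get_user_pages_fast((unsigned long)src, 1, 0, &src_page);
		if (ret < 0)
			return ret;
		if (ret != 1)
			return -ENOMEM;
	}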

	filemap_invalidate_lock(file->f_mapping);

	if (!kvm_range_has_memory_attributes(kvm, gfn, gfn + 1,
					     KVM_MEMORY_ATTRIBUTE_PRIVATE,
					     KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
		ret = -EINVAL;
		goto out_unlock;
	}

	folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &is_prepared, &max_order);
	if (IS_ERR(folio)) {
		ret = PTR_ERR(folio);
		goto out_unlock;
	}

	folio_unlock(folio);

	if (is_prepared) {
		ret = -EEXIST;
		goto out_put_folio;
	}

	ret = post_populate(kvm, gfn, pfn, src_page, opaque);
	if (!ret)
		kvm_gmem_mark_prepared(folio);

out_put_folio:
	folio_put(folio);
out_unlock:
	filemap_invalidate_unlock(file->f_mapping);

	if (src_page)
		put_page(src_page);
	return ret;
}

long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
		       kvm_gmem_populate_cb post_populate, void *opaque)
{
	struct file *file;
	struct kvm_memory_slot *slot;
	void __user *p;
	int ret = 0;
	long i;

	lockdep_assert_held(&kvm->slots_lock);
	if (npages < 0)
		return -EINVAL;

	slot = gfn_to_memslot(kvm, start_gfn);
	if (!kvm_slot_can_be_private(slot))
		return -EINVAL;

	file = kvm_gmem_get_file(slot);
	if (!file)
		return -EFAULT;

	npages = min_t(ulong, slot->npages - (start_gfn - slot->base_gfn), npages);
	for (i = 0; i < npages; i++) {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}

		p = src ? src + i * PAGE_SIZE : NULL;

		ret = __kvm_gmem_populate(kvm, slot, file, start_gfn + i, p,
					  post_populate, opaque);
		if (ret)
			break;
	}

	fput(file);
	return ret && !i ? ret : i;
}
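The "ret && !i ? ret : i" return follows a partial-progress convention: if any page was populated, the count is returned, and the error code surfaces only when the very first page fails. A minimal model of just that convention, using a hypothetical per-page hook that fails at a chosen index:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-page hook: fails with -EEXIST at fail_at (never if < 0). */
static int toy_populate_one(long idx, long fail_at)
{
	return idx == fail_at ? -EEXIST : 0;
}

/*
 * Same return convention as the loop above: report the number of pages
 * done if any progress was made, else the error from the first page.
 */
static long toy_populate_range(long npages, long fail_at)
{
	int ret = 0;
	long i;

	for (i = 0; i < npages; i++) {
		ret = toy_populate_one(i, fail_at);
		if (ret)
			break;
	}
	return ret && !i ? ret : i;
}
```

A short positive count therefore signals partial progress, and a caller can distinguish it from full success only by comparing the result against npages.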

Thread overview: 45+ messages
2025-07-03  6:26 [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate() Yan Zhao
2025-07-03 16:51 ` Vishal Annapurve
2025-07-09 23:21 ` Michael Roth
2025-07-10 16:24   ` Sean Christopherson
2025-07-11  1:41     ` Ira Weiny
2025-07-11 14:21       ` Sean Christopherson
2025-07-11  4:36     ` Yan Zhao
2025-07-11 15:17       ` Michael Roth
2025-07-11 15:39         ` Sean Christopherson [this message]
2025-07-11 16:34           ` Michael Roth
2025-07-11 18:38             ` Vishal Annapurve
2025-07-11 19:49               ` Michael Roth
2025-07-11 20:19                 ` Sean Christopherson
2025-07-11 20:25             ` Ira Weiny
2025-07-11 22:56               ` Sean Christopherson
2025-07-11 23:04                 ` Vishal Annapurve
2025-07-14 23:11                   ` Ira Weiny
2025-07-15  0:41                     ` Vishal Annapurve
2025-07-14 23:08                 ` Ira Weiny
2025-07-14 23:12                   ` Sean Christopherson
2025-07-11 18:46           ` Vishal Annapurve
2025-07-12 17:38             ` Vishal Annapurve
2025-07-14  6:15           ` Yan Zhao
2025-07-14 15:46             ` Sean Christopherson
2025-07-14 16:02               ` David Hildenbrand
2025-07-14 16:07                 ` Sean Christopherson
2025-07-15  1:10               ` Yan Zhao
2025-07-18  9:14                 ` Yan Zhao
2025-07-18 15:57                   ` Vishal Annapurve
2025-07-18 18:42                     ` Ira Weiny
2025-07-18 18:59                       ` Vishal Annapurve
2025-07-21 17:46                         ` Ira Weiny
2025-07-28  9:48                     ` Yan Zhao
2025-07-29  0:45                       ` Vishal Annapurve
2025-07-29  1:37                         ` Yan Zhao
2025-07-29 16:33                           ` Ira Weiny
2025-08-05  0:22                             ` Sean Christopherson
2025-08-05  1:20                               ` Vishal Annapurve
2025-08-05 14:30                                 ` Vishal Annapurve
2025-08-05 19:59                                 ` Sean Christopherson
2025-08-06  0:09                                   ` Vishal Annapurve
2025-10-02 21:01                                     ` Ira Weiny
2025-10-03  0:15                                       ` Sean Christopherson
2025-10-03 15:31                                         ` Ira Weiny
2025-07-14  3:20         ` Yan Zhao