All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
	kvm@vger.kernel.org,  linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, pbonzini@redhat.com,  chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org,
	 akpm@linux-foundation.org, xiaoyao.li@intel.com,
	yilun.xu@intel.com,  chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com,  dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net,  vbabka@suse.cz,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
	 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com,  quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com,
	 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	jgg@nvidia.com,  rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com, hughd@google.com,  jthoughton@google.com,
	peterx@redhat.com, pankaj.gupta@amd.com
Subject: Re: [PATCH v7 3/7] KVM: guest_memfd: Track folio sharing within a struct kvm_gmem_private
Date: Thu, 3 Apr 2025 07:51:16 -0700	[thread overview]
Message-ID: <Z-6gZGSbOvfrTPjV@google.com> (raw)
In-Reply-To: <CA+EHjTwEFm1=pS6hBJ++zujkHCDQtCq548OKZirobPbzCzTqSA@mail.gmail.com>

On Thu, Apr 03, 2025, Fuad Tabba wrote:
> On Thu, 3 Apr 2025 at 00:56, Sean Christopherson <seanjc@google.com> wrote:
> > On Wed, Apr 02, 2025, Ackerley Tng wrote:
> > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > > index ac6b8853699d..cde16ed3b230 100644
> > > > --- a/virt/kvm/guest_memfd.c
> > > > +++ b/virt/kvm/guest_memfd.c
> > > > @@ -17,6 +17,18 @@ struct kvm_gmem {
> > > >     struct list_head entry;
> > > >  };
> > > >
> > > > +struct kvm_gmem_inode_private {
> > > > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > > > +   struct xarray shared_offsets;
> > > > +   rwlock_t offsets_lock;
> > >
> > > This lock doesn't work, either that or this lock can't be held while
> > > faulting, because holding this lock means we can't sleep, and we need to
> > > sleep to allocate.
> >
> > rwlock_t is a variant of a spinlock, which can't be held when sleeping.
> >
> > What exactly does offsets_lock protect, and what are the rules for holding it?
> > At a glance, it's flawed.  Something needs to prevent KVM from installing a mapping
> > for a private gfn that is being converted to shared.  KVM doesn't hold references
> > to PFNs while they're mapped into the guest, and kvm_gmem_get_pfn() doesn't check
> > shared_offsets let alone take offsets_lock.
> 
> You're right about the rwlock_t. The goal of the offsets_lock is to
> protect the shared offsets -- i.e., it's just meant to protect the
> SHARED/PRIVATE status of a folio, not more, hence why it's not checked
> in kvm_gmem_get_pfn(). It used to be protected by the
> filemap_invalidate_lock, but the problem is that it would be called
> from an interrupt context.
> 
> However, this is wrong, as you've pointed out. The purpose of locking
> is to ensure  that no two conversions of the same folio happen at the
> same time. An alternative I had written up is to rely on having
> exclusive access to the folio to ensure that, since this is tied to
> the folio. That could be either by acquiring the folio lock, or
> ensuring that the folio doesn't have any outstanding references,
> indicating that we have exclusive access to it. This would avoid the
> whole locking issue.
> 
> > ... Something needs to prevent KVM from installing a mapping
> > for a private gfn that is being converted to shared.  ...
> 
> > guest_memfd currently handles races between kvm_gmem_fault() and PUNCH_HOLE via
> > kvm_gmem_invalidate_{begin,end}().  I don't see any equivalent functionality in
> > the shared/private conversion code.
> 
> For in-place sharing, KVM can install a mapping for a SHARED gfn. What
> it cannot do is install a mapping for a transient (i.e., NONE) gfn. We
> don't rely on kvm_gmem_get_pfn() for that, but on the individual KVM
> mmu fault handlers, but that said...

Consumption of shared/private physical pages _must_ be enforced by guest_memfd.
The private vs. shared state in the MMU handlers is that VM's view of the world
and desired state.  The guest_memfd inode is the single source of true for the
state of the _physical_ page.

E.g. on TDX, if KVM installs a private SPTE for a PFN that is in actuality shared,
there will be machine checks and the host will likely crash.

> > I would much, much prefer one large series that shows the full picture than a
> > mish mash of partial series that I can't actually review, even if the big series
> > is 100+ patches (hopefully not).
> 
> Dropping the RFC from the second series was not intentional, the first
> series is the one where I intended to drop the RFC. I apologize for
> that.  Especially since I obviously don't know how to handle modules
> and wanted some input on how to do that :)

In this case, the rules for modules are pretty simple.  Code in mm/ can't call
into KVM.  Either avoid callbacks entirely, or implement via a layer of
indirection, e.g. function pointer or ops table, so that KVM can provide its
implementation at runtime.

  reply	other threads:[~2025-04-03 14:51 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-28 15:31 [PATCH v7 0/7] KVM: Restricted mapping of guest_memfd at the host and arm64 support Fuad Tabba
2025-03-28 15:31 ` [PATCH v7 1/7] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes Fuad Tabba
2025-04-08 11:53   ` Shivank Garg
2025-03-28 15:31 ` [PATCH v7 2/7] KVM: guest_memfd: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
2025-03-28 15:31 ` [PATCH v7 3/7] KVM: guest_memfd: Track folio sharing within a struct kvm_gmem_private Fuad Tabba
2025-04-02 22:25   ` Ackerley Tng
2025-04-02 23:56     ` Sean Christopherson
2025-04-03  8:58       ` Fuad Tabba
2025-04-03 14:51         ` Sean Christopherson [this message]
2025-04-04  6:43           ` Fuad Tabba
2025-03-28 15:31 ` [PATCH v7 4/7] KVM: guest_memfd: Folio sharing states and functions that manage their transition Fuad Tabba
2025-04-02 23:48   ` Ackerley Tng
2025-04-03  8:58     ` Fuad Tabba
2025-04-03  0:19   ` Sean Christopherson
2025-04-03  9:11     ` Fuad Tabba
2025-04-03 14:52       ` Sean Christopherson
2025-04-04  6:44         ` Fuad Tabba
2025-03-28 15:31 ` [PATCH v7 5/7] KVM: guest_memfd: Restore folio state after final folio_put() Fuad Tabba
2025-04-05 16:23   ` Matthew Wilcox
2025-03-28 15:31 ` [PATCH v7 6/7] KVM: guest_memfd: Handle invalidation of shared memory Fuad Tabba
2025-03-28 15:31 ` [PATCH v7 7/7] KVM: guest_memfd: Add a guest_memfd() flag to initialize it as shared Fuad Tabba
2025-04-02 22:47   ` Ackerley Tng
2025-04-04  7:39     ` Fuad Tabba
2025-04-02 23:03   ` Ackerley Tng
2025-04-03 11:32     ` Fuad Tabba
2025-04-04 18:05 ` [PATCH v7 0/7] KVM: Restricted mapping of guest_memfd at the host and arm64 support Liam R. Howlett
2025-04-04 20:10   ` David Hildenbrand
2025-04-04 21:48     ` Sean Christopherson
2025-04-05  2:45       ` Liam R. Howlett
2025-04-07  6:28         ` Fuad Tabba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-6gZGSbOvfrTPjV@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=fvdl@google.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=pankaj.gupta@amd.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_eberman@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=shuah@kernel.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.