From: Elliot Berman <quic_eberman@quicinc.com>
To: Patrick Roy <roypat@amazon.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Fuad Tabba <tabba@google.com>,
David Hildenbrand <david@redhat.com>, <qperret@google.com>,
Ackerley Tng <ackerleytng@google.com>,
<linux-coco@lists.linux.dev>, <linux-arm-msm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<kvm@vger.kernel.org>, James Gowans <jgowans@amazon.com>,
"Kalyazin, Nikita" <kalyazin@amazon.co.uk>,
"Manwaring, Derek" <derekmn@amazon.com>,
"Cali, Marco" <xmarcalx@amazon.co.uk>
Subject: Re: [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map
Date: Wed, 7 Aug 2024 12:06:33 -0700 [thread overview]
Message-ID: <20240807113514068-0700.eberman@hu-eberman-lv.qualcomm.com> (raw)
In-Reply-To: <90886a03-ad62-4e98-bc05-63875faa9ccc@amazon.co.uk>
On Wed, Aug 07, 2024 at 11:57:35AM +0100, Patrick Roy wrote:
> On Wed, 2024-08-07 at 07:48 +0100, Patrick Roy wrote:
> > On Tue, 2024-08-06 at 21:13 +0100, Elliot Berman wrote:
> >> On Tue, Aug 06, 2024 at 04:39:24PM +0100, Patrick Roy wrote:
> >>> On the other hand, as Paolo pointed out in my patches [1], just using a
> >>> page flag to track direct map presence for gmem is not enough. We
> >>> actually need to keep a refcount in folio->private to keep track of how
> >>> many different actors request a folio's direct map presence (in the
> >>> specific case in my patch series, it was different pfn_to_gfn_caches for
> >>> the kvm-clock structures of different vcpus, which the guest can place
> >>> into the same gfn). While this might not be a concern for the the
> >>> pKVM/Gunyah case, where the guest dictates memory state, it's required
> >>> for the non-CoCo case where KVM/userspace can set arbitrary guest gfns
> >>> to shared if it needs/wants to access them for whatever reason. So for
> >>> this we'd need to have PG_private=1 mean "direct map entry restored" (as
> >>> if PG_private=0, there is no folio->private).
> >>>
> >>> [1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/T/#m0608c4b6a069b3953d7ee97f48577d32688a3315
> >>>
> >>
> >> I wonder if we can use the folio refcount itself, assuming we can rely
> >> on refcount == 1 means we can do shared->private conversion.
> >>
> >> In gpc_map_gmem, we convert private->shared. There's no problem here in
> >> the non-CoCo case.
> >>
> >> In gpc_unmap, we *try* to convert back from shared->private. If
> >> refcount>2, then the conversion would fail. The last gpc_unmap would be
> >> able to successfully convert back to private.
> >>
> >> Do you see any concerns with this approach?
> >
> > The gfn_to_pfn_cache does not keep an elevated refcount on the cached
> > page, and instead responds to MMU notifiers to detect whether the cached
> > translation has been invalidated, iirc. So the folio refcount will
> > not reflect the number of gpcs holding that folio.
> >
Ah, fair enough. This is kinda like a GUP pin which would prevent us
from making page private, but without the pin part.
[...]
> >>>> struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags)
> >>>> {
> >>>> + unsigned long gmem_flags = (unsigned long)file->private_data;
> >>>> struct inode *inode = file_inode(file);
> >>>> struct guest_memfd_operations *ops = inode->i_private;
> >>>> struct folio *folio;
> >>>> @@ -43,6 +89,12 @@ struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags
> >>>> goto out_err;
> >>>> }
> >>>>
> >>>> + if (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
> >>>> + r = guest_memfd_folio_private(folio);
> >>>> + if (r)
> >>>> + goto out_err;
> >>>> + }
> >>>> +
> >>>
> >>> How does a caller of guest_memfd_grab_folio know whether a folio needs
> >>> to be removed from the direct map? E.g. how can a caller know ahead of
> >>> time whether guest_memfd_grab_folio will return a freshly allocated
> >>> folio (which thus needs to be removed from the direct map), vs a folio
> >>> that already exists and has been removed from the direct map (probably
> >>> fine to remove from direct map again), vs a folio that already exists
> >>> and is currently re-inserted into the direct map for whatever reason
> >>> (must not remove these from the direct map, as other parts of
> >>> KVM/userspace probably don't expect the direct map entries to disappear
> >>> from underneath them). I couldn't figure this one out for my series,
> >>> which is why I went with hooking into the PG_uptodate logic to always
> >>> remove direct map entries on freshly allocated folios.
> >>>
> >>
> >> gmem_flags come from the owner. If the caller (in non-CoCo case) wants
>
> Ah, oops, I got it mixed up with the new `flags` parameter.
>
> >> to restore the direct map right away, it'd have to be a direct
> >> operation. As an optimization, we could add option that asks for page in
> >> "shared" state. If allocating new page, we can return it right away
> >> without removing from direct map. If grabbing existing folio, it would
> >> try to do the private->shared conversion.
>
> My concern is more with the implicit shared->private conversion that
> happens on every call to guest_memfd_grab_folio (and thus
> kvm_gmem_get_pfn) when grabbing existing folios. If something else
> marked the folio as shared, then we cannot punch it out of the direct
> map again until that something is done using the folio (when working on
> my RFC, kvm_gmem_get_pfn was indeed called on existing folios that were
> temporarily marked shared, as I was seeing panics because of this). And
> if the folio is currently private, there's nothing to do. So either way,
> guest_memfd_grab_folio shouldn't touch the direct map entry for existing
> folios.
>
What I did could be documented/commented better.
If ops->accessible() is *not* provided, all guest_memfd allocations will
immediately remove from direct map and treat them immediately like guest
private (goal is to match what KVM does today on tip).
If ops->accessible() is provided, then guest_memfd allocations start
as "shared" and KVM/Gunyah need to do the shared->private conversion
when they want to do the private conversion on the folio. "Shared" is
the default because that is effectively a no-op.
For the non-CoCo case you're interested in, we'd have the
ops->accessible() provided and we wouldn't pull out the direct map from
gpc.
Thanks,
Elliot
next prev parent reply other threads:[~2024-08-07 19:06 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-05 18:34 [PATCH RFC 0/4] mm: Introduce guest_memfd library Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 1/4] mm: Introduce guest_memfd Elliot Berman
2024-08-06 13:48 ` David Hildenbrand
2024-08-08 18:39 ` Ackerley Tng
2024-08-05 18:34 ` [PATCH RFC 2/4] kvm: Convert to use mm/guest_memfd Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map Elliot Berman
2024-08-06 14:08 ` David Hildenbrand
2024-08-08 0:14 ` Manwaring, Derek
2024-08-15 19:08 ` Manwaring, Derek
2024-08-06 15:39 ` Patrick Roy
2024-08-06 20:13 ` Elliot Berman
2024-08-07 6:48 ` Patrick Roy
2024-08-07 10:57 ` Patrick Roy
2024-08-07 19:06 ` Elliot Berman [this message]
2024-08-08 13:05 ` Patrick Roy
2024-08-08 22:16 ` Elliot Berman
2024-08-09 15:02 ` Patrick Roy
2024-08-19 10:09 ` Mike Rapoport
2024-08-20 16:56 ` Elliot Berman
2024-08-21 14:26 ` Mike Rapoport
2024-08-05 18:34 ` [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages Elliot Berman
2024-08-06 13:51 ` David Hildenbrand
2024-08-06 17:14 ` Elliot Berman
2024-08-07 16:12 ` David Hildenbrand
2024-08-08 21:41 ` Elliot Berman
2024-08-08 21:55 ` David Hildenbrand
2024-08-08 22:26 ` Elliot Berman
2024-08-09 7:16 ` David Hildenbrand
2024-08-15 7:24 ` Fuad Tabba
2024-08-16 9:48 ` David Hildenbrand
2024-08-16 11:19 ` Fuad Tabba
2024-08-16 17:45 ` Ackerley Tng
2024-08-16 18:08 ` David Hildenbrand
2024-08-16 21:52 ` Ackerley Tng
2024-08-16 22:03 ` David Hildenbrand
2024-08-16 23:52 ` Elliot Berman
2024-08-06 15:48 ` Patrick Roy
2024-08-06 20:22 ` Elliot Berman
2024-08-06 21:16 ` Elliot Berman
2024-08-08 18:51 ` Ackerley Tng
2024-08-08 21:42 ` Elliot Berman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240807113514068-0700.eberman@hu-eberman-lv.qualcomm.com \
--to=quic_eberman@quicinc.com \
--cc=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=derekmn@amazon.com \
--cc=jgowans@amazon.com \
--cc=kalyazin@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=qperret@google.com \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=tabba@google.com \
--cc=xmarcalx@amazon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox