From: David Hildenbrand <david@redhat.com>
To: Elliot Berman <quic_eberman@quicinc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Fuad Tabba <tabba@google.com>, Patrick Roy <roypat@amazon.co.uk>,
qperret@google.com, Ackerley Tng <ackerleytng@google.com>,
linux-coco@lists.linux.dev, linux-arm-msm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org
Subject: Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
Date: Thu, 8 Aug 2024 23:55:15 +0200 [thread overview]
Message-ID: <6f3b5c38-fc33-43cd-8ab7-5b0f49169d5c@redhat.com> (raw)
In-Reply-To: <20240808101944778-0700.eberman@hu-eberman-lv.qualcomm.com>
On 08.08.24 23:41, Elliot Berman wrote:
> On Wed, Aug 07, 2024 at 06:12:00PM +0200, David Hildenbrand wrote:
>> On 06.08.24 19:14, Elliot Berman wrote:
>>> On Tue, Aug 06, 2024 at 03:51:22PM +0200, David Hildenbrand wrote:
>>>>> - if (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
>>>>> + if (!ops->accessible && (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)) {
>>>>> r = guest_memfd_folio_private(folio);
>>>>> if (r)
>>>>> goto out_err;
>>>>> @@ -107,6 +109,82 @@ struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(guest_memfd_grab_folio);
>>>>> +int guest_memfd_make_inaccessible(struct file *file, struct folio *folio)
>>>>> +{
>>>>> + unsigned long gmem_flags = (unsigned long)file->private_data;
>>>>> + unsigned long i;
>>>>> + int r;
>>>>> +
>>>>> + unmap_mapping_folio(folio);
>>>>> +
>>>>> + /**
>>>>> + * We can't use the refcount. It might be elevated due to
>>>>> + * guest/vcpu trying to access same folio as another vcpu
>>>>> + * or because userspace is trying to access folio for same reason
>>>>
>>>> As discussed, that's insufficient. We really have to drive the refcount to 1
>>>> -- the single reference we expect.
>>>>
>>>> What is the exact problem you are running into here? Who can just grab a
>>>> reference and maybe do nasty things with it?
>>>>
>>>
>>> Right, I remember we had discussed it. The problem I faced was if 2
>>> vcpus fault on same page, they would race to look up the folio in
>>> filemap, increment refcount, then try to lock the folio. One of the
>>> vcpus wins the lock, while the other waits. The vcpu that gets the
>>> lock vcpu will see the elevated refcount.
>>>
>>> I was in middle of writing an explanation why I think this is best
>>> approach and realized I think it should be possible to do
>>> shared->private conversion and actually have single reference. There
>>> would be some cost to walk through the allocated folios and convert them
>>> to private before any vcpu runs. The approach I had gone with was to
>>> do conversions as late as possible.
>>
>> We certainly have to support conversion while the VCPUs are running.
>>
>> The VCPUs might be able to avoid grabbing a folio reference for the
>> conversion and only do the folio_lock(): as long as we have a guarantee that
>> we will disallow freeing the folio in gmem, for example, by syncing against
>> FALLOC_FL_PUNCH_HOLE.
>>
>> So if we can rely on the "gmem" reference to the folio that cannot go away
>> while we do what we do, we should be fine.
>>
>> <random though>
>>
>> Meanwhile, I was thinking if we would want to track the references we
>> hand out to "safe" users differently.
>>
>> Safe references would only be references that would survive a
>> private<->shared conversion, like KVM MMU mappings maybe?
>>
>> KVM would then have to be thought to return these gmem references
>> differently.
>>
>> The idea would be to track these "safe" references differently
>> (page->private?) and only allow dropping *our* guest_memfd reference if all
>> these "safe" references are gone. That is, FALLOC_FL_PUNCH_HOLE would also
>> fail if there are any "safe" reference remaining.
>>
>> <\random though>
>>
>
> I didn't find a path in filemap where we can grab folio without
> increasing its refcount. I liked the idea of keeping track of a "safe"
> refcount, but I believe there is a small window to race comparing the
> main folio refcount and the "safe" refcount.
There are various possible models. To detect unexpected references, we
could either use
folio_ref_count(folio) == gmem_folio_safe_ref_count(folio) + 1
[we increment both ref counter]
or
folio_ref_count(folio) == 1
[we only increment the safe refcount and let other magic handle it as
described]
A vcpu could have
> incremented the main folio refcount and on the way to increment the safe
> refcount. Before that happens, another thread does the comparison and
> sees a mismatch.
Likely there won't be a way around coming up with code that is able to
deal with such temporary, "speculative" folio references.
In the simplest case, these references will be obtained from our gmem
code only, and we'll have to detect that it happened and retry (a
seqcount would be a naive solution).
In the complex case, these references are temporarily obtained from
other core-mm code -- using folio_try_get(). We can minimize some of
them (speculative references from GUP or the pagecache), and try
optimizing others (PFN walkers like page migration).
But likely we'll need some retry magic, at least initially.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-08-08 21:55 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-05 18:34 [PATCH RFC 0/4] mm: Introduce guest_memfd library Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 1/4] mm: Introduce guest_memfd Elliot Berman
2024-08-06 13:48 ` David Hildenbrand
2024-08-08 18:39 ` Ackerley Tng
2024-08-05 18:34 ` [PATCH RFC 2/4] kvm: Convert to use mm/guest_memfd Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map Elliot Berman
2024-08-06 14:08 ` David Hildenbrand
[not found] ` <396fb134-f43e-4263-99a8-cfcef82bfd99@amazon.com>
2024-08-15 19:08 ` Manwaring, Derek
2024-08-06 15:39 ` Patrick Roy
[not found] ` <20240806104702482-0700.eberman@hu-eberman-lv.qualcomm.com>
[not found] ` <a43ae745-9907-425f-b09d-a49405d6bc2d@amazon.co.uk>
[not found] ` <90886a03-ad62-4e98-bc05-63875faa9ccc@amazon.co.uk>
[not found] ` <20240807113514068-0700.eberman@hu-eberman-lv.qualcomm.com>
[not found] ` <7166d51c-7757-44f2-a6f8-36da3e86bf90@amazon.co.uk>
2024-08-08 22:16 ` Elliot Berman
2024-08-09 15:02 ` Patrick Roy
2024-08-19 10:09 ` Mike Rapoport
2024-08-20 16:56 ` Elliot Berman
2024-08-21 14:26 ` Mike Rapoport
2024-08-05 18:34 ` [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages Elliot Berman
2024-08-06 13:51 ` David Hildenbrand
[not found] ` <20240806093625007-0700.eberman@hu-eberman-lv.qualcomm.com>
[not found] ` <a7c5bfc0-1648-4ae1-ba08-e706596e014b@redhat.com>
2024-08-08 21:41 ` Elliot Berman
2024-08-08 21:55 ` David Hildenbrand [this message]
2024-08-08 22:26 ` Elliot Berman
2024-08-09 7:16 ` David Hildenbrand
2024-08-15 7:24 ` Fuad Tabba
2024-08-16 9:48 ` David Hildenbrand
2024-08-16 11:19 ` Fuad Tabba
2024-08-16 17:45 ` Ackerley Tng
2024-08-16 18:08 ` David Hildenbrand
2024-08-16 21:52 ` Ackerley Tng
2024-08-16 22:03 ` David Hildenbrand
2024-08-16 23:52 ` Elliot Berman
2024-08-06 15:48 ` Patrick Roy
2024-08-08 18:51 ` Ackerley Tng
2024-08-08 21:42 ` Elliot Berman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6f3b5c38-fc33-43cd-8ab7-5b0f49169d5c@redhat.com \
--to=david@redhat.com \
--cc=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=qperret@google.com \
--cc=quic_eberman@quicinc.com \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=tabba@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).