linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Ackerley Tng <ackerleytng@google.com>, Fuad Tabba <tabba@google.com>
Cc: Elliot Berman <quic_eberman@quicinc.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Patrick Roy <roypat@amazon.co.uk>,
	qperret@google.com, linux-coco@lists.linux.dev,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kvm@vger.kernel.org
Subject: Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
Date: Fri, 16 Aug 2024 20:08:38 +0200	[thread overview]
Message-ID: <94c5d735-821c-40ba-ae85-1881c6f4445d@redhat.com> (raw)
In-Reply-To: <diqzjzggmkf7.fsf@ackerleytng-ctop.c.googlers.com>

On 16.08.24 19:45, Ackerley Tng wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 15.08.24 09:24, Fuad Tabba wrote:
>>> Hi David,
>>
>> Hi!
>>
>>>
>>> On Tue, 6 Aug 2024 at 14:51, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>>
>>>>> -     if (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
>>>>> +     if (!ops->accessible && (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)) {
>>>>>                 r = guest_memfd_folio_private(folio);
>>>>>                 if (r)
>>>>>                         goto out_err;
>>>>> @@ -107,6 +109,82 @@ struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags
>>>>>     }
>>>>>     EXPORT_SYMBOL_GPL(guest_memfd_grab_folio);
>>>>>
>>>>> +int guest_memfd_make_inaccessible(struct file *file, struct folio *folio)
>>>>> +{
>>>>> +     unsigned long gmem_flags = (unsigned long)file->private_data;
>>>>> +     unsigned long i;
>>>>> +     int r;
>>>>> +
>>>>> +     unmap_mapping_folio(folio);
>>>>> +
>>>>> +     /**
>>>>> +      * We can't use the refcount. It might be elevated due to
>>>>> +      * guest/vcpu trying to access same folio as another vcpu
>>>>> +      * or because userspace is trying to access folio for same reason
>>>>
>>>> As discussed, that's insufficient. We really have to drive the refcount
>>>> to 1 -- the single reference we expect.
>>>>
>>>> What is the exact problem you are running into here? Who can just grab a
>>>> reference and maybe do nasty things with it?
>>>
>>> I was wondering, why do we need to check the refcount? Isn't it enough
>>> to check for page_mapped() || page_maybe_dma_pinned(), while holding
>>> the folio lock?
> 
> Thank you Fuad for asking!
> 
>>
>> (folio_mapped() + folio_maybe_dma_pinned())
>>
>> Not everything goes trough FOLL_PIN. vmsplice() is an example, or just
>> some very simple read/write through /proc/pid/mem. Further, some
>> O_DIRECT implementations still don't use FOLL_PIN.
>>
>> So if you see an additional folio reference, as soon as you mapped that
>> thing to user space, you have to assume that it could be someone
>> reading/writing that memory in possibly sane context. (vmsplice() should
>> be using FOLL_PIN|FOLL_LONGTERM, but that's a longer discussion)
>>
> 
> Thanks David for the clarification, this example is very helpful!
> 
> IIUC folio_lock() isn't a prerequisite for taking a refcount on the
> folio.

Right, to do folio_lock() you only have to guarantee that the folio 
cannot get freed concurrently. So you piggyback on another reference 
(you hold indirectly).

> 
> Even if we are able to figure out a "safe" refcount, and check that the
> current refcount == "safe" refcount before removing from direct map,
> what's stopping some other part of the kernel from taking a refcount
> just after the check happens and causing trouble with the folio's
> removal from direct map?

Once the page was unmapped from user space, and there were no additional 
references (e.g., GUP, whatever), any new references can only be 
(should, unless BUG :) ) temporary speculative references that should 
not try accessing page content, and that should back off if the folio is 
not deemed interesting or cannot be locked. (e.g., page 
migration/compaction/offlining).

Of course, there are some corner cases (kgdb, hibernation, /proc/kcore), 
but most of these can be dealt with in one way or the other (make these 
back off and not read/write page content, similar to how we handled it 
for secretmem).

These (kgdb, /proc/kcore) might not even take a folio reference, they 
just "access stuff" and we only have to teach them to "not access that".

> 
>> (noting that also folio_maybe_dma_pinned() can have false positives in
>> some cases due to speculative references or *many* references).
> 
> Are false positives (speculative references) okay since it's better to
> be safe than remove from direct map prematurely?

folio_maybe_dma_pinned() is primarily used in fork context. Copying more 
(if the folio maybe pinned and, therefore, must not get COW-shared with 
other processes and must instead create a private page copy) is the 
"better safe than sorry". So false positives (that happen rarely) are 
tolerable.

Regading the directmap, it would -- just like with additional references 
-- detect that the page cannot currently be removed from the direct map. 
It's similarly "better safe than sorry", but here means that we likely 
must retry if we cannot easily fallback to something else like for the 
fork+COW case.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-08-16 18:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05 18:34 [PATCH RFC 0/4] mm: Introduce guest_memfd library Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 1/4] mm: Introduce guest_memfd Elliot Berman
2024-08-06 13:48   ` David Hildenbrand
2024-08-08 18:39   ` Ackerley Tng
2024-08-05 18:34 ` [PATCH RFC 2/4] kvm: Convert to use mm/guest_memfd Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map Elliot Berman
2024-08-06 14:08   ` David Hildenbrand
     [not found]     ` <396fb134-f43e-4263-99a8-cfcef82bfd99@amazon.com>
2024-08-15 19:08       ` Manwaring, Derek
2024-08-06 15:39   ` Patrick Roy
     [not found]     ` <20240806104702482-0700.eberman@hu-eberman-lv.qualcomm.com>
     [not found]       ` <a43ae745-9907-425f-b09d-a49405d6bc2d@amazon.co.uk>
     [not found]         ` <90886a03-ad62-4e98-bc05-63875faa9ccc@amazon.co.uk>
     [not found]           ` <20240807113514068-0700.eberman@hu-eberman-lv.qualcomm.com>
     [not found]             ` <7166d51c-7757-44f2-a6f8-36da3e86bf90@amazon.co.uk>
2024-08-08 22:16               ` Elliot Berman
2024-08-09 15:02                 ` Patrick Roy
2024-08-19 10:09   ` Mike Rapoport
2024-08-20 16:56     ` Elliot Berman
2024-08-21 14:26       ` Mike Rapoport
2024-08-05 18:34 ` [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages Elliot Berman
2024-08-06 13:51   ` David Hildenbrand
     [not found]     ` <20240806093625007-0700.eberman@hu-eberman-lv.qualcomm.com>
     [not found]       ` <a7c5bfc0-1648-4ae1-ba08-e706596e014b@redhat.com>
2024-08-08 21:41         ` Elliot Berman
2024-08-08 21:55           ` David Hildenbrand
2024-08-08 22:26             ` Elliot Berman
2024-08-09  7:16               ` David Hildenbrand
2024-08-15  7:24     ` Fuad Tabba
2024-08-16  9:48       ` David Hildenbrand
2024-08-16 11:19         ` Fuad Tabba
2024-08-16 17:45         ` Ackerley Tng
2024-08-16 18:08           ` David Hildenbrand [this message]
2024-08-16 21:52             ` Ackerley Tng
2024-08-16 22:03               ` David Hildenbrand
2024-08-16 23:52                 ` Elliot Berman
2024-08-06 15:48   ` Patrick Roy
2024-08-08 18:51   ` Ackerley Tng
2024-08-08 21:42     ` Elliot Berman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94c5d735-821c-40ba-ae85-1881c6f4445d@redhat.com \
    --to=david@redhat.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_eberman@quicinc.com \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=tabba@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).