From: Elliot Berman <quic_eberman@quicinc.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
Fuad Tabba <tabba@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Patrick Roy <roypat@amazon.co.uk>, <qperret@google.com>,
<linux-coco@lists.linux.dev>, <linux-arm-msm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<kvm@vger.kernel.org>
Subject: Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
Date: Fri, 16 Aug 2024 16:52:46 -0700 [thread overview]
Message-ID: <20240816164546141-0700.eberman@hu-eberman-lv.qualcomm.com> (raw)
In-Reply-To: <93a010dd-d938-4c49-8643-047c7c1b33b9@redhat.com>
On Sat, Aug 17, 2024 at 12:03:50AM +0200, David Hildenbrand wrote:
> On 16.08.24 23:52, Ackerley Tng wrote:
> > David Hildenbrand <david@redhat.com> writes:
> >
> > > On 16.08.24 19:45, Ackerley Tng wrote:
> > > >
> > > > <snip>
> > > >
> > > > IIUC folio_lock() isn't a prerequisite for taking a refcount on the
> > > > folio.
> > >
> > > Right, to do folio_lock() you only have to guarantee that the folio
> > > cannot get freed concurrently. So you piggyback on another reference
> > > (you hold indirectly).
> > >
> > > >
> > > > Even if we are able to figure out a "safe" refcount, and check that the
> > > > current refcount == "safe" refcount before removing from direct map,
> > > > what's stopping some other part of the kernel from taking a refcount
> > > > just after the check happens and causing trouble with the folio's
> > > > removal from direct map?
> > >
> > > Once the page was unmapped from user space, and there were no additional
> > > references (e.g., GUP, whatever), any new references can only be
> > > (should, unless BUG :) ) temporary speculative references that should
> > > not try accessing page content, and that should back off if the folio is
> > > not deemed interesting or cannot be locked. (e.g., page
> > > migration/compaction/offlining).
> >
> > I thought about it again - I think the vmsplice() cases are taken care
> > of once we check that the folios are not mapped into userspace, since
> > vmsplice() reads from a mapping.
> >
> > splice() reads from the fd directly, but that's taken care since
> > guest_memfd doesn't have a .splice_read() handler.
> >
> > Reading /proc/pid/mem also requires the pages to first be mapped, IIUC,
> > otherwise the pages won't show up, so checking that there are no more
> > mappings to userspace takes care of this.
>
> You have a misconception.
>
> You can map pages to user space, GUP them, and then unmap them from user
> space. A GUP reference can outlive your user space mappings, easily.
>
> So once there is a raised refcount, it could as well just be from vmsplice,
> or a pending reference from /proc/pid/mem, O_DIRECT, ...
>
> >
> > >
> > > Of course, there are some corner cases (kgdb, hibernation, /proc/kcore),
> > > but most of these can be dealt with in one way or the other (make these
> > > back off and not read/write page content, similar to how we handled it
> > > for secretmem).
> >
> > Does that really leave us with these corner cases? And so perhaps we
> > could get away with just taking the folio_lock() to keep away the
> > speculative references? So something like
> >
> > 1. Check that the folio is not mapped and not pinned.
>
> To do that, you have to lookup the folio first. That currently requires a
> refcount increment, even if only temporarily. Maybe we could avoid that, if
> we can guarantee that we are the only one modifying the pageache here, and
> we sync against that ourselves.
>
> > 2. folio_lock() all the folios about to be removed from direct map
> > -- With the lock, all other accesses should be speculative --
> > 3. Check that the refcount == "safe" refcount
> > 3a. Unlock and return to userspace with -EAGAIN
> > 4. Remove from direct map
> > 5. folio_unlock() all those folios
> >
> > Perhaps a very naive question: can the "safe" refcount be statically
> > determined by walking through the code and counting where refcount is
> > expected to be incremented?
>
>
> Depends on how we design it. But if you hand out "safe" references to KVM
> etc, you'd have to track that -- and how often -- somehow. At which point we
> are at "increment/decrement" safe reference to track that for you.
>
Just a status update: I've gotten the "safe" reference counter
implementation working for Gunyah now. It feels a bit flimsy because
we're juggling 3 reference counters*, but it seems like the right thing
to do after all the discussions here. It's passing all the Gunyah unit
tests I have which have so far been pretty good at finding issues.
I need to clean up the patches now and I'm aiming to have it out for RFC
next week.
* folio refcount, "accessible" refcount, and "safe" refcount
Thanks,
Elliot
next prev parent reply other threads:[~2024-08-16 23:53 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-05 18:34 [PATCH RFC 0/4] mm: Introduce guest_memfd library Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 1/4] mm: Introduce guest_memfd Elliot Berman
2024-08-06 13:48 ` David Hildenbrand
2024-08-08 18:39 ` Ackerley Tng
2024-08-05 18:34 ` [PATCH RFC 2/4] kvm: Convert to use mm/guest_memfd Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map Elliot Berman
2024-08-06 14:08 ` David Hildenbrand
2024-08-08 0:14 ` Manwaring, Derek
2024-08-15 19:08 ` Manwaring, Derek
2024-08-06 15:39 ` Patrick Roy
2024-08-06 20:13 ` Elliot Berman
2024-08-07 6:48 ` Patrick Roy
2024-08-07 10:57 ` Patrick Roy
2024-08-07 19:06 ` Elliot Berman
2024-08-08 13:05 ` Patrick Roy
2024-08-08 22:16 ` Elliot Berman
2024-08-09 15:02 ` Patrick Roy
2024-08-19 10:09 ` Mike Rapoport
2024-08-20 16:56 ` Elliot Berman
2024-08-21 14:26 ` Mike Rapoport
2024-08-05 18:34 ` [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages Elliot Berman
2024-08-06 13:51 ` David Hildenbrand
2024-08-06 17:14 ` Elliot Berman
2024-08-07 16:12 ` David Hildenbrand
2024-08-08 21:41 ` Elliot Berman
2024-08-08 21:55 ` David Hildenbrand
2024-08-08 22:26 ` Elliot Berman
2024-08-09 7:16 ` David Hildenbrand
2024-08-15 7:24 ` Fuad Tabba
2024-08-16 9:48 ` David Hildenbrand
2024-08-16 11:19 ` Fuad Tabba
2024-08-16 17:45 ` Ackerley Tng
2024-08-16 18:08 ` David Hildenbrand
2024-08-16 21:52 ` Ackerley Tng
2024-08-16 22:03 ` David Hildenbrand
2024-08-16 23:52 ` Elliot Berman [this message]
2024-08-06 15:48 ` Patrick Roy
2024-08-06 20:22 ` Elliot Berman
2024-08-06 21:16 ` Elliot Berman
2024-08-08 18:51 ` Ackerley Tng
2024-08-08 21:42 ` Elliot Berman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240816164546141-0700.eberman@hu-eberman-lv.qualcomm.com \
--to=quic_eberman@quicinc.com \
--cc=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=qperret@google.com \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=tabba@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox