Linux ARM-MSM sub-architecture
 help / color / mirror / Atom feed
From: Elliot Berman <quic_eberman@quicinc.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
	Fuad Tabba <tabba@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Patrick Roy <roypat@amazon.co.uk>, <qperret@google.com>,
	<linux-coco@lists.linux.dev>, <linux-arm-msm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<kvm@vger.kernel.org>
Subject: Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
Date: Fri, 16 Aug 2024 16:52:46 -0700	[thread overview]
Message-ID: <20240816164546141-0700.eberman@hu-eberman-lv.qualcomm.com> (raw)
In-Reply-To: <93a010dd-d938-4c49-8643-047c7c1b33b9@redhat.com>

On Sat, Aug 17, 2024 at 12:03:50AM +0200, David Hildenbrand wrote:
> On 16.08.24 23:52, Ackerley Tng wrote:
> > David Hildenbrand <david@redhat.com> writes:
> > 
> > > On 16.08.24 19:45, Ackerley Tng wrote:
> > > > 
> > > > <snip>
> > > > 
> > > > IIUC folio_lock() isn't a prerequisite for taking a refcount on the
> > > > folio.
> > > 
> > > Right, to do folio_lock() you only have to guarantee that the folio
> > > cannot get freed concurrently. So you piggyback on another reference
> > > (you hold indirectly).
> > > 
> > > > 
> > > > Even if we are able to figure out a "safe" refcount, and check that the
> > > > current refcount == "safe" refcount before removing from direct map,
> > > > what's stopping some other part of the kernel from taking a refcount
> > > > just after the check happens and causing trouble with the folio's
> > > > removal from direct map?
> > > 
> > > Once the page was unmapped from user space, and there were no additional
> > > references (e.g., GUP, whatever), any new references can only be
> > > (should, unless BUG :) ) temporary speculative references that should
> > > not try accessing page content, and that should back off if the folio is
> > > not deemed interesting or cannot be locked. (e.g., page
> > > migration/compaction/offlining).
> > 
> > I thought about it again - I think the vmsplice() cases are taken care
> > of once we check that the folios are not mapped into userspace, since
> > vmsplice() reads from a mapping.
> > 
> > splice() reads from the fd directly, but that's taken care since
> > guest_memfd doesn't have a .splice_read() handler.
> > 
> > Reading /proc/pid/mem also requires the pages to first be mapped, IIUC,
> > otherwise the pages won't show up, so checking that there are no more
> > mappings to userspace takes care of this.
> 
> You have a misconception.
> 
> You can map pages to user space, GUP them, and then unmap them from user
> space. A GUP reference can outlive your user space mappings, easily.
> 
> So once there is a raised refcount, it could as well just be from vmsplice,
> or a pending reference from /proc/pid/mem, O_DIRECT, ...
> 
> > 
> > > 
> > > Of course, there are some corner cases (kgdb, hibernation, /proc/kcore),
> > > but most of these can be dealt with in one way or the other (make these
> > > back off and not read/write page content, similar to how we handled it
> > > for secretmem).
> > 
> > Does that really leave us with these corner cases? And so perhaps we
> > could get away with just taking the folio_lock() to keep away the
> > speculative references? So something like
> > 
> >    1. Check that the folio is not mapped and not pinned.
> 
> To do that, you have to lookup the folio first. That currently requires a
> refcount increment, even if only temporarily. Maybe we could avoid that, if
> we can guarantee that we are the only one modifying the pageache here, and
> we sync against that ourselves.
> 
> >    2. folio_lock() all the folios about to be removed from direct map
> >    -- With the lock, all other accesses should be speculative --
> >    3. Check that the refcount == "safe" refcount
> >        3a. Unlock and return to userspace with -EAGAIN
> >    4. Remove from direct map
> >    5. folio_unlock() all those folios
> > 
> > Perhaps a very naive question: can the "safe" refcount be statically
> > determined by walking through the code and counting where refcount is
> > expected to be incremented?
> 
> 
> Depends on how we design it. But if you hand out "safe" references to KVM
> etc, you'd have to track that -- and how often -- somehow. At which point we
> are at "increment/decrement" safe reference to track that for you.
>

Just a status update: I've gotten the "safe" reference counter
implementation working for Gunyah now. It feels a bit flimsy because
we're juggling 3 reference counters*, but it seems like the right thing
to do after all the discussions here. It's passing all the Gunyah unit
tests I have which have so far been pretty good at finding issues.

I need to clean up the patches now and I'm aiming to have it out for RFC
next week.

* folio refcount, "accessible" refcount, and "safe" refcount

Thanks,
Elliot


  reply	other threads:[~2024-08-16 23:53 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05 18:34 [PATCH RFC 0/4] mm: Introduce guest_memfd library Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 1/4] mm: Introduce guest_memfd Elliot Berman
2024-08-06 13:48   ` David Hildenbrand
2024-08-08 18:39   ` Ackerley Tng
2024-08-05 18:34 ` [PATCH RFC 2/4] kvm: Convert to use mm/guest_memfd Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map Elliot Berman
2024-08-06 14:08   ` David Hildenbrand
2024-08-08  0:14     ` Manwaring, Derek
2024-08-15 19:08       ` Manwaring, Derek
2024-08-06 15:39   ` Patrick Roy
2024-08-06 20:13     ` Elliot Berman
2024-08-07  6:48       ` Patrick Roy
2024-08-07 10:57         ` Patrick Roy
2024-08-07 19:06           ` Elliot Berman
2024-08-08 13:05             ` Patrick Roy
2024-08-08 22:16               ` Elliot Berman
2024-08-09 15:02                 ` Patrick Roy
2024-08-19 10:09   ` Mike Rapoport
2024-08-20 16:56     ` Elliot Berman
2024-08-21 14:26       ` Mike Rapoport
2024-08-05 18:34 ` [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages Elliot Berman
2024-08-06 13:51   ` David Hildenbrand
2024-08-06 17:14     ` Elliot Berman
2024-08-07 16:12       ` David Hildenbrand
2024-08-08 21:41         ` Elliot Berman
2024-08-08 21:55           ` David Hildenbrand
2024-08-08 22:26             ` Elliot Berman
2024-08-09  7:16               ` David Hildenbrand
2024-08-15  7:24     ` Fuad Tabba
2024-08-16  9:48       ` David Hildenbrand
2024-08-16 11:19         ` Fuad Tabba
2024-08-16 17:45         ` Ackerley Tng
2024-08-16 18:08           ` David Hildenbrand
2024-08-16 21:52             ` Ackerley Tng
2024-08-16 22:03               ` David Hildenbrand
2024-08-16 23:52                 ` Elliot Berman [this message]
2024-08-06 15:48   ` Patrick Roy
2024-08-06 20:22     ` Elliot Berman
2024-08-06 21:16   ` Elliot Berman
2024-08-08 18:51   ` Ackerley Tng
2024-08-08 21:42     ` Elliot Berman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240816164546141-0700.eberman@hu-eberman-lv.qualcomm.com \
    --to=quic_eberman@quicinc.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=tabba@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox