Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jason Gunthorpe <jgg@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, John Hubbard <jhubbard@nvidia.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast
Date: Tue, 30 Aug 2022 15:45:42 -0300	[thread overview]
Message-ID: <Yw5a1paQJ9MNdgmW@nvidia.com> (raw)
In-Reply-To: <2e20c90d-4d1f-dd83-aa63-9d8d17021263@redhat.com>

On Tue, Aug 30, 2022 at 08:23:52PM +0200, David Hildenbrand wrote:
> ... and looking into the details of TLB flush and GUP-fast interaction
> nowadays, that case is no longer relevant. A TLB flush is no longer
> sufficient to stop concurrent GUP-fast ever since we introduced generic
> RCU GUP-fast.

Yes, we've had RCU GUP fast for a while, and it is more widely used
now, IIRC.

It has been a bit, but if I remember, GUP fast in RCU mode worked on a
few principles:

 - The PTE page must not be freed without RCU
 - The PTE page content must be convertable to a struct page using the
   usual rules (eg PTE Special)
 - That struct page refcount may go from 0->1 inside the RCU
 - In the case the refcount goes from 0->1 there must be sufficient
   barriers such that GUP fast observing the refcount of 1 will also
   observe the PTE entry has changed. ie before the refcount is
   dropped in the zap it has to clear the PTE entry, the refcount
   decr has to be a 'release' and the refcount incr in gup fast has be
   to be an 'acquire'.
 - The rest of the system must tolerate speculative refcount
   increments from GUP on any random page

The basic idea being that if GUP fast obtains a valid reference on a
page *and* the PTE entry has not changed then everything is fine.

The tricks with TLB invalidation are just a "poor mans" RCU, and
arguably these days aren't really needed since I think we could make
everything use real RCU always without penalty if we really wanted.

Today we can create a unique 'struct pagetable_page' as Matthew has
been doing in other places that guarentees a rcu_head is always
available for every page used in a page table. Using that we could
drop the code in the TLB flusher that allocates memory for the
rcu_head and hopes for the best. (Or even is the common struct page
rcu_head already guarenteed to exist for pagetable pages now a days?)

IMHO that is the main reason we still have the non-RCU mode at all..

Jason

next prev parent reply	other threads:[~2022-08-30 18:45 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 16:46 [PATCH v1 0/3] mm: minor cleanups around NUMA hinting David Hildenbrand
2022-08-25 16:46 ` [PATCH v1 1/3] mm/gup: replace FOLL_NUMA by gup_can_follow_protnone() David Hildenbrand
2022-08-25 16:46 ` [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast David Hildenbrand
2022-08-26 14:59   ` David Hildenbrand
2022-08-30 18:23     ` David Hildenbrand
2022-08-30 18:45       ` Jason Gunthorpe [this message]
2022-08-30 18:53         ` David Hildenbrand
2022-08-30 19:18           ` John Hubbard
2022-08-30 19:23             ` David Hildenbrand
2022-08-30 23:44               ` Jason Gunthorpe
2022-08-31  7:44                 ` David Hildenbrand
2022-08-31 16:21               ` Peter Xu
2022-08-31 16:31                 ` David Hildenbrand
2022-08-31 18:23                   ` Peter Xu
2022-08-31 19:25                     ` David Hildenbrand
2022-09-01  7:55                       ` Alistair Popple
2022-08-30 19:57           ` Jason Gunthorpe
2022-08-30 20:12             ` John Hubbard
2022-08-30 22:39               ` Jason Gunthorpe
2022-08-31  7:15             ` David Hildenbrand
2022-08-25 16:46 ` [PATCH v1 3/3] mm: fixup documentation regarding pte_numa() and PROT_NUMA David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yw5a1paQJ9MNdgmW@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=peterx@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.