From: Peter Xu <peterx@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Ryan Roberts <ryan.roberts@arm.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
Yin Fengwei <fengwei.yin@intel.com>,
Yang Shi <shy828301@gmail.com>
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
Date: Fri, 11 Aug 2023 18:18:23 -0400
Message-ID: <ZNazr4ylywFZcIcG@x1n>
In-Reply-To: <14C73423-C643-4B72-B3DD-573F5636B5E0@nvidia.com>

On Fri, Aug 11, 2023 at 12:11:55PM -0400, Zi Yan wrote:
> On 11 Aug 2023, at 12:08, David Hildenbrand wrote:
>
> > On 11.08.23 17:58, Peter Xu wrote:
> >> On Fri, Aug 11, 2023 at 05:32:37PM +0200, David Hildenbrand wrote:
> >>> On 11.08.23 17:18, Peter Xu wrote:
> >>>> On Fri, Aug 11, 2023 at 12:27:13AM +0200, David Hildenbrand wrote:
> >>>>> On 10.08.23 23:48, Matthew Wilcox wrote:
> >>>>>> On Thu, Aug 10, 2023 at 04:57:11PM -0400, Peter Xu wrote:
> >>>>>>> AFAICS if that patch was all correct (while I'm not yet sure..), you can
> >>>>>>> actually fit your new total mapcount field into page 1 so even avoid the
> >>>>>>> extra cacheline access. You can have a look: the trick is refcount for
> >>>>>>> tail page 1 still seems to be free on 32 bits (if that was your worry
> >>>>>>> before). Then it'll be very nice to keep Hugh's counter all in tail 1.
> >>>>>>
> >>>>>> No, refcount must be 0 on all tail pages. We rely on this in many places
> >>>>>> in the MM.
> >>>>>
> >>>>> Very right.
> >>>>
> >>>> Obviously I could have missed this in the past.. can I ask for an example
> >>>> explaining why refcount will be referenced before knowing it's a head?
> >>>
> >>> I think the issue is, when coming from a PFN walker (or GUP-fast), you might
> >>> see "oh, this is a folio, let's lookup the head page". And you do that.
> >>>
> >>> Then, you try taking a reference on that head page. (see try_get_folio()).
> >>>
> >>> But as you didn't hold a reference on the folio yet, it can happily get
> >>> freed + repurposed in the meantime, so maybe it's not a head page anymore.
> >>>
> >>> So if the field would get reused for something else, grabbing a reference
> >>> would corrupt whatever is now stored in there.
> >>
> >> Not an issue before large folios, am I right? Because having a head page
> >> reused as tail cannot happen iiuc with current thps if only pmd-sized,
> >> because the head page is guaranteed to be pmd aligned physically.
> >
> > There are other users of compound pages, no? THP and hugetlb are just two examples I think. For example, I can spot __GFP_COMP in slab code.
> >
> > Most such compound pages would not be applicable to GUP, though, but PFN walkers could end up trying to grab them.
> >
> For FS supporting large folios, their page cache pages can be any order <= PMD_ORDER.
> See page_cache_ra_order() in mm/readahead.c

Ah yes..

>
> >>
> >> I don't really know whether a hugetlb 2M head can be reused by a 1G huge
> >> page later, right during the window of fast-gup walking. But obviously
> >> that's not common either, if it could ever happen.
> >>
> >> Maybe Matthew was referring to something else (per "in many places")?
> >
> > There are some other cases where PFN walkers want to identify tail pages to skip over them. See the comment in has_unmovable_pages().

Indeed.

Thanks!
--
Peter Xu
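
To illustrate the race David describes above (a walker looks up the head page
without holding a reference, the folio is freed and reused, and the walker then
tries to take a reference on what is now a tail page), here is a rough
user-space model in C. It is only a sketch under stated assumptions: it is not
kernel code, `struct model_page` is not the real `struct page` layout, and all
`model_*` names are hypothetical helpers made up for this example. The reuse
pattern (an order-1 folio whose head later becomes a tail of an order-2 folio)
follows Zi Yan's point that page cache folios can have any order.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct page; NOT the kernel's layout. */
struct model_page {
	struct model_page *head;  /* NULL when the page is itself a head */
	atomic_int refcount;      /* invariant: 0 while the page is a tail */
};

/* Take a reference only if the count is currently non-zero. */
bool model_ref_inc_not_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

struct model_page *model_head(struct model_page *p)
{
	return p->head ? p->head : p;
}

/*
 * Speculative get: 'head' was looked up earlier without holding a
 * reference, so it may be stale by the time we try to pin it.
 */
struct model_page *model_try_get(struct model_page *page,
				 struct model_page *head)
{
	if (!model_ref_inc_not_zero(head))
		return NULL;            /* freed, or now a tail page */
	if (model_head(page) != head) { /* folio changed under us */
		atomic_fetch_sub(&head->refcount, 1);
		return NULL;
	}
	return head;
}

/*
 * Scenario: the walker sees pages[3] inside an order-1 folio headed by
 * pages[2]. That folio is then freed and pages[0..3] are reused as one
 * order-2 folio, so pages[2] is now a tail. 'tail_field' is whatever the
 * refcount slot of that tail holds when the walker resumes.
 * Returns the slot value observed right after the stale pin attempt.
 */
int model_slot_after_stale_pin(int tail_field)
{
	struct model_page pages[4];
	struct model_page *stale_head;
	bool pinned;
	int seen;

	/* Before: pages[2] heads an order-1 folio {pages[2], pages[3]}. */
	for (int i = 0; i < 4; i++) {
		pages[i].head = NULL;
		atomic_init(&pages[i].refcount, 0);
	}
	atomic_store(&pages[2].refcount, 1);
	pages[3].head = &pages[2];
	stale_head = model_head(&pages[3]);  /* lookup, no reference held */

	/* Meanwhile: folio freed, pages[0..3] reused as one order-2 folio. */
	atomic_store(&pages[0].refcount, 1);
	for (int i = 1; i < 4; i++) {
		pages[i].head = &pages[0];
		atomic_store(&pages[i].refcount, 0);
	}
	atomic_store(&pages[2].refcount, tail_field);

	/* The walker resumes and tries to pin the stale head. */
	pinned = model_ref_inc_not_zero(stale_head);
	seen = atomic_load(&stale_head->refcount);
	if (pinned)
		atomic_fetch_sub(&stale_head->refcount, 1);
	return seen;
}
```

In this model, calling `model_slot_after_stale_pin(0)` leaves the slot
untouched because the increment-unless-zero step refuses to pin a page whose
count is 0. Calling it with a non-zero value (as if the tail's refcount slot
had been reused to store a counter) lets the stale increment go through and
modify the slot, which is the corruption the thread warns about.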