From: Matthew Wilcox <willy@infradead.org>
To: Mina Almasry <almasrymina@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
David Hildenbrand <david@redhat.com>,
"Paul E . McKenney" <paulmckrcu@fb.com>,
Yu Zhao <yuzhao@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Xu <peterx@redhat.com>,
Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
Florian Schmidt <florian.schmidt@nutanix.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
Date: Tue, 23 Nov 2021 22:59:54 +0000
Message-ID: <YZ1yapOMZOXdFHG9@casper.infradead.org>
In-Reply-To: <CAHS8izO0EMRgH8_qt58_O9-MBSwFXLgr1g79gJGrY1N0dTKutg@mail.gmail.com>

On Tue, Nov 23, 2021 at 02:23:23PM -0800, Mina Almasry wrote:
> On Tue, Nov 23, 2021 at 2:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Nov 23, 2021 at 01:47:33PM -0800, Mina Almasry wrote:
> > > On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > What I've been trying to communicate over the N reviews of this
> > > > patch series is that *the same thing is about to happen to THPs*.
> > > > Only more so. THPs are going to be of arbitrary power-of-two size, not
> > > > necessarily sizes supported by the hardware. That means that we need to
> > > > be extremely precise about what we mean by "is this a THP?" Do we just
> > > > mean "This is a compound page?" Do we mean "this is mapped by a PMD?"
> > > > Or do we mean something else? And I feel like I haven't been able to
> > > > get that information out of you.
> > >
> > > Yes, I'm very sorry for the trouble, but I'm also confused what the
> > > disconnect is. To allocate hugepages I can do like so:
> > >
> > > mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs
> > >
> > > or
> > >
> > > madvise(..., MADV_HUGEPAGE)
> > >
> > > Note I don't ask the kernel for a specific size, or a specific mapping
> > > mechanism (PMD/contig PTE/contig PMD/PUD), I just ask the kernel for
> > > 'huge' pages. I would like to know whether the kernel was successful
> > > in allocating a hugepage or not. Today a THP hugepage AFAICT is PMD
> > > mapped + is_transparent_hugepage(), which is the check I have here. In
> > > the future, THP may become an arbitrary power of two size, and I think
> > > I'll need to update this querying interface once/if that gets merged
> > > to the kernel. I.e., if in the future I allocate pages by using:
> > >
> > > mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs
> > >
> > > I need the kernel to tell me whether the mapping is 2MB size or not.
> > >
> > > If I allocate pages by using:
> > >
> > > mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs
> > >
> > > Then I need the kernel to tell me whether the pages are PMD mapped or
> > > not, as I'm doing here.
> > >
> > > The current implementation is based on what the current THP
> > > implementation is in the kernel, and depending on future changes to
> > > THP I may need to update it in the future. Does that make sense?
> >
> > Well, no. You're adding (or changing, if you like) a userspace API.
> > We need to be precise about what that userspace API *means*, so that we
> > don't break it in the future when the implementation changes. You're
> > still being fuzzy above.
> >
> > I have no intention of adding an API like the ones you suggest above to
> > allow the user to specify what size pages to use. That seems very strange
> > to me; how should the user (or sysadmin, or application) know what size is
> > best for the kernel to use to cache files? Instead, the kernel observes
> > the usage pattern of the file (through the readahead mechanism) and grows
> > the allocation size to fit what the kernel thinks will be most effective.
> >
> > I do honour some of the existing hints that userspace can provide; eg
> > VM_HUGEPAGE makes the pagefault path allocate PMD sized pages (if it can).
>
> Right, so since VM_HUGEPAGE makes the kernel allocate PMD mapped THP
> if it can, then I want to know if the page is actually a PMD mapped
> THP or not. The implementation and documentation that I'm adding seem
> consistent with that AFAICT, but sorry if I missed something.

So what userspace cares about is that the kernel is mapping the
memory with a PMD entry; it doesn't care whether the file is
being cached in 2MB (or larger) chunks. So we can drop the 'THP'
from all of this, and just call the bit the PMD mapping bit?
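
For reference, here is a minimal sketch of how userspace reads and decodes
one /proc/pid/pagemap entry today. It assumes the bit layout described in
Documentation/admin-guide/mm/pagemap.rst (present, swapped, file/shared-anon,
exclusive, soft-dirty, PFN in the low 55 bits). The function and constant
names are illustrative, not kernel identifiers. The proposed PMD-mapping bit
is deliberately left out, since its position is not stated in this message.

```python
import os
import struct

PAGEMAP_ENTRY_BYTES = 8  # one 64-bit entry per virtual page

# Bit positions as documented in pagemap.rst (assumed, not from this thread).
PM_PFN_MASK = (1 << 55) - 1      # bits 0-54: PFN when the page is present
PM_SOFT_DIRTY = 1 << 55          # bit 55: pte is soft-dirty
PM_MMAP_EXCLUSIVE = 1 << 56      # bit 56: page exclusively mapped
PM_FILE = 1 << 61                # bit 61: file-mapped or shared-anon page
PM_SWAP = 1 << 62                # bit 62: page is swapped out
PM_PRESENT = 1 << 63             # bit 63: page is present in RAM


def decode_pagemap_entry(entry: int) -> dict:
    """Split a raw 64-bit pagemap entry into named fields."""
    present = bool(entry & PM_PRESENT)
    swapped = bool(entry & PM_SWAP)
    return {
        "present": present,
        "swapped": swapped,
        "file_or_shared_anon": bool(entry & PM_FILE),
        "exclusive": bool(entry & PM_MMAP_EXCLUSIVE),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
        # The low bits hold a PFN only for present, non-swapped pages.
        # Unprivileged readers see a PFN of 0.
        "pfn": (entry & PM_PFN_MASK) if present and not swapped else None,
    }


def read_pagemap_entry(pid: int, vaddr: int) -> int:
    """Read the raw pagemap entry covering virtual address vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_BYTES
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        raw = f.read(PAGEMAP_ENTRY_BYTES)
    if len(raw) != PAGEMAP_ENTRY_BYTES:
        raise OSError("short read: address is outside the pagemap range")
    (entry,) = struct.unpack("=Q", raw)  # native byte order, unsigned 64-bit
    return entry
```

A caller would pass the result of read_pagemap_entry(os.getpid(), addr) to
decode_pagemap_entry(). Whatever name the new bit ends up with, it would be
one more mask tested in the same way, which is why its meaning has to be
pinned down precisely.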