linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Mina Almasry <almasrymina@google.com>, Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>,
	"Paul E . McKenney" <paulmckrcu@fb.com>,
	Yu Zhao <yuzhao@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <florian.schmidt@nutanix.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
Date: Tue, 23 Nov 2021 13:05:01 +0100	[thread overview]
Message-ID: <476868f6-8d32-b8d2-855e-4b19e8a54cc2@redhat.com> (raw)
In-Reply-To: <20211123000102.4052105-1-almasrymina@google.com>

On 23.11.21 01:01, Mina Almasry wrote:
> Add PM_THP_MAPPED MAPPING to allow userspace to detect whether a given virt
> address is currently mapped by a transparent huge page or not.  Example
> use case is a process requesting THPs from the kernel (via a huge tmpfs
> mount for example), for a performance critical region of memory.  The
> userspace may want to query whether the kernel is actually backing this
> memory by hugepages or not.
> 
> PM_THP_MAPPED bit is set if the virt address is mapped at the PMD
> level and the underlying page is a transparent huge page.
> 
> A few options were considered:
> 1. Add /proc/pid/pageflags that exports the same info as
>    /proc/kpageflags.  This is not appropriate because many kpageflags are
>    inappropriate to expose to userspace processes.
> 2. Simply get this info from the existing /proc/pid/smaps interface.
>    There are a couple of issues with that:
>    1. /proc/pid/smaps output is human readable and unfriendly to
>       programatically parse.
>    2. /proc/pid/smaps is slow because it must read the whole memory range
>       rather than a small range we care about.  The cost of reading
>       /proc/pid/smaps into userspace buffers is about ~800us per call,
>       and this doesn't include parsing the output to get the information
>       you need. The cost of querying 1 virt address in /proc/pid/pagemaps
>       however is around 5-7us.
> 
> Tested manually by adding logging into transhuge-stress, and by
> allocating THP and querying the PM_THP_MAPPED flag at those
> virtual addresses.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Rientjes rientjes@google.com
> Cc: Paul E. McKenney <paulmckrcu@fb.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
> Cc: Florian Schmidt <florian.schmidt@nutanix.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-mm@kvack.org
> 
> 
> ---
> 
> Changes in v7:
> - Added clarification that smaps is only slow because it looks at the
>   whole address space.
> 
> Changes in v6:
> - Renamed to PM_THP_MAPPED
> - Removed changes to transhuge-stress
> 
> Changes in v5:
> - Added justification for this interface in the commit message!
> 
> Changes in v4:
> - Removed unnecessary moving of flags variable declaration
> 
> Changes in v3:
> - Renamed PM_THP to PM_HUGE_THP_MAPPING
> - Fixed checks to set PM_HUGE_THP_MAPPING
> - Added PM_HUGE_THP_MAPPING docs
> ---
>  Documentation/admin-guide/mm/pagemap.rst | 3 ++-
>  fs/proc/task_mmu.c                       | 3 +++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index fdc19fbc10839..8a0f0064ff336 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -23,7 +23,8 @@ There are four components to pagemap:
>      * Bit  56    page exclusively mapped (since 4.2)
>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
>        :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
> -    * Bits 57-60 zero
> +    * Bit  58    page is a huge (PMD size) THP mapping
> +    * Bits 59-60 zero
>      * Bit  61    page is file-page or shared-anon (since 3.5)
>      * Bit  62    page swapped
>      * Bit  63    page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index ad667dbc96f5c..d784a97aa209a 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1302,6 +1302,7 @@ struct pagemapread {
>  #define PM_SOFT_DIRTY		BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
>  #define PM_UFFD_WP		BIT_ULL(57)
> +#define PM_THP_MAPPED		BIT_ULL(58)
>  #define PM_FILE			BIT_ULL(61)
>  #define PM_SWAP			BIT_ULL(62)
>  #define PM_PRESENT		BIT_ULL(63)
> @@ -1456,6 +1457,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>  
>  		if (page && page_mapcount(page) == 1)
>  			flags |= PM_MMAP_EXCLUSIVE;
> +		if (page && is_transparent_hugepage(page))
> +			flags |= PM_THP_MAPPED;
>  
>  		for (; addr != end; addr += PAGE_SIZE) {
>  			pagemap_entry_t pme = make_pme(frame, flags);
> 

Thanks!

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


  parent reply	other threads:[~2021-11-23 12:05 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
2021-11-23  1:10 ` Peter Xu
2021-11-23  1:50 ` David Rientjes
2021-11-23 12:05 ` David Hildenbrand [this message]
2021-11-23 20:51 ` Matthew Wilcox
2021-11-23 21:10   ` Mina Almasry
2021-11-23 21:30     ` Matthew Wilcox
2021-11-23 21:47       ` Mina Almasry
2021-11-23 22:03         ` Matthew Wilcox
2021-11-23 22:23           ` Mina Almasry
2021-11-23 22:59             ` Matthew Wilcox
2021-11-23 23:16               ` Mina Almasry
2021-11-28  4:10 ` Matthew Wilcox
2021-12-14  0:22   ` Mina Almasry
2022-01-04 23:04     ` Mina Almasry
2022-01-05  4:39       ` Matthew Wilcox
2022-01-11 23:35       ` William Kucharski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=476868f6-8d32-b8d2-855e-4b19e8a54cc2@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=corbet@lwn.net \
    --cc=florian.schmidt@nutanix.com \
    --cc=ivan.teterevkov@nutanix.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=paulmckrcu@fb.com \
    --cc=peterx@redhat.com \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).