linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] fs/proc/task_mmu: add guard region bit to pagemap
@ 2025-02-21 12:05 Lorenzo Stoakes
  2025-02-21 12:05 ` [PATCH 1/2] " Lorenzo Stoakes
  2025-02-21 12:05 ` [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap Lorenzo Stoakes
  0 siblings, 2 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-21 12:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api

Currently there is no means of determining whether a give page in a mapping
range is designated a guard region (as installed via madvise() using the
MADV_GUARD_INSTALL flag).

This is generally not an issue, but in some instances users may wish to
determine whether this is the case.

This series adds this ability via /proc/$pid/pagemap, updates the
documentation and adds a self test to assert that this functions correctly.

Lorenzo Stoakes (2):
  fs/proc/task_mmu: add guard region bit to pagemap
  tools/selftests: add guard region test for /proc/$pid/pagemap

 Documentation/admin-guide/mm/pagemap.rst   |  3 +-
 fs/proc/task_mmu.c                         |  6 ++-
 tools/testing/selftests/mm/guard-regions.c | 47 ++++++++++++++++++++++
 tools/testing/selftests/mm/vm_util.h       |  1 +
 4 files changed, 55 insertions(+), 2 deletions(-)

--
2.48.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-21 12:05 [PATCH 0/2] fs/proc/task_mmu: add guard region bit to pagemap Lorenzo Stoakes
@ 2025-02-21 12:05 ` Lorenzo Stoakes
  2025-02-21 17:10   ` Kalesh Singh
  2025-02-24  9:27   ` David Hildenbrand
  2025-02-21 12:05 ` [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap Lorenzo Stoakes
  1 sibling, 2 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-21 12:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api

Currently there is no means by which users can determine whether a given
page in memory is in fact a guard region, that is having had the
MADV_GUARD_INSTALL madvise() flag applied to it.

This is intentional, as to provide this information in VMA metadata would
contradict the intent of the feature (providing a means to change fault
behaviour at a page table level rather than a VMA level), and would require
VMA metadata operations to scan page tables, which is unacceptable.

In many cases, users have no need to reflect and determine what regions
have been designated guard regions, as it is the user who has established
them in the first place.

But in some instances, such as monitoring software, or software that relies
upon being able to ascertain the nature of mappings within a remote process
for instance, it becomes useful to be able to determine which pages have
the guard region marker applied.

This patch makes use of an unused pagemap bit (58) to provide this
information.

This patch updates the documentation at the same time as making the change
such that the implementation of the feature and the documentation of it are
tied together.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 Documentation/admin-guide/mm/pagemap.rst | 3 ++-
 fs/proc/task_mmu.c                       | 6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index caba0f52dd36..a297e824f990 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -21,7 +21,8 @@ There are four components to pagemap:
     * Bit  56    page exclusively mapped (since 4.2)
     * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
       Documentation/admin-guide/mm/userfaultfd.rst)
-    * Bits 58-60 zero
+    * Bit  58    pte is a guard region (since 6.15) (see madvise (2) man page)
+    * Bits 59-60 zero
     * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f02cd362309a..c17615e21a5d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1632,6 +1632,7 @@ struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_GUARD_REGION		BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1732,6 +1733,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 			page = pfn_swap_entry_to_page(entry);
 		if (pte_marker_entry_uffd_wp(entry))
 			flags |= PM_UFFD_WP;
+		if (is_guard_swp_entry(entry))
+			flags |=  PM_GUARD_REGION;
 	}
 
 	if (page) {
@@ -1931,7 +1934,8 @@ static const struct mm_walk_ops pagemap_ops = {
  * Bit  55    pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
  * Bit  56    page exclusively mapped
  * Bit  57    pte is uffd-wp write-protected
- * Bits 58-60 zero
+ * Bit  58    pte is a guard region
+ * Bits 59-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap
  2025-02-21 12:05 [PATCH 0/2] fs/proc/task_mmu: add guard region bit to pagemap Lorenzo Stoakes
  2025-02-21 12:05 ` [PATCH 1/2] " Lorenzo Stoakes
@ 2025-02-21 12:05 ` Lorenzo Stoakes
  2025-02-21 13:51   ` Lorenzo Stoakes
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-21 12:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api

Add a test to the guard region self tests to assert that the
/proc/$pid/pagemap information now made availabile to the user correctly
identifies and reports guard regions.

As a part of this change, update vm_util.h to add the new bit (note there
is no header file in the kernel where this is exposed, the user is expected
to provide their own mask) and utilise the helper functions there for
pagemap functionality.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 tools/testing/selftests/mm/guard-regions.c | 47 ++++++++++++++++++++++
 tools/testing/selftests/mm/vm_util.h       |  1 +
 2 files changed, 48 insertions(+)

diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
index ea9b5815e828..0c7183e8b661 100644
--- a/tools/testing/selftests/mm/guard-regions.c
+++ b/tools/testing/selftests/mm/guard-regions.c
@@ -19,6 +19,7 @@
 #include <sys/syscall.h>
 #include <sys/uio.h>
 #include <unistd.h>
+#include "vm_util.h"
 
 /*
  * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1:
@@ -2032,4 +2033,50 @@ TEST_F(guard_regions, anon_zeropage)
 	ASSERT_EQ(munmap(ptr, 10 * page_size), 0);
 }
 
+/*
+ * Assert that /proc/$pid/pagemap correctly identifies guard region ranges.
+ */
+TEST_F(guard_regions, pagemap)
+{
+	const unsigned long page_size = self->page_size;
+	int proc_fd;
+	char *ptr;
+	int i;
+
+	proc_fd = open("/proc/self/pagemap", O_RDONLY);
+	ASSERT_NE(proc_fd, -1);
+
+	ptr = mmap_(self, variant, NULL, 10 * page_size,
+		    PROT_READ | PROT_WRITE, 0, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+
+	/* Read from pagemap, and assert no guard regions are detected. */
+	for (i = 0; i < 10; i++) {
+		char *ptr_p = &ptr[i * page_size];
+		unsigned long entry = pagemap_get_entry(proc_fd, ptr_p);
+		unsigned long masked = entry & PM_GUARD_REGION_MASK;
+
+		ASSERT_EQ(masked, 0);
+	}
+
+	/* Install a guard region in every other page. */
+	for (i = 0; i < 10; i += 2) {
+		char *ptr_p = &ptr[i * page_size];
+
+		ASSERT_EQ(madvise(ptr_p, page_size, MADV_GUARD_INSTALL), 0);
+	}
+
+	/* Re-read from pagemap, and assert guard regions are detected. */
+	for (i = 0; i < 10; i++) {
+		char *ptr_p = &ptr[i * page_size];
+		unsigned long entry = pagemap_get_entry(proc_fd, ptr_p);
+		unsigned long masked = entry & PM_GUARD_REGION_MASK;
+
+		ASSERT_EQ(masked, i % 2 == 0 ? PM_GUARD_REGION_MASK : 0);
+	}
+
+	ASSERT_EQ(close(proc_fd), 0);
+	ASSERT_EQ(munmap(ptr, 10 * page_size), 0);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index b60ac68a9dc8..73a11443b7f6 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -10,6 +10,7 @@
 #define PM_SOFT_DIRTY                 BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE             BIT_ULL(56)
 #define PM_UFFD_WP                    BIT_ULL(57)
+#define PM_GUARD_REGION_MASK          BIT_ULL(58)
 #define PM_FILE                       BIT_ULL(61)
 #define PM_SWAP                       BIT_ULL(62)
 #define PM_PRESENT                    BIT_ULL(63)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap
  2025-02-21 12:05 ` [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap Lorenzo Stoakes
@ 2025-02-21 13:51   ` Lorenzo Stoakes
  2025-02-21 17:14     ` Kalesh Singh
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-21 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api

On Fri, Feb 21, 2025 at 12:05:23PM +0000, Lorenzo Stoakes wrote:
> Add a test to the guard region self tests to assert that the
> /proc/$pid/pagemap information now made availabile to the user correctly
> identifies and reports guard regions.
>
> As a part of this change, update vm_util.h to add the new bit (note there
> is no header file in the kernel where this is exposed, the user is expected
> to provide their own mask) and utilise the helper functions there for
> pagemap functionality.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Andrew - Apologies,

I managed to not commit a change I quickly made before sending this out
(I'm ill, seems it is having an impact...)

If the series is ok would you mind tacking on this fix-patch? It's simply
to rename a clumsily named define here.

No functional changes...

Thanks!

----8<----
From 60be19e88b3bfe9a6ec459115f0027721c494b30 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 21 Feb 2025 13:45:48 +0000
Subject: [PATCH] fixup define name

Fix badly named define so it's consistent with the others.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 tools/testing/selftests/mm/guard-regions.c | 6 +++---
 tools/testing/selftests/mm/vm_util.h       | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
index 0c7183e8b661..280d1831bf73 100644
--- a/tools/testing/selftests/mm/guard-regions.c
+++ b/tools/testing/selftests/mm/guard-regions.c
@@ -2054,7 +2054,7 @@ TEST_F(guard_regions, pagemap)
 	for (i = 0; i < 10; i++) {
 		char *ptr_p = &ptr[i * page_size];
 		unsigned long entry = pagemap_get_entry(proc_fd, ptr_p);
-		unsigned long masked = entry & PM_GUARD_REGION_MASK;
+		unsigned long masked = entry & PM_GUARD_REGION;

 		ASSERT_EQ(masked, 0);
 	}
@@ -2070,9 +2070,9 @@ TEST_F(guard_regions, pagemap)
 	for (i = 0; i < 10; i++) {
 		char *ptr_p = &ptr[i * page_size];
 		unsigned long entry = pagemap_get_entry(proc_fd, ptr_p);
-		unsigned long masked = entry & PM_GUARD_REGION_MASK;
+		unsigned long masked = entry & PM_GUARD_REGION;

-		ASSERT_EQ(masked, i % 2 == 0 ? PM_GUARD_REGION_MASK : 0);
+		ASSERT_EQ(masked, i % 2 == 0 ? PM_GUARD_REGION : 0);
 	}

 	ASSERT_EQ(close(proc_fd), 0);
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 73a11443b7f6..0e629586556b 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -10,7 +10,7 @@
 #define PM_SOFT_DIRTY                 BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE             BIT_ULL(56)
 #define PM_UFFD_WP                    BIT_ULL(57)
-#define PM_GUARD_REGION_MASK          BIT_ULL(58)
+#define PM_GUARD_REGION               BIT_ULL(58)
 #define PM_FILE                       BIT_ULL(61)
 #define PM_SWAP                       BIT_ULL(62)
 #define PM_PRESENT                    BIT_ULL(63)
--
2.48.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-21 12:05 ` [PATCH 1/2] " Lorenzo Stoakes
@ 2025-02-21 17:10   ` Kalesh Singh
  2025-02-21 17:45     ` Lorenzo Stoakes
  2025-02-24  9:27   ` David Hildenbrand
  1 sibling, 1 reply; 15+ messages in thread
From: Kalesh Singh @ 2025-02-21 17:10 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Liam R . Howlett, Matthew Wilcox,
	Vlastimil Babka, Paul E . McKenney, Jann Horn, Juan Yescas,
	linux-mm, linux-doc, linux-kernel, linux-fsdevel, linux-kselftest,
	linux-api

On Fri, Feb 21, 2025 at 4:05 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> Currently there is no means by which users can determine whether a given
> page in memory is in fact a guard region, that is having had the
> MADV_GUARD_INSTALL madvise() flag applied to it.
>
> This is intentional, as to provide this information in VMA metadata would
> contradict the intent of the feature (providing a means to change fault
> behaviour at a page table level rather than a VMA level), and would require
> VMA metadata operations to scan page tables, which is unacceptable.
>
> In many cases, users have no need to reflect and determine what regions
> have been designated guard regions, as it is the user who has established
> them in the first place.
>
> But in some instances, such as monitoring software, or software that relies
> upon being able to ascertain the nature of mappings within a remote process
> for instance, it becomes useful to be able to determine which pages have
> the guard region marker applied.
>
> This patch makes use of an unused pagemap bit (58) to provide this
> information.
>
> This patch updates the documentation at the same time as making the change
> such that the implementation of the feature and the documentation of it are
> tied together.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  Documentation/admin-guide/mm/pagemap.rst | 3 ++-
>  fs/proc/task_mmu.c                       | 6 +++++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index caba0f52dd36..a297e824f990 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -21,7 +21,8 @@ There are four components to pagemap:
>      * Bit  56    page exclusively mapped (since 4.2)
>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
>        Documentation/admin-guide/mm/userfaultfd.rst)
> -    * Bits 58-60 zero
> +    * Bit  58    pte is a guard region (since 6.15) (see madvise (2) man page)

Should this be 6.14 ?

Other than that: Reviewed-by: Kalesh Singh <kaleshsingh@google.com>

Thanks,
Kalesh

> +    * Bits 59-60 zero
>      * Bit  61    page is file-page or shared-anon (since 3.5)
>      * Bit  62    page swapped
>      * Bit  63    page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f02cd362309a..c17615e21a5d 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1632,6 +1632,7 @@ struct pagemapread {
>  #define PM_SOFT_DIRTY          BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
>  #define PM_UFFD_WP             BIT_ULL(57)
> +#define PM_GUARD_REGION                BIT_ULL(58)
>  #define PM_FILE                        BIT_ULL(61)
>  #define PM_SWAP                        BIT_ULL(62)
>  #define PM_PRESENT             BIT_ULL(63)
> @@ -1732,6 +1733,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>                         page = pfn_swap_entry_to_page(entry);
>                 if (pte_marker_entry_uffd_wp(entry))
>                         flags |= PM_UFFD_WP;
> +               if (is_guard_swp_entry(entry))
> +                       flags |=  PM_GUARD_REGION;
>         }
>
>         if (page) {
> @@ -1931,7 +1934,8 @@ static const struct mm_walk_ops pagemap_ops = {
>   * Bit  55    pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
>   * Bit  56    page exclusively mapped
>   * Bit  57    pte is uffd-wp write-protected
> - * Bits 58-60 zero
> + * Bit  58    pte is a guard region
> + * Bits 59-60 zero
>   * Bit  61    page is file-page or shared-anon
>   * Bit  62    page swapped
>   * Bit  63    page present
> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap
  2025-02-21 13:51   ` Lorenzo Stoakes
@ 2025-02-21 17:14     ` Kalesh Singh
  0 siblings, 0 replies; 15+ messages in thread
From: Kalesh Singh @ 2025-02-21 17:14 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Liam R . Howlett, Matthew Wilcox,
	Vlastimil Babka, Paul E . McKenney, Jann Horn, Juan Yescas,
	linux-mm, linux-doc, linux-kernel, linux-fsdevel, linux-kselftest,
	linux-api

On Fri, Feb 21, 2025 at 5:51 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Feb 21, 2025 at 12:05:23PM +0000, Lorenzo Stoakes wrote:
> > Add a test to the guard region self tests to assert that the
> > /proc/$pid/pagemap information now made availabile to the user correctly
> > identifies and reports guard regions.
> >
> > As a part of this change, update vm_util.h to add the new bit (note there
> > is no header file in the kernel where this is exposed, the user is expected
> > to provide their own mask) and utilise the helper functions there for
> > pagemap functionality.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Kalesh Singh <kaleshsingh@google.com>

>
> Andrew - Apologies,
>
> I managed to not commit a change I quickly made before sending this out
> (I'm ill, seems it is having an impact...)
>
> If the series is ok would you mind tacking on this fix-patch? It's simply
> to rename a clumsily named define here.
>
> No functional changes...
>
> Thanks!
>
> ----8<----
> From 60be19e88b3bfe9a6ec459115f0027721c494b30 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Fri, 21 Feb 2025 13:45:48 +0000
> Subject: [PATCH] fixup define name
>
> Fix badly named define so it's consistent with the others.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  tools/testing/selftests/mm/guard-regions.c | 6 +++---
>  tools/testing/selftests/mm/vm_util.h       | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
> index 0c7183e8b661..280d1831bf73 100644
> --- a/tools/testing/selftests/mm/guard-regions.c
> +++ b/tools/testing/selftests/mm/guard-regions.c
> @@ -2054,7 +2054,7 @@ TEST_F(guard_regions, pagemap)
>         for (i = 0; i < 10; i++) {
>                 char *ptr_p = &ptr[i * page_size];
>                 unsigned long entry = pagemap_get_entry(proc_fd, ptr_p);
> -               unsigned long masked = entry & PM_GUARD_REGION_MASK;
> +               unsigned long masked = entry & PM_GUARD_REGION;
>
>                 ASSERT_EQ(masked, 0);
>         }
> @@ -2070,9 +2070,9 @@ TEST_F(guard_regions, pagemap)
>         for (i = 0; i < 10; i++) {
>                 char *ptr_p = &ptr[i * page_size];
>                 unsigned long entry = pagemap_get_entry(proc_fd, ptr_p);
> -               unsigned long masked = entry & PM_GUARD_REGION_MASK;
> +               unsigned long masked = entry & PM_GUARD_REGION;
>
> -               ASSERT_EQ(masked, i % 2 == 0 ? PM_GUARD_REGION_MASK : 0);
> +               ASSERT_EQ(masked, i % 2 == 0 ? PM_GUARD_REGION : 0);
>         }
>
>         ASSERT_EQ(close(proc_fd), 0);
> diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
> index 73a11443b7f6..0e629586556b 100644
> --- a/tools/testing/selftests/mm/vm_util.h
> +++ b/tools/testing/selftests/mm/vm_util.h
> @@ -10,7 +10,7 @@
>  #define PM_SOFT_DIRTY                 BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE             BIT_ULL(56)
>  #define PM_UFFD_WP                    BIT_ULL(57)
> -#define PM_GUARD_REGION_MASK          BIT_ULL(58)
> +#define PM_GUARD_REGION               BIT_ULL(58)
>  #define PM_FILE                       BIT_ULL(61)
>  #define PM_SWAP                       BIT_ULL(62)
>  #define PM_PRESENT                    BIT_ULL(63)
> --
> 2.48.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-21 17:10   ` Kalesh Singh
@ 2025-02-21 17:45     ` Lorenzo Stoakes
  0 siblings, 0 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-21 17:45 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, David Hildenbrand,
	Suren Baghdasaryan, Liam R . Howlett, Matthew Wilcox,
	Vlastimil Babka, Paul E . McKenney, Jann Horn, Juan Yescas,
	linux-mm, linux-doc, linux-kernel, linux-fsdevel, linux-kselftest,
	linux-api

On Fri, Feb 21, 2025 at 09:10:42AM -0800, Kalesh Singh wrote:
> On Fri, Feb 21, 2025 at 4:05 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > Currently there is no means by which users can determine whether a given
> > page in memory is in fact a guard region, that is having had the
> > MADV_GUARD_INSTALL madvise() flag applied to it.
> >
> > This is intentional, as to provide this information in VMA metadata would
> > contradict the intent of the feature (providing a means to change fault
> > behaviour at a page table level rather than a VMA level), and would require
> > VMA metadata operations to scan page tables, which is unacceptable.
> >
> > In many cases, users have no need to reflect and determine what regions
> > have been designated guard regions, as it is the user who has established
> > them in the first place.
> >
> > But in some instances, such as monitoring software, or software that relies
> > upon being able to ascertain the nature of mappings within a remote process
> > for instance, it becomes useful to be able to determine which pages have
> > the guard region marker applied.
> >
> > This patch makes use of an unused pagemap bit (58) to provide this
> > information.
> >
> > This patch updates the documentation at the same time as making the change
> > such that the implementation of the feature and the documentation of it are
> > tied together.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  Documentation/admin-guide/mm/pagemap.rst | 3 ++-
> >  fs/proc/task_mmu.c                       | 6 +++++-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> > index caba0f52dd36..a297e824f990 100644
> > --- a/Documentation/admin-guide/mm/pagemap.rst
> > +++ b/Documentation/admin-guide/mm/pagemap.rst
> > @@ -21,7 +21,8 @@ There are four components to pagemap:
> >      * Bit  56    page exclusively mapped (since 4.2)
> >      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
> >        Documentation/admin-guide/mm/userfaultfd.rst)
> > -    * Bits 58-60 zero
> > +    * Bit  58    pte is a guard region (since 6.15) (see madvise (2) man page)
>
> Should this be 6.14 ?

We're aiming for the 6.15 merge window so this is correct :>) I don't think
this could be considered a hotfix haha!

>
> Other than that: Reviewed-by: Kalesh Singh <kaleshsingh@google.com>

Thanks! And thanks for review on the other patch also! Appreciated.

>
> Thanks,
> Kalesh
>
> > +    * Bits 59-60 zero
> >      * Bit  61    page is file-page or shared-anon (since 3.5)
> >      * Bit  62    page swapped
> >      * Bit  63    page present
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index f02cd362309a..c17615e21a5d 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -1632,6 +1632,7 @@ struct pagemapread {
> >  #define PM_SOFT_DIRTY          BIT_ULL(55)
> >  #define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
> >  #define PM_UFFD_WP             BIT_ULL(57)
> > +#define PM_GUARD_REGION                BIT_ULL(58)
> >  #define PM_FILE                        BIT_ULL(61)
> >  #define PM_SWAP                        BIT_ULL(62)
> >  #define PM_PRESENT             BIT_ULL(63)
> > @@ -1732,6 +1733,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> >                         page = pfn_swap_entry_to_page(entry);
> >                 if (pte_marker_entry_uffd_wp(entry))
> >                         flags |= PM_UFFD_WP;
> > +               if (is_guard_swp_entry(entry))
> > +                       flags |=  PM_GUARD_REGION;
> >         }
> >
> >         if (page) {
> > @@ -1931,7 +1934,8 @@ static const struct mm_walk_ops pagemap_ops = {
> >   * Bit  55    pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
> >   * Bit  56    page exclusively mapped
> >   * Bit  57    pte is uffd-wp write-protected
> > - * Bits 58-60 zero
> > + * Bit  58    pte is a guard region
> > + * Bits 59-60 zero
> >   * Bit  61    page is file-page or shared-anon
> >   * Bit  62    page swapped
> >   * Bit  63    page present
> > --
> > 2.48.1
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-21 12:05 ` [PATCH 1/2] " Lorenzo Stoakes
  2025-02-21 17:10   ` Kalesh Singh
@ 2025-02-24  9:27   ` David Hildenbrand
  2025-02-24 10:18     ` Lorenzo Stoakes
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-24  9:27 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: Jonathan Corbet, Shuah Khan, Suren Baghdasaryan, Kalesh Singh,
	Liam R . Howlett, Matthew Wilcox, Vlastimil Babka,
	Paul E . McKenney, Jann Horn, Juan Yescas, linux-mm, linux-doc,
	linux-kernel, linux-fsdevel, linux-kselftest, linux-api

On 21.02.25 13:05, Lorenzo Stoakes wrote:
> Currently there is no means by which users can determine whether a given
> page in memory is in fact a guard region, that is having had the
> MADV_GUARD_INSTALL madvise() flag applied to it.
> 
> This is intentional, as to provide this information in VMA metadata would
> contradict the intent of the feature (providing a means to change fault
> behaviour at a page table level rather than a VMA level), and would require
> VMA metadata operations to scan page tables, which is unacceptable.
> 
> In many cases, users have no need to reflect and determine what regions
> have been designated guard regions, as it is the user who has established
> them in the first place.
> 
> But in some instances, such as monitoring software, or software that relies
> upon being able to ascertain the nature of mappings within a remote process
> for instance, it becomes useful to be able to determine which pages have
> the guard region marker applied.
> 
> This patch makes use of an unused pagemap bit (58) to provide this
> information.
> 
> This patch updates the documentation at the same time as making the change
> such that the implementation of the feature and the documentation of it are
> tied together.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---


Acked-by: David Hildenbrand <david@redhat.com>

Something that might be interesting is also extending the PAGEMAP_SCAN 
ioctl.


See do_pagemap_scan().

The benefit here might be that one could effectively search/filter for 
guard regions without copying 64bit per base-page to user space.

But the idea would be to indicate something like PAGE_IS_GUARD_REGION as 
a category when we hit a guard region entry in pagemap_page_category().

(the code is a bit complicated, and I am not sure why we indicate 
PAGE_IS_SWAPPED for non-swap entries, likely wrong ...)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-24  9:27   ` David Hildenbrand
@ 2025-02-24 10:18     ` Lorenzo Stoakes
  2025-02-24 10:37       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-24 10:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, Suren Baghdasaryan,
	Kalesh Singh, Liam R . Howlett, Matthew Wilcox, Vlastimil Babka,
	Paul E . McKenney, Jann Horn, Juan Yescas, linux-mm, linux-doc,
	linux-kernel, linux-fsdevel, linux-kselftest, linux-api

On Mon, Feb 24, 2025 at 10:27:28AM +0100, David Hildenbrand wrote:
> On 21.02.25 13:05, Lorenzo Stoakes wrote:
> > Currently there is no means by which users can determine whether a given
> > page in memory is in fact a guard region, that is having had the
> > MADV_GUARD_INSTALL madvise() flag applied to it.
> >
> > This is intentional, as to provide this information in VMA metadata would
> > contradict the intent of the feature (providing a means to change fault
> > behaviour at a page table level rather than a VMA level), and would require
> > VMA metadata operations to scan page tables, which is unacceptable.
> >
> > In many cases, users have no need to reflect and determine what regions
> > have been designated guard regions, as it is the user who has established
> > them in the first place.
> >
> > But in some instances, such as monitoring software, or software that relies
> > upon being able to ascertain the nature of mappings within a remote process
> > for instance, it becomes useful to be able to determine which pages have
> > the guard region marker applied.
> >
> > This patch makes use of an unused pagemap bit (58) to provide this
> > information.
> >
> > This patch updates the documentation at the same time as making the change
> > such that the implementation of the feature and the documentation of it are
> > tied together.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
>
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks! :)
>
> Something that might be interesting is also extending the PAGEMAP_SCAN
> ioctl.

Yeah, funny you should mention that, I did see that, but on reading the man
page it struck me that it requires the region to be uffd afaict? All the
tests seem to establish uffd, and the man page implies it:

       To start tracking the written state (flag) of a page or range of
       memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API
       ioctl(2) on userfaultfd and memory range must be registered with
       UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.

It would be a bit of a weird edge case to add support there. I was excited
when I first saw this ioctl, then disappointed afterwards... but maybe I
got it wrong?

>
>
> See do_pagemap_scan().
>
> The benefit here might be that one could effectively search/filter for guard
> regions without copying 64bit per base-page to user space.
>
> But the idea would be to indicate something like PAGE_IS_GUARD_REGION as a
> category when we hit a guard region entry in pagemap_page_category().
>
> (the code is a bit complicated, and I am not sure why we indicate
> PAGE_IS_SWAPPED for non-swap entries, likely wrong ...)

Yeah, I could go on here about how much I hate how uffd does a 'parallel
implementation' of a ton of stuff and then chucks in if (uffd) { go do
something weird + wonderful } but I'll resist the urge :P :))

Do you think, if it were uffd-specific, this would be useful?

At any rate, I'm not sure it's _hugely_ beneficial in this form as pagemap
is binary in any case so you're not having to deal with overhead of parsing
a text file at least!

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-24 10:18     ` Lorenzo Stoakes
@ 2025-02-24 10:37       ` David Hildenbrand
  2025-02-24 10:49         ` Lorenzo Stoakes
  2025-03-19 18:22         ` Andrei Vagin
  0 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-02-24 10:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, Suren Baghdasaryan,
	Kalesh Singh, Liam R . Howlett, Matthew Wilcox, Vlastimil Babka,
	Paul E . McKenney, Jann Horn, Juan Yescas, linux-mm, linux-doc,
	linux-kernel, linux-fsdevel, linux-kselftest, linux-api

On 24.02.25 11:18, Lorenzo Stoakes wrote:
> On Mon, Feb 24, 2025 at 10:27:28AM +0100, David Hildenbrand wrote:
>> On 21.02.25 13:05, Lorenzo Stoakes wrote:
>>> Currently there is no means by which users can determine whether a given
>>> page in memory is in fact a guard region, that is having had the
>>> MADV_GUARD_INSTALL madvise() flag applied to it.
>>>
>>> This is intentional, as to provide this information in VMA metadata would
>>> contradict the intent of the feature (providing a means to change fault
>>> behaviour at a page table level rather than a VMA level), and would require
>>> VMA metadata operations to scan page tables, which is unacceptable.
>>>
>>> In many cases, users have no need to reflect and determine what regions
>>> have been designated guard regions, as it is the user who has established
>>> them in the first place.
>>>
>>> But in some instances, such as monitoring software, or software that relies
>>> upon being able to ascertain the nature of mappings within a remote process
>>> for instance, it becomes useful to be able to determine which pages have
>>> the guard region marker applied.
>>>
>>> This patch makes use of an unused pagemap bit (58) to provide this
>>> information.
>>>
>>> This patch updates the documentation at the same time as making the change
>>> such that the implementation of the feature and the documentation of it are
>>> tied together.
>>>
>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> ---
>>
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
> 
> Thanks! :)
>>
>> Something that might be interesting is also extending the PAGEMAP_SCAN
>> ioctl.
> 
> Yeah, funny you should mention that, I did see that, but on reading the man
> page it struck me that it requires the region to be uffd afaict? All the
> tests seem to establish uffd, and the man page implies it:
> 
>         To start tracking the written state (flag) of a page or range of
>         memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API
>         ioctl(2) on userfaultfd and memory range must be registered with
>         UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.
> 
> It would be a bit of a weird edge case to add support there. I was excited
> when I first saw this ioctl, then disappointed afterwards... but maybe I
> got it wrong?
> 

I never managed to review that fully, but I thing that 
UFFD_FEATURE_WP_ASYNC thingy is only required for PM_SCAN_CHECK_WPASYNC 
and PM_SCAN_WP_MATCHING.

See pagemap_scan_test_walk().

I do recall that it works on any VMA.

Ah yes, tools/testing/selftests/mm/vm_util.c ends up using it for 
pagemap_is_swapped() and friends via page_entry_is() to sanity check 
that what pagemap gives us is consistent with what pagemap_scan gives us.

So it should work independent of the uffd magic.
I might be wrong, though ...

>>
>>
>> See do_pagemap_scan().
>>
>> The benefit here might be that one could effectively search/filter for guard
>> regions without copying 64bit per base-page to user space.
>>
>> But the idea would be to indicate something like PAGE_IS_GUARD_REGION as a
>> category when we hit a guard region entry in pagemap_page_category().
>>
>> (the code is a bit complicated, and I am not sure why we indicate
>> PAGE_IS_SWAPPED for non-swap entries, likely wrong ...)
> 
> Yeah, I could go on here about how much I hate how uffd does a 'parallel
> implementation' of a ton of stuff and then chucks in if (uffd) { go do
> something weird + wonderful } but I'll resist the urge :P :))
> 
> Do you think, if it were uffd-specific, this would be useful?

If it really is completely uffd-specific for now, I agree that we should 
rather leave it alone.

> 
> At any rate, I'm not sure it's _hugely_ beneficial in this form as pagemap
> is binary in any case so you're not having to deal with overhead of parsing
> a text file at least!

My thinking was, that if you have a large VMA, with ordinary pagemap you 
have to copy 8byte per entry (and have room for that somewhere in user 
space). In theory, with the scanning feature, you can leave that ... 
scanning to the kernel and don't have to do any copying/allocate space 
for it in user space etc.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-24 10:37       ` David Hildenbrand
@ 2025-02-24 10:49         ` Lorenzo Stoakes
  2025-02-24 13:28           ` David Hildenbrand
  2025-03-19 18:22         ` Andrei Vagin
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-02-24 10:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, Suren Baghdasaryan,
	Kalesh Singh, Liam R . Howlett, Matthew Wilcox, Vlastimil Babka,
	Paul E . McKenney, Jann Horn, Juan Yescas, linux-mm, linux-doc,
	linux-kernel, linux-fsdevel, linux-kselftest, linux-api

On Mon, Feb 24, 2025 at 11:37:24AM +0100, David Hildenbrand wrote:
> On 24.02.25 11:18, Lorenzo Stoakes wrote:
> > On Mon, Feb 24, 2025 at 10:27:28AM +0100, David Hildenbrand wrote:
> > > On 21.02.25 13:05, Lorenzo Stoakes wrote:
> > > > Currently there is no means by which users can determine whether a given
> > > > page in memory is in fact a guard region, that is having had the
> > > > MADV_GUARD_INSTALL madvise() flag applied to it.
> > > >
> > > > This is intentional, as to provide this information in VMA metadata would
> > > > contradict the intent of the feature (providing a means to change fault
> > > > behaviour at a page table level rather than a VMA level), and would require
> > > > VMA metadata operations to scan page tables, which is unacceptable.
> > > >
> > > > In many cases, users have no need to reflect and determine what regions
> > > > have been designated guard regions, as it is the user who has established
> > > > them in the first place.
> > > >
> > > > But in some instances, such as monitoring software, or software that relies
> > > > upon being able to ascertain the nature of mappings within a remote process
> > > > for instance, it becomes useful to be able to determine which pages have
> > > > the guard region marker applied.
> > > >
> > > > This patch makes use of an unused pagemap bit (58) to provide this
> > > > information.
> > > >
> > > > This patch updates the documentation at the same time as making the change
> > > > such that the implementation of the feature and the documentation of it are
> > > > tied together.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > ---
> > >
> > >
> > > Acked-by: David Hildenbrand <david@redhat.com>
> >
> > Thanks! :)
> > >
> > > Something that might be interesting is also extending the PAGEMAP_SCAN
> > > ioctl.
> >
> > Yeah, funny you should mention that, I did see that, but on reading the man
> > page it struck me that it requires the region to be uffd afaict? All the
> > tests seem to establish uffd, and the man page implies it:
> >
> >         To start tracking the written state (flag) of a page or range of
> >         memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API
> >         ioctl(2) on userfaultfd and memory range must be registered with
> >         UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.
> >
> > It would be a bit of a weird edge case to add support there. I was excited
> > when I first saw this ioctl, then disappointed afterwards... but maybe I
> > got it wrong?
> >
>
> I never managed to review that fully, but I thing that UFFD_FEATURE_WP_ASYNC
> thingy is only required for PM_SCAN_CHECK_WPASYNC and PM_SCAN_WP_MATCHING.
>
> See pagemap_scan_test_walk().
>
> I do recall that it works on any VMA.

Oh ok well that's handy then!

>
> Ah yes, tools/testing/selftests/mm/vm_util.c ends up using it for
> pagemap_is_swapped() and friends via page_entry_is() to sanity check that
> what pagemap gives us is consistent with what pagemap_scan gives us.
>
> So it should work independent of the uffd magic.
> I might be wrong, though ...

No a quick glance makes me think you're right actually.

>
> > >
> > >
> > > See do_pagemap_scan().
> > >
> > > The benefit here might be that one could effectively search/filter for guard
> > > regions without copying 64bit per base-page to user space.
> > >
> > > But the idea would be to indicate something like PAGE_IS_GUARD_REGION as a
> > > category when we hit a guard region entry in pagemap_page_category().
> > >
> > > (the code is a bit complicated, and I am not sure why we indicate
> > > PAGE_IS_SWAPPED for non-swap entries, likely wrong ...)
> >
> > Yeah, I could go on here about how much I hate how uffd does a 'parallel
> > implementation' of a ton of stuff and then chucks in if (uffd) { go do
> > something weird + wonderful } but I'll resist the urge :P :))
> >
> > Do you think, if it were uffd-specific, this would be useful?
>
> If it really is completely uffd-specific for now, I agree that we should
> rather leave it alone.

Yeah agreed.

>
> >
> > At any rate, I'm not sure it's _hugely_ beneficial in this form as pagemap
> > is binary in any case so you're not having to deal with overhead of parsing
> > a text file at least!
>
> My thinking was, that if you have a large VMA, with ordinary pagemap you
> have to copy 8byte per entry (and have room for that somewhere in user
> space). In theory, with the scanning feature, you can leave that ...
> scanning to the kernel and don't have to do any copying/allocate space for
> it in user space etc.

That makes perfect sense!

I think this one will go a little lower on priorities + I'll come back to it but
I"ll put it on the one reliable todo list I have, the whiteboard in my home
office :) everything on that list at least eventually gets looked at, majority
get done.

>
> --
> Cheers,
>
> David / dhildenb
>

Great minds think alike though ;) as soon as I saw this I did think about
extending it, but seems I mistakenly dismissed for uffd reasons.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-24 10:49         ` Lorenzo Stoakes
@ 2025-02-24 13:28           ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-02-24 13:28 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, Suren Baghdasaryan,
	Kalesh Singh, Liam R . Howlett, Matthew Wilcox, Vlastimil Babka,
	Paul E . McKenney, Jann Horn, Juan Yescas, linux-mm, linux-doc,
	linux-kernel, linux-fsdevel, linux-kselftest, linux-api

>> My thinking was, that if you have a large VMA, with ordinary pagemap you
>> have to copy 8byte per entry (and have room for that somewhere in user
>> space). In theory, with the scanning feature, you can leave that ...
>> scanning to the kernel and don't have to do any copying/allocate space for
>> it in user space etc.
> 
> That makes perfect sense!
> 
> I think this one will go a little lower on priorities + I'll come back to it but
> I"ll put it on the one reliable todo list I have, the whiteboard in my home
> office :) everything on that list at least eventually gets looked at, majority
> get done.

Sounds good. I'm sure Android folks will speak up in case they require 
more efficient scanning.

> 
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
> 
> Great minds think alike though ;) as soon as I saw this I did think about
> extending it, but seems I mistakenly dismissed for uffd reasons.
We should probably look into cleaning up + improving the documentation 
around the pagemap scan feature at some point. Well, something for 
another day :)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-02-24 10:37       ` David Hildenbrand
  2025-02-24 10:49         ` Lorenzo Stoakes
@ 2025-03-19 18:22         ` Andrei Vagin
  2025-03-19 19:12           ` Lorenzo Stoakes
  1 sibling, 1 reply; 15+ messages in thread
From: Andrei Vagin @ 2025-03-19 18:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lorenzo Stoakes, Andrew Morton, Jonathan Corbet, Shuah Khan,
	Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api, criu, Alexander Mikhalitsyn,
	Pavel Tikhomirov, Mike Rapoport

On Mon, Feb 24, 2025 at 2:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.02.25 11:18, Lorenzo Stoakes wrote:
> > On Mon, Feb 24, 2025 at 10:27:28AM +0100, David Hildenbrand wrote:
> >> On 21.02.25 13:05, Lorenzo Stoakes wrote:
> >>> Currently there is no means by which users can determine whether a given
> >>> page in memory is in fact a guard region, that is having had the
> >>> MADV_GUARD_INSTALL madvise() flag applied to it.
> >>>
> >>> This is intentional, as to provide this information in VMA metadata would
> >>> contradict the intent of the feature (providing a means to change fault
> >>> behaviour at a page table level rather than a VMA level), and would require
> >>> VMA metadata operations to scan page tables, which is unacceptable.
> >>>
> >>> In many cases, users have no need to reflect and determine what regions
> >>> have been designated guard regions, as it is the user who has established
> >>> them in the first place.
> >>>
> >>> But in some instances, such as monitoring software, or software that relies
> >>> upon being able to ascertain the nature of mappings within a remote process
> >>> for instance, it becomes useful to be able to determine which pages have
> >>> the guard region marker applied.
> >>>
> >>> This patch makes use of an unused pagemap bit (58) to provide this
> >>> information.
> >>>
> >>> This patch updates the documentation at the same time as making the change
> >>> such that the implementation of the feature and the documentation of it are
> >>> tied together.
> >>>
> >>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >>> ---
> >>
> >>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >
> > Thanks! :)
> >>
> >> Something that might be interesting is also extending the PAGEMAP_SCAN
> >> ioctl.
> >
> > Yeah, funny you should mention that, I did see that, but on reading the man
> > page it struck me that it requires the region to be uffd afaict? All the
> > tests seem to establish uffd, and the man page implies it:
> >
> >         To start tracking the written state (flag) of a page or range of
> >         memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API
> >         ioctl(2) on userfaultfd and memory range must be registered with
> >         UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.
> >
> > It would be a bit of a weird edge case to add support there. I was excited
> > when I first saw this ioctl, then disappointed afterwards... but maybe I
> > got it wrong?

> >
>
> I never managed to review that fully, but I thing that
> UFFD_FEATURE_WP_ASYNC thingy is only required for PM_SCAN_CHECK_WPASYNC
> and PM_SCAN_WP_MATCHING.
>
> See pagemap_scan_test_walk().
>
> I do recall that it works on any VMA.
>
> Ah yes, tools/testing/selftests/mm/vm_util.c ends up using it for
> pagemap_is_swapped() and friends via page_entry_is() to sanity check
> that what pagemap gives us is consistent with what pagemap_scan gives us.
>
> So it should work independent of the uffd magic.
> I might be wrong, though ...


PAGEMAP_SCAN can work without the UFFD magic. CRIU utilizes PAGEMAP_SCAN
as a more efficient alternative to /proc/pid/pagemap:
https://github.com/checkpoint-restore/criu/blob/d18912fc88f3dc7bde5fdfa3575691977eb21753/criu/pagemap-cache.c#L178

For CRIU, obtaining information about guard regions is critical.
Without this functionality in the kernel, CRIU is broken. We probably should
consider backporting these changes to the 6.13 and 6.14 stable branches.

>
> >>
> >>
> >> See do_pagemap_scan().
> >>
> >> The benefit here might be that one could effectively search/filter for guard
> >> regions without copying 64bit per base-page to user space.
> >>
> >> But the idea would be to indicate something like PAGE_IS_GUARD_REGION as a
> >> category when we hit a guard region entry in pagemap_page_category().
> >>
> >> (the code is a bit complicated, and I am not sure why we indicate
> >> PAGE_IS_SWAPPED for non-swap entries, likely wrong ...)
> >
> > Yeah, I could go on here about how much I hate how uffd does a 'parallel
> > implementation' of a ton of stuff and then chucks in if (uffd) { go do
> > something weird + wonderful } but I'll resist the urge :P :))
> >
> > Do you think, if it were uffd-specific, this would be useful?
>
> If it really is completely uffd-specific for now, I agree that we should
> rather leave it alone.
>
> >
> > At any rate, I'm not sure it's _hugely_ beneficial in this form as pagemap
> > is binary in any case so you're not having to deal with overhead of parsing
> > a text file at least!
>
> My thinking was, that if you have a large VMA, with ordinary pagemap you
> have to copy 8byte per entry (and have room for that somewhere in user
> space). In theory, with the scanning feature, you can leave that ...
> scanning to the kernel and don't have to do any copying/allocate space
> for it in user space etc.

PAGEMAP_SCAN doesn't have this issue and it was one of the reasons to
implement it.

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-03-19 18:22         ` Andrei Vagin
@ 2025-03-19 19:12           ` Lorenzo Stoakes
  2025-03-19 23:57             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-03-19 19:12 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: David Hildenbrand, Andrew Morton, Jonathan Corbet, Shuah Khan,
	Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api, criu, Alexander Mikhalitsyn,
	Pavel Tikhomirov, Mike Rapoport, Greg Kroah-Hartman

+cc Greg for stable question

On Wed, Mar 19, 2025 at 11:22:40AM -0700, Andrei Vagin wrote:
> On Mon, Feb 24, 2025 at 2:39 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 24.02.25 11:18, Lorenzo Stoakes wrote:

[snip]
> > >>
> > >> Acked-by: David Hildenbrand <david@redhat.com>
> > >
> > > Thanks! :)
> > >>
> > >> Something that might be interesting is also extending the PAGEMAP_SCAN
> > >> ioctl.
> > >
> > > Yeah, funny you should mention that, I did see that, but on reading the man
> > > page it struck me that it requires the region to be uffd afaict? All the
> > > tests seem to establish uffd, and the man page implies it:
> > >
> > >         To start tracking the written state (flag) of a page or range of
> > >         memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API
> > >         ioctl(2) on userfaultfd and memory range must be registered with
> > >         UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.
> > >
> > > It would be a bit of a weird edge case to add support there. I was excited
> > > when I first saw this ioctl, then disappointed afterwards... but maybe I
> > > got it wrong?
>
> > >
> >
> > I never managed to review that fully, but I thing that
> > UFFD_FEATURE_WP_ASYNC thingy is only required for PM_SCAN_CHECK_WPASYNC
> > and PM_SCAN_WP_MATCHING.
> >
> > See pagemap_scan_test_walk().
> >
> > I do recall that it works on any VMA.
> >
> > Ah yes, tools/testing/selftests/mm/vm_util.c ends up using it for
> > pagemap_is_swapped() and friends via page_entry_is() to sanity check
> > that what pagemap gives us is consistent with what pagemap_scan gives us.
> >
> > So it should work independent of the uffd magic.
> > I might be wrong, though ...
>
>
> PAGEMAP_SCAN can work without the UFFD magic. CRIU utilizes PAGEMAP_SCAN
> as a more efficient alternative to /proc/pid/pagemap:
> https://github.com/checkpoint-restore/criu/blob/d18912fc88f3dc7bde5fdfa3575691977eb21753/criu/pagemap-cache.c#L178
>

Yeah we ascertained that - is on my list, LSF coming up next week means we
aren't great on timing here, but I'll prioritise this. When I'm back.

> For CRIU, obtaining information about guard regions is critical.
> Without this functionality in the kernel, CRIU is broken. We probably should
> consider backporting these changes to the 6.13 and 6.14 stable branches.
>

I'm not sure on precedent for backporting a feature like this - Greg? Am
happy to do it though.

As a stop gap we can backport the pagemap feature if Greg feels this is
appropriate?

[snip]

> > My thinking was, that if you have a large VMA, with ordinary pagemap you
> > have to copy 8byte per entry (and have room for that somewhere in user
> > space). In theory, with the scanning feature, you can leave that ...
> > scanning to the kernel and don't have to do any copying/allocate space
> > for it in user space etc.
>
> PAGEMAP_SCAN doesn't have this issue and it was one of the reasons to
> implement it.

Ack.

>
> Thanks,
> Andrei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/proc/task_mmu: add guard region bit to pagemap
  2025-03-19 19:12           ` Lorenzo Stoakes
@ 2025-03-19 23:57             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2025-03-19 23:57 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrei Vagin, David Hildenbrand, Andrew Morton, Jonathan Corbet,
	Shuah Khan, Suren Baghdasaryan, Kalesh Singh, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, Paul E . McKenney, Jann Horn,
	Juan Yescas, linux-mm, linux-doc, linux-kernel, linux-fsdevel,
	linux-kselftest, linux-api, criu, Alexander Mikhalitsyn,
	Pavel Tikhomirov, Mike Rapoport

On Wed, Mar 19, 2025 at 07:12:45PM +0000, Lorenzo Stoakes wrote:
> +cc Greg for stable question
> 
> On Wed, Mar 19, 2025 at 11:22:40AM -0700, Andrei Vagin wrote:
> > On Mon, Feb 24, 2025 at 2:39 AM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 24.02.25 11:18, Lorenzo Stoakes wrote:
> 
> [snip]
> > > >>
> > > >> Acked-by: David Hildenbrand <david@redhat.com>
> > > >
> > > > Thanks! :)
> > > >>
> > > >> Something that might be interesting is also extending the PAGEMAP_SCAN
> > > >> ioctl.
> > > >
> > > > Yeah, funny you should mention that, I did see that, but on reading the man
> > > > page it struck me that it requires the region to be uffd afaict? All the
> > > > tests seem to establish uffd, and the man page implies it:
> > > >
> > > >         To start tracking the written state (flag) of a page or range of
> > > >         memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API
> > > >         ioctl(2) on userfaultfd and memory range must be registered with
> > > >         UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.
> > > >
> > > > It would be a bit of a weird edge case to add support there. I was excited
> > > > when I first saw this ioctl, then disappointed afterwards... but maybe I
> > > > got it wrong?
> >
> > > >
> > >
> > > I never managed to review that fully, but I thing that
> > > UFFD_FEATURE_WP_ASYNC thingy is only required for PM_SCAN_CHECK_WPASYNC
> > > and PM_SCAN_WP_MATCHING.
> > >
> > > See pagemap_scan_test_walk().
> > >
> > > I do recall that it works on any VMA.
> > >
> > > Ah yes, tools/testing/selftests/mm/vm_util.c ends up using it for
> > > pagemap_is_swapped() and friends via page_entry_is() to sanity check
> > > that what pagemap gives us is consistent with what pagemap_scan gives us.
> > >
> > > So it should work independent of the uffd magic.
> > > I might be wrong, though ...
> >
> >
> > PAGEMAP_SCAN can work without the UFFD magic. CRIU utilizes PAGEMAP_SCAN
> > as a more efficient alternative to /proc/pid/pagemap:
> > https://github.com/checkpoint-restore/criu/blob/d18912fc88f3dc7bde5fdfa3575691977eb21753/criu/pagemap-cache.c#L178
> >
> 
> Yeah we ascertained that - is on my list, LSF coming up next week means we
> aren't great on timing here, but I'll prioritise this. When I'm back.
> 
> > For CRIU, obtaining information about guard regions is critical.
> > Without this functionality in the kernel, CRIU is broken. We probably should
> > consider backporting these changes to the 6.13 and 6.14 stable branches.
> >
> 
> I'm not sure on precedent for backporting a feature like this - Greg? Am
> happy to do it though.

If it's a regression, sure, we can take it for stable.

> As a stop gap we can backport the pagemap feature if Greg feels this is
> appropriate?

Which do the maintainers of the code feel is appropriate?  I'll defer to
them for making that call :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-03-19 23:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-21 12:05 [PATCH 0/2] fs/proc/task_mmu: add guard region bit to pagemap Lorenzo Stoakes
2025-02-21 12:05 ` [PATCH 1/2] " Lorenzo Stoakes
2025-02-21 17:10   ` Kalesh Singh
2025-02-21 17:45     ` Lorenzo Stoakes
2025-02-24  9:27   ` David Hildenbrand
2025-02-24 10:18     ` Lorenzo Stoakes
2025-02-24 10:37       ` David Hildenbrand
2025-02-24 10:49         ` Lorenzo Stoakes
2025-02-24 13:28           ` David Hildenbrand
2025-03-19 18:22         ` Andrei Vagin
2025-03-19 19:12           ` Lorenzo Stoakes
2025-03-19 23:57             ` Greg Kroah-Hartman
2025-02-21 12:05 ` [PATCH 2/2] tools/selftests: add guard region test for /proc/$pid/pagemap Lorenzo Stoakes
2025-02-21 13:51   ` Lorenzo Stoakes
2025-02-21 17:14     ` Kalesh Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).