* [PATCH v4 0/3] mm/mprotect: Fix soft-dirty checks
@ 2022-07-25 14:20 Peter Xu
2022-07-25 14:20 ` [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable() Peter Xu
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Peter Xu @ 2022-07-25 14:20 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: peterx, Nadav Amit, Andrea Arcangeli, Andrew Morton,
David Hildenbrand
v4:
- Add r-bs
- Fix spellings and commit messages (on Fixes) [David]
- In the new test case unlink() after open() to not leave temp file even failed
v2: https://lore.kernel.org/lkml/20220720220324.88538-1-peterx@redhat.com
v3: https://lore.kernel.org/lkml/20220721183338.27871-1-peterx@redhat.com
Patch 1 is the previous patch and real fix. Two more test patches added to
add mprotect test to soft-dirty.c, meanwhile add soft-dirty test into the
vm test loop.
Please review, thanks.
Peter Xu (3):
mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
selftests: soft-dirty: Add test for mprotect
selftests: Add soft-dirty into run_vmtests.sh
mm/internal.h | 18 ++++++
mm/mmap.c | 2 +-
mm/mprotect.c | 2 +-
tools/testing/selftests/vm/run_vmtests.sh | 2 +
tools/testing/selftests/vm/soft-dirty.c | 67 ++++++++++++++++++++++-
5 files changed, 88 insertions(+), 3 deletions(-)
--
2.32.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-07-25 14:20 [PATCH v4 0/3] mm/mprotect: Fix soft-dirty checks Peter Xu
@ 2022-07-25 14:20 ` Peter Xu
2022-11-18 20:16 ` Muhammad Usama Anjum
2022-07-25 14:20 ` [PATCH v4 2/3] selftests: soft-dirty: Add test for mprotect Peter Xu
2022-07-25 14:20 ` [PATCH v4 3/3] selftests: Add soft-dirty into run_vmtests.sh Peter Xu
2 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2022-07-25 14:20 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: peterx, Nadav Amit, Andrea Arcangeli, Andrew Morton,
David Hildenbrand
The check wanted to make sure when soft-dirty tracking is enabled we won't
grant write bit by accident, as a page fault is needed for dirty tracking.
The intention is correct but we didn't check it right because VM_SOFTDIRTY
set actually means soft-dirty tracking disabled. Fix it.
There's another thing tricky about soft-dirty is that, we can't check the
vma flag !(vma_flags & VM_SOFTDIRTY) directly but only check it after we
checked CONFIG_MEM_SOFT_DIRTY because otherwise VM_SOFTDIRTY will be
defined as zero, and !(vma_flags & VM_SOFTDIRTY) will constantly return
true. To avoid misuse, introduce a helper for checking whether vma has
soft-dirty tracking enabled.
We can easily verify this with any exclusive anonymous page, like program
below:
=======8<======
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdbool.h>
#define BIT_ULL(nr) (1ULL << (nr))
#define PM_SOFT_DIRTY BIT_ULL(55)
unsigned int psize;
char *page;
uint64_t pagemap_read_vaddr(int fd, void *vaddr)
{
uint64_t value;
int ret;
ret = pread(fd, &value, sizeof(uint64_t),
((uint64_t)vaddr >> 12) * sizeof(uint64_t));
assert(ret == sizeof(uint64_t));
return value;
}
void clear_refs_write(void)
{
int fd = open("/proc/self/clear_refs", O_RDWR);
assert(fd >= 0);
write(fd, "4", 2);
close(fd);
}
#define check_soft_dirty(str, expect) do { \
bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY; \
if (dirty != expect) { \
printf("ERROR: %s, soft-dirty=%d (expect: %d)\n", str, dirty, expect); \
exit(-1); \
} \
} while (0)
int main(void)
{
int fd = open("/proc/self/pagemap", O_RDONLY);
assert(fd >= 0);
psize = getpagesize();
page = mmap(NULL, psize, PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
assert(page != MAP_FAILED);
*page = 1;
check_soft_dirty("Just faulted in page", 1);
clear_refs_write();
check_soft_dirty("Clear_refs written", 0);
mprotect(page, psize, PROT_READ);
check_soft_dirty("Marked RO", 0);
mprotect(page, psize, PROT_READ|PROT_WRITE);
check_soft_dirty("Marked RW", 0);
*page = 2;
check_soft_dirty("Wrote page again", 1);
munmap(page, psize);
close(fd);
printf("Test passed.\n");
return 0;
}
=======8<======
Here we attach a Fixes to commit 64fe24a3e05e only for easy tracking, as
this patch won't apply to a tree before that point. However the commit
wasn't the source of problem, but instead 64e455079e1b. It's just that
after 64fe24a3e05e anonymous memory will also suffer from this problem with
mprotect().
Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Fixes: 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/internal.h | 18 ++++++++++++++++++
mm/mmap.c | 2 +-
mm/mprotect.c | 2 +-
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 15e8cb118832..e2d442e3c0b2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -860,4 +860,22 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
DECLARE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
+static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
+{
+ /*
+ * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
+ * enablements, because when without soft-dirty being compiled in,
+ * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
+ * will be constantly true.
+ */
+ if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+ return false;
+
+ /*
+ * Soft-dirty is kind of special: its tracking is enabled when the
+ * vma flags not set.
+ */
+ return !(vma->vm_flags & VM_SOFTDIRTY);
+}
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index 125e8903c93c..93f9913409ea 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1518,7 +1518,7 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
return 0;
/* Do we need to track softdirty? */
- if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY))
+ if (vma_soft_dirty_enabled(vma))
return 1;
/* Specialty mapping? */
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0420c3ed936c..c403e84129d4 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -49,7 +49,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
return false;
/* Do we need write faults for softdirty tracking? */
- if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
+ if (vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte))
return false;
/* Do we need write faults for uffd-wp tracking? */
--
2.32.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 2/3] selftests: soft-dirty: Add test for mprotect
2022-07-25 14:20 [PATCH v4 0/3] mm/mprotect: Fix soft-dirty checks Peter Xu
2022-07-25 14:20 ` [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable() Peter Xu
@ 2022-07-25 14:20 ` Peter Xu
2022-07-25 14:28 ` David Hildenbrand
2022-07-25 14:20 ` [PATCH v4 3/3] selftests: Add soft-dirty into run_vmtests.sh Peter Xu
2 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2022-07-25 14:20 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: peterx, Nadav Amit, Andrea Arcangeli, Andrew Morton,
David Hildenbrand
Add two soft-dirty test cases for mprotect() on both anon or file.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/vm/soft-dirty.c | 67 ++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/soft-dirty.c b/tools/testing/selftests/vm/soft-dirty.c
index 08ab62a4a9d0..e3a43f5d4fa2 100644
--- a/tools/testing/selftests/vm/soft-dirty.c
+++ b/tools/testing/selftests/vm/soft-dirty.c
@@ -121,13 +121,76 @@ static void test_hugepage(int pagemap_fd, int pagesize)
free(map);
}
+static void test_mprotect(int pagemap_fd, int pagesize, bool anon)
+{
+ const char *type[] = {"file", "anon"};
+ const char *fname = "./soft-dirty-test-file";
+ int test_fd;
+ char *map;
+
+ if (anon) {
+ map = mmap(NULL, pagesize, PROT_READ|PROT_WRITE,
+ MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ if (!map)
+ ksft_exit_fail_msg("anon mmap failed\n");
+ } else {
+ test_fd = open(fname, O_RDWR | O_CREAT);
+ if (test_fd < 0) {
+ ksft_test_result_skip("Test %s open() file failed\n", __func__);
+ return;
+ }
+ unlink(fname);
+ ftruncate(test_fd, pagesize);
+ map = mmap(NULL, pagesize, PROT_READ|PROT_WRITE,
+ MAP_SHARED, test_fd, 0);
+ if (!map)
+ ksft_exit_fail_msg("file mmap failed\n");
+ }
+
+ *map = 1;
+ ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 1,
+ "Test %s-%s dirty bit of new written page\n",
+ __func__, type[anon]);
+ clear_softdirty();
+ ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 0,
+ "Test %s-%s soft-dirty clear after clear_refs\n",
+ __func__, type[anon]);
+ mprotect(map, pagesize, PROT_READ);
+ ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 0,
+ "Test %s-%s soft-dirty clear after marking RO\n",
+ __func__, type[anon]);
+ mprotect(map, pagesize, PROT_READ|PROT_WRITE);
+ ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 0,
+ "Test %s-%s soft-dirty clear after marking RW\n",
+ __func__, type[anon]);
+ *map = 2;
+ ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 1,
+ "Test %s-%s soft-dirty after rewritten\n",
+ __func__, type[anon]);
+
+ munmap(map, pagesize);
+
+ if (!anon)
+ close(test_fd);
+}
+
+static void test_mprotect_anon(int pagemap_fd, int pagesize)
+{
+ test_mprotect(pagemap_fd, pagesize, true);
+}
+
+static void test_mprotect_file(int pagemap_fd, int pagesize)
+{
+ test_mprotect(pagemap_fd, pagesize, false);
+}
+
int main(int argc, char **argv)
{
int pagemap_fd;
int pagesize;
ksft_print_header();
- ksft_set_plan(5);
+ ksft_set_plan(15);
pagemap_fd = open(PAGEMAP_FILE_PATH, O_RDONLY);
if (pagemap_fd < 0)
@@ -138,6 +201,8 @@ int main(int argc, char **argv)
test_simple(pagemap_fd, pagesize);
test_vma_reuse(pagemap_fd, pagesize);
test_hugepage(pagemap_fd, pagesize);
+ test_mprotect_anon(pagemap_fd, pagesize);
+ test_mprotect_file(pagemap_fd, pagesize);
close(pagemap_fd);
--
2.32.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 3/3] selftests: Add soft-dirty into run_vmtests.sh
2022-07-25 14:20 [PATCH v4 0/3] mm/mprotect: Fix soft-dirty checks Peter Xu
2022-07-25 14:20 ` [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable() Peter Xu
2022-07-25 14:20 ` [PATCH v4 2/3] selftests: soft-dirty: Add test for mprotect Peter Xu
@ 2022-07-25 14:20 ` Peter Xu
2 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2022-07-25 14:20 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: peterx, Nadav Amit, Andrea Arcangeli, Andrew Morton,
David Hildenbrand
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/vm/run_vmtests.sh | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 2af563a9652e..de86983b8a0f 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -190,4 +190,6 @@ then
run_test ./protection_keys_64
fi
+run_test ./soft-dirty
+
exit $exitcode
--
2.32.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/3] selftests: soft-dirty: Add test for mprotect
2022-07-25 14:20 ` [PATCH v4 2/3] selftests: soft-dirty: Add test for mprotect Peter Xu
@ 2022-07-25 14:28 ` David Hildenbrand
0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2022-07-25 14:28 UTC (permalink / raw)
To: Peter Xu, linux-kernel, linux-mm
Cc: Nadav Amit, Andrea Arcangeli, Andrew Morton
On 25.07.22 16:20, Peter Xu wrote:
> Add two soft-dirty test cases for mprotect() on both anon or file.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> tools/testing/selftests/vm/soft-dirty.c | 67 ++++++++++++++++++++++++-
> 1 file changed, 66 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/vm/soft-dirty.c b/tools/testing/selftests/vm/soft-dirty.c
> index 08ab62a4a9d0..e3a43f5d4fa2 100644
> --- a/tools/testing/selftests/vm/soft-dirty.c
> +++ b/tools/testing/selftests/vm/soft-dirty.c
> @@ -121,13 +121,76 @@ static void test_hugepage(int pagemap_fd, int pagesize)
> free(map);
> }
>
> +static void test_mprotect(int pagemap_fd, int pagesize, bool anon)
> +{
> + const char *type[] = {"file", "anon"};
> + const char *fname = "./soft-dirty-test-file";
> + int test_fd;
> + char *map;
> +
> + if (anon) {
> + map = mmap(NULL, pagesize, PROT_READ|PROT_WRITE,
> + MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> + if (!map)
> + ksft_exit_fail_msg("anon mmap failed\n");
> + } else {
> + test_fd = open(fname, O_RDWR | O_CREAT);
> + if (test_fd < 0) {
> + ksft_test_result_skip("Test %s open() file failed\n", __func__);
> + return;
> + }
> + unlink(fname);
> + ftruncate(test_fd, pagesize);
> + map = mmap(NULL, pagesize, PROT_READ|PROT_WRITE,
> + MAP_SHARED, test_fd, 0);
> + if (!map)
> + ksft_exit_fail_msg("file mmap failed\n");
> + }
> +
> + *map = 1;
> + ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 1,
> + "Test %s-%s dirty bit of new written page\n",
> + __func__, type[anon]);
> + clear_softdirty();
> + ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 0,
> + "Test %s-%s soft-dirty clear after clear_refs\n",
> + __func__, type[anon]);
> + mprotect(map, pagesize, PROT_READ);
> + ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 0,
> + "Test %s-%s soft-dirty clear after marking RO\n",
> + __func__, type[anon]);
> + mprotect(map, pagesize, PROT_READ|PROT_WRITE);
> + ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 0,
> + "Test %s-%s soft-dirty clear after marking RW\n",
> + __func__, type[anon]);
> + *map = 2;
> + ksft_test_result(pagemap_is_softdirty(pagemap_fd, map) == 1,
> + "Test %s-%s soft-dirty after rewritten\n",
> + __func__, type[anon]);
> +
> + munmap(map, pagesize);
> +
> + if (!anon)
> + close(test_fd);
> +}
> +
> +static void test_mprotect_anon(int pagemap_fd, int pagesize)
> +{
> + test_mprotect(pagemap_fd, pagesize, true);
> +}
> +
> +static void test_mprotect_file(int pagemap_fd, int pagesize)
> +{
> + test_mprotect(pagemap_fd, pagesize, false);
> +}
> +
> int main(int argc, char **argv)
> {
> int pagemap_fd;
> int pagesize;
>
> ksft_print_header();
> - ksft_set_plan(5);
> + ksft_set_plan(15);
>
> pagemap_fd = open(PAGEMAP_FILE_PATH, O_RDONLY);
> if (pagemap_fd < 0)
> @@ -138,6 +201,8 @@ int main(int argc, char **argv)
> test_simple(pagemap_fd, pagesize);
> test_vma_reuse(pagemap_fd, pagesize);
> test_hugepage(pagemap_fd, pagesize);
> + test_mprotect_anon(pagemap_fd, pagesize);
> + test_mprotect_file(pagemap_fd, pagesize);
>
> close(pagemap_fd);
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-07-25 14:20 ` [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable() Peter Xu
@ 2022-11-18 20:16 ` Muhammad Usama Anjum
2022-11-18 23:14 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Muhammad Usama Anjum @ 2022-11-18 20:16 UTC (permalink / raw)
To: Peter Xu, David Hildenbrand
Cc: Muhammad Usama Anjum, Nadav Amit, Andrea Arcangeli, Andrew Morton,
linux-kernel, linux-mm
Hi Peter and David,
On 7/25/22 7:20 PM, Peter Xu wrote:
> The check wanted to make sure when soft-dirty tracking is enabled we won't
> grant write bit by accident, as a page fault is needed for dirty tracking.
> The intention is correct but we didn't check it right because VM_SOFTDIRTY
> set actually means soft-dirty tracking disabled. Fix it.
[...]
> +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> +{
> + /*
> + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
> + * enablements, because when without soft-dirty being compiled in,
> + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> + * will be constantly true.
> + */
> + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> + return false;
> +
> + /*
> + * Soft-dirty is kind of special: its tracking is enabled when the
> + * vma flags not set.
> + */
> + return !(vma->vm_flags & VM_SOFTDIRTY);
> +}
I'm sorry. I'm unable to understand the inversion here.
> its tracking is enabled when the vma flags not set.
VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
soft-dirty. When we write to clear_refs to clear soft-dirty bit,
VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
is enabled when the vma flags not set? I'm missing some obvious thing.
Maybe the meaning of tracking is to see if VM_SOFTDIRTY needs to be set. If
VM_SOFTDIRTY is already set, tracking isn't needed. Can you give an example
here?
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-11-18 20:16 ` Muhammad Usama Anjum
@ 2022-11-18 23:14 ` Peter Xu
2022-11-21 14:57 ` Muhammad Usama Anjum
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2022-11-18 23:14 UTC (permalink / raw)
To: Muhammad Usama Anjum
Cc: David Hildenbrand, Nadav Amit, Andrea Arcangeli, Andrew Morton,
linux-kernel, linux-mm
On Sat, Nov 19, 2022 at 01:16:26AM +0500, Muhammad Usama Anjum wrote:
> Hi Peter and David,
Hi, Muhammad,
>
> On 7/25/22 7:20 PM, Peter Xu wrote:
> > The check wanted to make sure when soft-dirty tracking is enabled we won't
> > grant write bit by accident, as a page fault is needed for dirty tracking.
> > The intention is correct but we didn't check it right because VM_SOFTDIRTY
> > set actually means soft-dirty tracking disabled. Fix it.
> [...]
> > +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> > +{
> > + /*
> > + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
> > + * enablements, because when without soft-dirty being compiled in,
> > + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> > + * will be constantly true.
> > + */
> > + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> > + return false;
> > +
> > + /*
> > + * Soft-dirty is kind of special: its tracking is enabled when the
> > + * vma flags not set.
> > + */
> > + return !(vma->vm_flags & VM_SOFTDIRTY);
> > +}
> I'm sorry. I'm unable to understand the inversion here.
> > its tracking is enabled when the vma flags not set.
> VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
> soft-dirty. When we write to clear_refs to clear soft-dirty bit,
> VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
> is enabled when the vma flags not set?
Because only when 4>clear_refs happens would VM_SOFTDIRTY be cleared, and
only until then the real tracking starts (by removing write bits on ptes).
> I'm missing some obvious thing. Maybe the meaning of tracking is to see
> if VM_SOFTDIRTY needs to be set. If VM_SOFTDIRTY is already set, tracking
> isn't needed. Can you give an example here?
If VM_SOFTDIRTY is set, pagemap will treat all pages as soft-dirty, please
see pagemap_pmd_range():
if (vma->vm_flags & VM_SOFTDIRTY)
flags |= PM_SOFT_DIRTY;
So fundamentally it reports nothing useful when VM_SOFTDIRTY set. That's
also why we need the clear_refs first before we can have anything useful.
Feel free to reference to the doc page (admin-guide/mm/soft-dirty.rst):
---8<---
The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to. In order to do this tracking one should
1. Clear soft-dirty bits from the task's PTEs.
This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
task in question.
2. Wait some time.
3. Read soft-dirty bits from the PTEs.
This is done by reading from the ``/proc/PID/pagemap``. The bit 55 of the
64-bit qword is the soft-dirty one. If set, the respective PTE was
written to since step 1.
---8<---
The tracking starts at step 1, where is when the flag is cleared.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-11-18 23:14 ` Peter Xu
@ 2022-11-21 14:57 ` Muhammad Usama Anjum
2022-11-21 21:17 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Muhammad Usama Anjum @ 2022-11-21 14:57 UTC (permalink / raw)
To: Peter Xu
Cc: Muhammad Usama Anjum, David Hildenbrand, Nadav Amit,
Andrea Arcangeli, Andrew Morton, linux-kernel, linux-mm
Hi Peter,
Thank you so much for replying.
On 11/19/22 4:14 AM, Peter Xu wrote:
> On Sat, Nov 19, 2022 at 01:16:26AM +0500, Muhammad Usama Anjum wrote:
>> Hi Peter and David,
>
> Hi, Muhammad,
>
>>
>> On 7/25/22 7:20 PM, Peter Xu wrote:
>>> The check wanted to make sure when soft-dirty tracking is enabled we won't
>>> grant write bit by accident, as a page fault is needed for dirty tracking.
>>> The intention is correct but we didn't check it right because VM_SOFTDIRTY
>>> set actually means soft-dirty tracking disabled. Fix it.
>> [...]
>>> +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
>>> +{
>>> + /*
>>> + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
>>> + * enablements, because when without soft-dirty being compiled in,
>>> + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
>>> + * will be constantly true.
>>> + */
>>> + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>> + return false;
>>> +
>>> + /*
>>> + * Soft-dirty is kind of special: its tracking is enabled when the
>>> + * vma flags not set.
>>> + */
>>> + return !(vma->vm_flags & VM_SOFTDIRTY);
>>> +}
>> I'm sorry. I'm unable to understand the inversion here.
>>> its tracking is enabled when the vma flags not set.
>> VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
>> soft-dirty. When we write to clear_refs to clear soft-dirty bit,
>> VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
>> is enabled when the vma flags not set?
>
> Because only when 4>clear_refs happens would VM_SOFTDIRTY be cleared, and
> only until then the real tracking starts (by removing write bits on ptes).
But even if the VM_SOFTDIRTY is set on the VMA, the individual pages are
still marked soft-dirty. Both are independent.
It means tracking is enabled all the time in individual pages. Only the
soft-dirty bit status in individual page isn't significant if VM_SOFTDIRTY
already is set. Right?
>
>> I'm missing some obvious thing. Maybe the meaning of tracking is to see
>> if VM_SOFTDIRTY needs to be set. If VM_SOFTDIRTY is already set, tracking
>> isn't needed. Can you give an example here?
>
> If VM_SOFTDIRTY is set, pagemap will treat all pages as soft-dirty, please
> see pagemap_pmd_range():
>
> if (vma->vm_flags & VM_SOFTDIRTY)
> flags |= PM_SOFT_DIRTY;
>
> So fundamentally it reports nothing useful when VM_SOFTDIRTY set. That's
> also why we need the clear_refs first before we can have anything useful.
>
> Feel free to reference to the doc page (admin-guide/mm/soft-dirty.rst):
>
> ---8<---
> The soft-dirty is a bit on a PTE which helps to track which pages a task
> writes to. In order to do this tracking one should
>
> 1. Clear soft-dirty bits from the task's PTEs.
>
> This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
> task in question.
>
> 2. Wait some time.
>
> 3. Read soft-dirty bits from the PTEs.
>
> This is done by reading from the ``/proc/PID/pagemap``. The bit 55 of the
> 64-bit qword is the soft-dirty one. If set, the respective PTE was
> written to since step 1.
> ---8<---
>
> The tracking starts at step 1, where is when the flag is cleared.
>
> Thanks,
>
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-11-21 14:57 ` Muhammad Usama Anjum
@ 2022-11-21 21:17 ` Peter Xu
2022-12-19 12:19 ` Muhammad Usama Anjum
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2022-11-21 21:17 UTC (permalink / raw)
To: Muhammad Usama Anjum
Cc: David Hildenbrand, Nadav Amit, Andrea Arcangeli, Andrew Morton,
linux-kernel, linux-mm
On Mon, Nov 21, 2022 at 07:57:05PM +0500, Muhammad Usama Anjum wrote:
> Hi Peter,
>
> Thank you so much for replying.
>
> On 11/19/22 4:14 AM, Peter Xu wrote:
> > On Sat, Nov 19, 2022 at 01:16:26AM +0500, Muhammad Usama Anjum wrote:
> >> Hi Peter and David,
> >
> > Hi, Muhammad,
> >
> >>
> >> On 7/25/22 7:20 PM, Peter Xu wrote:
> >>> The check wanted to make sure when soft-dirty tracking is enabled we won't
> >>> grant write bit by accident, as a page fault is needed for dirty tracking.
> >>> The intention is correct but we didn't check it right because VM_SOFTDIRTY
> >>> set actually means soft-dirty tracking disabled. Fix it.
> >> [...]
> >>> +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> >>> +{
> >>> + /*
> >>> + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
> >>> + * enablements, because when without soft-dirty being compiled in,
> >>> + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> >>> + * will be constantly true.
> >>> + */
> >>> + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> >>> + return false;
> >>> +
> >>> + /*
> >>> + * Soft-dirty is kind of special: its tracking is enabled when the
> >>> + * vma flags not set.
> >>> + */
> >>> + return !(vma->vm_flags & VM_SOFTDIRTY);
> >>> +}
> >> I'm sorry. I'm unable to understand the inversion here.
> >>> its tracking is enabled when the vma flags not set.
> >> VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
> >> soft-dirty. When we write to clear_refs to clear soft-dirty bit,
> >> VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
> >> is enabled when the vma flags not set?
> >
> > Because only when 4>clear_refs happens would VM_SOFTDIRTY be cleared, and
> > only until then the real tracking starts (by removing write bits on ptes).
> But even if the VM_SOFTDIRTY is set on the VMA, the individual pages are
> still marked soft-dirty. Both are independent.
>
> It means tracking is enabled all the time in individual pages.
IMHO it depends on how we define "tracking enabled" - before clear_refs
even if no pages written they'll also be reported as dirty, then the
information is useless.
> Only the soft-dirty bit status in individual page isn't significant if
> VM_SOFTDIRTY already is set. Right?
Yes. But I'd say it makes more sense to say "tracking enabled" if we
really started tracking (by removing the write bits in ptes, otherwise we
did nothing). When vma created we didn't track anything.
I don't know the rational of why soft-dirty was defined like that. I think
it's somehow related to the fact that we allow false positive dirty pages
not false negative. IOW, it's a bug to leak a page being dirtied, but not
vice versa if we report clean page dirty.
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-11-21 21:17 ` Peter Xu
@ 2022-12-19 12:19 ` Muhammad Usama Anjum
2022-12-20 16:03 ` Peter Xu
2022-12-28 14:14 ` Muhammad Usama Anjum
0 siblings, 2 replies; 16+ messages in thread
From: Muhammad Usama Anjum @ 2022-12-19 12:19 UTC (permalink / raw)
To: Peter Xu, David Hildenbrand, Cyrill Gorcunov, Andrew Morton,
Paul Gofman
Cc: Muhammad Usama Anjum, Nadav Amit, Andrea Arcangeli, linux-kernel,
linux-mm, kernel
On 11/22/22 2:17 AM, Peter Xu wrote:
> On Mon, Nov 21, 2022 at 07:57:05PM +0500, Muhammad Usama Anjum wrote:
>> Hi Peter,
>>
>> Thank you so much for replying.
>>
>> On 11/19/22 4:14 AM, Peter Xu wrote:
>>> On Sat, Nov 19, 2022 at 01:16:26AM +0500, Muhammad Usama Anjum wrote:
>>>> Hi Peter and David,
>>>
>>> Hi, Muhammad,
>>>
>>>>
>>>> On 7/25/22 7:20 PM, Peter Xu wrote:
>>>>> The check wanted to make sure when soft-dirty tracking is enabled we won't
>>>>> grant write bit by accident, as a page fault is needed for dirty tracking.
>>>>> The intention is correct but we didn't check it right because VM_SOFTDIRTY
>>>>> set actually means soft-dirty tracking disabled. Fix it.
>>>> [...]
>>>>> +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
>>>>> +{
>>>>> + /*
>>>>> + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
>>>>> + * enablements, because when without soft-dirty being compiled in,
>>>>> + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
>>>>> + * will be constantly true.
>>>>> + */
>>>>> + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>>>> + return false;
>>>>> +
>>>>> + /*
>>>>> + * Soft-dirty is kind of special: its tracking is enabled when the
>>>>> + * vma flags not set.
>>>>> + */
>>>>> + return !(vma->vm_flags & VM_SOFTDIRTY);
>>>>> +}
>>>> I'm sorry. I'm unable to understand the inversion here.
>>>>> its tracking is enabled when the vma flags not set.
>>>> VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
>>>> soft-dirty. When we write to clear_refs to clear soft-dirty bit,
>>>> VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
>>>> is enabled when the vma flags not set?
>>>
>>> Because only when 4>clear_refs happens would VM_SOFTDIRTY be cleared, and
>>> only until then the real tracking starts (by removing write bits on ptes).
>> But even if the VM_SOFTDIRTY is set on the VMA, the individual pages are
>> still marked soft-dirty. Both are independent.
>>
>> It means tracking is enabled all the time in individual pages.
Addition of vma_soft_dirty_enabled() has tinkered with the soft-dirty PTE
bit status setting. The internal behavior has changed. The test case was
shared by David
(https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/).
The explanation is as following:
_Before_ addition of this patch(76aefad628aae),
m = mmap(2 pages)
clear_softdirty()
mremap(m + pag_size)
mprotect(READ)
mprotect(READ | WRITE);
memset(m)
After memset(),
PAGE-1 PAGE-2
VM_SOFTDIRTY set set
PTE softdirty flag set set
/proc//pagemap view set set
_After_ addition of this patch(76aefad628aae)
m = mmap(2 pages)
clear_softdirty()
mremap(m + page_size)
mprotect(READ)
mprotect(READ | WRITE);
memset(m)
After memset(),
PAGE-1 PAGE-2
VM_SOFTDIRTY set set
PTE softdirty flag *not set* set
/proc//pagemap view set set
The user's point of view hasn't changed. But internally after this patch,
the soft-dirty tracking in PTEs gets turn off if VM_SOFTDIRTY is set. The
soft-dirty tracking in the PTEs shouldn't be just turned off when mprotect
is used. Why? Because soft-dirty tracking in the PTEs is always enabled
regardless of VM_SOFTDIRTY is set or not. Example:
m = mem(2 pages)
At this point:
PAGE-1 PAGE-2
VM_SOFTDIRTY set set
PTE softdirty flag not set not set
/proc//pagemap view set set
memset(m)
At this point:
PAGE-1 PAGE-2
VM_SOFTDIRTY set set
PTE softdirty flag set set
/proc//pagemap view set set
This example proves that soft-dirty flag on the PTE is set regardless of
the VM_SOFTDIRTY.
The simplest hack to get rid this changed behavior and revert to the
previous behaviour is as following:
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -860,6 +860,8 @@ static inline bool vma_soft_dirty_enabled(struct
vm_area_struct *vma)
if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
return false;
+ return true;
+
/*
* Soft-dirty is kind of special: its tracking is enabled when the
* vma flags not set.
I was trying to verify this hack. But I couldn't previously until @Paul has
mentioned this again. I've verified with limited tests that this hack
in-deed works. We are unaware that does this hack create problems in other
areas or not. We can think of better way to solve this. Once we get the
comments from the community.
This internal behavior change is affecting the new feature addition to the
soft-dirty flag which is already delicate and has issues.
(https://lore.kernel.org/all/20221109102303.851281-1-usama.anjum@collabora.com/)
>
> IMHO it depends on how we define "tracking enabled" - before clear_refs
> even if no pages written they'll also be reported as dirty, then the
> information is useless.
>
>> Only the soft-dirty bit status in individual page isn't significant if
>> VM_SOFTDIRTY already is set. Right?
>
> Yes. But I'd say it makes more sense to say "tracking enabled" if we
> really started tracking (by removing the write bits in ptes, otherwise we
> did nothing). When vma created we didn't track anything.
>
> I don't know the rational of why soft-dirty was defined like that. I think
> it's somehow related to the fact that we allow false positive dirty pages
> not false negative. IOW, it's a bug to leak a page being dirtied, but not
> vice versa if we report clean page dirty.
>
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-12-19 12:19 ` Muhammad Usama Anjum
@ 2022-12-20 16:03 ` Peter Xu
2022-12-20 18:15 ` Muhammad Usama Anjum
2022-12-28 14:14 ` Muhammad Usama Anjum
1 sibling, 1 reply; 16+ messages in thread
From: Peter Xu @ 2022-12-20 16:03 UTC (permalink / raw)
To: Muhammad Usama Anjum
Cc: David Hildenbrand, Cyrill Gorcunov, Andrew Morton, Paul Gofman,
Nadav Amit, Andrea Arcangeli, linux-kernel, linux-mm, kernel
On Mon, Dec 19, 2022 at 05:19:12PM +0500, Muhammad Usama Anjum wrote:
> On 11/22/22 2:17 AM, Peter Xu wrote:
> > On Mon, Nov 21, 2022 at 07:57:05PM +0500, Muhammad Usama Anjum wrote:
> >> Hi Peter,
> >>
> >> Thank you so much for replying.
> >>
> >> On 11/19/22 4:14 AM, Peter Xu wrote:
> >>> On Sat, Nov 19, 2022 at 01:16:26AM +0500, Muhammad Usama Anjum wrote:
> >>>> Hi Peter and David,
> >>>
> >>> Hi, Muhammad,
> >>>
> >>>>
> >>>> On 7/25/22 7:20 PM, Peter Xu wrote:
> >>>>> The check wanted to make sure when soft-dirty tracking is enabled we won't
> >>>>> grant write bit by accident, as a page fault is needed for dirty tracking.
> >>>>> The intention is correct but we didn't check it right because VM_SOFTDIRTY
> >>>>> set actually means soft-dirty tracking disabled. Fix it.
> >>>> [...]
> >>>>> +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> >>>>> +{
> >>>>> + /*
> >>>>> + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
> >>>>> + * enablements, because when without soft-dirty being compiled in,
> >>>>> + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> >>>>> + * will be constantly true.
> >>>>> + */
> >>>>> + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> >>>>> + return false;
> >>>>> +
> >>>>> + /*
> >>>>> + * Soft-dirty is kind of special: its tracking is enabled when the
> >>>>> + * vma flags not set.
> >>>>> + */
> >>>>> + return !(vma->vm_flags & VM_SOFTDIRTY);
> >>>>> +}
> >>>> I'm sorry. I'm unable to understand the inversion here.
> >>>>> its tracking is enabled when the vma flags not set.
> >>>> VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
> >>>> soft-dirty. When we write to clear_refs to clear soft-dirty bit,
> >>>> VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
> >>>> is enabled when the vma flags not set?
> >>>
> >>> Because only when 4>clear_refs happens would VM_SOFTDIRTY be cleared, and
> >>> only until then the real tracking starts (by removing write bits on ptes).
> >> But even if the VM_SOFTDIRTY is set on the VMA, the individual pages are
> >> still marked soft-dirty. Both are independent.
> >>
> >> It means tracking is enabled all the time in individual pages.
> Addition of vma_soft_dirty_enabled() has tinkered with the soft-dirty PTE
> bit status setting. The internal behavior has changed. The test case was
> shared by David
> (https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/).
> The explanation is as following:
>
> _Before_ addition of this patch(76aefad628aae),
> m = mmap(2 pages)
> clear_softdirty()
> mremap(m + pag_size)
> mprotect(READ)
> mprotect(READ | WRITE);
> memset(m)
> After memset(),
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag set set
> /proc//pagemap view set set
>
>
> _After_ addition of this patch(76aefad628aae)
> m = mmap(2 pages)
> clear_softdirty()
> mremap(m + page_size)
> mprotect(READ)
> mprotect(READ | WRITE);
> memset(m)
> After memset(),
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag *not set* set
> /proc//pagemap view set set
>
> The user's point of view hasn't changed. But internally after this patch,
> the soft-dirty tracking in PTEs gets turn off if VM_SOFTDIRTY is set. The
> soft-dirty tracking in the PTEs shouldn't be just turned off when mprotect
> is used. Why? Because soft-dirty tracking in the PTEs is always enabled
> regardless of VM_SOFTDIRTY is set or not. Example:
>
> m = mem(2 pages)
> At this point:
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag not set not set
> /proc//pagemap view set set
> memset(m)
> At this point:
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag set set
> /proc//pagemap view set set
>
> This example proves that soft-dirty flag on the PTE is set regardless of
> the VM_SOFTDIRTY.
IMHO this is not a proof good enough - it's a kernel internal detail, and
the userspace cannot detect it, right? Then it looks fine to not keep the
same behavior on the ptes I think. After all currently the soft-dirty is
designed as "taking either VM_SOFTDIRTY of pte soft-dirty as input of being
dirty". Nothing violates that.
Your approach introduced PAGEMAP_NO_REUSED_REGIONS but that special
information is not remembered in vma, IIUC that's why you find things
messed up. Fundamentally, it's because you're trying to reuse soft-dirty
design but it's not completely soft-dirty anymore.
That's also why I mentioned the other async uffd-wp approach because with
that there's no fiddling with vma flags (since it'll be always set as
pre-requisite), and this specific problem shouldn't exist either because
uffd-wp was originally designed to be pte-based as I mentioned, so we can't
grant write if pte is not checked.
Your below change will resolve your problem for now, but it's definitely
not wanted because it has a much broader impact on the whole system, for
example, on vma_wants_writenotify(). We may still have some paths using
default vm_page_prot (especially for file memories, not for the generic PF
path but some others) that will start to lose write bits where we used to
have them set. That's bad for performance because resolving each of them
needs one more page fault after the change as it mostly invalidated the
write bit in vm_page_prot.
You can also introduce yet another flag in the vma so you can detect which
vma has NEW soft-dirty enabled (your new approach) rather than the OLD
(which still relies on vma flags besides ptes) but that'll really be ugly
and making soft-dirty code unnecessarily complicated.
>
> The simplest hack to get rid this changed behavior and revert to the
> previous behaviour is as following:
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -860,6 +860,8 @@ static inline bool vma_soft_dirty_enabled(struct
> vm_area_struct *vma)
> if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> return false;
>
> + return true;
> +
> /*
> * Soft-dirty is kind of special: its tracking is enabled when the
> * vma flags not set.
> I was trying to verify this hack. But I couldn't previously until @Paul has
> mentioned this again. I've verified with limited tests that this hack
> in-deed works. We are unaware that does this hack create problems in other
> areas or not. We can think of better way to solve this. Once we get the
> comments from the community.
>
> This internal behavior change is affecting the new feature addition to the
> soft-dirty flag which is already delicate and has issues.
> (https://lore.kernel.org/all/20221109102303.851281-1-usama.anjum@collabora.com/)
>
> >
> > IMHO it depends on how we define "tracking enabled" - before clear_refs
> > even if no pages written they'll also be reported as dirty, then the
> > information is useless.
> >
> >> Only the soft-dirty bit status in individual page isn't significant if
> >> VM_SOFTDIRTY already is set. Right?
> >
> > Yes. But I'd say it makes more sense to say "tracking enabled" if we
> > really started tracking (by removing the write bits in ptes, otherwise we
> > did nothing). When vma created we didn't track anything.
> >
> > I don't know the rational of why soft-dirty was defined like that. I think
> > it's somehow related to the fact that we allow false positive dirty pages
> > not false negative. IOW, it's a bug to leak a page being dirtied, but not
> > vice versa if we report clean page dirty.
> >
>
> --
> BR,
> Muhammad Usama Anjum
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-12-20 16:03 ` Peter Xu
@ 2022-12-20 18:15 ` Muhammad Usama Anjum
2022-12-20 19:02 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Muhammad Usama Anjum @ 2022-12-20 18:15 UTC (permalink / raw)
To: Peter Xu
Cc: Muhammad Usama Anjum, David Hildenbrand, Cyrill Gorcunov,
Andrew Morton, Paul Gofman, Nadav Amit, Andrea Arcangeli,
linux-kernel, linux-mm, kernel
Hi Peter,
Thank you for replying.
On 12/20/22 9:03 PM, Peter Xu wrote:
> On Mon, Dec 19, 2022 at 05:19:12PM +0500, Muhammad Usama Anjum wrote:
>> On 11/22/22 2:17 AM, Peter Xu wrote:
>>> On Mon, Nov 21, 2022 at 07:57:05PM +0500, Muhammad Usama Anjum wrote:
>>>> Hi Peter,
>>>>
>>>> Thank you so much for replying.
>>>>
>>>> On 11/19/22 4:14 AM, Peter Xu wrote:
>>>>> On Sat, Nov 19, 2022 at 01:16:26AM +0500, Muhammad Usama Anjum wrote:
>>>>>> Hi Peter and David,
>>>>>
>>>>> Hi, Muhammad,
>>>>>
>>>>>>
>>>>>> On 7/25/22 7:20 PM, Peter Xu wrote:
>>>>>>> The check wanted to make sure when soft-dirty tracking is enabled we won't
>>>>>>> grant write bit by accident, as a page fault is needed for dirty tracking.
>>>>>>> The intention is correct but we didn't check it right because VM_SOFTDIRTY
>>>>>>> set actually means soft-dirty tracking disabled. Fix it.
>>>>>> [...]
>>>>>>> +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
>>>>>>> +{
>>>>>>> + /*
>>>>>>> + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty
>>>>>>> + * enablements, because when without soft-dirty being compiled in,
>>>>>>> + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
>>>>>>> + * will be constantly true.
>>>>>>> + */
>>>>>>> + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>>>>>> + return false;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Soft-dirty is kind of special: its tracking is enabled when the
>>>>>>> + * vma flags not set.
>>>>>>> + */
>>>>>>> + return !(vma->vm_flags & VM_SOFTDIRTY);
>>>>>>> +}
>>>>>> I'm sorry. I'm unable to understand the inversion here.
>>>>>>> its tracking is enabled when the vma flags not set.
>>>>>> VM_SOFTDIRTY is set on the VMA when new VMA is allocated to mark is
>>>>>> soft-dirty. When we write to clear_refs to clear soft-dirty bit,
>>>>>> VM_SOFTDIRTY is cleared from the VMA as well. Then why do you say tracking
>>>>>> is enabled when the vma flags not set?
>>>>>
>>>>> Because only when 4>clear_refs happens would VM_SOFTDIRTY be cleared, and
>>>>> only until then the real tracking starts (by removing write bits on ptes).
>>>> But even if the VM_SOFTDIRTY is set on the VMA, the individual pages are
>>>> still marked soft-dirty. Both are independent.
>>>>
>>>> It means tracking is enabled all the time in individual pages.
>> Addition of vma_soft_dirty_enabled() has tinkered with the soft-dirty PTE
>> bit status setting. The internal behavior has changed. The test case was
>> shared by David
>> (https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/).
>> The explanation is as following:
>>
>> _Before_ addition of this patch(76aefad628aae),
>> m = mmap(2 pages)
>> clear_softdirty()
>> mremap(m + pag_size)
>> mprotect(READ)
>> mprotect(READ | WRITE);
>> memset(m)
>> After memset(),
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag set set
>> /proc//pagemap view set set
>>
>>
>> _After_ addition of this patch(76aefad628aae)
>> m = mmap(2 pages)
>> clear_softdirty()
>> mremap(m + page_size)
>> mprotect(READ)
>> mprotect(READ | WRITE);
>> memset(m)
>> After memset(),
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag *not set* set
>> /proc//pagemap view set set
>>
>> The user's point of view hasn't changed. But internally after this patch,
>> the soft-dirty tracking in PTEs gets turn off if VM_SOFTDIRTY is set. The
>> soft-dirty tracking in the PTEs shouldn't be just turned off when mprotect
>> is used. Why? Because soft-dirty tracking in the PTEs is always enabled
>> regardless of VM_SOFTDIRTY is set or not. Example:
>>
>> m = mem(2 pages)
>> At this point:
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag not set not set
>> /proc//pagemap view set set
>> memset(m)
>> At this point:
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag set set
>> /proc//pagemap view set set
>>
>> This example proves that soft-dirty flag on the PTE is set regardless of
>> the VM_SOFTDIRTY.
>
> IMHO this is not a proof good enough - it's a kernel internal detail, and
> the userspace cannot detect it, right? Then it looks fine to not keep the
> same behavior on the ptes I think. After all currently the soft-dirty is
> designed as "taking either VM_SOFTDIRTY of pte soft-dirty as input of being
> dirty". Nothing violates that.
Nothing has changed for the userspace. But when the default soft-dirty
feature always updates the soft-dirty flag in the PTEs regardless of
VM_SOFTDIRTY is set or not, why does other components of the mm stop caring
for soft-dirty flag in the PTE when VM_SOFTDIRTY is set?
>
> Your approach introduced PAGEMAP_NO_REUSED_REGIONS but that special
> information is not remembered in vma, IIUC that's why you find things
> messed up. Fundamentally, it's because you're trying to reuse soft-dirty
> design but it's not completely soft-dirty anymore.
Correct, that's why I'm trying to find a way to correct the soft-dirty
support instead of using anything else. We should try and correct it. I've
sent a RFC to track the soft-dirty flags for sub regions in the VMA.
>
> That's also why I mentioned the other async uffd-wp approach because with
> that there's no fiddling with vma flags (since it'll be always set as
> pre-requisite), and this specific problem shouldn't exist either because
> uffd-wp was originally designed to be pte-based as I mentioned, so we can't
> grant write if pte is not checked.
>
> Your below change will resolve your problem for now, but it's definitely
> not wanted because it has a much broader impact on the whole system, for
> example, on vma_wants_writenotify(). We may still have some paths using
> default vm_page_prot (especially for file memories, not for the generic PF
> path but some others) that will start to lose write bits where we used to
> have them set. That's bad for performance because resolving each of them
> needs one more page fault after the change as it mostly invalidated the
> write bit in vm_page_prot.
>
> You can also introduce yet another flag in the vma so you can detect which
> vma has NEW soft-dirty enabled (your new approach) rather than the OLD
> (which still relies on vma flags besides ptes) but that'll really be ugly
> and making soft-dirty code unnecessarily complicated.
>
>>
>> The simplest hack to get rid this changed behavior and revert to the
>> previous behaviour is as following:
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -860,6 +860,8 @@ static inline bool vma_soft_dirty_enabled(struct
>> vm_area_struct *vma)
>> if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>> return false;
>>
>> + return true;
>> +
>> /*
>> * Soft-dirty is kind of special: its tracking is enabled when the
>> * vma flags not set.
>> I was trying to verify this hack. But I couldn't previously until @Paul has
>> mentioned this again. I've verified with limited tests that this hack
>> in-deed works. We are unaware that does this hack create problems in other
>> areas or not. We can think of better way to solve this. Once we get the
>> comments from the community.
>>
>> This internal behavior change is affecting the new feature addition to the
>> soft-dirty flag which is already delicate and has issues.
>> (https://lore.kernel.org/all/20221109102303.851281-1-usama.anjum@collabora.com/)
>>
>>>
>>> IMHO it depends on how we define "tracking enabled" - before clear_refs
>>> even if no pages written they'll also be reported as dirty, then the
>>> information is useless.
>>>
>>>> Only the soft-dirty bit status in individual page isn't significant if
>>>> VM_SOFTDIRTY already is set. Right?
>>>
>>> Yes. But I'd say it makes more sense to say "tracking enabled" if we
>>> really started tracking (by removing the write bits in ptes, otherwise we
>>> did nothing). When vma created we didn't track anything.
>>>
>>> I don't know the rational of why soft-dirty was defined like that. I think
>>> it's somehow related to the fact that we allow false positive dirty pages
>>> not false negative. IOW, it's a bug to leak a page being dirtied, but not
>>> vice versa if we report clean page dirty.
>>>
>>
>> --
>> BR,
>> Muhammad Usama Anjum
>>
>
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-12-20 18:15 ` Muhammad Usama Anjum
@ 2022-12-20 19:02 ` Peter Xu
2022-12-21 8:17 ` Muhammad Usama Anjum
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2022-12-20 19:02 UTC (permalink / raw)
To: Muhammad Usama Anjum
Cc: David Hildenbrand, Cyrill Gorcunov, Andrew Morton, Paul Gofman,
Nadav Amit, Andrea Arcangeli, linux-kernel, linux-mm, kernel
On Tue, Dec 20, 2022 at 11:15:00PM +0500, Muhammad Usama Anjum wrote:
> Hi Peter,
Hi, Muhammad,
[...]
> Nothing has changed for the userspace. But when the default soft-dirty
> feature always updates the soft-dirty flag in the PTEs regardless of
> VM_SOFTDIRTY is set or not
But it was not? I don't see why the pte flags must be considered at all if
VM_SOFTDIRTY is set in existing code base, or before this patch.
> why does other components of the mm stop caring for soft-dirty flag in
> the PTE when VM_SOFTDIRTY is set?
>
> >
> > Your approach introduced PAGEMAP_NO_REUSED_REGIONS but that special
> > information is not remembered in vma, IIUC that's why you find things
> > messed up. Fundamentally, it's because you're trying to reuse soft-dirty
> > design but it's not completely soft-dirty anymore.
> Correct, that's why I'm trying to find a way to correct the soft-dirty
> support instead of using anything else. We should try and correct it. I've
> sent a RFC to track the soft-dirty flags for sub regions in the VMA.
Note that I'm not against the change if that's servicing the purpose of the
enhancement you're proposing. But again I wouldn't use "correct" as the
word here because I still didn't see anything wrong with the old code.
so IMHO the extra complexity on handling VM_SOFTDIRTY (even if to drop it
and replace with other structures to maintain ranged soft-dirty) needs to
be justified along with the feature introduced, not be justified as a fix.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-12-20 19:02 ` Peter Xu
@ 2022-12-21 8:17 ` Muhammad Usama Anjum
0 siblings, 0 replies; 16+ messages in thread
From: Muhammad Usama Anjum @ 2022-12-21 8:17 UTC (permalink / raw)
To: Peter Xu
Cc: Muhammad Usama Anjum, David Hildenbrand, Cyrill Gorcunov,
Andrew Morton, Paul Gofman, Nadav Amit, Andrea Arcangeli,
linux-kernel, linux-mm, kernel
On 12/21/22 12:02 AM, Peter Xu wrote:
> On Tue, Dec 20, 2022 at 11:15:00PM +0500, Muhammad Usama Anjum wrote:
>> Hi Peter,
>
> Hi, Muhammad,
>
> [...]
>
>> Nothing has changed for the userspace. But when the default soft-dirty
>> feature always updates the soft-dirty flag in the PTEs regardless of
>> VM_SOFTDIRTY is set or not
>
> But it was not? I don't see why the pte flags must be considered at all if
> VM_SOFTDIRTY is set in existing code base, or before this patch.
I completely agree that setting pte flags isn't needed if VM_SOFTDIRTY is
set. It is wasted effort. Before this patch, the effort was being wasted in
the default part of the feature and in the other related components as
well. After this patch, the effort is being wasted only in the default part
of the feature and other related components have stopped doing this wasted
effort which is a good thing. But now there is discrepancy that one part of
code keeps on tracking pte while other has moved on/improved.
>
>> why does other components of the mm stop caring for soft-dirty flag in
>> the PTE when VM_SOFTDIRTY is set?
>>
>>>
>>> Your approach introduced PAGEMAP_NO_REUSED_REGIONS but that special
>>> information is not remembered in vma, IIUC that's why you find things
>>> messed up. Fundamentally, it's because you're trying to reuse soft-dirty
>>> design but it's not completely soft-dirty anymore.
>> Correct, that's why I'm trying to find a way to correct the soft-dirty
>> support instead of using anything else. We should try and correct it. I've
>> sent a RFC to track the soft-dirty flags for sub regions in the VMA.
>
> Note that I'm not against the change if that's servicing the purpose of the
> enhancement you're proposing. But again I wouldn't use "correct" as the
> word here because I still didn't see anything wrong with the old code.
>
> so IMHO the extra complexity on handling VM_SOFTDIRTY (even if to drop it
> and replace with other structures to maintain ranged soft-dirty) needs to
> be justified along with the feature introduced, not be justified as a fix.
The question of tracking re-used regions should be answered by the original
developers of the feature. I've been trying to get comments from them. But
I've not got any. Maybe some conference talk can work where the
maintainers/developers are present? Or I'll summarize the whole problem and
ask Andrew directly.
>
> Thanks,
>
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-12-19 12:19 ` Muhammad Usama Anjum
2022-12-20 16:03 ` Peter Xu
@ 2022-12-28 14:14 ` Muhammad Usama Anjum
2023-01-02 12:29 ` David Hildenbrand
1 sibling, 1 reply; 16+ messages in thread
From: Muhammad Usama Anjum @ 2022-12-28 14:14 UTC (permalink / raw)
To: Cyrill Gorcunov, Andrew Morton
Cc: Muhammad Usama Anjum, Nadav Amit, Andrea Arcangeli, linux-kernel,
linux-mm, kernel, Peter Xu, David Hildenbrand, Paul Gofman
On 12/19/22 5:19 PM, Muhammad Usama Anjum wrote:
> Addition of vma_soft_dirty_enabled() has tinkered with the soft-dirty PTE
> bit status setting. The internal behavior has changed. The test case was
> shared by David
> (https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/).
> The explanation is as following:
>
> _Before_ addition of this patch(76aefad628aae),
> m = mmap(2 pages)
> clear_softdirty()
> mremap(m + pag_size)
> mprotect(READ)
> mprotect(READ | WRITE);
> memset(m)
> After memset(),
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag set set
> /proc//pagemap view set set
>
>
> _After_ addition of this patch(76aefad628aae)
> m = mmap(2 pages)
> clear_softdirty()
> mremap(m + page_size)
> mprotect(READ)
> mprotect(READ | WRITE);
> memset(m)
> After memset(),
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag *not set* set
> /proc//pagemap view set set
>
> The user's point of view hasn't changed. But internally after this patch,
> the soft-dirty tracking in PTEs gets turn off if VM_SOFTDIRTY is set. The
> soft-dirty tracking in the PTEs shouldn't be just turned off when mprotect
> is used. Why? Because soft-dirty tracking in the PTEs is always enabled
> regardless of VM_SOFTDIRTY is set or not. Example:
>
> m = mem(2 pages)
> At this point:
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag not set not set
> /proc//pagemap view set set
> memset(m)
> At this point:
> PAGE-1 PAGE-2
> VM_SOFTDIRTY set set
> PTE softdirty flag set set
> /proc//pagemap view set set
>
> This example proves that soft-dirty flag on the PTE is set regardless of
> the VM_SOFTDIRTY.
Hi Andrew and Cyrill,
Peter doesn't agree with me here that this change in behavior should be
reverted etc. Please comment.
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()
2022-12-28 14:14 ` Muhammad Usama Anjum
@ 2023-01-02 12:29 ` David Hildenbrand
0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2023-01-02 12:29 UTC (permalink / raw)
To: Muhammad Usama Anjum, Cyrill Gorcunov, Andrew Morton
Cc: Nadav Amit, Andrea Arcangeli, linux-kernel, linux-mm, kernel,
Peter Xu, Paul Gofman
On 28.12.22 15:14, Muhammad Usama Anjum wrote:
> On 12/19/22 5:19 PM, Muhammad Usama Anjum wrote:
>> Addition of vma_soft_dirty_enabled() has tinkered with the soft-dirty PTE
>> bit status setting. The internal behavior has changed. The test case was
>> shared by David
>> (https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/).
>> The explanation is as following:
>>
>> _Before_ addition of this patch(76aefad628aae),
>> m = mmap(2 pages)
>> clear_softdirty()
>> mremap(m + pag_size)
>> mprotect(READ)
>> mprotect(READ | WRITE);
>> memset(m)
>> After memset(),
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag set set
>> /proc//pagemap view set set
>>
>>
>> _After_ addition of this patch(76aefad628aae)
>> m = mmap(2 pages)
>> clear_softdirty()
>> mremap(m + page_size)
>> mprotect(READ)
>> mprotect(READ | WRITE);
>> memset(m)
>> After memset(),
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag *not set* set
>> /proc//pagemap view set set
>>
>> The user's point of view hasn't changed. But internally after this patch,
>> the soft-dirty tracking in PTEs gets turn off if VM_SOFTDIRTY is set. The
>> soft-dirty tracking in the PTEs shouldn't be just turned off when mprotect
>> is used. Why? Because soft-dirty tracking in the PTEs is always enabled
>> regardless of VM_SOFTDIRTY is set or not. Example:
>>
>> m = mem(2 pages)
>> At this point:
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag not set not set
>> /proc//pagemap view set set
>> memset(m)
>> At this point:
>> PAGE-1 PAGE-2
>> VM_SOFTDIRTY set set
>> PTE softdirty flag set set
>> /proc//pagemap view set set
>>
>> This example proves that soft-dirty flag on the PTE is set regardless of
>> the VM_SOFTDIRTY.
>
> Hi Andrew and Cyrill,
>
> Peter doesn't agree with me here that this change in behavior should be
> reverted etc. Please comment.
For the records, I agree with Peter: As 76aefad628aa ("mm/mprotect: fix
soft-dirty check in can_change_pte_writable()") documents, this patch
fixed real problems.
/proc/pagemap works as expected right now such that we don't have an
under-indication. Internal representation is an implementation detail.
Whatever we do, there must not be an under-indication of softdirty. That
is the ABI guaranteed (especially for anonymous memory). "No
over-indication" was never the ABI guarantee.
For your use case, you want to reduce over-indication. I suggested
looked into alternatives.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-01-02 12:29 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-25 14:20 [PATCH v4 0/3] mm/mprotect: Fix soft-dirty checks Peter Xu
2022-07-25 14:20 ` [PATCH v4 1/3] mm/mprotect: Fix soft-dirty check in can_change_pte_writable() Peter Xu
2022-11-18 20:16 ` Muhammad Usama Anjum
2022-11-18 23:14 ` Peter Xu
2022-11-21 14:57 ` Muhammad Usama Anjum
2022-11-21 21:17 ` Peter Xu
2022-12-19 12:19 ` Muhammad Usama Anjum
2022-12-20 16:03 ` Peter Xu
2022-12-20 18:15 ` Muhammad Usama Anjum
2022-12-20 19:02 ` Peter Xu
2022-12-21 8:17 ` Muhammad Usama Anjum
2022-12-28 14:14 ` Muhammad Usama Anjum
2023-01-02 12:29 ` David Hildenbrand
2022-07-25 14:20 ` [PATCH v4 2/3] selftests: soft-dirty: Add test for mprotect Peter Xu
2022-07-25 14:28 ` David Hildenbrand
2022-07-25 14:20 ` [PATCH v4 3/3] selftests: Add soft-dirty into run_vmtests.sh Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).