linux-mm.kvack.org archive mirror
* [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests
@ 2025-06-16 16:06 Aboorva Devarajan
  2025-06-16 16:06 ` [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues Aboorva Devarajan
                   ` (6 more replies)
  0 siblings, 7 replies; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

This patch series fixes some of the false positives in generic
mm selftests and skips tests that cannot run correctly due to
missing features or system limitations.

Please let us know if you have any feedback.

Thanks,  
Aboorva

Aboorva Devarajan (2):
  selftests/mm: Fix child process exit codes in KSM tests
  selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no
    1G pages

Donet Tom (4):
  mm/selftests: Fix virtual_address_range test issues.
  selftest/mm: Fix ksm_funtional_test failures
  selftests/mm : fix test_prctl_fork_exec failure
  mm/selftests: Fix split_huge_page_test failure on systems with 64KB
    page size

 .../selftests/mm/ksm_functional_tests.c       | 24 +++++++++++++------
 .../selftests/mm/split_huge_page_test.c       | 23 ++++++++++++++----
 tools/testing/selftests/mm/thuge-gen.c        | 11 +++++----
 .../selftests/mm/virtual_address_range.c      | 14 +++--------
 4 files changed, 45 insertions(+), 27 deletions(-)

-- 
2.43.5



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
@ 2025-06-16 16:06 ` Aboorva Devarajan
  2025-06-16 16:27   ` Dev Jain
  2025-06-16 16:06 ` [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures Aboorva Devarajan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

From: Donet Tom <donettom@linux.ibm.com>

This patch fixes three issues in the virtual_address_range test.

1. validate_addr() checks whether the allocated address is within the
expected range. In the current implementation, the test fails if addr
is greater than HIGH_ADDR_MARK. However, addr is expected to be greater
than HIGH_ADDR_MARK when high_addr is set. Therefore, the
(addr > HIGH_ADDR_MARK) condition should be checked only when high_addr
is not set.

2. In main(), the high address is stored in hptr, but mark_range() is
called with ptr instead of hptr. Fix this by passing hptr[i] instead of
ptr[i] to mark_range().

3. /proc/self/maps may not always have gaps smaller than MAP_CHUNK_SIZE.
For example, the gap between the first high address mapping and the
previous mapping is not smaller than MAP_CHUNK_SIZE:

$cat /proc/3713/maps
10000000-10010000 r-xp 00000000 fd:00 36140094
10010000-10020000 r--p 00000000 fd:00 36140094
10020000-10030000 rw-p 00010000 fd:00 36140094
4ee80000-4eeb0000 rw-p 00000000 00:00 0
578f0000-57c00000 rw-p 00000000 00:00 0
57c00000-7fff97c00000 r--p 00000000 00:00 0
7fff97c00000-7fff97e20000 r-xp 00000000 fd:00 33558923
7fff97e20000-7fff97e30000 r--p 00220000 fd:00 33558923
7fff97e30000-7fff97e40000 rw-p 00230000 fd:00 33558923
7fff97f40000-7fff98020000 r-xp 00000000 fd:00 33558924
7fff98020000-7fff98030000 r--p 000d0000 fd:00 33558924
7fff98030000-7fff98040000 rw-p 000e0000 fd:00 33558924
7fff98050000-7fff98090000 r--p 00000000 00:00 0
7fff98090000-7fff980a0000 r-xp 00000000 00:00 0
7fff980a0000-7fff980f0000 r-xp 00000000 fd:00 2634
7fff980f0000-7fff98100000 r--p 00040000 fd:00 2634
7fff98100000-7fff98110000 rw-p 00050000 fd:00 2634
7fffcf8a0000-7fffcf9b0000 rw-p 00000000 00:00 0
1000000000000-1000040000000 r--p 00000000 00:00 0   --> High Addr
2000000000000-2000040000000 r--p 00000000 00:00 0
4000000000000-4000040000000 r--p 00000000 00:00 0
8000000000000-8000040000000 r--p 00000000 00:00 0
e800098110000-fffff98110000 r--p 00000000 00:00 0
$

This patch removes the check that fails the test when a gap in
/proc/self/maps is MAP_CHUNK_SIZE or larger.

Fixes: d1d86ce28d0f ("selftests/mm: virtual_address_range: conform to TAP format output")
Fixes: b2a79f62133a ("selftests/mm: virtual_address_range: unmap chunks after validation")
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 tools/testing/selftests/mm/virtual_address_range.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index b380e102b22f..606e601a8984 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -80,7 +80,7 @@ static void validate_addr(char *ptr, int high_addr)
 	if (high_addr && addr < HIGH_ADDR_MARK)
 		ksft_exit_fail_msg("Bad address %lx\n", addr);
 
-	if (addr > HIGH_ADDR_MARK)
+	if (!high_addr && addr > HIGH_ADDR_MARK)
 		ksft_exit_fail_msg("Bad address %lx\n", addr);
 }
 
@@ -117,7 +117,7 @@ static int validate_lower_address_hint(void)
 
 static int validate_complete_va_space(void)
 {
-	unsigned long start_addr, end_addr, prev_end_addr;
+	unsigned long start_addr, end_addr;
 	char line[400];
 	char prot[6];
 	FILE *file;
@@ -134,7 +134,6 @@ static int validate_complete_va_space(void)
 	if (file == NULL)
 		ksft_exit_fail_msg("cannot open /proc/self/maps\n");
 
-	prev_end_addr = 0;
 	while (fgets(line, sizeof(line), file)) {
 		const char *vma_name = NULL;
 		int vma_name_start = 0;
@@ -151,12 +150,6 @@ static int validate_complete_va_space(void)
 		if (start_addr & (1UL << 63))
 			return 0;
 
-		/* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
-		if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
-			return 1;
-
-		prev_end_addr = end_addr;
-
 		if (prot[0] != 'r')
 			continue;
 
@@ -223,8 +216,7 @@ int main(int argc, char *argv[])
 
 		if (hptr[i] == MAP_FAILED)
 			break;
-
-		mark_range(ptr[i], MAP_CHUNK_SIZE);
+		mark_range(hptr[i], MAP_CHUNK_SIZE);
 		validate_addr(hptr[i], 1);
 	}
 	hchunks = i;
-- 
2.43.5
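
The corrected validate_addr() condition from item 1 can be illustrated in isolation. This is a minimal sketch, not the selftest code: addr_is_valid() is a hypothetical helper that returns a boolean instead of calling ksft_exit_fail_msg(), and the HIGH_ADDR_MARK value of 128TB is an assumption (the real test selects it per architecture).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 128TB boundary; the selftest defines this per architecture. */
#define HIGH_ADDR_MARK (1ULL << 47)

/*
 * Hypothetical stand-in for validate_addr(): returns false where the
 * selftest would bail out with "Bad address".
 */
static bool addr_is_valid(unsigned long long addr, bool high_addr)
{
	/* A high-address mapping must not land below the mark. */
	if (high_addr && addr < HIGH_ADDR_MARK)
		return false;

	/* Only a low-address mapping is rejected for being above the mark. */
	if (!high_addr && addr > HIGH_ADDR_MARK)
		return false;

	return true;
}
```

With the original unconditional (addr > HIGH_ADDR_MARK) check, every successful high-address mapping, such as the one at 0x1000000000000 in the maps output above, would be reported as a bad address.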




* [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
  2025-06-16 16:06 ` [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues Aboorva Devarajan
@ 2025-06-16 16:06 ` Aboorva Devarajan
  2025-06-16 17:04   ` Liam R. Howlett
  2025-06-16 16:06 ` [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure Aboorva Devarajan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

From: Donet Tom <donettom@linux.ibm.com>

This patch fixes two issues.

1) After fork() in test_prctl_fork, the child process uses the file
descriptors inherited from the parent process to read ksm_stat and
ksm_merging_pages. As a result, the child reads the parent's ksm_stat
and ksm_merging_pages values instead of its own, causing the test to
fail.

This patch calls init_global_file_handles() in the child process to
ensure that the current process's file descriptors are used to read
ksm_stat and ksm_merging_pages.

2) All tests currently call ksm_merge to trigger page merging.
To ensure the system remains in a consistent state for subsequent
tests, it is better to call ksm_unmerge during the test cleanup phase.

In the test_prctl_fork test, after a fork(), reading ksm_merging_pages
in the child process returns a non-zero value because a previous test
performed a merge, and the child's memory state is inherited from the
parent.

Although the child process calls ksm_unmerge, the ksm_merging_pages
counter in the parent is reset to zero, while the child's counter
remains unchanged. This discrepancy causes the test to fail.

To avoid this issue, each test should call ksm_unmerge during cleanup
to ensure the counter is reset and the system is in a clean state for
subsequent tests.

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 tools/testing/selftests/mm/ksm_functional_tests.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index b61803e36d1c..d7d3c22c077a 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -46,6 +46,8 @@ static int ksm_use_zero_pages_fd;
 static int pagemap_fd;
 static size_t pagesize;
 
+static void init_global_file_handles(void);
+
 static bool range_maps_duplicates(char *addr, unsigned long size)
 {
 	unsigned long offs_a, offs_b, pfn_a, pfn_b;
@@ -274,6 +276,7 @@ static void test_unmerge(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 
@@ -338,6 +341,7 @@ static void test_unmerge_zero_pages(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			"KSM zero pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 
@@ -366,6 +370,7 @@ static void test_unmerge_discarded(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 
@@ -428,6 +433,7 @@ static void test_unmerge_uffd_wp(void)
 close_uffd:
 	close(uffd);
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 #endif
@@ -491,6 +497,7 @@ static int test_child_ksm(void)
 	else if (map == MAP_MERGE_SKIP)
 		return -3;
 
+	ksm_unmerge();
 	munmap(map, size);
 	return 0;
 }
@@ -524,6 +531,7 @@ static void test_prctl_fork(void)
 
 	child_pid = fork();
 	if (!child_pid) {
+		init_global_file_handles();
 		exit(test_child_ksm());
 	} else if (child_pid < 0) {
 		ksft_test_result_fail("fork() failed\n");
@@ -620,6 +628,7 @@ static void test_prctl_unmerge(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 
@@ -653,6 +662,7 @@ static void test_prot_none(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 
-- 
2.43.5
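
The first issue follows from how /proc/self is resolved: the path is bound to the opening process at open() time, so a descriptor inherited across fork() still refers to the parent's entry. The sketch below is only an illustration under assumptions: it uses /proc/self/stat (whose first field is the PID) as a stand-in for ksm_stat, and pid_from_stat_fd() and demo_inherited_proc_fd() are hypothetical names, not part of the selftest.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the first field (the PID) of a /proc/<pid>/stat file through fd. */
static long pid_from_stat_fd(int fd)
{
	char buf[64];
	ssize_t n = pread(fd, buf, sizeof(buf) - 1, 0);

	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return strtol(buf, NULL, 10);
}

/*
 * Returns 0 if the child saw the parent's PID through the inherited
 * descriptor and its own PID through a freshly opened one, 1 if not,
 * and -1 on setup errors.
 */
static int demo_inherited_proc_fd(void)
{
	pid_t parent = getpid();
	pid_t child;
	int status;
	int fd = open("/proc/self/stat", O_RDONLY);

	if (fd < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(fd);
		return -1;
	}
	if (child == 0) {
		/* Re-open in the child, analogous to re-initialising handles. */
		int fresh = open("/proc/self/stat", O_RDONLY);
		int ok = fresh >= 0 &&
			 pid_from_stat_fd(fd) == (long)parent &&
			 pid_from_stat_fd(fresh) == (long)getpid();

		_exit(ok ? 0 : 1);
	}

	close(fd);
	if (waitpid(child, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Re-opening the files in the child, as the patch does by calling init_global_file_handles() after fork(), corresponds to the "fresh" descriptor in this sketch.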




* [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
  2025-06-16 16:06 ` [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues Aboorva Devarajan
  2025-06-16 16:06 ` [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures Aboorva Devarajan
@ 2025-06-16 16:06 ` Aboorva Devarajan
  2025-06-16 16:28   ` Dev Jain
  2025-06-16 16:06 ` [PATCH 4/6] mm/selftests: Fix split_huge_page_test failure on systems with 64KB page size Aboorva Devarajan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

From: Donet Tom <donettom@linux.ibm.com>

The argv argument of execv() is an array of pointers to null-terminated
strings, and the array itself must be terminated by a NULL pointer.
Add the missing NULL terminator to the argv array passed to execv() to
fix the test failure.

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 tools/testing/selftests/mm/ksm_functional_tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index d7d3c22c077a..6ea50272a0ba 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -579,7 +579,7 @@ static void test_prctl_fork_exec(void)
 		return;
 	} else if (child_pid == 0) {
 		char *prg_name = "./ksm_functional_tests";
-		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME };
+		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME, NULL };
 
 		execv(prg_name, argv_for_program);
 		return;
-- 
2.43.5
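
For reference, a minimal sketch of a fork() plus execv() call with a correctly terminated argument vector. The program (/bin/sh) and its arguments are assumptions chosen so the example is self-contained, and run_sh_exit() is a hypothetical helper, not selftest code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that execs "/bin/sh -c 'exit 7'" and return the child's
 * exit status, or -1 on error.
 */
static int run_sh_exit(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		char *prog = "/bin/sh";
		/* The trailing NULL marks the end of the argument vector. */
		char *argv_for_program[] = { prog, "-c", "exit 7", NULL };

		execv(prog, argv_for_program);
		_exit(127);	/* Only reached if execv() itself failed. */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the terminator, execv() keeps reading past the end of the array, so whether the call works depends on whatever happens to follow the array in memory.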




* [PATCH 4/6] mm/selftests: Fix split_huge_page_test failure on systems with 64KB page size
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
                   ` (2 preceding siblings ...)
  2025-06-16 16:06 ` [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure Aboorva Devarajan
@ 2025-06-16 16:06 ` Aboorva Devarajan
  2025-06-16 16:06 ` [PATCH 5/6] selftests/mm: Fix child process exit codes in KSM tests Aboorva Devarajan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

From: Donet Tom <donettom@linux.ibm.com>

The split_huge_page_test fails on systems with a 64KB base page size.
This is because the order of a 2MB huge page is different:

On 64KB systems, the order is 5.

On 4KB systems, it's 9.

The test currently assumes a maximum huge page order of 9, which is only
valid for 4KB base page systems. On systems with 64KB pages, attempting
to split huge pages beyond their actual order (5) causes the test to fail.

In this patch, we calculate the huge page order based on the system's base
page size. With this change, the tests now run successfully on both 64KB
and 4KB page size systems.

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 .../selftests/mm/split_huge_page_test.c       | 23 +++++++++++++++----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index aa7400ed0e99..16f3e5b9ce6d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -514,6 +514,15 @@ void split_thp_in_pagecache_to_order_at(size_t fd_size, const char *fs_loc,
 	}
 }
 
+static unsigned int get_order(unsigned int pages)
+{
+	unsigned int order = 0;
+
+	while ((1U << order) < pages)
+		order++;
+	return order;
+}
+
 int main(int argc, char **argv)
 {
 	int i;
@@ -523,6 +532,7 @@ int main(int argc, char **argv)
 	const char *fs_loc;
 	bool created_tmp;
 	int offset;
+	unsigned int max_order;
 
 	ksft_print_header();
 
@@ -534,11 +544,14 @@ int main(int argc, char **argv)
 	if (argc > 1)
 		optional_xfs_path = argv[1];
 
-	ksft_set_plan(1+8+1+9+9+8*4+2);
 
 	pagesize = getpagesize();
 	pageshift = ffs(pagesize) - 1;
 	pmd_pagesize = read_pmd_pagesize();
+	max_order = get_order(pmd_pagesize/pagesize);
+
+	ksft_set_plan(1+(max_order-1)+1+max_order+max_order+(max_order-1)*4+2);
+
 	if (!pmd_pagesize)
 		ksft_exit_fail_msg("Reading PMD pagesize failed\n");
 
@@ -546,20 +559,20 @@ int main(int argc, char **argv)
 
 	split_pmd_zero_pages();
 
-	for (i = 0; i < 9; i++)
+	for (i = 0; i < max_order; i++)
 		if (i != 1)
 			split_pmd_thp_to_order(i);
 
 	split_pte_mapped_thp();
-	for (i = 0; i < 9; i++)
+	for (i = 0; i < max_order; i++)
 		split_file_backed_thp(i);
 
 	created_tmp = prepare_thp_fs(optional_xfs_path, fs_loc_template,
 			&fs_loc);
-	for (i = 8; i >= 0; i--)
+	for (i = (max_order-1); i >= 0; i--)
 		split_thp_in_pagecache_to_order_at(fd_size, fs_loc, i, -1);
 
-	for (i = 0; i < 9; i++)
+	for (i = 0; i < max_order; i++)
 		for (offset = 0;
 		     offset < pmd_pagesize / pagesize;
 		     offset += MAX(pmd_pagesize / pagesize / 4, 1 << i))
-- 
2.43.5
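
The arithmetic in the commit message can be checked with a small sketch. order_for_pages() mirrors the loop of the get_order() helper added by the patch, with an extra shift guard that is not in the patch; planned_tests() is a hypothetical helper that reproduces the new ksft_set_plan() expression. The 2MB PMD size is the value assumed in the commit message.

```c
#include <assert.h>

/* Smallest order such that (1 << order) >= pages. */
static unsigned int order_for_pages(unsigned long pages)
{
	unsigned int order = 0;

	/* The "order < 31" guard avoids an out-of-range shift. */
	while (order < 31 && (1UL << order) < pages)
		order++;
	return order;
}

/* Test count as computed by the patched ksft_set_plan() call. */
static unsigned int planned_tests(unsigned int max_order)
{
	return 1 + (max_order - 1) + 1 + max_order + max_order +
	       (max_order - 1) * 4 + 2;
}
```

For a 2MB PMD, a 4KB base page gives 512 pages (order 9, 62 planned tests, matching the old hard-coded plan), while a 64KB base page gives 32 pages (order 5, 34 planned tests).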




* [PATCH 5/6] selftests/mm: Fix child process exit codes in KSM tests
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
                   ` (3 preceding siblings ...)
  2025-06-16 16:06 ` [PATCH 4/6] mm/selftests: Fix split_huge_page_test failure on systems with 64KB page size Aboorva Devarajan
@ 2025-06-16 16:06 ` Aboorva Devarajan
  2025-06-16 16:06 ` [PATCH 6/6] selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no 1G pages Aboorva Devarajan
  2025-06-16 16:11 ` [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Lorenzo Stoakes
  6 siblings, 0 replies; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

In the KSM functional tests, test_child_ksm() returned negative values
to indicate errors. However, only the low 8 bits of the value passed to
exit() are reported to the parent, so these values showed up as large
unsigned values (e.g., -2 became 254), leading to incorrect handling in
the parent process. As a result, some tests appeared to be skipped or
failed silently.

This patch changes test_child_ksm() to return positive error codes
(1, 2, 3) and updates test_child_ksm_err() to interpret them correctly.
This ensures the parent accurately detects and reports child process
failures.

Before patch:

- [RUN] test_unmerge
ok 1 Pages were unmerged
...
- [RUN] test_prctl_fork
- No pages got merged
- [RUN] test_prctl_fork_exec
ok 7 PR_SET_MEMORY_MERGE value is inherited
...
Bail out! 1 out of 8 tests failed
- Planned tests != run tests (9 != 8)
- Totals: pass:7 fail:1 xfail:0 xpass:0 skip:0 error:0

After patch:

- [RUN] test_unmerge
ok 1 Pages were unmerged
...
- [RUN] test_prctl_fork
- No pages got merged
not ok 7 Merge in child failed
- [RUN] test_prctl_fork_exec
ok 8 PR_SET_MEMORY_MERGE value is inherited
...
Bail out! 2 out of 9 tests failed
- Totals: pass:7 fail:2 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 tools/testing/selftests/mm/ksm_functional_tests.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index 6ea50272a0ba..230c21c72f3e 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -488,14 +488,14 @@ static int test_child_ksm(void)
 
 	/* Test if KSM is enabled for the process. */
 	if (prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0) != 1)
-		return -1;
+		return 1;
 
 	/* Test if merge could really happen. */
 	map = __mmap_and_merge_range(0xcf, size, PROT_READ | PROT_WRITE, KSM_MERGE_NONE);
 	if (map == MAP_MERGE_FAIL)
-		return -2;
+		return 2;
 	else if (map == MAP_MERGE_SKIP)
-		return -3;
+		return 3;
 
 	ksm_unmerge();
 	munmap(map, size);
@@ -504,11 +504,11 @@ static int test_child_ksm(void)
 
 static void test_child_ksm_err(int status)
 {
-	if (status == -1)
+	if (status == 1)
 		ksft_test_result_fail("unexpected PR_GET_MEMORY_MERGE result in child\n");
-	else if (status == -2)
+	else if (status == 2)
 		ksft_test_result_fail("Merge in child failed\n");
-	else if (status == -3)
+	else if (status == 3)
 		ksft_test_result_skip("Merge in child skipped\n");
 }
 
-- 
2.43.5
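
The truncation of negative exit codes is easy to reproduce outside the selftest. A minimal sketch, where child_exit_status() is a hypothetical helper and not part of ksm_functional_tests.c:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with the given code and return the status the
 * parent observes via WEXITSTATUS(), or -1 on error.
 */
static int child_exit_status(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* Only the low 8 bits reach the parent. */

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here child_exit_status(-2) yields 254, so a parent comparing the status against -2 never matches, whereas a positive code such as 2 is reported unchanged.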




* [PATCH 6/6] selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no 1G pages
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
                   ` (4 preceding siblings ...)
  2025-06-16 16:06 ` [PATCH 5/6] selftests/mm: Fix child process exit codes in KSM tests Aboorva Devarajan
@ 2025-06-16 16:06 ` Aboorva Devarajan
  2025-06-16 16:11 ` [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Lorenzo Stoakes
  6 siblings, 0 replies; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-16 16:06 UTC (permalink / raw)
  To: akpm, Liam.Howlett, lorenzo.stoakes, shuah, pfalcato, david, ziy,
	baolin.wang, npache, ryan.roberts, dev.jain, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list,
	aboorvad

Make thuge-gen skip instead of fail when it can't run due to system
settings. If shmmax is too small or no 1G huge pages are available,
the test now prints a warning and is marked as skipped.

Before Patch:
-------------------
~ running ./thuge-gen
-------------------
~ TAP version 13
~ Bail out! Please do echo 262144 > /proc/sys/kernel/shmmax
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
~ [FAIL]
not ok 28 thuge-gen ~ exit=1

After Patch:
-------------------
~ running ./thuge-gen
-------------------
~ TAP version 13
~ ~ WARNING: shmmax is too small to run this test.
~ ~ Please run the following command to increase shmmax:
~ ~ echo 262144 > /proc/sys/kernel/shmmax
~ 1..0 ~ SKIP Test skipped due to insufficient shmmax value.
~ [SKIP]
ok 29 thuge-gen ~ SKIP

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 tools/testing/selftests/mm/thuge-gen.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c
index 95b6f043a3cb..cc302a29d485 100644
--- a/tools/testing/selftests/mm/thuge-gen.c
+++ b/tools/testing/selftests/mm/thuge-gen.c
@@ -195,13 +195,16 @@ void find_pagesizes(void)
 	}
 	globfree(&g);
 
-	if (thuge_read_sysfs(0, "/proc/sys/kernel/shmmax") < NUM_PAGES * largest)
-		ksft_exit_fail_msg("Please do echo %lu > /proc/sys/kernel/shmmax",
-				   largest * NUM_PAGES);
+	if (thuge_read_sysfs(0, "/proc/sys/kernel/shmmax") < NUM_PAGES * largest) {
+		ksft_print_msg("WARNING: shmmax is too small to run this test.\n");
+		ksft_print_msg("Please run the following command to increase shmmax:\n");
+		ksft_print_msg("echo %lu > /proc/sys/kernel/shmmax\n", largest * NUM_PAGES);
+		ksft_exit_skip("Test skipped due to insufficient shmmax value.\n");
+	}
 
 #if defined(__x86_64__)
 	if (largest != 1U<<30) {
-		ksft_exit_fail_msg("No GB pages available on x86-64\n"
+		ksft_exit_skip("No GB pages available on x86-64\n"
 				   "Please boot with hugepagesz=1G hugepages=%d\n", NUM_PAGES);
 	}
 #endif
-- 
2.43.5
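
The change in behaviour amounts to picking a different exit code for an unmet precondition. A minimal sketch under assumptions: precheck_exit_code() and the enum names are hypothetical, and the numeric values follow the kselftest convention (0 for pass, 1 for fail, 4 for skip).

```c
#include <assert.h>

/* Exit codes following the kselftest convention. */
enum {
	SKETCH_EXIT_PASS = 0,
	SKETCH_EXIT_FAIL = 1,
	SKETCH_EXIT_SKIP = 4,
};

/*
 * Decide how the precondition check should end the test: an shmmax
 * that is too small is an environment limitation, not a test failure,
 * so it maps to skip.
 */
static int precheck_exit_code(unsigned long shmmax, unsigned long largest,
			      unsigned long num_pages)
{
	if (shmmax < num_pages * largest)
		return SKETCH_EXIT_SKIP;
	return SKETCH_EXIT_PASS;
}
```

A harness that reads these exit codes then reports the run as skipped rather than failed, as in the "After Patch" output above.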




* Re: [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests
  2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
                   ` (5 preceding siblings ...)
  2025-06-16 16:06 ` [PATCH 6/6] selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no 1G pages Aboorva Devarajan
@ 2025-06-16 16:11 ` Lorenzo Stoakes
  2025-06-17  7:53   ` Aboorva Devarajan
  6 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-16 16:11 UTC (permalink / raw)
  To: Aboorva Devarajan
  Cc: akpm, Liam.Howlett, shuah, pfalcato, david, ziy, baolin.wang,
	npache, ryan.roberts, dev.jain, baohua, linux-mm, linux-kselftest,
	linux-kernel, donettom, ritesh.list

Hi Aboorva,

It's (highly!) forgivable as things have changed fairly recently, but the
tippy tip of mm development is now done in the mm-new branch, against which
this series doesn't apply.

I thought this might be the case, as there is a series I implemented there
that changes the split huge page tests (possibly affecting more), hence me
checking!

So you'll need to rebase against mm-new, unfortunately!

Thanks, Lorenzo

On Mon, Jun 16, 2025 at 09:36:26PM +0530, Aboorva Devarajan wrote:
> This patch series fixes some of the false positives in generic
> mm selftests and skips tests that cannot run correctly due to
> missing features or system limitations.
>
> Please let us know if you have any feedback.
>
> Thanks,
> Aboorva
>
> Aboorva Devarajan (2):
>   selftests/mm: Fix child process exit codes in KSM tests
>   selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no
>     1G pages
>
> Donet Tom (4):
>   mm/selftests: Fix virtual_address_range test issues.
>   selftest/mm: Fix ksm_funtional_test failures
>   selftests/mm : fix test_prctl_fork_exec failure
>   mm/selftests: Fix split_huge_page_test failure on systems with 64KB
>     page size
>
>  .../selftests/mm/ksm_functional_tests.c       | 24 +++++++++++++------
>  .../selftests/mm/split_huge_page_test.c       | 23 ++++++++++++++----
>  tools/testing/selftests/mm/thuge-gen.c        | 11 +++++----
>  .../selftests/mm/virtual_address_range.c      | 14 +++--------
>  4 files changed, 45 insertions(+), 27 deletions(-)
>
> --
> 2.43.5
>



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-16 16:06 ` [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues Aboorva Devarajan
@ 2025-06-16 16:27   ` Dev Jain
  2025-06-18 10:06     ` Donet Tom
  2025-06-18 11:22     ` Lorenzo Stoakes
  0 siblings, 2 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-16 16:27 UTC (permalink / raw)
  To: Aboorva Devarajan, akpm, Liam.Howlett, lorenzo.stoakes, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list


On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
> From: Donet Tom <donettom@linux.ibm.com>
>
> In this patch, we are fixing three issues in the virtual_address_range
> test.
>
> 1. validate_addr() checks if the allocated address is within the range.
> In the current implementation, if addr is greater than HIGH_ADDR_MARK,
> the test fails. However, addr will be greater than HIGH_ADDR_MARK if
> high_addr is set. Therefore, if high_addr is set, we should not check
> the (addr > HIGH_ADDR_MARK) condition.
>
> 2.In main(), the high address is stored in hptr, but for mark_range(),
> the address passed is ptr, not hptr. Fixed this by changing ptr[i] to
> hptr[i] in mark_range() function call.
>
> 3./proc/self/maps may not always have gaps smaller than MAP_CHUNK_SIZE.
> The gap between the first high address mapping and the previous mapping
> is not smaller than MAP_CHUNK_SIZE.

For this, can't we just elide the check when we cross the high boundary?
As I see it you are essentially nullifying the purpose of validate_complete_va_space;
I had written that function so as to have an alternate way of checking VA exhaustion
without relying on mmap correctness in a circular way.

Btw @Lorenzo, validate_complete_va_space was written by me as my first patch ever for
the Linux kernel : ) from the limited knowledge I have of VMA stuff, I guess the
only requirement for VMA alignment is PAGE_SIZE in this test, therefore, the only
check required is that the gap between two VMAs should be at least MAP_CHUNK_SIZE?
Or can such a gap still exist even when the VA has been exhausted?
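
One possible reading of the suggestion to elide the check at the high boundary is sketched below, purely as an illustration and not as a proposed patch: the gap check is applied only to mappings that start below the high mark. low_gaps_too_large() is a hypothetical function, MAP_CHUNK_SIZE is assumed to be 1GB (the size of the high mappings in the quoted maps output), and HIGH_ADDR_MARK is assumed to be 128TB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAP_CHUNK_SIZE (1ULL << 30)	/* Assumption: 1GB chunks. */
#define HIGH_ADDR_MARK (1ULL << 47)	/* Assumption: 128TB boundary. */

/*
 * Scan /proc/<pid>/maps-style lines. Returns 1 if a gap of at least
 * MAP_CHUNK_SIZE exists below HIGH_ADDR_MARK, 0 if not, and -1 if a
 * line cannot be parsed.
 */
static int low_gaps_too_large(const char *const *lines, size_t n)
{
	unsigned long long start, end, prev_end = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sscanf(lines[i], "%llx-%llx", &start, &end) != 2)
			return -1;

		/* Stop checking gaps once the high boundary is crossed. */
		if (start >= HIGH_ADDR_MARK)
			break;

		if (start - prev_end >= MAP_CHUNK_SIZE)
			return 1;

		prev_end = end;
	}
	return 0;
}
```

Whether this preserves the intent of validate_complete_va_space() depends on how the high-address mappings are expected to be laid out, which is the open question raised above.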

>
> $cat /proc/3713/maps
> 10000000-10010000 r-xp 00000000 fd:00 36140094
> 10010000-10020000 r--p 00000000 fd:00 36140094
> 10020000-10030000 rw-p 00010000 fd:00 36140094
> 4ee80000-4eeb0000 rw-p 00000000 00:00 0
> 578f0000-57c00000 rw-p 00000000 00:00 0
> 57c00000-7fff97c00000 r--p 00000000 00:00 0
> 7fff97c00000-7fff97e20000 r-xp 00000000 fd:00 33558923
> 7fff97e20000-7fff97e30000 r--p 00220000 fd:00 33558923
> 7fff97e30000-7fff97e40000 rw-p 00230000 fd:00 33558923
> 7fff97f40000-7fff98020000 r-xp 00000000 fd:00 33558924
> 7fff98020000-7fff98030000 r--p 000d0000 fd:00 33558924
> 7fff98030000-7fff98040000 rw-p 000e0000 fd:00 33558924
> 7fff98050000-7fff98090000 r--p 00000000 00:00 0
> 7fff98090000-7fff980a0000 r-xp 00000000 00:00 0
> 7fff980a0000-7fff980f0000 r-xp 00000000 fd:00 2634
> 7fff980f0000-7fff98100000 r--p 00040000 fd:00 2634
> 7fff98100000-7fff98110000 rw-p 00050000 fd:00 2634
> 7fffcf8a0000-7fffcf9b0000 rw-p 00000000 00:00 0
> 1000000000000-1000040000000 r--p 00000000 00:00 0   --> High Addr
> 2000000000000-2000040000000 r--p 00000000 00:00 0
> 4000000000000-4000040000000 r--p 00000000 00:00 0
> 8000000000000-8000040000000 r--p 00000000 00:00 0
> e800098110000-fffff98110000 r--p 00000000 00:00 0
> $
>
> In this patch, the condition that checks for gaps smaller than MAP_CHUNK_SIZE has been removed.
>
> Fixes: d1d86ce28d0f ("selftests/mm: virtual_address_range: conform to TAP format output")
> Fixes: b2a79f62133a ("selftests/mm: virtual_address_range: unmap chunks after validation")
> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>   tools/testing/selftests/mm/virtual_address_range.c | 14 +++-----------
>   1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index b380e102b22f..606e601a8984 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -80,7 +80,7 @@ static void validate_addr(char *ptr, int high_addr)
>   	if (high_addr && addr < HIGH_ADDR_MARK)
>   		ksft_exit_fail_msg("Bad address %lx\n", addr);
>   
> -	if (addr > HIGH_ADDR_MARK)
> +	if (!high_addr && addr > HIGH_ADDR_MARK)
>   		ksft_exit_fail_msg("Bad address %lx\n", addr);
>   }
>   
> @@ -117,7 +117,7 @@ static int validate_lower_address_hint(void)
>   
>   static int validate_complete_va_space(void)
>   {
> -	unsigned long start_addr, end_addr, prev_end_addr;
> +	unsigned long start_addr, end_addr;
>   	char line[400];
>   	char prot[6];
>   	FILE *file;
> @@ -134,7 +134,6 @@ static int validate_complete_va_space(void)
>   	if (file == NULL)
>   		ksft_exit_fail_msg("cannot open /proc/self/maps\n");
>   
> -	prev_end_addr = 0;
>   	while (fgets(line, sizeof(line), file)) {
>   		const char *vma_name = NULL;
>   		int vma_name_start = 0;
> @@ -151,12 +150,6 @@ static int validate_complete_va_space(void)
>   		if (start_addr & (1UL << 63))
>   			return 0;
>   
> -		/* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
> -		if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
> -			return 1;
> -
> -		prev_end_addr = end_addr;
> -
>   		if (prot[0] != 'r')
>   			continue;
>   
> @@ -223,8 +216,7 @@ int main(int argc, char *argv[])
>   
>   		if (hptr[i] == MAP_FAILED)
>   			break;
> -
> -		mark_range(ptr[i], MAP_CHUNK_SIZE);
> +		mark_range(hptr[i], MAP_CHUNK_SIZE);
>   		validate_addr(hptr[i], 1);
>   	}
>   	hchunks = i;



* Re: [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure
  2025-06-16 16:06 ` [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure Aboorva Devarajan
@ 2025-06-16 16:28   ` Dev Jain
  2025-06-17 15:04     ` donettom
  0 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-16 16:28 UTC (permalink / raw)
  To: Aboorva Devarajan, akpm, Liam.Howlett, lorenzo.stoakes, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua
  Cc: linux-mm, linux-kselftest, linux-kernel, donettom, ritesh.list


On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
> From: Donet Tom <donettom@linux.ibm.com>
>
> execv argument is an array of pointers to null-terminated strings.
> In this patch we added NULL in the execv argument to fix the test
> failure.

Just a comment: how did this test suddenly start failing now? Also, is a
Fixes tag required? Clearly I am missing something.

> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>   tools/testing/selftests/mm/ksm_functional_tests.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
> index d7d3c22c077a..6ea50272a0ba 100644
> --- a/tools/testing/selftests/mm/ksm_functional_tests.c
> +++ b/tools/testing/selftests/mm/ksm_functional_tests.c
> @@ -579,7 +579,7 @@ static void test_prctl_fork_exec(void)
>   		return;
>   	} else if (child_pid == 0) {
>   		char *prg_name = "./ksm_functional_tests";
> -		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME };
> +		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME, NULL };
>   
>   		execv(prg_name, argv_for_program);
>   		return;



* Re: [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures
  2025-06-16 16:06 ` [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures Aboorva Devarajan
@ 2025-06-16 17:04   ` Liam R. Howlett
  2025-06-17 15:10     ` donettom
  0 siblings, 1 reply; 47+ messages in thread
From: Liam R. Howlett @ 2025-06-16 17:04 UTC (permalink / raw)
  To: Aboorva Devarajan
  Cc: akpm, lorenzo.stoakes, shuah, pfalcato, david, ziy, baolin.wang,
	npache, ryan.roberts, dev.jain, baohua, linux-mm, linux-kselftest,
	linux-kernel, donettom, ritesh.list

* Aboorva Devarajan <aboorvad@linux.ibm.com> [250616 12:07]:
> From: Donet Tom <donettom@linux.ibm.com>
> 
> This patch fixed 2 issues.
> 
> 1)After fork() in test_prctl_fork, the child process uses the file
> descriptors from the parent process to read ksm_stat and
> ksm_merging_pages. This results in incorrect values being read (parent
> process ksm_stat and ksm_merge_pages will be read in child), causing
> the test to fail.
> 
> This patch calls init_global_file_handles() in the child process to
> ensure that the current process's file descriptors are used to read
> ksm_stat and ksm_merging_pages.
> 
> 2) All tests currently call ksm_merge to trigger page merging.
> To ensure the system remains in a consistent state for subsequent
> tests, it is better to call ksm_unmerge during the test cleanup phase
> 
> In the test_prctl_fork test, after a fork(), reading ksm_merging_pages
> in the child process returns a non-zero value because a previous test
> performed a merge, and the child's memory state is inherited from the
> parent.
> 
> Although the child process calls ksm_unmerge, the ksm_merging_pages
> counter in the parent is reset to zero, while the child's counter
> remains unchanged. This discrepancy causes the test to fail.
> 
> To avoid this issue, each test should call ksm_unmerge during cleanup
> to ensure the counter is reset and the system is in a clean state for
> subsequent tests.
> 

Fixes: ?

> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>  tools/testing/selftests/mm/ksm_functional_tests.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
> index b61803e36d1c..d7d3c22c077a 100644
> --- a/tools/testing/selftests/mm/ksm_functional_tests.c
> +++ b/tools/testing/selftests/mm/ksm_functional_tests.c
> @@ -46,6 +46,8 @@ static int ksm_use_zero_pages_fd;
>  static int pagemap_fd;
>  static size_t pagesize;
>  
> +static void init_global_file_handles(void);
> +
>  static bool range_maps_duplicates(char *addr, unsigned long size)
>  {
>  	unsigned long offs_a, offs_b, pfn_a, pfn_b;
> @@ -274,6 +276,7 @@ static void test_unmerge(void)
>  	ksft_test_result(!range_maps_duplicates(map, size),
>  			 "Pages were unmerged\n");
>  unmap:
> +	ksm_unmerge();
>  	munmap(map, size);
>  }
>  
> @@ -338,6 +341,7 @@ static void test_unmerge_zero_pages(void)
>  	ksft_test_result(!range_maps_duplicates(map, size),
>  			"KSM zero pages were unmerged\n");
>  unmap:
> +	ksm_unmerge();
>  	munmap(map, size);
>  }
>  
> @@ -366,6 +370,7 @@ static void test_unmerge_discarded(void)
>  	ksft_test_result(!range_maps_duplicates(map, size),
>  			 "Pages were unmerged\n");
>  unmap:
> +	ksm_unmerge();
>  	munmap(map, size);
>  }
>  
> @@ -428,6 +433,7 @@ static void test_unmerge_uffd_wp(void)
>  close_uffd:
>  	close(uffd);
>  unmap:
> +	ksm_unmerge();
>  	munmap(map, size);
>  }
>  #endif
> @@ -491,6 +497,7 @@ static int test_child_ksm(void)
>  	else if (map == MAP_MERGE_SKIP)
>  		return -3;
>  
> +	ksm_unmerge();
>  	munmap(map, size);
>  	return 0;
>  }
> @@ -524,6 +531,7 @@ static void test_prctl_fork(void)
>  
>  	child_pid = fork();
>  	if (!child_pid) {
> +		init_global_file_handles();
>  		exit(test_child_ksm());
>  	} else if (child_pid < 0) {
>  		ksft_test_result_fail("fork() failed\n");
> @@ -620,6 +628,7 @@ static void test_prctl_unmerge(void)
>  	ksft_test_result(!range_maps_duplicates(map, size),
>  			 "Pages were unmerged\n");
>  unmap:
> +	ksm_unmerge();
>  	munmap(map, size);
>  }
>  
> @@ -653,6 +662,7 @@ static void test_prot_none(void)
>  	ksft_test_result(!range_maps_duplicates(map, size),
>  			 "Pages were unmerged\n");
>  unmap:
> +	ksm_unmerge();
>  	munmap(map, size);
>  }
>  
> -- 
> 2.43.5
> 
> 



* Re: [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests
  2025-06-16 16:11 ` [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Lorenzo Stoakes
@ 2025-06-17  7:53   ` Aboorva Devarajan
  0 siblings, 0 replies; 47+ messages in thread
From: Aboorva Devarajan @ 2025-06-17  7:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, Liam.Howlett, shuah, pfalcato, david, ziy, baolin.wang,
	npache, ryan.roberts, dev.jain, baohua, linux-mm, linux-kselftest,
	linux-kernel, donettom, ritesh.list

On Mon, 2025-06-16 at 17:11 +0100, Lorenzo Stoakes wrote:
> Hi Aboorva,
> 
> It's (highly!) forgivable as things have changed fairly recently, but the
> tippy tip of mm development is now done in the mm-new branch, against which
> this series doesn't apply.
> 
> I thought this might be the case, as there is a series I implemented there
> that changes the split huge page tests (possibly affecting more), hence me
> checking!
> 
> So you'll need to rebase against mm-new, unfortunately!
> 
> Thanks, Lorenzo

Hi Lorenzo,

Thanks for pointing this out.

We'll rebase the patches against mm-new and repost them as v2.

Regards,
Aboorva
> 
> On Mon, Jun 16, 2025 at 09:36:26PM +0530, Aboorva Devarajan wrote:
> > This patch series fixes some of the false positives in generic
> > mm selftests and skips tests that cannot run correctly due to
> > missing features or system limitations.
> > 
> > Please let us know if you have any feedback.
> > 
> > Thanks,
> > Aboorva
> > 
> > Aboorva Devarajan (2):
> >   selftests/mm: Fix child process exit codes in KSM tests
> >   selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no
> >     1G pages
> > 
> > Donet Tom (4):
> >   mm/selftests: Fix virtual_address_range test issues.
> >   selftest/mm: Fix ksm_funtional_test failures
> >   selftests/mm : fix test_prctl_fork_exec failure
> >   mm/selftests: Fix split_huge_page_test failure on systems with 64KB
> >     page size
> > 
> >  .../selftests/mm/ksm_functional_tests.c       | 24 +++++++++++++------
> >  .../selftests/mm/split_huge_page_test.c       | 23 ++++++++++++++----
> >  tools/testing/selftests/mm/thuge-gen.c        | 11 +++++----
> >  .../selftests/mm/virtual_address_range.c      | 14 +++--------
> >  4 files changed, 45 insertions(+), 27 deletions(-)
> > 
> > --
> > 2.43.5
> > 



* Re: [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure
  2025-06-16 16:28   ` Dev Jain
@ 2025-06-17 15:04     ` donettom
  0 siblings, 0 replies; 47+ messages in thread
From: donettom @ 2025-06-17 15:04 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, lorenzo.stoakes, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Mon, Jun 16, 2025 at 09:58:38PM +0530, Dev Jain wrote:
> 
> On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
> > From: Donet Tom <donettom@linux.ibm.com>
> > 
> > execv argument is an array of pointers to null-terminated strings.
> > In this patch we added NULL in the execv argument to fix the test
> > failure.
> 
> Just a comment, how did this test suddenly start failing now? Also is a
> fixes tag required? Clearly I am missing something.
> 

This test has been failing on my machine since the version in which
it was introduced.

I will add the Fixes tag in the next version.


Below is the test result on commit 0374af1da077 ("mm/ksm: test case for prctl fork/exec workflow"):

./ksm_functional_tests 
TAP version 13
1..9
 [RUN] test_unmerge
ok 1 Pages were unmerged
 [RUN] test_unmerge_zero_pages
ok 2 KSM zero pages were unmerged
 [RUN] test_unmerge_discarded
ok 3 Pages were unmerged
 [RUN] test_unmerge_uffd_wp
ok 4 # SKIP UFFD_FEATURE_PAGEFAULT_FLAG_WP not available
 [RUN] test_prot_none
ok 5 Pages were unmerged
 [RUN] test_prctl
ok 6 Setting/clearing PR_SET_MEMORY_MERGE works
 [RUN] test_prctl_fork
ok 7 PR_SET_MEMORY_MERGE value is inherited
 [RUN] test_prctl_fork_exec
 [RUN] test_prctl_unmerge
not ok 8 No pages got merged
Bail out! 1 out of 8 tests failed
 Planned tests != run tests (9 != 8)
 Totals: pass:6 fail:1 xfail:0 xpass:0 skip:1 error:0
not ok 8 KSM not enabled
 [RUN] test_prctl_unmerge
ok 9 Pages were unmerged
Bail out! 1 out of 9 tests failed
 Totals: pass:7 fail:1 xfail:0 xpass:0 skip:1 error:0


With the above patch, the test passes:

 [RUN] test_prctl_fork_exec
ok 8 PR_SET_MEMORY_MERGE value is inherited
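
For reference, here is a minimal sketch of the rule the fix relies on:
execv() walks the argument vector until it reaches a NULL pointer, so the
array must end with one. The helper names argv_len() and spawn_and_wait()
are hypothetical and are not part of ksm_functional_tests.c; the _exit(127)
fallback after a failed execv() is also an assumption of this sketch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count argv entries by walking until the NULL sentinel, as exec does. */
size_t argv_len(char *const argv[])
{
	size_t n = 0;

	while (argv[n] != NULL)
		n++;
	return n;
}

/*
 * Fork, exec `path` with a single argument, and return the child's exit
 * status, or -1 on error. The argv array is NULL-terminated; without the
 * terminator execv() would read past the end of the array.
 */
int spawn_and_wait(const char *path, const char *arg)
{
	char *const argv_for_program[] = { (char *)path, (char *)arg, NULL };
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execv(path, argv_for_program);
		_exit(127);	/* only reached when execv() fails */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the trailing NULL, the length seen by exec depends on whatever
happens to follow the array in memory, which would explain a failure that
shows up on some machines and not on others.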

> > Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> >   tools/testing/selftests/mm/ksm_functional_tests.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
> > index d7d3c22c077a..6ea50272a0ba 100644
> > --- a/tools/testing/selftests/mm/ksm_functional_tests.c
> > +++ b/tools/testing/selftests/mm/ksm_functional_tests.c
> > @@ -579,7 +579,7 @@ static void test_prctl_fork_exec(void)
> >   		return;
> >   	} else if (child_pid == 0) {
> >   		char *prg_name = "./ksm_functional_tests";
> > -		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME };
> > +		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME, NULL };
> >   		execv(prg_name, argv_for_program);
> >   		return;
> 



* Re: [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures
  2025-06-16 17:04   ` Liam R. Howlett
@ 2025-06-17 15:10     ` donettom
  0 siblings, 0 replies; 47+ messages in thread
From: donettom @ 2025-06-17 15:10 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: Aboorva Devarajan, akpm, lorenzo.stoakes, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Mon, Jun 16, 2025 at 01:04:40PM -0400, Liam R. Howlett wrote:
> * Aboorva Devarajan <aboorvad@linux.ibm.com> [250616 12:07]:
> > From: Donet Tom <donettom@linux.ibm.com>
> > 
> > This patch fixed 2 issues.
> > 
> > 1)After fork() in test_prctl_fork, the child process uses the file
> > descriptors from the parent process to read ksm_stat and
> > ksm_merging_pages. This results in incorrect values being read (parent
> > process ksm_stat and ksm_merge_pages will be read in child), causing
> > the test to fail.
> > 
> > This patch calls init_global_file_handles() in the child process to
> > ensure that the current process's file descriptors are used to read
> > ksm_stat and ksm_merging_pages.
> > 
> > 2) All tests currently call ksm_merge to trigger page merging.
> > To ensure the system remains in a consistent state for subsequent
> > tests, it is better to call ksm_unmerge during the test cleanup phase
> > 
> > In the test_prctl_fork test, after a fork(), reading ksm_merging_pages
> > in the child process returns a non-zero value because a previous test
> > performed a merge, and the child's memory state is inherited from the
> > parent.
> > 
> > Although the child process calls ksm_unmerge, the ksm_merging_pages
> > counter in the parent is reset to zero, while the child's counter
> > remains unchanged. This discrepancy causes the test to fail.
> > 
> > To avoid this issue, each test should call ksm_unmerge during cleanup
> > to ensure the counter is reset and the system is in a clean state for
> > subsequent tests.
> > 
> 
> Fixes: ?

Sorry, I missed it. We will add it and send a new version.
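
For readers following issue 1 in the description, here is a minimal sketch
of why an inherited descriptor reports the parent's data: /proc/self is
resolved when the file is opened, so a descriptor opened before fork() keeps
referring to the parent. The sketch uses /proc/self/stat as a stand-in,
since it exists without KSM, and the function name inherited_proc_fd_owner()
is hypothetical; it is not the test's code.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Open /proc/self/stat, fork, and let the child read the inherited fd.
 * Returns 0 if the child saw the parent's PID, 1 if it saw its own PID,
 * and -1 on any error.
 */
int inherited_proc_fd_owner(void)
{
	char buf[64];
	int status;
	int fd = open("/proc/self/stat", O_RDONLY);
	pid_t child;

	if (fd < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(fd);
		return -1;
	}
	if (child == 0) {
		ssize_t n = read(fd, buf, sizeof(buf) - 1);
		long seen;

		if (n <= 0)
			_exit(2);
		buf[n] = '\0';
		seen = strtol(buf, NULL, 10);	/* first field of stat is the PID */
		if (seen == (long)getppid())
			_exit(0);
		_exit(seen == (long)getpid() ? 1 : 2);
	}

	close(fd);
	if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

Reopening the file in the child, which is what calling
init_global_file_handles() after fork() does for the KSM files, makes the
descriptor refer to the child instead.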

> 
> > Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> >  tools/testing/selftests/mm/ksm_functional_tests.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
> > index b61803e36d1c..d7d3c22c077a 100644
> > --- a/tools/testing/selftests/mm/ksm_functional_tests.c
> > +++ b/tools/testing/selftests/mm/ksm_functional_tests.c
> > @@ -46,6 +46,8 @@ static int ksm_use_zero_pages_fd;
> >  static int pagemap_fd;
> >  static size_t pagesize;
> >  
> > +static void init_global_file_handles(void);
> > +
> >  static bool range_maps_duplicates(char *addr, unsigned long size)
> >  {
> >  	unsigned long offs_a, offs_b, pfn_a, pfn_b;
> > @@ -274,6 +276,7 @@ static void test_unmerge(void)
> >  	ksft_test_result(!range_maps_duplicates(map, size),
> >  			 "Pages were unmerged\n");
> >  unmap:
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  }
> >  
> > @@ -338,6 +341,7 @@ static void test_unmerge_zero_pages(void)
> >  	ksft_test_result(!range_maps_duplicates(map, size),
> >  			"KSM zero pages were unmerged\n");
> >  unmap:
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  }
> >  
> > @@ -366,6 +370,7 @@ static void test_unmerge_discarded(void)
> >  	ksft_test_result(!range_maps_duplicates(map, size),
> >  			 "Pages were unmerged\n");
> >  unmap:
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  }
> >  
> > @@ -428,6 +433,7 @@ static void test_unmerge_uffd_wp(void)
> >  close_uffd:
> >  	close(uffd);
> >  unmap:
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  }
> >  #endif
> > @@ -491,6 +497,7 @@ static int test_child_ksm(void)
> >  	else if (map == MAP_MERGE_SKIP)
> >  		return -3;
> >  
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  	return 0;
> >  }
> > @@ -524,6 +531,7 @@ static void test_prctl_fork(void)
> >  
> >  	child_pid = fork();
> >  	if (!child_pid) {
> > +		init_global_file_handles();
> >  		exit(test_child_ksm());
> >  	} else if (child_pid < 0) {
> >  		ksft_test_result_fail("fork() failed\n");
> > @@ -620,6 +628,7 @@ static void test_prctl_unmerge(void)
> >  	ksft_test_result(!range_maps_duplicates(map, size),
> >  			 "Pages were unmerged\n");
> >  unmap:
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  }
> >  
> > @@ -653,6 +662,7 @@ static void test_prot_none(void)
> >  	ksft_test_result(!range_maps_duplicates(map, size),
> >  			 "Pages were unmerged\n");
> >  unmap:
> > +	ksm_unmerge();
> >  	munmap(map, size);
> >  }
> >  
> > -- 
> > 2.43.5
> > 
> > 



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-16 16:27   ` Dev Jain
@ 2025-06-18 10:06     ` Donet Tom
  2025-06-18 10:35       ` Dev Jain
  2025-06-18 11:22     ` Lorenzo Stoakes
  1 sibling, 1 reply; 47+ messages in thread
From: Donet Tom @ 2025-06-18 10:06 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, lorenzo.stoakes, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Mon, Jun 16, 2025 at 09:57:10PM +0530, Dev Jain wrote:
> 

Hi Dev

> On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
> > From: Donet Tom <donettom@linux.ibm.com>
> > 
> > In this patch, we are fixing three issues in the virtual_address_range
> > test.
> > 
> > 1. validate_addr() checks if the allocated address is within the range.
> > In the current implementation, if addr is greater than HIGH_ADDR_MARK,
> > the test fails. However, addr will be greater than HIGH_ADDR_MARK if
> > high_addr is set. Therefore, if high_addr is set, we should not check
> > the (addr > HIGH_ADDR_MARK) condition.
> > 
> > 2.In main(), the high address is stored in hptr, but for mark_range(),
> > the address passed is ptr, not hptr. Fixed this by changing ptr[i] to
> > hptr[i] in mark_range() function call.
> > 
> > 3./proc/self/maps may not always have gaps smaller than MAP_CHUNK_SIZE.
> > The gap between the first high address mapping and the previous mapping
> > is not smaller than MAP_CHUNK_SIZE.
> 
> For this, can't we just elide the check when we cross the high boundary?
> As I see it you are essentially nullifying the purpose of validate_complete_va_space;
> I had written that function so as to have an alternate way of checking VA exhaustion
> without relying on mmap correctness in a circular way.
>

In this test, we first allocate 128TB of low memory and verify that
the allocated area falls within the expected range.

Next, we allocate memory at high addresses and check whether the
returned addresses are within the specified limits. To allocate
memory at a high address, we pass a hint address. This hint address
is derived using HIGH_ADDR_SHIFT, which is set to 48 — corresponding
to 256TB. So, we are requesting allocation at the 256TB address, and
the memory is successfully allocated there. Since the low address
region is allocated up to 128TB, there is a gap between the low address
VMA and the high address VMA.

Additionally, we use a random number to generate the hint address, so
the actual allocated address will vary but will always be at or above 256TB.
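
The address arithmetic above can be checked with a short sketch.
HIGH_ADDR_SHIFT and its value of 48 come from the description above; the
helper names addr_to_tib() and is_high_addr() are hypothetical, and the
sketch assumes a 64-bit unsigned long long.

```c
#include <stdbool.h>

#define HIGH_ADDR_SHIFT 48	/* value quoted in this thread */
#define TIB_SHIFT 40		/* 1 TiB == 2^40 bytes */

/* Convert a byte address to whole TiB, rounding down. */
unsigned long long addr_to_tib(unsigned long long addr)
{
	return addr >> TIB_SHIFT;
}

/* True when addr lies at or above the 256TB mark (1 << HIGH_ADDR_SHIFT). */
bool is_high_addr(unsigned long long addr)
{
	return addr >= (1ULL << HIGH_ADDR_SHIFT);
}
```

With these helpers, 1ULL << 48 maps to 256 TiB. In the maps output below,
the first high mapping starts at 0x1000000000000, exactly on that mark,
while the last low mapping ends at 0x7fffecfd0000, which is below 128 TiB.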


10000000-10010000 r-xp 00000000 fd:05 134255559                       
10010000-10020000 r--p 00000000 fd:05 134255559                         
10020000-10030000 rw-p 00010000 fd:05 134255559                       
10022b80000-10022bb0000 rw-p 00000000 00:00 0                            
7fff5c000000-7fff9c000000 r--p 00000000 00:00 0                          
7fff9cb30000-7fff9ce40000 rw-p 00000000 00:00 0 
7fff9ce40000-7fff9d070000 r-xp 00000000 fd:00 792355                   
7fff9d070000-7fff9d080000 r--p 00230000 fd:00 792355                  
7fff9d080000-7fff9d090000 rw-p 00240000 fd:00 792355                    
7fff9d090000-7fff9d170000 r-xp 00000000 fd:00 792358                    
7fff9d170000-7fff9d180000 r--p 000d0000 fd:00 792358                     
7fff9d180000-7fff9d190000 rw-p 000e0000 fd:00 792358                    
7fff9d1a0000-7fff9d1e0000 r--p 00000000 00:00 0                          
7fff9d1e0000-7fff9d1f0000 r-xp 00000000 00:00 0                      
7fff9d1f0000-7fff9d240000 r-xp 00000000 fd:00 792351                   
7fff9d240000-7fff9d250000 r--p 00040000 fd:00 792351                     
7fff9d250000-7fff9d260000 rw-p 00050000 fd:00 792351                    
7fffecfa0000-7fffecfd0000 rw-p 00000000 00:00 0                         
1000000000000-1000040000000 r--p 00000000 00:00 0           -> High address           
2000000000000-2000040000000 r--p 00000000 00:00 0                       
4000000000000-4000040000000 r--p 00000000 00:00 0                        
8000000000000-8000040000000 r--p 00000000 00:00 0                        
e80009d260000-fffff9d260000 r--p 00000000 00:00 0                       
                    

The high address we are getting is at 256TB because we are explicitly
requesting it. The gap between the high address VMA and the previous VMA
is large because the low memory allocation goes up to 128TB.

If I understand correctly, this test is verifying whether the allocated
address falls within the expected range, and validate_complete_va_space()
is validating the allocated virtual address space.

Why do we need to check whether the gap between two VMAs is within
MAP_CHUNK_SIZE? Should it validate only the allocated VMAs?

Thanks
Donet
 
> Btw @Lorenzo, validate_complete_va_space was written by me as my first patch ever for
> the Linux kernel : ) from the limited knowledge I have of VMA stuff, I guess the
> only requirement for VMA alignment is PAGE_SIZE in this test, therefore, the only
> check required is that the gap between two VMAs should be at least MAP_CHUNK_SIZE?
> Or can such a gap still exist even when the VA has been exhausted?
> 
> > 
> > $cat /proc/3713/maps
> > 10000000-10010000 r-xp 00000000 fd:00 36140094
> > 10010000-10020000 r--p 00000000 fd:00 36140094
> > 10020000-10030000 rw-p 00010000 fd:00 36140094
> > 4ee80000-4eeb0000 rw-p 00000000 00:00 0
> > 578f0000-57c00000 rw-p 00000000 00:00 0
> > 57c00000-7fff97c00000 r--p 00000000 00:00 0
> > 7fff97c00000-7fff97e20000 r-xp 00000000 fd:00 33558923
> > 7fff97e20000-7fff97e30000 r--p 00220000 fd:00 33558923
> > 7fff97e30000-7fff97e40000 rw-p 00230000 fd:00 33558923
> > 7fff97f40000-7fff98020000 r-xp 00000000 fd:00 33558924
> > 7fff98020000-7fff98030000 r--p 000d0000 fd:00 33558924
> > 7fff98030000-7fff98040000 rw-p 000e0000 fd:00 33558924
> > 7fff98050000-7fff98090000 r--p 00000000 00:00 0
> > 7fff98090000-7fff980a0000 r-xp 00000000 00:00 0
> > 7fff980a0000-7fff980f0000 r-xp 00000000 fd:00 2634
> > 7fff980f0000-7fff98100000 r--p 00040000 fd:00 2634
> > 7fff98100000-7fff98110000 rw-p 00050000 fd:00 2634
> > 7fffcf8a0000-7fffcf9b0000 rw-p 00000000 00:00 0
> > 1000000000000-1000040000000 r--p 00000000 00:00 0   --> High Addr
> > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > e800098110000-fffff98110000 r--p 00000000 00:00 0
> > $
> > 
> > In this patch, the condition that checks for gaps smaller than MAP_CHUNK_SIZE has been removed.
> > 
> > Fixes: d1d86ce28d0f ("selftests/mm: virtual_address_range: conform to TAP format output")
> > Fixes: b2a79f62133a ("selftests/mm: virtual_address_range: unmap chunks after validation")
> > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> >   tools/testing/selftests/mm/virtual_address_range.c | 14 +++-----------
> >   1 file changed, 3 insertions(+), 11 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index b380e102b22f..606e601a8984 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -80,7 +80,7 @@ static void validate_addr(char *ptr, int high_addr)
> >   	if (high_addr && addr < HIGH_ADDR_MARK)
> >   		ksft_exit_fail_msg("Bad address %lx\n", addr);
> > -	if (addr > HIGH_ADDR_MARK)
> > +	if (!high_addr && addr > HIGH_ADDR_MARK)
> >   		ksft_exit_fail_msg("Bad address %lx\n", addr);
> >   }
> > @@ -117,7 +117,7 @@ static int validate_lower_address_hint(void)
> >   static int validate_complete_va_space(void)
> >   {
> > -	unsigned long start_addr, end_addr, prev_end_addr;
> > +	unsigned long start_addr, end_addr;
> >   	char line[400];
> >   	char prot[6];
> >   	FILE *file;
> > @@ -134,7 +134,6 @@ static int validate_complete_va_space(void)
> >   	if (file == NULL)
> >   		ksft_exit_fail_msg("cannot open /proc/self/maps\n");
> > -	prev_end_addr = 0;
> >   	while (fgets(line, sizeof(line), file)) {
> >   		const char *vma_name = NULL;
> >   		int vma_name_start = 0;
> > @@ -151,12 +150,6 @@ static int validate_complete_va_space(void)
> >   		if (start_addr & (1UL << 63))
> >   			return 0;
> > -		/* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
> > -		if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
> > -			return 1;
> > -
> > -		prev_end_addr = end_addr;
> > -
> >   		if (prot[0] != 'r')
> >   			continue;
> > @@ -223,8 +216,7 @@ int main(int argc, char *argv[])
> >   		if (hptr[i] == MAP_FAILED)
> >   			break;
> > -
> > -		mark_range(ptr[i], MAP_CHUNK_SIZE);
> > +		mark_range(hptr[i], MAP_CHUNK_SIZE);
> >   		validate_addr(hptr[i], 1);
> >   	}
> >   	hchunks = i;



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 10:06     ` Donet Tom
@ 2025-06-18 10:35       ` Dev Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-18 10:35 UTC (permalink / raw)
  To: Donet Tom
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, lorenzo.stoakes, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 18/06/25 3:36 pm, Donet Tom wrote:
> On Mon, Jun 16, 2025 at 09:57:10PM +0530, Dev Jain wrote:
> Hi Dev
>
>> On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
>>> From: Donet Tom <donettom@linux.ibm.com>
>>>
>>> In this patch, we are fixing three issues in the virtual_address_range
>>> test.
>>>
>>> 1. validate_addr() checks if the allocated address is within the range.
>>> In the current implementation, if addr is greater than HIGH_ADDR_MARK,
>>> the test fails. However, addr will be greater than HIGH_ADDR_MARK if
>>> high_addr is set. Therefore, if high_addr is set, we should not check
>>> the (addr > HIGH_ADDR_MARK) condition.
>>>
>>> 2.In main(), the high address is stored in hptr, but for mark_range(),
>>> the address passed is ptr, not hptr. Fixed this by changing ptr[i] to
>>> hptr[i] in mark_range() function call.
>>>
>>> 3./proc/self/maps may not always have gaps smaller than MAP_CHUNK_SIZE.
>>> The gap between the first high address mapping and the previous mapping
>>> is not smaller than MAP_CHUNK_SIZE.
>> For this, can't we just elide the check when we cross the high boundary?
>> As I see it you are essentially nullifying the purpose of validate_complete_va_space;
>> I had written that function so as to have an alternate way of checking VA exhaustion
>> without relying on mmap correctness in a circular way.
>>
> In this test, we first allocate 128TB of low memory and verify that
> the allocated area falls within the expected range.
>
> Next, we allocate memory at high addresses and check whether the
> returned addresses are within the specified limits. To allocate
> memory at a high address, we pass a hint address. This hint address
> is derived using HIGH_ADDR_SHIFT, which is set to 48 — corresponding
> to 256TB. So, we are requesting allocation at the 256TB address, and
> the memory is successfully allocated there. Since the low address
> region is allocated up to 128TB, there is a gap between the low address
> VMA and the high address VMA .
>
> Additionally, we use a random number to generate the hint address, so
> the actual allocated address will vary but will always be above 256TB.
>
>
> 10000000-10010000 r-xp 00000000 fd:05 134255559
> 10010000-10020000 r--p 00000000 fd:05 134255559
> 10020000-10030000 rw-p 00010000 fd:05 134255559
> 10022b80000-10022bb0000 rw-p 00000000 00:00 0
> 7fff5c000000-7fff9c000000 r--p 00000000 00:00 0
> 7fff9cb30000-7fff9ce40000 rw-p 00000000 00:00 0
> 7fff9ce40000-7fff9d070000 r-xp 00000000 fd:00 792355
> 7fff9d070000-7fff9d080000 r--p 00230000 fd:00 792355
> 7fff9d080000-7fff9d090000 rw-p 00240000 fd:00 792355
> 7fff9d090000-7fff9d170000 r-xp 00000000 fd:00 792358
> 7fff9d170000-7fff9d180000 r--p 000d0000 fd:00 792358
> 7fff9d180000-7fff9d190000 rw-p 000e0000 fd:00 792358
> 7fff9d1a0000-7fff9d1e0000 r--p 00000000 00:00 0
> 7fff9d1e0000-7fff9d1f0000 r-xp 00000000 00:00 0
> 7fff9d1f0000-7fff9d240000 r-xp 00000000 fd:00 792351
> 7fff9d240000-7fff9d250000 r--p 00040000 fd:00 792351
> 7fff9d250000-7fff9d260000 rw-p 00050000 fd:00 792351
> 7fffecfa0000-7fffecfd0000 rw-p 00000000 00:00 0
> 1000000000000-1000040000000 r--p 00000000 00:00 0           -> High address
> 2000000000000-2000040000000 r--p 00000000 00:00 0
> 4000000000000-4000040000000 r--p 00000000 00:00 0
> 8000000000000-8000040000000 r--p 00000000 00:00 0
> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>                      
>
> The high address we are getting is at 256TB because we are explicitly
> requesting it. The gap between the high address VMA and the previous VMA
> is large because the low memory allocation goes up to 128TB.
>
> If I understand correctly, this test is verifying whether the allocated
> address falls within the expected range, and validate_complete_va_space()
> is validating the allocated virtual address space.
>
> Why do we need to check whether the gap between two VMAs is within
> MAP_CHUNK_SIZE? Should it validate only the allocated VMAs?

If you were trying to say here "Shouldn't it validate only the allocated VMAs",
that is exactly what we are doing. By "allocated" we mean mmapped. And those
VMAs will be shown in /proc/self/maps. This test is verifying whether mmap
can exhaust a process' VA space or not, therefore (assume no high address)
the gap between any two consecutive VMAs must not be greater than MAP_CHUNK_SIZE,
for if it was, then mmap should have been able to find it and allocate it there,
reducing that gap.

Now, taking into consideration the high address thingy, since the test completely
exhausts the low and high VA space, the gap condition should separately hold on
the low and high VA space. But it may not hold at the low-high boundary. So
you can simply avoid the check when you detect for the first time that the VMA
you are reading from /proc/self/maps comes from the high VA space.
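
A rough sketch of what I mean, operating on an already-parsed list of VMAs
rather than on /proc/self/maps directly. The struct vma_range type, the
function name gaps_ok() and the high_mark/chunk parameters are hypothetical
names for illustration, not code from virtual_address_range.c; the ">="
comparison mirrors the check the patch removes.

```c
#include <stdbool.h>
#include <stddef.h>

struct vma_range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Return true when no gap of `chunk` bytes or more exists between
 * consecutive ranges. The single gap that crosses from the low VA space
 * into the high VA space (first range starting at or above high_mark)
 * is exempt from the check. Ranges must be sorted by address, as they
 * are in /proc/self/maps.
 */
bool gaps_ok(const struct vma_range *v, size_t n,
	     unsigned long long high_mark, unsigned long long chunk)
{
	unsigned long long prev_end = 0;
	bool seen_high = false;

	for (size_t i = 0; i < n; i++) {
		if (!seen_high && v[i].start >= high_mark)
			seen_high = true;	/* skip the low-high boundary gap */
		else if (v[i].start - prev_end >= chunk)
			return false;
		prev_end = v[i].end;
	}
	return true;
}
```

This keeps the exhaustion check for both halves of the address space and
only drops it for the one gap that is expected to be large.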

>
> Thanks
> Donet
>   
>> Btw @Lorenzo, validate_complete_va_space was written by me as my first patch ever for
>> the Linux kernel : ) from the limited knowledge I have of VMA stuff, I guess the
>> only requirement for VMA alignment is PAGE_SIZE in this test, therefore, the only
>> check required is that the gap between two VMAs should be at least MAP_CHUNK_SIZE?
>> Or can such a gap still exist even when the VA has been exhausted?
>>
>>> $cat /proc/3713/maps
>>> 10000000-10010000 r-xp 00000000 fd:00 36140094
>>> 10010000-10020000 r--p 00000000 fd:00 36140094
>>> 10020000-10030000 rw-p 00010000 fd:00 36140094
>>> 4ee80000-4eeb0000 rw-p 00000000 00:00 0
>>> 578f0000-57c00000 rw-p 00000000 00:00 0
>>> 57c00000-7fff97c00000 r--p 00000000 00:00 0
>>> 7fff97c00000-7fff97e20000 r-xp 00000000 fd:00 33558923
>>> 7fff97e20000-7fff97e30000 r--p 00220000 fd:00 33558923
>>> 7fff97e30000-7fff97e40000 rw-p 00230000 fd:00 33558923
>>> 7fff97f40000-7fff98020000 r-xp 00000000 fd:00 33558924
>>> 7fff98020000-7fff98030000 r--p 000d0000 fd:00 33558924
>>> 7fff98030000-7fff98040000 rw-p 000e0000 fd:00 33558924
>>> 7fff98050000-7fff98090000 r--p 00000000 00:00 0
>>> 7fff98090000-7fff980a0000 r-xp 00000000 00:00 0
>>> 7fff980a0000-7fff980f0000 r-xp 00000000 fd:00 2634
>>> 7fff980f0000-7fff98100000 r--p 00040000 fd:00 2634
>>> 7fff98100000-7fff98110000 rw-p 00050000 fd:00 2634
>>> 7fffcf8a0000-7fffcf9b0000 rw-p 00000000 00:00 0
>>> 1000000000000-1000040000000 r--p 00000000 00:00 0   --> High Addr
>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>> e800098110000-fffff98110000 r--p 00000000 00:00 0
>>> $
>>>
>>> In this patch, the condition that checks for gaps smaller than MAP_CHUNK_SIZE has been removed.
>>>
>>> Fixes: d1d86ce28d0f ("selftests/mm: virtual_address_range: conform to TAP format output")
>>> Fixes: b2a79f62133a ("selftests/mm: virtual_address_range: unmap chunks after validation")
>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
>>> ---
>>>    tools/testing/selftests/mm/virtual_address_range.c | 14 +++-----------
>>>    1 file changed, 3 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index b380e102b22f..606e601a8984 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -80,7 +80,7 @@ static void validate_addr(char *ptr, int high_addr)
>>>    	if (high_addr && addr < HIGH_ADDR_MARK)
>>>    		ksft_exit_fail_msg("Bad address %lx\n", addr);
>>> -	if (addr > HIGH_ADDR_MARK)
>>> +	if (!high_addr && addr > HIGH_ADDR_MARK)
>>>    		ksft_exit_fail_msg("Bad address %lx\n", addr);
>>>    }
>>> @@ -117,7 +117,7 @@ static int validate_lower_address_hint(void)
>>>    static int validate_complete_va_space(void)
>>>    {
>>> -	unsigned long start_addr, end_addr, prev_end_addr;
>>> +	unsigned long start_addr, end_addr;
>>>    	char line[400];
>>>    	char prot[6];
>>>    	FILE *file;
>>> @@ -134,7 +134,6 @@ static int validate_complete_va_space(void)
>>>    	if (file == NULL)
>>>    		ksft_exit_fail_msg("cannot open /proc/self/maps\n");
>>> -	prev_end_addr = 0;
>>>    	while (fgets(line, sizeof(line), file)) {
>>>    		const char *vma_name = NULL;
>>>    		int vma_name_start = 0;
>>> @@ -151,12 +150,6 @@ static int validate_complete_va_space(void)
>>>    		if (start_addr & (1UL << 63))
>>>    			return 0;
>>> -		/* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
>>> -		if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
>>> -			return 1;
>>> -
>>> -		prev_end_addr = end_addr;
>>> -
>>>    		if (prot[0] != 'r')
>>>    			continue;
>>> @@ -223,8 +216,7 @@ int main(int argc, char *argv[])
>>>    		if (hptr[i] == MAP_FAILED)
>>>    			break;
>>> -
>>> -		mark_range(ptr[i], MAP_CHUNK_SIZE);
>>> +		mark_range(hptr[i], MAP_CHUNK_SIZE);
>>>    		validate_addr(hptr[i], 1);
>>>    	}
>>>    	hchunks = i;


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-16 16:27   ` Dev Jain
  2025-06-18 10:06     ` Donet Tom
@ 2025-06-18 11:22     ` Lorenzo Stoakes
  2025-06-18 11:28       ` Dev Jain
  1 sibling, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 11:22 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Mon, Jun 16, 2025 at 09:57:10PM +0530, Dev Jain wrote:
>
> On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
> > From: Donet Tom <donettom@linux.ibm.com>
> > 3./proc/self/maps may not always have gaps smaller than MAP_CHUNK_SIZE.
> > The gap between the first high address mapping and the previous mapping
> > is not smaller than MAP_CHUNK_SIZE.
>
> For this, can't we just elide the check when we cross the high boundary?
> As I see it you are essentially nullifying the purpose of validate_complete_va_space;
> I had written that function so as to have an alternate way of checking VA exhaustion
> without relying on mmap correctness in a circular way.
>
> Btw @Lorenzo, validate_complete_va_space was written by me as my first patch ever for
> the Linux kernel : ) from the limited knowledge I have of VMA stuff, I guess the

:)

Mine was this utter triviality, but got me started :>)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1da1d573f67d11c2f80ffaf38d3cdd3fee97d4b

> only requirement for VMA alignment is PAGE_SIZE in this test, therefore, the only
> check required is that the gap between two VMAs should be at least MAP_CHUNK_SIZE?
> Or can such a gap still exist even when the VA has been exhausted?

VMAs are mapped at page granularity, the logic as to placement is determined by
the get unmapped area logic, for instance mm_get_unmapped_area_vmflags().

Unless a compatibility flag is set it'll be determined top-down.

I try to avoid thinking about 32-bit kernels at all so meh to all that :)

You get arch-specific stuff in arch_get_unmapped_area_topdown().

But the generic shared stuff is in vm_unmapped_area(), typically,
unmapped_area_topdown().

TL;DR, aside from arch stuff, the stack guard gap is the main additional
requirement, which puts (by default) 256 pages between an expanding stack and
the start of a new mapping. Which is 1 GB :) which maybe is why you chose this
value for MAP_CHUNK_SIZE?

For shadow stack we also have a 4 KB requirement. But only on x86-64 :)

Anyway I'm not sure there's huge value in writing a test that too
closely mimics the code it is testing. Setting broad expectations (which I presume
this test does) is better.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:22     ` Lorenzo Stoakes
@ 2025-06-18 11:28       ` Dev Jain
  2025-06-18 11:37         ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-18 11:28 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list


On 18/06/25 4:52 pm, Lorenzo Stoakes wrote:
> On Mon, Jun 16, 2025 at 09:57:10PM +0530, Dev Jain wrote:
>> On 16/06/25 9:36 pm, Aboorva Devarajan wrote:
>>> From: Donet Tom <donettom@linux.ibm.com>
>>> 3./proc/self/maps may not always have gaps smaller than MAP_CHUNK_SIZE.
>>> The gap between the first high address mapping and the previous mapping
>>> is not smaller than MAP_CHUNK_SIZE.
>> For this, can't we just elide the check when we cross the high boundary?
>> As I see it you are essentially nullifying the purpose of validate_complete_va_space;
>> I had written that function so as to have an alternate way of checking VA exhaustion
>> without relying on mmap correctness in a circular way.
>>
>> Btw @Lorenzo, validate_complete_va_space was written by me as my first patch ever for
>> the Linux kernel : ) from the limited knowledge I have of VMA stuff, I guess the
> :)
>
> Mine was this utter triviality, but got me started :>)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1da1d573f67d11c2f80ffaf38d3cdd3fee97d4b
>
>> only requirement for VMA alignment is PAGE_SIZE in this test, therefore, the only
>> check required is that the gap between two VMAs should be at least MAP_CHUNK_SIZE?
>> Or can such a gap still exist even when the VA has been exhausted?
> VMAs are mapped at page granularity, the logic as to placement is determined by
> the get unmapped area logic, for instance mm_get_unmapped_area_vmflags().
>
> Unless a compatibility flag is set it'll be determined top-down.
>
> I try to avoid thinking about 32-bit kernels at all so meh to all that :)
>
> You get arch-specific stuff in arch_get_unmapped_area_topdown().
>
> But the generic shared stuff is in vm_unmapped_area(), typically,
> unmapped_area_topdown().
>
> TL;DR, aside from arch stuff, the stack guard gap is the main additional
> requirement, which puts (by default) 256 pages between an expanding stack and
> the start of a new mapping. Which is 1 GB :) which maybe is why you chose this
> value for MAP_CHUNK_SIZE?

MAP_CHUNK_SIZE was chosen randomly. Good to see it translates into something logical : )

So I guess I am correct, if we can find two VMAs (except at the edge of the high addr boundary)
with a gap of greater than MAP_CHUNK_SIZE then there is a bug in mmap().

>
> For shadow stack we also have a 4 KB requirement. But only on x86-64 :)
>
> Anyway I'm not sure there's huge value in sort of writing a test that too
> closely mimics the code it is testing. Setting broad expectations (which I presume
> this test does) is better.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:28       ` Dev Jain
@ 2025-06-18 11:37         ` Lorenzo Stoakes
  2025-06-18 11:45           ` Dev Jain
  2025-06-18 11:50           ` Lorenzo Stoakes
  0 siblings, 2 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 11:37 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Wed, Jun 18, 2025 at 04:58:56PM +0530, Dev Jain wrote:
>
> MAP_CHUNK_SIZE was chosen randomly. Good to see it translates into something logical : )
>
> So I guess I am correct, if we can find two VMAs (except at the edge of the high addr boundary)
> with a gap of greater than MAP_CHUNK_SIZE then there is a bug in mmap().

No haha, not at all!! Firstly, fixed addresses override a lot of this; secondly,
the 256 page gap (which is configurable btw) is only applicable to mappings
below a stack (on stack-grows-down arches).

This assumption is totally incorrect, sorry. I'd suggest that making assertions
about this is really not all that useful, as things vary by arch and kernel
configuration.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:37         ` Lorenzo Stoakes
@ 2025-06-18 11:45           ` Dev Jain
  2025-06-18 11:57             ` Lorenzo Stoakes
  2025-06-18 11:50           ` Lorenzo Stoakes
  1 sibling, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-18 11:45 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list


On 18/06/25 5:07 pm, Lorenzo Stoakes wrote:
> On Wed, Jun 18, 2025 at 04:58:56PM +0530, Dev Jain wrote:
>> MAP_CHUNK_SIZE was chosen randomly. Good to see it translates into something logical : )
>>
>> So I guess I am correct, if we can find two VMAs (except at the edge of the high addr boundary)
>> with a gap of greater than MAP_CHUNK_SIZE then there is a bug in mmap().
> No haha, not at all!! Firstly fixed addresses override a lot of this, secondly
> the 256 page gap (which is configurable btw) is only applicable for mappings
> below a stack (in stack grow down arch).

Sorry, I was making that assertion w.r.t this specific selftest. What the test
is doing is exhausting VA space without passing a hint or MAP_FIXED. With this
context, where does this assertion fail? One of them will be if the stack guard
gap is more than 256 pages.

Also, note that the test hasn't reported frequent failures post my change, so
in general settings, w.r.t this test, the assertion experimentally seems to
be true : )

>
> This assumption is totally incorrect, sorry. I'd suggest making assertions about
> this is really not all that useful, as things vary by arch and kernel
> configuration.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:37         ` Lorenzo Stoakes
  2025-06-18 11:45           ` Dev Jain
@ 2025-06-18 11:50           ` Lorenzo Stoakes
  1 sibling, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 11:50 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Wed, Jun 18, 2025 at 12:37:29PM +0100, Lorenzo Stoakes wrote:
> On Wed, Jun 18, 2025 at 04:58:56PM +0530, Dev Jain wrote:
> >
> > MAP_CHUNK_SIZE was chosen randomly. Good to see it translates into something logical : )
> >

To correct myself for being an idiot before, 256 x 4 KB is 1 MB not 1 GB
sorry :)
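
As a worked check of that arithmetic, here is a small sketch; guard_gap_bytes()
is a hypothetical helper, and the 256-page figure is just the default mentioned
above (the gap is configurable).

```c
/*
 * Stack guard gap in bytes for a given page size, assuming the default
 * of 256 pages mentioned above (the gap is configurable).
 */
static unsigned long guard_gap_bytes(unsigned long pages,
				     unsigned long page_size)
{
	return pages * page_size;
}
```

guard_gap_bytes(256, 4096) gives 1 MB, while with 64 KB pages the same 256
pages come to 16 MB.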

> > So I guess I am correct, if we can find two VMAs (except at the edge of the high addr boundary)
> > with a gap of greater than MAP_CHUNK_SIZE then there is a bug in mmap().
>
> No haha, not at all!! Firstly fixed addresses override a lot of this, secondly
> the 256 page gap (which is configurable btw) is only applicable for mappings
> below a stack (in stack grow down arch).
>
> This assumption is totally incorrect, sorry. I'd suggest making assertions about
> this is really not all that useful, as things vary by arch and kernel
> configuration.

You can play with this program to see what happens in reality.

On my system the first two VMAs are mapped immediately adjacent to each other,
then the third is >1MB below:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main()
{
	char *ptr;

	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap 1");
		return EXIT_FAILURE;
	}
	printf("ptr1 = %p\n", ptr);

	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE | MAP_GROWSDOWN, -1, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap 2");
		return EXIT_FAILURE;
	}
	printf("ptr2 = %p\n", ptr);

	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap 3");
		return EXIT_FAILURE;
	}
	printf("ptr3 = %p\n", ptr);


	return EXIT_SUCCESS;
}

The definitive answers are in the get unmapped area logic.

But again not very useful to test imo beyond hand-wavey basics (and you
have to check that against all arches to be sure your hand waving is always
true :)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:45           ` Dev Jain
@ 2025-06-18 11:57             ` Lorenzo Stoakes
  2025-06-18 11:59               ` Lorenzo Stoakes
  2025-06-18 13:58               ` Dev Jain
  0 siblings, 2 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 11:57 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>
> On 18/06/25 5:07 pm, Lorenzo Stoakes wrote:
> > On Wed, Jun 18, 2025 at 04:58:56PM +0530, Dev Jain wrote:
> > > MAP_CHUNK_SIZE was chosen randomly. Good to see it translates into something logical : )
> > >
> > > So I guess I am correct, if we can find two VMAs (except at the edge of the high addr boundary)
> > > with a gap of greater than MAP_CHUNK_SIZE then there is a bug in mmap().
> > > No haha, not at all!! Firstly fixed addresses override a lot of this, secondly
> > the 256 page gap (which is configurable btw) is only applicable for mappings
> > below a stack (in stack grow down arch).
>
> Sorry, I was making that assertion w.r.t this specific selftest. What the test
> is doing is exhausting VA space without passing a hint or MAP_FIXED. With this
> context, where does this assertion fail? One of them will be if the stack guard
> gap is more than 256 pages.

Are you accounting for sys.max_map_count? If not, then you'll be hitting that
first.

>
> Also, note that the test hasn't reported frequent failures post my change, so
> in general settings, w.r.t this test, the assertion experimentally seems to
> be true : )

I don't really have time to dig into the test in detail, sorry; too much else on
at the moment.

But it isn't a big problem even if it happened to turn out that this test isn't
really testing quite what you expected :)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:57             ` Lorenzo Stoakes
@ 2025-06-18 11:59               ` Lorenzo Stoakes
  2025-06-18 13:58               ` Dev Jain
  1 sibling, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 11:59 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Wed, Jun 18, 2025 at 12:57:38PM +0100, Lorenzo Stoakes wrote:
> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> first.

* vm.max_map_count
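
For reference, a sketch of how a test could read that limit before counting on
a given number of mappings. read_sysctl_long() is a hypothetical helper, not
something from the selftest; the usual procfs path for this sysctl is
/proc/sys/vm/max_map_count.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysctl file under /proc/sys.
 * Returns -1 if the file cannot be opened or parsed.
 */
static long read_sysctl_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

Calling read_sysctl_long("/proc/sys/vm/max_map_count") then gives the number of
mappings a process may hold, or -1 where procfs is not available.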


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 11:57             ` Lorenzo Stoakes
  2025-06-18 11:59               ` Lorenzo Stoakes
@ 2025-06-18 13:58               ` Dev Jain
  2025-06-18 14:07                 ` Lorenzo Stoakes
  1 sibling, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-18 13:58 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list


On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>> On 18/06/25 5:07 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 18, 2025 at 04:58:56PM +0530, Dev Jain wrote:
>>>> MAP_CHUNK_SIZE was chosen randomly. Good to see it translates into something logical : )
>>>>
>>>> So I guess I am correct, if we can find two VMAs (except at the edge of the high addr boundary)
>>>> with a gap of greater than MAP_CHUNK_SIZE then there is a bug in mmap().
>>> No haha, not at all!! Firstly fixed addresses override a lot of this, secondly
>>> the 256 page gap (which is configurable btw) is only applicable for mappings
>>> below a stack (in stack grow down arch).
>> Sorry, I was making that assertion w.r.t this specific selftest. What the test
>> is doing is exhausting VA space without passing a hint or MAP_FIXED. With this
>> context, where does this assertion fail? One of them will be if the stack guard
>> gap is more than 256 pages.
> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> first.

run_vmtests.sh will run the test in overcommit mode so that won't be an issue.

>
>> Also, note that the test hasn't reported frequent failures post my change, so
>> in general settings, w.r.t this test, the assertion experimentally seems to
>> be true : )
> I don't really have time to dig into the test in detail sorry too much else on
> at the moment.
>
> But it isn't a big problem even if it happened to turn out that this test isn't
> really testing quite what you expected :)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 13:58               ` Dev Jain
@ 2025-06-18 14:07                 ` Lorenzo Stoakes
  2025-06-18 14:17                   ` Dev Jain
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 14:07 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>
> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > first.
>
> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.

Umm, what? You mean overcommit all mode, and that has no bearing on the max
mapping count check.

In do_mmap():

	/* Too many mappings? */
	if (mm->map_count > sysctl_max_map_count)
		return -ENOMEM;


As well as numerous other checks in mm/vma.c.

I'm not sure why an overcommit toggle is even necessary when you could use
MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?

I'm pretty confused as to what this test is really achieving honestly. This
isn't a useful way of asserting mmap() behaviour as far as I can tell.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 14:07                 ` Lorenzo Stoakes
@ 2025-06-18 14:17                   ` Dev Jain
  2025-06-18 14:35                     ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-18 14:17 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list


On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>> first.
>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> Umm, what? You mean overcommit all mode, and that has no bearing on the max
> mapping count check.
>
> In do_mmap():
>
> 	/* Too many mappings? */
> 	if (mm->map_count > sysctl_max_map_count)
> 		return -ENOMEM;
>
>
> As well as numerous other checks in mm/vma.c.

Ah sorry, I didn't look at the code properly and just assumed that overcommit_always
meant overriding this.

>
> I'm not sure why an overcommit toggle is even necessary when you could use
> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>
> I'm pretty confused as to what this test is really achieving honestly. This
> isn't a useful way of asserting mmap() behaviour as far as I can tell.

Well, seems like a useful way to me at least : ) Not sure if you are in the mood
to discuss that but if you'd like me to explain from start to end what the test
is doing, I can do that : )



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 14:17                   ` Dev Jain
@ 2025-06-18 14:35                     ` Lorenzo Stoakes
  2025-06-18 14:43                       ` Dev Jain
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-06-18 14:35 UTC (permalink / raw)
  To: Dev Jain
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list

On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>
> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > first.
> > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > mapping count check.
> >
> > In do_mmap():
> >
> > 	/* Too many mappings? */
> > 	if (mm->map_count > sysctl_max_map_count)
> > 		return -ENOMEM;
> >
> >
> > As well as numerous other checks in mm/vma.c.
>
> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> this.

No problem! It's hard to be aware of everything in mm :)

>
> >
> > I'm not sure why an overcommit toggle is even necessary when you could use
> > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> >
> > I'm pretty confused as to what this test is really achieving honestly. This
> > isn't a useful way of asserting mmap() behaviour as far as I can tell.
>
> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> to discuss that but if you'd like me to explain from start to end what the test
> is doing, I can do that : )
>

I just don't have time right now, I guess I'll have to come back to it
later... it's not the end of the world for it to be iffy in my view as long as
it passes, but it might just not be of great value.

Philosophically I'd rather we didn't assert internal implementation details like
where we place mappings in userland memory. At no point do we promise to not
leave larger gaps if we feel like it :)

I'm guessing, reading more, the _real_ test here is some mathematical assertion
about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.

But again I'm not sure that achieves much and again also is asserting internal
implementation details.

Correct behaviour of this kind of thing probably better belongs to tests in the
userland VMA testing I'd say.

Sorry I don't mean to do down work you've done before, just giving an honest
technical appraisal!

Anyway don't let this block work to fix the test if it's failing. We can revisit
this later.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 14:35                     ` Lorenzo Stoakes
@ 2025-06-18 14:43                       ` Dev Jain
  2025-06-19  8:23                         ` Donet Tom
  0 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-18 14:43 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Aboorva Devarajan, akpm, Liam.Howlett, shuah, pfalcato, david,
	ziy, baolin.wang, npache, ryan.roberts, baohua, linux-mm,
	linux-kselftest, linux-kernel, donettom, ritesh.list


On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>> first.
>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>> mapping count check.
>>>
>>> In do_mmap():
>>>
>>> 	/* Too many mappings? */
>>> 	if (mm->map_count > sysctl_max_map_count)
>>> 		return -ENOMEM;
>>>
>>>
>>> As well as numerous other checks in mm/vma.c.
>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>> this.
> No problem! It's hard to be aware of everything in mm :)
>
>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>
>>> I'm pretty confused as to what this test is really achieving honestly. This
>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>> to discuss that but if you'd like me to explain from start to end what the test
>> is doing, I can do that : )
>>
> I just don't have time right now, I guess I'll have to come back to it
> later... it's not the end of the world for it to be iffy in my view as long as
> it passes, but it might just not be of great value.
>
> Philosophically I'd rather we didn't assert internal implementation details like
> where we place mappings in userland memory. At no point do we promise to not
> leave larger gaps if we feel like it :)

You have a fair point. Anyhow a debate for another day.

>
> I'm guessing, reading more, the _real_ test here is some mathematical assertion
> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>
> But again I'm not sure that achieves much and again also is asserting internal
> implementation details.
>
> Correct behaviour of this kind of thing probably better belongs to tests in the
> userland VMA testing I'd say.
>
> Sorry I don't mean to do down work you've done before, just giving an honest
> technical appraisal!

Nah, it will be rather hilarious to see it all go down the drain xD

>
> Anyway don't let this block work to fix the test if it's failing. We can revisit
> this later.

Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
the gap check at the crossing boundary. What do you think?



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-18 14:43                       ` Dev Jain
@ 2025-06-19  8:23                         ` Donet Tom
  2025-06-19  9:02                           ` Dev Jain
                                             ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Donet Tom @ 2025-06-19  8:23 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> 
> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > first.
> > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > mapping count check.
> > > > 
> > > > In do_mmap():
> > > > 
> > > > 	/* Too many mappings? */
> > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > 		return -ENOMEM;
> > > > 
> > > > 
> > > > As well as numerous other checks in mm/vma.c.
> > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > this.
> > No problem! It's hard to be aware of everything in mm :)
> > 
> > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > 
> > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > to discuss that but if you'd like me to explain from start to end what the test
> > > is doing, I can do that : )
> > > 
> > I just don't have time right now, I guess I'll have to come back to it
> > later... it's not the end of the world for it to be iffy in my view as long as
> > it passes, but it might just not be of great value.
> > 
> > Philosophically I'd rather we didn't assert internal implementation details like
> > where we place mappings in userland memory. At no point do we promise to not
> > leave larger gaps if we feel like it :)
> 
> You have a fair point. Anyhow a debate for another day.
> 
> > 
> > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > 
> > But again I'm not sure that achieves much and again also is asserting internal
> > implementation details.
> > 
> > Correct behaviour of this kind of thing probably better belongs to tests in the
> > userland VMA testing I'd say.
> > 
> > Sorry I don't mean to do down work you've done before, just giving an honest
> > technical appraisal!
> 
> Nah, it will be rather hilarious to see it all go down the drain xD
> 
> > 
> > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > this later.
> 
> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> the gap check at the crossing boundary. What do you think?
>

One problem I see with this approach is that, since the hint address is
generated randomly, the VMAs are also created at random locations based on
the hint address. So, for the VMAs created at high addresses, we cannot
guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.

High address VMAs
-----------------
1000000000000-1000040000000 r--p 00000000 00:00 0                     
2000000000000-2000040000000 r--p 00000000 00:00 0                       
4000000000000-4000040000000 r--p 00000000 00:00 0                        
8000000000000-8000040000000 r--p 00000000 00:00 0                        
e80009d260000-fffff9d260000 r--p 00000000 00:00 0   
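
(Illustrative aside, not part of the posted patch.) The first four start
addresses above are exact powers of two, which is what the existing
hint_addr() arithmetic (1UL << bits) produces. A minimal sketch, assuming a
64-bit unsigned long; pow2_hint() and print_pow2_hints() are hypothetical
names, and the exponent is passed in explicitly instead of coming from
rand() so that the output is reproducible:

```c
#include <assert.h>
#include <stdio.h>

/* Same arithmetic as the existing hint_addr(), with the random exponent
 * replaced by an explicit argument. */
static unsigned long pow2_hint(int bits)
{
	assert(bits >= 0 && bits < 64);
	return 1UL << bits;
}

/* Hypothetical helper: print the hint for every exponent in [lo, hi]. */
static void print_pow2_hints(int lo, int hi)
{
	for (int bits = lo; bits <= hi; bits++)
		printf("bits=%d hint=%lx\n", bits, pow2_hint(bits));
}
```

Calling print_pow2_hints(48, 51) gives the four hint values that match the
start addresses of the first four VMAs in the listing.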

I have a different approach to solve this issue.

From 0 to 128TB, we map memory directly without using any hint. For the range above
256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
we use random hint addresses, but I have modified it to generate hint addresses linearly
starting from 128TB.

With this change:

The 0–128TB range is mapped without hints and verified accordingly.

The 128TB–512TB range is mapped using linear hint addresses and then verified.
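
(Illustrative aside.) The arithmetic behind the linear hints can be sketched
as below. This assumes a 1GB MAP_CHUNK_SIZE (inferred from the
0x40000000-sized VMAs in the earlier listing), the non-aarch64 shift of 47,
and 384TB worth of high chunks; linear_hint() and linear_range_end() are
hypothetical names, not functions from the test:

```c
#include <assert.h>

#define SZ_1GB          (1UL << 30)
#define SZ_1TB          (1UL << 40)
#define MAP_CHUNK_SIZE  SZ_1GB		/* assumed chunk size */
#define HIGH_ADDR_SHIFT 47		/* 1UL << 47 == 128TB */
#define NR_CHUNKS_HIGH  ((384 * SZ_1TB) / MAP_CHUNK_SIZE)

/* Hint for the i-th high mapping: 128TB + i * chunk size. */
static unsigned long linear_hint(unsigned long i)
{
	assert(i < NR_CHUNKS_HIGH);
	return (1UL << HIGH_ADDR_SHIFT) + i * MAP_CHUNK_SIZE;
}

/* End address of the last chunk if every hint is honoured. */
static unsigned long linear_range_end(void)
{
	return linear_hint(NR_CHUNKS_HIGH - 1) + MAP_CHUNK_SIZE;
}
```

Under these assumptions the hints start at 0x800000000000 (128TB) and the
last chunk ends at 0x2000000000000 (512TB), so adjacent chunks can merge
into one contiguous VMA.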

Below are the VMAs obtained with this approach:

10000000-10010000 r-xp 00000000 fd:05 135019531                          
10010000-10020000 r--p 00000000 fd:05 135019531                          
10020000-10030000 rw-p 00010000 fd:05 135019531                          
20000000-10020000000 r--p 00000000 00:00 0                               
10020800000-10020830000 rw-p 00000000 00:00 0                           
1004bcf0000-1004c000000 rw-p 00000000 00:00 0 
1004c000000-7fff8c000000 r--p 00000000 00:00 0                       
7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355                     
7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355                    
7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355                    
7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358                     
7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358                   
7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358                     
7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0                        
7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0                      
7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351                
7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351                    
7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351                    
7fff8d000000-7fffcd000000 r--p 00000000 00:00 0                        
7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0                        
800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 4c4c35eac15e..0be008cba4b0 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -56,21 +56,21 @@
 
 #ifdef __aarch64__
 #define HIGH_ADDR_MARK  ADDR_MARK_256TB
-#define HIGH_ADDR_SHIFT 49
+#define HIGH_ADDR_SHIFT 48
 #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
 #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
 #else
 #define HIGH_ADDR_MARK  ADDR_MARK_128TB
-#define HIGH_ADDR_SHIFT 48
+#define HIGH_ADDR_SHIFT 47
 #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
 #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
 #endif
 
-static char *hint_addr(void)
+static char *hint_addr(int hint)
 {
-       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
+       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
 
-       return (char *) (1UL << bits);
+       return (char *) (addr);
 }
 
 static void validate_addr(char *ptr, int high_addr)
@@ -217,7 +217,7 @@ int main(int argc, char *argv[])
        }
 
        for (i = 0; i < NR_CHUNKS_HIGH; i++) {
-               hint = hint_addr();
+               hint = hint_addr(i);
                hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);



Can we fix it this way?
 
> 


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-19  8:23                         ` Donet Tom
@ 2025-06-19  9:02                           ` Dev Jain
  2025-06-19 15:31                             ` Donet Tom
  2025-06-20 14:45                           ` Dev Jain
  2025-06-25 12:52                           ` Dev Jain
  2 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-19  9:02 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 19/06/25 1:53 pm, Donet Tom wrote:
> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>> first.
>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>> mapping count check.
>>>>>
>>>>> In do_mmap():
>>>>>
>>>>> 	/* Too many mappings? */
>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>> 		return -ENOMEM;
>>>>>
>>>>>
>>>>> As well as numerous other checks in mm/vma.c.
>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>> this.
>>> No problem! It's hard to be aware of everything in mm :)
>>>
>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>
>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>> is doing, I can do that : )
>>>>
>>> I just don't have time right now, I guess I'll have to come back to it
>>> later... it's not the end of the world for it to be iffy in my view as long as
>>> it passes, but it might just not be of great value.
>>>
>>> Philosophically I'd rather we didn't assert internal implementation details like
>>> where we place mappings in userland memory. At no point do we promise to not
>>> leave larger gaps if we feel like it :)
>> You have a fair point. Anyhow a debate for another day.
>>
>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>
>>> But again I'm not sure that achieves much and again also is asserting internal
>>> implementation details.
>>>
>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>> userland VMA testing I'd say.
>>>
>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>> technical appraisal!
>> Nah, it will be rather hilarious to see it all go down the drain xD
>>
>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>> this later.
>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>> the gap check at the crossing boundary. What do you think?
>>
> One problem I see with this approach is that, since the hint address is
> generated randomly, the VMAs are also created at random locations based on
> the hint address. So, for the VMAs created at high addresses, we cannot
> guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>
> High address VMAs
> -----------------
> 1000000000000-1000040000000 r--p 00000000 00:00 0
> 2000000000000-2000040000000 r--p 00000000 00:00 0
> 4000000000000-4000040000000 r--p 00000000 00:00 0
> 8000000000000-8000040000000 r--p 00000000 00:00 0
> e80009d260000-fffff9d260000 r--p 00000000 00:00 0

Just to confirm: the way to check this is to put a sleep in the test after
it has exhausted the VA space, and then examine /proc/<pid>/maps. Are you
doing something similar?
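
(Illustrative aside.) Once the maps output is captured, the gap between two
consecutive VMAs can be computed from the "start-end" prefix of each line.
A minimal sketch; parse_maps_line() and gap_between() are hypothetical
helpers, not code from the selftest:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "start-end" prefix of one /proc/<pid>/maps line.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_maps_line(const char *line, unsigned long *start,
			   unsigned long *end)
{
	if (sscanf(line, "%lx-%lx", start, end) != 2)
		return -1;
	return 0;
}

/* Store the number of unmapped bytes between two consecutive lines. */
static int gap_between(const char *prev, const char *next,
		       unsigned long *gap)
{
	unsigned long ps, pe, ns, ne;

	if (parse_maps_line(prev, &ps, &pe) || parse_maps_line(next, &ns, &ne))
		return -1;
	assert(pe <= ns);	/* maps output is sorted by address */
	*gap = ns - pe;
	return 0;
}
```

The resulting gap can then be checked against the chunk size with a plain
modulo, for example gap % (1UL << 30) for 1GB chunks.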

The random generation of the hint addr should not be a problem - if we
cannot satisfy the request at addr, then the algorithm falls back to
the original approach.

FYI in arch/x86/kernel/sys_x86_64.c:

	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
	 * in the full address space.

Same happens for arm64; if we give a high addr hint, and the high VA space
has been exhausted, then we look for free space in the low VA space.
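
(Illustrative aside.) Whether a hint was honoured or the kernel fell back
to another address can be observed from userspace by comparing the returned
pointer with the hint. A minimal sketch, assuming a 64-bit Linux system;
hint_honoured() is a hypothetical helper, and PROT_NONE with MAP_NORESERVE
is used only to keep the probe cheap:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes with a non-binding hint (no MAP_FIXED).
 * Returns 1 if the mapping was placed at the hint, 0 if the kernel
 * placed it elsewhere, and -1 if mmap() failed. */
static int hint_honoured(void *hint, size_t len)
{
	void *ptr;
	int placed_at_hint;

	assert(len > 0);
	ptr = mmap(hint, len, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (ptr == MAP_FAILED)
		return -1;

	placed_at_hint = (ptr == hint);
	munmap(ptr, len);
	return placed_at_hint;
}
```

For example, hint_honoured((void *)(1UL << 48), 4096) would be expected to
return 1 only when the high VA range is usable on the running system.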

The only thing I am not sure about is the border. I remember witnessing weird
behaviour when I used to test with 16K page config on arm64, across the
border.
  

>
> I have a different approach to solve this issue.
>
>  From 0 to 128TB, we map memory directly without using any hint. For the range above
> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> we use random hint addresses, but I have modified it to generate hint addresses linearly
> starting from 128TB.
>
> With this change:
>
> The 0–128TB range is mapped without hints and verified accordingly.
>
> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>
> Below are the VMAs obtained with this approach:
>
> 10000000-10010000 r-xp 00000000 fd:05 135019531
> 10010000-10020000 r--p 00000000 fd:05 135019531
> 10020000-10030000 rw-p 00010000 fd:05 135019531
> 20000000-10020000000 r--p 00000000 00:00 0
> 10020800000-10020830000 rw-p 00000000 00:00 0
> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 4c4c35eac15e..0be008cba4b0 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -56,21 +56,21 @@
>   
>   #ifdef __aarch64__
>   #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> -#define HIGH_ADDR_SHIFT 49
> +#define HIGH_ADDR_SHIFT 48
>   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>   #else
>   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> -#define HIGH_ADDR_SHIFT 48
> +#define HIGH_ADDR_SHIFT 47
>   #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>   #endif
>   
> -static char *hint_addr(void)
> +static char *hint_addr(int hint)
>   {
> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>   
> -       return (char *) (1UL << bits);
> +       return (char *) (addr);
>   }
>   
>   static void validate_addr(char *ptr, int high_addr)
> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>          }
>   
>          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> -               hint = hint_addr();
> +               hint = hint_addr(i);
>                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
>
>
> Can we fix it this way?
>   


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-19  9:02                           ` Dev Jain
@ 2025-06-19 15:31                             ` Donet Tom
  2025-06-19 16:14                               ` Dev Jain
  0 siblings, 1 reply; 47+ messages in thread
From: Donet Tom @ 2025-06-19 15:31 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Thu, Jun 19, 2025 at 02:32:19PM +0530, Dev Jain wrote:
> 
> On 19/06/25 1:53 pm, Donet Tom wrote:
> > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > first.
> > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > mapping count check.
> > > > > > 
> > > > > > In do_mmap():
> > > > > > 
> > > > > > 	/* Too many mappings? */
> > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > 		return -ENOMEM;
> > > > > > 
> > > > > > 
> > > > > > As well as numerous other checks in mm/vma.c.
> > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > this.
> > > > No problem! It's hard to be aware of everything in mm :)
> > > > 
> > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > 
> > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > is doing, I can do that : )
> > > > > 
> > > > I just don't have time right now, I guess I'll have to come back to it
> > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > it passes, but it might just not be of great value.
> > > > 
> > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > where we place mappings in userland memory. At no point do we promise to not
> > > > leave larger gaps if we feel like it :)
> > > You have a fair point. Anyhow a debate for another day.
> > > 
> > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > 
> > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > implementation details.
> > > > 
> > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > userland VMA testing I'd say.
> > > > 
> > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > technical appraisal!
> > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > 
> > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > this later.
> > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > the gap check at the crossing boundary. What do you think?
> > > 
> > One problem I see with this approach is that, since the hint address is
> > generated randomly, the VMAs are also created at random locations based on
> > the hint address. So, for the VMAs created at high addresses, we cannot
> > guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > 
> > High address VMAs
> > -----------------
> > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> 
> Just confirming, the correct way to test this will be, put a sleep
> after the VA gets exhausted by the test, and then examine /proc/pid/maps -
> are you doing something similar?
>

Yes. I am doing the same.
 
> The random generation of the hint addr should not be a problem - if we
> cannot satisfy the request at addr, then the algorithm falls back to
> the original approach.
> 
> FYI in arch/x86/kernel/sys_x86_64.c:
> 
> 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> 	 * in the full address space.
> 

Yes. Got it.

I ran the same test on x86, and what I see is that mmap with a hint always
fails in this test, so the loop exits and no high VMA gets created. Ideally
mmap should succeed with a hint, right?


> Same happens for arm64; if we give a high addr hint, and the high VA space
> has been exhausted, then we look for free space in the low VA space.


So, in the test program, is the second mmap (with hint) returning a
mapped address, or is it failing in your case?
 
> The only thing I am not sure about is the border. I remember witnessing weird
> behaviour when I used to test with 16K page config on arm64, across the
> border.
> 
> > 
> > I have a different approach to solve this issue.
> > 
> >  From 0 to 128TB, we map memory directly without using any hint. For the range above
> > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > starting from 128TB.
> > 
> > With this change:
> > 
> > The 0–128TB range is mapped without hints and verified accordingly.
> > 
> > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > 
> > Below are the VMAs obtained with this approach:
> > 
> > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > 10010000-10020000 r--p 00000000 fd:05 135019531
> > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > 20000000-10020000000 r--p 00000000 00:00 0
> > 10020800000-10020830000 rw-p 00000000 00:00 0
> > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 4c4c35eac15e..0be008cba4b0 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -56,21 +56,21 @@
> >   #ifdef __aarch64__
> >   #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > -#define HIGH_ADDR_SHIFT 49
> > +#define HIGH_ADDR_SHIFT 48
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> >   #else
> >   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > -#define HIGH_ADDR_SHIFT 48
> > +#define HIGH_ADDR_SHIFT 47
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> >   #endif
> > -static char *hint_addr(void)
> > +static char *hint_addr(int hint)
> >   {
> > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > -       return (char *) (1UL << bits);
> > +       return (char *) (addr);
> >   }
> >   static void validate_addr(char *ptr, int high_addr)
> > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> >          }
> >          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > -               hint = hint_addr();
> > +               hint = hint_addr(i);
> >                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> >                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > 
> > 
> > 
> > Can we fix it this way?
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-19 15:31                             ` Donet Tom
@ 2025-06-19 16:14                               ` Dev Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-19 16:14 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 19/06/25 9:01 pm, Donet Tom wrote:
> On Thu, Jun 19, 2025 at 02:32:19PM +0530, Dev Jain wrote:
>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>> first.
>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>> mapping count check.
>>>>>>>
>>>>>>> In do_mmap():
>>>>>>>
>>>>>>> 	/* Too many mappings? */
>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>> 		return -ENOMEM;
>>>>>>>
>>>>>>>
>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>> this.
>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>
>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>
>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>> is doing, I can do that : )
>>>>>>
>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>> it passes, but it might just not be of great value.
>>>>>
>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>> leave larger gaps if we feel like it :)
>>>> You have a fair point. Anyhow a debate for another day.
>>>>
>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>
>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>> implementation details.
>>>>>
>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>> userland VMA testing I'd say.
>>>>>
>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>> technical appraisal!
>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>
>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>> this later.
>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>> the gap check at the crossing boundary. What do you think?
>>>>
>>> One problem I see with this approach is that, since the hint address is
>>> generated randomly, the VMAs are also created at random locations based on
>>> the hint address. So, for the VMAs created at high addresses, we cannot
>>> guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>
>>> High address VMAs
>>> -----------------
>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>> Just confirming, the correct way to test this will be, put a sleep
>> after the VA gets exhausted by the test, and then examine /proc/pid/maps -
>> are you doing something similar?
>>
> Yes. I am doing the same.
>   
>> The random generation of the hint addr should not be a problem - if we
>> cannot satisfy the request at addr, then the algorithm falls back to
>> the original approach.
>>
>> FYI in arch/x86/kernel/sys_x86_64.c:
>>
>> 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
>> 	 * in the full address space.
>>
> Yes. Got it.
>
> I ran the same test on x86, and what I see is that mmap with a hint always
> fails in this test, so the loop exits and no high VMA gets created. Ideally
> mmap should succeed with a hint, right?

No, that will succeed only if the CPU has the LA57 feature; see
X86_FEATURE_LA57 in arch/x86/include/asm/pgtable_64_types.h. On arm64 it
will happen only if the CPU supports FEAT_LPA2.
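
(Illustrative aside.) On x86, one way to check for the CPU feature from
userspace is to look for the "la57" token in the flags line of
/proc/cpuinfo. A minimal sketch of the token match; has_cpu_flag() is a
hypothetical helper, and reading the file is left to the caller. Note that
the flag only says the CPU supports LA57; the kernel must also have
5-level paging enabled for the high VA range to be usable.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the whitespace-separated list 'flags' contains 'flag' as a
 * whole token (not merely as a substring), 0 otherwise. */
static int has_cpu_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	assert(n > 0);
	while ((p = strstr(p, flag)) != NULL) {
		int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

		if (starts && ends)
			return 1;
		p += n;
	}
	return 0;
}
```

A caller would pass each line of /proc/cpuinfo that begins with "flags" and
stop at the first match.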

So the high address VMAs which you quoted above, were for arm64?

>
>
>> Same happens for arm64; if we give a high addr hint, and the high VA space
>> has been exhausted, then we look for free space in the low VA space.
>
> So, in the test program, is the second mmap (with hint) returning a
> mapped address, or is it failing in your case?
>   
>> The only thing I am not sure about is the border. I remember witnessing weird
>> behaviour when I used to test with 16K page config on arm64, across the
>> border.
>>
>>> I have a different approach to solve this issue.
>>>
>>>   From 0 to 128TB, we map memory directly without using any hint. For the range above
>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>> starting from 128TB.
>>>
>>> With this change:
>>>
>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>
>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>
>>> Below are the VMAs obtained with this approach:
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>> 20000000-10020000000 r--p 00000000 00:00 0
>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index 4c4c35eac15e..0be008cba4b0 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -56,21 +56,21 @@
>>>    #ifdef __aarch64__
>>>    #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>> -#define HIGH_ADDR_SHIFT 49
>>> +#define HIGH_ADDR_SHIFT 48
>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>    #else
>>>    #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>> -#define HIGH_ADDR_SHIFT 48
>>> +#define HIGH_ADDR_SHIFT 47
>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>    #endif
>>> -static char *hint_addr(void)
>>> +static char *hint_addr(int hint)
>>>    {
>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>> -       return (char *) (1UL << bits);
>>> +       return (char *) (addr);
>>>    }
>>>    static void validate_addr(char *ptr, int high_addr)
>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>           }
>>>           for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>> -               hint = hint_addr();
>>> +               hint = hint_addr(i);
>>>                   hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>
>>>
>>>
>>> Can we fix it this way?


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-19  8:23                         ` Donet Tom
  2025-06-19  9:02                           ` Dev Jain
@ 2025-06-20 14:45                           ` Dev Jain
  2025-06-21 17:55                             ` Donet Tom
  2025-06-25 12:52                           ` Dev Jain
  2 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-20 14:45 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 19/06/25 1:53 pm, Donet Tom wrote:
> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>> first.
>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>> mapping count check.
>>>>>
>>>>> In do_mmap():
>>>>>
>>>>> 	/* Too many mappings? */
>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>> 		return -ENOMEM;
>>>>>
>>>>>
>>>>> As well as numerous other checks in mm/vma.c.
>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>> this.
>>> No problem! It's hard to be aware of everything in mm :)
>>>
>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>
>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>> is doing, I can do that : )
>>>>
>>> I just don't have time right now, I guess I'll have to come back to it
>>> later... it's not the end of the world for it to be iffy in my view as long as
>>> it passes, but it might just not be of great value.
>>>
>>> Philosophically I'd rather we didn't assert internal implementation details like
>>> where we place mappings in userland memory. At no point do we promise to not
>>> leave larger gaps if we feel like it :)
>> You have a fair point. Anyhow a debate for another day.
>>
>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>
>>> But again I'm not sure that achieves much and again also is asserting internal
>>> implementation details.
>>>
>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>> userland VMA testing I'd say.
>>>
>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>> technical appraisal!
>> Nah, it will be rather hilarious to see it all go down the drain xD
>>
>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>> this later.
>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>> the gap check at the crossing boundary. What do you think?
>>
> One problem I see with this approach is that, since the hint address is
> generated randomly, the VMAs are also created at random locations based on
> the hint address. So, for the VMAs created at high addresses, we cannot
> guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>
> High address VMAs
> -----------------
> 1000000000000-1000040000000 r--p 00000000 00:00 0
> 2000000000000-2000040000000 r--p 00000000 00:00 0
> 4000000000000-4000040000000 r--p 00000000 00:00 0
> 8000000000000-8000040000000 r--p 00000000 00:00 0
> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>
> I have a different approach to solve this issue.

It is really weird that such a large amount of VA space
is left between the two VMAs yet mmap is failing.



Can you please do the following:
set /proc/sys/vm/max_map_count to the highest value possible.
If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
In validate_complete_va_space:

if (start_addr >= HIGH_ADDR_MARK && found == false) {
	found = true;
	continue;
}

where found is initialized to false. This will skip the check
for the boundary.
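
For reference, a minimal sketch of the skip described above, written outside
the selftest. struct vma_range and check_gaps_skip_boundary() are hypothetical
names, and the gap comparison against MAP_CHUNK_SIZE is my assumption of what
the check in validate_complete_va_space looks like, not a copy of it. The 1GB
chunk size is taken from the size of the high VMAs in the maps output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SZ_1GB          (1UL << 30)
#define MAP_CHUNK_SIZE  SZ_1GB       /* assumed: matches the 1GB high VMAs in the dump */
#define HIGH_ADDR_MARK  (1UL << 47)  /* 128TB, the non-aarch64 value */

/* Hypothetical stand-in for one parsed /proc/self/maps line. */
struct vma_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Returns 0 if no gap between consecutive VMAs reaches MAP_CHUNK_SIZE,
 * 1 otherwise. The first VMA at or above HIGH_ADDR_MARK is exempt from
 * the gap check, which is the "found" logic from the mail.
 */
static int check_gaps_skip_boundary(const struct vma_range *v, size_t n)
{
	unsigned long prev_end = 0;
	bool found = false;
	size_t i;

	assert(v != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (v[i].start >= HIGH_ADDR_MARK && !found) {
			found = true;
			/* assumption: keep tracking the end of the skipped VMA */
			prev_end = v[i].end;
			continue;
		}
		if (v[i].start - prev_end >= MAP_CHUNK_SIZE)
			return 1;
		prev_end = v[i].end;
	}
	return 0;
}
```

Only the first crossing is exempt here, so any later high VMA that sits more
than a chunk away from its predecessor is still reported.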

After this, can you tell me whether the test is still failing?

Also can you give me the complete output of proc/pid/maps
after putting a sleep at the end of the test.
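
If it is easier than adding a sleep and reading the file from another shell,
something like the sketch below could be called at the end of the test
instead. dump_self_maps() is a hypothetical helper, not something that exists
in the selftest today.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies /proc/self/maps to 'out'. Returns the number of complete lines
 * written, or -1 if the file could not be opened.
 */
static int dump_self_maps(FILE *out)
{
	char buf[512];
	int lines = 0;
	FILE *maps = fopen("/proc/self/maps", "r");

	if (!maps)
		return -1;

	while (fgets(buf, sizeof(buf), maps)) {
		fputs(buf, out);
		/* long lines arrive in several pieces; count only the last one */
		if (strchr(buf, '\n'))
			lines++;
	}

	fclose(maps);
	return lines;
}
```

Calling dump_self_maps(stderr) just before the test returns would capture the
layout while all the mappings are still in place.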

>
>  From 0 to 128TB, we map memory directly without using any hint. For the range above
> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> we use random hint addresses, but I have modified it to generate hint addresses linearly
> starting from 128TB.
>
> With this change:
>
> The 0–128TB range is mapped without hints and verified accordingly.
>
> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>
> Below are the VMAs obtained with this approach:
>
> 10000000-10010000 r-xp 00000000 fd:05 135019531
> 10010000-10020000 r--p 00000000 fd:05 135019531
> 10020000-10030000 rw-p 00010000 fd:05 135019531
> 20000000-10020000000 r--p 00000000 00:00 0
> 10020800000-10020830000 rw-p 00000000 00:00 0
> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 4c4c35eac15e..0be008cba4b0 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -56,21 +56,21 @@
>   
>   #ifdef __aarch64__
>   #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> -#define HIGH_ADDR_SHIFT 49
> +#define HIGH_ADDR_SHIFT 48
>   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>   #else
>   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> -#define HIGH_ADDR_SHIFT 48
> +#define HIGH_ADDR_SHIFT 47
>   #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>   #endif
>   
> -static char *hint_addr(void)
> +static char *hint_addr(int hint)
>   {
> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>   
> -       return (char *) (1UL << bits);
> +       return (char *) (addr);
>   }
>   
>   static void validate_addr(char *ptr, int high_addr)
> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>          }
>   
>          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> -               hint = hint_addr();
> +               hint = hint_addr(i);
>                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
>
>
> Can we fix it this way?
>   



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-20 14:45                           ` Dev Jain
@ 2025-06-21 17:55                             ` Donet Tom
  2025-06-23  4:53                               ` Dev Jain
  0 siblings, 1 reply; 47+ messages in thread
From: Donet Tom @ 2025-06-21 17:55 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
> 
> On 19/06/25 1:53 pm, Donet Tom wrote:
> > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > first.
> > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > mapping count check.
> > > > > > 
> > > > > > In do_mmap():
> > > > > > 
> > > > > > 	/* Too many mappings? */
> > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > 		return -ENOMEM;
> > > > > > 
> > > > > > 
> > > > > > As well as numerous other checks in mm/vma.c.
> > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > this.
> > > > No problem! It's hard to be aware of everything in mm :)
> > > > 
> > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > 
> > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > is doing, I can do that : )
> > > > > 
> > > > I just don't have time right now, I guess I'll have to come back to it
> > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > it passes, but it might just not be of great value.
> > > > 
> > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > where we place mappings in userland memory. At no point do we promise to not
> > > > leave larger gaps if we feel like it :)
> > > You have a fair point. Anyhow a debate for another day.
> > > 
> > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > 
> > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > implementation details.
> > > > 
> > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > userland VMA testing I'd say.
> > > > 
> > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > technical appraisal!
> > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > 
> > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > this later.
> > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > the gap check at the crossing boundary. What do you think?
> > > 
> > One problem I see with this approach is that, since the hint address is
> > generated randomly, the VMAs are also created at random locations based on
> > the hint address. So, for the VMAs created at high addresses, we cannot
> > guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > 
> > High address VMAs
> > -----------------
> > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > 
> > I have a different approach to solve this issue.
> 
> It is really weird that such a large amount of VA space
> is left between the two VMAs yet mmap is failing.
> 
> 
> 
> Can you please do the following:
> set /proc/sys/vm/max_map_count to the highest value possible.
> If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
> In validate_complete_va_space:
> 
> if (start_addr >= HIGH_ADDR_MARK && found == false) {
> 	found = true;
> 	continue;
> }


Thanks, Dev, for the suggestion. I raised max_map_count, set
overcommit_memory to 1, and added this code change as well. The test
is still failing.

> 
> where found is initialized to false. This will skip the check
> for the boundary.
> 
> After this can you tell whether the test is still failing.
> 
> Also can you give me the complete output of proc/pid/maps
> after putting a sleep at the end of the test.
> 


On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address space
size is 4PB. With a hint, mmap can map up to 4PB. Since the hint address
is random in this test, VMAs are created at random high addresses.
IIUC this is expected.


10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
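
To make the random-hint pattern concrete, here is a small sketch that
enumerates what the pre-patch hint_addr() can return. old_hint() and
hints_below_limit() are hypothetical names. HIGH_ADDR_SHIFT is the pre-patch
non-aarch64 value from the diff, and VA_LIMIT_SHIFT encodes the 4PB limit
mentioned above.

```c
#include <assert.h>

#define HIGH_ADDR_SHIFT 48  /* pre-patch non-aarch64 value from the diff */
#define VA_LIMIT_SHIFT  52  /* 4PB, the powerpc address space size */

/* The hint the pre-patch hint_addr() produces for a given rand() result. */
static unsigned long old_hint(int r)
{
	int bits;

	assert(r >= 0);
	bits = HIGH_ADDR_SHIFT + r % (63 - HIGH_ADDR_SHIFT);
	return 1UL << bits;
}

/* Counts how many of the possible hints lie below the 4PB limit. */
static int hints_below_limit(void)
{
	int r, n = 0;

	for (r = 0; r < 63 - HIGH_ADDR_SHIFT; r++)
		if (old_hint(r) < (1UL << VA_LIMIT_SHIFT))
			n++;
	return n;
}
```

The hints are always a single power of two between 2^48 and 2^62, and only
four of them (2^48 to 2^51) are below 4PB. That is consistent with the four
1GB VMAs at 1000000000000, 2000000000000, 4000000000000 and 8000000000000 in
the dump above.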




If I give the hint addresses serially starting from 128TB, the address
space is contiguous, the gaps are within MAP_CHUNK_SIZE, and the test passes.

10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
1006f0f0000-10071000000 rw-p 00000000 00:00 0
10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
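
For comparison, a sketch of the serial hint computation from the proposed
diff. linear_hint() is a hypothetical name. The macro values assume the
non-aarch64 branch and a 1GB MAP_CHUNK_SIZE, as suggested by the 1GB VMAs in
the earlier dump.

```c
#include <assert.h>

#define SZ_1GB          (1UL << 30)
#define SZ_1TB          (1UL << 40)
#define MAP_CHUNK_SIZE  SZ_1GB       /* assumed chunk size */
#define HIGH_ADDR_SHIFT 47           /* post-patch non-aarch64 value */
#define NR_CHUNKS_384TB ((384 * SZ_1TB) / MAP_CHUNK_SIZE)

/* Hint for the i-th high chunk: 128TB plus i chunks. */
static unsigned long linear_hint(unsigned long i)
{
	assert(i < NR_CHUNKS_384TB);
	return (1UL << HIGH_ADDR_SHIFT) + i * MAP_CHUNK_SIZE;
}
```

Each hint starts exactly where the previous chunk ends, so when every mmap()
honours its hint the chunks are adjacent and can merge into one VMA that
starts at 800000000000.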




> > 
> >  From 0 to 128TB, we map memory directly without using any hint. For the range above
> > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > starting from 128TB.
> > 
> > With this change:
> > 
> > The 0–128TB range is mapped without hints and verified accordingly.
> > 
> > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > 
> > Below are the VMAs obtained with this approach:
> > 
> > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > 10010000-10020000 r--p 00000000 fd:05 135019531
> > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > 20000000-10020000000 r--p 00000000 00:00 0
> > 10020800000-10020830000 rw-p 00000000 00:00 0
> > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 4c4c35eac15e..0be008cba4b0 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -56,21 +56,21 @@
> >   #ifdef __aarch64__
> >   #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > -#define HIGH_ADDR_SHIFT 49
> > +#define HIGH_ADDR_SHIFT 48
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> >   #else
> >   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > -#define HIGH_ADDR_SHIFT 48
> > +#define HIGH_ADDR_SHIFT 47
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> >   #endif
> > -static char *hint_addr(void)
> > +static char *hint_addr(int hint)
> >   {
> > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > -       return (char *) (1UL << bits);
> > +       return (char *) (addr);
> >   }
> >   static void validate_addr(char *ptr, int high_addr)
> > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> >          }
> >          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > -               hint = hint_addr();
> > +               hint = hint_addr(i);
> >                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> >                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > 
> > 
> > 
> > Can we fix it this way?



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-21 17:55                             ` Donet Tom
@ 2025-06-23  4:53                               ` Dev Jain
  2025-06-23  4:55                                 ` Dev Jain
  2025-06-23 17:32                                 ` Donet Tom
  0 siblings, 2 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-23  4:53 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 21/06/25 11:25 pm, Donet Tom wrote:
> On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>> first.
>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>> mapping count check.
>>>>>>>
>>>>>>> In do_mmap():
>>>>>>>
>>>>>>> 	/* Too many mappings? */
>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>> 		return -ENOMEM;
>>>>>>>
>>>>>>>
>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>> this.
>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>
>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>
>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>> is doing, I can do that : )
>>>>>>
>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>> it passes, but it might just not be of great value.
>>>>>
>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>> leave larger gaps if we feel like it :)
>>>> You have a fair point. Anyhow a debate for another day.
>>>>
>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>
>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>> implementation details.
>>>>>
>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>> userland VMA testing I'd say.
>>>>>
>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>> technical appraisal!
>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>
>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>> this later.
>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>> the gap check at the crossing boundary. What do you think?
>>>>
>>> One problem I see with this approach is that, since the hint address is
>>> generated randomly, the VMAs are also created at random locations based on
>>> the hint address. So, for the VMAs created at high addresses, we cannot
>>> guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>
>>> High address VMAs
>>> -----------------
>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>
>>> I have a different approach to solve this issue.
>> It is really weird that such a large amount of VA space
>> is left between the two VMAs yet mmap is failing.
>>
>>
>>
>> Can you please do the following:
>> set /proc/sys/vm/max_map_count to the highest value possible.
>> If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
>> In validate_complete_va_space:
>>
>> if (start_addr >= HIGH_ADDR_MARK && found == false) {
>> 	found = true;
>> 	continue;
>> }
>
> Thanks, Dev, for the suggestion. I raised max_map_count, set
> overcommit_memory to 1, and added this code change as well. The test
> is still failing.
>
>> where found is initialized to false. This will skip the check
>> for the boundary.
>>
>> After this can you tell whether the test is still failing.
>>
>> Also can you give me the complete output of proc/pid/maps
>> after putting a sleep at the end of the test.
>>
>
> On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address space
> size is 4PB. With a hint, mmap can map up to 4PB. Since the hint address
> is random in this test, VMAs are created at random high addresses.
> IIUC this is expected.
>
>
> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> 30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
> 10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
> 1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
> 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
> 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
> 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
> 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
> 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
> 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
> 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
> 7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
> 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
> 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
> 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
> 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
> 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
> 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
>
>
>
> If I give the hint addresses serially starting from 128TB, the address
> space is contiguous, the gaps are within MAP_CHUNK_SIZE, and the test passes.
>
> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> 33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
> 10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
> 1006f0f0000-10071000000 rw-p 00000000 00:00 0
> 10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
> 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
> 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
> 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
> 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
> 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
> 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
> 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
> 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
> 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
> 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
> 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
> 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
> 800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
>
>

Thank you for this output. I can't wrap my head around why this behaviour changes
when you generate the hint sequentially. The mmap() syscall is supposed to do the
following (irrespective of high VA space or not) - if the allocation at the hint
addr succeeds, then all is well, otherwise, do a top-down search for a large
enough gap. I am not aware of the nuances in powerpc but I really am suspecting
a bug in powerpc mmap code. Can you try to do some tracing - which function
eventually fails to find the empty gap?
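
The fallback half of that behaviour can be checked from userspace with a
sketch like the one below. hint_falls_back() is a hypothetical helper. It
only shows that an occupied hint without MAP_FIXED yields a mapping somewhere
else. It does not show which direction the search takes, and it says nothing
about the powerpc slice code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Maps 'len' bytes, then requests a second mapping using the first one's
 * address as the hint. Returns 1 if the kernel placed the second mapping
 * elsewhere, 0 if it did not, and -1 if either mmap() call failed.
 */
static int hint_falls_back(size_t len)
{
	void *first, *second;
	int moved;

	assert(len > 0);

	first = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;

	/* The hint is already occupied and MAP_FIXED is not set. */
	second = mmap(first, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (second == MAP_FAILED) {
		munmap(first, len);
		return -1;
	}

	moved = (second != first);
	munmap(second, len);
	munmap(first, len);
	return moved;
}
```

A similar probe with a hint above DEFAULT_MAP_WINDOW, run on the failing
powerpc machine, might help narrow down where the search gives up.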

From my limited code tracing, we should end up in slice_find_area_topdown(),
which then asks the generic code to find the gap using vm_unmapped_area(). So I
suspect something is going wrong between these two, probably in slice_scan_available().

>
>>>   From 0 to 128TB, we map memory directly without using any hint. For the range above
>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>> starting from 128TB.
>>>
>>> With this change:
>>>
>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>
>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>
>>> Below are the VMAs obtained with this approach:
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>> 20000000-10020000000 r--p 00000000 00:00 0
>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index 4c4c35eac15e..0be008cba4b0 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -56,21 +56,21 @@
>>>    #ifdef __aarch64__
>>>    #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>> -#define HIGH_ADDR_SHIFT 49
>>> +#define HIGH_ADDR_SHIFT 48
>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>    #else
>>>    #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>> -#define HIGH_ADDR_SHIFT 48
>>> +#define HIGH_ADDR_SHIFT 47
>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>    #endif
>>> -static char *hint_addr(void)
>>> +static char *hint_addr(int hint)
>>>    {
>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>> -       return (char *) (1UL << bits);
>>> +       return (char *) (addr);
>>>    }
>>>    static void validate_addr(char *ptr, int high_addr)
>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>           }
>>>           for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>> -               hint = hint_addr();
>>> +               hint = hint_addr(i);
>>>                   hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>
>>>
>>>
>>> Can we fix it this way?



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-23  4:53                               ` Dev Jain
@ 2025-06-23  4:55                                 ` Dev Jain
  2025-06-23 17:32                                 ` Donet Tom
  1 sibling, 0 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-23  4:55 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 23/06/25 10:23 am, Dev Jain wrote:
>
> On 21/06/25 11:25 pm, Donet Tom wrote:
>> On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll 
>>>>>>>>>> be hitting that
>>>>>>>>>> first.
>>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that 
>>>>>>>>> won't be an issue.
>>>>>>>> Umm, what? You mean overcommit all mode, and that has no 
>>>>>>>> bearing on the max
>>>>>>>> mapping count check.
>>>>>>>>
>>>>>>>> In do_mmap():
>>>>>>>>
>>>>>>>>     /* Too many mappings? */
>>>>>>>>     if (mm->map_count > sysctl_max_map_count)
>>>>>>>>         return -ENOMEM;
>>>>>>>>
>>>>>>>>
>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>> Ah sorry, didn't look at the code properly just assumed that 
>>>>>>> overcommit_always meant overriding
>>>>>>> this.
>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>
>>>>>>>> I'm not sure why an overcommit toggle is even necessary when 
>>>>>>>> you could use
>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the 
>>>>>>>> OVERCOMMIT_GUESS limits?
>>>>>>>>
>>>>>>>> I'm pretty confused as to what this test is really achieving 
>>>>>>>> honestly. This
>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I 
>>>>>>>> can tell.
>>>>>>> Well, seems like a useful way to me at least : ) Not sure if you 
>>>>>>> are in the mood
>>>>>>> to discuss that but if you'd like me to explain from start to 
>>>>>>> end what the test
>>>>>>> is doing, I can do that : )
>>>>>>>
>>>>>> I just don't have time right now, I guess I'll have to come back 
>>>>>> to it
>>>>>> later... it's not the end of the world for it to be iffy in my 
>>>>>> view as long as
>>>>>> it passes, but it might just not be of great value.
>>>>>>
>>>>>> Philosophically I'd rather we didn't assert internal 
>>>>>> implementation details like
>>>>>> where we place mappings in userland memory. At no point do we 
>>>>>> promise to not
>>>>>> leave larger gaps if we feel like it :)
>>>>> You have a fair point. Anyhow a debate for another day.
>>>>>
>>>>>> I'm guessing, reading more, the _real_ test here is some 
>>>>>> mathematical assertion
>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when 
>>>>>> using hints.
>>>>>>
>>>>>> But again I'm not sure that achieves much and again also is 
>>>>>> asserting internal
>>>>>> implementation details.
>>>>>>
>>>>>> Correct behaviour of this kind of thing probably better belongs 
>>>>>> to tests in the
>>>>>> userland VMA testing I'd say.
>>>>>>
>>>>>> Sorry I don't mean to do down work you've done before, just 
>>>>>> giving an honest
>>>>>> technical appraisal!
>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>
>>>>>> Anyway don't let this block work to fix the test if it's failing. 
>>>>>> We can revisit
>>>>>> this later.
>>>>> Sure. @Aboorva and Donet, I still believe that the correct 
>>>>> approach is to elide
>>>>> the gap check at the crossing boundary. What do you think?
>>>>>
>>>> One problem I am seeing with this approach is that, since the hint address
>>>> is generated randomly, the VMAs are also being created at random, based on
>>>> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
>>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>
>>>> High address VMAs
>>>> -----------------
>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>
>>>> I have a different approach to solve this issue.
>>> It is really weird that such a large amount of VA space
>>> is left between the two VMAs yet mmap is failing.
>>>
>>>
>>>
>>> Can you please do the following:
>>> set /proc/sys/vm/max_map_count to the highest value possible.
>>> If running without run_vmtests.sh, set 
>>> /proc/sys/vm/overcommit_memory to 1.
>>> In validate_complete_va_space:
>>>
>>> if (start_addr >= HIGH_ADDR_MARK && found == false) {
>>>     found = true;
>>>     continue;
>>> }
>>
>> Thanks Dev for the suggestion. I set max_map_count and set overcommit
>> memory to 1, added this code change as well, and then tried. Still, the
>> test is failing
>>
>>> where found is initialized to false. This will skip the check
>>> for the boundary.
>>>
>>> After this can you tell whether the test is still failing.
>>>
>>> Also can you give me the complete output of proc/pid/maps
>>> after putting a sleep at the end of the test.
>>>
>>
>> On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address
>> space size is 4PB. With a hint, it can map up to 4PB. Since the
>> hint address is random in this test, random high VMAs are
>> getting created. IIUC this is expected.
>>
>>
>> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>> 30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
>> 10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
>> 1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
>> 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
>> 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
>> 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
>> 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
>> 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
>> 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
>> 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
>> 7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
>> 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
>> 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
>> 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
>> 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
>> 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
>> 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>> 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>> 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>> 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>> eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>
>>
>>
>>
>> If I give the hint addresses serially from 128TB, then the address
>> space is contiguous, the gap is also MAP_SIZE, and the test passes.
>>
>> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>> 33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
>> 10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
>> 1006f0f0000-10071000000 rw-p 00000000 00:00 0
>> 10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
>> 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
>> 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
>> 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
>> 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
>> 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
>> 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
>> 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
>> 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
>> 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
>> 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
>> 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
>> 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
>> 800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
>>
>>
>
> Thank you for this output. I can't wrap my head around why this 
> behaviour changes
> when you generate the hint sequentially. The mmap() syscall is 
> supposed to do the
> following (irrespective of high VA space or not) - if the allocation 
> at the hint
> addr succeeds, then all is well, otherwise, do a top-down search for a 
> large
> enough gap. I am not aware of the nuances in powerpc but I really am 
> suspecting
> a bug in powerpc mmap code. Can you try to do some tracing - which 
> function
> eventually fails to find the empty gap?
>
> Through my limited code tracing - we should end up in 
> slice_find_area_topdown,
> then we ask the generic code to find the gap using vm_unmapped_area. So I
> suspect something is happening between this, probably 
> slice_scan_available().


Also, is the system you are testing on using the radix or the hash MMU?


>
>>
>>>>   From 0 to 128TB, we map memory directly without using any hint. 
>>>> For the range above
>>>> 256TB up to 512TB, we perform the mapping using hint addresses. In 
>>>> the current test,
>>>> we use random hint addresses, but I have modified it to generate 
>>>> hint addresses linearly
>>>> starting from 128TB.
>>>>
>>>> With this change:
>>>>
>>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>>
>>>> The 128TB–512TB range is mapped using linear hint addresses and 
>>>> then verified.
>>>>
>>>> Below are the VMAs obtained with this approach:
>>>>
>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>>
>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>> @@ -56,21 +56,21 @@
>>>>    #ifdef __aarch64__
>>>>    #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>>> -#define HIGH_ADDR_SHIFT 49
>>>> +#define HIGH_ADDR_SHIFT 48
>>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>>    #else
>>>>    #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>>> -#define HIGH_ADDR_SHIFT 48
>>>> +#define HIGH_ADDR_SHIFT 47
>>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>>    #endif
>>>> -static char *hint_addr(void)
>>>> +static char *hint_addr(int hint)
>>>>    {
>>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>> -       return (char *) (1UL << bits);
>>>> +       return (char *) (addr);
>>>>    }
>>>>    static void validate_addr(char *ptr, int high_addr)
>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>           }
>>>>           for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>> -               hint = hint_addr();
>>>> +               hint = hint_addr(i);
>>>>                   hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>
>>>>
>>>>
>>>> Can we fix it this way?
>



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-23  4:53                               ` Dev Jain
  2025-06-23  4:55                                 ` Dev Jain
@ 2025-06-23 17:32                                 ` Donet Tom
  2025-06-24  6:15                                   ` Dev Jain
  1 sibling, 1 reply; 47+ messages in thread
From: Donet Tom @ 2025-06-23 17:32 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
> 
> On 21/06/25 11:25 pm, Donet Tom wrote:
> > On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
> > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > > > first.
> > > > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > > > mapping count check.
> > > > > > > > 
> > > > > > > > In do_mmap():
> > > > > > > > 
> > > > > > > > 	/* Too many mappings? */
> > > > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > > > 		return -ENOMEM;
> > > > > > > > 
> > > > > > > > 
> > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > > > this.
> > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > 
> > > > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > > > 
> > > > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > > > is doing, I can do that : )
> > > > > > > 
> > > > > > I just don't have time right now, I guess I'll have to come back to it
> > > > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > > > it passes, but it might just not be of great value.
> > > > > > 
> > > > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > > > where we place mappings in userland memory. At no point do we promise to not
> > > > > > leave larger gaps if we feel like it :)
> > > > > You have a fair point. Anyhow a debate for another day.
> > > > > 
> > > > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > > > 
> > > > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > > > implementation details.
> > > > > > 
> > > > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > > > userland VMA testing I'd say.
> > > > > > 
> > > > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > > > technical appraisal!
> > > > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > > > 
> > > > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > > > this later.
> > > > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > > > the gap check at the crossing boundary. What do you think?
> > > > > 
> > > > One problem I am seeing with this approach is that, since the hint address
> > > > is generated randomly, the VMAs are also being created at random, based on
> > > > the hint address. So, for the VMAs created at high addresses, we cannot guarantee
> > > > that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > > > 
> > > > High address VMAs
> > > > -----------------
> > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > 
> > > > I have a different approach to solve this issue.
> > > It is really weird that such a large amount of VA space
> > > is left between the two VMAs yet mmap is failing.
> > > 
> > > 
> > > 
> > > Can you please do the following:
> > > set /proc/sys/vm/max_map_count to the highest value possible.
> > > If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
> > > In validate_complete_va_space:
> > > 
> > > if (start_addr >= HIGH_ADDR_MARK && found == false) {
> > > 	found = true;
> > > 	continue;
> > > }
> > 
> > Thanks Dev for the suggestion. I set max_map_count and set overcommit
> > memory to 1, added this code change as well, and then tried. Still, the
> > test is failing
> > 
> > > where found is initialized to false. This will skip the check
> > > for the boundary.
> > > 
> > > After this can you tell whether the test is still failing.
> > > 
> > > Also can you give me the complete output of proc/pid/maps
> > > after putting a sleep at the end of the test.
> > > 
> > 
> > On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address
> > space size is 4PB. With a hint, it can map up to 4PB. Since the
> > hint address is random in this test, random high VMAs are
> > getting created. IIUC this is expected.
> > 
> > 
> > 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > 30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
> > 10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
> > 1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
> > 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
> > 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
> > 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
> > 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
> > 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
> > 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
> > 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
> > 7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
> > 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
> > 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
> > 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
> > 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
> > 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
> > 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 
> > 
> > 
> > 
> > If I give the hint addresses serially from 128TB, then the address
> > space is contiguous, the gap is also MAP_SIZE, and the test passes.
> > 
> > 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > 33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
> > 10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
> > 1006f0f0000-10071000000 rw-p 00000000 00:00 0
> > 10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
> > 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
> > 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
> > 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
> > 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
> > 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
> > 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
> > 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
> > 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
> > 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
> > 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
> > 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
> > 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
> > 800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
> > 
> > 
> 
> Thank you for this output. I can't wrap my head around why this behaviour changes
> when you generate the hint sequentially. The mmap() syscall is supposed to do the
> following (irrespective of high VA space or not) - if the allocation at the hint

Yes, it is working as expected. On PowerPC, DEFAULT_MAP_WINDOW is
128TB, and with a hint the system can map up to 4PB.

In the test, the first mmap loop maps memory up to 128TB without any
hint, so those VMAs are created below the 128TB boundary.

In the second mmap loop, we pass hint addresses that are generated
randomly at or above 256TB. The mappings are correctly created at
these hint addresses. Since the hint addresses are random, the
resulting VMAs are also created at random locations.
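
For reference, the pre-patch hint_addr() in the quoted diff can only return
powers of two. Below is a small sketch of that arithmetic, with the rand()
draw passed in as a parameter so the result is deterministic. The helper
names random_hint() and hints_below() are hypothetical, HIGH_ADDR_SHIFT uses
the pre-patch non-arm64 value of 48, and the 4PB limit is the figure
mentioned above.

```c
#define HIGH_ADDR_SHIFT	48				/* pre-patch, non-arm64 */
#define NR_HINT_VALUES	(63u - HIGH_ADDR_SHIFT)		/* shifts 48..62 */

/* Pre-patch hint for a given random draw r (r stands in for rand()). */
static unsigned long long random_hint(unsigned int r)
{
	int bits = HIGH_ADDR_SHIFT + r % NR_HINT_VALUES;

	return 1ULL << bits;
}

/* Count how many of the possible hint values lie strictly below limit. */
static int hints_below(unsigned long long limit)
{
	int n = 0;

	for (unsigned int r = 0; r < NR_HINT_VALUES; r++)
		if (random_hint(r) < limit)
			n++;
	return n;
}
```

Under these assumptions, hints_below(1ULL << 52) counts only the candidates
2^48 through 2^51, which would be consistent with the four isolated 1GB VMAs
in the maps output.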

So, what I tried is: mapping from 0 to 128TB without any hint, and
then for the second loop, instead of starting the hints from 256TB, I
started from 128TB. Instead of using random hint addresses, I used
sequential hint addresses from 128TB up to 512TB. With this change,
the VMAs are created in order, and the test passes.

800000000000-2aab000000000 r--p 00000000 00:00 0    128TB to 512TB VMA
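
The sequential hint arithmetic from the proposed change can be checked in
isolation. This is only a sketch: MAP_CHUNK_SIZE is assumed to be 1GB (the
size of the isolated high VMAs in the earlier output), NR_CHUNKS_HIGH is
assumed to cover 384TB, and high_range_end() is a hypothetical helper added
for the check.

```c
#define SZ_1GB		(1ULL << 30)
#define SZ_1TB		(1ULL << 40)
#define MAP_CHUNK_SIZE	SZ_1GB				/* assumed chunk size */
#define HIGH_ADDR_SHIFT	47				/* post-patch, non-arm64 */
#define NR_CHUNKS_HIGH	(384 * SZ_1TB / MAP_CHUNK_SIZE)	/* assumed: 384TB */

/* i-th sequential hint: 128TB plus i chunks. */
static unsigned long long hint_addr(unsigned long long i)
{
	return (1ULL << HIGH_ADDR_SHIFT) + i * MAP_CHUNK_SIZE;
}

/* End address of the last hinted chunk. */
static unsigned long long high_range_end(void)
{
	return hint_addr(NR_CHUNKS_HIGH - 1) + MAP_CHUNK_SIZE;
}
```

With these constants the first hint is 0x800000000000 (128TB) and the last
chunk ends at 0x2000000000000 (512TB), so consecutive hints leave no gap
between chunks.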

I think we will see the same behaviour on x86 with X86_FEATURE_LA57.

I will send the updated patch in V2.

> addr succeeds, then all is well, otherwise, do a top-down search for a large
> enough gap. I am not aware of the nuances in powerpc but I really am suspecting
> a bug in powerpc mmap code. Can you try to do some tracing - which function
> eventually fails to find the empty gap?
> 
> Through my limited code tracing - we should end up in slice_find_area_topdown,
> then we ask the generic code to find the gap using vm_unmapped_area. So I
> suspect something is happening between this, probably slice_scan_available().
> 
> > 
> > > >   From 0 to 128TB, we map memory directly without using any hint. For the range above
> > > > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > > > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > > > starting from 128TB.
> > > > 
> > > > With this change:
> > > > 
> > > > The 0–128TB range is mapped without hints and verified accordingly.
> > > > 
> > > > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > > > 
> > > > Below are the VMAs obtained with this approach:
> > > > 
> > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > > > 
> > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > @@ -56,21 +56,21 @@
> > > >    #ifdef __aarch64__
> > > >    #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > > > -#define HIGH_ADDR_SHIFT 49
> > > > +#define HIGH_ADDR_SHIFT 48
> > > >    #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> > > >    #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > > >    #else
> > > >    #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > > > -#define HIGH_ADDR_SHIFT 48
> > > > +#define HIGH_ADDR_SHIFT 47
> > > >    #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > > >    #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> > > >    #endif
> > > > -static char *hint_addr(void)
> > > > +static char *hint_addr(int hint)
> > > >    {
> > > > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > -       return (char *) (1UL << bits);
> > > > +       return (char *) (addr);
> > > >    }
> > > >    static void validate_addr(char *ptr, int high_addr)
> > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > >           }
> > > >           for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > -               hint = hint_addr();
> > > > +               hint = hint_addr(i);
> > > >                   hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > >                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > > 
> > > > 
> > > > 
> > > > Can we fix it this way?
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-23 17:32                                 ` Donet Tom
@ 2025-06-24  6:15                                   ` Dev Jain
  2025-06-25  9:36                                     ` Donet Tom
  0 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-24  6:15 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

[-- Attachment #1: Type: text/plain, Size: 16274 bytes --]


On 23/06/25 11:02 pm, Donet Tom wrote:
> On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
>> On 21/06/25 11:25 pm, Donet Tom wrote:
>>> On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
>>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>>>> first.
>>>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>>>> mapping count check.
>>>>>>>>>
>>>>>>>>> In do_mmap():
>>>>>>>>>
>>>>>>>>> 	/* Too many mappings? */
>>>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>>>> 		return -ENOMEM;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>>>> this.
>>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>>
>>>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>>>
>>>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>>>> is doing, I can do that : )
>>>>>>>>
>>>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>>>> it passes, but it might just not be of great value.
>>>>>>>
>>>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>>>> leave larger gaps if we feel like it :)
>>>>>> You have a fair point. Anyhow a debate for another day.
>>>>>>
>>>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>>>
>>>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>>>> implementation details.
>>>>>>>
>>>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>>>> userland VMA testing I'd say.
>>>>>>>
>>>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>>>> technical appraisal!
>>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>>
>>>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>>>> this later.
>>>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>>>> the gap check at the crossing boundary. What do you think?
>>>>>>
>>>>> One problem I am seeing with this approach is that, since the hint address
>>>>> is generated randomly, the VMAs are also being created at randomly based on
>>>>> the hint address.So, for the VMAs created at high addresses, we cannot guarantee
>>>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>>
>>>>> High address VMAs
>>>>> -----------------
>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>>
>>>>> I have a different approach to solve this issue.
>>>> It is really weird that such a large amount of VA space
>>>> is left between the two VMAs yet mmap is failing.
>>>>
>>>>
>>>>
>>>> Can you please do the following:
>>>> set /proc/sys/vm/max_map_count to the highest value possible.
>>>> If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
>>>> In validate_complete_va_space:
>>>>
>>>> if (start_addr >= HIGH_ADDR_MARK && found == false) {
>>>> 	found = true;
>>>> 	continue;
>>>> }
>>> Thanks Dev for the suggestion. I set max_map_count and set overcommit
>>> memory to 1, added this code change as well, and then tried. Still, the
>>> test is failing
>>>
>>>> where found is initialized to false. This will skip the check
>>>> for the boundary.
>>>>
>>>> After this can you tell whether the test is still failing.
>>>>
>>>> Also can you give me the complete output of proc/pid/maps
>>>> after putting a sleep at the end of the test.
>>>>
>>> on powerpc support DEFAULT_MAP_WINDOW is 128TB and with
>>> total address space size is 4PB With hint it can map upto
>>> 4PB. Since the hint addres is random in this test random hing VMAs
>>> are getting created. IIUC this is expected only.
>>>
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
>>> 10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
>>> 1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
>>> 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
>>> 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
>>> 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
>>> 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
>>> 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
>>> 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
>>> 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
>>> 7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
>>> 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
>>> 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
>>> 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
>>> 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
>>> 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
>>> 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>> 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>> 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>> 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>> eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>>
>>>
>>>
>>>
>>> If I give the hint address serially from 128TB then the address
>>> space is contigous and gap is also MAP_SIZE, the test is passing.
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
>>> 10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
>>> 1006f0f0000-10071000000 rw-p 00000000 00:00 0
>>> 10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
>>> 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
>>> 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
>>> 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
>>> 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
>>> 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
>>> 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
>>> 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
>>> 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
>>> 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
>>> 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
>>> 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
>>> 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
>>> 800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
>>>
>>>
>> Thank you for this output. I can't wrap my head around why this behaviour changes
>> when you generate the hint sequentially. The mmap() syscall is supposed to do the
>> following (irrespective of high VA space or not) - if the allocation at the hint
> Yes, it is working as expected. On PowerPC, the DEFAULT_MAP_WINDOW is
> 128TB, and the system can map up to 4PB.
>
> In the test, the first mmap call maps memory up to 128TB without any
> hint, so the VMAs are created below the 128TB boundary.
>
> In the second mmap call, we provide a hint starting from 256TB, and
> the hint address is generated randomly above 256TB. The mappings are
> correctly created at these hint addresses. Since the hint addresses
> are random, the resulting VMAs are also created at random locations.
>
> So, what I tried is: mapping from 0 to 128TB without any hint, and
> then for the second mmap, instead of starting the hint from 256TB, I
> started from 128TB. Instead of using random hint addresses, I used
> sequential hint addresses from 128TB up to 512TB. With this change,
> the VMAs are created in order, and the test passes.
>
> 800000000000-2aab000000000 r--p 00000000 00:00 0    128TB to 512TB VMA
>
> I think we will see same behaviour on x86 with X86_FEATURE_LA57.
>
> I will send the updated patch in V2.

Since you say it fails on both radix and hash, it means that the generic
code path is failing. I see that on my system, when I run the test with
LPA2 config, write() fails with errno set to -ENOMEM. Can you apply
the following diff and check whether the test fails still. Doing this
fixed it for arm64.

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index b380e102b22f..3032902d01f2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -173,10 +173,6 @@ static int validate_complete_va_space(void)
                  */
                 hop = 0;
                 while (start_addr + hop < end_addr) {
-                       if (write(fd, (void *)(start_addr + hop), 1) != 1)
-                               return 1;
-                       lseek(fd, 0, SEEK_SET);
-
                         if (is_marked_vma(vma_name))
                                 munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);
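
For context, the lines removed above use write() as a probe: the kernel has to read one byte from the given address, so the call fails if that address cannot be read. Below is a minimal standalone sketch of the same idea. It is not code from the selftest: probe_readable() and demo_probe() are hypothetical names, and a pipe stands in for the test's output file descriptor. Note that write() returns -1 on failure and reports the reason in errno as a positive value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the selftest): ask the kernel to read
 * one byte from addr by writing it to fd. Returns 0 on success, or a
 * positive errno value on failure.
 */
static int probe_readable(int fd, const void *addr)
{
	errno = 0;
	if (write(fd, addr, 1) != 1)
		return errno ? errno : EIO;
	return 0;
}

/*
 * Probe one readable page and one PROT_NONE page through a pipe.
 * Returns 0 if the readable page probes fine and the PROT_NONE page
 * does not, 1 if the probes disagree with that, -1 on setup failure.
 */
static int demo_probe(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int pipefd[2], ok_ret, bad_ret;
	char *ok, *bad;

	assert(page > 0);
	if (pipe(pipefd))
		return -1;

	ok = mmap(NULL, page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	bad = mmap(NULL, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ok == MAP_FAILED || bad == MAP_FAILED)
		return -1;

	ok_ret = probe_readable(pipefd[1], ok);
	bad_ret = probe_readable(pipefd[1], bad);

	munmap(ok, page);
	munmap(bad, page);
	close(pipefd[0]);
	close(pipefd[1]);

	return (ok_ret == 0 && bad_ret != 0) ? 0 : 1;
}
```

Calling demo_probe() from a small main() and checking for a zero return is enough to try this out. Which errno value the failing probe reports depends on the kernel path taken, so the sketch only checks that it is non-zero.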

>
>> addr succeeds, then all is well, otherwise, do a top-down search for a large
>> enough gap. I am not aware of the nuances in powerpc but I really am suspecting
>> a bug in powerpc mmap code. Can you try to do some tracing - which function
>> eventually fails to find the empty gap?
>>
>> Through my limited code tracing - we should end up in slice_find_area_topdown,
>> then we ask the generic code to find the gap using vm_unmapped_area. So I
>> suspect something is happening between this, probably slice_scan_available().
>>
>>>>>    From 0 to 128TB, we map memory directly without using any hint. For the range above
>>>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>>>> starting from 128TB.
>>>>>
>>>>> With this change:
>>>>>
>>>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>>>
>>>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>>>
>>>>> Below are the VMAs obtained with this approach:
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>>>
>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> @@ -56,21 +56,21 @@
>>>>>     #ifdef __aarch64__
>>>>>     #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>>>> -#define HIGH_ADDR_SHIFT 49
>>>>> +#define HIGH_ADDR_SHIFT 48
>>>>>     #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>>>     #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>>>     #else
>>>>>     #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>>>> -#define HIGH_ADDR_SHIFT 48
>>>>> +#define HIGH_ADDR_SHIFT 47
>>>>>     #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>>>     #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>>>     #endif
>>>>> -static char *hint_addr(void)
>>>>> +static char *hint_addr(int hint)
>>>>>     {
>>>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>>> -       return (char *) (1UL << bits);
>>>>> +       return (char *) (addr);
>>>>>     }
>>>>>     static void validate_addr(char *ptr, int high_addr)
>>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>>            }
>>>>>            for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>>> -               hint = hint_addr();
>>>>> +               hint = hint_addr(i);
>>>>>                    hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>>                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>>
>>>>>
>>>>>
>>>>> Can we fix it this way?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-24  6:15                                   ` Dev Jain
@ 2025-06-25  9:36                                     ` Donet Tom
  2025-06-25 10:45                                       ` Dev Jain
  0 siblings, 1 reply; 47+ messages in thread
From: Donet Tom @ 2025-06-25  9:36 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Tue, Jun 24, 2025 at 11:45:09AM +0530, Dev Jain wrote:
> 
> On 23/06/25 11:02 pm, Donet Tom wrote:
> > On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
> > > On 21/06/25 11:25 pm, Donet Tom wrote:
> > > > On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
> > > > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > > > > > first.
> > > > > > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > > > > > mapping count check.
> > > > > > > > > > 
> > > > > > > > > > In do_mmap():
> > > > > > > > > > 
> > > > > > > > > > 	/* Too many mappings? */
> > > > > > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > > > > > 		return -ENOMEM;
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > > > > > this.
> > > > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > > > 
> > > > > > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > > > > > 
> > > > > > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > > > > > is doing, I can do that : )
> > > > > > > > > 
> > > > > > > > I just don't have time right now, I guess I'll have to come back to it
> > > > > > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > > > > > it passes, but it might just not be of great value.
> > > > > > > > 
> > > > > > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > > > > > where we place mappings in userland memory. At no point do we promise to not
> > > > > > > > leave larger gaps if we feel like it :)
> > > > > > > You have a fair point. Anyhow a debate for another day.
> > > > > > > 
> > > > > > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > > > > > 
> > > > > > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > > > > > implementation details.
> > > > > > > > 
> > > > > > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > > > > > userland VMA testing I'd say.
> > > > > > > > 
> > > > > > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > > > > > technical appraisal!
> > > > > > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > > > > > 
> > > > > > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > > > > > this later.
> > > > > > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > > > > > the gap check at the crossing boundary. What do you think?
> > > > > > > 
> > > > > > One problem I am seeing with this approach is that, since the hint address
> > > > > > is generated randomly, the VMAs are also being created at randomly based on
> > > > > > the hint address.So, for the VMAs created at high addresses, we cannot guarantee
> > > > > > that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > > > > > 
> > > > > > High address VMAs
> > > > > > -----------------
> > > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > > > 
> > > > > > I have a different approach to solve this issue.
> > > > > It is really weird that such a large amount of VA space
> > > > > is left between the two VMAs yet mmap is failing.
> > > > > 
> > > > > 
> > > > > 
> > > > > Can you please do the following:
> > > > > set /proc/sys/vm/max_map_count to the highest value possible.
> > > > > If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
> > > > > In validate_complete_va_space:
> > > > > 
> > > > > if (start_addr >= HIGH_ADDR_MARK && found == false) {
> > > > > 	found = true;
> > > > > 	continue;
> > > > > }
> > > > Thanks Dev for the suggestion. I set max_map_count and set overcommit
> > > > memory to 1, added this code change as well, and then tried. Still, the
> > > > test is failing
> > > > 
> > > > > where found is initialized to false. This will skip the check
> > > > > for the boundary.
> > > > > 
> > > > > After this can you tell whether the test is still failing.
> > > > > 
> > > > > Also can you give me the complete output of proc/pid/maps
> > > > > after putting a sleep at the end of the test.
> > > > > 
> > > > on powerpc support DEFAULT_MAP_WINDOW is 128TB and with
> > > > total address space size is 4PB With hint it can map upto
> > > > 4PB. Since the hint addres is random in this test random hing VMAs
> > > > are getting created. IIUC this is expected only.
> > > > 
> > > > 
> > > > 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
> > > > 10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
> > > > 1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
> > > > 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
> > > > 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
> > > > 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
> > > > 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
> > > > 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
> > > > 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
> > > > 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
> > > > 7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
> > > > 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
> > > > 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
> > > > 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
> > > > 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
> > > > 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
> > > > 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > > > 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > > > 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > > > 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > > > eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > > > 
> > > > 
> > > > 
> > > > 
> > > > If I give the hint address serially from 128TB then the address
> > > > space is contigous and gap is also MAP_SIZE, the test is passing.
> > > > 
> > > > 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
> > > > 10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
> > > > 1006f0f0000-10071000000 rw-p 00000000 00:00 0
> > > > 10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
> > > > 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
> > > > 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
> > > > 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
> > > > 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
> > > > 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
> > > > 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
> > > > 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
> > > > 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
> > > > 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
> > > > 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
> > > > 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
> > > > 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
> > > > 800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
> > > > 
> > > > 
> > > Thank you for this output. I can't wrap my head around why this behaviour changes
> > > when you generate the hint sequentially. The mmap() syscall is supposed to do the
> > > following (irrespective of high VA space or not) - if the allocation at the hint
> > Yes, it is working as expected. On PowerPC, the DEFAULT_MAP_WINDOW is
> > 128TB, and the system can map up to 4PB.
> > 
> > In the test, the first mmap call maps memory up to 128TB without any
> > hint, so the VMAs are created below the 128TB boundary.
> > 
> > In the second mmap call, we provide a hint starting from 256TB, and
> > the hint address is generated randomly above 256TB. The mappings are
> > correctly created at these hint addresses. Since the hint addresses
> > are random, the resulting VMAs are also created at random locations.
> > 
> > So, what I tried is: mapping from 0 to 128TB without any hint, and
> > then for the second mmap, instead of starting the hint from 256TB, I
> > started from 128TB. Instead of using random hint addresses, I used
> > sequential hint addresses from 128TB up to 512TB. With this change,
> > the VMAs are created in order, and the test passes.
> > 
> > 800000000000-2aab000000000 r--p 00000000 00:00 0    128TB to 512TB VMA
> > 
> > I think we will see same behaviour on x86 with X86_FEATURE_LA57.
> > 
> > I will send the updated patch in V2.
> 
> Since you say it fails on both radix and hash, it means that the generic
> code path is failing. I see that on my system, when I run the test with
> LPA2 config, write() fails with errno set to -ENOMEM. Can you apply
> the following diff and check whether the test fails still. Doing this
> fixed it for arm64.
> 
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index b380e102b22f..3032902d01f2 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -173,10 +173,6 @@ static int validate_complete_va_space(void)
>                  */
>                 hop = 0;
>                 while (start_addr + hop < end_addr) {
> -                       if (write(fd, (void *)(start_addr + hop), 1) != 1)
> -                               return 1;
> -                       lseek(fd, 0, SEEK_SET);
> -
>                         if (is_marked_vma(vma_name))
>                                 munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);
>

Even with this change, the test is still failing. In this case,
we are allocating physical memory and writing into it, but our
issue seems to be with the gap between VMAs, so I believe this
might not be directly related.

I will send the next revision, in which the test passes and no
issues are observed.

Just curious — with LPA2, is the second mmap() call successful?
And are the VMAs being created at the hint address as expected?
 
> > 
> > > addr succeeds, then all is well, otherwise, do a top-down search for a large
> > > enough gap. I am not aware of the nuances in powerpc but I really am suspecting
> > > a bug in powerpc mmap code. Can you try to do some tracing - which function
> > > eventually fails to find the empty gap?
> > > 
> > > Through my limited code tracing - we should end up in slice_find_area_topdown,
> > > then we ask the generic code to find the gap using vm_unmapped_area. So I
> > > suspect something is happening between this, probably slice_scan_available().
> > > 
> > > > > >    From 0 to 128TB, we map memory directly without using any hint. For the range above
> > > > > > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > > > > > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > > > > > starting from 128TB.
> > > > > > 
> > > > > > With this change:
> > > > > > 
> > > > > > The 0–128TB range is mapped without hints and verified accordingly.
> > > > > > 
> > > > > > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > > > > > 
> > > > > > Below are the VMAs obtained with this approach:
> > > > > > 
> > > > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > > > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > > > > > 
> > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > @@ -56,21 +56,21 @@
> > > > > >     #ifdef __aarch64__
> > > > > >     #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > > > > > -#define HIGH_ADDR_SHIFT 49
> > > > > > +#define HIGH_ADDR_SHIFT 48
> > > > > >     #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> > > > > >     #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > > > > >     #else
> > > > > >     #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > > > > > -#define HIGH_ADDR_SHIFT 48
> > > > > > +#define HIGH_ADDR_SHIFT 47
> > > > > >     #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > > > > >     #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> > > > > >     #endif
> > > > > > -static char *hint_addr(void)
> > > > > > +static char *hint_addr(int hint)
> > > > > >     {
> > > > > > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > > > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > > > -       return (char *) (1UL << bits);
> > > > > > +       return (char *) (addr);
> > > > > >     }
> > > > > >     static void validate_addr(char *ptr, int high_addr)
> > > > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > > > >            }
> > > > > >            for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > > > -               hint = hint_addr();
> > > > > > +               hint = hint_addr(i);
> > > > > >                    hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > > > >                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Can we fix it this way?


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-25  9:36                                     ` Donet Tom
@ 2025-06-25 10:45                                       ` Dev Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-25 10:45 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 25/06/25 3:06 pm, Donet Tom wrote:
> On Tue, Jun 24, 2025 at 11:45:09AM +0530, Dev Jain wrote:
>> On 23/06/25 11:02 pm, Donet Tom wrote:
>>> On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
>>>> On 21/06/25 11:25 pm, Donet Tom wrote:
>>>>> On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
>>>>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>>>>>> first.
>>>>>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>>>>>> mapping count check.
>>>>>>>>>>>
>>>>>>>>>>> In do_mmap():
>>>>>>>>>>>
>>>>>>>>>>> 	/* Too many mappings? */
>>>>>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>>>>>> 		return -ENOMEM;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>>>>>> this.
>>>>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>>>>
>>>>>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>>>>>
>>>>>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>>>>>> is doing, I can do that : )
>>>>>>>>>>
>>>>>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>>>>>> it passes, but it might just not be of great value.
>>>>>>>>>
>>>>>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>>>>>> leave larger gaps if we feel like it :)
>>>>>>>> You have a fair point. Anyhow a debate for another day.
>>>>>>>>
>>>>>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>>>>>
>>>>>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>>>>>> implementation details.
>>>>>>>>>
>>>>>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>>>>>> userland VMA testing I'd say.
>>>>>>>>>
>>>>>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>>>>>> technical appraisal!
>>>>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>>>>
>>>>>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>>>>>> this later.
>>>>>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>>>>>> the gap check at the crossing boundary. What do you think?
>>>>>>>>
>>>>>>> One problem I am seeing with this approach is that, since the hint address
>>>>>>> is generated randomly, the VMAs are also created at random locations based on
>>>>>>> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
>>>>>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>>>>
>>>>>>> High address VMAs
>>>>>>> -----------------
>>>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>>>>
>>>>>>> I have a different approach to solve this issue.
>>>>>> It is really weird that such a large amount of VA space
>>>>>> is left between the two VMAs yet mmap is failing.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you please do the following:
>>>>>> set /proc/sys/vm/max_map_count to the highest value possible.
>>>>>> If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
>>>>>> In validate_complete_va_space:
>>>>>>
>>>>>> if (start_addr >= HIGH_ADDR_MARK && found == false) {
>>>>>> 	found = true;
>>>>>> 	continue;
>>>>>> }
>>>>> Thanks Dev for the suggestion. I set max_map_count and set overcommit
>>>>> memory to 1, added this code change as well, and then tried. Still, the
>>>>> test is failing
>>>>>
>>>>>> where found is initialized to false. This will skip the check
>>>>>> for the boundary.
>>>>>>
>>>>>> After this can you tell whether the test is still failing.
>>>>>>
>>>>>> Also can you give me the complete output of proc/pid/maps
>>>>>> after putting a sleep at the end of the test.
>>>>>>
>>>>> On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address
>>>>> space is 4PB, so with a hint mmap() can map up to 4PB. Since the
>>>>> hint address is random in this test, VMAs are created at random
>>>>> hint addresses. IIUC this is expected.
>>>>>
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 30000000-10030000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
>>>>> 10030770000-100307a0000 rw-p 00000000 00:00 0                            [heap]
>>>>> 1004f000000-7fff8f000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
>>>>> 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
>>>>> 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
>>>>> 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
>>>>> 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
>>>>> 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
>>>>> 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
>>>>> 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
>>>>> 7fff90160000-7fff901a0000 r--p 00000000 00:00 0                          [vvar]
>>>>> 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0                          [vdso]
>>>>> 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
>>>>> 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
>>>>> 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
>>>>> 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0                          [stack]
>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>>>> eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If I give the hint addresses serially from 128TB, then the address
>>>>> space is contiguous and the gap is also MAP_SIZE, so the test passes.
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10010000-10020000 r--p 00000000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10020000-10030000 rw-p 00010000 fd:05 134226638                          /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 33000000-10033000000 r--p 00000000 00:00 0                               [anon:virtual_address_range]
>>>>> 10033380000-100333b0000 rw-p 00000000 00:00 0                            [heap]
>>>>> 1006f0f0000-10071000000 rw-p 00000000 00:00 0
>>>>> 10071000000-7fffb1000000 r--p 00000000 00:00 0                           [anon:virtual_address_range]
>>>>> 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355                     /usr/lib64/libc.so.6
>>>>> 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355                     /usr/lib64/libc.so.6
>>>>> 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355                     /usr/lib64/libc.so.6
>>>>> 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358                     /usr/lib64/libm.so.6
>>>>> 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358                     /usr/lib64/libm.so.6
>>>>> 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358                     /usr/lib64/libm.so.6
>>>>> 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0                          [vvar]
>>>>> 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0                          [vdso]
>>>>> 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351                     /usr/lib64/ld64.so.2
>>>>> 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351                     /usr/lib64/ld64.so.2
>>>>> 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351                     /usr/lib64/ld64.so.2
>>>>> 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0                          [stack]
>>>>> 800000000000-2aab000000000 r--p 00000000 00:00 0                         [anon:virtual_address_range]
>>>>>
>>>>>
>>>> Thank you for this output. I can't wrap my head around why this behaviour changes
>>>> when you generate the hint sequentially. The mmap() syscall is supposed to do the
>>>> following (irrespective of high VA space or not) - if the allocation at the hint
>>> Yes, it is working as expected. On PowerPC, the DEFAULT_MAP_WINDOW is
>>> 128TB, and the system can map up to 4PB.
>>>
>>> In the test, the first mmap call maps memory up to 128TB without any
>>> hint, so the VMAs are created below the 128TB boundary.
>>>
>>> In the second mmap call, we provide a hint starting from 256TB, and
>>> the hint address is generated randomly above 256TB. The mappings are
>>> correctly created at these hint addresses. Since the hint addresses
>>> are random, the resulting VMAs are also created at random locations.
>>>
>>> So, what I tried is: mapping from 0 to 128TB without any hint, and
>>> then for the second mmap, instead of starting the hint from 256TB, I
>>> started from 128TB. Instead of using random hint addresses, I used
>>> sequential hint addresses from 128TB up to 512TB. With this change,
>>> the VMAs are created in order, and the test passes.
>>>
>>> 800000000000-2aab000000000 r--p 00000000 00:00 0    128TB to 512TB VMA
>>>
>>> I think we will see same behaviour on x86 with X86_FEATURE_LA57.
>>>
>>> I will send the updated patch in V2.
>> Since you say it fails on both radix and hash, it means that the generic
>> code path is failing. I see that on my system, when I run the test with
>> LPA2 config, write() fails with errno set to -ENOMEM. Can you apply
>> the following diff and check whether the test fails still. Doing this
>> fixed it for arm64.
>>
>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>> index b380e102b22f..3032902d01f2 100644
>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>> @@ -173,10 +173,6 @@ static int validate_complete_va_space(void)
>>                   */
>>                  hop = 0;
>>                  while (start_addr + hop < end_addr) {
>> -                       if (write(fd, (void *)(start_addr + hop), 1) != 1)
>> -                               return 1;
>> -                       lseek(fd, 0, SEEK_SET);
>> -
>>                          if (is_marked_vma(vma_name))
>>                                  munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);
>>
> Even with this change, the test is still failing. In this case,
> we are allocating physical memory and writing into it, but our
> issue seems to be with the gap between VMAs, so I believe this
> might not be directly related.
>
> I will send the next revision where the test passes and no
> issues are observed

But we are not solving the real problem - can you give me the diff
of the modified test, the sequential hinting you were talking
about?
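
For reference, the difference between the two hinting schemes can be
sketched in isolation. This is an illustration only: random_hint()
follows the arithmetic of the existing hint_addr(), sequential_hint()
follows the proposed change, both names are made up here, and CHUNK_SIZE
is assumed to be 1GB (matching the 0x40000000-sized high VMAs in the
maps output).

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed chunk size of 1GB; the real test uses MAP_CHUNK_SIZE. */
#define CHUNK_SIZE (1UL << 30)

/* Random scheme: a single set bit somewhere in [shift, 62]. */
unsigned long random_hint(int shift)
{
	int bits;

	assert(shift >= 0 && shift < 63);
	bits = shift + rand() % (63 - shift);
	return 1UL << bits;
}

/* Sequential scheme: chunk i sits i chunks above 1UL << shift. */
unsigned long sequential_hint(int shift, unsigned long i)
{
	assert(shift >= 0 && shift < 63);
	return (1UL << shift) + i * CHUNK_SIZE;
}
```

With a shift of 48, random_hint() can only return 15 distinct addresses
(2^48 through 2^62), so repeated hints necessarily collide, whereas
sequential_hint() yields a fresh, adjacent address for every chunk.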

>
> Just curious — with LPA2, is the second mmap() call successful?
> And are the VMAs being created at the hint address as expected?

mmap() is working as expected on LPA2: the first three mmap() calls
land at their hint addresses, after which mmap() falls back to
handing out addresses top-down, and the test passes once the gap
check at the boundary is elided.
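
The boundary elision can be sketched outside the test as a pure function
over a sorted list of VMAs. This is an assumption-laden illustration, not
the code in validate_complete_va_space(): struct vma_range and
check_gaps() are hypothetical names, a gap of max_gap or more is treated
as a failure, and the first VMA is assumed to start below high_mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of one /proc/<pid>/maps entry. */
struct vma_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Walk VMAs sorted by address. Return 1 if any gap between consecutive
 * VMAs is max_gap or larger, 0 otherwise. The gap in front of the first
 * VMA that starts at or above high_mark is skipped exactly once.
 */
int check_gaps(const struct vma_range *v, size_t n,
	       unsigned long max_gap, unsigned long high_mark)
{
	bool crossed = false;
	size_t i;

	assert(v != NULL || n == 0);

	for (i = 1; i < n; i++) {
		if (!crossed && v[i].start >= high_mark) {
			crossed = true;	/* elide the check at the boundary */
			continue;
		}
		if (v[i].start - v[i - 1].end >= max_gap)
			return 1;
	}
	return 0;
}
```

Fed the stack VMA followed by the high VMAs from the random-hint run
(2^48, 2^49, ...), this still returns 1, because the gaps between the
hinted VMAs themselves are far larger than one chunk. Fed the stack VMA
followed by the single 800000000000-2aab000000000 VMA from the
sequential run, it returns 0.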

>   
>>>> addr succeeds, then all is well, otherwise, do a top-down search for a large
>>>> enough gap. I am not aware of the nuances in powerpc but I really am suspecting
>>>> a bug in powerpc mmap code. Can you try to do some tracing - which function
>>>> eventually fails to find the empty gap?
>>>>
>>>> Through my limited code tracing - we should end up in slice_find_area_topdown,
>>>> then we ask the generic code to find the gap using vm_unmapped_area. So I
>>>> suspect something is happening between this, probably slice_scan_available().
>>>>
>>>>>>>     From 0 to 128TB, we map memory directly without using any hint. For the range above
>>>>>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>>>>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>>>>>> starting from 128TB.
>>>>>>>
>>>>>>> With this change:
>>>>>>>
>>>>>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>>>>>
>>>>>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>>>>>
>>>>>>> Below are the VMAs obtained with this approach:
>>>>>>>
>>>>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>>>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>>>> @@ -56,21 +56,21 @@
>>>>>>>      #ifdef __aarch64__
>>>>>>>      #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>>>>>> -#define HIGH_ADDR_SHIFT 49
>>>>>>> +#define HIGH_ADDR_SHIFT 48
>>>>>>>      #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>>>>>      #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>>>>>      #else
>>>>>>>      #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>>>>>> -#define HIGH_ADDR_SHIFT 48
>>>>>>> +#define HIGH_ADDR_SHIFT 47
>>>>>>>      #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>>>>>      #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>>>>>      #endif
>>>>>>> -static char *hint_addr(void)
>>>>>>> +static char *hint_addr(int hint)
>>>>>>>      {
>>>>>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>>>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>>>>> -       return (char *) (1UL << bits);
>>>>>>> +       return (char *) (addr);
>>>>>>>      }
>>>>>>>      static void validate_addr(char *ptr, int high_addr)
>>>>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>>>>             }
>>>>>>>             for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>>>>> -               hint = hint_addr();
>>>>>>> +               hint = hint_addr(i);
>>>>>>>                     hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>>>>                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can we fix it this way?


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-19  8:23                         ` Donet Tom
  2025-06-19  9:02                           ` Dev Jain
  2025-06-20 14:45                           ` Dev Jain
@ 2025-06-25 12:52                           ` Dev Jain
  2025-06-25 17:17                             ` Donet Tom
  2 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-25 12:52 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 19/06/25 1:53 pm, Donet Tom wrote:
> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>> first.
>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>> mapping count check.
>>>>>
>>>>> In do_mmap():
>>>>>
>>>>> 	/* Too many mappings? */
>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>> 		return -ENOMEM;
>>>>>
>>>>>
>>>>> As well as numerous other checks in mm/vma.c.
>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>> this.
>>> No problem! It's hard to be aware of everything in mm :)
>>>
>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>
>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>> is doing, I can do that : )
>>>>
>>> I just don't have time right now, I guess I'll have to come back to it
>>> later... it's not the end of the world for it to be iffy in my view as long as
>>> it passes, but it might just not be of great value.
>>>
>>> Philosophically I'd rather we didn't assert internal implementation details like
>>> where we place mappings in userland memory. At no point do we promise to not
>>> leave larger gaps if we feel like it :)
>> You have a fair point. Anyhow a debate for another day.
>>
>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>
>>> But again I'm not sure that achieves much and again also is asserting internal
>>> implementation details.
>>>
>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>> userland VMA testing I'd say.
>>>
>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>> technical appraisal!
>> Nah, it will be rather hilarious to see it all go down the drain xD
>>
>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>> this later.
>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>> the gap check at the crossing boundary. What do you think?
>>
> One problem I am seeing with this approach is that, since the hint address
> is generated randomly, the VMAs are also created at random locations based on
> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>
> High address VMAs
> -----------------
> 1000000000000-1000040000000 r--p 00000000 00:00 0
> 2000000000000-2000040000000 r--p 00000000 00:00 0
> 4000000000000-4000040000000 r--p 00000000 00:00 0
> 8000000000000-8000040000000 r--p 00000000 00:00 0
> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>
> I have a different approach to solve this issue.
>
>  From 0 to 128TB, we map memory directly without using any hint. For the range above
> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> we use random hint addresses, but I have modified it to generate hint addresses linearly
> starting from 128TB.
>
> With this change:
>
> The 0–128TB range is mapped without hints and verified accordingly.
>
> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>
> Below are the VMAs obtained with this approach:
>
> 10000000-10010000 r-xp 00000000 fd:05 135019531
> 10010000-10020000 r--p 00000000 fd:05 135019531
> 10020000-10030000 rw-p 00010000 fd:05 135019531
> 20000000-10020000000 r--p 00000000 00:00 0
> 10020800000-10020830000 rw-p 00000000 00:00 0
> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 4c4c35eac15e..0be008cba4b0 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -56,21 +56,21 @@
>   
>   #ifdef __aarch64__
>   #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> -#define HIGH_ADDR_SHIFT 49
> +#define HIGH_ADDR_SHIFT 48
>   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>   #else
>   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> -#define HIGH_ADDR_SHIFT 48
> +#define HIGH_ADDR_SHIFT 47
>   #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>   #endif
>   
> -static char *hint_addr(void)
> +static char *hint_addr(int hint)
>   {
> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>   
> -       return (char *) (1UL << bits);
> +       return (char *) (addr);
>   }
>   
>   static void validate_addr(char *ptr, int high_addr)
> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>          }
>   
>          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> -               hint = hint_addr();
> +               hint = hint_addr(i);
>                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

Ah you sent it here, thanks. This is fine really, but the mystery is
something else.


>
>
> Can we fix it this way?
>   



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-25 12:52                           ` Dev Jain
@ 2025-06-25 17:17                             ` Donet Tom
  2025-06-26  3:57                               ` Dev Jain
  0 siblings, 1 reply; 47+ messages in thread
From: Donet Tom @ 2025-06-25 17:17 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
> 
> On 19/06/25 1:53 pm, Donet Tom wrote:
> > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > first.
> > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > mapping count check.
> > > > > > 
> > > > > > In do_mmap():
> > > > > > 
> > > > > > 	/* Too many mappings? */
> > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > 		return -ENOMEM;
> > > > > > 
> > > > > > 
> > > > > > As well as numerous other checks in mm/vma.c.
> > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > this.
> > > > No problem! It's hard to be aware of everything in mm :)
> > > > 
> > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > 
> > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > is doing, I can do that : )
> > > > > 
> > > > I just don't have time right now, I guess I'll have to come back to it
> > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > it passes, but it might just not be of great value.
> > > > 
> > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > where we place mappings in userland memory. At no point do we promise to not
> > > > leave larger gaps if we feel like it :)
> > > You have a fair point. Anyhow a debate for another day.
> > > 
> > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > 
> > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > implementation details.
> > > > 
> > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > userland VMA testing I'd say.
> > > > 
> > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > technical appraisal!
> > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > 
> > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > this later.
> > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > the gap check at the crossing boundary. What do you think?
> > > 
> > One problem I am seeing with this approach is that, since the hint address
> > is generated randomly, the VMAs are also being created at random locations based on
> > the hint address. So, for the VMAs created at high addresses, we cannot guarantee
> > that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > 
> > High address VMAs
> > -----------------
> > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > 
> > I have a different approach to solve this issue.
> > 
> >  From 0 to 128TB, we map memory directly without using any hint. For the range above
> > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > starting from 128TB.
> > 
> > With this change:
> > 
> > The 0–128TB range is mapped without hints and verified accordingly.
> > 
> > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > 
> > Below are the VMAs obtained with this approach:
> > 
> > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > 10010000-10020000 r--p 00000000 fd:05 135019531
> > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > 20000000-10020000000 r--p 00000000 00:00 0
> > 10020800000-10020830000 rw-p 00000000 00:00 0
> > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 4c4c35eac15e..0be008cba4b0 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -56,21 +56,21 @@
> >   #ifdef __aarch64__
> >   #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > -#define HIGH_ADDR_SHIFT 49
> > +#define HIGH_ADDR_SHIFT 48
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> >   #else
> >   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > -#define HIGH_ADDR_SHIFT 48
> > +#define HIGH_ADDR_SHIFT 47
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> >   #endif
> > -static char *hint_addr(void)
> > +static char *hint_addr(int hint)
> >   {
> > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > -       return (char *) (1UL << bits);
> > +       return (char *) (addr);
> >   }
> >   static void validate_addr(char *ptr, int high_addr)
> > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> >          }
> >          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > -               hint = hint_addr();
> > +               hint = hint_addr(i);
> >                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> >                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
> Ah you sent it here, thanks. This is fine really, but the mystery is
> something else.
>

Thanks Dev

I can send out v2 with this patch included, right?

 
> 
> > 
> > 
> > Can we fix it this way?
> 



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-25 17:17                             ` Donet Tom
@ 2025-06-26  3:57                               ` Dev Jain
  2025-06-26  5:42                                 ` Donet Tom
  0 siblings, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-26  3:57 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 25/06/25 10:47 pm, Donet Tom wrote:
> On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>> first.
>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>> mapping count check.
>>>>>>>
>>>>>>> In do_mmap():
>>>>>>>
>>>>>>> 	/* Too many mappings? */
>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>> 		return -ENOMEM;
>>>>>>>
>>>>>>>
>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>> this.
>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>
>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>
>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>> is doing, I can do that : )
>>>>>>
>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>> it passes, but it might just not be of great value.
>>>>>
>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>> leave larger gaps if we feel like it :)
>>>> You have a fair point. Anyhow a debate for another day.
>>>>
>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>
>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>> implementation details.
>>>>>
>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>> userland VMA testing I'd say.
>>>>>
>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>> technical appraisal!
>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>
>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>> this later.
>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>> the gap check at the crossing boundary. What do you think?
>>>>
>>> One problem I am seeing with this approach is that, since the hint address
>>> is generated randomly, the VMAs are also being created at random locations based on
>>> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>
>>> High address VMAs
>>> -----------------
>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>
>>> I have a different approach to solve this issue.
>>>
>>>   From 0 to 128TB, we map memory directly without using any hint. For the range above
>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>> starting from 128TB.
>>>
>>> With this change:
>>>
>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>
>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>
>>> Below are the VMAs obtained with this approach:
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>> 20000000-10020000000 r--p 00000000 00:00 0
>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index 4c4c35eac15e..0be008cba4b0 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -56,21 +56,21 @@
>>>    #ifdef __aarch64__
>>>    #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>> -#define HIGH_ADDR_SHIFT 49
>>> +#define HIGH_ADDR_SHIFT 48
>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>    #else
>>>    #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>> -#define HIGH_ADDR_SHIFT 48
>>> +#define HIGH_ADDR_SHIFT 47
>>>    #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>    #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>    #endif
>>> -static char *hint_addr(void)
>>> +static char *hint_addr(int hint)
>>>    {
>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>> -       return (char *) (1UL << bits);
>>> +       return (char *) (addr);
>>>    }
>>>    static void validate_addr(char *ptr, int high_addr)
>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>           }
>>>           for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>> -               hint = hint_addr();
>>> +               hint = hint_addr(i);
>>>                   hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>> Ah you sent it here, thanks. This is fine really, but the mystery is
>> something else.
>>
> Thanks Dev
>
> I can send out v2 with this patch included, right?

Sorry, not yet :) This patch will just hide the real problem: after
the hint addresses are exhausted, why can the kernel on ppc not find
a gap in which to install a VMA, despite such large gaps between the
existing VMAs?

It should be quite easy to trace which function is failing. Can you
please do some debugging for me? Otherwise I will have to go ahead
with setting up a PPC VM and testing myself :)
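
Before tracing inside the kernel, a userspace probe can at least show
whether a hinted mmap() fails and with which errno. The following is
only a sketch: try_map() is a hypothetical helper, not part of the
selftest, and it does not identify the failing kernel function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: request one read-only anonymous mapping at `hint`
 * and report errno if the kernel refuses. Returns the mapped address,
 * or NULL on failure.
 */
static void *try_map(void *hint, size_t len)
{
	void *p = mmap(hint, len, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		fprintf(stderr, "mmap(hint=%p, len=%zu) failed: %s\n",
			hint, len, strerror(errno));
		return NULL;
	}
	/* Without MAP_FIXED the hint is advisory, so p may differ from it. */
	if (hint && p != hint)
		fprintf(stderr, "hint %p not honoured, got %p\n", hint, p);
	return p;
}
```

Calling try_map() in place of the bare mmap() in the high-address loop
would log every refused or relocated hint.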

>
>   
>>>
>>> Can we fix it this way?



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-26  3:57                               ` Dev Jain
@ 2025-06-26  5:42                                 ` Donet Tom
  2025-06-26  5:55                                   ` Dev Jain
  2025-06-26  6:35                                   ` Dev Jain
  0 siblings, 2 replies; 47+ messages in thread
From: Donet Tom @ 2025-06-26  5:42 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
> 
> On 25/06/25 10:47 pm, Donet Tom wrote:
> > On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
> > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > > > first.
> > > > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > > > mapping count check.
> > > > > > > > 
> > > > > > > > In do_mmap():
> > > > > > > > 
> > > > > > > > 	/* Too many mappings? */
> > > > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > > > 		return -ENOMEM;
> > > > > > > > 
> > > > > > > > 
> > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > > > this.
> > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > 
> > > > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > > > 
> > > > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > > > is doing, I can do that : )
> > > > > > > 
> > > > > > I just don't have time right now, I guess I'll have to come back to it
> > > > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > > > it passes, but it might just not be of great value.
> > > > > > 
> > > > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > > > where we place mappings in userland memory. At no point do we promise to not
> > > > > > leave larger gaps if we feel like it :)
> > > > > You have a fair point. Anyhow a debate for another day.
> > > > > 
> > > > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > > > 
> > > > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > > > implementation details.
> > > > > > 
> > > > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > > > userland VMA testing I'd say.
> > > > > > 
> > > > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > > > technical appraisal!
> > > > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > > > 
> > > > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > > > this later.
> > > > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > > > the gap check at the crossing boundary. What do you think?
> > > > > 
> > > > One problem I am seeing with this approach is that, since the hint address
> > > > is generated randomly, the VMAs are also being created at random locations based on
> > > > the hint address. So, for the VMAs created at high addresses, we cannot guarantee
> > > > that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > > > 
> > > > High address VMAs
> > > > -----------------
> > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > 
> > > > I have a different approach to solve this issue.
> > > > 
> > > >   From 0 to 128TB, we map memory directly without using any hint. For the range above
> > > > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > > > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > > > starting from 128TB.
> > > > 
> > > > With this change:
> > > > 
> > > > The 0–128TB range is mapped without hints and verified accordingly.
> > > > 
> > > > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > > > 
> > > > Below are the VMAs obtained with this approach:
> > > > 
> > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > > > 
> > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > @@ -56,21 +56,21 @@
> > > >    #ifdef __aarch64__
> > > >    #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > > > -#define HIGH_ADDR_SHIFT 49
> > > > +#define HIGH_ADDR_SHIFT 48
> > > >    #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> > > >    #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > > >    #else
> > > >    #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > > > -#define HIGH_ADDR_SHIFT 48
> > > > +#define HIGH_ADDR_SHIFT 47
> > > >    #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > > >    #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> > > >    #endif
> > > > -static char *hint_addr(void)
> > > > +static char *hint_addr(int hint)
> > > >    {
> > > > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > -       return (char *) (1UL << bits);
> > > > +       return (char *) (addr);
> > > >    }
> > > >    static void validate_addr(char *ptr, int high_addr)
> > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > >           }
> > > >           for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > -               hint = hint_addr();
> > > > +               hint = hint_addr(i);
> > > >                   hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > >                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > Ah you sent it here, thanks. This is fine really, but the mystery is
> > > something else.
> > > 
> > Thanks Dev
> > 
> > I can send out v2 with this patch included, right?
> 
> Sorry not yet :) this patch will just hide the real problem, which
> is, after the hint addresses get exhausted, why on ppc the kernel
> cannot find a VMA to install despite having such large gaps between
> VMAs.


I think there is some confusion here, so let me clarify.

On PowerPC, mmap is able to find VMAs both with and without a hint.
There is no issue there. If you look at the test, from 0 to 128TB we
are mapping without any hint, and the VMAs are getting created as
expected.

Above 256TB, we are mapping with random hint addresses, and with
those hints, all VMAs are being created at or above 256TB. No mmap call
is failing in this case.

The problem is with the test itself: since we are providing random
hint addresses, the VMAs are also being created at random locations.

Below are the VMAs created with hint addresses:

1. 256TB hint address
1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]

2. 512TB hint address
2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]

3. 1024TB hint address
4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]

4. 2048TB hint address
8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]

5. Hint addresses above 3096TB
eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range]


We support up to 4PB, and since the hint addresses are random,
the VMAs are created at random locations.
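
To make the hint values concrete: the current hint_addr() returns
1UL << bits with bits between HIGH_ADDR_SHIFT and 62, so every hint is
a power of two. The sketch below lists them in TB. The helper names
are hypothetical and a 64-bit unsigned long is assumed.

```c
#include <stdio.h>

#define TB (1UL << 40)

/* Size in TB of the hint 1UL << bits, as produced by the old hint_addr(). */
static unsigned long hint_in_tb(int bits)
{
	return (1UL << bits) / TB;
}

/* Print every hint the old hint_addr() could return for a given shift. */
static void print_random_hints(int high_addr_shift)
{
	for (int bits = high_addr_shift; bits < 63; bits++)
		printf("bits=%d -> hint at %luTB\n", bits, hint_in_tb(bits));
}
```

With HIGH_ADDR_SHIFT = 48, only bits 48 to 51 give hints below 4PB
(256TB, 512TB, 1024TB and 2048TB), which matches the four 1GB VMAs
listed above. Hints from bit 52 upward are at or beyond 4PB, so
presumably they cannot be honoured, which would be consistent with the
single large VMA near the top of the address space.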

With sequential hint addresses from 128TB to 512TB, we provide the
hint addresses in order, and the VMAs are created at the hinted
addresses.

Within 512TB, we were able to test both high and low addresses, so
I thought sequential hinting would be a good approach. Since there
has been a lot of confusion, I’m considering adding a complete 4PB
allocation test: from 0 to 128TB we allocate without any hint, and
from 128TB onward we use sequential hint addresses.

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index e24c36a39f22..f2009d23f8b2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -50,6 +50,7 @@
 #define NR_CHUNKS_256TB   (NR_CHUNKS_128TB * 2UL)
 #define NR_CHUNKS_384TB   (NR_CHUNKS_128TB * 3UL)
 #define NR_CHUNKS_3840TB  (NR_CHUNKS_128TB * 30UL)
+#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31UL)
 
 #define ADDR_MARK_128TB  (1UL << 47) /* First address beyond 128TB */
 #define ADDR_MARK_256TB  (1UL << 48) /* First address beyond 256TB */
@@ -59,6 +60,11 @@
 #define HIGH_ADDR_SHIFT 49
 #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
 #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
+#elif defined(__PPC64__)
+#define HIGH_ADDR_MARK  ADDR_MARK_128TB
+#define HIGH_ADDR_SHIFT 47
+#define NR_CHUNKS_LOW   NR_CHUNKS_128TB
+#define NR_CHUNKS_HIGH  NR_CHUNKS_3968TB
 #else
 #define HIGH_ADDR_MARK  ADDR_MARK_128TB
 #define HIGH_ADDR_SHIFT 48


With this change, the test passes.
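
The arithmetic behind NR_CHUNKS_3968TB can be checked with a small
sketch. It assumes MAP_CHUNK_SIZE is 1GB (inferred from the 0x40000000
span of each high VMA listed above) and that the sequential
hint_addr(i) from the earlier diff is used together with these
defines. seq_hint() and seq_hint_end() are hypothetical names.

```c
#define SZ_1GB           (1UL << 30)
#define SZ_1TB           (1UL << 40)
#define MAP_CHUNK_SIZE   SZ_1GB		/* assumption, inferred from the maps output */
#define NR_CHUNKS_128TB  ((128 * SZ_1TB) / MAP_CHUNK_SIZE)
#define NR_CHUNKS_3968TB (NR_CHUNKS_128TB * 31UL)
#define HIGH_ADDR_SHIFT  47

/* Sequential hint for chunk i, mirroring the proposed hint_addr(i). */
static unsigned long seq_hint(unsigned long i)
{
	return (1UL << HIGH_ADDR_SHIFT) + i * MAP_CHUNK_SIZE;
}

/* First byte past the last hinted chunk. */
static unsigned long seq_hint_end(void)
{
	return seq_hint(NR_CHUNKS_3968TB - 1) + MAP_CHUNK_SIZE;
}
```

Under these assumptions the hints start at 128TB (1UL << 47) and the
last chunk ends at 128TB + 3968TB = 4096TB (1UL << 52), that is,
exactly at the 4PB limit.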



> 
> It should be quite easy to trace which function is failing. Can you
> please do some debugging for me? Otherwise I will have to go ahead
> with setting up a PPC VM and testing myself :)
> 
> > 
> > > > 
> > > > Can we fix it this way?



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-26  5:42                                 ` Donet Tom
@ 2025-06-26  5:55                                   ` Dev Jain
  2025-06-26  6:35                                   ` Dev Jain
  1 sibling, 0 replies; 47+ messages in thread
From: Dev Jain @ 2025-06-26  5:55 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 26/06/25 11:12 am, Donet Tom wrote:
> On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
>> On 25/06/25 10:47 pm, Donet Tom wrote:
>>> On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
>>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>>>> first.
>>>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>>>> mapping count check.
>>>>>>>>>
>>>>>>>>> In do_mmap():
>>>>>>>>>
>>>>>>>>> 	/* Too many mappings? */
>>>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>>>> 		return -ENOMEM;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>>>> this.
>>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>>
>>>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>>>
>>>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>>>> is doing, I can do that : )
>>>>>>>>
>>>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>>>> it passes, but it might just not be of great value.
>>>>>>>
>>>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>>>> leave larger gaps if we feel like it :)
>>>>>> You have a fair point. Anyhow a debate for another day.
>>>>>>
>>>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>>>
>>>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>>>> implementation details.
>>>>>>>
>>>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>>>> userland VMA testing I'd say.
>>>>>>>
>>>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>>>> technical appraisal!
>>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>>
>>>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>>>> this later.
>>>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>>>> the gap check at the crossing boundary. What do you think?
>>>>>>
>>>>> One problem I am seeing with this approach is that, since the hint address
>>>>> is generated randomly, the VMAs are also being created at random locations based on
>>>>> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
>>>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>>
>>>>> High address VMAs
>>>>> -----------------
>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>>
>>>>> I have a different approach to solve this issue.
>>>>>
>>>>>    From 0 to 128TB, we map memory directly without using any hint. For the range above
>>>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>>>> starting from 128TB.
>>>>>
>>>>> With this change:
>>>>>
>>>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>>>
>>>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>>>
>>>>> Below are the VMAs obtained with this approach:
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>>>
>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> @@ -56,21 +56,21 @@
>>>>>     #ifdef __aarch64__
>>>>>     #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>>>> -#define HIGH_ADDR_SHIFT 49
>>>>> +#define HIGH_ADDR_SHIFT 48
>>>>>     #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>>>     #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>>>     #else
>>>>>     #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>>>> -#define HIGH_ADDR_SHIFT 48
>>>>> +#define HIGH_ADDR_SHIFT 47
>>>>>     #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>>>     #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>>>     #endif
>>>>> -static char *hint_addr(void)
>>>>> +static char *hint_addr(int hint)
>>>>>     {
>>>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>>> -       return (char *) (1UL << bits);
>>>>> +       return (char *) (addr);
>>>>>     }
>>>>>     static void validate_addr(char *ptr, int high_addr)
>>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>>            }
>>>>>            for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>>> -               hint = hint_addr();
>>>>> +               hint = hint_addr(i);
>>>>>                    hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>>                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>> Ah you sent it here, thanks. This is fine really, but the mystery is
>>>> something else.
>>>>
>>> Thanks Dev
>>>
>>> I can send out v2 with this patch included, right?
>> Sorry not yet :) this patch will just hide the real problem, which
>> is, after the hint addresses get exhausted, why on ppc the kernel
>> cannot find a VMA to install despite having such large gaps between
>> VMAs.
>
> I think there is some confusion here, so let me clarify.
>
> On PowerPC, mmap is able to find VMAs both with and without a hint.
> There is no issue there. If you look at the test, from 0 to 128TB we
> are mapping without any hint, and the VMAs are getting created as
> expected.
>
> Above 256TB, we are mapping with random hint addresses, and with
> those hints, all VMAs are being created above 258TB. No mmap call
> is failing in this case.
>
> The problem is with the test itself: since we are providing random
> hint addresses, the VMAs are also being created at random locations.
>
> Below are the VMAs created with hint addresses:
>
> 1. 256TB hint address
>
> 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 2. 512TB hint address
> 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 3. 1024TB Hint address
> 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 4. 2048TB hint Address
> 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 5. above 3096 Hint address
> eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range].
>
>
> We support up to 4PB, and since the hint addresses are random,
> the VMAs are created at random locations.

I still don't follow. What happens on my system (and what should
happen) is this: suppose that, after you get the VMA layout you described
above, you do another mmap with a random high address hint, say 1 << 50.
mmap will see that there is already a VMA there, so we fall back to
top-down allocation, and the fifth VMA described above gets expanded
downwards. This keeps happening until the gap between the start of the
fifth VMA and the end of the fourth VMA is less than MAP_CHUNK_SIZE.

Then, mmap will extend the fourth VMA downwards, and so on. Eventually
all the gaps will be less than MAP_CHUNK_SIZE.
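
To make the first step concrete, here is a minimal userspace sketch. It is
not part of the selftest, and map_with_hint() and
occupied_hint_is_relocated() are hypothetical helper names. It only shows
that an mmap() hint without MAP_FIXED is not honoured when the hinted range
is already occupied; where the kernel places the mapping instead depends on
the architecture and the current layout, and is not asserted here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous read-only memory at hint, WITHOUT MAP_FIXED.
 * Returns the address the kernel actually chose, or MAP_FAILED.
 */
static void *map_with_hint(void *hint, size_t len)
{
	assert(len > 0);
	return mmap(hint, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/*
 * Returns 1 if a second mapping requested at an already-occupied hint was
 * placed somewhere else, 0 if it was not, and -1 if mmap() failed.
 */
static int occupied_hint_is_relocated(size_t len)
{
	void *first = map_with_hint(NULL, len);
	void *second;
	int relocated;

	if (first == MAP_FAILED)
		return -1;

	/* The hint now points at an existing VMA. */
	second = map_with_hint(first, len);
	if (second == MAP_FAILED) {
		munmap(first, len);
		return -1;
	}

	relocated = (second != first);
	munmap(second, len);
	munmap(first, len);
	return relocated;
}
```

Calling occupied_hint_is_relocated(1UL << 20) from a small main() should
return 1, since a non-MAP_FIXED mmap() never replaces an existing mapping.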

>
> With sequential hint addresses from 128TB to 512TB, we provide the
> hint addresses in order, and the VMAs are created at the hinted
> addresses.
>
> Within 512TB, we were able to test both high and low addresses, so
> I thought sequential hinting would be a good approach. Since there
> has been a lot of confusion, I’m considering adding a complete 4PB
> allocation test — from 0 to 128TB we allocate without any hint, and
> from 128TB onward we use sequential hint addresses.
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index e24c36a39f22..f2009d23f8b2 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -50,6 +50,7 @@
>   #define NR_CHUNKS_256TB   (NR_CHUNKS_128TB * 2UL)
>   #define NR_CHUNKS_384TB   (NR_CHUNKS_128TB * 3UL)
>   #define NR_CHUNKS_3840TB  (NR_CHUNKS_128TB * 30UL)
> +#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31UL)
>   
>   #define ADDR_MARK_128TB  (1UL << 47) /* First address beyond 128TB */
>   #define ADDR_MARK_256TB  (1UL << 48) /* First address beyond 256TB */
> @@ -59,6 +60,11 @@
>   #define HIGH_ADDR_SHIFT 49
>   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> +#elif defined(__PPC64__)
> +#define HIGH_ADDR_MARK  ADDR_MARK_128TB
> +#define HIGH_ADDR_SHIFT 47
> +#define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> +#define NR_CHUNKS_HIGH  NR_CHUNKS_3968TB
>   #else
>   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>   #define HIGH_ADDR_SHIFT 48
>
>
> With this the test is passing.
>
>
>
>> It should be quite easy to trace which function is failing. Can you
>> please do some debugging for me? Otherwise I will have to go ahead
>> with setting up a PPC VM and testing myself :)
>>
>>>>> Can we fix it this way?



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-26  5:42                                 ` Donet Tom
  2025-06-26  5:55                                   ` Dev Jain
@ 2025-06-26  6:35                                   ` Dev Jain
  2025-06-26  6:52                                     ` Donet Tom
  1 sibling, 1 reply; 47+ messages in thread
From: Dev Jain @ 2025-06-26  6:35 UTC (permalink / raw)
  To: Donet Tom
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list


On 26/06/25 11:12 am, Donet Tom wrote:
> On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
>> On 25/06/25 10:47 pm, Donet Tom wrote:
>>> On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
>>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>>> Are you accounting for sys.max_map_count? If not, then you'll be hitting that
>>>>>>>>>>> first.
>>>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>>>> mapping count check.
>>>>>>>>>
>>>>>>>>> In do_mmap():
>>>>>>>>>
>>>>>>>>> 	/* Too many mappings? */
>>>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>>>> 		return -ENOMEM;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>>>> this.
>>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>>
>>>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>>>
>>>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>>>> is doing, I can do that : )
>>>>>>>>
>>>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>>>> it passes, but it might just not be of great value.
>>>>>>>
>>>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>>>> leave larger gaps if we feel like it :)
>>>>>> You have a fair point. Anyhow a debate for another day.
>>>>>>
>>>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>>>
>>>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>>>> implementation details.
>>>>>>>
>>>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>>>> userland VMA testing I'd say.
>>>>>>>
>>>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>>>> technical appraisal!
>>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>>
>>>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>>>> this later.
>>>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>>>> the gap check at the crossing boundary. What do you think?
>>>>>>
>>>>> One problem I am seeing with this approach is that, since the hint address
>>>>> is generated randomly, the VMAs are also being created at random locations based on
>>>>> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
>>>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>>
>>>>> High address VMAs
>>>>> -----------------
>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>>
>>>>> I have a different approach to solve this issue.
>>>>>
>>>>>    From 0 to 128TB, we map memory directly without using any hint. For the range above
>>>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>>>> starting from 128TB.
>>>>>
>>>>> With this change:
>>>>>
>>>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>>>
>>>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>>>
>>>>> Below are the VMAs obtained with this approach:
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>>> 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
>>>>>
>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> @@ -56,21 +56,21 @@
>>>>>     #ifdef __aarch64__
>>>>>     #define HIGH_ADDR_MARK  ADDR_MARK_256TB
>>>>> -#define HIGH_ADDR_SHIFT 49
>>>>> +#define HIGH_ADDR_SHIFT 48
>>>>>     #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>>>>>     #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
>>>>>     #else
>>>>>     #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>>>>> -#define HIGH_ADDR_SHIFT 48
>>>>> +#define HIGH_ADDR_SHIFT 47
>>>>>     #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
>>>>>     #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
>>>>>     #endif
>>>>> -static char *hint_addr(void)
>>>>> +static char *hint_addr(int hint)
>>>>>     {
>>>>> -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>>> +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>>> -       return (char *) (1UL << bits);
>>>>> +       return (char *) (addr);
>>>>>     }
>>>>>     static void validate_addr(char *ptr, int high_addr)
>>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>>            }
>>>>>            for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>>> -               hint = hint_addr();
>>>>> +               hint = hint_addr(i);
>>>>>                    hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>>                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>> Ah you sent it here, thanks. This is fine really, but the mystery is
>>>> something else.
>>>>
>>> Thanks Dev
>>>
>>> I can send out v2 with this patch included, right?
>> Sorry not yet :) this patch will just hide the real problem, which
>> is, after the hint addresses get exhausted, why on ppc the kernel
>> cannot find a VMA to install despite having such large gaps between
>> VMAs.
>
> I think there is some confusion here, so let me clarify.
>
> On PowerPC, mmap is able to find VMAs both with and without a hint.
> There is no issue there. If you look at the test, from 0 to 128TB we
> are mapping without any hint, and the VMAs are getting created as
> expected.
>
> Above 256TB, we are mapping with random hint addresses, and with
> those hints, all VMAs are being created above 258TB. No mmap call
> is failing in this case.
>
> The problem is with the test itself: since we are providing random
> hint addresses, the VMAs are also being created at random locations.
>
> Below are the VMAs created with hint addresses:
>
> 1. 256TB hint address
>
> 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 2. 512TB hint address
> 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 3. 1024TB Hint address
> 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 4. 2048TB hint Address
> 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
>
> 5. above 3096 Hint address
> eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range].
>
>
> We support up to 4PB, and since the hint addresses are random,
> the VMAs are created at random locations.
>
> With sequential hint addresses from 128TB to 512TB, we provide the
> hint addresses in order, and the VMAs are created at the hinted
> addresses.
>
> Within 512TB, we were able to test both high and low addresses, so
> I thought sequential hinting would be a good approach. Since there
> has been a lot of confusion, I’m considering adding a complete 4PB
> allocation test — from 0 to 128TB we allocate without any hint, and
> from 128TB onward we use sequential hint addresses.
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index e24c36a39f22..f2009d23f8b2 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -50,6 +50,7 @@
>   #define NR_CHUNKS_256TB   (NR_CHUNKS_128TB * 2UL)
>   #define NR_CHUNKS_384TB   (NR_CHUNKS_128TB * 3UL)
>   #define NR_CHUNKS_3840TB  (NR_CHUNKS_128TB * 30UL)
> +#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31UL)
>   
>   #define ADDR_MARK_128TB  (1UL << 47) /* First address beyond 128TB */
>   #define ADDR_MARK_256TB  (1UL << 48) /* First address beyond 256TB */
> @@ -59,6 +60,11 @@
>   #define HIGH_ADDR_SHIFT 49
>   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
>   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> +#elif defined(__PPC64__)
> +#define HIGH_ADDR_MARK  ADDR_MARK_128TB
> +#define HIGH_ADDR_SHIFT 47
> +#define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> +#define NR_CHUNKS_HIGH  NR_CHUNKS_3968TB
>   #else
>   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
>   #define HIGH_ADDR_SHIFT 48
>
>
> With this the test is passing.

Ah okay, this was the problem: PPC was extended to a 52-bit virtual
address space and the test was not updated. This is the correct fix,
you can go ahead with this one.
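
As a rough sanity check of the chunk counts, the arithmetic can be sketched
as below. This is not code from the test: nr_chunks_high() is a hypothetical
helper, and a MAP_CHUNK_SIZE of 1GB is an assumption inferred from the 1GB
VMAs quoted above.

```c
#include <assert.h>

#define SZ_1GB          (1024ULL * 1024 * 1024)
#define MAP_CHUNK_SIZE  SZ_1GB  /* assumption: inferred from the 1GB VMAs */

/*
 * Number of MAP_CHUNK_SIZE chunks that fit between (1 << low_shift) and
 * the top of a va_bits-wide user address space.
 */
static unsigned long long nr_chunks_high(unsigned int low_shift,
					 unsigned int va_bits)
{
	assert(low_shift < va_bits && va_bits < 64);
	return ((1ULL << va_bits) - (1ULL << low_shift)) / MAP_CHUNK_SIZE;
}
```

With these assumptions, nr_chunks_high(47, 52) covers 4PB - 128TB = 3968TB,
i.e. 31 times the number of chunks in 128TB, which is what the new
NR_CHUNKS_3968TB define expresses. Likewise nr_chunks_high(47, 49) gives the
384TB case and nr_chunks_high(48, 52) the 3840TB case.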

>
>
>
>> It should be quite easy to trace which function is failing. Can you
>> please do some debugging for me? Otherwise I will have to go ahead
>> with setting up a PPC VM and testing myself :)
>>
>>>>> Can we fix it this way?



* Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
  2025-06-26  6:35                                   ` Dev Jain
@ 2025-06-26  6:52                                     ` Donet Tom
  0 siblings, 0 replies; 47+ messages in thread
From: Donet Tom @ 2025-06-26  6:52 UTC (permalink / raw)
  To: Dev Jain
  Cc: Lorenzo Stoakes, Aboorva Devarajan, akpm, Liam.Howlett, shuah,
	pfalcato, david, ziy, baolin.wang, npache, ryan.roberts, baohua,
	linux-mm, linux-kselftest, linux-kernel, ritesh.list

On Thu, Jun 26, 2025 at 12:05:11PM +0530, Dev Jain wrote:
> 
> On 26/06/25 11:12 am, Donet Tom wrote:
> > On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
> > > On 25/06/25 10:47 pm, Donet Tom wrote:
> > > > On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
> > > > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > > > > > first.
> > > > > > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > > > > > mapping count check.
> > > > > > > > > > 
> > > > > > > > > > In do_mmap():
> > > > > > > > > > 
> > > > > > > > > > 	/* Too many mappings? */
> > > > > > > > > > 	if (mm->map_count > sysctl_max_map_count)
> > > > > > > > > > 		return -ENOMEM;
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > > > > > this.
> > > > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > > > 
> > > > > > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > > > > > 
> > > > > > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > > > > > is doing, I can do that : )
> > > > > > > > > 
> > > > > > > > I just don't have time right now, I guess I'll have to come back to it
> > > > > > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > > > > > it passes, but it might just not be of great value.
> > > > > > > > 
> > > > > > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > > > > > where we place mappings in userland memory. At no point do we promise to not
> > > > > > > > leave larger gaps if we feel like it :)
> > > > > > > You have a fair point. Anyhow a debate for another day.
> > > > > > > 
> > > > > > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > > > > > 
> > > > > > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > > > > > implementation details.
> > > > > > > > 
> > > > > > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > > > > > userland VMA testing I'd say.
> > > > > > > > 
> > > > > > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > > > > > technical appraisal!
> > > > > > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > > > > > 
> > > > > > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > > > > > this later.
> > > > > > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > > > > > the gap check at the crossing boundary. What do you think?
> > > > > > > 
> > > > > > One problem I am seeing with this approach is that, since the hint address
> > > > > > is generated randomly, the VMAs are also being created at random locations based on
> > > > > > the hint address. So, for the VMAs created at high addresses, we cannot guarantee
> > > > > > that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > > > > > 
> > > > > > High address VMAs
> > > > > > -----------------
> > > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > > > 
> > > > > > I have a different approach to solve this issue.
> > > > > > 
> > > > > >    From 0 to 128TB, we map memory directly without using any hint. For the range above
> > > > > > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > > > > > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > > > > > starting from 128TB.
> > > > > > 
> > > > > > With this change:
> > > > > > 
> > > > > > The 0–128TB range is mapped without hints and verified accordingly.
> > > > > > 
> > > > > > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > > > > > 
> > > > > > Below are the VMAs obtained with this approach:
> > > > > > 
> > > > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > > > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > > > > > 
> > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > @@ -56,21 +56,21 @@
> > > > > >     #ifdef __aarch64__
> > > > > >     #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > > > > > -#define HIGH_ADDR_SHIFT 49
> > > > > > +#define HIGH_ADDR_SHIFT 48
> > > > > >     #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> > > > > >     #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > > > > >     #else
> > > > > >     #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > > > > > -#define HIGH_ADDR_SHIFT 48
> > > > > > +#define HIGH_ADDR_SHIFT 47
> > > > > >     #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > > > > >     #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> > > > > >     #endif
> > > > > > -static char *hint_addr(void)
> > > > > > +static char *hint_addr(int hint)
> > > > > >     {
> > > > > > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > > > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > > > -       return (char *) (1UL << bits);
> > > > > > +       return (char *) (addr);
> > > > > >     }
> > > > > >     static void validate_addr(char *ptr, int high_addr)
> > > > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > > > >            }
> > > > > >            for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > > > -               hint = hint_addr();
> > > > > > +               hint = hint_addr(i);
> > > > > >                    hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > > > >                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > > > Ah you sent it here, thanks. This is fine really, but the mystery is
> > > > > something else.
> > > > > 
> > > > Thanks Dev
> > > > 
> > > > I can send out v2 with this patch included, right?
> > > Sorry not yet :) this patch will just hide the real problem, which
> > > is, after the hint addresses get exhausted, why on ppc the kernel
> > > cannot find a VMA to install despite having such large gaps between
> > > VMAs.
> > 
> > I think there is some confusion here, so let me clarify.
> > 
> > On PowerPC, mmap is able to find VMAs both with and without a hint.
> > There is no issue there. If you look at the test, from 0 to 128TB we
> > are mapping without any hint, and the VMAs are getting created as
> > expected.
> > 
> > Above 256TB, we are mapping with random hint addresses, and with
> > those hints, all VMAs are being created above 258TB. No mmap call
> > is failing in this case.
> > 
> > The problem is with the test itself: since we are providing random
> > hint addresses, the VMAs are also being created at random locations.
> > 
> > Below are the VMAs created with hint addresses:
> > 
> > 1. 256TB hint address
> > 
> > 1000000000000-1000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 
> > 2. 512TB hint address
> > 2000000000000-2000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 
> > 3. 1024TB Hint address
> > 4000000000000-4000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 
> > 4. 2048TB hint Address
> > 8000000000000-8000040000000 r--p 00000000 00:00 0                        [anon:virtual_address_range]
> > 
> > 5. above 3096 Hint address
> > eb95410220000-fffff90220000 r--p 00000000 00:00 0                        [anon:virtual_address_range].
> > 
> > 
> > We support up to 4PB, and since the hint addresses are random,
> > the VMAs are created at random locations.
> > 
> > With sequential hint addresses from 128TB to 512TB, we provide the
> > hint addresses in order, and the VMAs are created at the hinted
> > addresses.
> > 
> > Within 512TB, we were able to test both high and low addresses, so
> > I thought sequential hinting would be a good approach. Since there
> > has been a lot of confusion, I'm considering adding a complete 4PB
> > allocation test: from 0 to 128TB we allocate without any hint, and
> > from 128TB onward we use sequential hint addresses.
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index e24c36a39f22..f2009d23f8b2 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -50,6 +50,7 @@
> >   #define NR_CHUNKS_256TB   (NR_CHUNKS_128TB * 2UL)
> >   #define NR_CHUNKS_384TB   (NR_CHUNKS_128TB * 3UL)
> >   #define NR_CHUNKS_3840TB  (NR_CHUNKS_128TB * 30UL)
> > +#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31UL)
> >   #define ADDR_MARK_128TB  (1UL << 47) /* First address beyond 128TB */
> >   #define ADDR_MARK_256TB  (1UL << 48) /* First address beyond 256TB */
> > @@ -59,6 +60,11 @@
> >   #define HIGH_ADDR_SHIFT 49
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > +#elif defined(__PPC64__)
> > +#define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > +#define HIGH_ADDR_SHIFT 47
> > +#define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > +#define NR_CHUNKS_HIGH  NR_CHUNKS_3968TB
> >   #else
> >   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> >   #define HIGH_ADDR_SHIFT 48
> > 
> > 
> > With this the test is passing.
> 
> Ah okay, this was the problem: PPC was extended to 52 bits and the
> test was not updated. This is the correct fix; you can go ahead
> with this one.


Thanks Dev
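
For reference, the arithmetic behind the new PPC64 constants can be
checked with a small standalone sketch. It assumes MAP_CHUNK_SIZE is
1GB (inferred from the 0x40000000-byte mappings quoted below) and that
NR_CHUNKS_128TB is 128TB divided by that chunk size; the helper names
total_mapped_bytes() and address_bits() are hypothetical and are not
part of the selftest.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: the quoted mappings are 0x40000000 bytes, i.e. 1GB each. */
#define SZ_1GB            (1ULL << 30)
#define SZ_1TB            (1ULL << 40)
#define MAP_CHUNK_SIZE    SZ_1GB
#define NR_CHUNKS_128TB   ((128 * SZ_1TB) / MAP_CHUNK_SIZE)
#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31ULL)

/* PPC64 values taken from the diff in this thread. */
#define NR_CHUNKS_LOW     NR_CHUNKS_128TB
#define NR_CHUNKS_HIGH    NR_CHUNKS_3968TB

/* Hypothetical helper: bytes covered by the low plus high chunk arrays. */
static uint64_t total_mapped_bytes(void)
{
	return (NR_CHUNKS_LOW + NR_CHUNKS_HIGH) * MAP_CHUNK_SIZE;
}

/* Hypothetical helper: floor(log2(bytes)), exact for powers of two. */
static unsigned int address_bits(uint64_t bytes)
{
	unsigned int bits = 0;

	while (bytes > 1) {
		bytes >>= 1;
		bits++;
	}
	return bits;
}
```

Under these assumptions, 128TB of low chunks plus 3968TB of high chunks
gives 4096TB (4PB), so address_bits(total_mapped_bytes()) evaluates to
52, matching the 52-bit address space mentioned above.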
 
> > 
> > 
> > 
> > > It should be quite easy to trace which function is failing. Can you
> > > please do some debugging for me? Otherwise I will have to go ahead
> > > with setting up a PPC VM and testing myself :)
> > > 
> > > > > > Can we fix it this way?
> 



end of thread, other threads:[~2025-06-26  6:52 UTC | newest]

Thread overview: 47+ messages
2025-06-16 16:06 [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Aboorva Devarajan
2025-06-16 16:06 ` [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues Aboorva Devarajan
2025-06-16 16:27   ` Dev Jain
2025-06-18 10:06     ` Donet Tom
2025-06-18 10:35       ` Dev Jain
2025-06-18 11:22     ` Lorenzo Stoakes
2025-06-18 11:28       ` Dev Jain
2025-06-18 11:37         ` Lorenzo Stoakes
2025-06-18 11:45           ` Dev Jain
2025-06-18 11:57             ` Lorenzo Stoakes
2025-06-18 11:59               ` Lorenzo Stoakes
2025-06-18 13:58               ` Dev Jain
2025-06-18 14:07                 ` Lorenzo Stoakes
2025-06-18 14:17                   ` Dev Jain
2025-06-18 14:35                     ` Lorenzo Stoakes
2025-06-18 14:43                       ` Dev Jain
2025-06-19  8:23                         ` Donet Tom
2025-06-19  9:02                           ` Dev Jain
2025-06-19 15:31                             ` Donet Tom
2025-06-19 16:14                               ` Dev Jain
2025-06-20 14:45                           ` Dev Jain
2025-06-21 17:55                             ` Donet Tom
2025-06-23  4:53                               ` Dev Jain
2025-06-23  4:55                                 ` Dev Jain
2025-06-23 17:32                                 ` Donet Tom
2025-06-24  6:15                                   ` Dev Jain
2025-06-25  9:36                                     ` Donet Tom
2025-06-25 10:45                                       ` Dev Jain
2025-06-25 12:52                           ` Dev Jain
2025-06-25 17:17                             ` Donet Tom
2025-06-26  3:57                               ` Dev Jain
2025-06-26  5:42                                 ` Donet Tom
2025-06-26  5:55                                   ` Dev Jain
2025-06-26  6:35                                   ` Dev Jain
2025-06-26  6:52                                     ` Donet Tom
2025-06-18 11:50           ` Lorenzo Stoakes
2025-06-16 16:06 ` [PATCH 2/6] selftest/mm: Fix ksm_funtional_test failures Aboorva Devarajan
2025-06-16 17:04   ` Liam R. Howlett
2025-06-17 15:10     ` donettom
2025-06-16 16:06 ` [PATCH 3/6] selftests/mm : fix test_prctl_fork_exec failure Aboorva Devarajan
2025-06-16 16:28   ` Dev Jain
2025-06-17 15:04     ` donettom
2025-06-16 16:06 ` [PATCH 4/6] mm/selftests: Fix split_huge_page_test failure on systems with 64KB page size Aboorva Devarajan
2025-06-16 16:06 ` [PATCH 5/6] selftests/mm: Fix child process exit codes in KSM tests Aboorva Devarajan
2025-06-16 16:06 ` [PATCH 6/6] selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no 1G pages Aboorva Devarajan
2025-06-16 16:11 ` [PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests Lorenzo Stoakes
2025-06-17  7:53   ` Aboorva Devarajan
