public inbox for linux-kernel@vger.kernel.org
* [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements
@ 2026-03-27  7:15 Sayali Patil
  2026-03-27  7:15 ` [PATCH v3 01/13] selftests/mm: restore default nr_hugepages value during cleanup in charge_reserved_hugetlb.sh Sayali Patil
                   ` (13 more replies)
  0 siblings, 14 replies; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:15 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil

Hi all,

Powerpc systems with a 64K base page size exposed several issues while
running mm selftests. Some tests assume specific hugetlb configurations,
use incorrect interfaces, or fail instead of skipping when the required
kernel features are not available.

This series fixes these issues and improves test robustness.

Please review the patches and provide any feedback or suggestions for
improvement.

Thanks,
Sayali

---
v2->v3
  - selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported:
    Rename function to check_uffd_wp_feature_supported() as suggested
    in review.
  - selftest/mm: fix cgroup task placement and drop memory.current checks
    in hugetlb_reparenting_test.sh:
    Drop memory.current validation from the hugetlb reparenting test.
    Keep tolerance at 7MB (reverting earlier increase to 8MB in v1).

v2: https://lore.kernel.org/all/cover.1773305677.git.sayalip@linux.ibm.com/

---
v1->v2
  - For "selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap":
    update the FLAGS definition to MAP_HUGETLB | MAP_SHARED | MAP_POPULATE and
    use it for the mmap() calls, as suggested during review.

v1: https://lore.kernel.org/all/cover.1773134177.git.sayalip@linux.ibm.com/
---

Sayali Patil (13):
  selftests/mm: restore default nr_hugepages value during cleanup in
    charge_reserved_hugetlb.sh
  selftests/mm: fix hugetlb pathname construction in
    charge_reserved_hugetlb.sh
  selftests/mm: fix hugetlb pathname construction in
    hugetlb_reparenting_test.sh
  selftest/mm: fix cgroup task placement and drop memory.current checks
    in hugetlb_reparenting_test.sh
  selftests/mm: size tmpfs according to PMD page size in
    split_huge_page_test
  selftest/mm: adjust hugepage-mremap test size for large huge pages
  selftest/mm: register existing mapping with userfaultfd in
    hugepage-mremap
  selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported
  selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero
  selftests/mm: fix double increment in linked list cleanup in
    compaction_test
  selftests/mm: move hwpoison setup into run_test() and silence modprobe
    output for memory-failure category
  selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page
    sizes

 .../selftests/cgroup/test_hugetlb_memcg.c     | 66 ++++++++++++++-----
 .../selftests/mm/charge_reserved_hugetlb.sh   | 44 +++++++++----
 tools/testing/selftests/mm/compaction_test.c  |  3 -
 tools/testing/selftests/mm/hugepage-mremap.c  | 32 +++------
 .../selftests/mm/hugetlb_reparenting_test.sh  | 56 ++++++++--------
 tools/testing/selftests/mm/run_vmtests.sh     | 59 +++++++++++------
 .../selftests/mm/split_huge_page_test.c       |  5 +-
 tools/testing/selftests/mm/uffd-stress.c      |  6 +-
 tools/testing/selftests/mm/uffd-wp-mremap.c   | 13 ++++
 .../testing/selftests/mm/write_to_hugetlbfs.c |  5 +-
 10 files changed, 181 insertions(+), 108 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 01/13] selftests/mm: restore default nr_hugepages value during cleanup in charge_reserved_hugetlb.sh
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
@ 2026-03-27  7:15 ` Sayali Patil
  2026-03-27  7:15 ` [PATCH v3 02/13] selftests/mm: fix hugetlb pathname construction " Sayali Patil
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:15 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

During cleanup, the script sets /proc/sys/vm/nr_hugepages to 0. The
original value is restored only at the end of the run, and only if all
tests pass. If any test fails, nr_hugepages remains set to 0.

Restore the original nr_hugepages value in cleanup() so that it is
reinstated regardless of whether the tests pass or fail.
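
The save-and-restore behaviour described above can be sketched as follows.
This is only an illustration, not the script itself: NR_FILE is a
hypothetical stand-in for /proc/sys/vm/nr_hugepages so that the sketch
runs without root, and run_failing_test is an invented placeholder for a
test that changes the pool size and then fails.

```shell
#!/bin/bash
# NR_FILE is an assumed stand-in for /proc/sys/vm/nr_hugepages.
NR_FILE="$(mktemp)"
echo 8 > "$NR_FILE"            # pretend the system started with 8 huge pages

nr_hugepgs=$(cat "$NR_FILE")   # save the original value before any test runs

cleanup() {
  # Restore the saved value instead of unconditionally writing 0.
  echo "$nr_hugepgs" > "$NR_FILE"
}

# Hypothetical test: reconfigures the pool, then fails.
run_failing_test() {
  echo 20 > "$NR_FILE"
  return 1
}

run_failing_test || cleanup    # cleanup runs on the failure path too
restored=$(cat "$NR_FILE")
echo "nr_hugepages after cleanup: $restored"
rm -f "$NR_FILE"
```

With the restore placed in cleanup(), the failure path ends with the same
value the system started with.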

Fixes: 7d695b1c3695b ("selftests/mm: save and restore nr_hugepages value")
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index 447769657634..c9fe68b6fcf9 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -65,7 +65,7 @@ function cleanup() {
   if [[ -e $cgroup_path/hugetlb_cgroup_test2 ]]; then
     rmdir $cgroup_path/hugetlb_cgroup_test2
   fi
-  echo 0 >/proc/sys/vm/nr_hugepages
+  echo "$nr_hugepgs" > /proc/sys/vm/nr_hugepages
   echo CLEANUP DONE
 }
 
-- 
2.52.0



* [PATCH v3 02/13] selftests/mm: fix hugetlb pathname construction in charge_reserved_hugetlb.sh
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
  2026-03-27  7:15 ` [PATCH v3 01/13] selftests/mm: restore default nr_hugepages value during cleanup in charge_reserved_hugetlb.sh Sayali Patil
@ 2026-03-27  7:15 ` Sayali Patil
  2026-04-01 14:06   ` David Hildenbrand (Arm)
  2026-03-27  7:15 ` [PATCH v3 03/13] selftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.sh Sayali Patil
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:15 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The charge_reserved_hugetlb.sh script assumes hugetlb cgroup memory
interface file names use the "<size>MB" format
(e.g. hugetlb.1024MB.current).
This assumption breaks on systems with larger huge pages such as 1GB,
where the kernel exposes normalized units:
    hugetlb.1GB.current
    hugetlb.1GB.max
    hugetlb.1GB.rsvd.max
    ...

As a result, the script attempts to access files like
hugetlb.1024MB.current, which do not exist when the kernel reports the
size in GB.

Normalize the huge page size and construct the pathname using the
appropriate unit (MB or GB), matching the hugetlb controller naming.
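
As a rough sketch of the normalization described above, the size-to-label
mapping can be written as a small function. The name hugetlb_size_label
is hypothetical and does not appear in the patch, which instead sets the
UNIT and MB_DISPLAY variables once at script start.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch): map a huge page size given
# in MB to the size label used in hugetlb cgroup interface file names.
hugetlb_size_label() {
  local mb="$1"
  if (( mb >= 1024 )); then
    # Sizes of 1GB and above are reported in GB.
    echo "$((mb / 1024))GB"
  else
    echo "${mb}MB"
  fi
}

hugetlb_size_label 2       # prints 2MB
hugetlb_size_label 16      # prints 16MB
hugetlb_size_label 1024    # prints 1GB

# Example of building an interface file name from the label.
echo "hugetlb.$(hugetlb_size_label 1024).current"   # prints hugetlb.1GB.current
```

Like the patch, this sketch only distinguishes MB and GB and does not
attempt to handle huge page sizes below 1MB.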

Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 .../selftests/mm/charge_reserved_hugetlb.sh   | 42 +++++++++++++------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index c9fe68b6fcf9..6bec53e16e05 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -89,6 +89,15 @@ function get_machine_hugepage_size() {
 }
 
 MB=$(get_machine_hugepage_size)
+if (( MB >= 1024 )); then
+        # For 1GB hugepages
+        UNIT="GB"
+        MB_DISPLAY=$((MB / 1024))
+else
+        # For 2MB hugepages
+        UNIT="MB"
+        MB_DISPLAY=$MB
+fi
 
 function setup_cgroup() {
   local name="$1"
@@ -98,11 +107,12 @@ function setup_cgroup() {
   mkdir $cgroup_path/$name
 
   echo writing cgroup limit: "$cgroup_limit"
-  echo "$cgroup_limit" >$cgroup_path/$name/hugetlb.${MB}MB.$fault_limit_file
+  echo "$cgroup_limit" > \
+	  $cgroup_path/$name/hugetlb.${MB_DISPLAY}${UNIT}.$fault_limit_file
 
   echo writing reservation limit: "$reservation_limit"
   echo "$reservation_limit" > \
-    $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file
+    $cgroup_path/$name/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_limit_file
 
   if [ -e "$cgroup_path/$name/cpuset.cpus" ]; then
     echo 0 >$cgroup_path/$name/cpuset.cpus
@@ -137,7 +147,7 @@ function wait_for_file_value() {
 
 function wait_for_hugetlb_memory_to_get_depleted() {
   local cgroup="$1"
-  local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
+  local path="$cgroup_path/$cgroup/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_usage_file"
 
   wait_for_file_value "$path" "0"
 }
@@ -145,7 +155,7 @@ function wait_for_hugetlb_memory_to_get_depleted() {
 function wait_for_hugetlb_memory_to_get_reserved() {
   local cgroup="$1"
   local size="$2"
-  local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
+  local path="$cgroup_path/$cgroup/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_usage_file"
 
   wait_for_file_value "$path" "$size"
 }
@@ -153,7 +163,7 @@ function wait_for_hugetlb_memory_to_get_reserved() {
 function wait_for_hugetlb_memory_to_get_written() {
   local cgroup="$1"
   local size="$2"
-  local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
+  local path="$cgroup_path/$cgroup/hugetlb.${MB_DISPLAY}${UNIT}.$fault_usage_file"
 
   wait_for_file_value "$path" "$size"
 }
@@ -175,8 +185,8 @@ function write_hugetlbfs_and_get_usage() {
   hugetlb_difference=0
   reserved_difference=0
 
-  local hugetlb_usage=$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file
-  local reserved_usage=$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file
+  local hugetlb_usage=$cgroup_path/$cgroup/hugetlb.${MB_DISPLAY}${UNIT}.$fault_usage_file
+  local reserved_usage=$cgroup_path/$cgroup/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_usage_file
 
   local hugetlb_before=$(cat $hugetlb_usage)
   local reserved_before=$(cat $reserved_usage)
@@ -307,8 +317,10 @@ function run_test() {
 
   cleanup_hugetlb_memory "hugetlb_cgroup_test"
 
-  local final_hugetlb=$(cat $cgroup_path/hugetlb_cgroup_test/hugetlb.${MB}MB.$fault_usage_file)
-  local final_reservation=$(cat $cgroup_path/hugetlb_cgroup_test/hugetlb.${MB}MB.$reservation_usage_file)
+  local final_hugetlb=$(cat \
+	 $cgroup_path/hugetlb_cgroup_test/hugetlb.${MB_DISPLAY}${UNIT}.$fault_usage_file)
+  local final_reservation=$(cat \
+	  $cgroup_path/hugetlb_cgroup_test/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_usage_file)
 
   echo $hugetlb_difference
   echo $reserved_difference
@@ -364,10 +376,14 @@ function run_multiple_cgroup_test() {
   reservation_failed1=$reservation_failed
   oom_killed1=$oom_killed
 
-  local cgroup1_hugetlb_usage=$cgroup_path/hugetlb_cgroup_test1/hugetlb.${MB}MB.$fault_usage_file
-  local cgroup1_reservation_usage=$cgroup_path/hugetlb_cgroup_test1/hugetlb.${MB}MB.$reservation_usage_file
-  local cgroup2_hugetlb_usage=$cgroup_path/hugetlb_cgroup_test2/hugetlb.${MB}MB.$fault_usage_file
-  local cgroup2_reservation_usage=$cgroup_path/hugetlb_cgroup_test2/hugetlb.${MB}MB.$reservation_usage_file
+  local cgroup1_hugetlb_usage=\
+	  $cgroup_path/hugetlb_cgroup_test1/hugetlb.${MB_DISPLAY}${UNIT}.$fault_usage_file
+  local cgroup1_reservation_usage=\
+	  $cgroup_path/hugetlb_cgroup_test1/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_usage_file
+  local cgroup2_hugetlb_usage=\
+	  $cgroup_path/hugetlb_cgroup_test2/hugetlb.${MB_DISPLAY}${UNIT}.$fault_usage_file
+  local cgroup2_reservation_usage=\
+	  $cgroup_path/hugetlb_cgroup_test2/hugetlb.${MB_DISPLAY}${UNIT}.$reservation_usage_file
 
   local usage_before_second_write=$(cat $cgroup1_hugetlb_usage)
   local reservation_usage_before_second_write=$(cat $cgroup1_reservation_usage)
-- 
2.52.0



* [PATCH v3 03/13] selftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.sh
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
  2026-03-27  7:15 ` [PATCH v3 01/13] selftests/mm: restore default nr_hugepages value during cleanup in charge_reserved_hugetlb.sh Sayali Patil
  2026-03-27  7:15 ` [PATCH v3 02/13] selftests/mm: fix hugetlb pathname construction " Sayali Patil
@ 2026-03-27  7:15 ` Sayali Patil
  2026-04-01 14:06   ` David Hildenbrand (Arm)
  2026-03-27  7:15 ` [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks " Sayali Patil
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:15 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The hugetlb_reparenting_test.sh script constructs hugetlb cgroup
memory interface file names based on the configured huge page size. The
script formats the size only in MB units, which causes mismatches on
systems using larger huge pages where the kernel exposes normalized
units (e.g. "1GB" instead of "1024MB").

As a result, the test fails to locate the corresponding cgroup files
when 1GB huge pages are configured.

Update the script to detect the huge page size and select the
appropriate unit (MB or GB) so that the constructed paths match the
kernel's hugetlb controller naming.

Also print an explicit "FAIL" message when a test failure occurs to
improve result visibility.

Fixes: e487a5d513cb ("selftest/mm: make hugetlb_reparenting_test tolerant to async reparenting")
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 .../selftests/mm/hugetlb_reparenting_test.sh       | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
index 0dd31892ff67..073a71fa36b4 100755
--- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
@@ -46,6 +46,13 @@ function get_machine_hugepage_size() {
 }
 
 MB=$(get_machine_hugepage_size)
+if (( MB >= 1024 )); then
+  UNIT="GB"
+  MB_DISPLAY=$((MB / 1024))
+else
+  UNIT="MB"
+  MB_DISPLAY=$MB
+fi
 
 function cleanup() {
   echo cleanup
@@ -87,6 +94,7 @@ function assert_with_retry() {
     if [[ $elapsed -ge $timeout ]]; then
       echo "actual = $((${actual%% *} / 1024 / 1024)) MB"
       echo "expected = $((${expected%% *} / 1024 / 1024)) MB"
+      echo FAIL
       cleanup
       exit 1
     fi
@@ -107,11 +115,13 @@ function assert_state() {
   fi
 
   assert_with_retry "$CGROUP_ROOT/a/memory.$usage_file" "$expected_a"
-  assert_with_retry "$CGROUP_ROOT/a/hugetlb.${MB}MB.$usage_file" "$expected_a_hugetlb"
+  assert_with_retry \
+	  "$CGROUP_ROOT/a/hugetlb.${MB_DISPLAY}${UNIT}.$usage_file" "$expected_a_hugetlb"
 
   if [[ -n "$expected_b" && -n "$expected_b_hugetlb" ]]; then
     assert_with_retry "$CGROUP_ROOT/a/b/memory.$usage_file" "$expected_b"
-    assert_with_retry "$CGROUP_ROOT/a/b/hugetlb.${MB}MB.$usage_file" "$expected_b_hugetlb"
+    assert_with_retry \
+	  "$CGROUP_ROOT/a/b/hugetlb.${MB_DISPLAY}${UNIT}.$usage_file" "$expected_b_hugetlb"
   fi
 }
 
-- 
2.52.0



* [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks in hugetlb_reparenting_test.sh
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (2 preceding siblings ...)
  2026-03-27  7:15 ` [PATCH v3 03/13] selftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.sh Sayali Patil
@ 2026-03-27  7:15 ` Sayali Patil
  2026-04-01 14:08   ` David Hildenbrand (Arm)
  2026-03-27  7:15 ` [PATCH v3 05/13] selftests/mm: size tmpfs according to PMD page size in split_huge_page_test Sayali Patil
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:15 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil

Launch write_to_hugetlbfs as a separate process and move only its PID
into the target cgroup before waiting for completion. This avoids moving
the test shell itself, prevents unintended charging to the shell, and
ensures hugetlb and memcg accounting is attributed only to the intended
workload.

Add a short delay before the hugetlb allocation to avoid a race where
memory may be charged before the task migration takes effect, which
can lead to incorrect accounting and intermittent test failures.
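
The spawn, move, and wait sequence described above can be sketched as
follows. This is an illustration under stated assumptions: cg_file is a
temporary file standing in for the cgroup's cgroup.procs (or tasks) file
so that the sketch runs without root, and workload is an invented
placeholder for write_to_hugetlbfs with the new -d delay.

```shell
#!/bin/bash
# cg_file is an assumed stand-in for "$CGROUP_ROOT/$cgroup/cgroup.procs".
cg_file="$(mktemp)"

# Hypothetical workload: sleeps before doing any work, which leaves time
# for its PID to be moved into the target cgroup first.
workload() {
  sleep 1
  echo "workload done"
}

workload & pid=$!          # start the workload as a separate background task
echo "$pid" > "$cg_file"   # move only the child, never the test shell ($$)
wait "$pid"                # block until the child exits
status=$?                  # exit status of the child

moved_pid=$(cat "$cg_file")
rm -f "$cg_file"
echo "moved pid $moved_pid (shell is $$), exit status $status"
```

Because the shell's own PID is never written to the cgroup file, nothing
the shell allocates is charged to the cgroup under test.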

The test currently validates both hugetlb usage and memory.current.
However, memory.current includes internal memcg allocations and
per-CPU batched accounting (MEMCG_CHARGE_BATCH), which are not
synchronized and can vary across systems, leading to
non-deterministic results.

Since hugetlb memory is accounted via hugetlb.<size>.current,
memory.current is not a reliable indicator here. Drop memory.current
checks and rely only on hugetlb controller statistics for stable
and accurate validation.

Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 .../selftests/mm/hugetlb_reparenting_test.sh  | 42 ++++++++-----------
 .../testing/selftests/mm/write_to_hugetlbfs.c |  5 ++-
 2 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
index 073a71fa36b4..1e87ac67d43e 100755
--- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
@@ -104,22 +104,17 @@ function assert_with_retry() {
 }
 
 function assert_state() {
-  local expected_a="$1"
-  local expected_a_hugetlb="$2"
-  local expected_b=""
+  local expected_a_hugetlb="$1"
   local expected_b_hugetlb=""
 
-  if [ ! -z ${3:-} ] && [ ! -z ${4:-} ]; then
-    expected_b="$3"
-    expected_b_hugetlb="$4"
+  if [ ! -z ${2:-} ]; then
+    expected_b_hugetlb="$2"
   fi
 
-  assert_with_retry "$CGROUP_ROOT/a/memory.$usage_file" "$expected_a"
   assert_with_retry \
 	  "$CGROUP_ROOT/a/hugetlb.${MB_DISPLAY}${UNIT}.$usage_file" "$expected_a_hugetlb"
 
-  if [[ -n "$expected_b" && -n "$expected_b_hugetlb" ]]; then
-    assert_with_retry "$CGROUP_ROOT/a/b/memory.$usage_file" "$expected_b"
+  if [[ -n "$expected_b_hugetlb" ]]; then
     assert_with_retry \
 	  "$CGROUP_ROOT/a/b/hugetlb.${MB_DISPLAY}${UNIT}.$usage_file" "$expected_b_hugetlb"
   fi
@@ -153,18 +148,17 @@ write_hugetlbfs() {
   local size="$3"
 
   if [[ $cgroup2 ]]; then
-    echo $$ >$CGROUP_ROOT/$cgroup/cgroup.procs
+    cg_file="$CGROUP_ROOT/$cgroup/cgroup.procs"
   else
     echo 0 >$CGROUP_ROOT/$cgroup/cpuset.mems
     echo 0 >$CGROUP_ROOT/$cgroup/cpuset.cpus
-    echo $$ >"$CGROUP_ROOT/$cgroup/tasks"
-  fi
-  ./write_to_hugetlbfs -p "$path" -s "$size" -m 0 -o
-  if [[ $cgroup2 ]]; then
-    echo $$ >$CGROUP_ROOT/cgroup.procs
-  else
-    echo $$ >"$CGROUP_ROOT/tasks"
+    cg_file="$CGROUP_ROOT/$cgroup/tasks"
   fi
+
+  # Spawn write_to_hugetlbfs in a separate task to ensure correct cgroup accounting
+  ./write_to_hugetlbfs -p "$path" -s "$size" -m 0 -o -d & pid=$!
+  echo "$pid" > "$cg_file"
+  wait "$pid"
   echo
 }
 
@@ -202,21 +196,21 @@ if [[ ! $cgroup2 ]]; then
   write_hugetlbfs a "$MNT"/test $size
 
   echo Assert memory charged correctly for parent use.
-  assert_state 0 $size 0 0
+  assert_state $size 0
 
   write_hugetlbfs a/b "$MNT"/test2 $size
 
   echo Assert memory charged correctly for child use.
-  assert_state 0 $(($size * 2)) 0 $size
+  assert_state $(($size * 2)) $size
 
   rmdir "$CGROUP_ROOT"/a/b
   echo Assert memory reparent correctly.
-  assert_state 0 $(($size * 2))
+  assert_state $(($size * 2))
 
   rm -rf "$MNT"/*
   umount "$MNT"
   echo Assert memory uncharged correctly.
-  assert_state 0 0
+  assert_state 0
 
   cleanup
 fi
@@ -230,16 +224,16 @@ echo write
 write_hugetlbfs a/b "$MNT"/test2 $size
 
 echo Assert memory charged correctly for child only use.
-assert_state 0 $(($size)) 0 $size
+assert_state $(($size)) $size
 
 rmdir "$CGROUP_ROOT"/a/b
 echo Assert memory reparent correctly.
-assert_state 0 $size
+assert_state $size
 
 rm -rf "$MNT"/*
 umount "$MNT"
 echo Assert memory uncharged correctly.
-assert_state 0 0
+assert_state 0
 
 cleanup
 
diff --git a/tools/testing/selftests/mm/write_to_hugetlbfs.c b/tools/testing/selftests/mm/write_to_hugetlbfs.c
index ecb5f7619960..6b01b0485bd0 100644
--- a/tools/testing/selftests/mm/write_to_hugetlbfs.c
+++ b/tools/testing/selftests/mm/write_to_hugetlbfs.c
@@ -83,7 +83,7 @@ int main(int argc, char **argv)
 	setvbuf(stdout, NULL, _IONBF, 0);
 	self = argv[0];
 
-	while ((c = getopt(argc, argv, "s:p:m:owlrn")) != -1) {
+	while ((c = getopt(argc, argv, "s:p:m:owlrnd")) != -1) {
 		switch (c) {
 		case 's':
 			if (sscanf(optarg, "%zu", &size) != 1) {
@@ -118,6 +118,9 @@ int main(int argc, char **argv)
 		case 'n':
 			reserve = 0;
 			break;
+		case 'd':
+			sleep(1);
+			break;
 		default:
 			errno = EINVAL;
 			perror("Invalid arg");
-- 
2.52.0



* [PATCH v3 05/13] selftests/mm: size tmpfs according to PMD page size in split_huge_page_test
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (3 preceding siblings ...)
  2026-03-27  7:15 ` [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks " Sayali Patil
@ 2026-03-27  7:15 ` Sayali Patil
  2026-03-27  7:16 ` [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages Sayali Patil
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:15 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The split_file_backed_thp() test mounts a tmpfs with a fixed size of
"4m". This works on systems with smaller PMD page sizes,
but fails on configurations where the PMD huge page size is
larger (e.g. 16MB).

On such systems, the fixed 4MB tmpfs is insufficient to allocate even
a single PMD-sized THP, causing the test to fail.

Fix this by sizing the tmpfs dynamically based on the runtime
pmd_pagesize, allocating space for two PMD-sized pages.
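
The sizing rule can be illustrated with a short shell sketch. The patch
itself does this in C with snprintf(); tmpfs_opts_for_pmd below is a
hypothetical helper used only to show the arithmetic, with the PMD page
size passed in bytes.

```shell
#!/bin/bash
# Hypothetical helper: build tmpfs mount options with room for two
# PMD-sized pages. The argument is the PMD page size in bytes.
tmpfs_opts_for_pmd() {
  local pmd_pagesize="$1"
  echo "huge=always,size=$((2 * pmd_pagesize))"
}

tmpfs_opts_for_pmd $((2 * 1024 * 1024))    # prints huge=always,size=4194304
tmpfs_opts_for_pmd $((16 * 1024 * 1024))   # prints huge=always,size=33554432
```

For a 2MB PMD size the result is 4194304 bytes, the same as the previous
fixed "4m", so only systems with larger PMD sizes see a different tmpfs
size.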

Before patch:
  running ./split_huge_page_test /tmp/xfs_dir_YTrI5E
  --------------------------------------------------
  TAP version 13
  1..55
  ok 1 Split zero filled huge pages successful
  ok 2 Split huge pages to order 0 successful
  ok 3 Split huge pages to order 2 successful
  ok 4 Split huge pages to order 3 successful
  ok 5 Split huge pages to order 4 successful
  ok 6 Split huge pages to order 5 successful
  ok 7 Split huge pages to order 6 successful
  ok 8 Split huge pages to order 7 successful
  ok 9 Split PTE-mapped huge pages successful
   Please enable pr_debug in split_huge_pages_in_file() for more info.
   Failed to write data to testing file: Success (0)
  Bail out! Error occurred
   Planned tests != run tests (55 != 9)
   Totals: pass:9 fail:0 xfail:0 xpass:0 skip:0 error:0
 [FAIL]

After patch:
  --------------------------------------------------
  running ./split_huge_page_test /tmp/xfs_dir_bMvj6o
  --------------------------------------------------
  TAP version 13
  1..55
  ok 1 Split zero filled huge pages successful
  ok 2 Split huge pages to order 0 successful
  ok 3 Split huge pages to order 2 successful
  ok 4 Split huge pages to order 3 successful
  ok 5 Split huge pages to order 4 successful
  ok 6 Split huge pages to order 5 successful
  ok 7 Split huge pages to order 6 successful
  ok 8 Split huge pages to order 7 successful
  ok 9 Split PTE-mapped huge pages successful
   Please enable pr_debug in split_huge_pages_in_file() for more info.
   Please check dmesg for more information
  ok 10 File-backed THP split to order 0 test done
   Please enable pr_debug in split_huge_pages_in_file() for more info.
   Please check dmesg for more information
  ok 11 File-backed THP split to order 1 test done
   Please enable pr_debug in split_huge_pages_in_file() for more info.
   Please check dmesg for more information
  ok 12 File-backed THP split to order 2 test done
...
  ok 55 Split PMD-mapped pagecache folio to order 7 at
    in-folio offset 128 passed
   Totals: pass:55 fail:0 xfail:0 xpass:0 skip:0 error:0
   [PASS]
ok 1 split_huge_page_test /tmp/xfs_dir_bMvj6o

Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split")
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/split_huge_page_test.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index e0167111bdd1..57e8a1c9647a 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -484,6 +484,8 @@ static void split_file_backed_thp(int order)
 	char tmpfs_template[] = "/tmp/thp_split_XXXXXX";
 	const char *tmpfs_loc = mkdtemp(tmpfs_template);
 	char testfile[INPUT_MAX];
+	unsigned long size = 2 * pmd_pagesize;
+	char opts[64];
 	ssize_t num_written, num_read;
 	char *file_buf1, *file_buf2;
 	uint64_t pgoff_start = 0, pgoff_end = 1024;
@@ -503,7 +505,8 @@ static void split_file_backed_thp(int order)
 		file_buf1[i] = (char)i;
 	memset(file_buf2, 0, pmd_pagesize);
 
-	status = mount("tmpfs", tmpfs_loc, "tmpfs", 0, "huge=always,size=4m");
+	snprintf(opts, sizeof(opts), "huge=always,size=%lu", size);
+	status = mount("tmpfs", tmpfs_loc, "tmpfs", 0, opts);
 
 	if (status)
 		ksft_exit_fail_msg("Unable to create a tmpfs for testing\n");
-- 
2.52.0



* [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (4 preceding siblings ...)
  2026-03-27  7:15 ` [PATCH v3 05/13] selftests/mm: size tmpfs according to PMD page size in split_huge_page_test Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-01 14:10   ` David Hildenbrand (Arm)
  2026-03-27  7:16 ` [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap Sayali Patil
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The hugepage-mremap selftest uses a default size of 10MB, which is
sufficient for small huge page sizes. However, when the huge page size
is large (e.g. 1GB), 10MB is smaller than a single huge page.
As a result, the test does not trigger PMD sharing and the
corresponding unshare path in mremap(), causing the
test to fail (mremap succeeds where a failure is expected).

Update run_vmtests.sh to use twice the huge page size when the huge page
size exceeds 10MB, while retaining the 10MB default for smaller huge
pages. This ensures the test exercises the intended PMD sharing and
unsharing paths for larger huge page sizes.
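
The size selection can be sketched as a small function. mremap_len_mb is
a hypothetical name; the patch open-codes the check in run_vmtests.sh and
simply omits the size argument in the default case, whereas this sketch
prints the 10MB default explicitly for illustration.

```shell
#!/bin/bash
# Hypothetical helper: choose the hugepage-mremap test size in MB from
# the huge page size in KB.
mremap_len_mb() {
  local hpgsize_KB="$1"
  if [ "$hpgsize_KB" -gt 10240 ]; then
    # Huge page larger than 10MB: use twice the huge page size, in MB.
    echo $(( (2 * hpgsize_KB) / 1024 ))
  else
    # Assumed here: 10 is the test's default size in MB.
    echo 10
  fi
}

mremap_len_mb 2048       # 2MB huge pages:  prints 10
mremap_len_mb 16384      # 16MB huge pages: prints 32
mremap_len_mb 1048576    # 1GB huge pages:  prints 2048
```

The 1GB case yields 2048, which matches the "./hugepage-mremap 2048"
invocation shown in the output after the patch.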

Before patch:
 running ./hugepage-mremap
 ------------------------------
 TAP version 13
 1..1
  Map haddr: Returned address is 0x7eaa40000000
  Map daddr: Returned address is 0x7daa40000000
  Map vaddr: Returned address is 0x7faa40000000
  Address returned by mmap() = 0x7fffaa600000
  Mremap: Returned address is 0x7faa40000000
  First hex is 0
  First hex is 3020100
 Bail out! mremap: Expected failure, but call succeeded
 Planned tests != run tests (1 != 0)
 Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
 [FAIL]
 not ok 1 hugepage-mremap # exit=1

After patch:
 running ./hugepage-mremap 2048
 ------------------------------
 TAP version 13
 1..1
  Map haddr: Returned address is 0x7eaa40000000
  Map daddr: Returned address is 0x7daa40000000
  Map vaddr: Returned address is 0x7faa40000000
  Address returned by mmap() = 0x7fff13000000
  Mremap: Returned address is 0x7faa40000000
  First hex is 0
  First hex is 3020100
  ok 1 Read same data
 Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 [PASS]
 ok 1 hugepage-mremap 2048

Fixes: f77a286de48c ("mm, hugepages: make memory size variable in hugepage-mremap selftest")
Acked-by: Zi Yan <ziy@nvidia.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index afdcfd0d7cef..eecec0b6eb13 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -293,7 +293,18 @@ echo "$shmmax" > /proc/sys/kernel/shmmax
 echo "$shmall" > /proc/sys/kernel/shmall
 
 CATEGORY="hugetlb" run_test ./map_hugetlb
-CATEGORY="hugetlb" run_test ./hugepage-mremap
+
+# If the huge page size is larger than 10MB, increase the test memory size
+# to twice the huge page size (in MB) to ensure the test exercises PMD sharing
+# and the unshare path in hugepage-mremap. Otherwise, run the test with
+# the default 10MB memory size.
+if [ "$hpgsize_KB" -gt 10240 ]; then
+	len_mb=$(( (2 * hpgsize_KB) / 1024 ))
+	CATEGORY="hugetlb" run_test ./hugepage-mremap "${len_mb}"
+else
+	CATEGORY="hugetlb" run_test ./hugepage-mremap
+fi
+
 CATEGORY="hugetlb" run_test ./hugepage-vmemmap
 CATEGORY="hugetlb" run_test ./hugetlb-madvise
 CATEGORY="hugetlb" run_test ./hugetlb_dio
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (5 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-01 14:18   ` David Hildenbrand (Arm)
  2026-03-27  7:16 ` [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed " Sayali Patil
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

Previously, register_region_with_uffd() created a new anonymous
mapping and overwrote the address supplied by the caller before
registering the range with userfaultfd.

As a result, userfaultfd was applied to an unrelated anonymous mapping
instead of the hugetlb region used by the test.

Remove the extra mmap() and register the caller-provided address range
directly using UFFDIO_REGISTER_MODE_MISSING, so that faults are
generated for the hugetlb mapping used by the test.

This ensures userfaultfd operates on the actual hugetlb test region and
validates the expected fault handling.

Before patch:
 running ./hugepage-mremap
 -------------------------
 TAP version 13
 1..1
  Map haddr: Returned address is 0x7eaa40000000
  Map daddr: Returned address is 0x7daa40000000
  Map vaddr: Returned address is 0x7faa40000000
  Address returned by mmap() = 0x7fff9d000000
  Mremap: Returned address is 0x7faa40000000
  First hex is 0
  First hex is 3020100
 ok 1 Read same data
 Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 [PASS]
 ok 1 hugepage-mremap

After patch:
 running ./hugepage-mremap
 -------------------------
 TAP version 13
 1..1
  Map haddr: Returned address is 0x7eaa40000000
  Map daddr: Returned address is 0x7daa40000000
  Map vaddr: Returned address is 0x7faa40000000
  Registered memory at address 0x7eaa40000000 with userfaultfd
  Mremap: Returned address is 0x7faa40000000
  First hex is 0
  First hex is 3020100
 ok 1 Read same data
 Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
 [PASS]
 ok 1 hugepage-mremap

Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/hugepage-mremap.c | 21 +++++---------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
index b8f7d92e5a35..e611249080d6 100644
--- a/tools/testing/selftests/mm/hugepage-mremap.c
+++ b/tools/testing/selftests/mm/hugepage-mremap.c
@@ -85,25 +85,14 @@ static void register_region_with_uffd(char *addr, size_t len)
 	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
 		ksft_exit_fail_msg("ioctl-UFFDIO_API: %s\n", strerror(errno));
 
-	/* Create a private anonymous mapping. The memory will be
-	 * demand-zero paged--that is, not yet allocated. When we
-	 * actually touch the memory, it will be allocated via
-	 * the userfaultfd.
-	 */
-
-	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
-		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-	if (addr == MAP_FAILED)
-		ksft_exit_fail_msg("mmap: %s\n", strerror(errno));
-
-	ksft_print_msg("Address returned by mmap() = %p\n", addr);
-
-	/* Register the memory range of the mapping we just created for
-	 * handling by the userfaultfd object. In mode, we request to track
-	 * missing pages (i.e., pages that have not yet been faulted in).
+	/* Register the passed memory range for handling by the userfaultfd object.
+	 * In mode, we request to track missing pages
+	 * (i.e., pages that have not yet been faulted in).
 	 */
 	if (uffd_register(uffd, addr, len, true, false, false))
 		ksft_exit_fail_msg("ioctl-UFFDIO_REGISTER: %s\n", strerror(errno));
+
+	ksft_print_msg("Registered memory at address %p with userfaultfd\n", addr);
 }
 
 int main(int argc, char *argv[])
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (6 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-01 14:21   ` David Hildenbrand (Arm)
  2026-03-27  7:16 ` [PATCH v3 09/13] selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported Sayali Patil
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The hugepage-mremap selftest reserves the destination address using an
anonymous base-page mapping before calling mremap() with MREMAP_FIXED,
while the source region is hugetlb-backed.

Remapping a hugetlb mapping into a base-page VMA may fail with:

    mremap: Device or resource busy

This is observed on powerpc hash MMU systems where slice constraints
and page size incompatibilities prevent the remap.

Ensure the destination region is created using MAP_HUGETLB so that both
source and destination VMAs are hugetlb-backed and compatible. Also add
MAP_POPULATE to the destination mapping to prefault hugepages,
matching the behaviour used for the other hugetlb mappings in the test and
ensuring deterministic behaviour.

Update the FLAGS macro to include MAP_HUGETLB | MAP_SHARED |
MAP_POPULATE so that both mappings are hugetlb-backed and compatible.
Also use the macro for the mmap() calls to avoid repeating
the flag combination.

This ensures the test reliably exercises hugetlb mremap instead of
failing due to VMA type mismatch.

Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
Acked-by: Zi Yan <ziy@nvidia.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/hugepage-mremap.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
index e611249080d6..48c24a4ba9a7 100644
--- a/tools/testing/selftests/mm/hugepage-mremap.c
+++ b/tools/testing/selftests/mm/hugepage-mremap.c
@@ -31,7 +31,7 @@
 #define MB_TO_BYTES(x) (x * 1024 * 1024)
 
 #define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC)
-#define FLAGS (MAP_SHARED | MAP_ANONYMOUS)
+#define FLAGS (MAP_HUGETLB | MAP_SHARED | MAP_POPULATE)
 
 static void check_bytes(char *addr)
 {
@@ -121,23 +121,20 @@ int main(int argc, char *argv[])
 
 	/* mmap to a PUD aligned address to hopefully trigger pmd sharing. */
 	unsigned long suggested_addr = 0x7eaa40000000;
-	void *haddr = mmap((void *)suggested_addr, length, PROTECTION,
-			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
+	void *haddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
 	ksft_print_msg("Map haddr: Returned address is %p\n", haddr);
 	if (haddr == MAP_FAILED)
 		ksft_exit_fail_msg("mmap1: %s\n", strerror(errno));
 
 	/* mmap again to a dummy address to hopefully trigger pmd sharing. */
 	suggested_addr = 0x7daa40000000;
-	void *daddr = mmap((void *)suggested_addr, length, PROTECTION,
-			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
+	void *daddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
 	ksft_print_msg("Map daddr: Returned address is %p\n", daddr);
 	if (daddr == MAP_FAILED)
 		ksft_exit_fail_msg("mmap3: %s\n", strerror(errno));
 
 	suggested_addr = 0x7faa40000000;
-	void *vaddr =
-		mmap((void *)suggested_addr, length, PROTECTION, FLAGS, -1, 0);
+	void *vaddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
 	ksft_print_msg("Map vaddr: Returned address is %p\n", vaddr);
 	if (vaddr == MAP_FAILED)
 		ksft_exit_fail_msg("mmap2: %s\n", strerror(errno));
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 09/13] selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (7 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed " Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-02  6:59   ` Sayali Patil
  2026-03-27  7:16 ` [PATCH v3 10/13] selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero Sayali Patil
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil

The uffd-wp-mremap test requires the UFFD_FEATURE_PAGEFAULT_FLAG_WP
capability. On systems where userfaultfd write-protect is
not supported, uffd_register() fails and the test reports failures.

Check for the required feature at startup and skip the test when the
UFFD_FEATURE_PAGEFAULT_FLAG_WP capability is not present,
preventing false failures on unsupported configurations.

Before patch:
 running ./uffd-wp-mremap
 ------------------------
  [INFO] detected THP size: 256 KiB
  [INFO] detected THP size: 512 KiB
  [INFO] detected THP size: 1024 KiB
  [INFO] detected THP size: 2048 KiB
  [INFO] detected hugetlb page size: 2048 KiB
  [INFO] detected hugetlb page size: 1048576 KiB
 1..24
  [RUN] test_one_folio(size=65536, private=false, swapout=false,
  hugetlb=false)
 not ok 1 uffd_register() failed
  [RUN] test_one_folio(size=65536, private=true, swapout=false,
  hugetlb=false)
 not ok 2 uffd_register() failed
  [RUN] test_one_folio(size=65536, private=false, swapout=true,
  hugetlb=false)
 not ok 3 uffd_register() failed
  [RUN] test_one_folio(size=65536, private=true, swapout=true,
  hugetlb=false)
 not ok 4 uffd_register() failed
  [RUN] test_one_folio(size=262144, private=false, swapout=false,
  hugetlb=false)
 not ok 5 uffd_register() failed
  [RUN] test_one_folio(size=524288, private=false, swapout=false,
  hugetlb=false)
 not ok 6 uffd_register() failed
 .
 .
 .
 Bail out! 24 out of 24 tests failed
  Totals: pass:0 fail:24 xfail:0 xpass:0 skip:0 error:0
 [FAIL]
not ok 1 uffd-wp-mremap # exit=1

After patch:
 running ./uffd-wp-mremap
 ------------------------
 1..0 # SKIP uffd-wp feature not supported
 [SKIP]
ok 1 uffd-wp-mremap # SKIP

Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/uffd-wp-mremap.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/mm/uffd-wp-mremap.c b/tools/testing/selftests/mm/uffd-wp-mremap.c
index 17186d4a4147..6edbd09f0ca6 100644
--- a/tools/testing/selftests/mm/uffd-wp-mremap.c
+++ b/tools/testing/selftests/mm/uffd-wp-mremap.c
@@ -19,6 +19,17 @@ static size_t thpsizes[20];
 static int nr_hugetlbsizes;
 static size_t hugetlbsizes[10];
 
+static void check_uffd_wp_feature_supported(void)
+{
+	uint64_t features;
+
+	if (uffd_get_features(&features) && errno == ENOENT)
+		ksft_exit_skip("failed to get available features (%d)\n", errno);
+
+	if (!(features & UFFD_FEATURE_PAGEFAULT_FLAG_WP))
+		ksft_exit_skip("uffd-wp feature not supported\n");
+}
+
 static int detect_thp_sizes(size_t sizes[], int max)
 {
 	int count = 0;
@@ -336,6 +347,8 @@ int main(int argc, char **argv)
 	struct thp_settings settings;
 	int i, j, plan = 0;
 
+	check_uffd_wp_feature_supported();
+
 	pagesize = getpagesize();
 	nr_thpsizes = detect_thp_sizes(thpsizes, ARRAY_SIZE(thpsizes));
 	nr_hugetlbsizes = detect_hugetlb_page_sizes(hugetlbsizes,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 10/13] selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (8 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 09/13] selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-01 14:23   ` David Hildenbrand (Arm)
  2026-03-27  7:16 ` [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test Sayali Patil
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

uffd-stress currently fails when the computed nr_pages_per_cpu
evaluates to zero:

nr_pages_per_cpu = bytes / page_size / nr_parallel

This can occur on systems with large hugepage sizes (e.g. 1GB) and a
high number of CPUs, where the total allocated memory is sufficient
overall but not enough to provide at least one page per CPU.

In such cases, the failure is due to insufficient test resources
rather than incorrect kernel behaviour. Update the test
to treat this condition as a test skip instead of reporting an error.

Fixes: db0f1c138f18 ("selftests/mm: print some details when uffd-stress gets bad params")
Acked-by: Zi Yan <ziy@nvidia.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/uffd-stress.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c
index 700fbaa18d44..b8f22ea859a6 100644
--- a/tools/testing/selftests/mm/uffd-stress.c
+++ b/tools/testing/selftests/mm/uffd-stress.c
@@ -491,9 +491,9 @@ int main(int argc, char **argv)
 
 	gopts->nr_pages_per_cpu = bytes / gopts->page_size / gopts->nr_parallel;
 	if (!gopts->nr_pages_per_cpu) {
-		_err("pages_per_cpu = 0, cannot test (%lu / %lu / %lu)",
-			bytes, gopts->page_size, gopts->nr_parallel);
-		usage();
+		ksft_print_msg("pages_per_cpu = 0, cannot test (%lu / %lu / %lu)\n",
+			       bytes, gopts->page_size, gopts->nr_parallel);
+		return KSFT_SKIP;
 	}
 
 	bounces = atoi(argv[3]);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (9 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 10/13] selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-01 14:32   ` Sayali Patil
  2026-03-27  7:16 ` [PATCH v3 12/13] selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category Sayali Patil
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The cleanup loop of allocated memory currently uses:

    for (entry = list; entry != NULL; entry = entry->next) {
        munmap(entry->map, MAP_SIZE);
        if (!entry->next)
            break;
        entry = entry->next;
    }

The inner entry = entry->next causes the loop to skip every
other node, resulting in only half of the mapped regions being
unmapped.

Remove the redundant increment to ensure every entry is visited
and unmapped during cleanup.

Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/compaction_test.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 30209c40b697..f73930706bd0 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -263,9 +263,6 @@ int main(int argc, char **argv)
 
 	for (entry = list; entry != NULL; entry = entry->next) {
 		munmap(entry->map, MAP_SIZE);
-		if (!entry->next)
-			break;
-		entry = entry->next;
 	}
 
 	if (check_compaction(mem_free, hugepage_size,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 12/13] selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (10 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-02  7:15   ` Sayali Patil
  2026-03-27  7:16 ` [PATCH v3 13/13] selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page sizes Sayali Patil
  2026-03-27 18:11 ` [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Andrew Morton
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Miaohe Lin, Venkat Rao Bagalkote

run_vmtests.sh contains special handling to ensure the hwpoison_inject
module is available for the memory-failure tests. This logic was
implemented outside of run_test(), making the setup category-specific
but managed globally.

Move the hwpoison_inject handling into run_test() and restrict it
to the memory-failure category so that:
1. the module is checked and loaded only when memory-failure tests run,
2. the test is skipped if the module or the debugfs interface
   (/sys/kernel/debug/hwpoison/) is not available,
3. the module is unloaded after the test if it was loaded by the script.

This localizes category-specific setup and makes the test flow
consistent with other per-category preparations.

While updating this logic, fix the module availability check.
The script previously used:

	modprobe -R hwpoison_inject

The -R option prints the resolved module name to stdout, causing every
run to print:

	hwpoison_inject

in the test output, even when no action is required, introducing
unnecessary noise.

Replace this with:

	modprobe -n hwpoison_inject

which verifies that the module is loadable without producing output,
keeping the selftest logs clean and consistent.

Fixes: ff4ef2fbd101 ("selftests/mm: add memory failure anonymous page test")
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 46 ++++++++++++++---------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index eecec0b6eb13..606558cc3b09 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -250,6 +250,27 @@ run_test() {
 			fi
 		fi
 
+		# Ensure hwpoison_inject is available for memory-failure tests
+		if [ "${CATEGORY}" = "memory-failure" ]; then
+			# Try to load hwpoison_inject if not present.
+			HWPOISON_DIR=/sys/kernel/debug/hwpoison/
+			if [ ! -d "$HWPOISON_DIR" ]; then
+				if ! modprobe -n hwpoison_inject > /dev/null 2>&1; then
+					echo "Module hwpoison_inject not found, skipping..." \
+						| tap_prefix
+					skip=1
+				else
+					modprobe hwpoison_inject > /dev/null 2>&1
+					LOADED_MOD=1
+				fi
+			fi
+
+			if [ ! -d "$HWPOISON_DIR" ]; then
+				echo "hwpoison debugfs interface not present" | tap_prefix
+				skip=1
+			fi
+		fi
+
 		local test=$(pretty_name "$*")
 		local title="running $*"
 		local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
@@ -261,6 +282,12 @@ run_test() {
 		else
 			local ret=$ksft_skip
 		fi
+
+		# Unload hwpoison_inject if we loaded it
+		if [ -n "${LOADED_MOD}" ]; then
+			modprobe -r hwpoison_inject > /dev/null 2>&1
+		fi
+
 		count_total=$(( count_total + 1 ))
 		if [ $ret -eq 0 ]; then
 			count_pass=$(( count_pass + 1 ))
@@ -540,24 +567,7 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned
 
 CATEGORY="rmap" run_test ./rmap
 
-# Try to load hwpoison_inject if not present.
-HWPOISON_DIR=/sys/kernel/debug/hwpoison/
-if [ ! -d "$HWPOISON_DIR" ]; then
-	if ! modprobe -q -R hwpoison_inject; then
-		echo "Module hwpoison_inject not found, skipping..."
-	else
-		modprobe hwpoison_inject > /dev/null 2>&1
-		LOADED_MOD=1
-	fi
-fi
-
-if [ -d "$HWPOISON_DIR" ]; then
-	CATEGORY="memory-failure" run_test ./memory-failure
-fi
-
-if [ -n "${LOADED_MOD}" ]; then
-	modprobe -r hwpoison_inject > /dev/null 2>&1
-fi
+CATEGORY="memory-failure" run_test ./memory-failure
 
 if [ "${HAVE_HUGEPAGES}" = 1 ]; then
 	echo "$orig_nr_hugepgs" > /proc/sys/vm/nr_hugepages
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 13/13] selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page sizes
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (11 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 12/13] selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category Sayali Patil
@ 2026-03-27  7:16 ` Sayali Patil
  2026-04-03 17:16   ` Sayali Patil
  2026-03-27 18:11 ` [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Andrew Morton
  13 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-03-27  7:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Sayali Patil, Venkat Rao Bagalkote

The hugetlb memcg selftest was previously skipped when the configured
huge page size was not 2MB, preventing the test from running on systems
using other default huge page sizes.

Detect the system's configured huge page size at runtime and use it for
the allocation instead of assuming a fixed 2MB size. This allows the
test to run on configurations using non-2MB huge pages and avoids
unnecessary skips.

Fixes: c0dddb7aa5f8 ("selftests: add a selftest to verify hugetlb usage in memcg")
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
---
 .../selftests/cgroup/test_hugetlb_memcg.c     | 66 ++++++++++++++-----
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_hugetlb_memcg.c b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
index f451aa449be6..a449dbec16a8 100644
--- a/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
+++ b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
@@ -12,10 +12,15 @@
 
 #define ADDR ((void *)(0x0UL))
 #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
-/* mapping 8 MBs == 4 hugepages */
-#define LENGTH (8UL*1024*1024)
 #define PROTECTION (PROT_READ | PROT_WRITE)
 
+/*
+ * This value matches the kernel's MEMCG_CHARGE_BATCH definition:
+ * see include/linux/memcontrol.h. If the kernel value changes, this
+ * test constant must be updated accordingly to stay consistent.
+ */
+#define MEMCG_CHARGE_BATCH 64U
+
 /* borrowed from mm/hmm-tests.c */
 static long get_hugepage_size(void)
 {
@@ -84,11 +89,11 @@ static unsigned int check_first(char *addr)
 	return *(unsigned int *)addr;
 }
 
-static void write_data(char *addr)
+static void write_data(char *addr, size_t length)
 {
 	unsigned long i;
 
-	for (i = 0; i < LENGTH; i++)
+	for (i = 0; i < length; i++)
 		*(addr + i) = (char)i;
 }
 
@@ -96,26 +101,31 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
 {
 	char *test_group = (char *)arg;
 	void *addr;
+	long hpage_size = get_hugepage_size() * 1024;
 	long old_current, expected_current, current;
 	int ret = EXIT_FAILURE;
+	size_t length = 4 * hpage_size;
+	int pagesize, nr_pages;
+
+	pagesize = getpagesize();
 
 	old_current = cg_read_long(test_group, "memory.current");
 	set_nr_hugepages(20);
 	current = cg_read_long(test_group, "memory.current");
-	if (current - old_current >= MB(2)) {
+	if (current - old_current >= hpage_size) {
 		ksft_print_msg(
 			"setting nr_hugepages should not increase hugepage usage.\n");
 		ksft_print_msg("before: %ld, after: %ld\n", old_current, current);
 		return EXIT_FAILURE;
 	}
 
-	addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, 0, 0);
+	addr = mmap(ADDR, length, PROTECTION, FLAGS, 0, 0);
 	if (addr == MAP_FAILED) {
 		ksft_print_msg("fail to mmap.\n");
 		return EXIT_FAILURE;
 	}
 	current = cg_read_long(test_group, "memory.current");
-	if (current - old_current >= MB(2)) {
+	if (current - old_current >= hpage_size) {
 		ksft_print_msg("mmap should not increase hugepage usage.\n");
 		ksft_print_msg("before: %ld, after: %ld\n", old_current, current);
 		goto out_failed_munmap;
@@ -124,10 +134,24 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
 
 	/* read the first page */
 	check_first(addr);
-	expected_current = old_current + MB(2);
+	nr_pages = hpage_size / pagesize;
+	expected_current = old_current + hpage_size;
 	current = cg_read_long(test_group, "memory.current");
-	if (!values_close(expected_current, current, 5)) {
-		ksft_print_msg("memory usage should increase by around 2MB.\n");
+	if (nr_pages < MEMCG_CHARGE_BATCH && current == old_current) {
+		/*
+		 * Memory cgroup charging uses per-CPU stocks and batched updates to the
+		 *  memcg usage counters. For hugetlb allocations, the number of pages
+		 *  that memcg charges is expressed in base pages (nr_pages), not
+		 *  in hugepage units. When the charge for an allocation is smaller than
+		 *  the internal batching threshold  (nr_pages <  MEMCG_CHARGE_BATCH),
+		 *  it may be fully satisfied from the CPU’s local stock. In such
+		 *  cases memory.current does not necessarily
+		 *  increase.
+		 *  Therefore, Treat a zero delta as valid behaviour here.
+		 */
+		ksft_print_msg("no visible memcg charge, allocation consumed from local stock.\n");
+	} else if (!values_close(expected_current, current, 5)) {
+		ksft_print_msg("memory usage should increase by ~1 huge page.\n");
 		ksft_print_msg(
 			"expected memory: %ld, actual memory: %ld\n",
 			expected_current, current);
@@ -135,11 +159,11 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
 	}
 
 	/* write to the whole range */
-	write_data(addr);
+	write_data(addr, length);
 	current = cg_read_long(test_group, "memory.current");
-	expected_current = old_current + MB(8);
+	expected_current = old_current + length;
 	if (!values_close(expected_current, current, 5)) {
-		ksft_print_msg("memory usage should increase by around 8MB.\n");
+		ksft_print_msg("memory usage should increase by around 4 huge pages.\n");
 		ksft_print_msg(
 			"expected memory: %ld, actual memory: %ld\n",
 			expected_current, current);
@@ -147,7 +171,7 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
 	}
 
 	/* unmap the whole range */
-	munmap(addr, LENGTH);
+	munmap(addr, length);
 	current = cg_read_long(test_group, "memory.current");
 	expected_current = old_current;
 	if (!values_close(expected_current, current, 5)) {
@@ -162,13 +186,15 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
 	return ret;
 
 out_failed_munmap:
-	munmap(addr, LENGTH);
+	munmap(addr, length);
 	return ret;
 }
 
 static int test_hugetlb_memcg(char *root)
 {
 	int ret = KSFT_FAIL;
+	int num_pages = 20;
+	long hpage_size = get_hugepage_size();
 	char *test_group;
 
 	test_group = cg_name(root, "hugetlb_memcg_test");
@@ -177,7 +203,7 @@ static int test_hugetlb_memcg(char *root)
 		goto out;
 	}
 
-	if (cg_write(test_group, "memory.max", "100M")) {
+	if (cg_write_numeric(test_group, "memory.max", num_pages * hpage_size * 1024)) {
 		ksft_print_msg("fail to set cgroup memory limit.\n");
 		goto out;
 	}
@@ -200,6 +226,7 @@ int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
 	int ret = EXIT_SUCCESS, has_memory_hugetlb_acc;
+	long val;
 
 	has_memory_hugetlb_acc = proc_mount_contains("memory_hugetlb_accounting");
 	if (has_memory_hugetlb_acc < 0)
@@ -208,12 +235,15 @@ int main(int argc, char **argv)
 		ksft_exit_skip("memory hugetlb accounting is disabled\n");
 
 	/* Unit is kB! */
-	if (get_hugepage_size() != 2048) {
-		ksft_print_msg("test_hugetlb_memcg requires 2MB hugepages\n");
+	val = get_hugepage_size();
+	if (val < 0) {
+		ksft_print_msg("Failed to read hugepage size\n");
 		ksft_test_result_skip("test_hugetlb_memcg\n");
 		return ret;
 	}
 
+	ksft_print_msg("Hugepage size: %ld kB\n", val);
+
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements
  2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
                   ` (12 preceding siblings ...)
  2026-03-27  7:16 ` [PATCH v3 13/13] selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page sizes Sayali Patil
@ 2026-03-27 18:11 ` Andrew Morton
       [not found]   ` <09104413-483f-4852-9d7e-71e0f86a1754@linux.ibm.com>
  13 siblings, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2026-03-27 18:11 UTC (permalink / raw)
  To: Sayali Patil
  Cc: Shuah Khan, linux-mm, linux-kernel, linux-kselftest,
	Ritesh Harjani, David Hildenbrand, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev

On Fri, 27 Mar 2026 12:45:54 +0530 Sayali Patil <sayalip@linux.ibm.com> wrote:

> Powerpc systems with a 64K base page size exposed several issues while
> running mm selftests. Some tests assume specific hugetlb configurations,
> use incorrect interfaces, or fail instead of skipping when the required
> kernel features are not available.
> 
> This series fixes these issues and improves test robustness.
> 
> Please review the patches and provide any feedback or suggestions for
> improvement.

AI review asks many questions:
https://sashiko.dev/#/patchset/cover.1774591179.git.sayalip@linux.ibm.com


I never knew about that bash line continuation thing.

hp2:/home/akpm> cat t.sh

foo=\
	bar
	
echo $foo
hp2:/home/akpm> bash t.sh
t.sh: line 3: bar: command not found

Huh.  But it presumably passed your testing so confused.
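
A minimal sketch of what the shell does with a backslash-newline, assuming
a POSIX sh. The command name no_such_command_xyz and the variable names are
made up for illustration. The backslash-newline pair is deleted before the
line is split into words, so what matters is what the joined line looks like:

```shell
#!/bin/sh
# Case 1: continuation after a redirection operator (the selftest pattern).
# After the backslash-newline is removed the shell sees:
#   echo "limit" >     "$target"
target=$(mktemp)
echo "limit" > \
    "$target"
redirect_result=$(cat "$target")
rm -f "$target"

# Case 2: continuation inside an assignment, with the next line indented.
# After joining, the shell sees an empty assignment followed by a command word:
#   foo=    no_such_command_xyz
indented_result=$(sh -c 'foo=\
    no_such_command_xyz
echo "foo=[$foo]"' 2>&1)

# Case 3: the same assignment with no leading whitespace on the continued line.
# After joining, the shell sees: foo=bar
joined_result=$(sh -c 'foo=\
bar
echo "foo=[$foo]"')

echo "redirect: $redirect_result"
echo "indented: $indented_result"
echo "joined:   $joined_result"
```

Case 1 is the redirection form, where whitespace after ">" is harmless.
Case 2 is the t.sh form above: the indentation turns "bar" into a separate
command word, which is why the shell reports that the command is not found.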


I don't want to risk breaking selftests so I'll set v3 aside until
you're confident we should proceed.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements
       [not found]   ` <09104413-483f-4852-9d7e-71e0f86a1754@linux.ibm.com>
@ 2026-03-30 22:11     ` Andrew Morton
  2026-04-01 14:05       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2026-03-30 22:11 UTC (permalink / raw)
  To: Sayali Patil
  Cc: Shuah Khan, linux-mm, linux-kernel, linux-kselftest,
	Ritesh Harjani, David Hildenbrand, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev

On Mon, 30 Mar 2026 11:27:04 +0530 Sayali Patil <sayalip@linux.ibm.com> wrote:

> > I don't want to risk breaking selftests so I'll set v3 aside until
> > you're confident we should proceed.
> >
> > Thanks.
> 
> This line continuation pattern has been used in selftests for quite some 
> time. For example, a similar usage exists in 
> charge_reserved_hugetlb.sh, introduced here:
> https://lore.kernel.org/all/20200211213128.73302-8-almasrymina@google.com/T/#u
> 
>   echo "$reservation_limit" > \
>      $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file
> 
> In this case, it was primarily used to keep line length within 100 
> characters. I’ve tested the script and it behaved as expected.

Great, thanks for checking.

Series is nicely reviewed and an earlier version spent time in mm.git. 
And the bar tends to be lower for selftests.  So I *could* break my rule
(https://lkml.kernel.org/r/20260323202941.08ddf2b0411501cae801ab4c@linux-foundation.org)
but would prefer not.  What do others think?

Did Venkat's report
(https://lkml.kernel.org/r/cf815c21-138e-44c8-986d-d8496503ee32@linux.ibm.com)
get addressed?  I'm not seeing that in the v2->v3 changelogging.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements
  2026-03-30 22:11     ` Andrew Morton
@ 2026-04-01 14:05       ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:05 UTC (permalink / raw)
  To: Andrew Morton, Sayali Patil
  Cc: Shuah Khan, linux-mm, linux-kernel, linux-kselftest,
	Ritesh Harjani, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev

On 3/31/26 00:11, Andrew Morton wrote:
> On Mon, 30 Mar 2026 11:27:04 +0530 Sayali Patil <sayalip@linux.ibm.com> wrote:
> 
>>> I don't want to risk breaking selftests so I'll set v3 aside until
>>> you're confident we should proceed.
>>>
>>> Thanks.
>>
>> This line continuation pattern has been used in selftests for quite some 
>> time. For example, a similar usage exists in 
>> charge_reserved_hugetlb.sh, introduced here:
>> https://lore.kernel.org/all/20200211213128.73302-8-almasrymina@google.com/T/#u
>>
>>   echo "$reservation_limit" > \
>>      $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file
>>
>> In this case, it was primarily used to keep line length within 100 
>> characters. I’ve tested the script and it behaved as expected.
> 
> Great, thanks for checking.
> 
> Series is nicely reviewed and an earlier version spent time in mm.git. 
> And the bar tends to be lower for selftests.  So I *could* break my rule
> (https://lkml.kernel.org/r/20260323202941.08ddf2b0411501cae801ab4c@linux-foundation.org)
> but would prefer not.  What do others think?

I guess it doesn't really hurt to take this now.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 02/13] selftests/mm: fix hugetlb pathname construction in charge_reserved_hugetlb.sh
  2026-03-27  7:15 ` [PATCH v3 02/13] selftests/mm: fix hugetlb pathname construction " Sayali Patil
@ 2026-04-01 14:06   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:06 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 3/27/26 08:15, Sayali Patil wrote:
> The charge_reserved_hugetlb.sh script assumes hugetlb cgroup memory
> interface file names use the "<size>MB" format
> (e.g. hugetlb.1024MB.current).
> This assumption breaks on systems with larger huge pages such as 1GB,
> where the kernel exposes normalized units:
>     hugetlb.1GB.current
>     hugetlb.1GB.max
>     hugetlb.1GB.rsvd.max
>     ...
> 
> As a result, the script attempts to access files like
> hugetlb.1024MB.current, which do not exist when the kernel reports the
> size in GB.
> 
> Normalize the huge page size and construct the pathname using the
> appropriate unit (MB or GB), matching the hugetlb controller naming.
> 
> Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
> Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---

Acked-by: David Hildenbrand (Arm) <david@kernel.org>
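
For readers following along, the unit selection described in the commit
message can be sketched roughly as below. This is not the code from the
patch: hugetlb_size_tag is a hypothetical helper, the input is assumed to
be the huge page size in kB, and sizes below 1MB are not handled here.

```shell
#!/bin/sh
# Hypothetical helper: map a huge page size in kB to the size tag used in
# hugetlb cgroup file names, e.g. 2048 -> 2MB, 1048576 -> 1GB.
hugetlb_size_tag() {
    size_kb=$1
    if [ "$size_kb" -ge 1048576 ]; then
        # 1GB or larger: report whole gigabytes
        echo "$((size_kb / 1048576))GB"
    else
        # Otherwise report whole megabytes
        echo "$((size_kb / 1024))MB"
    fi
}

# Build the interface file name for a few example sizes.
for kb in 2048 16384 1048576 16777216; do
    echo "hugetlb.$(hugetlb_size_tag "$kb").current"
done
```

With 1GB pages this yields hugetlb.1GB.current rather than
hugetlb.1024MB.current, matching the names quoted above.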

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 03/13] selftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.sh
  2026-03-27  7:15 ` [PATCH v3 03/13] selftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.sh Sayali Patil
@ 2026-04-01 14:06   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:06 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 3/27/26 08:15, Sayali Patil wrote:
> The hugetlb_reparenting_test.sh script constructs hugetlb cgroup
> memory interface file names based on the configured huge page size. The
> script formats the size only in MB units, which causes mismatches on
> systems using larger huge pages where the kernel exposes normalized
> units (e.g. "1GB" instead of "1024MB").
> 
> As a result, the test fails to locate the corresponding cgroup files
> when 1GB huge pages are configured.
> 
> Update the script to detect the huge page size and select the
> appropriate unit (MB or GB) so that the constructed paths match the
> kernel's hugetlb controller naming.
> 
> Also print an explicit "Fail" message when a test failure occurs to
> improve result visibility.
> 
> Fixes: e487a5d513cb ("selftest/mm: make hugetlb_reparenting_test tolerant to async reparenting")
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks in hugetlb_reparenting_test.sh
  2026-03-27  7:15 ` [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks " Sayali Patil
@ 2026-04-01 14:08   ` David Hildenbrand (Arm)
  2026-04-03 19:59     ` Sayali Patil
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:08 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev

On 3/27/26 08:15, Sayali Patil wrote:
> Launch write_to_hugetlbfs as a separate process and move only its PID
> into the target cgroup before waiting for completion. This avoids moving
> the test shell itself, prevents unintended charging to the shell, and
> ensures hugetlb and memcg accounting is attributed only to the intended
> workload.
> 
> Add a short delay before the hugetlb allocation to avoid a race where
> memory may be charged before the task migration takes effect, which
> can lead to incorrect accounting and intermittent test failures.

Isn't there still a chance for a race, for example, when running in a VM?

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages
  2026-03-27  7:16 ` [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages Sayali Patil
@ 2026-04-01 14:10   ` David Hildenbrand (Arm)
  2026-04-01 20:45     ` Sayali Patil
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:10 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 3/27/26 08:16, Sayali Patil wrote:
> The hugepage-mremap selftest uses a default size of 10MB, which is
> sufficient for small huge page sizes. However, when the huge page size
> is large (e.g. 1GB), 10MB is smaller than a single huge page.
> As a result, the test does not trigger PMD sharing and the
> corresponding unshare path in mremap(), causing the
> test to fail (mremap succeeds where a failure is expected).
> 
> Update run_vmtests.sh to use twice the huge page size when the huge page
> size exceeds 10MB, while retaining the 10MB default for smaller huge
> pages. This ensures the test exercises the intended PMD sharing and
> unsharing paths for larger huge page sizes.
> 
> Before patch:
>  running ./hugepage-mremap
>  ------------------------------
>  TAP version 13
>  1..1
>   Map haddr: Returned address is 0x7eaa40000000
>   Map daddr: Returned address is 0x7daa40000000
>   Map vaddr: Returned address is 0x7faa40000000
>   Address returned by mmap() = 0x7fffaa600000
>   Mremap: Returned address is 0x7faa40000000
>   First hex is 0
>   First hex is 3020100
>  Bail out! mremap: Expected failure, but call succeeded
>  Planned tests != run tests (1 != 0)
>  Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
>  [FAIL]
>  not ok 1 hugepage-mremap # exit=1
> 
> Before patch:
>  running ./hugepage-mremap
>  ------------------------------
>  TAP version 13
>  1..1
>   Map haddr: Returned address is 0x7eaa40000000
>   Map daddr: Returned address is 0x7daa40000000
>   Map vaddr: Returned address is 0x7faa40000000
>   Address returned by mmap() = 0x7fffaa600000
>   Mremap: Returned address is 0x7faa40000000
>   First hex is 0
>   First hex is 3020100
>  Bail out! mremap: Expected failure, but call succeeded
>  Planned tests != run tests (1 != 0)
>  Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
>  [FAIL]
>  not ok 1 hugepage-mremap # exit=1
> 

Why are there two "Before patch" in here?

> After patch:
>  running ./hugepage-mremap 2048
>  ------------------------------
>  TAP version 13
>  1..1
>   Map haddr: Returned address is 0x7eaa40000000
>   Map daddr: Returned address is 0x7daa40000000
>   Map vaddr: Returned address is 0x7faa40000000
>   Address returned by mmap() = 0x7fff13000000
>   Mremap: Returned address is 0x7faa40000000
>   First hex is 0
>   First hex is 3020100
>   ok 1 Read same data
>  Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>  [PASS]
>  ok 1 hugepage-mremap 2048
> 
> Fixes: f77a286de48c ("mm, hugepages: make memory size variable in hugepage-mremap selftest")
> Acked-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>  tools/testing/selftests/mm/run_vmtests.sh | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index afdcfd0d7cef..eecec0b6eb13 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -293,7 +293,18 @@ echo "$shmmax" > /proc/sys/kernel/shmmax
>  echo "$shmall" > /proc/sys/kernel/shmall
>  
>  CATEGORY="hugetlb" run_test ./map_hugetlb
> -CATEGORY="hugetlb" run_test ./hugepage-mremap
> +
> +# If the huge page size is larger than 10MB, increase the test memory size
> +# to twice the huge page size (in MB) to ensure the test exercises PMD sharing
> +# and the unshare path in hugepage-mremap. Otherwise, run the test with
> +# the default 10MB memory size.

PMD sharing requires, on x86, a 1 GiB area with 2 MiB hugetlb folios.

How does doubling sort that out?

Also, why the magic value 10MB?
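
For reference, the size selection described in the quoted comment amounts
to roughly the following. This is a sketch under stated assumptions, not
the code from the patch: mremap_test_size_mb is a hypothetical name and
all sizes are in MB.

```shell
#!/bin/sh
# Hypothetical helper: pick the memory size (in MB) passed to hugepage-mremap
# for a given huge page size (in MB).
mremap_test_size_mb() {
    hpage_mb=$1
    default_mb=10
    if [ "$hpage_mb" -gt "$default_mb" ]; then
        # Huge page larger than the default: use two huge pages
        echo $((hpage_mb * 2))
    else
        # Small huge pages: keep the 10MB default
        echo "$default_mb"
    fi
}

for mb in 2 16 1024; do
    echo "hugepage size ${mb}MB -> test size $(mremap_test_size_mb "$mb")MB"
done
```

A 1024MB huge page gives 2048, which matches the "./hugepage-mremap 2048"
invocation in the "After patch" output, while a 2MB huge page still gets
the 10MB default.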


-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap
  2026-03-27  7:16 ` [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap Sayali Patil
@ 2026-04-01 14:18   ` David Hildenbrand (Arm)
       [not found]     ` <7b6652f3-c994-4ef4-87a4-5473cd1254b7@linux.ibm.com>
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:18 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 3/27/26 08:16, Sayali Patil wrote:
> Previously, register_region_with_uffd() created a new anonymous
> mapping and overwrote the address supplied by the caller before
> registering the range with userfaultfd.
> 
> As a result, userfaultfd was applied to an unrelated anonymous mapping
> instead of the hugetlb region used by the test.
> 
> Remove the extra mmap() and register the caller-provided address range
> directly using UFFDIO_REGISTER_MODE_MISSING, so that faults are
> generated for the hugetlb mapping used by the test.
> 
> This ensures userfaultfd operates on the actual hugetlb test region and
> validates the expected fault handling.
> 
> Before patch:
>  running ./hugepage-mremap
>  -------------------------
>  TAP version 13
>  1..1
>   Map haddr: Returned address is 0x7eaa40000000
>   Map daddr: Returned address is 0x7daa40000000
>   Map vaddr: Returned address is 0x7faa40000000
>   Address returned by mmap() = 0x7fff9d000000
>   Mremap: Returned address is 0x7faa40000000
>   First hex is 0
>   First hex is 3020100
>  ok 1 Read same data
>  Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>  [PASS]
>  ok 1 hugepage-mremap
> 
> After patch:
>  running ./hugepage-mremap
>  -------------------------
>  TAP version 13
>  1..1
>   Map haddr: Returned address is 0x7eaa40000000
>   Map daddr: Returned address is 0x7daa40000000
>   Map vaddr: Returned address is 0x7faa40000000
>   Registered memory at address 0x7eaa40000000 with userfaultfd
>   Mremap: Returned address is 0x7faa40000000
>   First hex is 0
>   First hex is 3020100
>  ok 1 Read same data
>  Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>  [PASS]
>  ok 1 hugepage-mremap

Okay, so we tested mremap() of something that is not even hugetlb.

> 
> Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>  tools/testing/selftests/mm/hugepage-mremap.c | 21 +++++---------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
> index b8f7d92e5a35..e611249080d6 100644
> --- a/tools/testing/selftests/mm/hugepage-mremap.c
> +++ b/tools/testing/selftests/mm/hugepage-mremap.c
> @@ -85,25 +85,14 @@ static void register_region_with_uffd(char *addr, size_t len)
>  	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
>  		ksft_exit_fail_msg("ioctl-UFFDIO_API: %s\n", strerror(errno));
>  
> -	/* Create a private anonymous mapping. The memory will be
> -	 * demand-zero paged--that is, not yet allocated. When we
> -	 * actually touch the memory, it will be allocated via
> -	 * the userfaultfd.
> -	 */
> -
> -	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
> -		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -	if (addr == MAP_FAILED)
> -		ksft_exit_fail_msg("mmap: %s\n", strerror(errno));
> -
> -	ksft_print_msg("Address returned by mmap() = %p\n", addr);
> -
> -	/* Register the memory range of the mapping we just created for
> -	 * handling by the userfaultfd object. In mode, we request to track
> -	 * missing pages (i.e., pages that have not yet been faulted in).
> +	/* Register the passed memory range for handling by the userfaultfd object.


/*
 * ...

While at it.

> +	 * In mode, we request to track missing pages
> +	 * (i.e., pages that have not yet been faulted in).
>  	 */
>  	if (uffd_register(uffd, addr, len, true, false, false))
>  		ksft_exit_fail_msg("ioctl-UFFDIO_REGISTER: %s\n", strerror(errno));
> +
> +	ksft_print_msg("Registered memory at address %p with userfaultfd\n", addr);
>  }
>  
>  int main(int argc, char *argv[])

Yes, that code is extremely weird. I wonder if this was some
copy-and-paste from other uffd test code.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-03-27  7:16 ` [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed " Sayali Patil
@ 2026-04-01 14:21   ` David Hildenbrand (Arm)
  2026-04-01 14:40     ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:21 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 3/27/26 08:16, Sayali Patil wrote:
> The hugepage-mremap selftest reserves the destination address using an
> anonymous base-page mapping before calling mremap() with MREMAP_FIXED,
> while the source region is hugetlb-backed.
> 
> Remapping a hugetlb mapping into a base-page VMA may fail with:
> 
>     mremap: Device or resource busy
> 
> This is observed on powerpc hash MMU systems where slice constraints
> and page size incompatibilities prevent the remap.
> 

That is weird. An mremap(MREMAP_FIXED) is really just an munmap() + move.

Are we sure this is not some actual problem in the hugetlb implementation?

> Ensure the destination region is created using MAP_HUGETLB so that both
> source and destination VMAs are hugetlb-backed and compatible. Also add
> MAP_POPULATE to the destination mapping to prefault hugepages,
> matching the behaviour used for other hugetlb mappings in the test and
> ensuring deterministic behaviour.

But then the test suddenly requires more hugetlb pages, no? I don't see
a good reason for the MAP_POPULATE, really. It will be discarded either way.

> 
> Update the FLAGS macro to include MAP_HUGETLB | MAP_SHARED |
> MAP_POPULATE so that both mappings are hugetlb-backed and compatible.
> Also use the macro for the mmap() calls to avoid repeating
> the flag combination.
> 
> This ensures the test reliably exercises hugetlb mremap instead of
> failing due to VMA type mismatch.
> 
> Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
> Acked-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>  tools/testing/selftests/mm/hugepage-mremap.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
> index e611249080d6..48c24a4ba9a7 100644
> --- a/tools/testing/selftests/mm/hugepage-mremap.c
> +++ b/tools/testing/selftests/mm/hugepage-mremap.c
> @@ -31,7 +31,7 @@
>  #define MB_TO_BYTES(x) (x * 1024 * 1024)
>  
>  #define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC)
> -#define FLAGS (MAP_SHARED | MAP_ANONYMOUS)
> +#define FLAGS (MAP_HUGETLB | MAP_SHARED | MAP_POPULATE)
>  
>  static void check_bytes(char *addr)
>  {
> @@ -121,23 +121,20 @@ int main(int argc, char *argv[])
>  
>  	/* mmap to a PUD aligned address to hopefully trigger pmd sharing. */
>  	unsigned long suggested_addr = 0x7eaa40000000;
> -	void *haddr = mmap((void *)suggested_addr, length, PROTECTION,
> -			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
> +	void *haddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
>  	ksft_print_msg("Map haddr: Returned address is %p\n", haddr);
>  	if (haddr == MAP_FAILED)
>  		ksft_exit_fail_msg("mmap1: %s\n", strerror(errno));
>  
>  	/* mmap again to a dummy address to hopefully trigger pmd sharing. */
>  	suggested_addr = 0x7daa40000000;
> -	void *daddr = mmap((void *)suggested_addr, length, PROTECTION,
> -			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
> +	void *daddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
>  	ksft_print_msg("Map daddr: Returned address is %p\n", daddr);
>  	if (daddr == MAP_FAILED)
>  		ksft_exit_fail_msg("mmap3: %s\n", strerror(errno));
>  
>  	suggested_addr = 0x7faa40000000;
> -	void *vaddr =
> -		mmap((void *)suggested_addr, length, PROTECTION, FLAGS, -1, 0);
> +	void *vaddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
>  	ksft_print_msg("Map vaddr: Returned address is %p\n", vaddr);
>  	if (vaddr == MAP_FAILED)
>  		ksft_exit_fail_msg("mmap2: %s\n", strerror(errno));


-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 10/13] selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero
  2026-03-27  7:16 ` [PATCH v3 10/13] selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero Sayali Patil
@ 2026-04-01 14:23   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:23 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 3/27/26 08:16, Sayali Patil wrote:
> uffd-stress currently fails when the computed nr_pages_per_cpu
> evaluates to zero:
> 
> nr_pages_per_cpu = bytes / page_size / nr_parallel
> 
> This can occur on systems with large hugepage sizes (e.g. 1GB) and a
> high number of CPUs, where the total allocated memory is sufficient
> overall but not enough to provide at least one page per cpu.
> 
> In such cases, the failure is due to insufficient test resources
> rather than incorrect kernel behaviour. Update the test
> to treat this condition as a test skip instead of reporting an error.
> 
> Fixes: db0f1c138f18 ("selftests/mm: print some details when uffd-stress gets bad params")
> Acked-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>

Acked-by: David Hildenbrand (Arm) <david@kernel.org>
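
A small worked example of the integer division in the commit message,
as a sketch only: nr_pages_per_cpu is a hypothetical shell helper that
mirrors the formula, and the sizes and CPU counts are made-up inputs.

```shell
#!/bin/sh
# Hypothetical helper mirroring: nr_pages_per_cpu = bytes / page_size / nr_parallel
# All divisions are integer divisions, so the result truncates toward zero.
nr_pages_per_cpu() {
    bytes=$1
    page_size=$2
    nr_parallel=$3
    echo $((bytes / page_size / nr_parallel))
}

gib=$((1024 * 1024 * 1024))

# Four 1GB huge pages shared between 64 parallel workers: 4 / 64 truncates to 0.
per_cpu=$(nr_pages_per_cpu $((4 * gib)) "$gib" 64)
if [ "$per_cpu" -eq 0 ]; then
    echo "skip: not enough pages for one page per cpu"
else
    echo "run: $per_cpu pages per cpu"
fi
```

The same four pages with four workers would give one page each, so the
zero result reflects the test configuration rather than kernel behaviour.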

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test
  2026-03-27  7:16 ` [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test Sayali Patil
@ 2026-04-01 14:32   ` Sayali Patil
  2026-04-01 14:39     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-04-01 14:32 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Venkat Rao Bagalkote


On 27/03/26 12:46, Sayali Patil wrote:
> The cleanup loop of allocated memory currently uses:
>
>      for (entry = list; entry != NULL; entry = entry->next) {
>          munmap(entry->map, MAP_SIZE);
>          if (!entry->next)
>              break;
>          entry = entry->next;
>      }
>
> The inner entry = entry->next causes the loop to skip every
> other node, resulting in only half of the mapped regions being
> unmapped.
>
> Remove the redundant increment to ensure every entry is visited
> and unmapped during cleanup.
>
> Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>   tools/testing/selftests/mm/compaction_test.c | 3 ---
>   1 file changed, 3 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
> index 30209c40b697..f73930706bd0 100644
> --- a/tools/testing/selftests/mm/compaction_test.c
> +++ b/tools/testing/selftests/mm/compaction_test.c
> @@ -263,9 +263,6 @@ int main(int argc, char **argv)
>   
>   	for (entry = list; entry != NULL; entry = entry->next) {
>   		munmap(entry->map, MAP_SIZE);
> -		if (!entry->next)
> -			break;
> -		entry = entry->next;
>   	}
>   
>   	if (check_compaction(mem_free, hugepage_size,

Sorry, this change is not valid.

The goal of this test is to verify the kernel’s ability to compact
unevictable (MAP_LOCKED) pages. The loop is intentionally written to
unmap every other chunk, thereby creating fragmentation with locked pages
before check_compaction() is invoked.

With the proposed change (removing the double increment), the loop ends up
unmapping all allocated locked pages instead of leaving a fragmented
pattern. This results in memory being effectively unfragmented.

I will send v4 without this patch.
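
To make the traversal explicit, here is a small simulation of the original
loop over list indices. It is only an illustration: unmapped_indices is a
hypothetical helper, and indices stand in for the list nodes of the C code.

```shell
#!/bin/sh
# Hypothetical simulation: print the 0-based indices of the nodes that the
# original loop would munmap() in a list of "count" nodes.
unmapped_indices() {
    count=$1
    i=0
    out=""
    while [ "$i" -lt "$count" ]; do
        out="$out $i"                          # munmap(entry->map, MAP_SIZE)
        [ $((i + 1)) -lt "$count" ] || break   # if (!entry->next) break;
        i=$((i + 1))                           # entry = entry->next (loop body)
        i=$((i + 1))                           # entry = entry->next (for increment)
    done
    echo "${out# }"
}

echo "6 nodes: unmapped $(unmapped_indices 6)"
echo "5 nodes: unmapped $(unmapped_indices 5)"
```

Only every other node is unmapped, so the remaining locked chunks stay
interleaved with freed ones, which is the fragmentation the test relies on.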

Thanks,
Sayali

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test
  2026-04-01 14:32   ` Sayali Patil
@ 2026-04-01 14:39     ` David Hildenbrand (Arm)
  2026-04-01 17:33       ` Sayali Patil
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 14:39 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 4/1/26 16:32, Sayali Patil wrote:
> 
> On 27/03/26 12:46, Sayali Patil wrote:
>> The cleanup loop of allocated memory currently uses:
>>
>>      for (entry = list; entry != NULL; entry = entry->next) {
>>          munmap(entry->map, MAP_SIZE);
>>          if (!entry->next)
>>              break;
>>          entry = entry->next;
>>      }
>>
>> The inner entry = entry->next causes the loop to skip every
>> other node, resulting in only half of the mapped regions being
>> unmapped.
>>
>> Remove the redundant increment to ensure every entry is visited
>> and unmapped during cleanup.
>>
>> Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
>> ---
>>   tools/testing/selftests/mm/compaction_test.c | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
>> index 30209c40b697..f73930706bd0 100644
>> --- a/tools/testing/selftests/mm/compaction_test.c
>> +++ b/tools/testing/selftests/mm/compaction_test.c
>> @@ -263,9 +263,6 @@ int main(int argc, char **argv)
>>         for (entry = list; entry != NULL; entry = entry->next) {
>>           munmap(entry->map, MAP_SIZE);
>> -        if (!entry->next)
>> -            break;
>> -        entry = entry->next;
>>       }
>>         if (check_compaction(mem_free, hugepage_size,
> 
> Sorry, this change is not valid.
> 
> The goal of this test is to verify the kernel’s ability to compact
> unevictable (MAP_LOCKED) pages. The loop is intentionally written to
> unmap every other chunk, thereby creating fragmentation with locked pages
> before check_compaction() is invoked.
> 
> With the proposed change (removing the double increment), the loop ends up
> unmapping all allocated locked pages instead of leaving a fragmented
> pattern. This results in memory being effectively unfragmented.

Ahhh, we should really make that clearer in a comment. I missed it myself :(

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-04-01 14:21   ` David Hildenbrand (Arm)
@ 2026-04-01 14:40     ` Lorenzo Stoakes (Oracle)
  2026-04-01 20:39       ` Sayali Patil
  0 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-01 14:40 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev, Venkat Rao Bagalkote

On Wed, Apr 01, 2026 at 04:21:55PM +0200, David Hildenbrand (Arm) wrote:
> On 3/27/26 08:16, Sayali Patil wrote:
> > The hugepage-mremap selftest reserves the destination address using an
> > anonymous base-page mapping before calling mremap() with MREMAP_FIXED,
> > while the source region is hugetlb-backed.
> >
> > Remapping a hugetlb mapping into a base-page VMA may fail with:
> >
> >     mremap: Device or resource busy
> >
> > This is observed on powerpc hash MMU systems where slice constraints
> > and page size incompatibilities prevent the remap.

OK so digging in:

mremap -> ... -> vrm_set_new_addr() -> get_unmapped_area() -> ... (in ppc arch
code) -> slice_get_unmapped_area():

unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
				      unsigned long flags, unsigned int psize,
				      int topdown)
{
	...
	/* bunch of checks */

	/* If we have MAP_FIXED and failed the above steps, then error out */
	if (fixed)
		return -EBUSY;

	...
}

Is presumably where we hit the issue.

> >
>
> That is weird. An mremap(MREMAP_FIXED) is really just an munmap() + move.

Yeah the weird bit I guess is that we _still_ invoke get_unmapped_area() but
with MAP_FIXED set to indicate that we want the specific address, so it's
subject to the above checks.

>
> Are we sure this is not some actual problem in the hugetlb implementation?

It seems the 'slices' check sees if the _target address_ has an equivalent page
size, presumably hugetlb-mandated, and fails if they're not equivalent, so this
change is just accounting for that.


>
> > Ensure the destination region is created using MAP_HUGETLB so that both
> > source and destination VMAs are hugetlb-backed and compatible. Also add
> > MAP_POPULATE to the destination mapping to prefault hugepages,
> > matching the behaviour used for other hugetlb mappings in the test and
> > ensuring deterministic behaviour.
>
> But then the test suddenly requires more hugetlb pages, no? I don't see
> a good reason for the MAP_POPULATE, really. It will be discarded either way.

Yeah I'm not sure about the MAP_POPULATE being all that important here.

>
> >
> > Update the FLAGS macro to include MAP_HUGETLB | MAP_SHARED |
> > MAP_POPULATE so that both mappings are hugetlb-backed and compatible.
> > Also use the macro for the mmap() calls to avoid repeating
> > the flag combination.
> >
> > This ensures the test reliably exercises hugetlb mremap instead of
> > failing due to VMA type mismatch.
> >
> > Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
> > Acked-by: Zi Yan <ziy@nvidia.com>
> > Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> > Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> > ---
> >  tools/testing/selftests/mm/hugepage-mremap.c | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
> > index e611249080d6..48c24a4ba9a7 100644
> > --- a/tools/testing/selftests/mm/hugepage-mremap.c
> > +++ b/tools/testing/selftests/mm/hugepage-mremap.c
> > @@ -31,7 +31,7 @@
> >  #define MB_TO_BYTES(x) (x * 1024 * 1024)
> >
> >  #define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC)
> > -#define FLAGS (MAP_SHARED | MAP_ANONYMOUS)
> > +#define FLAGS (MAP_HUGETLB | MAP_SHARED | MAP_POPULATE)
> >
> >  static void check_bytes(char *addr)
> >  {
> > @@ -121,23 +121,20 @@ int main(int argc, char *argv[])
> >
> >  	/* mmap to a PUD aligned address to hopefully trigger pmd sharing. */
> >  	unsigned long suggested_addr = 0x7eaa40000000;
> > -	void *haddr = mmap((void *)suggested_addr, length, PROTECTION,
> > -			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
> > +	void *haddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
> >  	ksft_print_msg("Map haddr: Returned address is %p\n", haddr);
> >  	if (haddr == MAP_FAILED)
> >  		ksft_exit_fail_msg("mmap1: %s\n", strerror(errno));
> >
> >  	/* mmap again to a dummy address to hopefully trigger pmd sharing. */
> >  	suggested_addr = 0x7daa40000000;
> > -	void *daddr = mmap((void *)suggested_addr, length, PROTECTION,
> > -			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
> > +	void *daddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
> >  	ksft_print_msg("Map daddr: Returned address is %p\n", daddr);
> >  	if (daddr == MAP_FAILED)
> >  		ksft_exit_fail_msg("mmap3: %s\n", strerror(errno));
> >
> >  	suggested_addr = 0x7faa40000000;
> > -	void *vaddr =
> > -		mmap((void *)suggested_addr, length, PROTECTION, FLAGS, -1, 0);
> > +	void *vaddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
> >  	ksft_print_msg("Map vaddr: Returned address is %p\n", vaddr);
> >  	if (vaddr == MAP_FAILED)
> >  		ksft_exit_fail_msg("mmap2: %s\n", strerror(errno));
>
>
> --
> Cheers,
>
> David

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test
  2026-04-01 14:39     ` David Hildenbrand (Arm)
@ 2026-04-01 17:33       ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-01 17:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Andrew Morton, Shuah Khan, linux-mm,
	linux-kernel, linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote



On 01/04/26 20:09, David Hildenbrand (Arm) wrote:
> On 4/1/26 16:32, Sayali Patil wrote:
>>
>> On 27/03/26 12:46, Sayali Patil wrote:
>>> The cleanup loop of allocated memory currently uses:
>>>
>>>       for (entry = list; entry != NULL; entry = entry->next) {
>>>           munmap(entry->map, MAP_SIZE);
>>>           if (!entry->next)
>>>               break;
>>>           entry = entry->next;
>>>       }
>>>
>>> The inner entry = entry->next causes the loop to skip every
>>> other node, resulting in only half of the mapped regions being
>>> unmapped.
>>>
>>> Remove the redundant increment to ensure every entry is visited
>>> and unmapped during cleanup.
>>>
>>> Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>>> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
>>> ---
>>>    tools/testing/selftests/mm/compaction_test.c | 3 ---
>>>    1 file changed, 3 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
>>> index 30209c40b697..f73930706bd0 100644
>>> --- a/tools/testing/selftests/mm/compaction_test.c
>>> +++ b/tools/testing/selftests/mm/compaction_test.c
>>> @@ -263,9 +263,6 @@ int main(int argc, char **argv)
>>>          for (entry = list; entry != NULL; entry = entry->next) {
>>>            munmap(entry->map, MAP_SIZE);
>>> -        if (!entry->next)
>>> -            break;
>>> -        entry = entry->next;
>>>        }
>>>          if (check_compaction(mem_free, hugepage_size,
>>
>> Sorry, this change is not valid.
>>
>> The goal of this test is to verify the kernel’s ability to compact
>> unevictable (MAP_LOCKED) pages. The loop is intentionally written to
>> unmap every other chunk, thereby creating fragmentation with locked pages
>> before check_compaction() is invoked.
>>
>> With the proposed change (removing the double increment), the loop ends up
>> unmapping all allocated locked pages instead of leaving a fragmented
>> pattern. This results in memory being effectively unfragmented.
> 
> Ahhh, we should really make that clearer in a comment. I missed it myself :(
> 
Yes, let me add a comment to clarify this and send it in v4.
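
As a rough illustration of what such a comment could document, here is a
minimal sketch of the "unmap every other entry" traversal. The names
struct node and release_every_other() are hypothetical and not taken from
compaction_test.c; the sketch sets a flag where the real test calls
munmap(entry->map, MAP_SIZE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the test's list entry; 'released' replaces munmap(). */
struct node {
	struct node *next;
	int released;
};

/*
 * Release every other node, starting with the first, so that the nodes
 * left behind form a fragmented pattern. Returns the number of nodes
 * released.
 */
static int release_every_other(struct node *list)
{
	struct node *entry;
	int count = 0;

	for (entry = list; entry != NULL; entry = entry->next) {
		entry->released = 1;	/* the real test would munmap() here */
		count++;
		/* Intentional second step: skip the neighbour so it stays mapped. */
		if (!entry->next)
			break;
		entry = entry->next;
	}
	return count;
}
```

With a five-node list, nodes 0, 2 and 4 are released while nodes 1 and 3
are left in place, which is the fragmentation the test relies on.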

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-04-01 14:40     ` Lorenzo Stoakes (Oracle)
@ 2026-04-01 20:39       ` Sayali Patil
  2026-04-02  7:33         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 39+ messages in thread
From: Sayali Patil @ 2026-04-01 20:39 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), David Hildenbrand (Arm)
  Cc: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev, Venkat Rao Bagalkote



On 01/04/26 20:10, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Apr 01, 2026 at 04:21:55PM +0200, David Hildenbrand (Arm) wrote:
>> On 3/27/26 08:16, Sayali Patil wrote:
>>> The hugepage-mremap selftest reserves the destination address using a
>>> anonymous base-page mapping before calling mremap() with MREMAP_FIXED,
>>> while the source region is hugetlb-backed.
>>>
>>> When remapping a hugetlb mapping into a base-page VMA may fail with:
>>>
>>>      mremap: Device or resource busy
>>>
>>> This is observed on powerpc hash MMU systems where slice constraints
>>> and page size incompatibilities prevent the remap.
> 
> OK so digging in:
> 
> mremap -> ... -> vrm_set_new_addr() -> get_unmapped_area() -> ... (in ppc arch
> code) -> slice_get_unmapped_area():
> 
> unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> 				      unsigned long flags, unsigned int psize,
> 				      int topdown)
> {
> 	...
> 	/* bunch of checks */
> 
> 	/* If we have MAP_FIXED and failed the above steps, then error out */
> 	if (fixed)
> 		return -EBUSY;
> 
> 	...
> }
> 
> Is presumably where we hit the issue.
> 
>>>
>>
>> That is weird. An mremap(MREMAP_FIXED) is really just an munmap() + move.
> 
> Yeah the weird bit I guess is that we _still_ invoke get_unmapped_area() but
> with MAP_FIXED set to indicate that we want the specific address, so it's
> subject to the above checks.
> 
>>
>> Are we sure this is not some actual problem in the hugetlb implementation?
> 
> It seems the 'slices' check sees if the _target address_ has an equivalent page
> size, presumably hugetlb-mandated, and fails if they're not equivalent, so this
> change is just accounting for that.
> 
Yes, this change accounts for that by ensuring the destination is 
created with MAP_HUGETLB so it has the same page size as the source.
> 
>>
>>> Ensure the destination region is created using MAP_HUGETLB so that both
>>> source and destination VMAs are hugetlb-backed and compatible. Also add
>>> MAP_POPULATE to the destination mapping to prefault hugepages,
>>> matching the behaviour used for other hugetlb mapping in the test and
>>> ensuring deterministic behaviour.
>>
>> But then the test suddenly requires more hugetlb pages, no? I don't see
>> a good reason for the MAP_POPULATE, really. It will be discarded either way.
> 
> Yeah I'm not sure about the MAP_POPULATE being all that important here.
> 
As far as I understand, without MAP_POPULATE, memory accesses would 
trigger userfaults, and since the test is single-threaded and has no 
background handler for the uffd, it would deadlock. MAP_POPULATE ensures 
the test runs correctly by prefaulting all pages, but please let me know 
if I’m mistaken.
>>
>>>
>>> Update the FLAGS macro to include MAP_HUGETLB | MAP_SHARED |
>>> MAP_POPULATE so that both mappings are hugetlb-backed and compatible.
>>> Also use the macro for the mmap() calls to avoid repeating
>>> the flag combination.
>>>
>>> This ensures the test reliably exercises hugetlb mremap instead of
>>> failing due to VMA type mismatch.
>>>
>>> Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
>>> ---
>>>   tools/testing/selftests/mm/hugepage-mremap.c | 11 ++++-------
>>>   1 file changed, 4 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
>>> index e611249080d6..48c24a4ba9a7 100644
>>> --- a/tools/testing/selftests/mm/hugepage-mremap.c
>>> +++ b/tools/testing/selftests/mm/hugepage-mremap.c
>>> @@ -31,7 +31,7 @@
>>>   #define MB_TO_BYTES(x) (x * 1024 * 1024)
>>>
>>>   #define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC)
>>> -#define FLAGS (MAP_SHARED | MAP_ANONYMOUS)
>>> +#define FLAGS (MAP_HUGETLB | MAP_SHARED | MAP_POPULATE)
>>>
>>>   static void check_bytes(char *addr)
>>>   {
>>> @@ -121,23 +121,20 @@ int main(int argc, char *argv[])
>>>
>>>   	/* mmap to a PUD aligned address to hopefully trigger pmd sharing. */
>>>   	unsigned long suggested_addr = 0x7eaa40000000;
>>> -	void *haddr = mmap((void *)suggested_addr, length, PROTECTION,
>>> -			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
>>> +	void *haddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
>>>   	ksft_print_msg("Map haddr: Returned address is %p\n", haddr);
>>>   	if (haddr == MAP_FAILED)
>>>   		ksft_exit_fail_msg("mmap1: %s\n", strerror(errno));
>>>
>>>   	/* mmap again to a dummy address to hopefully trigger pmd sharing. */
>>>   	suggested_addr = 0x7daa40000000;
>>> -	void *daddr = mmap((void *)suggested_addr, length, PROTECTION,
>>> -			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
>>> +	void *daddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
>>>   	ksft_print_msg("Map daddr: Returned address is %p\n", daddr);
>>>   	if (daddr == MAP_FAILED)
>>>   		ksft_exit_fail_msg("mmap3: %s\n", strerror(errno));
>>>
>>>   	suggested_addr = 0x7faa40000000;
>>> -	void *vaddr =
>>> -		mmap((void *)suggested_addr, length, PROTECTION, FLAGS, -1, 0);
>>> +	void *vaddr = mmap((void *)suggested_addr, length, PROTECTION, FLAGS, fd, 0);
>>>   	ksft_print_msg("Map vaddr: Returned address is %p\n", vaddr);
>>>   	if (vaddr == MAP_FAILED)
>>>   		ksft_exit_fail_msg("mmap2: %s\n", strerror(errno));
>>
>>
>> --
>> Cheers,
>>
>> David
> 
> Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages
  2026-04-01 14:10   ` David Hildenbrand (Arm)
@ 2026-04-01 20:45     ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-01 20:45 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Andrew Morton, Shuah Khan, linux-mm,
	linux-kernel, linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote



On 01/04/26 19:40, David Hildenbrand (Arm) wrote:
> On 3/27/26 08:16, Sayali Patil wrote:
>> The hugepage-mremap selftest uses a default size of 10MB, which is
>> sufficient for small huge page sizes. However, when the huge page size
>> is large (e.g. 1GB), 10MB is smaller than a single huge page.
>> As a result, the test does not trigger PMD sharing and the
>> corresponding unshare path in mremap(), causing the
>> test to fail (mremap succeeds where a failure is expected).
>>
>> Update run_vmtest.sh to use twice the huge page size when the huge page
>> size exceeds 10MB, while retaining the 10MB default for smaller huge
>> pages. This ensures the test exercises the intended PMD sharing and
>> unsharing paths for larger huge page sizes.
>>
>> Before patch:
>>   running ./hugepage-mremap
>>   ------------------------------
>>   TAP version 13
>>   1..1
>>    Map haddr: Returned address is 0x7eaa40000000
>>    Map daddr: Returned address is 0x7daa40000000
>>    Map vaddr: Returned address is 0x7faa40000000
>>    Address returned by mmap() = 0x7fffaa600000
>>    Mremap: Returned address is 0x7faa40000000
>>    First hex is 0
>>    First hex is 3020100
>>   Bail out! mremap: Expected failure, but call succeeded
>>   Planned tests != run tests (1 != 0)
>>   Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
>>   [FAIL]
>>   not ok 1 hugepage-mremap # exit=1
>>
>> Before patch:
>>   running ./hugepage-mremap
>>   ------------------------------
>>   TAP version 13
>>   1..1
>>    Map haddr: Returned address is 0x7eaa40000000
>>    Map daddr: Returned address is 0x7daa40000000
>>    Map vaddr: Returned address is 0x7faa40000000
>>    Address returned by mmap() = 0x7fffaa600000
>>    Mremap: Returned address is 0x7faa40000000
>>    First hex is 0
>>    First hex is 3020100
>>   Bail out! mremap: Expected failure, but call succeeded
>>   Planned tests != run tests (1 != 0)
>>   Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
>>   [FAIL]
>>   not ok 1 hugepage-mremap # exit=1
>>
> 
> Why are there two "Before patch" in here?
Thanks for pointing that out. Let me fix it in the next version.
> 
>> After patch:
>>   running ./hugepage-mremap 2048
>>   ------------------------------
>>   TAP version 13
>>   1..1
>>    Map haddr: Returned address is 0x7eaa40000000
>>    Map daddr: Returned address is 0x7daa40000000
>>    Map vaddr: Returned address is 0x7faa40000000
>>    Address returned by mmap() = 0x7fff13000000
>>    Mremap: Returned address is 0x7faa40000000
>>    First hex is 0
>>    First hex is 3020100
>>    ok 1 Read same data
>>   Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>   [PASS]
>>   ok 1 hugepage-mremap 2048
>>
>> Fixes: f77a286de48c ("mm, hugepages: make memory size variable in hugepage-mremap selftest")
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
>> ---
>>   tools/testing/selftests/mm/run_vmtests.sh | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
>> index afdcfd0d7cef..eecec0b6eb13 100755
>> --- a/tools/testing/selftests/mm/run_vmtests.sh
>> +++ b/tools/testing/selftests/mm/run_vmtests.sh
>> @@ -293,7 +293,18 @@ echo "$shmmax" > /proc/sys/kernel/shmmax
>>   echo "$shmall" > /proc/sys/kernel/shmall
>>   
>>   CATEGORY="hugetlb" run_test ./map_hugetlb
>> -CATEGORY="hugetlb" run_test ./hugepage-mremap
>> +
>> +# If the huge page size is larger than 10MB, increase the test memory size
>> +# to twice the huge page size (in MB) to ensure the test exercises PMD sharing
>> +# and the unshare path in hugepage-mremap. Otherwise, run the test with
>> +# the default 10MB memory size.
> 
> PMD sharing requires, on x86, a 1 GiB area with 2 MiB hugetlb folios.
> 
> How does doubling sort that out?
> 
> Also, why the magic value 10mb?
> 
> 
Hi David,
Yes, 1GB huge pages are mapped at the PUD level and are not involved in 
PMD sharing, as huge_pte_alloc() skips sharing for sizes other than 
PMD_SIZE.

The issue here is that the test's default memory size, 10MB, is not
aligned to the 1GB huge page size. As a result, munmap() fails on the
unaligned address, and the subsequent mremap(), which is expected to
fail, unexpectedly succeeds.

Aligning the size to a multiple of 1GB avoids this failure, but it is 
not related to PMD sharing. I will update the description in v4 to 
reflect this more accurately.

I will also update the test code directly to align the memory size to 
the huge page size, rather than modifying run_vmtests.sh.
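
A minimal sketch of the kind of alignment meant here, assuming the test
knows the huge page size in bytes. The helper name align_to_hugepage() is
hypothetical and the actual v4 change may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'len' up to the next multiple of 'hpage_size' (both in bytes). */
static size_t align_to_hugepage(size_t len, size_t hpage_size)
{
	assert(hpage_size != 0);
	return ((len + hpage_size - 1) / hpage_size) * hpage_size;
}
```

With the 10MB default this leaves the length unchanged for 2MB huge pages
and rounds it up to one full huge page for 1GB huge pages.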

Thanks,
Sayali

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/13] selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported
  2026-03-27  7:16 ` [PATCH v3 09/13] selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported Sayali Patil
@ 2026-04-02  6:59   ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-02  6:59 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev



On 27/03/26 12:46, Sayali Patil wrote:
> The uffd-wp-mremap test requires the UFFD_FEATURE_PAGEFAULT_FLAG_WP
> capability. On systems where userfaultfd write-protect is
> not supported, uffd_register() fails and the test reports failures.
> 
> Check for the required feature at startup and skip the test when the
> UFFD_FEATURE_PAGEFAULT_FLAG_WP capability is not present,
> preventing false failures on unsupported configurations.
> 
> Before patch:
>   running ./uffd-wp-mremap
>   ------------------------
>    [INFO] detected THP size: 256 KiB
>    [INFO] detected THP size: 512 KiB
>    [INFO] detected THP size: 1024 KiB
>    [INFO] detected THP size: 2048 KiB
>    [INFO] detected hugetlb page size: 2048 KiB
>    [INFO] detected hugetlb page size: 1048576 KiB
>   1..24
>    [RUN] test_one_folio(size=65536, private=false, swapout=false,
>    hugetlb=false)
>   not ok 1 uffd_register() failed
>    [RUN] test_one_folio(size=65536, private=true, swapout=false,
>    hugetlb=false)
>   not ok 2 uffd_register() failed
>    [RUN] test_one_folio(size=65536, private=false, swapout=true,
>    hugetlb=false)
>   not ok 3 uffd_register() failed
>    [RUN] test_one_folio(size=65536, private=true, swapout=true,
>    hugetlb=false)
>   not ok 4 uffd_register() failed
>    [RUN] test_one_folio(size=262144, private=false, swapout=false,
>    hugetlb=false)
>   not ok 5 uffd_register() failed
>    [RUN] test_one_folio(size=524288, private=false, swapout=false,
>    hugetlb=false)
>   not ok 6 uffd_register() failed
>   .
>   .
>   .
>   Bail out! 24 out of 24 tests failed
>    Totals: pass:0 fail:24 xfail:0 xpass:0 skip:0 error:0
>   [FAIL]
> not ok 1 uffd-wp-mremap # exit=1
> 
> After patch:
>   running ./uffd-wp-mremap
>   ------------------------
>   1..0 # SKIP uffd-wp feature not supported
>   [SKIP]
> ok 1 uffd-wp-mremap # SKIP
> 
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>   tools/testing/selftests/mm/uffd-wp-mremap.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/tools/testing/selftests/mm/uffd-wp-mremap.c b/tools/testing/selftests/mm/uffd-wp-mremap.c
> index 17186d4a4147..6edbd09f0ca6 100644
> --- a/tools/testing/selftests/mm/uffd-wp-mremap.c
> +++ b/tools/testing/selftests/mm/uffd-wp-mremap.c
> @@ -19,6 +19,17 @@ static size_t thpsizes[20];
>   static int nr_hugetlbsizes;
>   static size_t hugetlbsizes[10];
>   
> +static void check_uffd_wp_feature_supported(void)
> +{
> +	uint64_t features;
> +
> +	if (uffd_get_features(&features) && errno == ENOENT)
> +		ksft_exit_skip("failed to get available features (%d)\n", errno);
> +
> +	if (!(features & UFFD_FEATURE_PAGEFAULT_FLAG_WP))
> +		ksft_exit_skip("uffd-wp feature not supported\n");
> +}
> +
>   static int detect_thp_sizes(size_t sizes[], int max)
>   {
>   	int count = 0;
> @@ -336,6 +347,8 @@ int main(int argc, char **argv)
>   	struct thp_settings settings;
>   	int i, j, plan = 0;
>   
> +	check_uffd_wp_feature_supported();
> +
>   	pagesize = getpagesize();
>   	nr_thpsizes = detect_thp_sizes(thpsizes, ARRAY_SIZE(thpsizes));
>   	nr_hugetlbsizes = detect_hugetlb_page_sizes(hugetlbsizes,

AI review question:

> commit 22914640356d1eaf4e0a8a8b13ad71911d12db50
> Author: Sayali Patil <sayalip@linux.ibm.com>
> selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported
> This commit adds a check at startup to verify if the userfaultfd
> write-protect capability is present. It skips the test instead of reporting
> false failures on unsupported configurations.
> > diff --git a/tools/testing/selftests/mm/uffd-wp-mremap.c b/tools/testing/selftests/mm/uffd-wp-mremap.c
> > index 17186d4a4147..6edbd09f0ca6 100644
> > --- a/tools/testing/selftests/mm/uffd-wp-mremap.c
> > +++ b/tools/testing/selftests/mm/uffd-wp-mremap.c
> > @@ -19,6 +19,17 @@ static size_t thpsizes[20];
> > static int nr_hugetlbsizes;
> > static size_t hugetlbsizes[10];
>
> > +static void check_uffd_wp_feature_supported(void)
> > +{
> > +	uint64_t features;
> > +
> > +	if (uffd_get_features(&features) && errno == ENOENT)
> > +		ksft_exit_skip("failed to get available features (%d)\n", errno);
> If uffd_get_features() fails but errno is something other than ENOENT (for
> example, EPERM when unprivileged userfaultfd is disabled), this check will
> evaluate to false and execution will continue.
> > +
> > +	if (!(features & UFFD_FEATURE_PAGEFAULT_FLAG_WP))
> Could this evaluate uninitialized stack memory?
> If uffd_get_features() fails and is bypassed by the previous check, the
> features variable will remain uninitialized. This might cause the test to
> falsely pass or fail depending on whatever data happens to be on the stack.
> Should we initialize features to zero or exit on all uffd_get_features()
> failures instead?
> > +		ksft_exit_skip("uffd-wp feature not supported\n");
> > +}

Yes, if uffd_get_features() fails with an error other than ENOENT, the
features variable remains uninitialized, which leads to undefined
behavior in the subsequent check.

I will fix this in the next patch version.
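
One possible shape of the fix, sketched as a pure decision function so it
builds without kernel headers. SKETCH_FEATURE_WP is a local stand-in for
UFFD_FEATURE_PAGEFAULT_FLAG_WP, and decide_wp() is a hypothetical helper;
whether a failed query should skip or fail the test is an assumption here,
not something settled in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for UFFD_FEATURE_PAGEFAULT_FLAG_WP (assumption for this sketch). */
#define SKETCH_FEATURE_WP (1ULL << 0)

enum wp_decision { WP_RUN, WP_SKIP };

/*
 * 'ret' is what a feature query such as uffd_get_features() returned
 * (non-zero on failure). 'features' is only trusted when ret == 0, so an
 * uninitialized value is never inspected.
 */
static enum wp_decision decide_wp(int ret, uint64_t features)
{
	if (ret != 0)
		return WP_SKIP;	/* any failure, not only ENOENT */
	if (!(features & SKETCH_FEATURE_WP))
		return WP_SKIP;	/* write-protect not advertised */
	return WP_RUN;
}
```

The caller would also initialize features to 0 before the query, so that
either safeguard alone is enough to avoid reading stack garbage.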

Thanks,
Sayali

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 12/13] selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category
  2026-03-27  7:16 ` [PATCH v3 12/13] selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category Sayali Patil
@ 2026-04-02  7:15   ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-02  7:15 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev, Miaohe Lin,
	Venkat Rao Bagalkote



On 27/03/26 12:46, Sayali Patil wrote:
> run_vmtests.sh contains special handling to ensure the hwpoison_inject
> module is available for the memory-failure tests. This logic was
> implemented outside of run_test(), making the setup category-specific
> but managed globally.
> 
> Move the hwpoison_inject handling into run_test() and restrict it
> to the memory-failure category so that:
> 1. the module is checked and loaded only when memory-failure tests run,
> 2. the test is skipped if the module or the debugfs interface
> (/sys/kernel/debug/hwpoison/) is not available.
> 3. the module is unloaded after the test if it was loaded by the script.
> 
> This localizes category-specific setup and makes the test flow
> consistent with other per-category preparations.
> 
> While updating this logic, fix the module availability check.
> The script previously used:
> 
> 	modprobe -R hwpoison_inject
> 
> The -R option prints the resolved module name to stdout, causing every
> run to print:
> 
> 	hwpoison_inject
> 
> in the test output, even when no action is required, introducing
> unnecessary noise.
> 
> Replace this with:
> 
> 	modprobe -n hwpoison_inject
> 
> which verifies that the module is loadable without producing output,
> keeping the selftest logs clean and consistent.
> 
> Fixes: ff4ef2fbd101 ("selftests/mm: add memory failure anonymous page test")
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>   tools/testing/selftests/mm/run_vmtests.sh | 46 ++++++++++++++---------
>   1 file changed, 28 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index eecec0b6eb13..606558cc3b09 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -250,6 +250,27 @@ run_test() {
>   			fi
>   		fi
>   
> +		# Ensure hwpoison_inject is available for memory-failure tests
> +		if [ "${CATEGORY}" = "memory-failure" ]; then
> +			# Try to load hwpoison_inject if not present.
> +			HWPOISON_DIR=/sys/kernel/debug/hwpoison/
> +			if [ ! -d "$HWPOISON_DIR" ]; then
> +				if ! modprobe -n hwpoison_inject > /dev/null 2>&1; then
> +					echo "Module hwpoison_inject not found, skipping..." \
> +						| tap_prefix
> +					skip=1
> +				else
> +					modprobe hwpoison_inject > /dev/null 2>&1
> +					LOADED_MOD=1
> +				fi
> +			fi
> +
> +			if [ ! -d "$HWPOISON_DIR" ]; then
> +				echo "hwpoison debugfs interface not present" | tap_prefix
> +				skip=1
> +			fi
> +		fi
> +
>   		local test=$(pretty_name "$*")
>   		local title="running $*"
>   		local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
> @@ -261,6 +282,12 @@ run_test() {
>   		else
>   			local ret=$ksft_skip
>   		fi
> +
> +		# Unload hwpoison_inject if we loaded it
> +		if [ -n "${LOADED_MOD}" ]; then
> +			modprobe -r hwpoison_inject > /dev/null 2>&1
> +		fi
> +
>   		count_total=$(( count_total + 1 ))
>   		if [ $ret -eq 0 ]; then
>   			count_pass=$(( count_pass + 1 ))
> @@ -540,24 +567,7 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned
>   
>   CATEGORY="rmap" run_test ./rmap
>   
> -# Try to load hwpoison_inject if not present.
> -HWPOISON_DIR=/sys/kernel/debug/hwpoison/
> -if [ ! -d "$HWPOISON_DIR" ]; then
> -	if ! modprobe -q -R hwpoison_inject; then
> -		echo "Module hwpoison_inject not found, skipping..."
> -	else
> -		modprobe hwpoison_inject > /dev/null 2>&1
> -		LOADED_MOD=1
> -	fi
> -fi
> -
> -if [ -d "$HWPOISON_DIR" ]; then
> -	CATEGORY="memory-failure" run_test ./memory-failure
> -fi
> -
> -if [ -n "${LOADED_MOD}" ]; then
> -	modprobe -r hwpoison_inject > /dev/null 2>&1
> -fi
> +CATEGORY="memory-failure" run_test ./memory-failure
>   
>   if [ "${HAVE_HUGEPAGES}" = 1 ]; then
>   	echo "$orig_nr_hugepgs" > /proc/sys/vm/nr_hugepages

AI review question:
> > diff --git a/tools/testing/selftests/mm/run_vmtests.sh
> > b/tools/testing/selftests/mm/run_vmtests.sh
> > --- a/tools/testing/selftests/mm/run_vmtests.sh
> > +++ b/tools/testing/selftests/mm/run_vmtests.sh
> > @@ -250,6 +250,27 @@ run_test() {
> >  			fi
> >  		fi
> >
> > +		# Ensure hwpoison_inject is available for memory-failure tests
> > +		if [ "${CATEGORY}" = "memory-failure" ]; then
> > +			# Try to load hwpoison_inject if not present.
> > +			HWPOISON_DIR=/sys/kernel/debug/hwpoison/
> > +			if [ ! -d "$HWPOISON_DIR" ]; then
> > +				if ! modprobe -n hwpoison_inject > /dev/null 2>&1; then
> > +					echo "Module hwpoison_inject not found, skipping..." \
> > +						| tap_prefix
> > +					skip=1
> Does moving this skip logic inside run_test() mask previous test failures?
> The run_vmtests.sh script sets exitcode=$ksft_skip when a test skips.
> Previously, if the module was missing, the script bypassed calling run_test()
> for memory-failure entirely, which preserved any previous failure exit code.
> With this patch, run_test() executes, skips internally, and sets the
> exitcode to 4. Since memory-failure is the final test, could this change
> the test suite's final exit code from 1 to 4 on systems without the module?


This was pre-existing behavior; however, the concern is valid. Failures
should take precedence over skips. Let me address this in v4.


> > +				else
> > +					modprobe hwpoison_inject > /dev/null 2>&1
> > +					LOADED_MOD=1
> Should LOADED_MOD be declared as a local variable here?
> Since it lacks a local declaration, it persists in the script's global
> environment. If any tests are appended after memory-failure in the future,
> or if LOADED_MOD is already set in the user's environment, this could cause
> modprobe -r hwpoison_inject to execute redundantly after those tests.


Yes, good point. Let me address this in v4 by making it a local variable.


> > +				fi
> > +			fi
> > +
> > +			if [ ! -d "$HWPOISON_DIR" ]; then
> > +				echo "hwpoison debugfs interface not present" | tap_prefix
> > +				skip=1
> > +			fi
> Will this print a redundant skip message when the module is not found?
> If modprobe -n fails earlier, the directory will still be absent, causing
> this second block to evaluate to true. This prints "hwpoison debugfs interface
> not present" right after "Module hwpoison_inject not found, skipping...",
> which might contradict the goal of keeping the selftest logs clean.


Agreed, this can lead to redundant output. I’ll update it in v4 to avoid 
the duplicate skip message.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap
       [not found]     ` <7b6652f3-c994-4ef4-87a4-5473cd1254b7@linux.ibm.com>
@ 2026-04-02  7:31       ` David Hildenbrand (Arm)
  2026-04-03 17:41         ` Sayali Patil
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-02  7:31 UTC (permalink / raw)
  To: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote

On 4/1/26 16:43, Sayali Patil wrote:
> 
> On 01/04/26 19:48, David Hildenbrand (Arm) wrote:
>> On 3/27/26 08:16, Sayali Patil wrote:
>>> Previously, register_region_with_uffd() created a new anonymous
>>> mapping and overwrote the address supplied by the caller before
>>> registering the range with userfaultfd.
>>>
>>> As a result, userfaultfd was applied to an unrelated anonymous mapping
>>> instead of the hugetlb region used by the test.
>>>
>>> Remove the extra mmap() and register the caller-provided address range
>>> directly using UFFDIO_REGISTER_MODE_MISSING, so that faults are
>>> generated for the hugetlb mapping used by the test.
>>>
>>> This ensures userfaultfd operates on the actual hugetlb test region and
>>> validates the expected fault handling.
>>>
>>> Before patch:
>>>  running ./hugepage-mremap
>>>  -------------------------
>>>  TAP version 13
>>>  1..1
>>>   Map haddr: Returned address is 0x7eaa40000000
>>>   Map daddr: Returned address is 0x7daa40000000
>>>   Map vaddr: Returned address is 0x7faa40000000
>>>   Address returned by mmap() = 0x7fff9d000000
>>>   Mremap: Returned address is 0x7faa40000000
>>>   First hex is 0
>>>   First hex is 3020100
>>>  ok 1 Read same data
>>>  Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>  [PASS]
>>>  ok 1 hugepage-mremap
>>>
>>> After patch:
>>>  running ./hugepage-mremap
>>>  -------------------------
>>>  TAP version 13
>>>  1..1
>>>   Map haddr: Returned address is 0x7eaa40000000
>>>   Map daddr: Returned address is 0x7daa40000000
>>>   Map vaddr: Returned address is 0x7faa40000000
>>>   Registered memory at address 0x7eaa40000000 with userfaultfd
>>>   Mremap: Returned address is 0x7faa40000000
>>>   First hex is 0
>>>   First hex is 3020100
>>>  ok 1 Read same data
>>>  Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>  [PASS]
>>>  ok 1 hugepage-mremap
>> Okay, so we tested mremap() of something that is not even hugetlb.
>>
>>> Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
>>> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
>>> ---
>>>  tools/testing/selftests/mm/hugepage-mremap.c | 21 +++++---------------
>>>  1 file changed, 5 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
>>> index b8f7d92e5a35..e611249080d6 100644
>>> --- a/tools/testing/selftests/mm/hugepage-mremap.c
>>> +++ b/tools/testing/selftests/mm/hugepage-mremap.c
>>> @@ -85,25 +85,14 @@ static void register_region_with_uffd(char *addr, size_t len)
>>>  	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
>>>  		ksft_exit_fail_msg("ioctl-UFFDIO_API: %s\n", strerror(errno));
>>>  
>>> -	/* Create a private anonymous mapping. The memory will be
>>> -	 * demand-zero paged--that is, not yet allocated. When we
>>> -	 * actually touch the memory, it will be allocated via
>>> -	 * the userfaultfd.
>>> -	 */
>>> -
>>> -	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
>>> -		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>> -	if (addr == MAP_FAILED)
>>> -		ksft_exit_fail_msg("mmap: %s\n", strerror(errno));
>>> -
>>> -	ksft_print_msg("Address returned by mmap() = %p\n", addr);
>>> -
>>> -	/* Register the memory range of the mapping we just created for
>>> -	 * handling by the userfaultfd object. In mode, we request to track
>>> -	 * missing pages (i.e., pages that have not yet been faulted in).
>>> +	/* Register the passed memory range for handling by the userfaultfd object.
>> /*
>>  * ...
>>
>> While at it.
>>
>>> +	 * In mode, we request to track missing pages
>>> +	 * (i.e., pages that have not yet been faulted in).
>>>  	 */
>>>  	if (uffd_register(uffd, addr, len, true, false, false))
>>>  		ksft_exit_fail_msg("ioctl-UFFDIO_REGISTER: %s\n", strerror(errno));
>>> +
>>> +	ksft_print_msg("Registered memory at address %p with userfaultfd\n", addr);
>>>  }
>>>  
>>>  int main(int argc, char *argv[])
>> Yes, that code is extremely weird. I wonder if this was some
>> copy-and-paste from other uffd test code.
>>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>>
>>
> Hi David,
> 
> Yes, the test operates on hugetlb mappings created with
> |MAP_HUGETLB | MAP_POPULATE|and sets up userfaultfd. Consequently,
> registering it with |UFFDIO_REGISTER_MODE_MISSING| does not result in
> any userfaults.
> 
> Originally, the helper function created a separate anonymous mapping and
> registered it with userfaultfd instead of the address supplied by the
> caller. However, the test operates on hugetlb mappings, and the registered
> anonymous mapping is never used in the |mremap()| path being exercised.
> 
> Would it be better to remove userfaultfd registration entirely from this
> test, since that path is not actually being tested?

If it's tested with your change now (which I think is what
happens), this is fine.

It was just very weird before, because it tested something fairly unrelated.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-04-01 20:39       ` Sayali Patil
@ 2026-04-02  7:33         ` David Hildenbrand (Arm)
  2026-04-02  9:05           ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-02  7:33 UTC (permalink / raw)
  To: Sayali Patil, Lorenzo Stoakes (Oracle)
  Cc: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev, Venkat Rao Bagalkote

On 4/1/26 22:39, Sayali Patil wrote:
> 
> 
> On 01/04/26 20:10, Lorenzo Stoakes (Oracle) wrote:
>> On Wed, Apr 01, 2026 at 04:21:55PM +0200, David Hildenbrand (Arm) wrote:
>>
>> OK so digging in:
>>
>> mremap -> ... -> vrm_set_new_addr() -> get_unmapped_area() -> ... (in
>> ppc arch
>> code) -> slice_get_unmapped_area():
>>
>> unsigned long slice_get_unmapped_area(unsigned long addr, unsigned
>> long len,
>>                       unsigned long flags, unsigned int psize,
>>                       int topdown)
>> {
>>     ...
>>     /* bunch of checks */
>>
>>     /* If we have MAP_FIXED and failed the above steps, then error out */
>>     if (fixed)
>>         return -EBUSY;
>>
>>     ...
>> }
>>
>> Is presumably where we hit the issue.
>>
>>>
>>> That is weird. An mremap(MREMAP_FIXED) is really just an munmap() +
>>> move.
>>
>> Yeah the weird bit I guess is that we _still_ invoke
>> get_unmapped_area() but
>> with MAP_FIXED set to indicate that we want the specific address, so it's
>> subject to the above checks.
>>
>>>
>>> Are we sure this is not some actual problem in the hugetlb
>>> implementation?
>>
>> It seems the 'slices' check sees if the _target address_ has an
>> equivalent page
>> size, presumably hugetlb-mandated, and fails if they're not
>> equivalent, so this
>> change is just accounting for that.
>>
> Yes, this change accounts for that by ensuring the destination is
> created with MAP_HUGETLB so it has the same page size as the source.

Okay, weird, so it's the right thing to do to cover all odd arch behavior.

>>
>>>
>>>
>>> But then the test suddenly requires more hugetlb pages, no? I don't see
>>> a good reason for the MAP_POPULATE, really. It will be discarded
>>> either way.
>>
>> Yeah I'm not sure about the MAP_POPULATE being all that important here.
>>
> As far as I understand, without MAP_POPULATE, memory accesses would
> trigger userfaults, and since the test is single-threaded and has no
> background handler for the uffd, it would deadlock. MAP_POPULATE ensures
> the test runs correctly by prefaulting all pages, but please let me know
> if I’m mistaken.

So you are saying the test would deadlock if you are not adding
MAP_POPULATE? If so, please double check if that is actually the case.

And if it's actually the case, please carefully document that in the
patch description, and probably as a comment above the MAP_POPULATE usage.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-04-02  7:33         ` David Hildenbrand (Arm)
@ 2026-04-02  9:05           ` Lorenzo Stoakes (Oracle)
  2026-04-03 17:41             ` Sayali Patil
  0 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-02  9:05 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Sayali Patil, Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev, Venkat Rao Bagalkote

On Thu, Apr 02, 2026 at 09:33:29AM +0200, David Hildenbrand (Arm) wrote:
> On 4/1/26 22:39, Sayali Patil wrote:
> >
> >
> > On 01/04/26 20:10, Lorenzo Stoakes (Oracle) wrote:
> >> On Wed, Apr 01, 2026 at 04:21:55PM +0200, David Hildenbrand (Arm) wrote:
> >>
> >> OK so digging in:
> >>
> >> mremap -> ... -> vrm_set_new_addr() -> get_unmapped_area() -> ... (in
> >> ppc arch
> >> code) -> slice_get_unmapped_area():
> >>
> >> unsigned long slice_get_unmapped_area(unsigned long addr, unsigned
> >> long len,
> >>                       unsigned long flags, unsigned int psize,
> >>                       int topdown)
> >> {
> >>     ...
> >>     /* bunch of checks */
> >>
> >>     /* If we have MAP_FIXED and failed the above steps, then error out */
> >>     if (fixed)
> >>         return -EBUSY;
> >>
> >>     ...
> >> }
> >>
> >> Is presumably where we hit the issue.
> >>
> >>>
> >>> That is weird. An mremap(MREMAP_FIXED) is really just an munmap() +
> >>> move.
> >>
> >> Yeah the weird bit I guess is that we _still_ invoke
> >> get_unmapped_area() but
> >> with MAP_FIXED set to indicate that we want the specific address, so it's
> >> subject to the above checks.
> >>
> >>>
> >>> Are we sure this is not some actual problem in the hugetlb
> >>> implementation?
> >>
> >> It seems the 'slices' check sees if the _target address_ has an
> >> equivalent page
> >> size, presumably hugetlb-mandated, and fails if they're not
> >> equivalent, so this
> >> change is just accounting for that.
> >>
> > Yes, this change accounts for that by ensuring the destination is
> > created with MAP_HUGETLB so it has the same page size as the source.
>
> Okay, weird, so it's the right thing to do to cover all odd arch behavior.
>
> >>
> >>>
> >>>
> >>> But then the test suddenly requires more hugetlb pages, no? I don't see
> >>> a good reason for the MAP_POPULATE, really. It will be discarded
> >>> either way.
> >>
> >> Yeah I'm not sure about the MAP_POPULATE being all that important here.
> >>
> > As far as I understand, without MAP_POPULATE, memory accesses would
> > trigger userfaults, and since the test is single-threaded and has no
> > background handler for the uffd, it would deadlock. MAP_POPULATE ensures
> > the test runs correctly by prefaulting all pages, but please let me know
> > if I’m mistaken.
>
> So you are saying the test would deadlock if you are not adding
> MAP_POPULATE? If so, please double check if that is actually the case.
>
> And if it's actually the case, please carefully document that in the
> patch description, and probably as a comment above the MAP_POPULATE usage.

Do keep in mind MAP_POPULATE is not _guaranteed_ to work :)

For guaranteed populate you need madvise(..., MADV_POPULATE_[READ/WRITE]) or to
directly fault in.

>
> --
> Cheers,
>
> David

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 13/13] selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page sizes
  2026-03-27  7:16 ` [PATCH v3 13/13] selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page sizes Sayali Patil
@ 2026-04-03 17:16   ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-03 17:16 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani
  Cc: David Hildenbrand, Zi Yan, Michal Hocko, Oscar Salvador,
	Lorenzo Stoakes, Dev Jain, Liam.Howlett, linuxppc-dev,
	Venkat Rao Bagalkote



On 27/03/26 12:46, Sayali Patil wrote:
> The hugetlb memcg selftest was previously skipped when the configured
> huge page size was not 2MB, preventing the test from running on systems
> using other default huge page sizes.
> 
> Detect the system's configured huge page size at runtime and use it for
> the allocation instead of assuming a fixed 2MB size. This allows the
> test to run on configurations using non-2MB huge pages and avoids
> unnecessary skips.
> 
> Fixes: c0dddb7aa5f8 ("selftests: add a selftest to verify hugetlb usage in memcg")
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
> ---
>   .../selftests/cgroup/test_hugetlb_memcg.c     | 66 ++++++++++++++-----
>   1 file changed, 48 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/testing/selftests/cgroup/test_hugetlb_memcg.c b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
> index f451aa449be6..a449dbec16a8 100644
> --- a/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
> +++ b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
> @@ -12,10 +12,15 @@
>   
>   #define ADDR ((void *)(0x0UL))
>   #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
> -/* mapping 8 MBs == 4 hugepages */
> -#define LENGTH (8UL*1024*1024)
>   #define PROTECTION (PROT_READ | PROT_WRITE)
>   
> +/*
> + * This value matches the kernel's MEMCG_CHARGE_BATCH definition:
> + * see include/linux/memcontrol.h. If the kernel value changes, this
> + * test constant must be updated accordingly to stay consistent.
> + */
> +#define MEMCG_CHARGE_BATCH 64U
> +
>   /* borrowed from mm/hmm-tests.c */
>   static long get_hugepage_size(void)
>   {
> @@ -84,11 +89,11 @@ static unsigned int check_first(char *addr)
>   	return *(unsigned int *)addr;
>   }
>   
> -static void write_data(char *addr)
> +static void write_data(char *addr, size_t length)
>   {
>   	unsigned long i;
>   
> -	for (i = 0; i < LENGTH; i++)
> +	for (i = 0; i < length; i++)
>   		*(addr + i) = (char)i;
>   }
>   
> @@ -96,26 +101,31 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
>   {
>   	char *test_group = (char *)arg;
>   	void *addr;
> +	long hpage_size = get_hugepage_size() * 1024;
>   	long old_current, expected_current, current;
>   	int ret = EXIT_FAILURE;
> +	size_t length = 4 * hpage_size;
> +	int pagesize, nr_pages;
> +
> +	pagesize = getpagesize();
>   
>   	old_current = cg_read_long(test_group, "memory.current");
>   	set_nr_hugepages(20);
>   	current = cg_read_long(test_group, "memory.current");
> -	if (current - old_current >= MB(2)) {
> +	if (current - old_current >= hpage_size) {
>   		ksft_print_msg(
>   			"setting nr_hugepages should not increase hugepage usage.\n");
>   		ksft_print_msg("before: %ld, after: %ld\n", old_current, current);
>   		return EXIT_FAILURE;
>   	}
>   
> -	addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, 0, 0);
> +	addr = mmap(ADDR, length, PROTECTION, FLAGS, 0, 0);
>   	if (addr == MAP_FAILED) {
>   		ksft_print_msg("fail to mmap.\n");
>   		return EXIT_FAILURE;
>   	}
>   	current = cg_read_long(test_group, "memory.current");
> -	if (current - old_current >= MB(2)) {
> +	if (current - old_current >= hpage_size) {
>   		ksft_print_msg("mmap should not increase hugepage usage.\n");
>   		ksft_print_msg("before: %ld, after: %ld\n", old_current, current);
>   		goto out_failed_munmap;
> @@ -124,10 +134,24 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
>   
>   	/* read the first page */
>   	check_first(addr);
> -	expected_current = old_current + MB(2);
> +	nr_pages = hpage_size / pagesize;
> +	expected_current = old_current + hpage_size;
>   	current = cg_read_long(test_group, "memory.current");
> -	if (!values_close(expected_current, current, 5)) {
> -		ksft_print_msg("memory usage should increase by around 2MB.\n");
> +	if (nr_pages < MEMCG_CHARGE_BATCH && current == old_current) {
> +		/*
> +		 * Memory cgroup charging uses per-CPU stocks and batched updates to the
> +		 *  memcg usage counters. For hugetlb allocations, the number of pages
> +		 *  that memcg charges is expressed in base pages (nr_pages), not
> +		 *  in hugepage units. When the charge for an allocation is smaller than
> +		 *  the internal batching threshold  (nr_pages <  MEMCG_CHARGE_BATCH),
> +		 *  it may be fully satisfied from the CPU’s local stock. In such
> +		 *  cases memory.current does not necessarily
> +		 *  increase.
> +		 *  Therefore, Treat a zero delta as valid behaviour here.
> +		 */
> +		ksft_print_msg("no visible memcg charge, allocation consumed from local stock.\n");
> +	} else if (!values_close(expected_current, current, 5)) {
> +		ksft_print_msg("memory usage should increase by ~1 huge page.\n");
>   		ksft_print_msg(
>   			"expected memory: %ld, actual memory: %ld\n",
>   			expected_current, current);
> @@ -135,11 +159,11 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
>   	}
>   
>   	/* write to the whole range */
> -	write_data(addr);
> +	write_data(addr, length);
>   	current = cg_read_long(test_group, "memory.current");
> -	expected_current = old_current + MB(8);
> +	expected_current = old_current + length;
>   	if (!values_close(expected_current, current, 5)) {
> -		ksft_print_msg("memory usage should increase by around 8MB.\n");
> +		ksft_print_msg("memory usage should increase by around 4 huge pages.\n");
>   		ksft_print_msg(
>   			"expected memory: %ld, actual memory: %ld\n",
>   			expected_current, current);
> @@ -147,7 +171,7 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
>   	}
>   
>   	/* unmap the whole range */
> -	munmap(addr, LENGTH);
> +	munmap(addr, length);
>   	current = cg_read_long(test_group, "memory.current");
>   	expected_current = old_current;
>   	if (!values_close(expected_current, current, 5)) {
> @@ -162,13 +186,15 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
>   	return ret;
>   
>   out_failed_munmap:
> -	munmap(addr, LENGTH);
> +	munmap(addr, length);
>   	return ret;
>   }
>   
>   static int test_hugetlb_memcg(char *root)
>   {
>   	int ret = KSFT_FAIL;
> +	int num_pages = 20;
> +	long hpage_size = get_hugepage_size();
>   	char *test_group;
>   
>   	test_group = cg_name(root, "hugetlb_memcg_test");
> @@ -177,7 +203,7 @@ static int test_hugetlb_memcg(char *root)
>   		goto out;
>   	}
>   
> -	if (cg_write(test_group, "memory.max", "100M")) {
> +	if (cg_write_numeric(test_group, "memory.max", num_pages * hpage_size * 1024)) {
>   		ksft_print_msg("fail to set cgroup memory limit.\n");
>   		goto out;
>   	}
> @@ -200,6 +226,7 @@ int main(int argc, char **argv)
>   {
>   	char root[PATH_MAX];
>   	int ret = EXIT_SUCCESS, has_memory_hugetlb_acc;
> +	long val;
>   
>   	has_memory_hugetlb_acc = proc_mount_contains("memory_hugetlb_accounting");
>   	if (has_memory_hugetlb_acc < 0)
> @@ -208,12 +235,15 @@ int main(int argc, char **argv)
>   		ksft_exit_skip("memory hugetlb accounting is disabled\n");
>   
>   	/* Unit is kB! */
> -	if (get_hugepage_size() != 2048) {
> -		ksft_print_msg("test_hugetlb_memcg requires 2MB hugepages\n");
> +	val = get_hugepage_size();
> +	if (val < 0) {
> +		ksft_print_msg("Failed to read hugepage size\n");
>   		ksft_test_result_skip("test_hugetlb_memcg\n");
>   		return ret;
>   	}
>   
> +	ksft_print_msg("Hugepage size: %ld kB\n", val);
> +
>   	if (cg_find_unified_root(root, sizeof(root), NULL))
>   		ksft_exit_skip("cgroup v2 isn't mounted\n");
>   

AI review question:


> > -	addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, 0, 0);
> > +	addr = mmap(ADDR, length, PROTECTION, FLAGS, 0, 0);
> >  	if (addr == MAP_FAILED) {
> >  		ksft_print_msg("fail to mmap.\n");
> >  		return EXIT_FAILURE;
> Should the test gracefully handle memory constraints by skipping
> instead of returning EXIT_FAILURE here?
> With the removal of the 2MB skip condition, the test will now run on systems
> with large default huge pages (e.g., 1GB on x86_64). Reserving 20 huge pages
> (20GB) and mapping 4 huge pages (4GB) might easily fail due to memory
> exhaustion in typical automated testing environments, causing false test
> regressions.

I’ll update it in v4 to skip on memory constraints instead of returning 
EXIT_FAILURE.

> > @@ -124,10 +134,24 @@ static int hugetlb_test_program(const char *cgroup, void *arg)
> >
> >  	/* read the first page */
> >  	check_first(addr);
> > -	expected_current = old_current + MB(2);
> > +	nr_pages = hpage_size / pagesize;
> > +	expected_current = old_current + hpage_size;
> >  	current = cg_read_long(test_group, "memory.current");
> > -	if (!values_close(expected_current, current, 5)) {
> > -		ksft_print_msg("memory usage should increase by around 2MB.\n");
> > +	if (nr_pages < MEMCG_CHARGE_BATCH && current == old_current) {
> > +		/*
> > +		 * Memory cgroup charging uses per-CPU stocks and batched updates to the
> > +		 *  memcg usage counters. For hugetlb allocations, the number of pages
> > +		 *  that memcg charges is expressed in base pages (nr_pages), not
> > +		 *  in hugepage units. When the charge for an allocation is smaller than
> > +		 *  the internal batching threshold  (nr_pages <  MEMCG_CHARGE_BATCH),
> > +		 *  it may be fully satisfied from the CPU’s local stock. In such
> > +		 *  cases memory.current does not necessarily
> > +		 *  increase.
> > +		 *  Therefore, Treat a zero delta as valid behaviour here.
> > +		 */
> > +		ksft_print_msg("no visible memcg charge, allocation consumed from local stock.\n");
> > +	} else if (!values_close(expected_current, current, 5)) {
> Does this exception fully account for memcg stock batching when the per-CPU
> stock is empty?
> If the stock lacks sufficient pages, charging the huge page triggers a stock
> refill that charges exactly MEMCG_CHARGE_BATCH (64) base pages from the global
> memcg counter.
> On an architecture with 64KB base pages and 2MB contiguous huge pages,
> nr_pages is 32. Refilling the stock would charge 64 base pages (4MB), which
> increases memory.current by 4MB instead of the expected 2MB. Since
> current != old_current, the test falls through to values_close(), which expects
> a 2MB increase and may fail the test.

Valid point.
The current exception does not fully account for this
scenario when the per-CPU stock is empty and a refill charges
MEMCG_CHARGE_BATCH. This can indeed lead to a larger-than-expected jump
in memory.current and cause the test to fail.
I’ll update the logic in v4 to handle this case more robustly.
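
To make the arithmetic from the review concrete, a small sketch. The helper
name refill_charge_bytes() is hypothetical, and the batch value is the one the
patch already mirrors from include/linux/memcontrol.h. It assumes the refill
behaviour is as the review describes it (a whole batch is charged when the
stock cannot cover the request):

```c
#include <assert.h>

#define MEMCG_CHARGE_BATCH 64U	/* mirrored from the patch */

/*
 * Hypothetical helper: bytes that become visible in memory.current when
 * the per-CPU stock cannot cover the charge and a whole batch is charged.
 */
static unsigned long long refill_charge_bytes(unsigned long hpage_bytes,
					      unsigned long base_page_bytes)
{
	unsigned long long nr_pages;

	assert(base_page_bytes > 0);

	/* memcg charges hugetlb folios in base pages, not hugepage units. */
	nr_pages = hpage_bytes / base_page_bytes;

	/* A charge smaller than the batch is rounded up to the batch. */
	if (nr_pages < MEMCG_CHARGE_BATCH)
		nr_pages = MEMCG_CHARGE_BATCH;

	return nr_pages * base_page_bytes;
}
```

With 64KB base pages and 2MB huge pages this gives 4MB, so after the first
touch the delta can be either 0 (served from the stock) or 4MB (refill), and
neither matches the 2MB the check currently expects.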

> > @@ -177,7 +203,7 @@ static int test_hugetlb_memcg(char *root)
> >  		goto out;
> >  	}
> >
> > -	if (cg_write(test_group, "memory.max", "100M")) {
> > +	if (cg_write_numeric(test_group, "memory.max", num_pages * hpage_size * 1024)) {
> Can this calculation overflow on 32-bit systems?
> Since long is 32 bits on 32-bit systems, num_pages * hpage_size * 1024 can
> exceed the 32-bit signed integer maximum if the architecture supports large
> huge pages (e.g., 256MB on MIPS). This would evaluate to 5,368,709,120,
> resulting in a negative or truncated value, which sets memory.max to an
> invalid or overly restrictive limit.

Yes, this can overflow on 32-bit systems. I’ll fix it in v4.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed in hugepage-mremap
  2026-04-02  9:05           ` Lorenzo Stoakes (Oracle)
@ 2026-04-03 17:41             ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-03 17:41 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), David Hildenbrand (Arm)
  Cc: Andrew Morton, Shuah Khan, linux-mm, linux-kernel,
	linux-kselftest, Ritesh Harjani, Zi Yan, Michal Hocko,
	Oscar Salvador, Lorenzo Stoakes, Dev Jain, Liam.Howlett,
	linuxppc-dev, Venkat Rao Bagalkote



On 02/04/26 14:35, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Apr 02, 2026 at 09:33:29AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/1/26 22:39, Sayali Patil wrote:
>>>
>>>
>>> On 01/04/26 20:10, Lorenzo Stoakes (Oracle) wrote:
>>>> On Wed, Apr 01, 2026 at 04:21:55PM +0200, David Hildenbrand (Arm) wrote:
>>>>
>>>> OK so digging in:
>>>>
>>>> mremap -> ... -> vrm_set_new_addr() -> get_unmapped_area() -> ... (in
>>>> ppc arch
>>>> code) -> slice_get_unmapped_area():
>>>>
>>>> unsigned long slice_get_unmapped_area(unsigned long addr, unsigned
>>>> long len,
>>>>                        unsigned long flags, unsigned int psize,
>>>>                        int topdown)
>>>> {
>>>>      ...
>>>>      /* bunch of checks */
>>>>
>>>>      /* If we have MAP_FIXED and failed the above steps, then error out */
>>>>      if (fixed)
>>>>          return -EBUSY;
>>>>
>>>>      ...
>>>> }
>>>>
>>>> Is presumably where we hit the issue.
>>>>
>>>>>
>>>>> That is weird. An mremap(MREMAP_FIXED) is really just an munmap() +
>>>>> move.
>>>>
>>>> Yeah the weird bit I guess is that we _still_ invoke
>>>> get_unmapped_area() but
>>>> with MAP_FIXED set to indicate that we want the specific address, so it's
>>>> subject to the above checks.
>>>>
>>>>>
>>>>> Are we sure this is not some actual problem in the hugetlb
>>>>> implementation?
>>>>
>>>> It seems the 'slices' check sees if the _target address_ has an
>>>> equivalent page
>>>> size, presumably hugetlb-mandated, and fails if they're not
>>>> equivalent, so this
>>>> change is just accounting for that.
>>>>
>>> Yes, this change accounts for that by ensuring the destination is
>>> created with MAP_HUGETLB so it has the same page size as the source.
>>
>> Okay, weird, so it's the right thing to do to cover all odd arch behavior.
>>
>>>>
>>>>>
>>>>>
>>>>> But then the test suddenly requires more hugetlb pages, no? I don't see
>>>>> a good reason for the MAP_POPULATE, really. It will be discarded
>>>>> either way.
>>>>
>>>> Yeah I'm not sure about the MAP_POPULATE being all that important here.
>>>>
>>> As far as I understand, without MAP_POPULATE, memory accesses would
>>> trigger userfaults, and since the test is single-threaded and has no
>>> background handler for the uffd, it would deadlock. MAP_POPULATE ensures
>>> the test runs correctly by prefaulting all pages, but please let me know
>>> if I’m mistaken.
>>
>> So you are saying the test would deadlock if you are not adding
>> MAP_POPULATE? If so, please double check if that is actually the case.
>>
>> And if it's actually the case, please carefully document that in the
>> patch description, and probably as a comment above the MAP_POPULATE usage.
> 
> Do keep in mind MAP_POPULATE is not _guaranteed_ to work :)
> 
> For guaranteed populate you need madvise(..., MADV_POPULATE_[READ/WRITE]) or to
> directly fault in.
> 
>>
>> --
>> Cheers,
>>
>> David
> 
> Cheers, Lorenzo
> 
Thanks David and Lorenzo for the input.
I tested without MAP_POPULATE and the test works fine without it.
I will remove it in the next version.

Thanks,
Sayali

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap
  2026-04-02  7:31       ` David Hildenbrand (Arm)
@ 2026-04-03 17:41         ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-03 17:41 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Andrew Morton, Shuah Khan, linux-mm,
	linux-kernel, linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev, Venkat Rao Bagalkote



On 02/04/26 13:01, David Hildenbrand (Arm) wrote:
> On 4/1/26 16:43, Sayali Patil wrote:
>>
>> On 01/04/26 19:48, David Hildenbrand (Arm) wrote:
>>> On 3/27/26 08:16, Sayali Patil wrote:
>>>> Previously, register_region_with_uffd() created a new anonymous
>>>> mapping and overwrote the address supplied by the caller before
>>>> registering the range with userfaultfd.
>>>>
>>>> As a result, userfaultfd was applied to an unrelated anonymous mapping
>>>> instead of the hugetlb region used by the test.
>>>>
>>>> Remove the extra mmap() and register the caller-provided address range
>>>> directly using UFFDIO_REGISTER_MODE_MISSING, so that faults are
>>>> generated for the hugetlb mapping used by the test.
>>>>
>>>> This ensures userfaultfd operates on the actual hugetlb test region and
>>>> validates the expected fault handling.
>>>>
>>>> Before patch:
>>>>   running ./hugepage-mremap
>>>>   -------------------------
>>>>   TAP version 13
>>>>   1..1
>>>>    Map haddr: Returned address is 0x7eaa40000000
>>>>    Map daddr: Returned address is 0x7daa40000000
>>>>    Map vaddr: Returned address is 0x7faa40000000
>>>>    Address returned by mmap() = 0x7fff9d000000
>>>>    Mremap: Returned address is 0x7faa40000000
>>>>    First hex is 0
>>>>    First hex is 3020100
>>>>   ok 1 Read same data
>>>>   Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>>   [PASS]
>>>>   ok 1 hugepage-mremap
>>>>
>>>> After patch:
>>>>   running ./hugepage-mremap
>>>>   -------------------------
>>>>   TAP version 13
>>>>   1..1
>>>>    Map haddr: Returned address is 0x7eaa40000000
>>>>    Map daddr: Returned address is 0x7daa40000000
>>>>    Map vaddr: Returned address is 0x7faa40000000
>>>>    Registered memory at address 0x7eaa40000000 with userfaultfd
>>>>    Mremap: Returned address is 0x7faa40000000
>>>>    First hex is 0
>>>>    First hex is 3020100
>>>>   ok 1 Read same data
>>>>   Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>>   [PASS]
>>>>   ok 1 hugepage-mremap
>>> Okay, so we tested mremap() of something that is not even hugetlb.
>>>
>>>> Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test")
>>>> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>>> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
>>>> ---
>>>>   tools/testing/selftests/mm/hugepage-mremap.c | 21 +++++---------------
>>>>   1 file changed, 5 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
>>>> index b8f7d92e5a35..e611249080d6 100644
>>>> --- a/tools/testing/selftests/mm/hugepage-mremap.c
>>>> +++ b/tools/testing/selftests/mm/hugepage-mremap.c
>>>> @@ -85,25 +85,14 @@ static void register_region_with_uffd(char *addr, size_t len)
>>>>   	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
>>>>   		ksft_exit_fail_msg("ioctl-UFFDIO_API: %s\n", strerror(errno));
>>>>   
>>>> -	/* Create a private anonymous mapping. The memory will be
>>>> -	 * demand-zero paged--that is, not yet allocated. When we
>>>> -	 * actually touch the memory, it will be allocated via
>>>> -	 * the userfaultfd.
>>>> -	 */
>>>> -
>>>> -	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
>>>> -		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>> -	if (addr == MAP_FAILED)
>>>> -		ksft_exit_fail_msg("mmap: %s\n", strerror(errno));
>>>> -
>>>> -	ksft_print_msg("Address returned by mmap() = %p\n", addr);
>>>> -
>>>> -	/* Register the memory range of the mapping we just created for
>>>> -	 * handling by the userfaultfd object. In mode, we request to track
>>>> -	 * missing pages (i.e., pages that have not yet been faulted in).
>>>> +	/* Register the passed memory range for handling by the userfaultfd object.
>>> /*
>>>   * ...
>>>
>>> While at it.
>>>
>>>> +	 * In mode, we request to track missing pages
>>>> +	 * (i.e., pages that have not yet been faulted in).
>>>>   	 */
>>>>   	if (uffd_register(uffd, addr, len, true, false, false))
>>>>   		ksft_exit_fail_msg("ioctl-UFFDIO_REGISTER: %s\n", strerror(errno));
>>>> +
>>>> +	ksft_print_msg("Registered memory at address %p with userfaultfd\n", addr);
>>>>   }
>>>>   
>>>>   int main(int argc, char *argv[])
>>> Yes, that code is extremely weird. I wonder if this was some
>>> copy-and-paste from other uffd test code.
>>>
>>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>>>
>>>
>> Hi David,
>>
>> Yes, the test operates on hugetlb mappings created with
>> |MAP_HUGETLB | MAP_POPULATE|and sets up userfaultfd. Consequently,
>> registering it with |UFFDIO_REGISTER_MODE_MISSING| does not result in
>> any userfaults.
>>
>> Originally, the helper function created a separate anonymous mapping and
>> registered it with userfaultfd instead of the address supplied by the
>> caller. However, the test operates on hugetlb mappings, and the registered
>> anonymous mapping is never used in the |mremap()| path being exercised.
>>
>> Would it be better to remove userfaultfd registration entirely from this
>> test, since that path is not actually being tested?
> 
> If it's tested with your change now (which I think is what
> happens), this is fine.
> 
> It was just very weird before, because it tested something fairly unrelated.
> 
Thanks for the review. Yes, tested with this change and it behaves as 
expected now.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks in hugetlb_reparenting_test.sh
  2026-04-01 14:08   ` David Hildenbrand (Arm)
@ 2026-04-03 19:59     ` Sayali Patil
  0 siblings, 0 replies; 39+ messages in thread
From: Sayali Patil @ 2026-04-03 19:59 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Andrew Morton, Shuah Khan, linux-mm,
	linux-kernel, linux-kselftest, Ritesh Harjani
  Cc: Zi Yan, Michal Hocko, Oscar Salvador, Lorenzo Stoakes, Dev Jain,
	Liam.Howlett, linuxppc-dev



On 01/04/26 19:38, David Hildenbrand (Arm) wrote:
> On 3/27/26 08:15, Sayali Patil wrote:
>> Launch write_to_hugetlbfs as a separate process and move only its PID
>> into the target cgroup before waiting for completion. This avoids moving
>> the test shell itself, prevents unintended charging to the shell, and
>> ensures hugetlb and memcg accounting is attributed only to the intended
>> workload.
>>
>> Add a short delay before the hugetlb allocation to avoid a race where
>> memory may be charged before the task migration takes effect, which
>> can lead to incorrect accounting and intermittent test failures.
> 
> Isn't there still a chance for a race, for example, when running in a VM?
> 
Yes, there is still a small race window in the current approach.

I am looking into making this more reliable with a deterministic 
synchronization mechanism to avoid such timing dependencies.

I will send a v4 with this improvement.

Thanks,
Sayali
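
For illustration only, one possible shape for such a deterministic
handshake is sketched below. It is not the planned v4 change. The helper
name run_in_cgroup and its arguments are hypothetical, and the caller is
assumed to pass the path of the target cgroup's procs file. The child
blocks on a FIFO until the parent has written its PID to that file, then
exec()s the workload so the migrated PID is the one doing the allocation.

```shell
# Sketch only: run_in_cgroup is a hypothetical helper, not part of
# hugetlb_reparenting_test.sh.
#
# usage: run_in_cgroup <path to cgroup procs file> <command> [args...]
run_in_cgroup() {
	local procs_file=$1
	shift

	local tmpdir fifo pid rc
	tmpdir=$(mktemp -d) || return 1
	fifo=$tmpdir/go
	mkfifo "$fifo" || { rm -rf "$tmpdir"; return 1; }

	# The child blocks on the FIFO until the parent releases it, then
	# exec()s the workload so that it keeps the PID that was migrated.
	(
		read -r go_token < "$fifo"
		exec "$@"
	) &
	pid=$!

	# Migrate the still-blocked child. If that fails, do not start
	# the workload at all.
	if ! echo "$pid" > "$procs_file"; then
		kill "$pid" 2>/dev/null
		wait "$pid" 2>/dev/null
		rm -rf "$tmpdir"
		return 1
	fi

	# Release the child only after the migration has completed.
	echo go > "$fifo"

	rc=0
	wait "$pid" || rc=$?
	rm -rf "$tmpdir"
	return "$rc"
}
```

With this ordering no sleep is needed: the workload cannot allocate
anything before the write to the procs file has returned, regardless of
how slowly the parent is scheduled (for example in a VM).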


end of thread, other threads:[~2026-04-03 19:59 UTC | newest]

Thread overview: 39+ messages
2026-03-27  7:15 [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Sayali Patil
2026-03-27  7:15 ` [PATCH v3 01/13] selftests/mm: restore default nr_hugepages value during cleanup in charge_reserved_hugetlb.sh Sayali Patil
2026-03-27  7:15 ` [PATCH v3 02/13] selftests/mm: fix hugetlb pathname construction " Sayali Patil
2026-04-01 14:06   ` David Hildenbrand (Arm)
2026-03-27  7:15 ` [PATCH v3 03/13] selftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.sh Sayali Patil
2026-04-01 14:06   ` David Hildenbrand (Arm)
2026-03-27  7:15 ` [PATCH v3 04/13] selftest/mm: fix cgroup task placement and drop memory.current checks " Sayali Patil
2026-04-01 14:08   ` David Hildenbrand (Arm)
2026-04-03 19:59     ` Sayali Patil
2026-03-27  7:15 ` [PATCH v3 05/13] selftests/mm: size tmpfs according to PMD page size in split_huge_page_test Sayali Patil
2026-03-27  7:16 ` [PATCH v3 06/13] selftest/mm: adjust hugepage-mremap test size for large huge pages Sayali Patil
2026-04-01 14:10   ` David Hildenbrand (Arm)
2026-04-01 20:45     ` Sayali Patil
2026-03-27  7:16 ` [PATCH v3 07/13] selftest/mm: register existing mapping with userfaultfd in hugepage-mremap Sayali Patil
2026-04-01 14:18   ` David Hildenbrand (Arm)
     [not found]     ` <7b6652f3-c994-4ef4-87a4-5473cd1254b7@linux.ibm.com>
2026-04-02  7:31       ` David Hildenbrand (Arm)
2026-04-03 17:41         ` Sayali Patil
2026-03-27  7:16 ` [PATCH v3 08/13] selftests/mm: ensure destination is hugetlb-backed " Sayali Patil
2026-04-01 14:21   ` David Hildenbrand (Arm)
2026-04-01 14:40     ` Lorenzo Stoakes (Oracle)
2026-04-01 20:39       ` Sayali Patil
2026-04-02  7:33         ` David Hildenbrand (Arm)
2026-04-02  9:05           ` Lorenzo Stoakes (Oracle)
2026-04-03 17:41             ` Sayali Patil
2026-03-27  7:16 ` [PATCH v3 09/13] selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported Sayali Patil
2026-04-02  6:59   ` Sayali Patil
2026-03-27  7:16 ` [PATCH v3 10/13] selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero Sayali Patil
2026-04-01 14:23   ` David Hildenbrand (Arm)
2026-03-27  7:16 ` [PATCH v3 11/13] selftests/mm: fix double increment in linked list cleanup in compaction_test Sayali Patil
2026-04-01 14:32   ` Sayali Patil
2026-04-01 14:39     ` David Hildenbrand (Arm)
2026-04-01 17:33       ` Sayali Patil
2026-03-27  7:16 ` [PATCH v3 12/13] selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category Sayali Patil
2026-04-02  7:15   ` Sayali Patil
2026-03-27  7:16 ` [PATCH v3 13/13] selftests/cgroup: extend test_hugetlb_memcg.c to support all huge page sizes Sayali Patil
2026-04-03 17:16   ` Sayali Patil
2026-03-27 18:11 ` [PATCH v3 00/13] selftests/mm: fix failures and robustness improvements Andrew Morton
     [not found]   ` <09104413-483f-4852-9d7e-71e0f86a1754@linux.ibm.com>
2026-03-30 22:11     ` Andrew Morton
2026-04-01 14:05       ` David Hildenbrand (Arm)
