* [PATCH 0/2] selftests/mm: uffd-stress fixes
@ 2025-08-26 7:07 Dev Jain
2025-08-26 7:07 ` [PATCH 1/2] selftests/mm/uffd-stress: Make test operate on less hugetlb memory Dev Jain
2025-08-26 7:07 ` [PATCH 2/2] selftests/mm/uffd-stress: Tighten constraint on free hugepages before the test Dev Jain
0 siblings, 2 replies; 5+ messages in thread
From: Dev Jain @ 2025-08-26 7:07 UTC
To: akpm, david, shuah
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
npache, ryan.roberts, linux-mm, linux-kselftest, linux-kernel,
Dev Jain
This patchset ensures that enough hugepages are available in the system so
that the uffd-stress test does not fail due to its racy nature. Patch 1
corrects the hugepage constraint in the run_vmtests.sh script, while patch 2
corrects the constraint in the test itself.
Dev Jain (2):
selftests/mm/uffd-stress: Make test operate on less hugetlb memory
selftests/mm/uffd-stress: Tighten constraint on free hugepages before
the test
tools/testing/selftests/mm/run_vmtests.sh | 2 +-
tools/testing/selftests/mm/uffd-stress.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.30.2
* [PATCH 1/2] selftests/mm/uffd-stress: Make test operate on less hugetlb memory
2025-08-26 7:07 [PATCH 0/2] selftests/mm: uffd-stress fixes Dev Jain
@ 2025-08-26 7:07 ` Dev Jain
2025-08-28 14:50 ` Ryan Roberts
2025-08-26 7:07 ` [PATCH 2/2] selftests/mm/uffd-stress: Tighten constraint on free hugepages before the test Dev Jain
1 sibling, 1 reply; 5+ messages in thread
From: Dev Jain @ 2025-08-26 7:07 UTC
To: akpm, david, shuah
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
npache, ryan.roberts, linux-mm, linux-kselftest, linux-kernel,
Dev Jain
We observed an uffd-stress selftest failure on arm64 and intermittent
failures on x86 too:
running ./uffd-stress hugetlb-private 128 32
bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL]
not ok 18 uffd-stress hugetlb-private 128 32 # exit=1
For this particular case, the number of free hugepages from run_vmtests.sh
will be 128, and the test will allocate 64 hugepages in the source
location. The stress() function will start spawning threads which will
operate on the destination location, triggering uffd-operations like
UFFDIO_COPY from src to dst, which means that we will require 64 more
hugepages for the dst location.
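For reference, the failing run's numbers work out like this (a worked
example assuming 2 MiB hugepages; freepgs and hpgsize_KB are the variables
run_vmtests.sh already computes):

freepgs=128 hpgsize_KB=2048
half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))    # = 128
# uffd-stress maps a 128 MiB src area and a 128 MiB dst area, i.e.
# 64 hugepages each -- the entire free pool, with zero slack.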
Consider the locking_thread() function. It locks a mutex kept in the dst
area, triggering a uffd copy. Suppose that 127 hugepages (64 for src and 63
for dst) have already been reserved. In the BOUNCE_RANDOM case, two threads
may try to lock mutexes located in the same dst hugepage. If one thread
succeeds in reserving the last free hugepage, the other thread may fail in
alloc_hugetlb_folio(), returning -ENOMEM.
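At the uffd API level, the shape of the race is roughly the following
minimal standalone sketch (hypothetical code, not the selftest's): two
threads UFFDIO_COPY the same dst hugepage; the loser normally sees EEXIST,
but with the pool exhausted it can see ENOMEM, the -12 in the log above.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define HPAGE	(2UL << 20)	/* assumes 2 MiB hugepages */

static int uffd;
static char *src, *dst;

static void *copier(void *id)
{
	struct uffdio_copy copy = {
		.dst = (unsigned long)dst,
		.src = (unsigned long)src,
		.len = HPAGE,
	};

	/* Both threads target the same dst hugepage; on failure the
	 * kernel leaves -errno in copy.copy (e.g. -EEXIST or -ENOMEM). */
	if (ioctl(uffd, UFFDIO_COPY, &copy))
		fprintf(stderr, "thread %ld: copy.copy = %lld\n",
			(long)id, (long long)copy.copy);
	return NULL;
}

int main(void)
{
	/* build: cc -pthread sketch.c; may need privileges or
	 * vm.unprivileged_userfaultfd=1 depending on the kernel */
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = { 0 };
	pthread_t t1, t2;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	ioctl(uffd, UFFDIO_API, &api);

	/* plain anon memory is fine as the copy source */
	src = mmap(NULL, HPAGE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	memset(src, 1, HPAGE);

	dst = mmap(NULL, HPAGE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	reg.range.start = (unsigned long)dst;
	reg.range.len = HPAGE;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	pthread_create(&t1, NULL, copier, (void *)1);
	pthread_create(&t2, NULL, copier, (void *)2);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}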
I can confirm that this is indeed the case with this hacky patch:
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b4c718..39eb21d8a91b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
if (IS_ERR(folio)) {
+ pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
+ if (actual_pte) {
+ ret = -EEXIST;
+ goto out;
+ }
ret = -ENOMEM;
goto out;
}
This code path gets triggered, indicating that the PMD at which one thread
is trying to map a hugepage has already been filled by a racing thread.
Therefore, instead of using freepgs to compute the amount of memory, use
freepgs - 10, so that the test still has some spare hugepages to absorb
these races. Note that, in case this value underflows, the check on the
number of free hugepages in the test itself will fail, so we are safe.
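With the same example numbers, the new sizing leaves head-room (again a
worked example, not part of the change):

freepgs=128 hpgsize_KB=2048
half_ufd_size_MB=$((((freepgs - 10) * hpgsize_KB) / 1024 / 2))    # = 118
# 59 hugepages per side, 118 in total, keeping 10 spare for the races.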
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 471e539d82b8..6a9f435be7a1 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -326,7 +326,7 @@ CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
# the size of the free pages we have, which is used for *each*.
# uffd-stress expects a region expressed in MiB, so we adjust
# half_ufd_size_MB accordingly.
-half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
+half_ufd_size_MB=$((((freepgs - 10) * hpgsize_KB) / 1024 / 2))
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
--
2.30.2
* [PATCH 2/2] selftests/mm/uffd-stress: Tighten constraint on free hugepages before the test
2025-08-26 7:07 [PATCH 0/2] selftests/mm: uffd-stress fixes Dev Jain
2025-08-26 7:07 ` [PATCH 1/2] selftests/mm/uffd-stress: Make test operate on less hugetlb memory Dev Jain
@ 2025-08-26 7:07 ` Dev Jain
1 sibling, 0 replies; 5+ messages in thread
From: Dev Jain @ 2025-08-26 7:07 UTC
To: akpm, david, shuah
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
npache, ryan.roberts, linux-mm, linux-kselftest, linux-kernel,
Dev Jain
The test requires at least 2 * (bytes / page_size) free hugepages, since we
need an identical number of hugepages for the src and dst locations, but the
existing check only demands bytes / page_size. Fix this.
Along with the above, as explained in patch "selftests/mm/uffd-stress:
Make test operate on less hugetlb memory", the racy nature of the test
requires some extra hugepages beyond this minimum. Therefore, tighten this
constraint.
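As a sanity check, the new bound evaluated with the example numbers from
patch 1 (a hypothetical worked example, not code from the test):

#include <stdio.h>

int main(void)
{
	long page_size = 2L << 20;	/* 2 MiB hugepage size */
	long bytes = 118L << 20;	/* per-side area from patch 1 */

	/* src + dst hugepages, plus slack for the racing threads */
	long needed = 2 * (bytes / page_size) + 10;

	printf("free hugepages required: %ld\n", needed);	/* 128 */
	return 0;
}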
Fixes: 5a6aa60d1823 ("selftests/mm: skip uffd hugetlb tests with insufficient hugepages")
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
tools/testing/selftests/mm/uffd-stress.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c
index 40af7f67c407..eb0b37f08061 100644
--- a/tools/testing/selftests/mm/uffd-stress.c
+++ b/tools/testing/selftests/mm/uffd-stress.c
@@ -449,7 +449,7 @@ int main(int argc, char **argv)
bytes = atol(argv[2]) * 1024 * 1024;
if (test_type == TEST_HUGETLB &&
- get_free_hugepages() < bytes / page_size) {
+ get_free_hugepages() < 2 * (bytes / page_size) + 10) {
printf("skip: Skipping userfaultfd... not enough hugepages\n");
return KSFT_SKIP;
}
--
2.30.2
* Re: [PATCH 1/2] selftests/mm/uffd-stress: Make test operate on less hugetlb memory
2025-08-26 7:07 ` [PATCH 1/2] selftests/mm/uffd-stress: Make test operate on less hugetlb memory Dev Jain
@ 2025-08-28 14:50 ` Ryan Roberts
2025-08-29 11:09 ` Dev Jain
0 siblings, 1 reply; 5+ messages in thread
From: Ryan Roberts @ 2025-08-28 14:50 UTC
To: Dev Jain, akpm, david, shuah
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
npache, linux-mm, linux-kselftest, linux-kernel
On 26/08/2025 08:07, Dev Jain wrote:
> We observed an uffd-stress selftest failure on arm64 and intermittent
> failures on x86 too:
> running ./uffd-stress hugetlb-private 128 32
>
> bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL]
> not ok 18 uffd-stress hugetlb-private 128 32 # exit=1
>
> For this particular case, the number of free hugepages from run_vmtests.sh
> will be 128, and the test will allocate 64 hugepages in the source
> location. The stress() function will start spawning threads which will
> operate on the destination location, triggering uffd-operations like
> UFFDIO_COPY from src to dst, which means that we will require 64 more
> hugepages for the dst location.
>
> Consider the locking_thread() function. It locks a mutex kept in the dst
> area, triggering a uffd copy. Suppose that 127 hugepages (64 for src and 63
> for dst) have already been reserved. In the BOUNCE_RANDOM case, two threads
> may try to lock mutexes located in the same dst hugepage. If one thread
> succeeds in reserving the last free hugepage, the other thread may fail in
> alloc_hugetlb_folio(), returning -ENOMEM.
> I can confirm that this is indeed the case with this hacky patch:
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b4c718..39eb21d8a91b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>
> folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
> if (IS_ERR(folio)) {
> + pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
> + if (actual_pte) {
> + ret = -EEXIST;
> + goto out;
> + }
> ret = -ENOMEM;
> goto out;
> }
>
> This code path gets triggered, indicating that the PMD at which one thread
> is trying to map a hugepage has already been filled by a racing thread.
>
> Therefore, instead of using freepgs to compute the amount of memory, use
> freepgs - 10, so that the test still has some spare hugepages to absorb
> these races. Note that, in case this value underflows, the check on the
> number of free hugepages in the test itself will fail, so we are safe.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> tools/testing/selftests/mm/run_vmtests.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index 471e539d82b8..6a9f435be7a1 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -326,7 +326,7 @@ CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
> # the size of the free pages we have, which is used for *each*.
> # uffd-stress expects a region expressed in MiB, so we adjust
> # half_ufd_size_MB accordingly.
> -half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
> +half_ufd_size_MB=$((((freepgs - 10) * hpgsize_KB) / 1024 / 2))
Why 10? I don't know much about uffd-stress but the comment at the top says it
runs 3 threads per CPU, so does the number of potential races increase with the
number of CPUs? Perhaps this number needs to be a function of nrcpu?
I tested it and it works though so:
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
* Re: [PATCH 1/2] selftests/mm/uffd-stress: Make test operate on less hugetlb memory
2025-08-28 14:50 ` Ryan Roberts
@ 2025-08-29 11:09 ` Dev Jain
0 siblings, 0 replies; 5+ messages in thread
From: Dev Jain @ 2025-08-29 11:09 UTC
To: Ryan Roberts, akpm, david, shuah
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
npache, linux-mm, linux-kselftest, linux-kernel
On 28/08/25 8:20 pm, Ryan Roberts wrote:
> On 26/08/2025 08:07, Dev Jain wrote:
>> We observed an uffd-stress selftest failure on arm64 and intermittent
>> failures on x86 too:
>> running ./uffd-stress hugetlb-private 128 32
>>
>> bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL]
>> not ok 18 uffd-stress hugetlb-private 128 32 # exit=1
>>
>> For this particular case, the number of free hugepages from run_vmtests.sh
>> will be 128, and the test will allocate 64 hugepages in the source
>> location. The stress() function will start spawning threads which will
>> operate on the destination location, triggering uffd-operations like
>> UFFDIO_COPY from src to dst, which means that we will require 64 more
>> hugepages for the dst location.
>>
>> Consider the locking_thread() function. It locks a mutex kept in the dst
>> area, triggering a uffd copy. Suppose that 127 hugepages (64 for src and 63
>> for dst) have already been reserved. In the BOUNCE_RANDOM case, two threads
>> may try to lock mutexes located in the same dst hugepage. If one thread
>> succeeds in reserving the last free hugepage, the other thread may fail in
>> alloc_hugetlb_folio(), returning -ENOMEM.
>> I can confirm that this is indeed the case with this hacky patch:
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 753f99b4c718..39eb21d8a91b 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>>
>> folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
>> if (IS_ERR(folio)) {
>> + pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
>> + if (actual_pte) {
>> + ret = -EEXIST;
>> + goto out;
>> + }
>> ret = -ENOMEM;
>> goto out;
>> }
>>
>> This code path gets triggered, indicating that the PMD at which one thread
>> is trying to map a hugepage has already been filled by a racing thread.
>>
>> Therefore, instead of using freepgs to compute the amount of memory, use
>> freepgs - 10, so that the test still has some spare hugepages to absorb
>> these races. Note that, in case this value underflows, the check on the
>> number of free hugepages in the test itself will fail, so we are safe.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> tools/testing/selftests/mm/run_vmtests.sh | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
>> index 471e539d82b8..6a9f435be7a1 100755
>> --- a/tools/testing/selftests/mm/run_vmtests.sh
>> +++ b/tools/testing/selftests/mm/run_vmtests.sh
>> @@ -326,7 +326,7 @@ CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
>> # the size of the free pages we have, which is used for *each*.
>> # uffd-stress expects a region expressed in MiB, so we adjust
>> # half_ufd_size_MB accordingly.
>> -half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
>> +half_ufd_size_MB=$((((freepgs - 10) * hpgsize_KB) / 1024 / 2))
> Why 10? I don't know much about uffd-stress but the comment at the top says it
> runs 3 threads per CPU, so does the number of potential races increase with the
> number of CPUs? Perhaps this number needs to be a function of nrcpu?
Yes, the race will amplify with nr_cpus; technically we need nr_cpus - 1
extra hugepages. The worst case is that all threads try to perform a uffd
copy at the same address: one of them reserves the last hugepage and the
others fail. 10 was just a random number; I see that run_vmtests.sh already
has nr_cpus computed, so I can easily reuse that. I'll send a v2.
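Something like the following, presumably (an untested sketch on top of this
patch, reusing the nr_cpus variable the script already computes):

-half_ufd_size_MB=$((((freepgs - 10) * hpgsize_KB) / 1024 / 2))
+half_ufd_size_MB=$((((freepgs - nr_cpus) * hpgsize_KB) / 1024 / 2))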
>
> I tested it and it works though so:
>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Thanks.
>
>> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
>> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
>> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16