* [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim
@ 2026-05-27  5:08 Wei Gao via ltp
  2026-05-27  5:31 ` [LTP] " linuxtestproject.agent
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Wei Gao via ltp @ 2026-05-27  5:08 UTC (permalink / raw)
  To: ltp

High memory pressure can cause MemFree to temporarily drop below the
min_free_kbytes threshold before the kernel reclaimer can catch up.
This results in intermittent test failures, particularly observed on
openQA aarch64 machines.

Implement a 1-second grace period with exponential backoff polling
(from 1ms up to 512ms) in check_monitor() to allow the kernel time to
reclaim memory.

Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v1->v2:
- Combine TINFO and TFAIL messages in check_monitor() for cleaner output.
- Remove end = 0;

 .../kernel/mem/tunable/min_free_kbytes.c      | 33 +++++++++++++------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/testcases/kernel/mem/tunable/min_free_kbytes.c b/testcases/kernel/mem/tunable/min_free_kbytes.c
index a62e4ae9d..e0342ef06 100644
--- a/testcases/kernel/mem/tunable/min_free_kbytes.c
+++ b/testcases/kernel/mem/tunable/min_free_kbytes.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
- * Copyright (c) Linux Test Project, 2012-2025
+ * Copyright (c) Linux Test Project, 2012-2026
  * Copyright (C) 2012-2017  Red Hat, Inc.
  */
 
@@ -140,14 +140,13 @@ static void test_tune(unsigned long overcommit_policy)
 		} else {
 			if (WIFEXITED(status)) {
 				if (WEXITSTATUS(status) != 0) {
-					tst_res(TFAIL, "child unexpectedly "
-						 "failed: %d", status);
+					tst_res(TFAIL, "child unexpectedly failed: %d",
+						status);
 				}
 			} else if (!WIFSIGNALED(status) ||
 				   WTERMSIG(status) != SIGKILL) {
-				tst_res(TFAIL,
-					 "child unexpectedly failed: %d",
-					 status);
+				tst_res(TFAIL, "child unexpectedly failed: %d",
+					status);
 			}
 		}
 	}
@@ -183,18 +182,32 @@ static void check_monitor(void)
 {
 	unsigned long tune;
 	unsigned long memfree;
+	int i;
 
 	while (!end) {
 		memfree = SAFE_READ_MEMINFO("MemFree:");
 		tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
 
 		if (memfree < tune) {
-			tst_res(TINFO, "MemFree is %lu kB, "
-				 "min_free_kbytes is %lu kB", memfree, tune);
-			tst_res(TFAIL, "MemFree < min_free_kbytes");
+			/*
+			 * Give it some time to reclaim. The kernel should keep
+			 * MemFree above min_free_kbytes, but transient drops
+			 * are possible under high pressure.
+			 */
+			for (i = 1; i < 1024; i *= 2) {
+				usleep(i * 1000);
+				memfree = SAFE_READ_MEMINFO("MemFree:");
+				if (memfree >= tune)
+					break;
+			}
+
+			if (memfree < tune) {
+				tst_res(TFAIL, "MemFree %lu kB < min_free_kbytes %lu kB",
+					memfree, tune);
+			}
 		}
 
-		sleep(2);
+		usleep(100000);
 	}
 }
 
-- 
2.54.0


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp


* Re: [LTP] mem/min_free_kbytes: Add grace period for memory reclaim
  2026-05-27  5:08 [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim Wei Gao via ltp
@ 2026-05-27  5:31 ` linuxtestproject.agent
  2026-05-27 15:40 ` [LTP] [PATCH v2] " Cyril Hrubis
  2026-05-31 13:40 ` [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor Wei Gao via ltp
  2 siblings, 0 replies; 12+ messages in thread
From: linuxtestproject.agent @ 2026-05-27  5:31 UTC (permalink / raw)
  To: Wei Gao; +Cc: ltp

Hi Wei,

On Wed, May 27 2026, Wei Gao wrote:
> mem/min_free_kbytes: Add grace period for memory reclaim

> +for (i = 1; i < 1024; i *= 2) {
> +usleep(i * 1000);
> +memfree = SAFE_READ_MEMINFO("MemFree:");
> +if (memfree >= tune)
> +break;
> +}

The exponential-backoff polling is the right approach here — async kernel
reclaim is exactly the case Ground Rule 2 calls out for this pattern.
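
As a check on the "1-second grace period" wording, here is a minimal sketch
that sums the sleeps of the quoted loop. The helper name backoff_total_ms()
is hypothetical and not part of the patch; it only counts the requested
sleep time and ignores the cost of reading /proc/meminfo.

```c
#include <assert.h>

/*
 * Hypothetical helper: add up the sleeps of a doubling backoff that
 * starts at first_ms and doubles while the value stays below limit_ms.
 * Optionally reports the number of sleeps through *steps.
 */
static unsigned int backoff_total_ms(unsigned int first_ms,
				     unsigned int limit_ms,
				     unsigned int *steps)
{
	unsigned int total = 0, n = 0, i;

	/* Keep the doubling well away from unsigned overflow. */
	assert(first_ms > 0 && limit_ms <= (1u << 30));

	for (i = first_ms; i < limit_ms; i *= 2) {
		total += i;
		n++;
	}

	if (steps)
		*steps = n;

	return total;
}
```

With first_ms = 1 and limit_ms = 1024 (the loop bounds in the patch) this
gives ten sleeps of 1, 2, 4, ... 512 ms, 1023 ms in total, which matches the
description in the commit message.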

> -sleep(2);
> +usleep(100000);

Reducing the polling interval from 2s to 100ms is fine.

[...]

All good. Approved.

---
Note:

The agent can sometimes produce false positives, although its findings
are often genuine. If you find issues with the review, please comment
on this email or ignore the suggestions.

Regards,
LTP AI Reviewer


* Re: [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim
  2026-05-27  5:08 [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim Wei Gao via ltp
  2026-05-27  5:31 ` [LTP] " linuxtestproject.agent
@ 2026-05-27 15:40 ` Cyril Hrubis
  2026-05-29 16:07   ` Petr Vorel
  2026-05-31 13:40 ` [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor Wei Gao via ltp
  2 siblings, 1 reply; 12+ messages in thread
From: Cyril Hrubis @ 2026-05-27 15:40 UTC (permalink / raw)
  To: Wei Gao; +Cc: ltp

Hi!
> High memory pressure can cause MemFree to temporarily drop below the
> min_free_kbytes threshold before the kernel reclaimer can catch up.
> This results in intermittent test failures, particularly observed on
> openQA aarch64 machines.
> 
> Implement a 1-second grace period with exponential backoff polling
> (from 1ms up to 512ms) in check_monitor() to allow the kernel time to
> reclaim memory.
> 
> Reviewed-by: Petr Vorel <pvorel@suse.cz>
> Signed-off-by: Wei Gao <wegao@suse.com>
> ---
> v1->v2:
> - Combine TINFO and TFAIL messages in check_monitor() for cleaner output.
> - Remove end = 0;
> 
>  .../kernel/mem/tunable/min_free_kbytes.c      | 33 +++++++++++++------
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/testcases/kernel/mem/tunable/min_free_kbytes.c b/testcases/kernel/mem/tunable/min_free_kbytes.c
> index a62e4ae9d..e0342ef06 100644
> --- a/testcases/kernel/mem/tunable/min_free_kbytes.c
> +++ b/testcases/kernel/mem/tunable/min_free_kbytes.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-or-later
>  /*
> - * Copyright (c) Linux Test Project, 2012-2025
> + * Copyright (c) Linux Test Project, 2012-2026
>   * Copyright (C) 2012-2017  Red Hat, Inc.
>   */
>  
> @@ -140,14 +140,13 @@ static void test_tune(unsigned long overcommit_policy)
>  		} else {
>  			if (WIFEXITED(status)) {
>  				if (WEXITSTATUS(status) != 0) {
> -					tst_res(TFAIL, "child unexpectedly "
> -						 "failed: %d", status);
> +					tst_res(TFAIL, "child unexpectedly failed: %d",
> +						status);

We do have tst_strstatus().
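
To illustrate why a decoded status reads better than the raw integer, here
is a standalone sketch. It does not use or reproduce tst_strstatus(); the
helpers describe_status() and child_status() are hypothetical and their
output format is an assumption made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: render a wait status as human-readable text. */
static const char *describe_status(int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (WIFEXITED(status)) {
		snprintf(buf, len, "exited with %d", WEXITSTATUS(status));
	} else if (WIFSIGNALED(status)) {
		snprintf(buf, len, "killed by signal %d (%s)",
			 WTERMSIG(status), strsignal(WTERMSIG(status)));
	} else {
		snprintf(buf, len, "unrecognized status 0x%x",
			 (unsigned int)status);
	}

	return buf;
}

/* Hypothetical helper: fork a child that exits with exit_code and
 * return the raw status reported by waitpid(). */
static int child_status(int exit_code)
{
	int status = 0;
	pid_t pid, ret;

	pid = fork();
	assert(pid >= 0);

	if (pid == 0)
		_exit(exit_code);

	ret = waitpid(pid, &status, 0);
	assert(ret == pid);
	(void)ret;

	return status;
}
```

On Linux a child that exits with code 3 typically produces a raw status of
768, which is what "%d" on status would print; the decoded form says
"exited with 3" instead.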

>  				}
>  			} else if (!WIFSIGNALED(status) ||
>  				   WTERMSIG(status) != SIGKILL) {
> -				tst_res(TFAIL,
> -					 "child unexpectedly failed: %d",
> -					 status);
> +				tst_res(TFAIL, "child unexpectedly failed: %d",
> +					status);
>  			}
>  		}
>  	}
> @@ -183,18 +182,32 @@ static void check_monitor(void)
>  {
>  	unsigned long tune;
>  	unsigned long memfree;
> +	int i;
>  
>  	while (!end) {
>  		memfree = SAFE_READ_MEMINFO("MemFree:");
>  		tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
>  
>  		if (memfree < tune) {
> -			tst_res(TINFO, "MemFree is %lu kB, "
> -				 "min_free_kbytes is %lu kB", memfree, tune);
> -			tst_res(TFAIL, "MemFree < min_free_kbytes");
> +			/*
> +			 * Give it some time to reclaim. The kernel should keep
> +			 * MemFree above min_free_kbytes, but transient drops
> +			 * are possible under high pressure.
> +			 */
> +			for (i = 1; i < 1024; i *= 2) {
> +				usleep(i * 1000);
> +				memfree = SAFE_READ_MEMINFO("MemFree:");
> +				if (memfree >= tune)
> +					break;
> +			}
> +
> +			if (memfree < tune) {
> +				tst_res(TFAIL, "MemFree %lu kB < min_free_kbytes %lu kB",
> +					memfree, tune);
> +			}
>  		}

Looks good.

Reviewed-by: Cyril Hrubis <chrubis@suse.cz>

I think that we also want to change the test so that the monitor is
started and stopped for each testcase with a specific value we set the
min_free_kbytes to. Running it asynchronously like this may mean that we
will be looking for a wrong value for the second if we are unlucky. But
that can be done later on.

-- 
Cyril Hrubis
chrubis@suse.cz


* Re: [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim
  2026-05-27 15:40 ` [LTP] [PATCH v2] " Cyril Hrubis
@ 2026-05-29 16:07   ` Petr Vorel
  2026-05-31 13:51     ` Wei Gao via ltp
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Vorel @ 2026-05-29 16:07 UTC (permalink / raw)
  To: Cyril Hrubis; +Cc: ltp

Hi all,

> > @@ -140,14 +140,13 @@ static void test_tune(unsigned long overcommit_policy)
> >  		} else {
> >  			if (WIFEXITED(status)) {
> >  				if (WEXITSTATUS(status) != 0) {
> > -					tst_res(TFAIL, "child unexpectedly "
> > -						 "failed: %d", status);
> > +					tst_res(TFAIL, "child unexpectedly failed: %d",
> > +						status);

> We do have tst_strstatus().

+1, I fixed this as a separate change:
https://github.com/linux-test-project/ltp/commit/d81526f7da5645a04d6e03e557d3c829b67b3c57

> >  				}
> >  			} else if (!WIFSIGNALED(status) ||
> >  				   WTERMSIG(status) != SIGKILL) {
> > -				tst_res(TFAIL,
> > -					 "child unexpectedly failed: %d",
> > -					 status);
> > +				tst_res(TFAIL, "child unexpectedly failed: %d",
> > +					status);
> >  			}
> >  		}
> >  	}
> > @@ -183,18 +182,32 @@ static void check_monitor(void)
> >  {
> >  	unsigned long tune;
> >  	unsigned long memfree;
> > +	int i;

> >  	while (!end) {
> >  		memfree = SAFE_READ_MEMINFO("MemFree:");
> >  		tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);

> >  		if (memfree < tune) {
> > -			tst_res(TINFO, "MemFree is %lu kB, "
> > -				 "min_free_kbytes is %lu kB", memfree, tune);
> > -			tst_res(TFAIL, "MemFree < min_free_kbytes");
> > +			/*
> > +			 * Give it some time to reclaim. The kernel should keep
> > +			 * MemFree above min_free_kbytes, but transient drops
> > +			 * are possible under high pressure.
> > +			 */
> > +			for (i = 1; i < 1024; i *= 2) {
> > +				usleep(i * 1000);
> > +				memfree = SAFE_READ_MEMINFO("MemFree:");
> > +				if (memfree >= tune)
> > +					break;
> > +			}
> > +
> > +			if (memfree < tune) {
> > +				tst_res(TFAIL, "MemFree %lu kB < min_free_kbytes %lu kB",
> > +					memfree, tune);
> > +			}
> >  		}

> Looks good.

> Reviewed-by: Cyril Hrubis <chrubis@suse.cz>

> I think that we also want to change the test so that the monitor is
> started and stopped for each testcase with a specific value we set the
> min_free_kbytes to. Running it asynchronously like this may mean that we
> will be looking for a wrong value for the second if we are unlucky. But
> that can be done later on.

@Wei Unfortunately this does not help on the current stable kernels (at least
not on 7.0.10 on Tumbleweed). We discussed it with Vlastimil Babka and Cyril
Hrubis and the conclusion is to start by running the monitor synchronously
with each subtestcase and making sure MemFree is big enough before we start the
monitor and the process that creates memory stress.
Also, please rebase when making changes.

Kind regards,
Petr


* [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-05-27  5:08 [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim Wei Gao via ltp
  2026-05-27  5:31 ` [LTP] " linuxtestproject.agent
  2026-05-27 15:40 ` [LTP] [PATCH v2] " Cyril Hrubis
@ 2026-05-31 13:40 ` Wei Gao via ltp
  2026-06-01  6:42   ` [LTP] " linuxtestproject.agent
  2026-06-02  1:00   ` [LTP] [PATCH v4] " Wei Gao via ltp
  2 siblings, 2 replies; 12+ messages in thread
From: Wei Gao via ltp @ 2026-05-31 13:40 UTC (permalink / raw)
  To: ltp

High memory pressure can cause MemFree to temporarily drop below the
min_free_kbytes threshold before the kernel reclaimer can catch up.
This results in intermittent test failures, particularly observed on
openQA aarch64 machines where swap is exhausted.

Implement a 2-second grace period with high-accuracy 10ms fixed polling
in check_monitor() to allow the kernel time to reclaim memory.

Introduce a 10% tolerance (90% threshold) for the MemFree check. Our
measurements showed that under extreme pressure, MemFree can take a
long time to recover to the exact 100% MIN_FREE_KBYTES, or may stay slightly
below it. This tolerance prevents false positives and avoids excessive
wait times while still ensuring memory is maintained near the required level.

Enhanced diagnostics are added to report MemAvailable and the minimum
memory level seen during the pressure period to aid in future
calibration.

Signed-off-by: Wei Gao <wegao@suse.com>
---
v2->v3:
- Switched from an exponential backoff retry loop to a fixed 10ms polling interval for up to 2 seconds. 
  This provides better resolution and a more predictable grace period for the kernel to reclaim memory.
- Introduced a 10% tolerance threshold (90% of min_free_kbytes). Memory levels staying within this range 
  after the grace period are now logged as info rather than failing, avoiding false positives on systems 
  under extreme pressure.
- Added enhanced diagnostics: the test now reports MemAvailable, the minimum MemFree level seen during 
  the pressure period, and the percentage of the threshold achieved.
- Refined logging to clearly distinguish between recovery, tolerance-level maintenance, and actual failures.

 .../kernel/mem/tunable/min_free_kbytes.c      | 49 ++++++++++++++++---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/testcases/kernel/mem/tunable/min_free_kbytes.c b/testcases/kernel/mem/tunable/min_free_kbytes.c
index 7882c6072..bd3821cf1 100644
--- a/testcases/kernel/mem/tunable/min_free_kbytes.c
+++ b/testcases/kernel/mem/tunable/min_free_kbytes.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
- * Copyright (c) Linux Test Project, 2012-2025
+ * Copyright (c) Linux Test Project, 2012-2026
  * Copyright (C) 2012-2017  Red Hat, Inc.
  */
 
@@ -177,20 +177,55 @@ static int eatup_mem(unsigned long overcommit_policy)
 
 static void check_monitor(void)
 {
-	unsigned long tune;
-	unsigned long memfree;
+	unsigned long tune, threshold;
+	unsigned long memfree, memavail, min_memfree;
+	int i, retry_count;
 
 	while (!end) {
 		memfree = SAFE_READ_MEMINFO("MemFree:");
 		tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
+		/*
+		 * Allow 10% tolerance to account for transient states.
+		 */
+		threshold = tune * 9 / 10;
 
 		if (memfree < tune) {
-			tst_res(TINFO, "MemFree is %lu kB, "
-				 "min_free_kbytes is %lu kB", memfree, tune);
-			tst_res(TFAIL, "MemFree < min_free_kbytes");
+			min_memfree = memfree;
+			retry_count = 0;
+			/*
+			 * Give it some time to reclaim. The kernel should keep
+			 * MemFree above min_free_kbytes, but transient drops
+			 * are possible under high pressure.
+			 * Check every 10ms for up to 2 seconds for high accuracy.
+			 */
+			for (i = 10; i <= 2000; i += 10) {
+				retry_count++;
+				usleep(10000);
+				memfree = SAFE_READ_MEMINFO("MemFree:");
+				if (memfree < min_memfree)
+					min_memfree = memfree;
+
+				if (memfree >= tune)
+					break;
+			}
+
+			memavail = SAFE_READ_MEMINFO("MemAvailable:");
+
+			if (memfree < threshold) {
+				tst_res(TINFO, "tune=%lu, threshold=%lu", tune, threshold);
+				tst_res(TINFO, "MemFree=%lu, MemAvailable=%lu, MinSeen=%lu (%lu%%)",
+					memfree, memavail, min_memfree, (min_memfree * 100 / tune));
+				tst_res(TFAIL, "MemFree < 90%% of min_free_kbytes after ~2s");
+			} else if (memfree < tune) {
+				tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%, avail %lu) after ~2s",
+					memfree, (min_memfree * 100 / tune), memavail);
+			} else {
+				tst_res(TINFO, "MemFree recovered to %lu (min %lu%%, avail %lu) after %d retries (~%d ms)",
+					memfree, (min_memfree * 100 / tune), memavail, retry_count, i);
+			}
 		}
 
-		sleep(2);
+		usleep(100000);
 	}
 }
 
-- 
2.54.0



* Re: [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim
  2026-05-29 16:07   ` Petr Vorel
@ 2026-05-31 13:51     ` Wei Gao via ltp
  0 siblings, 0 replies; 12+ messages in thread
From: Wei Gao via ltp @ 2026-05-31 13:51 UTC (permalink / raw)
  To: Petr Vorel; +Cc: ltp

On Fri, May 29, 2026 at 06:07:08PM +0200, Petr Vorel wrote:
> Hi all,
> 
> 
> > I think that we also want to change the test so that the monitor is
> > started and stopped for each testcase with a specific value we set the
> > min_free_kbytes to. Running it asynchronously like this may mean that we
> > will be looking for a wrong value for the second if we are unlucky. But
> > that can be done later on.
> 
> @Wei Unfortunately this does not help on the current stable kernels (at least
> not on 7.0.10 on Tumbleweed). We discussed it with Vlastimil Babka and Cyril
> Hrubis and the conclusion is to start by running the monitor synchronously
> with each subtestcase and making sure MemFree is big enough before we start the
> monitor and the process that creates memory stress.
> Also, please rebase when making changes.

Based on the test results of my latest v3 patch, the test now passes on
Tumbleweed. The v3 log below also shows that MemFree is always large enough
before the monitor starts, so I do not think running the monitor
synchronously would help; only a longer polling time and a tolerance
threshold help.


tst_test.c:2042: TINFO: LTP version: 20260529
tst_test.c:2045: TINFO: Tested kernel: 7.0.10-2-default #1 SMP PREEMPT_DYNAMIC Sat May 23 12:09:09 UTC 2026 (bb95589) aarch64
tst_test.c:1864: TINFO: Test timeout is not limited
min_free_kbytes.c:89: TINFO: Setting /proc/sys/vm/overcommit_memory to 2
min_free_kbytes.c:93: TINFO: Setting /proc/sys/vm/min_free_kbytes to 45056
memfree is 1200068 kB before eatup mem   <<<<<<<
memfree is 74972 kB after eatup mem
min_free_kbytes.c:95: TINFO: Setting /proc/sys/vm/min_free_kbytes to 90112
memfree is 1680500 kB before eatup mem   <<<<<<<
min_free_kbytes.c:224: TINFO: MemFree recovered to 93868 (min 98%, avail 0) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 96684 (min 98%, avail 0) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 94712 (min 93%, avail 0) after 1 retries (~10 ms)
memfree is 94760 kB after eatup mem
min_free_kbytes.c:103: TINFO: Setting /proc/sys/vm/min_free_kbytes to 40165
memfree is 1655360 kB before eatup mem
min_free_kbytes.c:224: TINFO: MemFree recovered to 40748 (min 99%, avail 0) after 2 retries (~20 ms)
memfree is 75220 kB after eatup mem
min_free_kbytes.c:89: TINFO: Setting /proc/sys/vm/overcommit_memory to 0
min_free_kbytes.c:93: TINFO: Setting /proc/sys/vm/min_free_kbytes to 45056
memfree is 1727596 kB before eatup mem
min_free_kbytes.c:95: TINFO: Setting /proc/sys/vm/min_free_kbytes to 90112
memfree is 1716960 kB before eatup mem
min_free_kbytes.c:224: TINFO: MemFree recovered to 92648 (min 96%, avail 0) after 4 retries (~40 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 91488 (min 85%, avail 0) after 9 retries (~90 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 90348 (min 69%, avail 0) after 28 retries (~280 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 98264 (min 79%, avail 0) after 27 retries (~270 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 92440 (min 80%, avail 0) after 14 retries (~140 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 91864 (min 72%, avail 0) after 27 retries (~270 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 94816 (min 86%, avail 0) after 4 retries (~40 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 92680 (min 86%, avail 0) after 3 retries (~30 ms)
min_free_kbytes.c:103: TINFO: Setting /proc/sys/vm/min_free_kbytes to 40165
memfree is 1651044 kB before eatup mem
min_free_kbytes.c:224: TINFO: MemFree recovered to 45152 (min 94%, avail 1252) after 4 retries (~40 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 55620 (min 91%, avail 7644) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 46452 (min 72%, avail 0) after 2 retries (~20 ms)
min_free_kbytes.c:89: TINFO: Setting /proc/sys/vm/overcommit_memory to 1
min_free_kbytes.c:93: TINFO: Setting /proc/sys/vm/min_free_kbytes to 45056
memfree is 1703808 kB before eatup mem
min_free_kbytes.c:224: TINFO: MemFree recovered to 53224 (min 87%, avail 11664) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 53500 (min 98%, avail 10556) after 1 retries (~10 ms)
min_free_kbytes.c:95: TINFO: Setting /proc/sys/vm/min_free_kbytes to 90112
memfree is 1664820 kB before eatup mem <<<<<<<<<<<<
min_free_kbytes.c:224: TINFO: MemFree recovered to 91172 (min 68%, avail 0) after 73 retries (~730 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 92996 (min 78%, avail 0) after 14 retries (~140 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 91500 (min 72%, avail 0) after 94 retries (~940 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 92760 (min 91%, avail 0) after 9 retries (~90 ms)
min_free_kbytes.c:103: TINFO: Setting /proc/sys/vm/min_free_kbytes to 40165
memfree is 1654728 kB before eatup mem
min_free_kbytes.c:224: TINFO: MemFree recovered to 41508 (min 96%, avail 0) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 41984 (min 96%, avail 0) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 52068 (min 93%, avail 3480) after 1 retries (~10 ms)
min_free_kbytes.c:224: TINFO: MemFree recovered to 47720 (min 98%, avail 0) after 1 retries (~10 ms)
min_free_kbytes.c:79: TPASS: min_free_kbytes test pass
> 
> Kind regards,
> Petr


* Re: [LTP] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-05-31 13:40 ` [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor Wei Gao via ltp
@ 2026-06-01  6:42   ` linuxtestproject.agent
  2026-06-02  1:00   ` [LTP] [PATCH v4] " Wei Gao via ltp
  1 sibling, 0 replies; 12+ messages in thread
From: linuxtestproject.agent @ 2026-06-01  6:42 UTC (permalink / raw)
  To: Wei Gao; +Cc: ltp

Hi Wei,

Thank you for the patch. A few issues to address before this can be merged.

On Sun, May 31, 2026 at 13:40:47 +0000, Wei Gao <wegao@suse.com> wrote:

> min_free_kbytes: Handle transient memory drops in check_monitor
>
> High memory pressure can cause MemFree to temporarily drop below the
> min_free_kbytes threshold before the kernel reclaimer can catch up.
> This results in intermittent test failures, particularly observed on
> openQA aarch64 machines where swap is exhausted.
>
> Implement a 2-second grace period with high-accuracy 10ms fixed polling
> in check_monitor() to allow the kernel time to reclaim memory.
>
> Introduce a 10% tolerance (90% threshold) for the MemFree check. Our
> measurements showed that under extreme pressure, MemFree can take a
> long time to recover to the exact 100% MIN_FREE_KBYTES, or may stay slightly
> below it. This tolerance prevents false positives and avoids excessive
> wait times while still ensuring memory is maintained near the required level.

The 10% tolerance weakens what the test actually verifies. The commit says
"Our measurements showed..." but no data or bounds are provided. What is the
maximum observed shortfall? If the worst case seen was, say, 3%, a 10%
tolerance is far too loose and may mask real regressions. Please quantify the
measurement and tighten the tolerance to match observed reality, or at least
document the worst-case value that motivated the 10% figure.
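
To make the effect of the tolerance concrete, here is a minimal sketch of
the three outcomes in the patch. The enum and the function names are
hypothetical; only the expression tune * 9 / 10 and the percentage
calculation are taken from the patch.

```c
#include <assert.h>

/* Hypothetical names for the three branches in check_monitor(). */
enum verdict {
	VERDICT_OK,		/* MemFree recovered to min_free_kbytes */
	VERDICT_TOLERATED,	/* below the tunable but within 10% */
	VERDICT_FAIL		/* below 90% of the tunable */
};

/* Same integer arithmetic as the patch: rounds down. */
static unsigned long tolerance_threshold(unsigned long tune)
{
	return tune * 9 / 10;
}

/* Classify the MemFree value read at the end of the grace period. */
static enum verdict classify(unsigned long memfree, unsigned long tune)
{
	if (memfree >= tune)
		return VERDICT_OK;

	if (memfree >= tolerance_threshold(tune))
		return VERDICT_TOLERATED;

	return VERDICT_FAIL;
}

/* Percentage of the tunable, as printed in the diagnostics. */
static unsigned long percent_of(unsigned long value, unsigned long tune)
{
	assert(tune > 0);
	return value * 100 / tune;
}
```

For min_free_kbytes = 90112 the threshold is 81100 kB, so a final MemFree of
81100 kB is tolerated and 81099 kB fails. Note that the verdict depends only
on the last MemFree value read; the minimum seen during the grace period is
reported but does not change the outcome.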

> Enhanced diagnostics are added to report MemAvailable and the minimum
> memory level seen during the pressure period to aid in future
> calibration.

"During the pressure period" is inaccurate. `min_memfree` is correctly
tracked inside the retry loop, but `memavail` is read *after* the loop
completes:

>+		memavail = SAFE_READ_MEMINFO("MemAvailable:");
>+
>+		if (memfree < threshold) {
>+			tst_res(TINFO, "tune=%lu, threshold=%lu", tune, threshold);
>+			tst_res(TINFO, "MemFree=%lu, MemAvailable=%lu, MinSeen=%lu (%lu%%)",
>+				memfree, memavail, min_memfree, (min_memfree * 100 / tune));

`memavail` is a snapshot taken at the *end* of the grace period, not during
it. By the time the retry loop finishes the kernel reclaimer may have already
recovered substantially. Either move SAFE_READ_MEMINFO("MemAvailable:") inside
the loop to track it properly (as is done for MemFree), or adjust the commit
message to say "after the grace period" rather than "during the pressure
period".

> @@ -177,20 +177,55 @@ static int eatup_mem(unsigned long overcommit_policy)
>  static void check_monitor(void)
>  {
> -	unsigned long tune;
> -	unsigned long memfree;
> +	unsigned long tune, threshold;
> +	unsigned long memfree, memavail, min_memfree;
> +	int i, retry_count;

`retry_count` is redundant. Within the loop, `retry_count` is always equal
to `i / 10` (e.g. when i=10 retry_count=1, when i=50 retry_count=5). The
else-branch TINFO message reports both, conveying identical information in
two different units. Drop `retry_count` and derive the count from `i / 10`
directly, or just remove it from the diagnostic entirely since the elapsed
time in ms (`i`) is the more useful figure.
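
The relation between the two counters can be checked with a small replay of
the loop shape. This is a sketch only: replay_v3_loop() and its
recover_on_poll parameter are hypothetical, and the sleep and the
/proc/meminfo read are left out.

```c
#include <assert.h>

struct poll_result {
	int i_ms;		/* value of i when the loop is left */
	int retry_count;	/* number of polls performed */
	int recovered;		/* 1 if the loop ended via break */
};

/*
 * Replay the v3 loop without sleeping. recover_on_poll is the 1-based
 * poll on which MemFree is assumed to recover; 0 means it never does.
 */
static struct poll_result replay_v3_loop(int recover_on_poll)
{
	struct poll_result r = { 0, 0, 0 };
	int i;

	assert(recover_on_poll >= 0);

	for (i = 10; i <= 2000; i += 10) {
		r.retry_count++;

		if (r.retry_count == recover_on_poll) {
			r.recovered = 1;
			break;
		}
	}

	r.i_ms = i;

	return r;
}
```

When the loop exits through break, retry_count equals i / 10, for example
28 polls and i = 280. If the loop runs to completion, i ends at 2010 while
retry_count is 200, so the equality holds only for the recovery case, which
is the only branch that prints both values.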

> -	sleep(2);
> +	usleep(100000);
>  }

This changes the normal (non-pressure) monitoring interval from 2 seconds to
100 ms, making the monitor sample `/proc/meminfo` and
`/proc/sys/vm/min_free_kbytes` 20x more often during the entire test run.
This is a significant behavioral change that is completely absent from the
commit message. Please document it — including the rationale for choosing
100 ms — so reviewers and bisectors can understand the intent.

---
Verdict: Needs revision

Issues:
  1. [NEEDS REVISION] Undocumented behavioral change: normal poll interval
     2s → 100 ms is not mentioned in the commit message.
  2. [NEEDS REVISION] Commit message inaccuracy: MemAvailable is read after
     the retry loop, not "during the pressure period".
  3. [NEEDS DISCUSSION] 10% tolerance lacks quantitative justification; the
     "Our measurements showed..." claim needs backing data or tighter bounds.
  4. [NEEDS REVISION] `retry_count` is always equal to `i / 10`; the
     variable is redundant and should be removed.

LTP AI Reviewer


* [LTP] [PATCH v4] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-05-31 13:40 ` [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor Wei Gao via ltp
  2026-06-01  6:42   ` [LTP] " linuxtestproject.agent
@ 2026-06-02  1:00   ` Wei Gao via ltp
  2026-06-02  4:02     ` [LTP] " linuxtestproject.agent
  1 sibling, 1 reply; 12+ messages in thread
From: Wei Gao via ltp @ 2026-06-02  1:00 UTC (permalink / raw)
  To: ltp

High memory pressure can cause MemFree to temporarily drop below the
min_free_kbytes threshold before the kernel reclaimer can catch up.
This results in intermittent test failures, observed on openQA aarch64
virtual machines.

Implement a 2-second grace period with high-accuracy 10ms fixed polling
in check_monitor() to allow the kernel time to reclaim memory.

Introduce a 10% tolerance (90% threshold) for the MemFree check. My
measurements showed that under extreme pressure, MemFree can dip as low
as ~50% to ~70% of the target. While it typically recovers above 90%
within one second, hitting the exact 100% watermark sometimes can take
significantly longer. This tolerance prevents false positives during the
slow recovery tail while still ensuring memory is maintained near the
required level.

Also, increase the monitor's idle polling frequency from 2s to 100ms
to improve responsiveness during the test run.

Enhanced diagnostics are added to report the minimum memory level seen
during the pressure period to aid in future calibration.

Signed-off-by: Wei Gao <wegao@suse.com>
---
v3->v4:
- Remove redundant 'retry_count' variable.
- Remove 'MemAvailable' from diagnostics.
- Provide quantitative justification for the 10% tolerance (measured dips to ~50%).

 .../kernel/mem/tunable/min_free_kbytes.c      | 43 ++++++++++++++++---
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/testcases/kernel/mem/tunable/min_free_kbytes.c b/testcases/kernel/mem/tunable/min_free_kbytes.c
index 7882c6072..6b6d009dd 100644
--- a/testcases/kernel/mem/tunable/min_free_kbytes.c
+++ b/testcases/kernel/mem/tunable/min_free_kbytes.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
- * Copyright (c) Linux Test Project, 2012-2025
+ * Copyright (c) Linux Test Project, 2012-2026
  * Copyright (C) 2012-2017  Red Hat, Inc.
  */
 
@@ -177,20 +177,49 @@ static int eatup_mem(unsigned long overcommit_policy)
 
 static void check_monitor(void)
 {
-	unsigned long tune;
-	unsigned long memfree;
+	unsigned long tune, threshold;
+	unsigned long memfree, min_memfree;
+	int i;
 
 	while (!end) {
 		memfree = SAFE_READ_MEMINFO("MemFree:");
 		tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
+		/*
+		 * Allow 10% tolerance to account for transient states.
+		 */
+		threshold = tune * 9 / 10;
 
 		if (memfree < tune) {
-			tst_res(TINFO, "MemFree is %lu kB, "
-				 "min_free_kbytes is %lu kB", memfree, tune);
-			tst_res(TFAIL, "MemFree < min_free_kbytes");
+			min_memfree = memfree;
+			/*
+			 * Give it some time to reclaim. The kernel should keep
+			 * MemFree above min_free_kbytes, but transient drops
+			 * are possible under high pressure.
+			 * Check every 10ms for up to 2 seconds for high accuracy.
+			 */
+			for (i = 10; i <= 2000; i += 10) {
+				usleep(10000);
+				memfree = SAFE_READ_MEMINFO("MemFree:");
+				if (memfree < min_memfree)
+					min_memfree = memfree;
+
+				if (memfree >= tune)
+					break;
+			}
+
+			if (memfree < threshold) {
+				tst_res(TFAIL, "MemFree %lu < 90%% of min_free_kbytes %lu (MinSeen: %lu%%) after 2s",
+					memfree, tune, (min_memfree * 100 / tune));
+			} else if (memfree < tune) {
+				tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%) after ~2s",
+					memfree, (min_memfree * 100 / tune));
+			} else {
+				tst_res(TINFO, "MemFree recovered to %lu (min %lu%%) after %d ms",
+					memfree, (min_memfree * 100 / tune), i);
+			}
 		}
 
-		sleep(2);
+		usleep(100000);
 	}
 }
 
-- 
2.54.0



* Re: [LTP] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-06-02  1:00   ` [LTP] [PATCH v4] " Wei Gao via ltp
@ 2026-06-02  4:02     ` linuxtestproject.agent
  2026-06-02  7:46       ` Wei Gao via ltp
  0 siblings, 1 reply; 12+ messages in thread
From: linuxtestproject.agent @ 2026-06-02  4:02 UTC (permalink / raw)
  To: Wei Gao; +Cc: ltp

Hi Wei,

Thanks for the patch. A few comments below.

> min_free_kbytes: Handle transient memory drops in check_monitor
>
> High memory pressure can cause MemFree to temporarily drop below the
> min_free_kbytes threshold before the kernel reclaimer can catch up.
> This results in intermittent test failures, observed on openQA aarch64
> virtual machines.
>
> Implement a 2-second grace period with high-accuracy 10ms fixed polling
> in check_monitor() to allow the kernel time to reclaim memory.
>
> Introduce a 10% tolerance (90% threshold) for the MemFree check. My
> measurements showed that under extreme pressure, MemFree can dip as low
> as ~50% to ~70% of the target. While it typically recovers above 90%
> within one second, hitting the exact 100% watermark sometimes can take
> significantly longer. This tolerance prevents false positives during the
> slow recovery tail while still ensuring memory is maintained near the
> required level.

Please rewrite in impersonal form, e.g.:
  "Measurements under extreme pressure show MemFree can dip as low as
   ~50% to ~70% of the target."

Commit messages become a permanent part of the project history and
should avoid first-person language.

> -unsigned long tune;
> -unsigned long memfree;
> +unsigned long tune, threshold;
> +unsigned long memfree, min_memfree;
> +int i;
>
>  while (!end) {
>  memfree = SAFE_READ_MEMINFO("MemFree:");
>  tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
> +/*
> + * Allow 10% tolerance to account for transient states.
> + */
> +threshold = tune * 9 / 10;
>
>  if (memfree < tune) {
> -tst_res(TINFO, "MemFree is %lu kB, "
> - "min_free_kbytes is %lu kB", memfree, tune);
> -tst_res(TFAIL, "MemFree < min_free_kbytes");
> +min_memfree = memfree;
> +/*
> + * Give it some time to reclaim. The kernel should keep
> + * MemFree above min_free_kbytes, but transient drops
> + * are possible under high pressure.
> + * Check every 10ms for up to 2 seconds for high accuracy.
> + */
> +for (i = 10; i <= 2000; i += 10) {
> +usleep(10000);
> +memfree = SAFE_READ_MEMINFO("MemFree:");
> +if (memfree < min_memfree)
> +min_memfree = memfree;
> +
> +if (memfree >= tune)
> +break;
> +}

The inner polling loop does not check the 'end' flag, which is set to 1
by the SIGUSR1 handler when the parent finishes. In the original code,
the single sleep(2) at the bottom of the outer loop was interrupted by
the signal (sleep() returns early; usleep() fails with EINTR), so the
outer while (!end) check fired promptly. With the new grace period loop,
the signal can fire during any of the 200 inner usleep(10000) calls; the
sleep returns early but the loop body continues, and the outer
while (!end) check cannot be reached until MemFree recovers or the full
~2 seconds elapse.

Please add an early exit:

  for (i = 10; i <= 2000; i += 10) {
      if (end)
          return;
      usleep(10000);
      memfree = SAFE_READ_MEMINFO("MemFree:");
      if (memfree < min_memfree)
          min_memfree = memfree;
      if (memfree >= tune)
          break;
  }

Alternatively a `break` is sufficient if the outer loop check is
reached quickly, but `return` is cleaner given the function only has
one exit point anyway.
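
For illustration only, here is a standalone sketch of the same early-exit
pattern outside the LTP harness. It is not the test's code: read_value stands
in for SAFE_READ_MEMINFO("MemFree:"), and poll_with_stop(), the poll_result
enum and the read_low()/read_high() stubs are hypothetical names made up for
this example. It assumes the flag is set by a SIGUSR1 handler, as 'end' is in
the test.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Set by the SIGUSR1 handler; plays the role of 'end' in the test. */
static volatile sig_atomic_t end;

static void on_sigusr1(int sig)
{
	(void)sig;
	end = 1;
}

enum poll_result { POLL_RECOVERED, POLL_TIMEOUT, POLL_STOPPED };

/*
 * Poll read_value() every step_ms for at most max_ms.
 * read_value is a hypothetical stand-in for SAFE_READ_MEMINFO("MemFree:").
 */
static enum poll_result poll_with_stop(unsigned long (*read_value)(void),
				       unsigned long target,
				       int max_ms, int step_ms)
{
	int elapsed;

	assert(step_ms > 0);

	for (elapsed = step_ms; elapsed <= max_ms; elapsed += step_ms) {
		/* Checked before every sleep so a signal ends the wait promptly. */
		if (end)
			return POLL_STOPPED;

		usleep((useconds_t)step_ms * 1000);

		if (read_value() >= target)
			return POLL_RECOVERED;
	}

	return POLL_TIMEOUT;
}

/* Hypothetical readers used only to exercise the loop. */
static unsigned long read_low(void)
{
	return 50;
}

static unsigned long read_high(void)
{
	return 200;
}
```

With the handler installed via signal(SIGUSR1, on_sigusr1), a call such as
poll_with_stop(read_low, 100, 2000, 10) returns POLL_STOPPED on its first
iteration once the signal has been delivered, instead of running through all
200 sleeps.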

> +if (memfree < threshold) {
> +tst_res(TFAIL, "MemFree %lu < 90%% of min_free_kbytes %lu (MinSeen: %lu%%) after 2s",
> +memfree, tune, (min_memfree * 100 / tune));
> +} else if (memfree < tune) {
> +tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%) after ~2s",
> +memfree, (min_memfree * 100 / tune));
> +} else {
> +tst_res(TINFO, "MemFree recovered to %lu (min %lu%%) after %d ms",
> +memfree, (min_memfree * 100 / tune), i);
> +}

Minor: the TINFO messages mix kB values and percentage values using the
same format specifier %lu, which can be confusing in the output. The
TFAIL line prints MemFree and tune as raw kB values but labels the third
argument "(MinSeen: N%)" — the intent is clear but adding "kB" units to
the first two values would be more consistent:

  "MemFree %lu kB < 90%% of min_free_kbytes %lu kB (MinSeen: %lu%%) after 2s"

Similarly for the TINFO lines.
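
As a rough sketch of what unit-labelled output could look like, the helper
below reproduces the three-way decision from the patch (below 90%, within
the 10% tolerance, recovered) and formats each case with "kB" on the raw
values. classify(), format_report() and the verdict enum are hypothetical
names for this example; snprintf() stands in for tst_res(), and the exact
wording of the strings is an assumption, not a proposal for the final text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum verdict { V_FAIL, V_TOLERATED, V_RECOVERED };

/* Same arithmetic as the patch: threshold is 90% of min_free_kbytes. */
static enum verdict classify(unsigned long memfree, unsigned long tune)
{
	unsigned long threshold = tune * 9 / 10;

	if (memfree < threshold)
		return V_FAIL;
	if (memfree < tune)
		return V_TOLERATED;
	return V_RECOVERED;
}

/*
 * Write a report line into buf. Raw values carry "kB"; the lowest
 * observed value is shown as a percentage of min_free_kbytes.
 */
static int format_report(char *buf, size_t len, unsigned long memfree,
			 unsigned long tune, unsigned long min_memfree)
{
	unsigned long min_pct;
	enum verdict v;

	assert(tune > 0);
	min_pct = min_memfree * 100 / tune;
	v = classify(memfree, tune);

	if (v == V_FAIL)
		return snprintf(buf, len,
			"MemFree %lu kB < 90%% of min_free_kbytes %lu kB (MinSeen: %lu%%)",
			memfree, tune, min_pct);
	if (v == V_TOLERATED)
		return snprintf(buf, len,
			"MemFree %lu kB stayed within 10%% tolerance (min %lu%%)",
			memfree, min_pct);
	return snprintf(buf, len,
		"MemFree recovered to %lu kB (min %lu%%)",
		memfree, min_pct);
}
```

For example, with min_free_kbytes at 1000 kB the threshold is 900 kB, so a
final MemFree of 850 kB is a failure while 950 kB falls inside the tolerance.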

---

Verdict: Needs revision

The missing 'end' check in the inner polling loop is a regression: it
removes the prompt termination behaviour of the original sleep(2) and
can delay the test exit by up to 2 seconds after SIGUSR1 is received.
Please fix before merging. The commit message first-person wording and
the missing kB unit labels are minor and can be addressed in the same
respin.

LTP AI Reviewer


* Re: [LTP] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-06-02  4:02     ` [LTP] " linuxtestproject.agent
@ 2026-06-02  7:46       ` Wei Gao via ltp
  2026-06-02 16:07         ` Andrea Cervesato via ltp
  0 siblings, 1 reply; 12+ messages in thread
From: Wei Gao via ltp @ 2026-06-02  7:46 UTC (permalink / raw)
  To: linuxtestproject.agent; +Cc: ltp

On Tue, Jun 02, 2026 at 04:02:55AM +0000, linuxtestproject.agent@gmail.com wrote:
> Hi Wei,
> 
> Thanks for the patch. A few comments below.
> 
> > min_free_kbytes: Handle transient memory drops in check_monitor
> >
> > High memory pressure can cause MemFree to temporarily drop below the
> > min_free_kbytes threshold before the kernel reclaimer can catch up.
> > This results in intermittent test failures, observed on openQA aarch64
> > virtual machines.
> >
> > Implement a 2-second grace period with high-accuracy 10ms fixed polling
> > in check_monitor() to allow the kernel time to reclaim memory.
> >
> > Introduce a 10% tolerance (90% threshold) for the MemFree check. My
> > measurements showed that under extreme pressure, MemFree can dip as low
> > as ~50% to ~70% of the target. While it typically recovers above 90%
> > within one second, hitting the exact 100% watermark sometimes can take
> > significantly longer. This tolerance prevents false positives during the
> > slow recovery tail while still ensuring memory is maintained near the
> > required level.
> 
> Please rewrite in impersonal form, e.g.:
>   "Measurements under extreme pressure show MemFree can dip as low as
>    ~50% to ~70% of the target."
> 
> Commit messages become a permanent part of the project history and
> should avoid first-person language.
The measurement results are specific to my local environment. Also, is there
any LTP rule that forbids the use of "My"?
> 
> > -unsigned long tune;
> > -unsigned long memfree;
> > +unsigned long tune, threshold;
> > +unsigned long memfree, min_memfree;
> > +int i;
> >
> >  while (!end) {
> >  memfree = SAFE_READ_MEMINFO("MemFree:");
> >  tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
> > +/*
> > + * Allow 10% tolerance to account for transient states.
> > + */
> > +threshold = tune * 9 / 10;
> >
> >  if (memfree < tune) {
> > -tst_res(TINFO, "MemFree is %lu kB, "
> > - "min_free_kbytes is %lu kB", memfree, tune);
> > -tst_res(TFAIL, "MemFree < min_free_kbytes");
> > +min_memfree = memfree;
> > +/*
> > + * Give it some time to reclaim. The kernel should keep
> > + * MemFree above min_free_kbytes, but transient drops
> > + * are possible under high pressure.
> > + * Check every 10ms for up to 2 seconds for high accuracy.
> > + */
> > +for (i = 10; i <= 2000; i += 10) {
> > +usleep(10000);
> > +memfree = SAFE_READ_MEMINFO("MemFree:");
> > +if (memfree < min_memfree)
> > +min_memfree = memfree;
> > +
> > +if (memfree >= tune)
> > +break;
> > +}
> 
> The inner polling loop does not check the 'end' flag, which is set to 1
> by the SIGUSR1 handler when the parent finishes. In the original code,
> the single sleep(2) at the bottom of the outer loop was interrupted by
> the signal (sleep() returns early; usleep() fails with EINTR), so the
> outer while (!end) check fired promptly. With the new grace period loop,
> the signal can fire during any of the 200 inner usleep(10000) calls; the
> sleep returns early but the loop body continues, and the outer
> while (!end) check cannot be reached until MemFree recovers or the full
> ~2 seconds elapse.
> 
> Please add an early exit:
> 
>   for (i = 10; i <= 2000; i += 10) {
>       if (end)
>           return;
>       usleep(10000);
>       memfree = SAFE_READ_MEMINFO("MemFree:");
>       if (memfree < min_memfree)
>           min_memfree = memfree;
>       if (memfree >= tune)
>           break;
>   }
> 
> Alternatively a `break` is sufficient if the outer loop check is
> reached quickly, but `return` is cleaner given the function only has
> one exit point anyway.
Completeness is more important than saving 2 seconds of test time, especially 
for the final sub-test. If we jump out early, we might miss the final result of the test.
> 
> > +if (memfree < threshold) {
> > +tst_res(TFAIL, "MemFree %lu < 90%% of min_free_kbytes %lu (MinSeen: %lu%%) after 2s",
> > +memfree, tune, (min_memfree * 100 / tune));
> > +} else if (memfree < tune) {
> > +tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%) after ~2s",
> > +memfree, (min_memfree * 100 / tune));
> > +} else {
> > +tst_res(TINFO, "MemFree recovered to %lu (min %lu%%) after %d ms",
> > +memfree, (min_memfree * 100 / tune), i);
> > +}
> 
> Minor: the TINFO messages mix kB values and percentage values using the
> same format specifier %lu, which can be confusing in the output. The
> TFAIL line prints MemFree and tune as raw kB values but labels the third
> argument "(MinSeen: N%)" — the intent is clear but adding "kB" units to
> the first two values would be more consistent:
> 
>   "MemFree %lu kB < 90%% of min_free_kbytes %lu kB (MinSeen: %lu%%) after 2s"
> 
> Similarly for the TINFO lines.
The tunable is named min_free_kbytes, so the unit is already clear from the test context.

Unless there's a logical error or an LTP rule violation, I won't be sending another patch.
> 
> ---
> 
> Verdict: Needs revision
> 
> The missing 'end' check in the inner polling loop is a regression: it
> removes the prompt termination behaviour of the original sleep(2) and
> can delay the test exit by up to 2 seconds after SIGUSR1 is received.
> Please fix before merging. The commit message first-person wording and
> the missing kB unit labels are minor and can be addressed in the same
> respin.
> 
> LTP AI Reviewer


* Re: [LTP] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-06-02  7:46       ` Wei Gao via ltp
@ 2026-06-02 16:07         ` Andrea Cervesato via ltp
  2026-06-03  3:07           ` Wei Gao via ltp
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Cervesato via ltp @ 2026-06-02 16:07 UTC (permalink / raw)
  To: Wei Gao via ltp; +Cc: ltp, linuxtestproject.agent

Hi Wei,

> > > Introduce a 10% tolerance (90% threshold) for the MemFree check. My
> > > measurements showed that under extreme pressure, MemFree can dip as low
> > > as ~50% to ~70% of the target. While it typically recovers above 90%
> > > within one second, hitting the exact 100% watermark sometimes can take
> > > significantly longer. This tolerance prevents false positives during the
> > > slow recovery tail while still ensuring memory is maintained near the
> > > required level.
> > 
> > Please rewrite in impersonal form, e.g.:
> >   "Measurements under extreme pressure show MemFree can dip as low as
> >    ~50% to ~70% of the target."
> > 
> > Commit messages become a permanent part of the project history and
> > should avoid first-person language.
> The measurement results are specific to my local environment. Also, is there
> any LTP rule that forbids the use of "My"?

Yeah, this is quite common in commit messages, but it's not a strict
rule, so don't worry about it.

> > 
> > > -unsigned long tune;
> > > -unsigned long memfree;
> > > +unsigned long tune, threshold;
> > > +unsigned long memfree, min_memfree;
> > > +int i;
> > >
> > >  while (!end) {
> > >  memfree = SAFE_READ_MEMINFO("MemFree:");
> > >  tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
> > > +/*
> > > + * Allow 10% tolerance to account for transient states.
> > > + */
> > > +threshold = tune * 9 / 10;
> > >
> > >  if (memfree < tune) {
> > > -tst_res(TINFO, "MemFree is %lu kB, "
> > > - "min_free_kbytes is %lu kB", memfree, tune);
> > > -tst_res(TFAIL, "MemFree < min_free_kbytes");
> > > +min_memfree = memfree;
> > > +/*
> > > + * Give it some time to reclaim. The kernel should keep
> > > + * MemFree above min_free_kbytes, but transient drops
> > > + * are possible under high pressure.
> > > + * Check every 10ms for up to 2 seconds for high accuracy.
> > > + */
> > > +for (i = 10; i <= 2000; i += 10) {
> > > +usleep(10000);
> > > +memfree = SAFE_READ_MEMINFO("MemFree:");
> > > +if (memfree < min_memfree)
> > > +min_memfree = memfree;
> > > +
> > > +if (memfree >= tune)
> > > +break;
> > > +}
> > 
> > The inner polling loop does not check the 'end' flag, which is set to 1
> > by the SIGUSR1 handler when the parent finishes. In the original code,
> > the single sleep(2) at the bottom of the outer loop was interrupted by
> > the signal (sleep() returns early; usleep() fails with EINTR), so the
> > outer while (!end) check fired promptly. With the new grace period loop,
> > the signal can fire during any of the 200 inner usleep(10000) calls; the
> > sleep returns early but the loop body continues, and the outer
> > while (!end) check cannot be reached until MemFree recovers or the full
> > ~2 seconds elapse.
> > 
> > Please add an early exit:
> > 
> >   for (i = 10; i <= 2000; i += 10) {
> >       if (end)
> >           return;
> >       usleep(10000);
> >       memfree = SAFE_READ_MEMINFO("MemFree:");
> >       if (memfree < min_memfree)
> >           min_memfree = memfree;
> >       if (memfree >= tune)
> >           break;
> >   }
> > 
> > Alternatively a `break` is sufficient if the outer loop check is
> > reached quickly, but `return` is cleaner given the function only has
> > one exit point anyway.
> Completeness is more important than saving 2 seconds of test time, especially 
> for the final sub-test. If we jump out early, we might miss the final result of the test.

I'm not sure about this one. In general, once we have a result we
should return as fast as possible, so we can save time while running
other tests. 2 seconds might not seem like much, but when we sum up
various tests with the same logic, we might end up with a big delay.

> > 
> > > +if (memfree < threshold) {
> > > +tst_res(TFAIL, "MemFree %lu < 90%% of min_free_kbytes %lu (MinSeen: %lu%%) after 2s",
> > > +memfree, tune, (min_memfree * 100 / tune));
> > > +} else if (memfree < tune) {
> > > +tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%) after ~2s",
> > > +memfree, (min_memfree * 100 / tune));
> > > +} else {
> > > +tst_res(TINFO, "MemFree recovered to %lu (min %lu%%) after %d ms",
> > > +memfree, (min_memfree * 100 / tune), i);
> > > +}
> > 
> > Minor: the TINFO messages mix kB values and percentage values using the
> > same format specifier %lu, which can be confusing in the output. The
> > TFAIL line prints MemFree and tune as raw kB values but labels the third
> > argument "(MinSeen: N%)" — the intent is clear but adding "kB" units to
> > the first two values would be more consistent:
> > 
> >   "MemFree %lu kB < 90%% of min_free_kbytes %lu kB (MinSeen: %lu%%) after 2s"
> > 
> > Similarly for the TINFO lines.
> The tunable is named min_free_kbytes, so the unit is already clear from the test context.
> 
> Unless there's a logical error or an LTP rule violation, I won't be sending another patch.

I think the agent is complaining that the output message is not easy
to debug: we are printing values without units. I would just add the
`kB` unit to the MemFree value in this case. The rest of the message
looks ok.

Regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com


* Re: [LTP] min_free_kbytes: Handle transient memory drops in check_monitor
  2026-06-02 16:07         ` Andrea Cervesato via ltp
@ 2026-06-03  3:07           ` Wei Gao via ltp
  0 siblings, 0 replies; 12+ messages in thread
From: Wei Gao via ltp @ 2026-06-03  3:07 UTC (permalink / raw)
  To: Andrea Cervesato; +Cc: Wei Gao via ltp, linuxtestproject.agent

On Tue, Jun 02, 2026 at 04:07:02PM +0000, Andrea Cervesato via ltp wrote:
> Hi Wei,
> 
> > > > Introduce a 10% tolerance (90% threshold) for the MemFree check. My
> > > > measurements showed that under extreme pressure, MemFree can dip as low
> > > > as ~50% to ~70% of the target. While it typically recovers above 90%
> > > > within one second, hitting the exact 100% watermark sometimes can take
> > > > significantly longer. This tolerance prevents false positives during the
> > > > slow recovery tail while still ensuring memory is maintained near the
> > > > required level.
> > > 
> > > Please rewrite in impersonal form, e.g.:
> > >   "Measurements under extreme pressure show MemFree can dip as low as
> > >    ~50% to ~70% of the target."
> > > 
> > > Commit messages become a permanent part of the project history and
> > > should avoid first-person language.
> > The measurement results are specific to my local environment. Also, is there
> > any LTP rule that forbids the use of "My"?
> 
> Yeah, this is quite common in commit messages, but it's not a strict
> rule, so don't worry about it.
> 
> > > 
> > > > -unsigned long tune;
> > > > -unsigned long memfree;
> > > > +unsigned long tune, threshold;
> > > > +unsigned long memfree, min_memfree;
> > > > +int i;
> > > >
> > > >  while (!end) {
> > > >  memfree = SAFE_READ_MEMINFO("MemFree:");
> > > >  tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
> > > > +/*
> > > > + * Allow 10% tolerance to account for transient states.
> > > > + */
> > > > +threshold = tune * 9 / 10;
> > > >
> > > >  if (memfree < tune) {
> > > > -tst_res(TINFO, "MemFree is %lu kB, "
> > > > - "min_free_kbytes is %lu kB", memfree, tune);
> > > > -tst_res(TFAIL, "MemFree < min_free_kbytes");
> > > > +min_memfree = memfree;
> > > > +/*
> > > > + * Give it some time to reclaim. The kernel should keep
> > > > + * MemFree above min_free_kbytes, but transient drops
> > > > + * are possible under high pressure.
> > > > + * Check every 10ms for up to 2 seconds for high accuracy.
> > > > + */
> > > > +for (i = 10; i <= 2000; i += 10) {
> > > > +usleep(10000);
> > > > +memfree = SAFE_READ_MEMINFO("MemFree:");
> > > > +if (memfree < min_memfree)
> > > > +min_memfree = memfree;
> > > > +
> > > > +if (memfree >= tune)
> > > > +break;
> > > > +}
> > > 
> > > The inner polling loop does not check the 'end' flag, which is set to 1
> > > by the SIGUSR1 handler when the parent finishes. In the original code,
> > > the single sleep(2) at the bottom of the outer loop was interrupted by
> > > the signal (sleep() returns early; usleep() fails with EINTR), so the
> > > outer while (!end) check fired promptly. With the new grace period loop,
> > > the signal can fire during any of the 200 inner usleep(10000) calls; the
> > > sleep returns early but the loop body continues, and the outer
> > > while (!end) check cannot be reached until MemFree recovers or the full
> > > ~2 seconds elapse.
> > > 
> > > Please add an early exit:
> > > 
> > >   for (i = 10; i <= 2000; i += 10) {
> > >       if (end)
> > >           return;
> > >       usleep(10000);
> > >       memfree = SAFE_READ_MEMINFO("MemFree:");
> > >       if (memfree < min_memfree)
> > >           min_memfree = memfree;
> > >       if (memfree >= tune)
> > >           break;
> > >   }
> > > 
> > > Alternatively a `break` is sufficient if the outer loop check is
> > > reached quickly, but `return` is cleaner given the function only has
> > > one exit point anyway.
> > Completeness is more important than saving 2 seconds of test time, especially 
> > for the final sub-test. If we jump out early, we might miss the final result of the test.
> 
> I'm not sure about this one. In general, once we have a result we
> should return as fast as possible, so we can save time while running
> other tests. 2 seconds might not seem like much, but when we sum up
> various tests with the same logic, we might end up with a big delay.

Since end is only set to 1 by SIGUSR1 at the very end of the entire program,
adding an end check to the inner loop would only save time during the final
teardown, and the maximum possible saving is just 2 seconds. It also introduces
a real risk: if a memory drop is active at the very end of the test and we exit
early because end == 1, we miss the final result (completeness).

> 
> > > 
> > > > +if (memfree < threshold) {
> > > > +tst_res(TFAIL, "MemFree %lu < 90%% of min_free_kbytes %lu (MinSeen: %lu%%) after 2s",
> > > > +memfree, tune, (min_memfree * 100 / tune));
> > > > +} else if (memfree < tune) {
> > > > +tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%) after ~2s",
> > > > +memfree, (min_memfree * 100 / tune));
> > > > +} else {
> > > > +tst_res(TINFO, "MemFree recovered to %lu (min %lu%%) after %d ms",
> > > > +memfree, (min_memfree * 100 / tune), i);
> > > > +}
> > > 
> > > Minor: the TINFO messages mix kB values and percentage values using the
> > > same format specifier %lu, which can be confusing in the output. The
> > > TFAIL line prints MemFree and tune as raw kB values but labels the third
> > > argument "(MinSeen: N%)" — the intent is clear but adding "kB" units to
> > > the first two values would be more consistent:
> > > 
> > >   "MemFree %lu kB < 90%% of min_free_kbytes %lu kB (MinSeen: %lu%%) after 2s"
> > > 
> > > Similarly for the TINFO lines.
> > The tunable is named min_free_kbytes, so the unit is already clear from the test context.
> > 
> > Unless there's a logical error or an LTP rule violation, I won't be sending another patch.
> 
> I think the agent is complaining that the output message is not easy
> to debug: we are printing values without units. I would just add the
> `kB` unit to the MemFree value in this case. The rest of the message
> looks ok.
> 
> Regards,
> --
> Andrea Cervesato
> SUSE QE Automation Engineer Linux
> andrea.cervesato@suse.com
> 

Thread overview: 12+ messages
2026-05-27  5:08 [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim Wei Gao via ltp
2026-05-27  5:31 ` [LTP] " linuxtestproject.agent
2026-05-27 15:40 ` [LTP] [PATCH v2] " Cyril Hrubis
2026-05-29 16:07   ` Petr Vorel
2026-05-31 13:51     ` Wei Gao via ltp
2026-05-31 13:40 ` [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor Wei Gao via ltp
2026-06-01  6:42   ` [LTP] " linuxtestproject.agent
2026-06-02  1:00   ` [LTP] [PATCH v4] " Wei Gao via ltp
2026-06-02  4:02     ` [LTP] " linuxtestproject.agent
2026-06-02  7:46       ` Wei Gao via ltp
2026-06-02 16:07         ` Andrea Cervesato via ltp
2026-06-03  3:07           ` Wei Gao via ltp
