All of lore.kernel.org
 help / color / mirror / Atom feed
From: linuxtestproject.agent@gmail.com
To: Wei Gao <wegao@suse.com>
Cc: ltp@lists.linux.it
Subject: Re: [LTP] min_free_kbytes: Handle transient memory drops in check_monitor
Date: Mon,  1 Jun 2026 06:42:20 +0000	[thread overview]
Message-ID: <20260601064220.4028-1-linuxtestproject.agent@gmail.com> (raw)
In-Reply-To: <20260531134250.25279-1-wegao@suse.com>

Hi Wei,

Thank you for the patch. A few issues to address before this can be merged.

On Sun, May 31, 2026 at 13:40:47 +0000, Wei Gao <wegao@suse.com> wrote:

> min_free_kbytes: Handle transient memory drops in check_monitor
>
> High memory pressure can cause MemFree to temporarily drop below the
> min_free_kbytes threshold before the kernel reclaimer can catch up.
> This results in intermittent test failures, particularly observed on
> openQA aarch64 machines where swap is exhausted.
>
> Implement a 2-second grace period with high-accuracy 10ms fixed polling
> in check_monitor() to allow the kernel time to reclaim memory.
>
> Introduce a 10% tolerance (90% threshold) for the MemFree check. Our
> measurements showed that under extreme pressure, MemFree can take a
> long time to recover to the exact 100% MIN_FREE_KBYTES, or may stay slightly
> below it. This tolerance prevents false positives and avoids excessive
> wait times while still ensuring memory is maintained near the required level.

The 10% tolerance weakens what the test actually verifies. The commit says
"Our measurements showed..." but no data or bounds are provided. What is the
maximum observed shortfall? If the worst case seen was, say, 3%, a 10%
tolerance is far too loose and may mask real regressions. Please quantify the
measurement and tighten the tolerance to match observed reality, or at least
document the worst-case value that motivated the 10% figure.

> Enhanced diagnostics are added to report MemAvailable and the minimum
> memory level seen during the pressure period to aid in future
> calibration.

"During the pressure period" is inaccurate. `min_memfree` is correctly
tracked inside the retry loop, but `memavail` is read *after* the loop
completes:

>+		memavail = SAFE_READ_MEMINFO("MemAvailable:");
>+
>+		if (memfree < threshold) {
>+			tst_res(TINFO, "tune=%lu, threshold=%lu", tune, threshold);
>+			tst_res(TINFO, "MemFree=%lu, MemAvailable=%lu, MinSeen=%lu (%lu%%)",
>+				memfree, memavail, min_memfree, (min_memfree * 100 / tune));

`memavail` is a snapshot taken at the *end* of the grace period, not during
it. By the time the retry loop finishes the kernel reclaimer may have already
recovered substantially. Either move SAFE_READ_MEMINFO("MemAvailable:") inside
the loop to track it properly (as is done for MemFree), or adjust the commit
message to say "after the grace period" rather than "during the pressure
period".

> @@ -177,20 +177,55 @@ static int eatup_mem(unsigned long overcommit_policy)
>  static void check_monitor(void)
>  {
> -	unsigned long tune;
> -	unsigned long memfree;
> +	unsigned long tune, threshold;
> +	unsigned long memfree, memavail, min_memfree;
> +	int i, retry_count;

`retry_count` is redundant. Within the loop, `retry_count` is always equal
to `i / 10` (e.g. when i=10 retry_count=1, when i=50 retry_count=5). The
else-branch TINFO message reports both, conveying identical information in
two different units. Drop `retry_count` and derive the count from `i / 10`
directly, or just remove it from the diagnostic entirely since the elapsed
time in ms (`i`) is the more useful figure.

> -	sleep(2);
> +	usleep(100000);
>  }

This changes the normal (non-pressure) monitoring interval from 2 seconds to
100 ms, making the monitor sample `/proc/meminfo` and
`/proc/sys/vm/min_free_kbytes` 20x more often during the entire test run.
This is a significant behavioral change that is completely absent from the
commit message. Please document it — including the rationale for choosing
100 ms — so reviewers and bisectors can understand the intent.

---
Verdict: Needs revision

Issues:
  1. [NEEDS REVISION] Undocumented behavioral change: normal poll interval
     2s → 100 ms is not mentioned in the commit message.
  2. [NEEDS REVISION] Commit message inaccuracy: MemAvailable is read after
     the retry loop, not "during the pressure period".
  3. [NEEDS DISCUSSION] 10% tolerance lacks quantitative justification; the
     "Our measurements showed..." claim needs backing data or tighter bounds.
  4. [NEEDS REVISION] `retry_count` is always equal to `i / 10`; the
     variable is redundant and should be removed.

LTP AI Reviewer

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

  reply	other threads:[~2026-06-01  6:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27  5:08 [LTP] [PATCH v2] mem/min_free_kbytes: Add grace period for memory reclaim Wei Gao via ltp
2026-05-27  5:31 ` [LTP] " linuxtestproject.agent
2026-05-27 15:40 ` [LTP] [PATCH v2] " Cyril Hrubis
2026-05-29 16:07   ` Petr Vorel
2026-05-31 13:51     ` Wei Gao via ltp
2026-05-31 13:40 ` [LTP] [PATCH v3] min_free_kbytes: Handle transient memory drops in check_monitor Wei Gao via ltp
2026-06-01  6:42   ` linuxtestproject.agent [this message]
2026-06-02  1:00   ` [LTP] [PATCH v4] " Wei Gao via ltp
2026-06-02  4:02     ` [LTP] " linuxtestproject.agent
2026-06-02  7:46       ` Wei Gao via ltp
2026-06-02 16:07         ` Andrea Cervesato via ltp
2026-06-03  3:07           ` Wei Gao via ltp
  -- strict thread matches above, loose matches on Subject: below --
2026-05-26  8:36 [LTP] [PATCH v1] " Wei Gao via ltp
2026-05-26 15:29 ` [LTP] " linuxtestproject.agent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260601064220.4028-1-linuxtestproject.agent@gmail.com \
    --to=linuxtestproject.agent@gmail.com \
    --cc=ltp@lists.linux.it \
    --cc=wegao@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.