* [PATCH v8 0/2] memcg: Fix test_memcg_min/low test failures
From: Waiman Long @ 2025-05-02 1:04 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
Shuah Khan
Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long
v8:
- Ignore the low event count of child 2 with memory_recursiveprot on
in patch 1 as originally suggested by Michal.
v7:
- Skip the vmscan change for now, as the mem_cgroup_usage() check is
currently redundant.
v6:
- The test_memcg_low failure is indeed due to the memory_recursiveprot
mount option, which is enabled by default in systemd's cgroup v2 setup.
So adopt Michal's suggestion to adjust the low event checking
according to whether memory_recursiveprot is enabled or not.
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test (with memory_recursiveprot enabled) and sporadically fails
its test_memcg_min sub-test. This patchset fixes both failures by
adjusting the checks in the test_memcontrol selftest.
Waiman Long (2):
selftests: memcg: Allow low event with no memory.low and
memory_recursiveprot on
selftests: memcg: Increase error tolerance of child memory.current
check in test_memcg_protection()
.../selftests/cgroup/test_memcontrol.c | 22 ++++++++++++++-----
1 file changed, 16 insertions(+), 6 deletions(-)
--
2.49.0
* [PATCH v8 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
From: Waiman Long @ 2025-05-02 1:04 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
Shuah Khan
Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test because its 3rd test child cgroup, which has a memory.low
of 0, has a nonzero low event count. This happens when the
memory_recursiveprot mount option is enabled, which is the default
setting used by systemd to mount the cgroup2 filesystem.
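As an illustration of what "memory_recursiveprot is enabled" means in
practice, here is a minimal sketch that checks a comma-separated mount
option string (the 4th field of a cgroup2 line in /proc/mounts) for the
option as a whole token. The function name mount_opts_contain() is
hypothetical and is not the helper the selftest uses to set
has_recursiveprot; reading /proc/mounts itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the comma-separated option list
 * `opts` contains `opt` as a complete token, so that a longer option
 * sharing the same prefix does not count as a match.
 */
static bool mount_opts_contain(const char *opts, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = opts;

	if (len == 0)
		return false;

	while ((p = strstr(p, opt)) != NULL) {
		bool starts_token = (p == opts) || (p[-1] == ',');
		bool ends_token = (p[len] == '\0') || (p[len] == ',');

		if (starts_token && ends_token)
			return true;
		p++;	/* keep searching after this partial match */
	}
	return false;
}
```

For example, an option string such as "rw,nsdelegate,memory_recursiveprot"
would match, while "rw,nsdelegate" would not.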
This issue was originally fixed by commit cdc69458a5f3 ("cgroup:
account for memory_recursiveprot in test_memcg_low()"). It was later
reverted by commit 1d09069f5313 ("selftests: memcg: expect no low events
in unprotected sibling") expecting the memory reclaim code would be
fixed. However, it turns out the unprotected cgroup may still have some
residual effective memory.low protection depending on the memory.low
settings in its parent and its siblings. As a result, low events may
still be triggered.
One way to fix the test failure is to revert the revert commit. However,
Michal suggested that it might be better to ignore the low event count
with memory_recursiveprot enabled, as low events may or may not happen
depending on the actual test configuration.
Modify test_memcontrol.c to ignore low events in the 3rd child cgroup
with memory_recursiveprot on.
The 4th child cgroup has no memory usage and so has an effective
low of 0. It has no low event count because the mem_cgroup_below_low()
check in shrink_node_memcgs() is skipped as mem_cgroup_below_min()
returns true. If we ever change mem_cgroup_below_min() in such a way
that it no longer skips the no usage case, we will have to add code to
explicitly skip it.
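The per-child expectation described above can be sketched as a small
standalone function. The two index variables and the comparisons mirror
the loop in the diff of this patch; the enum and the function name
check_low_events() are made up for illustration, and the OOM check that
precedes this logic in the real test is omitted.

```c
#include <stdbool.h>

/* Hypothetical result type, used only in this sketch. */
enum low_check { LOW_CHECK_PASS, LOW_CHECK_FAIL, LOW_CHECK_IGNORED };

/*
 * i   - child index (0 and 1 have memory.low set, 2 and 3 do not)
 * low - the "low" count read from the child's memory.events
 */
static enum low_check check_low_events(int i, long low, bool has_recursiveprot)
{
	int ignore_low_events_index = has_recursiveprot ? 2 : -1;
	int no_low_events_index = 1;

	/* Child 2 may or may not see low events with recursive protection. */
	if (i == ignore_low_events_index)
		return LOW_CHECK_IGNORED;
	/* Protected children (0 and 1) must have seen low events. */
	if (i <= no_low_events_index && low <= 0)
		return LOW_CHECK_FAIL;
	/* Unprotected children must not have seen any. */
	if (i > no_low_events_index && low)
		return LOW_CHECK_FAIL;
	return LOW_CHECK_PASS;
}
```

With memory_recursiveprot off, a nonzero low count in child 2 is still
reported as a failure, which matches the behavior before this patch.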
With this patch applied, the test_memcg_low sub-test passes in most
cases. However, both the test_memcg_low and test_memcg_min sub-tests
may still fail occasionally if the memory.current values fall outside
of the expected ranges.
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
.../testing/selftests/cgroup/test_memcontrol.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..58602c1831f1 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -380,10 +380,11 @@ static bool reclaim_until(const char *memcg, long goal);
*
* Then it checks actual memory usages and expects that:
* A/B memory.current ~= 50M
- * A/B/C memory.current ~= 29M
- * A/B/D memory.current ~= 21M
- * A/B/E memory.current ~= 0
- * A/B/F memory.current = 0
+ * A/B/C memory.current ~= 29M [memory.events:low > 0]
+ * A/B/D memory.current ~= 21M [memory.events:low > 0]
+ * A/B/E memory.current ~= 0 [memory.events:low == 0 if !memory_recursiveprot,
+ * undefined otherwise]
+ * A/B/F memory.current = 0 [memory.events:low == 0]
* (for origin of the numbers, see model in memcg_protection.m.)
*
* After that it tries to allocate more than there is
@@ -525,7 +526,14 @@ static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
+	/*
+	 * Child 2 has memory.low=0, but some low protection may still be
+	 * distributed down from its parent with memory.low=50M if cgroup2
+	 * memory_recursiveprot mount option is enabled. Ignore the low
+	 * event count in this case.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
+		int ignore_low_events_index = has_recursiveprot ? 2 : -1;
 		int no_low_events_index = 1;
 		long low, oom;
@@ -534,6 +542,8 @@ static int test_memcg_protection(const char *root, bool min)
 		if (oom)
 			goto cleanup;
+		if (i == ignore_low_events_index)
+			continue;
 		if (i <= no_low_events_index && low <= 0)
 			goto cleanup;
 		if (i > no_low_events_index && low)
--
2.49.0
* [PATCH v8 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection()
From: Waiman Long @ 2025-05-02 1:04 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
Shuah Khan
Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long
The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests. This function generates a set of parent/child
cgroups like:
  parent:  memory.min/low = 50M
  child 0: memory.min/low = 75M, memory.current = 50M
  child 1: memory.min/low = 25M, memory.current = 50M
  child 2: memory.min/low = 0,   memory.current = 50M
After applying memory pressure, the function expects the following
actual memory usages.
  parent:  memory.current ~= 50M
  child 0: memory.current ~= 29M
  child 1: memory.current ~= 21M
  child 2: memory.current ~= 0
In reality, the actual memory usages can differ quite a bit from the
expected values. The function checks them with the values_close()
helper, using an error tolerance of 10%.
Both the test_memcg_min and test_memcg_low sub-tests can fail
sporadically because the actual memory usage exceeds the 10% error
tolerance. Below is a sample of the usage data from test runs that
failed.
Child   Actual usage   Expected usage   %err
-----   ------------   --------------   ------
  1       16990208        22020096      -12.9%
  1       17252352        22020096      -12.1%
  0       37699584        30408704      +10.7%
  1       14368768        22020096      -21.0%
  1       16871424        22020096      -13.2%
The current 10% error tolerance might have been right at the time
test_memcontrol.c was first introduced in the v4.18 kernel, but memory
reclaim has certainly evolved quite a bit since then, which may result
in a bit more run-to-run variation than previously expected.
Increase the error tolerance to 15% for child 0 and 20% for child 1 to
minimize the chance of this type of failure. The tolerance is bigger
for child 1 because an upswing in child 0 corresponds to a smaller
%err than a similar downswing in child 1 due to the way %err is used
in values_close().
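To make the %err arithmetic concrete, here is a small sketch. It assumes
values_close() compares the absolute difference against err percent of
the sum (a + b), which is consistent with the %err column in the table
above (for example 16990208 vs 22020096 gives -12.9%). The body of
values_close() below is that assumption written out, not a copy checked
against cgroup_util.h, and pct_err() is a hypothetical helper.

```c
#include <stdlib.h>

#define MB(x) ((long)(x) << 20)

/* Assumed form of the helper: err is a percentage of (a + b). */
static int values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/* Signed deviation of actual from expected, in percent of their sum. */
static double pct_err(long actual, long expected)
{
	return 100.0 * (double)(actual - expected) /
	       (double)(actual + expected);
}
```

Under this assumption, a swing of about 7M gives roughly 7/65 = 10.8% for
child 0 going up from 29M, but 7/35 = 20% for child 1 going down from 21M,
since the denominator shrinks as the actual value drops. Note that the
fourth sample row (14368768, -21.0%) would still be rejected with a 20%
tolerance.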
Before this patch, 100 test runs of test_memcontrol produced the
following results:
17 not ok 1 test_memcg_min
22 not ok 2 test_memcg_low
After applying this patch, there were no test failures for test_memcg_min
and test_memcg_low in 100 test runs. However, these tests may still fail
once in a while if the memory usage goes beyond the newly extended range.
Signed-off-by: Waiman Long <longman@redhat.com>
---
tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 58602c1831f1..d6534d7301a2 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -496,10 +496,10 @@ static int test_memcg_protection(const char *root, bool min)
 	for (i = 0; i < ARRAY_SIZE(children); i++)
 		c[i] = cg_read_long(children[i], "memory.current");
-	if (!values_close(c[0], MB(29), 10))
+	if (!values_close(c[0], MB(29), 15))
 		goto cleanup;
-	if (!values_close(c[1], MB(21), 10))
+	if (!values_close(c[1], MB(21), 20))
 		goto cleanup;
 	if (c[3] != 0)
--
2.49.0
* Re: [PATCH v8 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
From: Michal Koutný @ 2025-05-02 9:43 UTC
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
cgroups, linux-mm, linux-kselftest
On Thu, May 01, 2025 at 09:04:42PM -0400, Waiman Long <longman@redhat.com> wrote:
> Modify the test_memcontrol.c to ignore low event in the 3rd child cgroup
> with memory_recursiveprot on.
>
> The 4th child cgroup has no memory usage and so has an effective
> low of 0. It has no low event count because the mem_cgroup_below_low()
> check in shrink_node_memcgs() is skipped as mem_cgroup_below_min()
> returns true. If we ever change mem_cgroup_below_min() in such a way
> that it no longer skips the no usage case, we will have to add code to
> explicitly skip it.
>
> With this patch applied, the test_memcg_low sub-test finishes
> successfully without failure in most cases. Though both test_memcg_low
> and test_memcg_min sub-tests may still fail occasionally if the
> memory.current values fall outside of the expected ranges.
>
> Suggested-by: Michal Koutný <mkoutny@suse.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
(Thank you. Not sure if this can be both with Suggested-by, so either of
them alone is fine by me.)
* Re: [PATCH v8 0/2] memcg: Fix test_memcg_min/low test failures
From: Tejun Heo @ 2025-05-02 18:39 UTC
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
On Thu, May 01, 2025 at 09:04:41PM -0400, Waiman Long wrote:
> v8:
> - Ignore the low event count of child 2 with memory_recursiveprot on
> in patch 1 as originally suggested by Michal.
>
> v7:
> - Skip the vmscan change as the mem_cgroup_usage() check for now as
> it is currently redundant.
>
> v6:
> - The memcg_test_low failure is indeed due to the memory_recursiveprot
> mount option which is enabled by default in systemd cgroup v2 setting.
> So adopt Michal's suggestion to adjust the low event checking
> according to whether memory_recursiveprot is enabled or not.
>
> The test_memcontrol selftest consistently fails its test_memcg_low
> sub-test (with memory_recursiveprot enabled) and sporadically fails
> its test_memcg_min sub-test. This patchset fixes the test_memcg_min
> and test_memcg_low failures by adjusting the test_memcontrol selftest
> to fix these test failures.
>
> Waiman Long (2):
> selftests: memcg: Allow low event with no memory.low and
> memory_recursiveprot on
> selftests: memcg: Increase error tolerance of child memory.current
> check in test_memcg_protection()
Acked-by: Tejun Heo <tj@kernel.org>
Probably best to go through -mm? If cgroup would be better, please let me
know.
Thanks.
--
tejun