* [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
@ 2025-04-04 1:24 Waiman Long
2025-04-04 1:24 ` [PATCH v2 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection() Waiman Long
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Waiman Long @ 2025-04-04 1:24 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
Shuah Khan
Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test because two of its test child cgroups, which have a memory.low
of 0 or an effective memory.low of 0, still have low events generated
for them, since mem_cgroup_below_low() uses the ">=" operator when
comparing to elow.
The two failed use cases are as follows:
1) memory.low is set to 0, but low events can still be triggered and
so the cgroup may have a non-zero low event count. I doubt users are
looking for that as they didn't set memory.low at all.
2) memory.low is set to a non-zero value but the cgroup has no task in
it so that it has an effective low value of 0. Again it may have a
non-zero low event count if memory reclaim happens. This is probably
not a result expected by the users and it is really doubtful that
users will check an empty cgroup with no task in it and expect
some non-zero event counts.
The simple and naive fix of changing the operator to ">", however,
changes the memory reclaim behavior which can lead to other failures
as low events are needed to facilitate memory reclaim. So we can't do
that without some relatively riskier changes in memory reclaim.
Another simpler alternative is to avoid reporting below_low failure
if either memory.low or its effective equivalent is 0 which is done
by this patch specifically for the two failed use cases above.
With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases, though both the
test_memcg_low and test_memcg_min sub-tests may still fail occasionally
if the memory.current values fall outside of the expected ranges.
To be consistent, a similar change is applied to mem_cgroup_below_min()
to avoid the two failed use cases above, with low replaced by min.
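The difference between the old and the new check can be sketched with a
small userspace model. This is only an illustration: struct
memcg_low_model and the two *_model helpers are hypothetical stand-ins
for the kernel's page counter fields, and the mem_cgroup_unprotected()
test is left out.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the page counter fields used here. */
struct memcg_low_model {
	unsigned long low;	/* configured memory.low */
	unsigned long elow;	/* effective low protection */
	unsigned long usage;	/* current usage, in pages */
};

/* Old check: true whenever usage <= elow, including elow == usage == 0. */
static bool below_low_old_model(const struct memcg_low_model *m)
{
	return m->elow >= m->usage;
}

/* Check as changed by this patch: never "below low" if low or elow is 0. */
static bool below_low_new_model(const struct memcg_low_model *m)
{
	if (!m->elow || !m->low)
		return false;

	return m->usage <= m->elow;
}
```

For an empty cgroup with memory.low unset (all fields 0), the old model
returns true and the new one returns false. For a cgroup using 50 pages
with low = elow = 100, both return true.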
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/memcontrol.h | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..4d4a1f159eaa 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -601,21 +601,31 @@ static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
 static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
 					struct mem_cgroup *memcg)
 {
+	unsigned long elow;
+
 	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.elow) >=
-		page_counter_read(&memcg->memory);
+	elow = READ_ONCE(memcg->memory.elow);
+	if (!elow || !READ_ONCE(memcg->memory.low))
+		return false;
+
+	return page_counter_read(&memcg->memory) <= elow;
 }
 
 static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 					struct mem_cgroup *memcg)
 {
+	unsigned long emin;
+
 	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.emin) >=
-		page_counter_read(&memcg->memory);
+	emin = READ_ONCE(memcg->memory.emin);
+	if (!emin || !READ_ONCE(memcg->memory.min))
+		return false;
+
+	return page_counter_read(&memcg->memory) <= emin;
 }
 
int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
--
2.48.1
* [PATCH v2 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection()
2025-04-04 1:24 [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0 Waiman Long
@ 2025-04-04 1:24 ` Waiman Long
2025-04-04 17:12 ` [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0 Tejun Heo
2025-04-04 18:26 ` Michal Koutný
2 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-04-04 1:24 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
Shuah Khan
Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long
The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests. This function generates a set of parent/child
cgroups like:
  parent:  memory.min/low = 50M
  child 0: memory.min/low = 75M,  memory.current = 50M
  child 1: memory.min/low = 25M,  memory.current = 50M
  child 2: memory.min/low = 0,    memory.current = 50M
After applying memory pressure, the function expects the following
actual memory usages.
  parent:  memory.current ~= 50M
  child 0: memory.current ~= 29M
  child 1: memory.current ~= 21M
  child 2: memory.current ~= 0
In reality, the actual memory usages can differ quite a bit from the
expected values. The function uses an error tolerance of 10% with the
values_close() helper.
Both the test_memcg_min and test_memcg_low sub-tests can fail
sporadically because the actual memory usage exceeds the 10% error
tolerance. Below is a sample of the usage data from test runs
that failed.
  Child   Actual usage   Expected usage    %err
  -----   ------------   --------------    ----
    1       16990208        22020096      -12.9%
    1       17252352        22020096      -12.1%
    0       37699584        30408704      +10.7%
    1       14368768        22020096      -21.0%
    1       16871424        22020096      -13.2%
The current 10% error tolerance might have been right at the time
test_memcontrol.c was first introduced in the v4.18 kernel, but memory
reclaim has certainly evolved quite a bit since then, which may result
in a bit more run-to-run variation than previously expected.
Increase the error tolerance to 15% for child 0 and 20% for child 1 to
minimize the chance of this type of failure. The tolerance is bigger
for child 1 because an upswing in child 0 corresponds to a smaller
%err than a similar downswing in child 1 due to the way %err is used
in values_close().
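The asymmetry can be checked with a small standalone sketch. It assumes
that values_close() compares the absolute difference against err percent
of the sum of the two values, which is consistent with the %err column
in the table above. The names values_close_model() and pct_err() are
illustrative and are not the selftest's own code.

```c
#include <stdlib.h>

/*
 * Assumed model of the selftest helper: two values are "close" if they
 * differ by no more than err percent of their sum.
 */
static int values_close_model(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/* Signed error in percent of the sum, matching the %err column. */
static double pct_err(long actual, long expected)
{
	return 100.0 * (double)(actual - expected) /
	       (double)(actual + expected);
}
```

Under this model, the same 5M shift from child 1 to child 0 gives about
+7.9% for child 0 (34M versus 29M) but about -13.5% for child 1 (16M
versus 21M), because the sum in the denominator grows for the upswing
and shrinks for the downswing.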
Before this patch, 100 test runs of test_memcontrol produced the
following results:
     19 not ok 3 test_memcg_min
     13 not ok 4 test_memcg_low
After applying this patch, there were no test failures for test_memcg_min
and test_memcg_low in 100 test runs.
Signed-off-by: Waiman Long <longman@redhat.com>
---
tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..f442c0c3f5a7 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -495,10 +495,10 @@ static int test_memcg_protection(const char *root, bool min)
 	for (i = 0; i < ARRAY_SIZE(children); i++)
 		c[i] = cg_read_long(children[i], "memory.current");
 
-	if (!values_close(c[0], MB(29), 10))
+	if (!values_close(c[0], MB(29), 15))
 		goto cleanup;
 
-	if (!values_close(c[1], MB(21), 10))
+	if (!values_close(c[1], MB(21), 20))
 		goto cleanup;
 
 	if (c[3] != 0)
--
2.48.1
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 1:24 [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0 Waiman Long
2025-04-04 1:24 ` [PATCH v2 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection() Waiman Long
@ 2025-04-04 17:12 ` Tejun Heo
2025-04-04 17:25 ` Waiman Long
2025-04-04 18:26 ` Michal Koutný
2 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2025-04-04 17:12 UTC
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
Hello,
On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
...
> The simple and naive fix of changing the operator to ">", however,
> changes the memory reclaim behavior which can lead to other failures
> as low events are needed to facilitate memory reclaim. So we can't do
> that without some relatively riskier changes in memory reclaim.
I'm doubtful using ">" would change reclaim behavior in a meaningful way and
that'd be more straightforward. What do mm people think?
Thanks.
--
tejun
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 17:12 ` [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0 Tejun Heo
@ 2025-04-04 17:25 ` Waiman Long
2025-04-04 18:13 ` Johannes Weiner
0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2025-04-04 17:25 UTC
To: Tejun Heo
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
On 4/4/25 1:12 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
> ...
>> The simple and naive fix of changing the operator to ">", however,
>> changes the memory reclaim behavior which can lead to other failures
>> as low events are needed to facilitate memory reclaim. So we can't do
>> that without some relatively riskier changes in memory reclaim.
> I'm doubtful using ">" would change reclaim behavior in a meaningful way and
> that'd be more straightforward. What do mm people think?
I haven't looked deeply into why that is the case, but
test_memcg_low/min tests had other failures when I made this change.
Cheers,
Longman
>
> Thanks.
>
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 17:25 ` Waiman Long
@ 2025-04-04 18:13 ` Johannes Weiner
2025-04-04 18:55 ` Waiman Long
0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2025-04-04 18:13 UTC
To: Waiman Long
Cc: Tejun Heo, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
On Fri, Apr 04, 2025 at 01:25:33PM -0400, Waiman Long wrote:
>
> On 4/4/25 1:12 PM, Tejun Heo wrote:
> > Hello,
> >
> > On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
> > ...
> >> The simple and naive fix of changing the operator to ">", however,
> >> changes the memory reclaim behavior which can lead to other failures
> >> as low events are needed to facilitate memory reclaim. So we can't do
> >> that without some relatively riskier changes in memory reclaim.
> > I'm doubtful using ">" would change reclaim behavior in a meaningful way and
> > that'd be more straightforward. What do mm people think?
The knob documentation uses "within low" and "above low" to
distinguish whether you are protected or not, so at least from a code
clarity pov, >= makes more sense to me: if your protection is N and
you use exactly N, you're considered protected.
That also means that by definition an empty cgroup is protected. It's
not in excess of its protection. The test result isn't wrong.
The real weirdness is issuing a "low reclaim" event when no reclaim is
going to happen*.
The patch effectively special cases "empty means in excess" to avoid
the event and fall through to reclaim, which then does nothing as a
result of its own scan target calculations. That seems convoluted.
Why not skip empty cgroups before running inapplicable checks?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..260ab238ec22 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
> I haven't looked deeply into why that is the case, but
> test_memcg_low/min tests had other failures when I made this change.
It surprises me as well that it makes any practical difference.
* Waiman points out that the weirdness is seeing low events without
having a low configured. Eh, this isn't really true with recursive
propagation; you may or may not have an elow depending on parental
configuration and sibling behavior.
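As a rough userspace sketch of the ordering suggested above, the
per-cgroup decision can be modeled as follows. Everything here is a
simplified assumption: struct memcg_prot_model, enum reclaim_action and
classify() are hypothetical names, and the low_reclaim flag stands in
for the scan control state that decides whether low protection is
honored or breached on a given pass.

```c
#include <stdbool.h>

/* Hypothetical per-cgroup state; not the kernel's struct mem_cgroup. */
struct memcg_prot_model {
	unsigned long emin;	/* effective min protection */
	unsigned long elow;	/* effective low protection */
	unsigned long usage;	/* current usage */
};

enum reclaim_action {
	SKIP_EMPTY,		/* nothing charged, nothing to reclaim */
	SKIP_MIN_PROTECTED,	/* hard protection, never reclaimed */
	SKIP_LOW_PROTECTED,	/* soft protection honored on this pass */
	RECLAIM_LOW_EVENT,	/* soft protection breached, event counted */
	RECLAIM,		/* usage exceeds all protection */
};

static enum reclaim_action classify(const struct memcg_prot_model *m,
				    bool low_reclaim)
{
	/* Early exit: an empty cgroup never reaches the checks below. */
	if (!m->usage)
		return SKIP_EMPTY;

	if (m->emin >= m->usage)
		return SKIP_MIN_PROTECTED;

	if (m->elow >= m->usage)
		return low_reclaim ? RECLAIM_LOW_EVENT : SKIP_LOW_PROTECTED;

	return RECLAIM;
}
```

With the early exit in place, the ">=" comparisons keep their "using
exactly N means protected" meaning, and an empty cgroup can no longer
reach the branch that counts a low event.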
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 1:24 [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0 Waiman Long
2025-04-04 1:24 ` [PATCH v2 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection() Waiman Long
2025-04-04 17:12 ` [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0 Tejun Heo
@ 2025-04-04 18:26 ` Michal Koutný
2025-04-04 19:01 ` Waiman Long
2 siblings, 1 reply; 10+ messages in thread
From: Michal Koutný @ 2025-04-04 18:26 UTC
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
cgroups, linux-mm, linux-kselftest
Hello Waiman.
On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long <longman@redhat.com> wrote:
> 1) memory.low is set to 0, but low events can still be triggered and
> so the cgroup may have a non-zero low event count. I doubt users are
> looking for that as they didn't set memory.low at all.
I agree with this reasoning, been there [1] but fix ain't easy (also
consensus of whether such an event should count or not and whether
reclaim should happen or not). (See also [2] where I had tried other
approaches that _didn't_ work.)
> 2) memory.low is set to a non-zero value but the cgroup has no task in
> it so that it has an effective low value of 0.
There may be page cache remaining in the cgroup even with no task
present inside it.
> Again it may have a non-zero low event count if memory reclaim
> happens. This is probably not a result expected by the users and it
> is really doubtful that users will check an empty cgroup with no
> task in it and expecting some non-zero event counts.
Well, if memory.current > 0, some reclaim events can be justified and
thus expected (e.g. by me).
> The simple and naive fix of changing the operator to ">", however,
> changes the memory reclaim behavior which can lead to other failures
> as low events are needed to facilitate memory reclaim. So we can't do
> that without some relatively riskier changes in memory reclaim.
>
> Another simpler alternative is to avoid reporting below_low failure
> if either memory.low or its effective equivalent is 0 which is done
> by this patch specifically for the two failed use cases above.
Admittedly, I haven't seen any complaints from real world about these
events except for this test (which was ported from selftests to LTP
too).
> With this patch applied, the test_memcg_low sub-test finishes
> successfully without failure in most cases.
I'd say the simplest solution to make the test pass without figuring out
what semantics of low events should be correct is not to check the
memory.events:low at all with memory_recursiveprot (this is what was
done in the cloned LTP test).
Michal
[1] https://lore.kernel.org/all/20220322182248.29121-1-mkoutny@suse.com/
[2] https://bugzilla.suse.com/show_bug.cgi?id=1196298
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 18:13 ` Johannes Weiner
@ 2025-04-04 18:55 ` Waiman Long
2025-04-04 19:38 ` Johannes Weiner
0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2025-04-04 18:55 UTC
To: Johannes Weiner, Waiman Long
Cc: Tejun Heo, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
On 4/4/25 2:13 PM, Johannes Weiner wrote:
> On Fri, Apr 04, 2025 at 01:25:33PM -0400, Waiman Long wrote:
>> On 4/4/25 1:12 PM, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
>>> ...
>>>> The simple and naive fix of changing the operator to ">", however,
>>>> changes the memory reclaim behavior which can lead to other failures
>>>> as low events are needed to facilitate memory reclaim. So we can't do
>>>> that without some relatively riskier changes in memory reclaim.
>>> I'm doubtful using ">" would change reclaim behavior in a meaningful way and
>>> that'd be more straightforward. What do mm people think?
> The knob documentation uses "within low" and "above low" to
> distinguish whether you are protected or not, so at least from a code
> clarity pov, >= makes more sense to me: if your protection is N and
> you use exactly N, you're considered protected.
>
> That also means that by definition an empty cgroup is protected. It's
> not in excess of its protection. The test result isn't wrong.
>
> The real weirdness is issuing a "low reclaim" event when no reclaim is
> going to happen*.
>
> The patch effectively special cases "empty means in excess" to avoid
> the event and fall through to reclaim, which then does nothing as a
> result of its own scan target calculations. That seems convoluted.
>
> Why not skip empty cgroups before running inapplicable checks?
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b620d74b0f66..260ab238ec22 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -5963,6 +5963,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>
> mem_cgroup_calculate_protection(target_memcg, memcg);
>
> + if (!mem_cgroup_usage(memcg, false))
> + continue;
> +
> if (mem_cgroup_below_min(target_memcg, memcg)) {
> /*
> * Hard protection.
Yes, that should take care of the memcg with no task case.
>
>> I haven't looked deeply into why that is the case, but
>> test_memcg_low/min tests had other failures when I made this change.
> It surprises me as well that it makes any practical difference.
I looked at it again and the failure is in the same expected memory.current
check in test_memcontrol. If I remove the equal sign, I get errors like:
values_close: child 0 = 8339456, 29MB = 30408704
failed with err = 21
not ok 1 test_memcg_min
So the test is expecting memory.current to have around 29MB, but it got
a lot less (~8MB) in this case. Before removing the equality sign, I
usually got about 25 MB and above for child 0. That is a pretty big
change in behavior, so I didn't make it.
>
> * Waiman points out that the weirdness is seeing low events without
> having a low configured. Eh, this isn't really true with recursive
> propagation; you may or may not have an elow depending on parental
> configuration and sibling behavior.
>
Do you mind if we just don't update the low event count if low isn't
set, but leave the rest the same like
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 91721c8862c3..48a8bfa7d337 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -659,21 +659,25 @@ static inline bool mem_cgroup_unprotected(struct
mem_cgro>
static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
struct mem_cgroup *memcg)
{
+ unsigned long elow;
+
if (mem_cgroup_unprotected(target, memcg))
return false;
- return READ_ONCE(memcg->memory.elow) >=
- page_counter_read(&memcg->memory);
+ elow = READ_ONCE(memcg->memory.elow);
+ return elow && (page_counter_read(&memcg->memory) <= elow);
}
static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
struct mem_cgroup *memcg)
{
+ unsigned long emin;
+
if (mem_cgroup_unprotected(target, memcg))
return false;
- return READ_ONCE(memcg->memory.emin) >=
- page_counter_read(&memcg->memory);
+ emin = READ_ONCE(memcg->memory.emin);
+ return emin && (page_counter_read(&memcg->memory) <= emin);
}
void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup
*memcg);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 77d015d5db0c..e8c1838c7962 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4827,7 +4827,8 @@ static int shrink_one(struct lruvec *lruvec,
struct scan_>
if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL)
return MEMCG_LRU_TAIL;
- memcg_memory_event(memcg, MEMCG_LOW);
+ if (memcg->memory.low)
+ memcg_memory_event(memcg, MEMCG_LOW);
}
success = try_to_shrink_lruvec(lruvec, sc);
@@ -5902,6 +5903,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat,
struct s>
mem_cgroup_calculate_protection(target_memcg, memcg);
+ if (!mem_cgroup_usage(memcg, false))
+ continue;
+
if (mem_cgroup_below_min(target_memcg, memcg)) {
/*
* Hard protection.
@@ -5919,7 +5923,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat,
struct s>
sc->memcg_low_skipped = 1;
continue;
}
- memcg_memory_event(memcg, MEMCG_LOW);
+ if (memcg->memory.low)
+ memcg_memory_event(memcg, MEMCG_LOW);
}
reclaimed = sc->nr_reclaimed;
Cheers,
Longman
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 18:26 ` Michal Koutný
@ 2025-04-04 19:01 ` Waiman Long
0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-04-04 19:01 UTC
To: Michal Koutný
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
cgroups, linux-mm, linux-kselftest
On 4/4/25 2:26 PM, Michal Koutný wrote:
> Hello Waiman.
>
> On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long <longman@redhat.com> wrote:
>> 1) memory.low is set to 0, but low events can still be triggered and
>> so the cgroup may have a non-zero low event count. I doubt users are
>> looking for that as they didn't set memory.low at all.
> I agree with this reasoning, been there [1] but fix ain't easy (also
> consensus of whether such an event should count or not and whether
> reclaim should happen or not). (See also [2] where I had tried other
> approaches that _didn't_ work.)
>
>> 2) memory.low is set to a non-zero value but the cgroup has no task in
>> it so that it has an effective low value of 0.
> There may be page cache remaining in the cgroup even with no task
> present inside it.
For the test_memcontrol case, a cgroup is created but no task has
yet been moved into it. So the memory usage is 0. I agree that if a
task has ever lived in the cgroup, the usage will not be 0. In that case
memory reclaim is certainly justified.
>> Again it may have a non-zero low event count if memory reclaim
>> happens. This is probably not a result expected by the users and it
>> is really doubtful that users will check an empty cgroup with no
>> task in it and expecting some non-zero event counts.
> Well, if memory.current > 0, some reclaim events can be justified and
> thus expected (e.g. by me).
>
>> The simple and naive fix of changing the operator to ">", however,
>> changes the memory reclaim behavior which can lead to other failures
>> as low events are needed to facilitate memory reclaim. So we can't do
>> that without some relatively riskier changes in memory reclaim.
>>
>> Another simpler alternative is to avoid reporting below_low failure
>> if either memory.low or its effective equivalent is 0 which is done
>> by this patch specifically for the two failed use cases above.
> Admittedly, I haven't seen any complaints from real world about these
> events except for this test (which was ported from selftests to LTP
> too).
>
>> With this patch applied, the test_memcg_low sub-test finishes
>> successfully without failure in most cases.
> I'd say the simplest solution to make the test pass without figuring out
> what semantics of low events should be correct is not to check the
> memory.events:low at all with memory_recursiveprot (this is what was
> done in the cloned LTP test).
Another alternative is to modify the test to allow non-zero event count
even if low is not set.
Cheers,
Longman
>
> Michal
>
> [1] https://lore.kernel.org/all/20220322182248.29121-1-mkoutny@suse.com/
> [2] https://bugzilla.suse.com/show_bug.cgi?id=1196298
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 18:55 ` Waiman Long
@ 2025-04-04 19:38 ` Johannes Weiner
2025-04-05 18:52 ` Waiman Long
0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2025-04-04 19:38 UTC
To: Waiman Long
Cc: Tejun Heo, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
On Fri, Apr 04, 2025 at 02:55:35PM -0400, Waiman Long wrote:
> On 4/4/25 2:13 PM, Johannes Weiner wrote:
> > * Waiman points out that the weirdness is seeing low events without
> > having a low configured. Eh, this isn't really true with recursive
> > propagation; you may or may not have an elow depending on parental
> > configuration and sibling behavior.
> >
> Do you mind if we just don't update the low event count if low isn't
> set, but leave the rest the same like
What's the motivation for doing anything beyond the skip-on-!usage?
> @@ -659,21 +659,25 @@ static inline bool mem_cgroup_unprotected(struct
> mem_cgro>
> static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
> struct mem_cgroup *memcg)
> {
> + unsigned long elow;
> +
> if (mem_cgroup_unprotected(target, memcg))
> return false;
>
> - return READ_ONCE(memcg->memory.elow) >=
> - page_counter_read(&memcg->memory);
> + elow = READ_ONCE(memcg->memory.elow);
> + return elow && (page_counter_read(&memcg->memory) <= elow);
> }
>
> static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
> struct mem_cgroup *memcg)
> {
> + unsigned long emin;
> +
> if (mem_cgroup_unprotected(target, memcg))
> return false;
>
> - return READ_ONCE(memcg->memory.emin) >=
> - page_counter_read(&memcg->memory);
> + emin = READ_ONCE(memcg->memory.emin);
> + return emin && (page_counter_read(&memcg->memory) <= emin);
> }
This still redefines the empty case to mean excess. That's a quirk I
would have liked to avoid. I don't see why you would need it?
> @@ -5919,7 +5923,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat,
> struct s>
> sc->memcg_low_skipped = 1;
> continue;
> }
> - memcg_memory_event(memcg, MEMCG_LOW);
> + if (memcg->memory.low)
> + memcg_memory_event(memcg, MEMCG_LOW);
That's not right. In setups where protection comes from the parent, no
breaches would ever be counted.
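A minimal userspace sketch of that objection, under the assumption that
a child can have memory.low == 0 while receiving a non-zero elow from
its parent through recursive propagation. The struct and both helper
names are hypothetical and only model the event accounting.

```c
#include <stdbool.h>

/*
 * Hypothetical model: low is the cgroup's own memory.low, elow is the
 * protection it effectively gets, possibly inherited from a parent.
 */
struct memcg_event_model {
	unsigned long low;
	unsigned long elow;
	unsigned long usage;
	unsigned long low_events;
};

/* Counts a low event whenever reclaim breaches effective protection. */
static void breach_ungated(struct memcg_event_model *m)
{
	if (m->usage && m->elow >= m->usage)
		m->low_events++;
}

/* Variant from the quoted diff: only counts if memory.low itself is set. */
static void breach_gated(struct memcg_event_model *m)
{
	if (m->usage && m->elow >= m->usage && m->low)
		m->low_events++;
}
```

For a child with low = 0, elow = 100 and usage = 50, breach_gated()
leaves low_events at 0 even though protected memory is being reclaimed,
while breach_ungated() counts the breach.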
* Re: [PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0
2025-04-04 19:38 ` Johannes Weiner
@ 2025-04-05 18:52 ` Waiman Long
0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-04-05 18:52 UTC
To: Johannes Weiner, Waiman Long
Cc: Tejun Heo, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Michal Koutný, Shuah Khan,
linux-kernel, cgroups, linux-mm, linux-kselftest
On 4/4/25 3:38 PM, Johannes Weiner wrote:
> On Fri, Apr 04, 2025 at 02:55:35PM -0400, Waiman Long wrote:
>> On 4/4/25 2:13 PM, Johannes Weiner wrote:
>>> * Waiman points out that the weirdness is seeing low events without
>>> having a low configured. Eh, this isn't really true with recursive
>>> propagation; you may or may not have an elow depending on parental
>>> configuration and sibling behavior.
>>>
>> Do you mind if we just don't update the low event count if low isn't
>> set, but leave the rest the same like
> What's the motivation for doing anything beyond the skip-on-!usage?
It is to avoid making further changes. I am fine with modifying the test
to allow low events even when low isn't set.
>> @@ -659,21 +659,25 @@ static inline bool mem_cgroup_unprotected(struct
>> mem_cgro>
>> static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
>> struct mem_cgroup *memcg)
>> {
>> + unsigned long elow;
>> +
>> if (mem_cgroup_unprotected(target, memcg))
>> return false;
>>
>> - return READ_ONCE(memcg->memory.elow) >=
>> - page_counter_read(&memcg->memory);
>> + elow = READ_ONCE(memcg->memory.elow);
>> + return elow && (page_counter_read(&memcg->memory) <= elow);
>> }
>>
>> static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
>> struct mem_cgroup *memcg)
>> {
>> + unsigned long emin;
>> +
>> if (mem_cgroup_unprotected(target, memcg))
>> return false;
>>
>> - return READ_ONCE(memcg->memory.emin) >=
>> - page_counter_read(&memcg->memory);
>> + emin = READ_ONCE(memcg->memory.emin);
>> + return emin && (page_counter_read(&memcg->memory) <= emin);
>> }
> This still redefines the empty case to mean excess. That's a quirk I
> would have liked to avoid. I don't see why you would need it?
OK, I will drop that.
>
>> @@ -5919,7 +5923,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat,
>> struct s>
>> sc->memcg_low_skipped = 1;
>> continue;
>> }
>> - memcg_memory_event(memcg, MEMCG_LOW);
>> + if (memcg->memory.low)
>> + memcg_memory_event(memcg, MEMCG_LOW);
> That's not right. In setups where protection comes from the parent, no
> breaches would ever be counted.
OK. Will post a v3 to incorporate your suggestion.
Thanks,
Longman