* Re: [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-11 8:57 ` [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0 Michal Hocko
@ 2012-10-11 9:13 ` Michal Hocko
2012-10-11 12:20 ` Johannes Weiner
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2012-10-11 9:13 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Thu 11-10-12 10:57:39, Michal Hocko wrote:
> oom_badness takes totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller than
> the available swap because the used memory (capped by the limit) is
> negligible comparing to totalpages so the resulting score is too small
> if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> A wrong process might be selected as result.
>
> The same issue exists for the global oom killer as well but it is not
> that problematic as the amount of the RAM is usually much bigger than
> the swap space.
>
> The problem can be worked around by checking mem_cgroup_swappiness==0
> and not considering swap at all in such a case.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: stable [3.5+]
I have just realized that fe35004f (introduced in 3.5-rc1) has been
backported to 3.2 and 3.4 stable kernels so this should be [3.2+]
> ---
> mm/memcontrol.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7acf43b..93a7e36 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1452,17 +1452,26 @@ static int mem_cgroup_count_children(struct mem_cgroup *memcg)
> static u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
> {
> u64 limit;
> - u64 memsw;
>
> limit = res_counter_read_u64(&memcg->res, RES_LIMIT);
> - limit += total_swap_pages << PAGE_SHIFT;
>
> - memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
> /*
> - * If memsw is finite and limits the amount of swap space available
> - * to this memcg, return that limit.
> + * Do not consider swap space if we cannot swap due to swappiness
> */
> - return min(limit, memsw);
> + if (mem_cgroup_swappiness(memcg)) {
> + u64 memsw;
> +
> + limit += total_swap_pages << PAGE_SHIFT;
> + memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
> +
> + /*
> + * If memsw is finite and limits the amount of swap space
> + * available to this memcg, return that limit.
> + */
> + limit = min(limit, memsw);
> + }
> +
> + return limit;
> }
>
> void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> --
> 1.7.10.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-11 8:57 ` [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0 Michal Hocko
2012-10-11 9:13 ` Michal Hocko
@ 2012-10-11 12:20 ` Johannes Weiner
2012-10-12 13:01 ` Michal Hocko
2012-10-11 22:36 ` KOSAKI Motohiro
2012-10-15 22:04 ` [PATCH v2] " Michal Hocko
3 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2012-10-11 12:20 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, linux-mm, David Rientjes, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, LKML
On Thu, Oct 11, 2012 at 10:57:39AM +0200, Michal Hocko wrote:
> oom_badness takes totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller than
> the available swap because the used memory (capped by the limit) is
> negligible comparing to totalpages so the resulting score is too small
> if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> A wrong process might be selected as result.
>
> The same issue exists for the global oom killer as well but it is not
> that problematic as the amount of the RAM is usually much bigger than
> the swap space.
>
> The problem can be worked around by checking mem_cgroup_swappiness==0
> and not considering swap at all in such a case.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: stable [3.5+]
I also don't think it's hackish, the limit depends very much on
whether reclaim can swap, so it's natural that swappiness shows up
here.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-11 12:20 ` Johannes Weiner
@ 2012-10-12 13:01 ` Michal Hocko
0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2012-10-12 13:01 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, linux-mm, David Rientjes, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, LKML
On Thu 11-10-12 08:20:37, Johannes Weiner wrote:
> On Thu, Oct 11, 2012 at 10:57:39AM +0200, Michal Hocko wrote:
> > oom_badness takes totalpages argument which says how many pages are
> > available and it uses it as a base for the score calculation. The value
> > is calculated by mem_cgroup_get_limit which considers both limit and
> > total_swap_pages (resp. memsw portion of it).
> >
> > This is usually correct but since fe35004f (mm: avoid swapping out
> > with swappiness==0) we do not swap when swappiness is 0 which means
> > that we cannot really use up all the totalpages pages. This in turn
> > confuses oom score calculation if the memcg limit is much smaller than
> > the available swap because the used memory (capped by the limit) is
> > negligible comparing to totalpages so the resulting score is too small
> > if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> > A wrong process might be selected as result.
> >
> > The same issue exists for the global oom killer as well but it is not
> > that problematic as the amount of the RAM is usually much bigger than
> > the swap space.
> >
> > The problem can be worked around by checking mem_cgroup_swappiness==0
> > and not considering swap at all in such a case.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Acked-by: David Rientjes <rientjes@google.com>
> > Cc: stable [3.5+]
>
> I also don't think it's hackish, the limit depends very much on
> whether reclaim can swap, so it's natural that swappiness shows up
> here.
OK, maybe I was just a bit over sensitive here. The other reason I
didn't like it is that swappiness might change over time we some of the
tasks could be already swapped out. oom_score already considers
MM_SWAPENTS but this just tells the number of swapped out ptes not the
pages. So we could still kill something that is resident with much
smaller memory footprint. But this is a different issue and probably a
corner case.
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Thanks
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-11 8:57 ` [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0 Michal Hocko
2012-10-11 9:13 ` Michal Hocko
2012-10-11 12:20 ` Johannes Weiner
@ 2012-10-11 22:36 ` KOSAKI Motohiro
2012-10-12 13:01 ` Michal Hocko
2012-10-15 22:04 ` [PATCH v2] " Michal Hocko
3 siblings, 1 reply; 22+ messages in thread
From: KOSAKI Motohiro @ 2012-10-11 22:36 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, linux-mm, David Rientjes, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Thu, Oct 11, 2012 at 4:57 AM, Michal Hocko <mhocko@suse.cz> wrote:
> oom_badness takes totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller than
> the available swap because the used memory (capped by the limit) is
> negligible comparing to totalpages so the resulting score is too small
> if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> A wrong process might be selected as result.
>
> The same issue exists for the global oom killer as well but it is not
> that problematic as the amount of the RAM is usually much bigger than
> the swap space.
>
> The problem can be worked around by checking mem_cgroup_swappiness==0
> and not considering swap at all in such a case.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: stable [3.5+]
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-11 22:36 ` KOSAKI Motohiro
@ 2012-10-12 13:01 ` Michal Hocko
0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2012-10-12 13:01 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, linux-mm, David Rientjes, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Thu 11-10-12 18:36:26, KOSAKI Motohiro wrote:
> On Thu, Oct 11, 2012 at 4:57 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > oom_badness takes totalpages argument which says how many pages are
> > available and it uses it as a base for the score calculation. The value
> > is calculated by mem_cgroup_get_limit which considers both limit and
> > total_swap_pages (resp. memsw portion of it).
> >
> > This is usually correct but since fe35004f (mm: avoid swapping out
> > with swappiness==0) we do not swap when swappiness is 0 which means
> > that we cannot really use up all the totalpages pages. This in turn
> > confuses oom score calculation if the memcg limit is much smaller than
> > the available swap because the used memory (capped by the limit) is
> > negligible comparing to totalpages so the resulting score is too small
> > if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> > A wrong process might be selected as result.
> >
> > The same issue exists for the global oom killer as well but it is not
> > that problematic as the amount of the RAM is usually much bigger than
> > the swap space.
> >
> > The problem can be worked around by checking mem_cgroup_swappiness==0
> > and not considering swap at all in such a case.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Acked-by: David Rientjes <rientjes@google.com>
> > Cc: stable [3.5+]
>
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Thanks!
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-11 8:57 ` [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0 Michal Hocko
` (2 preceding siblings ...)
2012-10-11 22:36 ` KOSAKI Motohiro
@ 2012-10-15 22:04 ` Michal Hocko
2012-10-15 22:07 ` [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0 Michal Hocko
2012-11-07 22:10 ` [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0 Andrew Morton
3 siblings, 2 replies; 22+ messages in thread
From: Michal Hocko @ 2012-10-15 22:04 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
As Kosaki correctly pointed out, the glogal reclaim doesn't have this
issue because we _do_ swap on swappinnes==0 so the swap space has
to be considered. So the v2 is just acks + changelog fix.
Changes since v1
- drop a note about global swappiness affected as well from the
changelog
- stable needs 3.2+ rather than 3.5+ because the fe35004f has been
backported to stable
---
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0
2012-10-15 22:04 ` [PATCH v2] " Michal Hocko
@ 2012-10-15 22:07 ` Michal Hocko
2012-10-16 0:51 ` Kamezawa Hiroyuki
2012-10-16 0:54 ` David Rientjes
2012-11-07 22:10 ` [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0 Andrew Morton
1 sibling, 2 replies; 22+ messages in thread
From: Michal Hocko @ 2012-10-15 22:07 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
And a follow up for memcg.swappiness documentation which is more
specific about spwappiness==0 meaning.
---
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0
2012-10-15 22:07 ` [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0 Michal Hocko
@ 2012-10-16 0:51 ` Kamezawa Hiroyuki
2012-10-16 0:54 ` David Rientjes
1 sibling, 0 replies; 22+ messages in thread
From: Kamezawa Hiroyuki @ 2012-10-16 0:51 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, linux-mm, David Rientjes, KOSAKI Motohiro,
Johannes Weiner, LKML
(2012/10/16 7:07), Michal Hocko wrote:
> And a follow up for memcg.swappiness documentation which is more
> specific about spwappiness==0 meaning.
> ---
> From 1bc3a94fea728107ed108edd42df464b908cd067 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Mon, 15 Oct 2012 11:43:56 +0200
> Subject: [PATCH] doc: describe memcg swappiness more precisely
>
> since fe35004f (mm: avoid swapping out with swappiness==0) memcg reclaim
> stopped swapping out anon pages completely when 0 value is used.
> Although this is somehow expected it hasn't been done for a really long
> time this way and so it is probably better to be explicit about the
> effect. Moreover global reclaim swapps out even when swappiness is 0
> to prevent from OOM killer.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Nice :)
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> Documentation/cgroups/memory.txt | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index c07f7b4..71c4da4 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -466,6 +466,10 @@ Note:
> 5.3 swappiness
>
> Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
> +Please note that unlike the global swappiness, memcg knob set to 0
> +really prevents from any swapping even if there is a swap storage
> +available. This might lead to memcg OOM killer if there are no file
> +pages to reclaim.
>
> Following cgroups' swappiness can't be changed.
> - root cgroup (uses /proc/sys/vm/swappiness).
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0
2012-10-15 22:07 ` [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0 Michal Hocko
2012-10-16 0:51 ` Kamezawa Hiroyuki
@ 2012-10-16 0:54 ` David Rientjes
1 sibling, 0 replies; 22+ messages in thread
From: David Rientjes @ 2012-10-16 0:54 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, linux-mm, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Tue, 16 Oct 2012, Michal Hocko wrote:
> And a follow up for memcg.swappiness documentation which is more
> specific about spwappiness==0 meaning.
> ---
> From 1bc3a94fea728107ed108edd42df464b908cd067 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Mon, 15 Oct 2012 11:43:56 +0200
> Subject: [PATCH] doc: describe memcg swappiness more precisely
>
> since fe35004f (mm: avoid swapping out with swappiness==0) memcg reclaim
> stopped swapping out anon pages completely when 0 value is used.
> Although this is somehow expected it hasn't been done for a really long
> time this way and so it is probably better to be explicit about the
> effect. Moreover global reclaim swapps out even when swappiness is 0
> to prevent from OOM killer.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-10-15 22:04 ` [PATCH v2] " Michal Hocko
2012-10-15 22:07 ` [PATCH] doc: describe memcg swappiness more precisely memory.swappiness==0 Michal Hocko
@ 2012-11-07 22:10 ` Andrew Morton
2012-11-07 22:46 ` Michal Hocko
1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2012-11-07 22:10 UTC (permalink / raw)
To: Michal Hocko
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Tue, 16 Oct 2012 00:04:08 +0200
Michal Hocko <mhocko@suse.cz> wrote:
> As Kosaki correctly pointed out, the glogal reclaim doesn't have this
> issue because we _do_ swap on swappinnes==0 so the swap space has
> to be considered. So the v2 is just acks + changelog fix.
>
> Changes since v1
> - drop a note about global swappiness affected as well from the
> changelog
> - stable needs 3.2+ rather than 3.5+ because the fe35004f has been
> backported to stable
> ---
> >From c2ae4849f09dbfda6b61472c6dd1fd8c2fe8ac81 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Wed, 10 Oct 2012 15:46:54 +0200
> Subject: [PATCH] memcg: oom: fix totalpages calculation for
> memory.swappiness==0
>
> oom_badness takes totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller than
> the available swap because the used memory (capped by the limit) is
> negligible comparing to totalpages so the resulting score is too small
> if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> A wrong process might be selected as result.
>
> The problem can be worked around by checking mem_cgroup_swappiness==0
> and not considering swap at all in such a case.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: stable [3.2+]
That's "Cc: <stable@vger.kernel.org>", please.
It's unobvious from the changelog that a -stable backport is really
needed. The bug looks pretty obscure and has been there for a long
time. Realistically, is anyone likely to hurt from this?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-11-07 22:10 ` [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0 Andrew Morton
@ 2012-11-07 22:46 ` Michal Hocko
2012-11-07 22:53 ` Andrew Morton
0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2012-11-07 22:46 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Wed 07-11-12 14:10:25, Andrew Morton wrote:
> On Tue, 16 Oct 2012 00:04:08 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
>
> > As Kosaki correctly pointed out, the glogal reclaim doesn't have this
> > issue because we _do_ swap on swappinnes==0 so the swap space has
> > to be considered. So the v2 is just acks + changelog fix.
> >
> > Changes since v1
> > - drop a note about global swappiness affected as well from the
> > changelog
> > - stable needs 3.2+ rather than 3.5+ because the fe35004f has been
> > backported to stable
> > ---
> > >From c2ae4849f09dbfda6b61472c6dd1fd8c2fe8ac81 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Wed, 10 Oct 2012 15:46:54 +0200
> > Subject: [PATCH] memcg: oom: fix totalpages calculation for
> > memory.swappiness==0
> >
> > oom_badness takes totalpages argument which says how many pages are
> > available and it uses it as a base for the score calculation. The value
> > is calculated by mem_cgroup_get_limit which considers both limit and
> > total_swap_pages (resp. memsw portion of it).
> >
> > This is usually correct but since fe35004f (mm: avoid swapping out
> > with swappiness==0) we do not swap when swappiness is 0 which means
> > that we cannot really use up all the totalpages pages. This in turn
> > confuses oom score calculation if the memcg limit is much smaller than
> > the available swap because the used memory (capped by the limit) is
> > negligible comparing to totalpages so the resulting score is too small
> > if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
> > A wrong process might be selected as result.
> >
> > The problem can be worked around by checking mem_cgroup_swappiness==0
> > and not considering swap at all in such a case.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Acked-by: David Rientjes <rientjes@google.com>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: stable [3.2+]
>
> That's "Cc: <stable@vger.kernel.org>", please.
Will do next time.
> It's unobvious from the changelog that a -stable backport is really
> needed. The bug looks pretty obscure and has been there for a long
> time.
Yes but it is not _that_ long since fe35004f made it into stable trees
(e.g. 3.2.29).
The reason why we probably do not see many reports is because people
didn't get used to swappiness==0 really works these days - especially
with memcg where it means _really_ no swapping.
> Realistically, is anyone likely to hurt from this?
The primary motivation for the fix was a real report by a customer.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-11-07 22:46 ` Michal Hocko
@ 2012-11-07 22:53 ` Andrew Morton
2012-11-08 8:35 ` Michal Hocko
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2012-11-07 22:53 UTC (permalink / raw)
To: Michal Hocko
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Wed, 7 Nov 2012 23:46:40 +0100
Michal Hocko <mhocko@suse.cz> wrote:
> > Realistically, is anyone likely to hurt from this?
>
> The primary motivation for the fix was a real report by a customer.
Describe it please and I'll copy it to the changelog.
So that Greg can understand why we sent this at him and so that others
who hit the same problem (or any other problem!) will be able to
determine whether this patch might solve it.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0
2012-11-07 22:53 ` Andrew Morton
@ 2012-11-08 8:35 ` Michal Hocko
0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2012-11-08 8:35 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, David Rientjes, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Johannes Weiner, LKML
On Wed 07-11-12 14:53:40, Andrew Morton wrote:
> On Wed, 7 Nov 2012 23:46:40 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
>
> > > Realistically, is anyone likely to hurt from this?
> >
> > The primary motivation for the fix was a real report by a customer.
>
> Describe it please and I'll copy it to the changelog.
The original issue (a wrong tasks get killed in a small group and memcg
swappiness=0) has been reported on top of our 3.0 based kernel (with
fe35004f backported). I have tried to replicate it by the test case
mentioned https://lkml.org/lkml/2012/10/10/223.
As David correctly pointed out (https://lkml.org/lkml/2012/10/10/418)
the significant role played the fact that all the processes in the group
have CAP_SYS_ADMIN but oom_score_adj has the similar effect.
Say there is 2G of swap space which is 524288 pages. If you add
CAP_SYS_ADMIN bonus then you have -15728 score for the bias. This means
that all tasks with less than 60M get the minimum score and it is tasks
ordering which determines who gets killed as a result.
To summarize it. Users of small groups (relatively to the swap size)
with CAP_SYS_ADMIN tasks resp. oom_score_adj are affected the most
others might see an unexpected oom_badness calculation.
Whether this is a workload which is representative, I don't know but
I think that it is worth fixing and pushing to stable as well.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread