From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, David Rientjes <rientjes@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] memcg: oom: fix totalpages calculation for swappiness==0
Date: Mon, 15 Oct 2012 18:11:24 +0900
Message-ID: <507BD33C.4030209@jp.fujitsu.com>
In-Reply-To: <20121010141142.GG23011@dhcp22.suse.cz>

(2012/10/10 23:11), Michal Hocko wrote:
> Hi,
> I am sending the patch below as an RFC because I am not entirely happy
> with it myself and maybe somebody can come up with a different approach
> which would be less hackish.
> As a background, I have noticed that the memcg OOM killer kills the wrong
> tasks while playing with memory.swappiness==0 in a small group (e.g.
> 50M). I have multiple anon mem eaters which fault in more than the hard
> limit. OOM killer kills the last executed task:
>
> # mem_eater spawns one process per parameter, mmaps the given size and
> # faults memory in in parallel (all of them are synced to start together)
> ./mem_eater anon:50M anon:20M anon:20M anon:20M
> 10571: anon_eater for 20971520B
> 10570: anon_eater for 52428800B
> 10573: anon_eater for 20971520B
> 10572: anon_eater for 20971520B
> 10573: done with status 9
> 10571: done with status 0
> 10572: done with status 9
> 10570: done with status 9
>
> [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
> [ 5706]     0  5706     4955      556      13        0             0 bash
> [10569]     0 10569     1015      134       6        0             0 mem_eater
> [10570]     0 10570    13815     4118      15        0             0 mem_eater
> [10571]     0 10571     6135     5140      16        0             0 mem_eater
> [10572]     0 10572     6135       22       7        0             0 mem_eater
> [10573]     0 10573     6135     3541      14        0             0 mem_eater
> Memory cgroup out of memory: Kill process 10573 (mem_eater) score 0 or sacrifice child
> Killed process 10573 (mem_eater) total-vm:24540kB, anon-rss:14028kB, file-rss:136kB
> [...]
> [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
> [ 5706]     0  5706     4955      556      13        0             0 bash
> [10569]     0 10569     1015      134       6        0             0 mem_eater
> [10570]     0 10570    13815    10267      27        0             0 mem_eater
> [10572]     0 10572     6135     2519      12        0             0 mem_eater
> Memory cgroup out of memory: Kill process 10572 (mem_eater) score 0 or sacrifice child
> Killed process 10572 (mem_eater) total-vm:24540kB, anon-rss:9940kB, file-rss:136kB
> [...]
> [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
> [ 5706]     0  5706     4955      556      13        0             0 bash
> [10569]     0 10569     1015      134       6        0             0 mem_eater
> [10570]     0 10570    13815    12773      31        0             0 mem_eater
> Memory cgroup out of memory: Kill process 10570 (mem_eater) score 2 or sacrifice child
> Killed process 10570 (mem_eater) total-vm:55260kB, anon-rss:50956kB, file-rss:136kB
>
> As you can see, the 50M task (pid:10570) is killed last while the 20M
> ones are killed first. See the patch for more details about the problem.
> As I state in the changelog, the very same issue is present in the global
> oom killer as well but it is much less probable as the amount of swap is
> usually much smaller than the available RAM and I think it is not worth
> considering.
>
> ---
>  From 445c2ced957cd77cbfca44d0e3f5056fed252a34 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Wed, 10 Oct 2012 15:46:54 +0200
> Subject: [PATCH] memcg: oom: fix totalpages calculation for swappiness==0
>
> oom_badness takes totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller
> than the available swap because the used memory (capped by the limit)
> is negligible compared to totalpages so the resulting score is too
> small. A wrong process might be selected as a result.
>
> The same issue exists for the global oom killer as well but it is not
> that problematic as the amount of the RAM is usually much bigger than
> the swap space.
>
> The problem can be worked around by checking swappiness==0 and not
> considering swap at all.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Hm...where should we describe this behavior....
Documentation/cgroup/memory.txt "5.3 swappiness" ?

Anyway, the patch itself seems good.
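
For reference, the arithmetic behind the quoted changelog can be sketched
in userspace. This is only a rough model, not the kernel code: it assumes
4 KiB pages, a 50M hard limit, and a hypothetical 20 GiB of swap (the real
swap size is not stated in this thread), and it ignores the memsw limit,
oom_score_adj and any adjustment for privileged tasks. The function names
are made up for illustration. The rss and nr_ptes values are taken from
the first task dump quoted above.

```python
PAGE_SIZE = 4096                               # assumption: 4 KiB pages
LIMIT_PAGES = 50 * 1024 * 1024 // PAGE_SIZE    # 50M hard limit from the report
SWAP_PAGES = 20 * 1024 ** 3 // PAGE_SIZE       # assumption: 20 GiB of swap


def total_pages(limit_pages, swap_pages, swappiness):
    """Simplified model of the totalpages base used for scoring.

    With swappiness == 0 nothing is swapped out, so swap is not counted
    (the workaround described in the changelog). The memsw limit is
    ignored here for brevity.
    """
    if swappiness == 0:
        return limit_pages
    return limit_pages + swap_pages


def printed_score(rss, nr_ptes, swapents, totalpages):
    """Usage normalized to a 0..1000 scale, as shown in the OOM log."""
    return (rss + nr_ptes + swapents) * 1000 // totalpages


# (pid, rss, nr_ptes, swapents) from the first task dump in the report
TASKS = [
    (10570, 4118, 15, 0),
    (10571, 5140, 16, 0),
    (10572, 22, 7, 0),
    (10573, 3541, 14, 0),
]

if __name__ == "__main__":
    with_swap = total_pages(LIMIT_PAGES, SWAP_PAGES, swappiness=60)
    without_swap = total_pages(LIMIT_PAGES, SWAP_PAGES, swappiness=0)
    for pid, rss, ptes, swapents in TASKS:
        print(pid,
              printed_score(rss, ptes, swapents, with_swap),
              printed_score(rss, ptes, swapents, without_swap))
```

Under these assumptions every task scores 0 while swap is counted, so the
scores cannot tell the tasks apart. With swap left out of totalpages the
scores spread out and the task with the largest rss stands out.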

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


