From: Roman Gushchin <guro@fb.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v10 3/6] mm, oom: cgroup-aware OOM killer
Date: Wed, 4 Oct 2017 20:51:10 +0100
Message-ID: <20171004195110.GA18900@castle>
In-Reply-To: <20171004192720.GC1501@cmpxchg.org>

On Wed, Oct 04, 2017 at 03:27:20PM -0400, Johannes Weiner wrote:
> On Wed, Oct 04, 2017 at 04:46:35PM +0100, Roman Gushchin wrote:
> > Traditionally, the OOM killer operates at the process level.
> > Under oom conditions, it finds a process with the highest oom score
> > and kills it.
> > 
> > This behavior doesn't suit systems running many containers
> > well:
> > 
> > 1) There is no fairness between containers. A small container with
> > a few large processes will be chosen over a large one with a huge
> > number of small processes.
> > 
> > 2) Containers often do not expect that some random process inside
> > will be killed. In many cases it is much safer to kill
> > all tasks in the container. Traditionally, this was implemented
> > in userspace, but doing it in the kernel has some advantages,
> > especially in the case of a system-wide OOM.
> > 
> > To address these issues, the cgroup-aware OOM killer is introduced.
> > 
> > Under OOM conditions, it looks for the biggest leaf memory cgroup
> > and kills the biggest task belonging to it. The following patches
> > will extend this functionality to consider non-leaf memory cgroups
> > as well, and also provide an ability to kill all tasks belonging
> > to the victim cgroup.
> > 
> > The root cgroup is treated as a leaf memory cgroup, so its score
> > is compared with those of the leaf memory cgroups.
> > Because of how memcg statistics are implemented, a special
> > algorithm is used to estimate its oom_score: we define it as the
> > maximum oom_score of the tasks belonging to it.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: kernel-team@fb.com
> > Cc: cgroups@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-mm@kvack.org
> 
> This looks good to me.
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> I just have one question:
> 
> > @@ -828,6 +828,12 @@ static void __oom_kill_process(struct task_struct *victim)
> >  	struct mm_struct *mm;
> >  	bool can_oom_reap = true;
> >  
> > +	if (is_global_init(victim) || (victim->flags & PF_KTHREAD) ||
> > +	    victim->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
> > +		put_task_struct(victim);
> > +		return;
> > +	}
> > +
> >  	p = find_lock_task_mm(victim);
> >  	if (!p) {
> >  		put_task_struct(victim);
> 
> Is this necessary? The callers of this function use oom_badness() to
> find a victim, and that filters init, kthread, OOM_SCORE_ADJ_MIN.

It is. __oom_kill_process() is also used to kill all processes belonging
to the selected memory cgroup, and those tasks are not selected through
oom_badness(), so we should perform these checks here to avoid killing
unkillable processes.
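
As a rough illustration only (a userspace sketch, not the kernel code),
the point can be modelled as below. All sketch_* names and the struct
layout are hypothetical stand-ins for is_global_init(), PF_KTHREAD and
signal->oom_score_adj; the only value taken from the kernel is
OOM_SCORE_ADJ_MIN, which is -1000. The loop passes every member of the
group to the kill helper without any pre-filtering, so the helper has
to skip exempt tasks itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel's OOM_SCORE_ADJ_MIN value (-1000). */
#define SKETCH_OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical stand-in for the bits of task state the check reads. */
struct sketch_task {
	bool is_global_init;	/* models is_global_init(task) */
	bool is_kthread;	/* models task->flags & PF_KTHREAD */
	int oom_score_adj;	/* models task->signal->oom_score_adj */
	bool killed;		/* set when the sketch "kills" the task */
};

/* The same three conditions as in the quoted hunk. */
static bool sketch_unkillable(const struct sketch_task *t)
{
	return t->is_global_init || t->is_kthread ||
	       t->oom_score_adj == SKETCH_OOM_SCORE_ADJ_MIN;
}

/* Models the kill helper: it refuses exempt tasks on its own. */
static void sketch_kill_one(struct sketch_task *t)
{
	if (sketch_unkillable(t))
		return;
	t->killed = true;
}

/*
 * Models the kill-all path: every task in the group is handed to the
 * helper, with no badness-style filtering beforehand.
 * Returns the number of tasks that were actually killed.
 */
static size_t sketch_kill_cgroup(struct sketch_task *tasks, size_t n)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		sketch_kill_one(&tasks[i]);
		if (tasks[i].killed)
			killed++;
	}
	return killed;
}
```

Without the check inside sketch_kill_one(), the loop would mark init,
kernel threads and OOM_SCORE_ADJ_MIN tasks as killed too.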

Thanks!
