* [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. @ 2016-02-17 14:31 Tetsuo Handa 2016-02-17 14:48 ` Michal Hocko 2016-02-17 22:31 ` David Rientjes 0 siblings, 2 replies; 14+ messages in thread From: Tetsuo Handa @ 2016-02-17 14:31 UTC (permalink / raw) To: mhocko, akpm Cc: rientjes, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel, Tetsuo Handa oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a thread which returns oom_task_origin() == true. But it is possible that such thread is marked as OOM-unkillable. In that case, the OOM killer must not select such process. Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable process because subsequent oom_badness() call will return 0, this patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE if that process is marked as OOM-unkillable (regardless of oom_task_origin()). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Suggested-by: Michal Hocko <mhocko@kernel.org> --- mm/oom_kill.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 7653055..cf87153 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, if (!is_sysrq_oom(oc)) return OOM_SCAN_ABORT; } - if (!task->mm) + if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) return OOM_SCAN_CONTINUE; /* -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-17 14:31 [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable Tetsuo Handa @ 2016-02-17 14:48 ` Michal Hocko 2016-02-17 22:31 ` David Rientjes 1 sibling, 0 replies; 14+ messages in thread From: Michal Hocko @ 2016-02-17 14:48 UTC (permalink / raw) To: Tetsuo Handa Cc: akpm, rientjes, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Wed 17-02-16 23:31:00, Tetsuo Handa wrote: > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a > thread which returns oom_task_origin() == true. But it is possible > that such thread is marked as OOM-unkillable. In that case, the OOM > killer must not select such process. As already pointed out swapoff or ksm run_store are the only users of OOM_FLAG_ORIGIN and it would be insane to run them from an oom disabled context. So I wouldn't care much about this part that much and consider the patch to be more of a cleanup rather than a bug fix. > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable > process because subsequent oom_badness() call will return 0, this > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE > if that process is marked as OOM-unkillable (regardless of > oom_task_origin()). > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Suggested-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/oom_kill.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 7653055..cf87153 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, > if (!is_sysrq_oom(oc)) > return OOM_SCAN_ABORT; > } > - if (!task->mm) > + if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) > return OOM_SCAN_CONTINUE; > > /* > -- > 1.8.3.1 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-17 14:31 [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable Tetsuo Handa 2016-02-17 14:48 ` Michal Hocko @ 2016-02-17 22:31 ` David Rientjes 2016-02-18 8:09 ` Michal Hocko 1 sibling, 1 reply; 14+ messages in thread From: David Rientjes @ 2016-02-17 22:31 UTC (permalink / raw) To: Tetsuo Handa Cc: mhocko, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Wed, 17 Feb 2016, Tetsuo Handa wrote: > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a > thread which returns oom_task_origin() == true. But it is possible > that such thread is marked as OOM-unkillable. In that case, the OOM > killer must not select such process. > > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable > process because subsequent oom_badness() call will return 0, this > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE > if that process is marked as OOM-unkillable (regardless of > oom_task_origin()). > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Suggested-by: Michal Hocko <mhocko@kernel.org> > --- > mm/oom_kill.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 7653055..cf87153 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, > if (!is_sysrq_oom(oc)) > return OOM_SCAN_ABORT; > } > - if (!task->mm) > + if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) > return OOM_SCAN_CONTINUE; > > /* I'm getting multiple emails from you with the identical patch, something is definitely wacky in your toolchain. Anyway, this is NACK'd since task->signal->oom_score_adj is checked under task_lock() for threads with memory attached, that's the purpose of finding the correct thread in oom_badness() and taking task_lock(). We aren't going to duplicate logic in several functions that all do the same thing. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-17 22:31 ` David Rientjes @ 2016-02-18 8:09 ` Michal Hocko 2016-02-18 10:30 ` Tetsuo Handa 2016-02-23 1:06 ` David Rientjes 0 siblings, 2 replies; 14+ messages in thread From: Michal Hocko @ 2016-02-18 8:09 UTC (permalink / raw) To: David Rientjes Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Wed 17-02-16 14:31:54, David Rientjes wrote: > On Wed, 17 Feb 2016, Tetsuo Handa wrote: > > > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a > > thread which returns oom_task_origin() == true. But it is possible > > that such thread is marked as OOM-unkillable. In that case, the OOM > > killer must not select such process. > > > > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable > > process because subsequent oom_badness() call will return 0, this > > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE > > if that process is marked as OOM-unkillable (regardless of > > oom_task_origin()). > > > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Suggested-by: Michal Hocko <mhocko@kernel.org> > > --- > > mm/oom_kill.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index 7653055..cf87153 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, > > if (!is_sysrq_oom(oc)) > > return OOM_SCAN_ABORT; > > } > > - if (!task->mm) > > + if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) > > return OOM_SCAN_CONTINUE; > > > > /* > > I'm getting multiple emails from you with the identical patch, something > is definitely wacky in your toolchain. > > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under > task_lock() for threads with memory attached, that's the purpose of > finding the correct thread in oom_badness() and taking task_lock(). We > aren't going to duplicate logic in several functions that all do the same > thing. Is the task_lock really necessary, though? E.g. oom_task_origin() doesn't seem to depend on it for task->signal safety. If you are referring to races with changing oom_score_adj does such a race matter at all? To me this looks like a reasonable cleanup because we _know_ that OOM_SCORE_ADJ_MIN means OOM_SCAN_CONTINUE and do not really have to go down to oom_badness to find that out. Or what am I missing? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-18 8:09 ` Michal Hocko @ 2016-02-18 10:30 ` Tetsuo Handa 2016-02-18 12:08 ` Michal Hocko 2016-02-23 1:06 ` David Rientjes 1 sibling, 1 reply; 14+ messages in thread From: Tetsuo Handa @ 2016-02-18 10:30 UTC (permalink / raw) To: mhocko, rientjes Cc: akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel Michal Hocko wrote: > On Wed 17-02-16 14:31:54, David Rientjes wrote: > > On Wed, 17 Feb 2016, Tetsuo Handa wrote: > > > > > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a > > > thread which returns oom_task_origin() == true. But it is possible > > > that such thread is marked as OOM-unkillable. In that case, the OOM > > > killer must not select such process. > > > > > > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable > > > process because subsequent oom_badness() call will return 0, this > > > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE > > > if that process is marked as OOM-unkillable (regardless of > > > oom_task_origin()). > > > > > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > > Suggested-by: Michal Hocko <mhocko@kernel.org> > > > --- > > > mm/oom_kill.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > > index 7653055..cf87153 100644 > > > --- a/mm/oom_kill.c > > > +++ b/mm/oom_kill.c > > > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, > > > if (!is_sysrq_oom(oc)) > > > return OOM_SCAN_ABORT; > > > } > > > - if (!task->mm) > > > + if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) > > > return OOM_SCAN_CONTINUE; > > > > > > /* > > > > I'm getting multiple emails from you with the identical patch, something > > is definitely wacky in your toolchain. Sorry, my operation error. I didn't know I can't do like below. git send-email --to="rcpt1 rcpt2" --cc="rcpt3 rcpt4 rcpt5" file.patch Thus, I resent the identical patch. Not a toolchain problem. > > > > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under > > task_lock() for threads with memory attached, that's the purpose of > > finding the correct thread in oom_badness() and taking task_lock(). We > > aren't going to duplicate logic in several functions that all do the same > > thing. > > Is the task_lock really necessary, though? E.g. oom_task_origin() > doesn't seem to depend on it for task->signal safety. If you are > referring to races with changing oom_score_adj does such a race matter > at all? > > To me this looks like a reasonable cleanup because we _know_ that > OOM_SCORE_ADJ_MIN means OOM_SCAN_CONTINUE and do not really have to go > down to oom_badness to find that out. Or what am I missing? > That NACK will not matter if a draft patch shown bottom is acceptable. I got a question about commit 9cbb78bb314360a8 "mm, memcg: introduce own oom handler to iterate only over its own threads" while trying to kill duplicated oom_unkillable_task() checks by merging oom_scan_process_thread() into oom_badness(). Currently, oom_scan_process_thread() is doing if (oom_unkillable_task(p, NULL, oc->nodemask)) return OOM_SCAN_CONTINUE; and oom_badness() is doing if (oom_unkillable_task(p, memcg, nodemask)) return 0; . For normal OOM case, out_of_memory() is calling oom_scan_process_thread(oc, p, totalpages) oom_badness(p, NULL, oc->nodemask, totalpages) which is translated to if (oom_unkillable_task(p, NULL, oc->nodemask)) return OOM_SCAN_CONTINUE; if (oom_unkillable_task(p, NULL, oc->nodemask)) return 0; . But for memcg OOM case, mem_cgroup_out_of_memory() is calling oom_scan_process_thread(oc, p, totalpages) oom_badness(p, memcg, NULL, totalpages) which is translated to if (oom_unkillable_task(p, NULL, NULL)) return OOM_SCAN_CONTINUE; if (oom_unkillable_task(p, memcg, NULL)) return 0; . Commit 9cbb78bb314360a8 changed oom_scan_process_thread() to always pass memcg == NULL by removing memcg argument from oom_scan_process_thread(). As a result, after that commit, we are doing test_tsk_thread_flag(p, TIF_MEMDIE) check and oom_task_origin(p) check between two oom_unkillable_task() calls of memcg OOM case. Why don't we skip these checks by passing memcg != NULL to first oom_unkillable_task() call? Was this change by error? ---------- >From 36f79bc270858e93be75de2adae1afe757d91b94 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Thu, 18 Feb 2016 19:21:26 +0900 Subject: [PATCH] mm,oom: kill duplicated oom_unkillable_task() checks. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- fs/proc/base.c | 2 +- include/linux/oom.h | 16 +++------- mm/memcontrol.c | 23 ++++---------- mm/oom_kill.c | 91 +++++++++++++++++++++++------------------------------ 4 files changed, 51 insertions(+), 81 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index d7c51ca..3020aa2 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -581,7 +581,7 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns, read_lock(&tasklist_lock); if (pid_alive(task)) - points = oom_badness(task, NULL, NULL, totalpages) * + points = oom_badness(task, NULL, NULL, totalpages, false) * 1000 / totalpages; read_unlock(&tasklist_lock); seq_printf(m, "%lu\n", points); diff --git a/include/linux/oom.h b/include/linux/oom.h index 45993b8..b31467e 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -43,13 +43,6 @@ enum oom_constraint { CONSTRAINT_MEMCG, }; -enum oom_scan_t { - OOM_SCAN_OK, /* scan thread and find its badness */ - OOM_SCAN_CONTINUE, /* do not consider thread for oom kill */ - OOM_SCAN_ABORT, /* abort the iteration and return */ - OOM_SCAN_SELECT, /* always select this thread first */ -}; - /* Thread is the potential origin of an oom condition; kill first on oom */ #define OOM_FLAG_ORIGIN ((__force oom_flags_t)0x1) @@ -73,8 +66,10 @@ static inline bool oom_task_origin(const struct task_struct *p) extern void mark_oom_victim(struct task_struct *tsk); extern unsigned long oom_badness(struct task_struct *p, - struct mem_cgroup *memcg, const nodemask_t *nodemask, - unsigned long totalpages); + struct mem_cgroup *memcg, + struct oom_control *oc, + unsigned long totalpages, + bool check_exceptions); extern int oom_kills_count(void); extern void note_oom_kill(void); @@ -86,9 +81,6 @@ extern void check_panic_on_oom(struct oom_control *oc, enum oom_constraint constraint, struct mem_cgroup *memcg); -extern enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, - struct task_struct *task, unsigned long totalpages); - extern bool out_of_memory(struct oom_control *oc); extern void exit_oom_victim(struct task_struct *tsk); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ae8b81c..0d422bf 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1248,7 +1248,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, struct mem_cgroup *iter; unsigned long chosen_points = 0; unsigned long totalpages; - unsigned int points = 0; + unsigned long points = 0; struct task_struct *chosen = NULL; mutex_lock(&oom_lock); @@ -1271,28 +1271,17 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, css_task_iter_start(&iter->css, &it); while ((task = css_task_iter_next(&it))) { - switch (oom_scan_process_thread(&oc, task, totalpages)) { - case OOM_SCAN_SELECT: - if (chosen) - put_task_struct(chosen); - chosen = task; - chosen_points = ULONG_MAX; - get_task_struct(chosen); - /* fall through */ - case OOM_SCAN_CONTINUE: + points = oom_badness(task, memcg, &oc, totalpages, + true); + if (!points || points < chosen_points) continue; - case OOM_SCAN_ABORT: + if (points == ULONG_MAX) { css_task_iter_end(&it); mem_cgroup_iter_break(memcg, iter); if (chosen) put_task_struct(chosen); goto unlock; - case OOM_SCAN_OK: - break; - }; - points = oom_badness(task, memcg, NULL, totalpages); - if (!points || points < chosen_points) - continue; + } /* Prefer thread group leaders for display purposes */ if (points == chosen_points && thread_group_leader(chosen)) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index d7bb9c1..f426ce8 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -155,21 +155,40 @@ static bool oom_unkillable_task(struct task_struct *p, /** * oom_badness - heuristic function to determine which candidate task to kill * @p: task struct of which task we should calculate + * @memcg: memory cgroup. NULL unless memcg OOM case. + * @oc: oom_control struct. NULL if /proc/pid/oom_score_adj case. * @totalpages: total present RAM allowed for page allocation + * @check_exceptions: whether to check for TIF_MEMDIE and oom_task_origin(). + * + * Returns ULONG_MAX if @p is marked as OOM-victim. + * Returns ULONG_MAX - 1 if @p is marked as oom_task_origin(). + * Returns 0 if @p is marked as OOM-unkillable. + * Returns integer between 1 and ULONG_MAX - 2 otherwise. * * The heuristic for determining which task to kill is made to be as simple and * predictable as possible. The goal is to return the highest value for the * task consuming the most memory to avoid subsequent oom failures. */ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, - const nodemask_t *nodemask, unsigned long totalpages) + struct oom_control *oc, unsigned long totalpages, + bool check_exceptions) { long points; long adj; - if (oom_unkillable_task(p, memcg, nodemask)) + if (oom_unkillable_task(p, memcg, oc ? oc->nodemask : NULL)) return 0; + /* + * This task already has access to memory reserves and is being killed. + * Don't allow any other task to have access to the reserves. + */ + if (check_exceptions) + if (test_tsk_thread_flag(p, TIF_MEMDIE)) { + if (!is_sysrq_oom(oc)) + return ULONG_MAX; + } + p = find_lock_task_mm(p); if (!p) return 0; @@ -181,6 +200,16 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, } /* + * If task is allocating a lot of memory and has been marked to be + * killed first if it triggers an oom, then select it. + */ + if (check_exceptions) + if (oom_task_origin(p)) { + task_unlock(p); + return ULONG_MAX - 1; + } + + /* * The baseline for the badness score is the proportion of RAM that each * task's rss, pagetable and swap space use. */ @@ -268,36 +297,6 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc, } #endif -enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, - struct task_struct *task, unsigned long totalpages) -{ - if (oom_unkillable_task(task, NULL, oc->nodemask)) - return OOM_SCAN_CONTINUE; - - /* - * This task already has access to memory reserves and is being killed. - * Don't allow any other task to have access to the reserves. - */ - if (test_tsk_thread_flag(task, TIF_MEMDIE)) { - if (!is_sysrq_oom(oc)) - return OOM_SCAN_ABORT; - } - if (!task->mm) - return OOM_SCAN_CONTINUE; - - /* - * If task is allocating a lot of memory and has been marked to be - * killed first if it triggers an oom, then select it. - */ - if (oom_task_origin(task)) - return OOM_SCAN_SELECT; - - if (task_will_free_mem(task) && !is_sysrq_oom(oc)) - return OOM_SCAN_ABORT; - - return OOM_SCAN_OK; -} - /* * Simple selection loop. We chose the process with the highest * number of 'points'. Returns -1 on scan abort. @@ -311,24 +310,14 @@ static struct task_struct *select_bad_process(struct oom_control *oc, rcu_read_lock(); for_each_process_thread(g, p) { - unsigned int points; - - switch (oom_scan_process_thread(oc, p, totalpages)) { - case OOM_SCAN_SELECT: - chosen = p; - chosen_points = ULONG_MAX; - /* fall through */ - case OOM_SCAN_CONTINUE: + const unsigned long points = oom_badness(p, NULL, oc, + totalpages, true); + if (!points || points < chosen_points) continue; - case OOM_SCAN_ABORT: + if (points == ULONG_MAX) { rcu_read_unlock(); return (struct task_struct *)(-1UL); - case OOM_SCAN_OK: - break; - }; - points = oom_badness(p, NULL, oc->nodemask, totalpages); - if (!points || points < chosen_points) - continue; + } /* Prefer thread group leaders for display purposes */ if (points == chosen_points && thread_group_leader(chosen)) continue; @@ -679,7 +668,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p, struct task_struct *child; struct task_struct *t; struct mm_struct *mm; - unsigned int victim_points = 0; + unsigned long victim_points = 0; static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); bool can_oom_reap = true; @@ -712,15 +701,15 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p, read_lock(&tasklist_lock); for_each_thread(p, t) { list_for_each_entry(child, &t->children, sibling) { - unsigned int child_points; + unsigned long child_points; if (process_shares_mm(child, p->mm)) continue; /* * oom_badness() returns 0 if the thread is unkillable */ - child_points = oom_badness(child, memcg, oc->nodemask, - totalpages); + child_points = oom_badness(child, memcg, oc, + totalpages, false); if (child_points > victim_points) { put_task_struct(victim); victim = child; -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-18 10:30 ` Tetsuo Handa @ 2016-02-18 12:08 ` Michal Hocko 2016-02-18 12:13 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2016-02-18 12:08 UTC (permalink / raw) To: Tetsuo Handa Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Thu 18-02-16 19:30:12, Tetsuo Handa wrote: [...] > Commit 9cbb78bb314360a8 changed oom_scan_process_thread() to > always pass memcg == NULL by removing memcg argument from > oom_scan_process_thread(). As a result, after that commit, > we are doing test_tsk_thread_flag(p, TIF_MEMDIE) check and > oom_task_origin(p) check between two oom_unkillable_task() > calls of memcg OOM case. Why don't we skip these checks by > passing memcg != NULL to first oom_unkillable_task() call? > Was this change by error? I am not really sure I understand your question. The point is that mem_cgroup_out_of_memory does for_each_mem_cgroup_tree which means that only tasks from the given memcg hierarchy is checked and oom_unkillable_task cares about memcg only for /* When mem_cgroup_out_of_memory() and p is not member of the group */ if (memcg && !task_in_mem_cgroup(p, memcg)) return true; which is never true by definition. I guess we can safely remove the memcg argument from oom_badness and oom_unkillable_task. At least from a quick glance... -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-18 12:08 ` Michal Hocko @ 2016-02-18 12:13 ` Michal Hocko 2016-02-19 15:07 ` Tetsuo Handa 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2016-02-18 12:13 UTC (permalink / raw) To: Tetsuo Handa Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Thu 18-02-16 13:08:49, Michal Hocko wrote: > I guess we can safely remove the memcg > argument from oom_badness and oom_unkillable_task. At least from a quick > glance... No we cannot actually. oom_kill_process could select a child which is in a different memcg in that case... -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-18 12:13 ` Michal Hocko @ 2016-02-19 15:07 ` Tetsuo Handa 2016-02-19 15:13 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: Tetsuo Handa @ 2016-02-19 15:07 UTC (permalink / raw) To: mhocko Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel Michal Hocko wrote: > On Thu 18-02-16 13:08:49, Michal Hocko wrote: > > I guess we can safely remove the memcg > > argument from oom_badness and oom_unkillable_task. At least from a quick > > glance... > > No we cannot actually. oom_kill_process could select a child which is in > a different memcg in that case... Then, don't we need to check whether processes sharing victim->mm in other thread groups are in the same memcg when we walk the process list? ---------- diff --git a/mm/oom_kill.c b/mm/oom_kill.c index f426ce8..d05db31 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -762,8 +762,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p, continue; if (same_thread_group(p, victim)) continue; - if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p) || - p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) { + if (oom_badness(p, memcg, oc, totalpages, false) == 0) { /* * We cannot use oom_reaper for the mm shared by this * process because it wouldn't get killed and so the ---------- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-19 15:07 ` Tetsuo Handa @ 2016-02-19 15:13 ` Michal Hocko 0 siblings, 0 replies; 14+ messages in thread From: Michal Hocko @ 2016-02-19 15:13 UTC (permalink / raw) To: Tetsuo Handa Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Sat 20-02-16 00:07:05, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Thu 18-02-16 13:08:49, Michal Hocko wrote: > > > I guess we can safely remove the memcg > > > argument from oom_badness and oom_unkillable_task. At least from a quick > > > glance... > > > > No we cannot actually. oom_kill_process could select a child which is in > > a different memcg in that case... > > Then, don't we need to check whether processes sharing victim->mm in other > thread groups are in the same memcg when we walk the process list? memcg is bound to the mm not to the task. So all processes sharing the mm after in the same memcg (from the memcg POV). See tast_struct::owner. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-18 8:09 ` Michal Hocko 2016-02-18 10:30 ` Tetsuo Handa @ 2016-02-23 1:06 ` David Rientjes 2016-02-23 12:34 ` Michal Hocko 1 sibling, 1 reply; 14+ messages in thread From: David Rientjes @ 2016-02-23 1:06 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Thu, 18 Feb 2016, Michal Hocko wrote: > > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under > > task_lock() for threads with memory attached, that's the purpose of > > finding the correct thread in oom_badness() and taking task_lock(). We > > aren't going to duplicate logic in several functions that all do the same > > thing. > > Is the task_lock really necessary, though? E.g. oom_task_origin() > doesn't seem to depend on it for task->signal safety. If you are > referring to races with changing oom_score_adj does such a race matter > at all? > oom_badness() ranges from 0 (don't kill) to 1000 (please kill). It factors in the setting of /proc/self/oom_score_adj to change that value. That is where OOM_SCORE_ADJ_MIN is enforced. It is also needed in oom_badness() to determine whether a child process should be sacrificed for its parent. We don't add duplicate logic everywhere if you want the code to be maintainable; the only exception would be for performance critical code which the oom killer most certainly is not. I'm simply not entertaining any patch to the oom killer that duplicates code everywhere, increases its complexity, makes it grow in text size, and makes it more difficult to maintain. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-23 1:06 ` David Rientjes @ 2016-02-23 12:34 ` Michal Hocko 2016-02-23 22:33 ` David Rientjes 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2016-02-23 12:34 UTC (permalink / raw) To: David Rientjes Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Mon 22-02-16 17:06:29, David Rientjes wrote: > On Thu, 18 Feb 2016, Michal Hocko wrote: > > > > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under > > > task_lock() for threads with memory attached, that's the purpose of > > > finding the correct thread in oom_badness() and taking task_lock(). We > > > aren't going to duplicate logic in several functions that all do the same > > > thing. > > > > Is the task_lock really necessary, though? E.g. oom_task_origin() > > doesn't seem to depend on it for task->signal safety. If you are > > referring to races with changing oom_score_adj does such a race matter > > at all? > > > > oom_badness() ranges from 0 (don't kill) to 1000 (please kill). It > factors in the setting of /proc/self/oom_score_adj to change that value. > That is where OOM_SCORE_ADJ_MIN is enforced. The question is whether the current placement of OOM_SCORE_ADJ_MIN is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task instead? Sure, checking oom_score_adj under task_lock inside oom_badness will prevent from races but the question I raised previously was whether we actually care about those races? When would it matter? Is it really likely that the update happen during the oom killing? And if yes what prevents from the update happening _after_ the check? If for nothing else oom_unkillable_task would be complete that way. E.g. sysctl_oom_kill_allocating_task has to check for OOM_SCORE_ADJ_MIN because it doesn't rely on oom_badness and that alone would suggest that the check is misplaced. That being said I do not really care that much. I would just find it neater to have oom_unkillable_task that would really consider all the cases where the OOM should ignore a task. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-23 12:34 ` Michal Hocko @ 2016-02-23 22:33 ` David Rientjes 2016-02-24 10:05 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: David Rientjes @ 2016-02-23 22:33 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Tue, 23 Feb 2016, Michal Hocko wrote: > > oom_badness() ranges from 0 (don't kill) to 1000 (please kill). It > > factors in the setting of /proc/self/oom_score_adj to change that value. > > That is where OOM_SCORE_ADJ_MIN is enforced. > > The question is whether the current placement of OOM_SCORE_ADJ_MIN > is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task > instead? oom_unkillable_task() deals with the type of task it is (init or kthread) or being ineligible due to the memcg and cpuset placement. We want to exclude them from consideration and also suppress them from the task dump in the kernel log. We don't want to suppress oom disabled processes, we really want to know their rss, for example. It could be renamed is_ineligible_task(). > Sure, checking oom_score_adj under task_lock inside oom_badness will > prevent from races but the question I raised previously was whether we > actually care about those races? When would it matter? Is it really > likely that the update happen during the oom killing? And if yes what > prevents from the update happening _after_ the check? > It's not necessarily to take task_lock(), but find_lock_task_mm() is the means we have to iterate threads to find any with memory attached. We need that logic in oom_badness() to avoid racing with threads that have entered exit_mm(). It's possible for a thread to have a non-NULL ->mm in oom_scan_process_thread(), the thread enters exit_mm() without kill, and oom_badness() can still find it to be eligible because other threads have not exited. We still want to issue a kill to this process and task_lock() protects the setting of task->mm to NULL: don't consider it to be a race in setting oom_score_adj, consider it to be a race in unmapping (but not freeing) memory in th exit path. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-23 22:33 ` David Rientjes @ 2016-02-24 10:05 ` Michal Hocko 2016-02-24 21:36 ` David Rientjes 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2016-02-24 10:05 UTC (permalink / raw) To: David Rientjes Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Tue 23-02-16 14:33:01, David Rientjes wrote: > On Tue, 23 Feb 2016, Michal Hocko wrote: > > > > oom_badness() ranges from 0 (don't kill) to 1000 (please kill). It > > > factors in the setting of /proc/self/oom_score_adj to change that value. > > > That is where OOM_SCORE_ADJ_MIN is enforced. > > > > The question is whether the current placement of OOM_SCORE_ADJ_MIN > > is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task > > instead? > > oom_unkillable_task() deals with the type of task it is (init or kthread) > or being ineligible due to the memcg and cpuset placement. Yes and OOM disabled is yet another condition. > We want to > exclude them from consideration and also suppress them from the task dump > in the kernel log. We don't want to suppress oom disabled processes, we > really want to know their rss, for example. Hmm, is it really helpful though? What would you deduce from seeing a large rss an OOM_SCORE_ADJ_MIN task? Misconfigured system? There must have been a reason to mark the task that way in the first place so you can hardly do anything about it. Moreover you can deduce the same from the available information. I would even argue that displaying OOM_SCORE_ADJ_MIN might be a bit counterproductive because you have to filter them out when looking at the listing. > It could be renamed is_ineligible_task(). That wouldn't really help imho because OOM_SCORE_ADJ_MIN is an uneligible task. > > Sure, checking oom_score_adj under task_lock inside oom_badness will > > prevent from races but the question I raised previously was whether we > > actually care about those races? When would it matter? Is it really > > likely that the update happen during the oom killing? And if yes what > > prevents from the update happening _after_ the check? > > > > It's not necessarily to take task_lock(), but find_lock_task_mm() is the > means we have to iterate threads to find any with memory attached. We > need that logic in oom_badness() to avoid racing with threads that have > entered exit_mm(). It's possible for a thread to have a non-NULL ->mm in > oom_scan_process_thread(), the thread enters exit_mm() without kill, and > oom_badness() can still find it to be eligible because other threads have > not exited. We still want to issue a kill to this process and task_lock() > protects the setting of task->mm to NULL: don't consider it to be a race > in setting oom_score_adj, consider it to be a race in unmapping (but not > freeing) memory in th exit path. I am confused now. This all is true but it is independent on OOM_SCORE_ADJ_MIN check? The check is per signal_struct so checking all the threads will not change anything. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable. 2016-02-24 10:05 ` Michal Hocko @ 2016-02-24 21:36 ` David Rientjes 0 siblings, 0 replies; 14+ messages in thread From: David Rientjes @ 2016-02-24 21:36 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm, linux-kernel On Wed, 24 Feb 2016, Michal Hocko wrote: > Hmm, is it really helpful though? What would you deduce from seeing a > large rss an OOM_SCORE_ADJ_MIN task? Misconfigured system? There must > have been a reason to mark the task that way in the first place so you > can hardly do anything about it. Moreover you can deduce the same from > the available information. > Users run processes that are vital to the machine with OOM_SCORE_ADJ_MIN. This does not make them immune to having memory leaks that caused the oom condition, and identifying that has triaged many bugs in the past. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2016-02-24 21:36 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-17 14:31 [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable Tetsuo Handa 2016-02-17 14:48 ` Michal Hocko 2016-02-17 22:31 ` David Rientjes 2016-02-18 8:09 ` Michal Hocko 2016-02-18 10:30 ` Tetsuo Handa 2016-02-18 12:08 ` Michal Hocko 2016-02-18 12:13 ` Michal Hocko 2016-02-19 15:07 ` Tetsuo Handa 2016-02-19 15:13 ` Michal Hocko 2016-02-23 1:06 ` David Rientjes 2016-02-23 12:34 ` Michal Hocko 2016-02-23 22:33 ` David Rientjes 2016-02-24 10:05 ` Michal Hocko 2016-02-24 21:36 ` David Rientjes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).