From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
linux-mm@kvack.org, vdavydov@virtuozzo.com, rientjes@google.com
Subject: Re: [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE
Date: Sun, 3 Jul 2016 15:21:47 +0200 [thread overview]
Message-ID: <20160703132147.GA28267@redhat.com> (raw)
In-Reply-To: <20160630075904.GC18783@dhcp22.suse.cz>
On 06/30, Michal Hocko wrote:
> On Wed 29-06-16 22:01:08, Oleg Nesterov wrote:
> > On 06/29, Michal Hocko wrote:
> > >
> > > > > +void mark_oom_victim(struct task_struct *tsk, struct mm_struct *mm)
> > > > > {
> > > > > WARN_ON(oom_killer_disabled);
> > > > > /* OOM killer might race with memcg OOM */
> > > > > if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> > > > > return;
> > > > > +
> > > > > atomic_inc(&tsk->signal->oom_victims);
> > > > > +
> > > > > + /* oom_mm is bound to the signal struct life time */
> > > > > + if (!tsk->signal->oom_mm) {
> > > > > + atomic_inc(&mm->mm_count);
> > > > > + tsk->signal->oom_mm = mm;
> > > >
> > > > Looks racy, but it is not because we rely on oom_lock? Perhaps a comment
> > > > makes sense.
> > >
> > > mark_oom_victim will be called only for the current or under the
> > > task_lock so it should be stable. Except for...
> >
> > I meant that the code looks racy because 2 threads can see ->oom_mm == NULL
> > at the same time and in this case we have the extra atomic_inc(mm_count).
> > But I guess oom_lock saves us, so the code is correct but not clear.
>
> I have changed that to cmpxchg because lowmemory killer is called
> outside of oom_lock.
Hmm. I do not see anything in android/lowmemorykiller.c which can call
mark_oom_victim() ...
But if this is possible then perhaps we have more problems, note that the
if (tsk == oom_reaper_list || tsk->oom_reaper_list)
check wake_oom_reaper() looks equally racy unless tsk is always current
without oom_lock.
And btw this check probably needs a comment too, we rely on SIGKILL sent
to this task before we do wake_oom_reaper(), or task_will_free_mem() == T.
Otherwise tsk->oom_reaper_list can be non-NULL if a victim forks before
exit, the child will have ->oom_reaper_list copied from parent by
dup_task_struct().
> > > Hmm, I didn't think about exec case. And I guess we have never cared
> > > about that race. We just select a task and then kill it.
> >
> > And I guess we want to fix this too, although this is not that important,
> > but this looks like a minor security problem.
>
> I am not sure I can see security implications but I agree this is less
> than optimal,
Well, just suppose that a memory hog execs a setuid application which does
something important, then we can kill it in some "inconsistent" state. Say,
after it created a file-lock which blocks other instances.
> albeit not critical. Killing a young process which didn't
> have much time to do a useful work doesn't seem that critical.
Yes, agreed, this is minor and very unlikely.
> > And this is another indication that almost everything oom-kill.c does with
> > task_struct is wrong ;) Ideally It should only use task_struct to send the
> > SIGKILL, and now that we kill all users of victim->mm we can hopefully do
> > this later.
>
> Hmm, so you think we should do s@victim@mm_victim@ and then do the
> for_each_process loop to kill all the tasks sharing that mm and kill
> them? We are doing that already so it doesn't sound that bad...
Yes, exactlty. But of course I am not sure about details.
>
> > Btw, do we still need this list_for_each_entry(child, &t->children, sibling)
> > loop in oom_kill_process() ?
>
> Well, to be honest, I don't know. This is a heuristic we have been doing
> for a long time. I do not know how many times it really matters. It can
> even be harmful in loads where children are created in the same pace OOM
> killer is killing them. Not sure how likely is that though...
And it is not clear to me why "child_points > victim_points" can be true if
the victim was chosen by select_bad_process() (to simplify the discussion,
lets ignore has_intersects_mems_allowed/etc).
> Let me think whether we can do something about that.
Perhaps it only makes sense if the caller is out_of_memory() ? I mean the
sysctl_oom_kill_allocating_task branch. In this case it would nice to move
this list_for_each_entry(children) into another helper.
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-07-03 13:21 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-24 11:02 [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE Tetsuo Handa
2016-06-24 12:39 ` Michal Hocko
2016-06-24 15:54 ` Tetsuo Handa
2016-06-24 22:42 ` Oleg Nesterov
2016-06-24 21:56 ` Oleg Nesterov
2016-06-25 5:44 ` Tetsuo Handa
2016-06-27 9:23 ` Michal Hocko
2016-06-27 10:36 ` Michal Hocko
2016-06-27 15:51 ` Oleg Nesterov
2016-06-27 16:06 ` Michal Hocko
2016-06-27 17:55 ` Oleg Nesterov
2016-06-28 10:19 ` Michal Hocko
2016-06-29 0:13 ` Oleg Nesterov
2016-06-29 8:33 ` Michal Hocko
2016-06-29 14:19 ` Michal Hocko
2016-07-01 10:15 ` Tetsuo Handa
2016-06-29 20:01 ` Oleg Nesterov
2016-06-30 7:59 ` Michal Hocko
2016-06-30 10:51 ` Tetsuo Handa
2016-06-30 11:21 ` Michal Hocko
2016-07-03 13:32 ` Oleg Nesterov
2016-07-03 13:21 ` Oleg Nesterov [this message]
2016-07-07 11:51 ` Michal Hocko
2016-07-07 16:42 ` Oleg Nesterov
2016-06-29 20:14 ` Oleg Nesterov
2016-06-30 8:07 ` Michal Hocko
2016-07-03 13:24 ` Oleg Nesterov
2016-06-27 21:09 ` Oleg Nesterov
2016-06-28 10:26 ` Michal Hocko
2016-06-29 19:34 ` Oleg Nesterov
2016-06-27 20:40 ` Oleg Nesterov
2016-06-28 10:29 ` Michal Hocko
2016-06-29 20:24 ` Oleg Nesterov
2016-06-30 8:16 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160703132147.GA28267@redhat.com \
--to=oleg@redhat.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
--cc=vdavydov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.