From: Michal Hocko <mhocko@kernel.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-mm@kvack.org, vdavydov@virtuozzo.com, rientjes@google.com
Subject: Re: [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE
Date: Thu, 30 Jun 2016 09:59:04 +0200
Message-ID: <20160630075904.GC18783@dhcp22.suse.cz>
In-Reply-To: <20160629200108.GA19253@redhat.com>

On Wed 29-06-16 22:01:08, Oleg Nesterov wrote:
> On 06/29, Michal Hocko wrote:
> >
> > > > +void mark_oom_victim(struct task_struct *tsk, struct mm_struct *mm)
> > > >  {
> > > >  	WARN_ON(oom_killer_disabled);
> > > >  	/* OOM killer might race with memcg OOM */
> > > >  	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> > > >  		return;
> > > > +
> > > >  	atomic_inc(&tsk->signal->oom_victims);
> > > > +
> > > > +	/* oom_mm is bound to the signal struct life time */
> > > > +	if (!tsk->signal->oom_mm) {
> > > > +		atomic_inc(&mm->mm_count);
> > > > +		tsk->signal->oom_mm = mm;
> > >
> > > Looks racy, but it is not because we rely on oom_lock? Perhaps a comment
> > > makes sense.
> >
> > mark_oom_victim will be called only for the current or under the
> > task_lock so it should be stable. Except for...
> 
> I meant that the code looks racy because 2 threads can see ->oom_mm == NULL
> at the same time and in this case we have the extra atomic_inc(mm_count).
> But I guess oom_lock saves us, so the code is correct but not clear.

I have changed that to cmpxchg because the lowmemory killer is called
outside of oom_lock.
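
To illustrate why a compare-and-exchange closes the window Oleg describes,
here is a minimal userspace sketch. It is not the actual patch: C11 atomics
stand in for the kernel's cmpxchg() and atomic_inc(), and the names
mm_model, signal_model and set_oom_mm_once are hypothetical. Only one
caller can install the pointer, so only one extra reference survives even
without a common lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for mm_struct and signal_struct. */
struct mm_model {
	atomic_int mm_count;			/* reference count pinning the mm */
};

struct signal_model {
	_Atomic(struct mm_model *) oom_mm;	/* set once, then left alone */
};

/*
 * Pin mm and publish it as sig->oom_mm unless another caller won the race.
 * Returns 1 if this caller installed the pointer, 0 otherwise.
 */
static int set_oom_mm_once(struct signal_model *sig, struct mm_model *mm)
{
	struct mm_model *expected = NULL;

	assert(mm != NULL);

	/* Take the reference first so the pointer is never visible unpinned. */
	atomic_fetch_add(&mm->mm_count, 1);
	if (atomic_compare_exchange_strong(&sig->oom_mm, &expected, mm))
		return 1;

	/* Lost the race: drop the reference taken speculatively. */
	atomic_fetch_sub(&mm->mm_count, 1);
	return 0;
}
```

In this model a second call on the same signal_model leaves both the
pointer and the reference count unchanged, which is the property the
plain "if (!oom_mm)" check cannot guarantee outside of oom_lock.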

> > > > @@ -838,8 +826,8 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > > >  	 * If the task is already exiting, don't alarm the sysadmin or kill
> > > >  	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > >  	 */
> > > > -	if (task_will_free_mem(p)) {
> > > > -		mark_oom_victim(p);
> > > > +	if (mm && task_will_free_mem(p)) {
> > > > +		mark_oom_victim(p, mm);
> >
> > This one. I didn't bother to cover it for the example patch but I have a
> > plan to address that. There are two possible ways. One is to pin
> > mm_count in oom_badness() so that we have a guarantee that it will not
> 
> I thought about this too. And I think that select_bad_process() should even
> return mm_struct or at least a task_lock'ed task for the start.

Yes, that would be the plan if I pinned the mm struct in oom_badness. I
ended up using task_lock around task_will_free_mem so it should be good
for now. Let's see whether we can be more clever about that later.

> > > And this looks really racy at first glance. Suppose that this memory hog execs
> > > (this changes its ->mm) and then exits so that task_will_free_mem() == T, in
> > > this case "mm" has nothing to do with tsk->mm and it can be already freed.
> >
> > Hmm, I didn't think about exec case. And I guess we have never cared
> > about that race. We just select a task and then kill it.
> 
> And I guess we want to fix this too, although this is not that important,
> but this looks like a minor security problem.

I am not sure I can see the security implications, but I agree this is
less than optimal, albeit not critical. Killing a young process which
hasn't had much time to do useful work doesn't seem that critical. It
would be much better to kill the real holder of the mm though!

> And this is another indication that almost everything oom-kill.c does with
> task_struct is wrong ;) Ideally It should only use task_struct to send the
> SIGKILL, and now that we kill all users of victim->mm we can hopefully do
> this later.

Hmm, so you think we should do s@victim@mm_victim@ and then run the
for_each_process loop to kill all the tasks sharing that mm? We are
doing that already so it doesn't sound that bad...
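
Roughly what I have in mind, as a userspace sketch rather than kernel
code: the victim is identified by its mm, and the task list is only walked
to deliver the kill. The names task_model, mm_model and kill_mm_users are
hypothetical, an array stands in for for_each_process(), and a flag stands
in for sending SIGKILL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct; only pointer identity matters here. */
struct mm_model {
	int id;
};

struct task_model {
	struct mm_model *mm;	/* NULL models a task without a user mm */
	int killed;		/* stands in for having been sent SIGKILL */
};

/* Mark every task that uses victim_mm as killed and return how many. */
static size_t kill_mm_users(struct task_model *tasks, size_t n,
			    const struct mm_model *victim_mm)
{
	size_t killed = 0;

	assert(victim_mm != NULL);

	for (size_t i = 0; i < n; i++) {
		/* Tasks with a different mm, or none at all, are skipped. */
		if (tasks[i].mm != victim_mm)
			continue;
		tasks[i].killed = 1;
		killed++;
	}
	return killed;
}
```

The point of the sketch is that nothing depends on which task was picked
first; the only thing the selection has to hand over is the mm.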

> Btw, do we still need this list_for_each_entry(child, &t->children, sibling)
> loop in oom_kill_process() ?

Well, to be honest, I don't know. This is a heuristic we have been using
for a long time. I do not know how often it really matters. It can even
be harmful in loads where children are created at the same pace as the
OOM killer is killing them. Not sure how likely that is though...
Let me think whether we can do something about that.
-- 
Michal Hocko
SUSE Labs
