From: Oleg Nesterov <oleg@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org, Andrey Vagin <avagin@openvz.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 0/3] oom: TIF_MEMDIE/PF_EXITING fixes
Date: Sun, 13 Mar 2011 22:27:26 +0100 [thread overview]
Message-ID: <20110313212726.GA24530@redhat.com> (raw)
In-Reply-To: <AANLkTinHGSb2_jfkwx=Wjv96phzPCjBROfCTFCKi4Wey@mail.gmail.com>
Hi Hugh,
On 03/12, Hugh Dickins wrote:
>
> On Sat, Mar 12, 2011 at 5:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >>
> >> Also. Could you please look at the patches I sent?
> >>
> >> [PATCH 1/1] oom_kill_task: mark every thread as TIF_MEMDIE
> >> [PATCH v2 1/1] select_bad_process: improve the PF_EXITING check
> >
> > Cough. And both were not right, while_each_thread(p, t) needs the properly
> > initialized "t". At least I warned they were not tested ;)
> >
> >> Note also the note about "p == current" check. it should be fixed too.
> >
> > I am resending the fixes above plus the new one.
>
> I've spent much of the week building up to join in, but the more I
> look around, the more I find to say or investigate, and therefore
> never quite get to write the mail. Let this be a placeholder, that I
> probably disagree (in an amicable way!) with all of you, and maybe
> I'll finally manage to collect my thoughts into mail later today.
Thanks for looking into this!
Before I say anything else, I'd like to repeat that I am not going
to argue with the nack. The only reason I sent these patches is that
I can not understand David's patch at all. I mean, I can not even
understand the problems it should address. Yet I think the patch is
not right.
So I just tried to guess why this change helps, and suggest the
alternatives for review/testing.
> I guess my main point will be that TIF_MEMDIE serves a number of
> slightly different, even conflicting, purposes;
Yes. And let me repeat, I do not pretend understand it. However,
I bet the usage of TIF_MEMDIE is wrong.
> and one of those
> purposes, which present company seems to ignore repeatedly, is to
> serialize access to final reserves of memory
I don't quite understand "serialize" above. I hope you didn't mean
that only one thread can have TIF_MEMDIE to avoid the races...
But yes, I understand that "a lot" of TIF_MEMDIE thread can abuse
__alloc_pages_high_priority/etc. At least, I hope I understand ;)
- as a comment by Nick in
> select_bad_process() makes clear.
Oh. This comment is not clear at all.
* This task already has access to memory reserves and is
* being killed.
which task? it is quite possible that this task is already dead/released.
Or it is just exiting without memory allocations. This is group leader.
All we know is that it has task_struct, no more.
* Don't allow any other task access to the
* memory reserve.
Which other tasks? What about the sub-threads of the killed process?
It is quite possible that another thread triggered OOM and needs
TIF_MEMDIE to access the memory reserves.
And even this is not consistent. sysctl_oom_kill_allocating_task
does oom_kill_process(current) which may be non-leader.
oom_kill_process()->set_tsk_thread_flag(p, TIF_MEMDIE) is simply wrong.
The _trivial_ exploit (distinct from this one) can kill the system. And
worse! I showed this exploit many times (most probably off-list but all
were cc'ed). This was already fixed (iirc), and know I see we have it
again. This because almost every PF_EXITING check in oom_kill.c is wrong.
I'll return to this tomorrow.
> We _might_ choose to abandon that, but if so, it should be a decision,
> not an oversight. So I cannot blindly agree with just setting
> TIF_MEMDIE on more and more tasks,
OK, lets not do this.
> wonder if use of your find_lock_task_mm() in select_bad_process()
> might bring together my wish to continue serialization, David's wish
> to avoid stupid panics, and your wish to avoid deadlocks.
Hmm. Could you explain?
> but even after repeated skims of the ptrace manpage, I'll admit to
> not having a clue, nor the inclination to run and then debug it to
> find out the answer.
Ahh, sorry. I didn't explain what this test-case does... Because we
discussed this many times before, iirc.
> I don't even know if the double pthread_create is
> a vital part of the scheme
it is,
> so I assume it leaves a PF_EXITING around
> forever,
More precisely, it creates the PF_EXITING thread with ->mm != NULL.
It never goes away (unless you kill the tracer, of course).
> but I couldn't quite see how (with PF_EXITING being set after
> the tracehook_report_exit).
Yes. But note that it creates 2 threads, and the second one hangs
in exit_mm() before clearing ->mm while the 3rd thread can waits
for the 1st one which should enter exit_mm() and participate.
But this is not important, you can ignore the actual details.
> And I wonder if a similar case can be
> constructed to deadlock the for_each_process version of
> select_bad_process().
Unfortunately yes. This is documented in the changelog of 2/2.
We should fix the problems with the coredumps. Hmm, we already
had some patches, but they were forgotten/ignored. But at least
we shouldn't move back and assume that "PF_EXITING && mm" means
"we will have more memory soon".
I'll return tomorrow. At first glance, the new patch from David
has the same problem, but I am not sure.
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-03-13 21:36 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-01 19:09 [patch] oom: prevent unnecessary oom kills or kernel panics David Rientjes
2011-03-03 1:20 ` KOSAKI Motohiro
2011-03-03 19:53 ` David Rientjes
2011-03-06 11:14 ` KOSAKI Motohiro
2011-03-06 22:06 ` David Rientjes
2011-03-08 0:24 ` KOSAKI Motohiro
2011-03-08 2:01 ` KOSAKI Motohiro
2011-03-08 13:42 ` Oleg Nesterov
2011-03-08 23:57 ` David Rientjes
2011-03-09 10:36 ` KOSAKI Motohiro
2011-03-09 11:06 ` Oleg Nesterov
2011-03-09 20:32 ` David Rientjes
2011-03-10 12:05 ` Oleg Nesterov
2011-03-10 15:40 ` [PATCH 0/1] Was: " Oleg Nesterov
2011-03-10 15:41 ` [PATCH 1/1] oom_kill_task: mark every thread as TIF_MEMDIE Oleg Nesterov
2011-03-13 1:08 ` David Rientjes
2011-03-10 16:36 ` [PATCH 0/1] select_bad_process: improve the PF_EXITING check Oleg Nesterov
2011-03-10 16:37 ` [PATCH 1/1] " Oleg Nesterov
2011-03-10 16:40 ` [PATCH 0/1] " Oleg Nesterov
2011-03-10 17:18 ` [PATCH v2 " Oleg Nesterov
2011-03-10 17:19 ` [PATCH v2 1/1] " Oleg Nesterov
2011-03-13 1:06 ` [patch] oom: prevent unnecessary oom kills or kernel panics David Rientjes
2011-03-09 23:19 ` Andrew Morton
2011-03-11 19:45 ` David Rientjes
2011-03-12 12:34 ` Oleg Nesterov
2011-03-12 13:43 ` [PATCH 0/3] oom: TIF_MEMDIE/PF_EXITING fixes Oleg Nesterov
2011-03-12 13:44 ` [PATCH 1/3] oom: oom_kill_task: mark every thread as TIF_MEMDIE Oleg Nesterov
2011-03-13 1:14 ` David Rientjes
2011-03-12 13:44 ` [PATCH 2/3] oom: select_bad_process: improve the PF_EXITING check Oleg Nesterov
2011-03-12 13:44 ` [PATCH 3/3] oom: select_bad_process: use same_thread_group() Oleg Nesterov
2011-03-12 19:40 ` [PATCH 0/3] oom: TIF_MEMDIE/PF_EXITING fixes Hugh Dickins
2011-03-13 8:53 ` KOSAKI Motohiro
2011-03-13 21:27 ` Oleg Nesterov [this message]
2011-03-14 19:04 ` [PATCH 0/3 for 2.6.38] oom: fixes Oleg Nesterov
2011-03-14 19:04 ` Oleg Nesterov
2011-03-14 19:04 ` [PATCH 1/3 for 2.6.38] oom: oom_kill_process: don't set TIF_MEMDIE if !p->mm Oleg Nesterov
2011-03-14 19:04 ` Oleg Nesterov
2011-03-14 19:35 ` Linus Torvalds
2011-03-14 19:35 ` Linus Torvalds
2011-03-14 20:31 ` Oleg Nesterov
2011-03-14 20:31 ` Oleg Nesterov
2011-03-14 20:32 ` David Rientjes
2011-03-14 20:32 ` David Rientjes
2011-03-15 19:12 ` Oleg Nesterov
2011-03-15 19:12 ` Oleg Nesterov
2011-03-15 19:51 ` David Rientjes
2011-03-15 19:51 ` David Rientjes
2011-03-14 20:22 ` David Rientjes
2011-03-14 20:22 ` David Rientjes
2011-03-15 18:53 ` Oleg Nesterov
2011-03-15 18:53 ` Oleg Nesterov
2011-03-15 19:54 ` David Rientjes
2011-03-15 19:54 ` David Rientjes
2011-03-15 21:16 ` Oleg Nesterov
2011-03-15 21:16 ` Oleg Nesterov
2011-03-14 19:05 ` [PATCH 2/3 for 2.6.38] oom: select_bad_process: ignore TIF_MEMDIE zombies Oleg Nesterov
2011-03-14 19:05 ` Oleg Nesterov
2011-03-14 20:50 ` David Rientjes
2011-03-14 20:50 ` David Rientjes
2011-03-14 19:05 ` [PATCH 3/3 for 2.6.38] oom: oom_kill_process: fix the child_points logic Oleg Nesterov
2011-03-14 19:05 ` Oleg Nesterov
2011-03-14 20:41 ` David Rientjes
2011-03-14 20:41 ` David Rientjes
2011-03-15 19:21 ` Oleg Nesterov
2011-03-15 19:21 ` Oleg Nesterov
2011-03-13 11:36 ` [PATCH 0/3] oom: TIF_MEMDIE/PF_EXITING fixes KOSAKI Motohiro
2011-03-13 1:11 ` [patch] oom: prevent unnecessary oom kills or kernel panics David Rientjes
2011-03-13 1:15 ` [patch -mm] oom: avoid deferring oom killer if exiting task is being traced David Rientjes
2011-03-14 17:40 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110313212726.GA24530@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=avagin@openvz.org \
--cc=hughd@google.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.