From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Colin Cross <ccross@google.com>
Cc: David Rientjes <rientjes@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
werner <w.landgraf@ru.ru>, Rik van Riel <riel@redhat.com>,
Hugh Dickins <hughd@google.com>,
linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
Rabin Vincent <rabin.vincent@stericsson.com>,
Christian Bejram <christian.bejram@stericsson.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Anton Vorontsov <anton.vorontsov@linaro.org>,
stable@vger.kernel.org
Subject: Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
Date: Mon, 9 Apr 2012 15:21:20 -0700 [thread overview]
Message-ID: <20120409222120.GA14149@kroah.com> (raw)
In-Reply-To: <CAMbhsRRUNVn9_eL_obZxcZv7LqDW_8SeRApdKzDePbaW_Y4uvg@mail.gmail.com>
On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
> On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
> > On Mon, 9 Apr 2012, Linus Torvalds wrote:
> >
> >> The real bug is actually that those notifiers are a f*cking joke, and
> >> the return value from the notifier is a mistake.
> >>
> >> So I personally think that the real problem is this code in
> >> profile_handoff_task:
> >>
> >> return (ret == NOTIFY_OK) ? 1 : 0;
> >>
> >> and ask yourself two questions:
> >>
> >> - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> >> - what happens if there are multiple notifiers that all (or some)
> >> return NOTIFY_OK?
> >>
> > NOTIFY_OK should never be a valid response for this notifier the way it's
> > currently implemented, it should be NOTIFY_STOP to stop iterating the call
> > chain to avoid a double free. Right now it doesn't matter because only
> > oprofile is actually freeing the task_struct and lowmemorykiller should be
> > using NOTIFY_DONE.
> >
> > Then we have a completeness issue if multiple callbacks want to return
> > NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> > before lowmemorykiller.
> >
> >> I'll tell you what my answers are:
> >>
> >> (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> >> task-struct". It's also what that handoff notifier thing returns if
> >> there are no notifiers registered at all.
> >>
> >> So the fix to the Android lowmemorykiller is as simple as just
> >> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> >> will properly free the task struct.
> >>
> >
> > I don't think so for Werner's config who also has CONFIG_OPROFILE=y, so
> > oprofile would return NOTIFY_OK and queue the task_struct for free, then
> > the second notifier callback to the lowmemorykiller would return
> > NOTIFY_DONE which would result in put_task_struct() doing free_task()
> > itself for a double free.
> >
> >> The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> >> "NOTIFY_OK means that I will free the task myself later". That's what
> >> the oprofile uses, and it frees the task.
> >>
> >> (b) But the whole interface is a total f*cking mess. If *multiple*
> >> people return NOTIFY_OK, they're royally fucked. And the whole (and
> >> only) point of notifiers is that you can register multiple different
> >> ones independently.
> >>
> >> So quite frankly, the *real* bug is not in that android driver
> >> (although I'd say that we should just make it return NOTIFY_DONE and
> >> be done with it). The real bug is that the whole f*cking notifier is a
> >> mistake, and checking the error return was the biggest mistake of all.
> >>
> >
> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier. It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
> >
> > That's _if_ we want to continue to have such an interface in the first
> > place where it's only really necessary right now for oprofile (and, hence,
> > wasn't implemented in an extendable way). I'm thinking the
> > lowmemorykiller, as I eluded to, could be written in a way where we can
> > detect if a thread we've already killed has exited yet before killing
> > another one. We can't just store a pointer to the task_struct of the
> > killed task since it could be reused for a fork later, but we could use
> > TIF_MEMDIE like the oom killer does.
>
> This was a known issue in 2010, in the android tree the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook. I assume Greg dropped
> the fix during the android tree refresh in 3.0 because it depended on
> a change to kernel/fork.c. The two relevant patches are (using
> codeaurora's gitweb becase we don't have one right now):
>
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
Yes, I can't add a patch like that for this driver, that is why I
thought everyone was getting together to "properly" determine how to
solve this oom notifier problem. Has that work stalled somwhere?
greg k-h
WARNING: multiple messages have this Message-ID (diff)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Colin Cross <ccross@google.com>
Cc: David Rientjes <rientjes@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
werner <w.landgraf@ru.ru>, Rik van Riel <riel@redhat.com>,
Hugh Dickins <hughd@google.com>,
linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
Rabin Vincent <rabin.vincent@stericsson.com>,
Christian Bejram <christian.bejram@stericsson.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Anton Vorontsov <anton.vorontsov@linaro.org>,
stable@vger.kernel.org
Subject: Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
Date: Mon, 9 Apr 2012 15:21:20 -0700 [thread overview]
Message-ID: <20120409222120.GA14149@kroah.com> (raw)
In-Reply-To: <CAMbhsRRUNVn9_eL_obZxcZv7LqDW_8SeRApdKzDePbaW_Y4uvg@mail.gmail.com>
On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
> On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
> > On Mon, 9 Apr 2012, Linus Torvalds wrote:
> >
> >> The real bug is actually that those notifiers are a f*cking joke, and
> >> the return value from the notifier is a mistake.
> >>
> >> So I personally think that the real problem is this code in
> >> profile_handoff_task:
> >>
> >> � � � � return (ret == NOTIFY_OK) ? 1 : 0;
> >>
> >> and ask yourself two questions:
> >>
> >> �- what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> >> �- what happens if there are multiple notifiers that all (or some)
> >> return NOTIFY_OK?
> >>
> > NOTIFY_OK should never be a valid response for this notifier the way it's
> > currently implemented, it should be NOTIFY_STOP to stop iterating the call
> > chain to avoid a double free. �Right now it doesn't matter because only
> > oprofile is actually freeing the task_struct and lowmemorykiller should be
> > using NOTIFY_DONE.
> >
> > Then we have a completeness issue if multiple callbacks want to return
> > NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> > before lowmemorykiller.
> >
> >> I'll tell you what my answers are:
> >>
> >> �(a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> >> task-struct". It's also what that handoff notifier thing returns if
> >> there are no notifiers registered at all.
> >>
> >> � � �So the fix to the Android lowmemorykiller is as simple as just
> >> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> >> will properly free the task struct.
> >>
> >
> > I don't think so for Werner's config who also has CONFIG_OPROFILE=y, so
> > oprofile would return NOTIFY_OK and queue the task_struct for free, then
> > the second notifier callback to the lowmemorykiller would return
> > NOTIFY_DONE which would result in put_task_struct() doing free_task()
> > itself for a double free.
> >
> >> � � �The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> >> "NOTIFY_OK means that I will free the task myself later". That's what
> >> the oprofile uses, and it frees the task.
> >>
> >> �(b) But the whole interface is a total f*cking mess. If *multiple*
> >> people return NOTIFY_OK, they're royally fucked. And the whole (and
> >> only) point of notifiers is that you can register multiple different
> >> ones independently.
> >>
> >> So quite frankly, the *real* bug is not in that android driver
> >> (although I'd say that we should just make it return NOTIFY_DONE and
> >> be done with it). The real bug is that the whole f*cking notifier is a
> >> mistake, and checking the error return was the biggest mistake of all.
> >>
> >
> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier. �It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
> >
> > That's _if_ we want to continue to have such an interface in the first
> > place where it's only really necessary right now for oprofile (and, hence,
> > wasn't implemented in an extendable way). �I'm thinking the
> > lowmemorykiller, as I eluded to, could be written in a way where we can
> > detect if a thread we've already killed has exited yet before killing
> > another one. �We can't just store a pointer to the task_struct of the
> > killed task since it could be reused for a fork later, but we could use
> > TIF_MEMDIE like the oom killer does.
>
> This was a known issue in 2010, in the android tree the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook. I assume Greg dropped
> the fix during the android tree refresh in 3.0 because it depended on
> a change to kernel/fork.c. The two relevant patches are (using
> codeaurora's gitweb becase we don't have one right now):
>
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
Yes, I can't add a patch like that for this driver, that is why I
thought everyone was getting together to "properly" determine how to
solve this oom notifier problem. Has that work stalled somwhere?
greg k-h
next prev parent reply other threads:[~2012-04-09 22:21 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-09 2:42 v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs) Linus Torvalds
2012-04-09 2:50 ` Andrew Morton
2012-04-09 3:11 ` Linus Torvalds
2012-04-09 7:04 ` Sven Joachim
2012-04-09 15:24 ` Linus Torvalds
2012-04-09 15:43 ` Sven Joachim
2012-04-09 15:57 ` Rik van Riel
2012-04-09 16:19 ` Sven Joachim
2012-04-09 16:33 ` Rik van Riel
2012-04-09 17:00 ` Pekka Enberg
2012-04-09 17:19 ` Sven Joachim
2012-04-09 17:00 ` Sven Joachim
2012-04-09 17:20 ` Rik van Riel
2012-04-09 10:15 ` David Rientjes
2012-04-09 15:39 ` Linus Torvalds
2012-04-09 21:22 ` David Rientjes
2012-04-09 22:09 ` Linus Torvalds
2012-04-09 23:25 ` David Rientjes
2012-04-09 23:55 ` Linus Torvalds
2012-04-09 23:55 ` Linus Torvalds
2012-04-10 0:04 ` David Rientjes
2012-04-10 0:04 ` David Rientjes
2012-04-14 20:50 ` Srivatsa S. Bhat
2012-04-09 23:56 ` [patch] android, lowmemorykiller: remove task handoff notifier David Rientjes
2012-04-10 1:23 ` Colin Cross
2012-04-10 1:23 ` Colin Cross
[not found] ` <web-723076709@zbackend1.aha.ru>
[not found] ` <alpine.DEB.2.00.1204091637280.21813@chino.kir.corp.google.com>
[not found] ` <web-723082731@zbackend1.aha.ru>
[not found] ` <alpine.DEB.2.00.1204091707580.21813@chino.kir.corp.google.com>
2012-04-10 7:09 ` v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs) werner
2012-04-10 7:10 ` werner
2012-04-09 22:13 ` Colin Cross
2012-04-09 22:13 ` Colin Cross
2012-04-09 22:21 ` Greg Kroah-Hartman [this message]
2012-04-09 22:21 ` Greg Kroah-Hartman
2012-04-09 22:44 ` john stultz
2012-04-09 22:44 ` john stultz
2012-04-09 22:30 ` Linus Torvalds
2012-04-09 23:37 ` David Rientjes
2012-04-10 0:23 ` Colin Cross
2012-04-10 0:23 ` Colin Cross
2012-04-10 0:32 ` David Rientjes
2012-04-10 1:21 ` Colin Cross
2012-04-10 1:21 ` Colin Cross
2012-04-10 1:33 ` David Rientjes
2012-04-10 1:37 ` Colin Cross
-- strict thread matches above, loose matches on Subject: below --
2012-04-09 6:52 werner
2012-04-09 7:01 werner
2012-04-10 1:52 werner
2012-04-10 1:51 ` Rik van Riel
2012-04-10 2:13 ` werner
2012-04-10 12:53 werner
2012-04-14 19:38 werner
2012-04-14 19:58 ` Rik van Riel
2012-04-14 21:03 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120409222120.GA14149@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=anton.vorontsov@linaro.org \
--cc=ccross@google.com \
--cc=christian.bejram@stericsson.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=rabin.vincent@stericsson.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=w.landgraf@ru.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.