stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
       [not found]     ` <20170401064856.GA14971@kroah.com>
@ 2017-04-02  2:34       ` Doug Anderson
  2017-04-03 13:25         ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: Doug Anderson @ 2017-04-02  2:34 UTC (permalink / raw)
  To: Greg KH
  Cc: arve, riandrews, Todd Kjos, devel, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org

Hi,

On Fri, Mar 31, 2017 at 11:48 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
>> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> BTW: I presume that nobody has decided that it would be a wise idea to
>> pick the OOM reaper code back to any stable trees?  It seemed a bit
>> too scary to me, so I wrote a dumber (but easier to backport) solution
>> that avoided the deadlocks I was seeing.  http://crosreview.com/465189
>> and the 3 patches above it in case anyone else stumbles on this thread
>> and is curious.
>
> What specific upstream OOM patches are you referring to?  I'm always
> glad to review patches for stable kernels, just email
> stable@vger.kernel.org the git commit ids and we can take it from there.

+stable

I was wondering about the concept of porting the OOM Reaper back to
older kernels.  The OOM reaper was originally introduced in:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/oom_kill.c?id=aac453635549699c13a84ea1456d5b0e574ef855

Basically the problem described in that patch exists in many older
kernels and I've certainly seen crashes related to this in 3.10, but I
believe older kernels see the same problems too.

Personally I wouldn't know exactly which patches were important to
backport and how far to go.  One could arbitrarily try to backport up
to 4.6.7 (since 4.6 was the first kernel to really have the OOM
reaper) and ignore all the reaper fixes that landed since then.  This
would probably be doable for kernel 4.4, though if anyone was trying
to support older kernels it might get harder.


-Doug

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
  2017-04-02  2:34       ` [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read() Doug Anderson
@ 2017-04-03 13:25         ` Greg KH
  2017-04-03 18:09           ` Doug Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2017-04-03 13:25 UTC (permalink / raw)
  To: Doug Anderson
  Cc: arve, riandrews, Todd Kjos, devel, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org

On Sat, Apr 01, 2017 at 07:34:53PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Fri, Mar 31, 2017 at 11:48 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
> >> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> >> BTW: I presume that nobody has decided that it would be a wise idea to
> >> pick the OOM reaper code back to any stable trees?  It seemed a bit
> >> too scary to me, so I wrote a dumber (but easier to backport) solution
> >> that avoided the deadlocks I was seeing.  http://crosreview.com/465189
> >> and the 3 patches above it in case anyone else stumbles on this thread
> >> and is curious.
> >
> > What specific upstream OOM patches are you referring to?  I'm always
> > glad to review patches for stable kernels, just email
> > stable@vger.kernel.org the git commit ids and we can take it from there.
> 
> +stable
> 
> I was wondering about the concept of porting the OOM Reaper back to
> older kernels.  The OOM reaper was originally introduced in:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/oom_kill.c?id=aac453635549699c13a84ea1456d5b0e574ef855
> 
> Basically the problem described in that patch exists in many older
> kernels and I've certainly seen crashes related to this in 3.10, but I
> believe older kernels see the same problems too.
> 
> Personally I wouldn't know exactly which patches were important to
> backport and how far to go.  One could arbitrarily try to backport up
> to 4.6.7 (since 4.6 was the first kernel to really have the OOM
> reaper) and ignore all the reaper fixes that landed since then.  This
> would probably be doable for kernel 4.4, though if anyone was trying
> to support older kernels it might get harder.

Well, I would need someone to give me a list of commits, and actually
test it to see if it is something that people use/want before I can
queue anything up for a stable release...

{hint}

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
  2017-04-03 13:25         ` Greg KH
@ 2017-04-03 18:09           ` Doug Anderson
  2017-04-04  7:40             ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Doug Anderson @ 2017-04-03 18:09 UTC (permalink / raw)
  To: Greg KH
  Cc: arve, riandrews, Todd Kjos, devel, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, mhocko

Hi,

On Mon, Apr 3, 2017 at 6:25 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Sat, Apr 01, 2017 at 07:34:53PM -0700, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, Mar 31, 2017 at 11:48 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> > On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
>> >> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> >> BTW: I presume that nobody has decided that it would be a wise idea to
>> >> pick the OOM reaper code back to any stable trees?  It seemed a bit
>> >> too scary to me, so I wrote a dumber (but easier to backport) solution
>> >> that avoided the deadlocks I was seeing.  http://crosreview.com/465189
>> >> and the 3 patches above it in case anyone else stumbles on this thread
>> >> and is curious.
>> >
>> > What specific upstream OOM patches are you referring to?  I'm always
>> > glad to review patches for stable kernels, just email
>> > stable@vger.kernel.org the git commit ids and we can take it from there.
>>
>> +stable
>>
>> I was wondering about the concept of porting the OOM Reaper back to
>> older kernels.  The OOM reaper was originally introduced in:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/oom_kill.c?id=aac453635549699c13a84ea1456d5b0e574ef855
>>
>> Basically the problem described in that patch exists in many older
>> kernels and I've certainly seen crashes related to this in 3.10, but I
>> believe older kernels see the same problems too.
>>
>> Personally I wouldn't know exactly which patches were important to
>> backport and how far to go.  One could arbitrarily try to backport up
>> to 4.6.7 (since 4.6 was the first kernel to really have the OOM
>> reaper) and ignore all the reaper fixes that landed since then.  This
>> would probably be doable for kernel 4.4, though if anyone was trying
>> to support older kernels it might get harder.
>
> Well, I would need someone to give me a list of commits, and actually
> test it to see if it is something that people use/want before I can
> queue anything up for a stable release...
>
> {hint}

Here's a list of the patches between 4.5 and 4.6.7 that touch the oom_killer:

af8e15cc85a2 oom, oom_reaper: do not enqueue task if it is on the
oom_reaper_list head
bb29902a7515 oom, oom_reaper: protect oom_reaper_list using simpler way
e26796066fdf oom: make oom_reaper freezable
29c696e1c6ec oom: make oom_reaper_list single linked
855b01832573 oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task
03049269de43 mm, oom_reaper: implement OOM victims queuing
bc448e897b6d mm, oom_reaper: report success/failure
36324a990cf5 oom: clear TIF_MEMDIE after oom_reaper managed to unmap
the address space
aac453635549 mm, oom: introduce oom reaper
6a618957ad17 mm: oom_kill: don't ignore oom score on exiting tasks
6afcf2895e6f mm,oom: make oom_killer_disable() killable

69b27baf00fa sched: add schedule_timeout_idle()


A few of those are code cleanups or are fixing OOM killer bugs
unrelated to the problem at hand (OOM killer getting stuck forever
because a task won't die), but it may still make sense to take them
just to get the oom killer into a consistent / known state (AKA
4.6.7).


The problem (and the reason I was so hesitant to provide a list of
patches) is that I personally am not a expert on mm.  Thus:

1. I don't know if there are any subtle dependencies on other "mm"
patches.  I can pick the patches above to a 4.4 kernel with only minor
conflicts (plus a fix not to look at MM_SHMEMPAGES, which didn't exist
in 4.4) and they seem to work OK, but with "mm" I always am worried
about minor changes in some other bit of mm code that might be needed
to make all the corner cases work right.

2. There are plenty of other fixes to the OOM reaper in 4.7+.  For
instance this patch:

e2fe14564d33 oom_reaper: close race with exiting task

References a "Fixes" for the original patch that introduced the OOM
reaper, but there are 11 other patches between 4.6 and 4.7 that also
sound like they fix some important bugs and I just don't know if they
are important things to bring back to linux stable or not...  Then
another 13 patches between 4.7 and 4.8.

Maybe +Michal Hocko would have some opinions of which OOM Reaper
patches would be good for picking into linux stable?


-Doug

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
  2017-04-03 18:09           ` Doug Anderson
@ 2017-04-04  7:40             ` Michal Hocko
  2017-04-04 15:10               ` Doug Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2017-04-04  7:40 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg KH, arve, riandrews, Todd Kjos, devel,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org

On Mon 03-04-17 11:09:32, Doug Anderson wrote:
[...]
> Maybe +Michal Hocko would have some opinions of which OOM Reaper
> patches would be good for picking into linux stable?

I am lacking context here but please note that the OOM reaper patches
alone are not sufficient to make the OOM handling lockup free. There
were quite some changes in the core OOM killer handling to make this
possible. This has been work throughout the last year and it will be
really non-trivial to backport to stable trees.

So the primary question is what are you trying to achieve?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
  2017-04-04  7:40             ` Michal Hocko
@ 2017-04-04 15:10               ` Doug Anderson
  2017-04-04 15:29                 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Doug Anderson @ 2017-04-04 15:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg KH, arve, riandrews, Todd Kjos, devel,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org

Hi,

On Tue, Apr 4, 2017 at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 03-04-17 11:09:32, Doug Anderson wrote:
> [...]
>> Maybe +Michal Hocko would have some opinions of which OOM Reaper
>> patches would be good for picking into linux stable?
>
> I am lacking context here but please note that the OOM reaper patches
> alone are not sufficient to make the OOM handling lockup free. There
> were quite some changes in the core OOM killer handling to make this
> possible. This has been work throughout the last year and it will be
> really non-trivial to backport to stable trees.

Yeah, that was the impression I got.


> So the primary question is what are you trying to achieve?

Ideally it would be nice to bring back patches to make the OOM
handling lockup free.  However, presuming that's not feasible then at
least it would be nice if we could get some minimal set of patches
that:

- Isn't too scary to backport.
- Handles the low hanging fruit.
- Is fairly self contained.

For Chrome OS we've got devices on a number of different kernels
ranging from 3.8 - 4.4.  Due to changes in userspace, it appears much
more likely that we wedge the OOM killer these days, so my main goal
is to avoid this in most cases.

Right now for Chrome OS I have patches that look like this:

* https://chromium-review.googlesource.com/465186
  UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks

* https://chromium-review.googlesource.com/465187
  CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if
we still ...

* https://chromium-review.googlesource.com/465188
  CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

* https://chromium-review.googlesource.com/465189
  CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

...and those seem to fit the bill for us, but:

1. It would be nice if other users of linuxstable could benefit.

2. I know the above patches are not as ideal as the work that has
happened upstream, so of course I'd prefer to get the upstream
solution.

3. I always appreciate being closer to the upstream solution which
means we get more people looking at the code and more people testing
the code.


-Doug

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
  2017-04-04 15:10               ` Doug Anderson
@ 2017-04-04 15:29                 ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2017-04-04 15:29 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg KH, arve, riandrews, Todd Kjos, devel,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org

On Tue 04-04-17 08:10:03, Doug Anderson wrote:
> Hi,
> 
> On Tue, Apr 4, 2017 at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > So the primary question is what are you trying to achieve?
> 
> Ideally it would be nice to bring back patches to make the OOM
> handling lockup free.  However, presuming that's not feasible then at
> least it would be nice if we could get some minimal set of patches
> that:
> 
> - Isn't too scary to backport.
> - Handles the low hanging fruit.
> - Is fairly self contained.

oom_reaper as introduced by aac4536355496 and 4 follow up patches should
be a good yet not too scary to start.

> 1. It would be nice if other users of linuxstable could benefit.

well most of the oom fixes are rather subtle and it took quite some time
to do the properly so I am not really sure we want to take risk. I would
rather handle those issues and backport specific fixes for the issue.
 
> 2. I know the above patches are not as ideal as the work that has
> happened upstream, so of course I'd prefer to get the upstream
> solution.
> 
> 3. I always appreciate being closer to the upstream solution which
> means we get more people looking at the code and more people testing
> the code.

I can hardly help you more than offer to use the upstream code. I fully
realize that this is hard and I am facing similar issues with enterprise
kernel @Suse but so far I have seen OOMs behaving mostly OK except for
extreme cases which usually do not happen enough to take the risk and
backport non-trivial changes to the stable trees.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-04 15:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20170331175341.19889-1-dianders@chromium.org>
     [not found] ` <20170331192944.GB9744@kroah.com>
     [not found]   ` <CAD=FV=X39vVZ0v2_Z5rDrJt1JcdW+N8jRkhc0bz4yqqtox2=4Q@mail.gmail.com>
     [not found]     ` <20170401064856.GA14971@kroah.com>
2017-04-02  2:34       ` [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read() Doug Anderson
2017-04-03 13:25         ` Greg KH
2017-04-03 18:09           ` Doug Anderson
2017-04-04  7:40             ` Michal Hocko
2017-04-04 15:10               ` Doug Anderson
2017-04-04 15:29                 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).