Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: mhocko@kernel.org, hannes@cmpxchg.org, riel@redhat.com,
	akpm@linux-foundation.org, mgorman@suse.de, vbabka@suse.cz,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	rientjes@google.com
Subject: Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever
Date: Mon, 24 Apr 2017 14:39:40 +0200	[thread overview]
Message-ID: <20170424123936.GA6152@redhat.com> (raw)
In-Reply-To: <201704231924.GDF05718.LQSMtJOOFOFHFV@I-love.SAKURA.ne.jp>

On Sun, Apr 23, 2017 at 07:24:21PM +0900, Tetsuo Handa wrote:
> On 2017/03/10 20:44, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> >> On Thu 09-03-17 13:05:40, Johannes Weiner wrote:
> >>> On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote:
> >>>> It only does this to some extent.  If reclaim made
> >>>> no progress, for example due to immediately bailing
> >>>> out because the number of already isolated pages is
> >>>> too high (due to many parallel reclaimers), the code
> >>>> could hit the "no_progress_loops > MAX_RECLAIM_RETRIES"
> >>>> test without ever looking at the number of reclaimable
> >>>> pages.
> >>>
> >>> Hm, there is no early return there, actually. We bump the loop counter
> >>> every time it happens, but then *do* look at the reclaimable pages.
> >>>
> >>>> Could that create problems if we have many concurrent
> >>>> reclaimers?
> >>>
> >>> With increased concurrency, the likelihood of OOM will go up if we
> >>> remove the unlimited wait for isolated pages, that much is true.
> >>>
> >>> I'm not sure that's a bad thing, however, because we want the OOM
> >>> killer to be predictable and timely. So a reasonable wait time in
> >>> between 0 and forever before an allocating thread gives up under
> >>> extreme concurrency makes sense to me.
> >>>
> >>>> It may be OK, I just do not understand all the implications.
> >>>>
> >>>> I like the general direction your patch takes the code in,
> >>>> but I would like to understand it better...
> >>>
> >>> I feel the same way. The throttling logic doesn't seem to be very well
> >>> thought out at the moment, making it hard to reason about what happens
> >>> in certain scenarios.
> >>>
> >>> In that sense, this patch isn't really an overall improvement to the
> >>> way things work. It patches a hole that seems to be exploitable only
> >>> from an artificial OOM torture test, at the risk of regressing high
> >>> concurrency workloads that may or may not be artificial.
> >>>
> >>> Unless I'm mistaken, there doesn't seem to be a whole lot of urgency
> >>> behind this patch. Can we think about a general model to deal with
> >>> allocation concurrency? 
> >>
> >> I am definitely not against. There is no reason to rush the patch in.
> > 
> > I don't hurry if we can check using watchdog whether this problem is occurring
> > in the real world. I have to test corner cases because watchdog is missing.
> > 
> Ping?
> 
> This problem can occur even immediately after the first invocation of
> the OOM killer. I believe this problem can occur in the real world.
> When are we going to apply this patch or watchdog patch?
> 
> ----------------------------------------
> [    0.000000] Linux version 4.11.0-rc7-next-20170421+ (root@ccsecurity) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #588 SMP Sun Apr 23 17:38:02 JST 2017
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.11.0-rc7-next-20170421+ root=UUID=17c3c28f-a70a-4666-95fa-ecf6acd901e4 ro vconsole.keymap=jp106 crashkernel=256M vconsole.font=latarcyrheb-sun16 security=none sysrq_always_enabled console=ttyS0,115200n8 console=tty0 LANG=en_US.UTF-8 debug_guardpage_minorder=1

Are you debugging memory corruption problem?

FWIW, if you use debug_guardpage_minorder= you can expect any
allocation memory problems. This option is intended to debug
memory corruption bugs and it shrinks available memory in 
artificial way. Taking that, I don't think justifying any
patch, by problem happened when debug_guardpage_minorder= is 
used, is reasonable.
 
Stanislaw

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: mhocko@kernel.org, hannes@cmpxchg.org, riel@redhat.com,
	akpm@linux-foundation.org, mgorman@suse.de, vbabka@suse.cz,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	rientjes@google.com
Subject: Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever
Date: Mon, 24 Apr 2017 14:39:40 +0200	[thread overview]
Message-ID: <20170424123936.GA6152@redhat.com> (raw)
In-Reply-To: <201704231924.GDF05718.LQSMtJOOFOFHFV@I-love.SAKURA.ne.jp>

On Sun, Apr 23, 2017 at 07:24:21PM +0900, Tetsuo Handa wrote:
> On 2017/03/10 20:44, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> >> On Thu 09-03-17 13:05:40, Johannes Weiner wrote:
> >>> On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote:
> >>>> It only does this to some extent.  If reclaim made
> >>>> no progress, for example due to immediately bailing
> >>>> out because the number of already isolated pages is
> >>>> too high (due to many parallel reclaimers), the code
> >>>> could hit the "no_progress_loops > MAX_RECLAIM_RETRIES"
> >>>> test without ever looking at the number of reclaimable
> >>>> pages.
> >>>
> >>> Hm, there is no early return there, actually. We bump the loop counter
> >>> every time it happens, but then *do* look at the reclaimable pages.
> >>>
> >>>> Could that create problems if we have many concurrent
> >>>> reclaimers?
> >>>
> >>> With increased concurrency, the likelihood of OOM will go up if we
> >>> remove the unlimited wait for isolated pages, that much is true.
> >>>
> >>> I'm not sure that's a bad thing, however, because we want the OOM
> >>> killer to be predictable and timely. So a reasonable wait time in
> >>> between 0 and forever before an allocating thread gives up under
> >>> extreme concurrency makes sense to me.
> >>>
> >>>> It may be OK, I just do not understand all the implications.
> >>>>
> >>>> I like the general direction your patch takes the code in,
> >>>> but I would like to understand it better...
> >>>
> >>> I feel the same way. The throttling logic doesn't seem to be very well
> >>> thought out at the moment, making it hard to reason about what happens
> >>> in certain scenarios.
> >>>
> >>> In that sense, this patch isn't really an overall improvement to the
> >>> way things work. It patches a hole that seems to be exploitable only
> >>> from an artificial OOM torture test, at the risk of regressing high
> >>> concurrency workloads that may or may not be artificial.
> >>>
> >>> Unless I'm mistaken, there doesn't seem to be a whole lot of urgency
> >>> behind this patch. Can we think about a general model to deal with
> >>> allocation concurrency? 
> >>
> >> I am definitely not against. There is no reason to rush the patch in.
> > 
> > I don't hurry if we can check using watchdog whether this problem is occurring
> > in the real world. I have to test corner cases because watchdog is missing.
> > 
> Ping?
> 
> This problem can occur even immediately after the first invocation of
> the OOM killer. I believe this problem can occur in the real world.
> When are we going to apply this patch or watchdog patch?
> 
> ----------------------------------------
> [    0.000000] Linux version 4.11.0-rc7-next-20170421+ (root@ccsecurity) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #588 SMP Sun Apr 23 17:38:02 JST 2017
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.11.0-rc7-next-20170421+ root=UUID=17c3c28f-a70a-4666-95fa-ecf6acd901e4 ro vconsole.keymap=jp106 crashkernel=256M vconsole.font=latarcyrheb-sun16 security=none sysrq_always_enabled console=ttyS0,115200n8 console=tty0 LANG=en_US.UTF-8 debug_guardpage_minorder=1

Are you debugging memory corruption problem?

FWIW, if you use debug_guardpage_minorder= you can expect any
allocation memory problems. This option is intended to debug
memory corruption bugs and it shrinks available memory in 
artificial way. Taking that, I don't think justifying any
patch, by problem happened when debug_guardpage_minorder= is 
used, is reasonable.
 
Stanislaw

next prev parent reply	other threads:[~2017-04-24 12:39 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-07 13:30 [PATCH] mm, vmscan: do not loop on too_many_isolated for ever Michal Hocko
2017-03-07 13:30 ` Michal Hocko
2017-03-07 19:52 ` Rik van Riel
2017-03-08  9:21   ` Michal Hocko
2017-03-08  9:21     ` Michal Hocko
2017-03-08 15:54     ` Rik van Riel
2017-03-08 15:54       ` Rik van Riel
2017-03-09  9:12       ` Michal Hocko
2017-03-09  9:12         ` Michal Hocko
2017-03-09 14:16         ` Rik van Riel
2017-03-09 14:59           ` Michal Hocko
2017-03-09 14:59             ` Michal Hocko
2017-03-09 18:05   ` Johannes Weiner
2017-03-09 18:05     ` Johannes Weiner
2017-03-09 22:18     ` Rik van Riel
2017-03-10 10:27       ` Michal Hocko
2017-03-10 10:27         ` Michal Hocko
2017-03-10 10:20     ` Michal Hocko
2017-03-10 10:20       ` Michal Hocko
2017-03-10 11:44       ` Tetsuo Handa
2017-03-10 11:44         ` Tetsuo Handa
2017-03-21 10:37         ` Tetsuo Handa
2017-03-21 10:37           ` Tetsuo Handa
2017-04-23 10:24         ` Tetsuo Handa
2017-04-23 10:24           ` Tetsuo Handa
2017-04-24 12:39           ` Stanislaw Gruszka [this message]
2017-04-24 12:39             ` Stanislaw Gruszka
2017-04-24 13:06             ` Tetsuo Handa
2017-04-24 13:06               ` Tetsuo Handa
2017-04-25  6:33               ` Stanislaw Gruszka
2017-04-25  6:33                 ` Stanislaw Gruszka
2017-06-30  0:14         ` Tetsuo Handa
2017-06-30  0:14           ` Tetsuo Handa
2017-06-30 13:32           ` Michal Hocko
2017-06-30 13:32             ` Michal Hocko
2017-06-30 15:59             ` Tetsuo Handa
2017-06-30 15:59               ` Tetsuo Handa
2017-06-30 16:19               ` Michal Hocko
2017-06-30 16:19                 ` Michal Hocko
2017-07-01 11:43                 ` Tetsuo Handa
2017-07-01 11:43                   ` Tetsuo Handa
2017-07-05  8:19                   ` Michal Hocko
2017-07-05  8:19                     ` Michal Hocko
2017-07-05  8:20                   ` Michal Hocko
2017-07-05  8:20                     ` Michal Hocko
2017-07-06 10:48                     ` Tetsuo Handa
2017-07-06 10:48                       ` Tetsuo Handa
2017-03-09 14:31 ` Mel Gorman
2017-03-09 14:31   ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2017-07-10  7:48 Michal Hocko
2017-07-10  7:48 ` Michal Hocko
2017-07-10 13:16 ` Vlastimil Babka
2017-07-10 13:16   ` Vlastimil Babka
2017-07-10 13:58 ` Rik van Riel
2017-07-10 13:58   ` Rik van Riel
2017-07-10 16:58   ` Johannes Weiner
2017-07-10 16:58     ` Johannes Weiner
2017-07-10 17:09     ` Michal Hocko
2017-07-10 17:09       ` Michal Hocko
2017-07-19 22:20 ` Andrew Morton
2017-07-19 22:20   ` Andrew Morton
2017-07-20  6:56   ` Michal Hocko
2017-07-20  6:56     ` Michal Hocko
2017-07-21 23:01     ` Andrew Morton
2017-07-21 23:01       ` Andrew Morton
2017-07-24  6:50       ` Michal Hocko
2017-07-24  6:50         ` Michal Hocko
2017-07-20  1:54 ` Hugh Dickins
2017-07-20  1:54   ` Hugh Dickins
2017-07-20 10:44   ` Tetsuo Handa
2017-07-20 10:44     ` Tetsuo Handa
2017-07-24  7:01     ` Hugh Dickins
2017-07-24  7:01       ` Hugh Dickins
2017-07-24 11:12       ` Tetsuo Handa
2017-07-24 11:12         ` Tetsuo Handa
2017-07-20 13:22   ` Michal Hocko
2017-07-20 13:22     ` Michal Hocko
2017-07-24  7:03     ` Hugh Dickins
2017-07-24  7:03       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170424123936.GA6152@redhat.com \
    --to=sgruszka@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.