All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: yuwang668899@gmail.com, mhocko@suse.com, linux-mm@kvack.org,
	chenggang.qcg@alibaba-inc.com, yuwang.yuwang@alibaba-inc.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
Date: Fri, 15 Sep 2017 11:44:49 -0700	[thread overview]
Message-ID: <20170915184449.GA9859@cmpxchg.org> (raw)
In-Reply-To: <201709160023.CAE05229.MQHFSJFOOFOVtL@I-love.SAKURA.ne.jp>

On Sat, Sep 16, 2017 at 12:23:53AM +0900, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > How can we figure out if there is a bug here? Can we time the calls to
> > __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
> > drill down from there? Print out the number of times we have retried?
> > We're counting no_progress_loops, but we are also very much interested
> > in progress_loops that didn't result in a successful allocation. Too
> > many of those and I think we want to OOM kill as per above.
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bec5e96f3b88..01736596389a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  			"page allocation stalls for %ums, order:%u",
> >  			jiffies_to_msecs(jiffies-alloc_start), order);
> >  		stall_timeout += 10 * HZ;
> > +		goto oom;
> >  	}
> >  
> >  	/* Avoid recursion of direct reclaim */
> > @@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	if (read_mems_allowed_retry(cpuset_mems_cookie))
> >  		goto retry_cpuset;
> >  
> > +oom:
> >  	/* Reclaim has failed us, start killing things */
> >  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
> >  	if (page)
> > 
> 
> According to my stress tests, it is mutex_trylock() in __alloc_pages_may_oom()
> that causes warn_alloc() to be called for so many times. The comment
> 
> 	/*
> 	 * Acquire the oom lock.  If that fails, somebody else is
> 	 * making progress for us.
> 	 */
> 
> is true only if the owner of oom_lock can call out_of_memory() and is __GFP_FS
> allocation. Consider a situation where there are 1 GFP_KERNEL allocating thread
> and 99 GFP_NOFS/GFP_NOIO allocating threads contending the oom_lock. How likely
> the OOM killer is invoked? It is very unlikely because GFP_KERNEL allocating thread
> likely fails to grab oom_lock because GFP_NOFS/GFP_NOIO allocating threads is
> grabing oom_lock. And GFP_KERNEL allocating thread yields CPU time for
> GFP_NOFS/GFP_NOIO allocating threads to waste pointlessly.
> s/!mutex_trylock(&oom_lock)/mutex_lock_killable()/ significantly improves
> this situation for my stress tests. How is your case?

Interesting analysis, that definitely sounds plausible.

It just started happening to us in production and I haven't isolated
it yet. If you already have a reproducer, that's excellent.

The synchronization has worked this way for a long time (trylock
failure assuming progress, but the order/NOFS/zone bailouts from
actually OOM-killing inside the locked section). We should really fix
*that* rather than serializing warn_alloc().

For GFP_NOFS, it seems to go back to 9879de7373fc ("mm: page_alloc:
embed OOM killing naturally into allocation slowpath"). Before that we
didn't use to call __alloc_pages_may_oom() for NOFS allocations. So I
still wonder why this only now appears to be causing problems.

In any case, converting that trylock to a sleeping lock in this case
makes sense to me. Nobody is blocking under this lock (except that one
schedule_timeout_killable(1) after dispatching a victim) and it's not
obvious to me why we'd need that level of concurrency under OOM.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-09-15 18:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-15  9:58 [PATCH] mm,page_alloc: softlockup on warn_alloc on wang Yu
2017-09-15 10:39 ` Michal Hocko
2017-09-15 11:38   ` Tetsuo Handa
2017-09-15 12:00     ` Michal Hocko
2017-09-15 12:09       ` Tetsuo Handa
2017-09-15 12:14         ` Michal Hocko
2017-09-15 14:12           ` Tetsuo Handa
2017-09-15 14:23             ` Michal Hocko
2017-09-24  1:56             ` Tetsuo Handa
2017-09-15 14:37 ` Johannes Weiner
2017-09-15 15:23   ` Tetsuo Handa
2017-09-15 18:44     ` Johannes Weiner [this message]
2017-09-16  0:25       ` Tetsuo Handa
2017-09-18  6:05         ` Michal Hocko
2017-09-18  6:31           ` Tetsuo Handa
2017-09-18  6:43             ` Michal Hocko
2017-09-16  4:12   ` Tetsuo Handa
2017-10-11 11:14     ` Tetsuo Handa
2017-10-18 10:54       ` Tetsuo Handa
2017-09-18  6:03   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170915184449.GA9859@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=chenggang.qcg@alibaba-inc.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=yuwang.yuwang@alibaba-inc.com \
    --cc=yuwang668899@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.