All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [Question] Should direct reclaim time be bounded?
Date: Wed, 3 Jul 2019 10:43:25 +0100	[thread overview]
Message-ID: <20190703094325.GB2737@techsingularity.net> (raw)
In-Reply-To: <80036eed-993d-1d24-7ab6-e495f01b1caa@oracle.com>

On Mon, Jul 01, 2019 at 08:15:50PM -0700, Mike Kravetz wrote:
> On 7/1/19 1:59 AM, Mel Gorman wrote:
> > On Fri, Jun 28, 2019 at 11:20:42AM -0700, Mike Kravetz wrote:
> >> On 4/24/19 7:35 AM, Vlastimil Babka wrote:
> >>> On 4/23/19 6:39 PM, Mike Kravetz wrote:
> >>>>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It
> >>>>> looks like there is something wrong in the reclaim going on.
> >>>>
> >>>> Ok, I will start digging into that.  Just wanted to make sure before I got
> >>>> into it too deep.
> >>>>
> >>>> BTW - This is very easy to reproduce.  Just try to allocate more huge pages
> >>>> than will fit into memory.  I see this 'reclaim taking forever' behavior on
> >>>> v5.1-rc5-mmotm-2019-04-19-14-53.  Looks like it was there in v5.0 as well.
> >>>
> >>> I'd suspect this in should_continue_reclaim():
> >>>
> >>>         /* Consider stopping depending on scan and reclaim activity */
> >>>         if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
> >>>                 /*
> >>>                  * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the
> >>>                  * full LRU list has been scanned and we are still failing
> >>>                  * to reclaim pages. This full LRU scan is potentially
> >>>                  * expensive but a __GFP_RETRY_MAYFAIL caller really wants to succeed
> >>>                  */
> >>>                 if (!nr_reclaimed && !nr_scanned)
> >>>                         return false;
> >>>
> >>> And that for some reason, nr_scanned never becomes zero. But it's hard
> >>> to figure out through all the layers of functions :/
> >>
> >> I got back to looking into the direct reclaim/compaction stalls when
> >> trying to allocate huge pages.  As previously mentioned, the code is
> >> looping for a long time in shrink_node().  The routine
> >> should_continue_reclaim() returns true perhaps more often than it should.
> >>
> >> As Vlastmil guessed, my debug code output below shows nr_scanned is remaining
> >> non-zero for quite a while.  This was on v5.2-rc6.
> >>
> > 
> > I think it would be reasonable to have should_continue_reclaim allow an
> > exit if scanning at higher priority than DEF_PRIORITY - 2, nr_scanned is
> > less than SWAP_CLUSTER_MAX and no pages are being reclaimed.
> 
> Thanks Mel,
> 
> I added such a check to should_continue_reclaim.  However, it does not
> address the issue I am seeing.  In that do-while loop in shrink_node,
> the scan priority is not raised (priority--).  We can enter the loop
> with priority == DEF_PRIORITY and continue to loop for minutes as seen
> in my previous debug output.
> 

Indeed. I'm getting knocked offline shortly so I didn't give this the
time it deserves but it appears that part of this problem is
hugetlb-specific when one node is full and can enter into this continual
loop due to __GFP_RETRY_MAYFAIL requiring both nr_reclaimed and
nr_scanned to be zero.

Have you considered one of the following as an option?

1. Always use the on-stack nodes_allowed in __nr_hugepages_store_common
   and copy nodes_states if necessary. Add a bool parameter to
   alloc_pool_huge_page that is true when called from set_max_huge_pages.
   If an allocation from alloc_fresh_huge_page, clear the failing node
   from the mask so it's not retried, bail if the mask is empty. The
   consequences are that round-robin allocation of huge pages will be
   different if a node failed to allocate for transient reasons.

2. Alter the condition in should_continue_reclaim for
   __GFP_RETRY_MAYFAIL to consider if nr_scanned < SWAP_CLUSTER_MAX.
   Either raise priority (will interfere with kswapd though) or
   bail entirely.  Consequences may be that other __GFP_RETRY_MAYFAIL
   allocations do not want this behaviour. There are a lot of users.

3. Move where __GFP_RETRY_MAYFAIL is set in a gfp_mask in mm/hugetlb.c.
   Strip the flag if an allocation fails on a node. Consequences are
   that setting the required number of huge pages is more likely to
   return without all the huge pages set.

-- 
Mel Gorman
SUSE Labs


  reply	other threads:[~2019-07-03  9:43 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-23  4:07 [Question] Should direct reclaim time be bounded? Mike Kravetz
2019-04-23  7:19 ` Michal Hocko
2019-04-23 16:39   ` Mike Kravetz
2019-04-24 14:35     ` Vlastimil Babka
2019-06-28 18:20       ` Mike Kravetz
2019-07-01  8:59         ` Mel Gorman
2019-07-02  3:15           ` Mike Kravetz
2019-07-08  5:19             ` Hillf Danton
2019-07-03  9:43             ` Mel Gorman [this message]
2019-07-03 23:54               ` Mike Kravetz
2019-07-04 11:09                 ` Michal Hocko
2019-07-04 15:11                   ` Mike Kravetz
2019-07-10 18:42             ` Mike Kravetz
2019-07-10 19:44               ` Michal Hocko
2019-07-10 23:36                 ` Mike Kravetz
2019-07-11  7:12                   ` Michal Hocko
2019-07-12  9:49                     ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2019-07-11 15:44 Hillf Danton
2019-07-12  5:47 Hillf Danton
2019-07-13  1:11 ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190703094325.GB2737@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=aarcange@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.