Re: [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Zi Yan <zi.yan@cs.rutgers.edu>, Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs
Date: Thu, 22 Nov 2018 15:04:46 +0000	[thread overview]
Message-ID: <20181122150446.GK23260@techsingularity.net> (raw)
In-Reply-To: <cc8ec820-1526-d753-4619-dedaa227a179@suse.cz>

On Thu, Nov 22, 2018 at 02:53:08PM +0100, Vlastimil Babka wrote:
> > 1-socket Skylake machine
> > config-global-dhp__workload_thpfioscale XFS (no special madvise)
> > 4 fio threads, 1 THP allocating thread
> > --------------------------------------
> > 
> > 4.20-rc1 extfrag events < order 9:  1023463
> > 4.20-rc1+patch:                      358574 (65% reduction)
> > 4.20-rc1+patch1-3:                    19274 (98% reduction)
> 
> So the reason I was wondering about movable vs unmovable fallbacks here
> is that movable fallbacks are ok as they can be migrated later, but the
> unmovable/reclaimable not, which is bad if they fallback to movable
> pageblock. Movable fallbacks can however fill the unmovable pageblocks
> and increase change of the unmovable fallback, but that would depend on
> the workload. So hypothetically if the test workload was such that
> movable fallbacks did not cause unmovable fallbacks, and a patch would
> thus only decrease the movable fallbacks (at the cost of e.g. higher
> reclaim, as this patch) with unmovable fallbacks unchanged, then it
> would be useful to know that for better evaluation of the pros vs cons,
> imho.
> 

I can give the breakdown in the next changelog as it'll be similar for
each instance of the workload.

Movable fallbacks are ok in that they can fallback but not ok in that
they can fill an unmovable/reclaimable pageblock causing them to
fallback later. I think you understand this already. If there is a
movable pageblock, it is pretty much guaranteed to affect an
unmovable/reclaimable pageblock and while it might not be enough to
actually cause a unmovable/reclaimable fallback in the future, we cannot
know that in advance so the patch takes the only option available to it.

In terms of reclaim, what I've observed for a few workloads is that
reclaim is different but not necessarily worse. With the patch, reclaim
is roughly similar overall but at a smoother rate. The vanilla kernel
tends to spike with large amounts of reclaim at semi-regular intervals
where as this path has a small amount of reclaim done steadily over
time.

Now I did encounter a bug whereby fio reduced throughput because the
boosted reclaim also reclaimed slab which caused secondary issues that
were unrelated to the fragmentation pattern. This will be fixed in the
next version.

While it does leave open the possibilty that a slab-intensive workload
occupying lots of memory will still cause fragmentation, that is a
different class of problem and would be approached differently. That
particular problem is not covered by this approach but it does not exclude
it either as the full solution would have to encompass both.

> > +static inline void boost_watermark(struct zone *zone)
> > +{
> > +	unsigned long max_boost;
> > +
> > +	if (!watermark_boost_factor)
> > +		return;
> > +
> > +	max_boost = mult_frac(wmark_pages(zone, WMARK_HIGH),
> > +			watermark_boost_factor, 10000);
> 
> Hmm I assume you didn't use high_wmark_pages() because the calculation
> should start with high watermark not including existing boost. But then,
> wmark_pages() also includes existing boost, so the limit won't work and
> each invocation of boost_watermark() will simply add pageblock_nr_pages?
> I.e. this should use zone->_watermark[] instead of wmark_pages()?
> 

You're right. The consequences are that it boosts by higher than
expected. I'll fix it.

> > +	max_boost = max(pageblock_nr_pages, max_boost);
> > +
> > +	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> > +		max_boost);
> > +}
> > +
> >  /*
> >   * This function implements actual steal behaviour. If order is large enough,
> >   * we can steal whole pageblock. If not, we first move freepages in this
> > @@ -2160,6 +2176,14 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
> >  		goto single_page;
> >  	}
> >  
> > +	/*
> > +	 * Boost watermarks to increase reclaim pressure to reduce the
> > +	 * likelihood of future fallbacks. Wake kswapd now as the node
> > +	 * may be balanced overall and kswapd will not wake naturally.
> > +	 */
> > +	boost_watermark(zone);
> > +	wakeup_kswapd(zone, 0, 0, zone_idx(zone));
> > +
> >  	/* We are not allowed to try stealing from the whole block */
> >  	if (!whole_block)
> >  		goto single_page;
> > @@ -3277,11 +3301,19 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
> >   * probably too small. It only makes sense to spread allocations to avoid
> >   * fragmentation between the Normal and DMA32 zones.
> >   */
> > -static inline unsigned int alloc_flags_nofragment(struct zone *zone)
> > +static inline unsigned int
> > +alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> >  {
> >  	if (zone_idx(zone) != ZONE_NORMAL)
> >  		return 0;
> >  
> > +	/*
> > +	 * A fragmenting fallback will try waking kswapd. ALLOC_NOFRAGMENT
> > +	 * may break that so such callers can introduce fragmentation.
> > +	 */
> 
> I think I don't understand this comment :( Do you want to avoid waking
> up kswapd from steal_suitable_fallback() (introduced above) for
> allocations without __GFP_KSWAPD_RECLAIM? But returning 0 here means
> actually allowing the allocation go through steal_suitable_fallback()?
> So should it return ALLOC_NOFRAGMENT below, or was the intent different?
> 

I want to avoid waking kswapd in steal_suitable_fallback if waking
kswapd is not allowed. If the calling context does not allow it, it does
mean that fragmentation will be allowed to occur. I'm banking on it
being a relatively rare case but potentially it'll be problematic. The
main source of allocation requests that I expect to hit this are THP and
as they are already at pageblock_order, it has limited impact from a
fragmentation perspective -- particularly as pageblock_order stealing is
allowed even with ALLOC_NOFRAGMENT.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2018-11-22 15:04 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-21 10:14 [PATCH 0/4] Fragmentation avoidance improvements v4 Mel Gorman
2018-11-21 10:14 ` [PATCH 1/4] mm, page_alloc: Spread allocations across zones before introducing fragmentation Mel Gorman
2018-11-21 14:18   ` Vlastimil Babka
2018-11-21 14:31     ` Mel Gorman
2018-11-21 10:14 ` [PATCH 2/4] mm: Move zone watermark accesses behind an accessor Mel Gorman
2018-11-21 22:07   ` Vlastimil Babka
2018-11-21 10:14 ` [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs Mel Gorman
2018-11-22 13:53   ` Vlastimil Babka
2018-11-22 15:04     ` Mel Gorman [this message]
2018-11-22 15:35       ` Vlastimil Babka
2018-11-22 16:22         ` Mel Gorman
2018-11-21 10:14 ` [PATCH 4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event Mel Gorman
2018-11-22 17:02   ` Vlastimil Babka
2018-11-22 19:10     ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2018-11-08  9:12 [PATCH 0/4] Fragmentation avoidance improvements v3 Mel Gorman
2018-11-08  9:12 ` [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181122150446.GK23260@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).