Re: [PATCH v2 -mm] limit direct reclaim for higher order allocations

From: Johannes Weiner <jweiner@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mel Gorman <mgorman@suse.de>,
	akpm@linux-foundation.org, Johannes Weiner <hannes@cmpxchg.org>,
	aarcange@redhat.com
Subject: Re: [PATCH v2 -mm] limit direct reclaim for higher order allocations
Date: Fri, 7 Oct 2011 11:07:41 +0200
Message-ID: <20111007090741.GB2608@redhat.com>
In-Reply-To: <20110927160648.GA16878@redhat.com>

On Tue, Sep 27, 2011 at 06:06:48PM +0200, Johannes Weiner wrote:
> On Tue, Sep 27, 2011 at 10:52:46AM -0400, Rik van Riel wrote:
> > When suffering from memory fragmentation due to unfreeable pages,
> > THP page faults will repeatedly try to compact memory.  Due to
> > the unfreeable pages, compaction fails.
> > 
> > Needless to say, at that point page reclaim also fails to create
> > free contiguous 2MB areas.  However, that doesn't stop the current
> > code from trying, over and over again, and freeing a minimum of
> > 4MB (2UL << sc->order pages) at every single invocation.
> > 
> > This resulted in my 12GB system having 2-3GB free memory, a
> > corresponding amount of used swap and very sluggish response times.
> > 
> > This can be avoided by having the direct reclaim code not reclaim
> > from zones that already have plenty of free memory available for
> > compaction.
> > 
> > If compaction still fails due to unmovable memory, doing additional
> > reclaim will only hurt the system, not help.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > 
> > ---
> > -v2: shrink_zones now uses the same thresholds as used by compaction itself,
> >      not only is this conceptually nicer, it also results in kswapd doing
> >      some actual work; before all the page freeing work was done by THP
> >      allocators, I seem to see fewer application stalls after this change.
> > 
> >  mm/vmscan.c |   10 ++++++++++
> >  1 files changed, 10 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index b7719ec..117eb4d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2083,6 +2083,16 @@ static void shrink_zones(int priority, struct zonelist *zonelist,
> >  				continue;
> >  			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
> >  				continue;	/* Let kswapd poll it */
> > +			if (COMPACTION_BUILD) {
> > +				/*
> > +				 * If we already have plenty of memory free
> > +				 * for compaction, don't free any more.
> > +				 */
> > +				if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> > +					(compaction_suitable(zone, sc->order) ||
> > +					 compaction_deferred(zone)))
> > +					continue;
> > +			}
> 
> I don't think the comment is complete in combination with the check
> for order > PAGE_ALLOC_COSTLY_ORDER, as compaction is invoked for all
> non-zero orders.
> 
> But the traditional behaviour does less harm if the orders are small
> and your problem was triggered by THP allocations, so I agree with the
> code itself.
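
For reference, the 4MB figure in the quoted changelog follows from the
minimum reclaim target of (2UL << sc->order) pages.  A minimal
userspace sketch of that arithmetic, assuming 4KB pages and a THP
order of 9 (typical for x86-64, but not stated in this thread);
min_reclaim_bytes() and SKETCH_PAGE_SIZE are hypothetical names, not
kernel symbols:

```c
/* Assumed page size: 4KB, as on x86-64. Not stated in the thread. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: bytes targeted by one direct reclaim
 * invocation when the minimum is (2UL << order) pages, as in the
 * quoted changelog.
 */
static unsigned long min_reclaim_bytes(unsigned int order)
{
	return (2UL << order) * SKETCH_PAGE_SIZE;
}
```

Under those assumptions, min_reclaim_bytes(9) comes to 1024 pages of
4KB each, which is the 4MB per invocation mentioned above.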

FWIW, an incremental patch to explain the order check.  What do you
think?

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/vmscan.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3817fa9..930085a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2068,8 +2068,14 @@ static void shrink_zones(int priority, struct zonelist *zonelist,
 				continue;	/* Let kswapd poll it */
 			if (COMPACTION_BUILD) {
 				/*
-				 * If we already have plenty of memory free
-				 * for compaction, don't free any more.
+				 * If we already have plenty of memory
+				 * free for compaction, don't free any
+				 * more.  Even though compaction is
+				 * invoked for any non-zero order,
+				 * only frequent costly order
+				 * reclamation is disruptive enough to
+				 * become a noticeable problem, like
+				 * transparent huge page allocations.
 				 */
 				if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
 					(compaction_suitable(zone, sc->order) ||
-- 
1.7.6.4
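
The condition the comment documents can be modelled outside the kernel
roughly as follows.  This is only a sketch: struct zone_state and
skip_direct_reclaim() are hypothetical stand-ins, the two booleans
simplify the results of compaction_suitable(zone, sc->order) and
compaction_deferred(zone) to the truth values the patch tests, and
SKETCH_COSTLY_ORDER mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER
value of 3.

```c
#include <stdbool.h>

/* Mirrors PAGE_ALLOC_COSTLY_ORDER (3 in the kernel source). */
#define SKETCH_COSTLY_ORDER 3

/* Hypothetical stand-in for the per-zone state the patch consults. */
struct zone_state {
	bool compaction_suitable;	/* models compaction_suitable(zone, order) */
	bool compaction_deferred;	/* models compaction_deferred(zone) */
};

/*
 * Returns true when direct reclaim would skip this zone: only for
 * costly orders, and only when compaction already has enough free
 * memory to work with or has been deferred for the zone.
 */
static bool skip_direct_reclaim(int order, const struct zone_state *z)
{
	if (order <= SKETCH_COSTLY_ORDER)
		return false;	/* small orders keep the traditional behaviour */
	return z->compaction_suitable || z->compaction_deferred;
}
```

With this model, an order-9 (THP-sized) request skips a zone where
compaction is suitable or deferred, while an order-3 request never
skips, matching the "order > PAGE_ALLOC_COSTLY_ORDER" check.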

