All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Greg Thelen <gthelen@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v2] mm, compaction: avoid isolating pinned pages
Date: Thu, 6 Feb 2014 13:53:37 +0000	[thread overview]
Message-ID: <20140206135337.GV6732@suse.de> (raw)
In-Reply-To: <alpine.LSU.2.11.1402051647360.28926@eggly.anvils>

On Wed, Feb 05, 2014 at 05:16:06PM -0800, Hugh Dickins wrote:
> On Thu, 6 Feb 2014, Joonsoo Kim wrote:
> > On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> > > On Tue, 4 Feb 2014, David Rientjes wrote:
> > > 
> > > > Page migration will fail for memory that is pinned in memory with, for
> > > > example, get_user_pages().  In this case, it is unnecessary to take
> > > > zone->lru_lock or isolating the page and passing it to page migration
> > > > which will ultimately fail.
> > > > 
> > > > This is a racy check, the page can still change from under us, but in
> > > > that case we'll just fail later when attempting to move the page.
> > > > 
> > > > This avoids very expensive memory compaction when faulting transparent
> > > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > > 
> > > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > > see the enormous disparity in the number of page migration failures
> > > > because of the pinning (from /proc/vmstat):
> > > > 
> > > > 	compact_pages_moved 8450
> > > > 	compact_pagemigrate_failed 15614415
> > > > 
> > > > 0.05% of pages isolated are successfully migrated and explicitly 
> > > > triggering memory compaction takes 102 seconds.  After the patch:
> > > > 
> > > > 	compact_pages_moved 9197
> > > > 	compact_pagemigrate_failed 7
> > > > 
> > > > 99.9% of pages isolated are now successfully migrated in this 
> > > > configuration and memory compaction takes less than one second.
> > > > 
> > > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > > ---
> > > >  v2: address page count issue per Joonsoo
> > > > 
> > > >  mm/compaction.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > > --- a/mm/compaction.c
> > > > +++ b/mm/compaction.c
> > > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > > >  			continue;
> > > >  		}
> > > >  
> > > > +		/*
> > > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > > > +		 * admittedly racy check.
> > > > +		 */
> > > > +		if (!page_mapping(page) &&
> > > > +		    page_count(page) > page_mapcount(page))
> > > > +			continue;
> > > > +
> > > >  		/* Check if it is ok to still hold the lock */
> > > >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> > > 
> > > Much better, maybe good enough as an internal patch to fix a particular
> > > problem you're seeing; but not yet good enough to go upstream.
> > > 
> > > Anonymous pages are not the only pages which might be pinned,
> > > and your test doesn't mention PageAnon, so does not match your comment.
> > > 
> > > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> > > more assurance that a page_count - page_has_private test is appropriate,
> > > whatever the filesystem and migrate method to be used.
> > > 
> > > So I think the test you're looking for is
> > > 
> > > 		pincount = page_count(page) - page_mapcount(page);
> > > 		if (page_mapping(page))
> > > 			pincount -= 1 + page_has_private(page);
> > > 		if (pincount > 0)
> > > 			continue;
> > > 
> > > but please cross-check and test that out, it's easy to be off-by-one etc.
> > 
> > Hello, Hugh.
> > 
> > I don't think that this is right.
> > One of migratepage function, aio_migratepage(), pass extra count 1 to
> > migrate_page_move_mapping(). So it can be migrated when pincount == 1 in
> > above test.
> > 
> > I think that we should not be aggressive here. This is just for prediction
> > so that it is better not to skip apropriate pages at most. Just for anon case
> > that we are sure easily is the right solution for me.
> 
> Interesting, thank you for the pointer.  That's a pity!
> 
> I hope that later on we can modify fs/aio.c to set PagePrivate on
> ring pages, revert the extra argument to migrate_page_move_mapping(),
> and then let it appear the same as the other filesystems (but lacking
> a writepage, reclaim won't try to free the pages).
> 
> But that's "later on" and may prove impossible in the implementation.
> I agree it's beyond the scope of David's patch, and so only anonymous
> should be dealt with in this way at present.
> 
> And since page_mapping() is non-NULL on PageAnon PageSwapCache pages,
> those will fall through David's test and go on to try migration:
> which is the correct default.  Although we could add code to handle
> pinned swapcache, it would be rather an ugly excrescence, until the case
> gets handled naturally when proper page_mapping() support is added later.
> 
> Okay, to David's current patch
> Acked-by: Hugh Dickins <hughd@google.com>
> though I'd like to hear whether Mel is happy with it.
> 

I have nothing useful to add other than

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Greg Thelen <gthelen@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v2] mm, compaction: avoid isolating pinned pages
Date: Thu, 6 Feb 2014 13:53:37 +0000	[thread overview]
Message-ID: <20140206135337.GV6732@suse.de> (raw)
In-Reply-To: <alpine.LSU.2.11.1402051647360.28926@eggly.anvils>

On Wed, Feb 05, 2014 at 05:16:06PM -0800, Hugh Dickins wrote:
> On Thu, 6 Feb 2014, Joonsoo Kim wrote:
> > On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> > > On Tue, 4 Feb 2014, David Rientjes wrote:
> > > 
> > > > Page migration will fail for memory that is pinned in memory with, for
> > > > example, get_user_pages().  In this case, it is unnecessary to take
> > > > zone->lru_lock or isolating the page and passing it to page migration
> > > > which will ultimately fail.
> > > > 
> > > > This is a racy check, the page can still change from under us, but in
> > > > that case we'll just fail later when attempting to move the page.
> > > > 
> > > > This avoids very expensive memory compaction when faulting transparent
> > > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > > 
> > > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > > see the enormous disparity in the number of page migration failures
> > > > because of the pinning (from /proc/vmstat):
> > > > 
> > > > 	compact_pages_moved 8450
> > > > 	compact_pagemigrate_failed 15614415
> > > > 
> > > > 0.05% of pages isolated are successfully migrated and explicitly 
> > > > triggering memory compaction takes 102 seconds.  After the patch:
> > > > 
> > > > 	compact_pages_moved 9197
> > > > 	compact_pagemigrate_failed 7
> > > > 
> > > > 99.9% of pages isolated are now successfully migrated in this 
> > > > configuration and memory compaction takes less than one second.
> > > > 
> > > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > > ---
> > > >  v2: address page count issue per Joonsoo
> > > > 
> > > >  mm/compaction.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > > --- a/mm/compaction.c
> > > > +++ b/mm/compaction.c
> > > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > > >  			continue;
> > > >  		}
> > > >  
> > > > +		/*
> > > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > > > +		 * admittedly racy check.
> > > > +		 */
> > > > +		if (!page_mapping(page) &&
> > > > +		    page_count(page) > page_mapcount(page))
> > > > +			continue;
> > > > +
> > > >  		/* Check if it is ok to still hold the lock */
> > > >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> > > 
> > > Much better, maybe good enough as an internal patch to fix a particular
> > > problem you're seeing; but not yet good enough to go upstream.
> > > 
> > > Anonymous pages are not the only pages which might be pinned,
> > > and your test doesn't mention PageAnon, so does not match your comment.
> > > 
> > > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> > > more assurance that a page_count - page_has_private test is appropriate,
> > > whatever the filesystem and migrate method to be used.
> > > 
> > > So I think the test you're looking for is
> > > 
> > > 		pincount = page_count(page) - page_mapcount(page);
> > > 		if (page_mapping(page))
> > > 			pincount -= 1 + page_has_private(page);
> > > 		if (pincount > 0)
> > > 			continue;
> > > 
> > > but please cross-check and test that out, it's easy to be off-by-one etc.
> > 
> > Hello, Hugh.
> > 
> > I don't think that this is right.
> > One of migratepage function, aio_migratepage(), pass extra count 1 to
> > migrate_page_move_mapping(). So it can be migrated when pincount == 1 in
> > above test.
> > 
> > I think that we should not be aggressive here. This is just for prediction
> > so that it is better not to skip apropriate pages at most. Just for anon case
> > that we are sure easily is the right solution for me.
> 
> Interesting, thank you for the pointer.  That's a pity!
> 
> I hope that later on we can modify fs/aio.c to set PagePrivate on
> ring pages, revert the extra argument to migrate_page_move_mapping(),
> and then let it appear the same as the other filesystems (but lacking
> a writepage, reclaim won't try to free the pages).
> 
> But that's "later on" and may prove impossible in the implementation.
> I agree it's beyond the scope of David's patch, and so only anonymous
> should be dealt with in this way at present.
> 
> And since page_mapping() is non-NULL on PageAnon PageSwapCache pages,
> those will fall through David's test and go on to try migration:
> which is the correct default.  Although we could add code to handle
> pinned swapcache, it would be rather an ugly excrescence, until the case
> gets handled naturally when proper page_mapping() support is added later.
> 
> Okay, to David's current patch
> Acked-by: Hugh Dickins <hughd@google.com>
> though I'd like to hear whether Mel is happy with it.
> 

I have nothing useful to add other than

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2014-02-07  0:45 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-02  5:46 [patch] mm, compaction: avoid isolating pinned pages David Rientjes
2014-02-02  5:46 ` David Rientjes
2014-02-03  9:53 ` Mel Gorman
2014-02-03  9:53   ` Mel Gorman
2014-02-03 10:49   ` David Rientjes
2014-02-03 10:49     ` David Rientjes
2014-02-04  0:02     ` Joonsoo Kim
2014-02-04  0:02       ` Joonsoo Kim
2014-02-04  1:20       ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes
2014-02-04  1:20         ` David Rientjes
2014-02-04  1:53         ` Joonsoo Kim
2014-02-04  1:53           ` Joonsoo Kim
2014-02-04  2:00           ` David Rientjes
2014-02-04  2:00             ` David Rientjes
2014-02-04  2:15             ` Joonsoo Kim
2014-02-04  2:15               ` Joonsoo Kim
2014-02-04  2:50               ` David Rientjes
2014-02-04  2:50                 ` David Rientjes
2014-02-04  3:47                 ` Hugh Dickins
2014-02-04  3:47                   ` Hugh Dickins
2014-02-05  2:44                 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes
2014-02-05  2:44                   ` David Rientjes
2014-02-05 20:56                   ` Hugh Dickins
2014-02-05 20:56                     ` Hugh Dickins
2014-02-06  0:05                     ` Joonsoo Kim
2014-02-06  0:05                       ` Joonsoo Kim
2014-02-06  1:16                       ` Hugh Dickins
2014-02-06  1:16                         ` Hugh Dickins
2014-02-06 13:53                         ` Mel Gorman [this message]
2014-02-06 13:53                           ` Mel Gorman
2014-02-06 18:48                   ` Vlastimil Babka
2014-02-06 18:48                     ` Vlastimil Babka
2014-02-06 21:33                     ` David Rientjes
2014-02-06 21:33                       ` David Rientjes
2014-02-04  2:44     ` [patch] " Hugh Dickins
2014-02-04  2:44       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140206135337.GV6732@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=gthelen@google.com \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.