[patch] mm, compaction: avoid isolating pinned pages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [patch] mm, compaction: avoid isolating pinned pages
@ 2014-02-02  5:46 David Rientjes
  2014-02-03  9:53 ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: David Rientjes @ 2014-02-02  5:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Rik van Riel, linux-kernel, linux-mm

Page migration will fail for memory that is pinned in memory with, for
example, get_user_pages().  In this case, it is unnecessary to take
zone->lru_lock or isolating the page and passing it to page migration
which will ultimately fail.

This is a racy check, the page can still change from under us, but in
that case we'll just fail later when attempting to move the page.

This avoids very expensive memory compaction when faulting transparent
hugepages after pinning a lot of memory with a Mellanox driver.

On a 128GB machine and pinning ~120GB of memory, before this patch we
see the enormous disparity in the number of page migration failures
because of the pinning (from /proc/vmstat):

compact_blocks_moved 7609
compact_pages_moved 3431
compact_pagemigrate_failed 133219
compact_stall 13

After the patch, it is much more efficient:

compact_blocks_moved 7998
compact_pages_moved 6403
compact_pagemigrate_failed 3
compact_stall 15

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
+		/*
+		 * Migration will fail if an anonymous page is pinned in memory,
+		 * so avoid taking zone->lru_lock and isolating it unnecessarily
+		 * in an admittedly racy check.
+		 */
+		if (!page_mapping(page) && page_count(page))
+			continue;
+
 		/* Check if it is ok to still hold the lock */
 		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
 								locked, cc);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages
  2014-02-02  5:46 [patch] mm, compaction: avoid isolating pinned pages David Rientjes
@ 2014-02-03  9:53 ` Mel Gorman
  2014-02-03 10:49   ` David Rientjes
  0 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2014-02-03  9:53 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Sat, Feb 01, 2014 at 09:46:26PM -0800, David Rientjes wrote:
> Page migration will fail for memory that is pinned in memory with, for
> example, get_user_pages().  In this case, it is unnecessary to take
> zone->lru_lock or isolating the page and passing it to page migration
> which will ultimately fail.
> 
> This is a racy check, the page can still change from under us, but in
> that case we'll just fail later when attempting to move the page.
> 
> This avoids very expensive memory compaction when faulting transparent
> hugepages after pinning a lot of memory with a Mellanox driver.
> 
> On a 128GB machine and pinning ~120GB of memory, before this patch we
> see the enormous disparity in the number of page migration failures
> because of the pinning (from /proc/vmstat):
> 
> compact_blocks_moved 7609
> compact_pages_moved 3431
> compact_pagemigrate_failed 133219
> compact_stall 13
> 
> After the patch, it is much more efficient:
> 
> compact_blocks_moved 7998
> compact_pages_moved 6403
> compact_pagemigrate_failed 3
> compact_stall 15
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  			continue;
>  		}
>  
> +		/*
> +		 * Migration will fail if an anonymous page is pinned in memory,
> +		 * so avoid taking zone->lru_lock and isolating it unnecessarily
> +		 * in an admittedly racy check.
> +		 */
> +		if (!page_mapping(page) && page_count(page))
> +			continue;
> +

Are you sure about this? The page_count check migration does is this

        int expected_count = 1 + extra_count;
        if (!mapping) {
                if (page_count(page) != expected_count)
                        return -EAGAIN;
                return MIGRATEPAGE_SUCCESS;
        }

        spin_lock_irq(&mapping->tree_lock);

        pslot = radix_tree_lookup_slot(&mapping->page_tree,
                                        page_index(page));

        expected_count += 1 + page_has_private(page);

Migration expects and can migrate pages with no mapping and a page count
but you are now skipping them. I think you may have intended to split
migrations page count into a helper or copy the logic.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages
  2014-02-03  9:53 ` Mel Gorman
@ 2014-02-03 10:49   ` David Rientjes
  2014-02-04  0:02     ` Joonsoo Kim
  2014-02-04  2:44     ` [patch] " Hugh Dickins
  0 siblings, 2 replies; 18+ messages in thread
From: David Rientjes @ 2014-02-03 10:49 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Mon, 3 Feb 2014, Mel Gorman wrote:

> > Page migration will fail for memory that is pinned in memory with, for
> > example, get_user_pages().  In this case, it is unnecessary to take
> > zone->lru_lock or isolating the page and passing it to page migration
> > which will ultimately fail.
> > 
> > This is a racy check, the page can still change from under us, but in
> > that case we'll just fail later when attempting to move the page.
> > 
> > This avoids very expensive memory compaction when faulting transparent
> > hugepages after pinning a lot of memory with a Mellanox driver.
> > 
> > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > see the enormous disparity in the number of page migration failures
> > because of the pinning (from /proc/vmstat):
> > 
> > compact_blocks_moved 7609
> > compact_pages_moved 3431
> > compact_pagemigrate_failed 133219
> > compact_stall 13
> > 
> > After the patch, it is much more efficient:
> > 
> > compact_blocks_moved 7998
> > compact_pages_moved 6403
> > compact_pagemigrate_failed 3
> > compact_stall 15
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  mm/compaction.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> >  			continue;
> >  		}
> >  
> > +		/*
> > +		 * Migration will fail if an anonymous page is pinned in memory,
> > +		 * so avoid taking zone->lru_lock and isolating it unnecessarily
> > +		 * in an admittedly racy check.
> > +		 */
> > +		if (!page_mapping(page) && page_count(page))
> > +			continue;
> > +
> 
> Are you sure about this? The page_count check migration does is this
> 
>         int expected_count = 1 + extra_count;
>         if (!mapping) {
>                 if (page_count(page) != expected_count)
>                         return -EAGAIN;
>                 return MIGRATEPAGE_SUCCESS;
>         }
> 
>         spin_lock_irq(&mapping->tree_lock);
> 
>         pslot = radix_tree_lookup_slot(&mapping->page_tree,
>                                         page_index(page));
> 
>         expected_count += 1 + page_has_private(page);
> 
> Migration expects and can migrate pages with no mapping and a page count
> but you are now skipping them. I think you may have intended to split
> migrations page count into a helper or copy the logic.
> 

Thanks for taking a look!

The patch is correct, it just shows my lack of a complete commit message 
which I'm struggling with recently.  In the case that this is addressing, 
get_user_pages() already gives page_count(page) == 1, then 
__isolate_lru_page() does another get_page() that is dropped in 
putback_lru_page() after the call into migrate_pages().  So in the code 
you quote above we always have page_count(page) == 2 and
expected_count == 1.

So what we desperately need to do is avoid isolating any page where 
page_count(page) is non-zero and !page_mapping(page) and do that before 
the get_page() in __isolate_lru_page() because we want to avoid taking 
zone->lru_lock.  On my 128GB machine filled with ~120GB of pinned memory 
for the driver, this lock gets highly contended under compaction and even 
reclaim if the rest of userspace is using a lot of memory.

It's not really relevant to the commit message, but I found that if all 
that ~120GB is faulted and I manually invoke compaction with the procfs 
trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets 
taken ~450,000 times and only 0.05% of isolated pages are actually 
successfully migrated.

Deferred compaction will certainly help for compaction that isn't induced 
via procfs, but we've encountered massive amounts of lock contention in 
this path and extremely low success to failure ratios of page migration on 
average of 2-3 out of 60 runs and the fault path really does grind to a 
halt without this patch (or simply doing MADV_NOHUGEPAGE before the driver 
does ib_umem_get() for 120GB of memory, but we want those hugepages!).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages
  2014-02-03 10:49   ` David Rientjes
@ 2014-02-04  0:02     ` Joonsoo Kim
  2014-02-04  1:20       ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes
  2014-02-04  2:44     ` [patch] " Hugh Dickins
  1 sibling, 1 reply; 18+ messages in thread
From: Joonsoo Kim @ 2014-02-04  0:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Mon, Feb 03, 2014 at 02:49:32AM -0800, David Rientjes wrote:
> On Mon, 3 Feb 2014, Mel Gorman wrote:
> 
> > > Page migration will fail for memory that is pinned in memory with, for
> > > example, get_user_pages().  In this case, it is unnecessary to take
> > > zone->lru_lock or isolating the page and passing it to page migration
> > > which will ultimately fail.
> > > 
> > > This is a racy check, the page can still change from under us, but in
> > > that case we'll just fail later when attempting to move the page.
> > > 
> > > This avoids very expensive memory compaction when faulting transparent
> > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > 
> > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > see the enormous disparity in the number of page migration failures
> > > because of the pinning (from /proc/vmstat):
> > > 
> > > compact_blocks_moved 7609
> > > compact_pages_moved 3431
> > > compact_pagemigrate_failed 133219
> > > compact_stall 13
> > > 
> > > After the patch, it is much more efficient:
> > > 
> > > compact_blocks_moved 7998
> > > compact_pages_moved 6403
> > > compact_pagemigrate_failed 3
> > > compact_stall 15
> > > 
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > ---
> > >  mm/compaction.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > >  			continue;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > +		 * so avoid taking zone->lru_lock and isolating it unnecessarily
> > > +		 * in an admittedly racy check.
> > > +		 */
> > > +		if (!page_mapping(page) && page_count(page))
> > > +			continue;
> > > +

Hello,

I think that you need more code to skip this type of page correctly.
Without page_mapped() check, this code makes migratable pages be skipped,
since if page_mapped() case, page_count() may be more than zero.

So I think that you need following change.

(!page_mapping(page) && !page_mapped(page) && page_count(page))

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch] mm, compaction: avoid isolating pinned pages fix
  2014-02-04  0:02     ` Joonsoo Kim
@ 2014-02-04  1:20       ` David Rientjes
  2014-02-04  1:53         ` Joonsoo Kim
  0 siblings, 1 reply; 18+ messages in thread
From: David Rientjes @ 2014-02-04  1:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Tue, 4 Feb 2014, Joonsoo Kim wrote:

> I think that you need more code to skip this type of page correctly.
> Without page_mapped() check, this code makes migratable pages be skipped,
> since if page_mapped() case, page_count() may be more than zero.
> 
> So I think that you need following change.
> 
> (!page_mapping(page) && !page_mapped(page) && page_count(page))
> 

These pages returned by get_user_pages() will have a mapcount of 1 so this 
wouldn't actually fix the massive lock contention.  page_mapping() is only 
going to be NULL for pages off the lru like these are for 
PAGE_MAPPING_ANON.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages fix
  2014-02-04  1:20       ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes
@ 2014-02-04  1:53         ` Joonsoo Kim
  2014-02-04  2:00           ` David Rientjes
  0 siblings, 1 reply; 18+ messages in thread
From: Joonsoo Kim @ 2014-02-04  1:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Mon, Feb 03, 2014 at 05:20:46PM -0800, David Rientjes wrote:
> On Tue, 4 Feb 2014, Joonsoo Kim wrote:
> 
> > I think that you need more code to skip this type of page correctly.
> > Without page_mapped() check, this code makes migratable pages be skipped,
> > since if page_mapped() case, page_count() may be more than zero.
> > 
> > So I think that you need following change.
> > 
> > (!page_mapping(page) && !page_mapped(page) && page_count(page))
> > 
> 
> These pages returned by get_user_pages() will have a mapcount of 1 so this 
> wouldn't actually fix the massive lock contention.  page_mapping() is only 
> going to be NULL for pages off the lru like these are for 
> PAGE_MAPPING_ANON.

Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
and have positive page_count(), so your code such as
'!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
anon pages and this is incorrect behaviour.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages fix
  2014-02-04  1:53         ` Joonsoo Kim
@ 2014-02-04  2:00           ` David Rientjes
  2014-02-04  2:15             ` Joonsoo Kim
  0 siblings, 1 reply; 18+ messages in thread
From: David Rientjes @ 2014-02-04  2:00 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Tue, 4 Feb 2014, Joonsoo Kim wrote:

> Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
> and have positive page_count(), so your code such as
> '!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
> anon pages and this is incorrect behaviour.
> 

So how does that work with migrate_page_move_mapping() which demands 
page_count(page) == 1 and the get_page_unless_zero() in 
__isolate_lru_page()?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages fix
  2014-02-04  2:00           ` David Rientjes
@ 2014-02-04  2:15             ` Joonsoo Kim
  2014-02-04  2:50               ` David Rientjes
  0 siblings, 1 reply; 18+ messages in thread
From: Joonsoo Kim @ 2014-02-04  2:15 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Mon, Feb 03, 2014 at 06:00:56PM -0800, David Rientjes wrote:
> On Tue, 4 Feb 2014, Joonsoo Kim wrote:
> 
> > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
> > and have positive page_count(), so your code such as
> > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
> > anon pages and this is incorrect behaviour.
> > 
> 
> So how does that work with migrate_page_move_mapping() which demands 
> page_count(page) == 1 and the get_page_unless_zero() in 
> __isolate_lru_page()?

Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
__isolate_lru_page(). Am I missing something?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages fix
  2014-02-04  2:15             ` Joonsoo Kim
@ 2014-02-04  2:50               ` David Rientjes
  2014-02-04  3:47                 ` Hugh Dickins
  2014-02-05  2:44                 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes
  0 siblings, 2 replies; 18+ messages in thread
From: David Rientjes @ 2014-02-04  2:50 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Tue, 4 Feb 2014, Joonsoo Kim wrote:

> > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
> > > and have positive page_count(), so your code such as
> > > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
> > > anon pages and this is incorrect behaviour.
> > > 
> > 
> > So how does that work with migrate_page_move_mapping() which demands 
> > page_count(page) == 1 and the get_page_unless_zero() in 
> > __isolate_lru_page()?
> 
> Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
> mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
> __isolate_lru_page(). Am I missing something?
> 

Ah, good point.  I wonder if we can get away with 
page_count(page) - page_mapcount(page) > 1 to avoid the get_user_pages() 
pin?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages fix
  2014-02-04  2:50               ` David Rientjes
@ 2014-02-04  3:47                 ` Hugh Dickins
  2014-02-05  2:44                 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2014-02-04  3:47 UTC (permalink / raw)
  To: David Rientjes
  Cc: Joonsoo Kim, Mel Gorman, Andrew Morton, Rik van Riel,
	linux-kernel, linux-mm

On Mon, 3 Feb 2014, David Rientjes wrote:
> On Tue, 4 Feb 2014, Joonsoo Kim wrote:
> 
> > > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
> > > > and have positive page_count(), so your code such as
> > > > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
> > > > anon pages and this is incorrect behaviour.
> > > > 
> > > 
> > > So how does that work with migrate_page_move_mapping() which demands 
> > > page_count(page) == 1 and the get_page_unless_zero() in 
> > > __isolate_lru_page()?
> > 
> > Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
> > mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
> > __isolate_lru_page(). Am I missing something?
> > 
> 
> Ah, good point.  I wonder if we can get away with 
> page_count(page) - page_mapcount(page) > 1 to avoid the get_user_pages() 
> pin?

Something like that.  But please go back to migrate_page_move_mapping()
to factor in what it's additionally considering.  Whether you can share
code with it, I don't know - it has to do some things under a lock you
cannot take at the preliminary stage - you haven't isolated or locked
the page yet.

There is a separate issue, that a mapping may supply its own non-default
mapping->a_ops->migratepage(): can we assume that the page_counting is
the same whatever migratepage() is in use?  I'm not sure.

If you stick to special-casing PageAnon pages, you won't face that
issue; but your proposed change would be a lot more satisfying if we
can convince ourselves that it's good for !PageAnon too.  May need a
trawl through the different migratepage() methods that exist in tree.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-04  2:50               ` David Rientjes
  2014-02-04  3:47                 ` Hugh Dickins
@ 2014-02-05  2:44                 ` David Rientjes
  2014-02-05 20:56                   ` Hugh Dickins
  2014-02-06 18:48                   ` Vlastimil Babka
  1 sibling, 2 replies; 18+ messages in thread
From: David Rientjes @ 2014-02-05  2:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Hugh Dickins, Mel Gorman, Rik van Riel, linux-kernel,
	linux-mm

Page migration will fail for memory that is pinned in memory with, for
example, get_user_pages().  In this case, it is unnecessary to take
zone->lru_lock or isolating the page and passing it to page migration
which will ultimately fail.

This is a racy check, the page can still change from under us, but in
that case we'll just fail later when attempting to move the page.

This avoids very expensive memory compaction when faulting transparent
hugepages after pinning a lot of memory with a Mellanox driver.

On a 128GB machine and pinning ~120GB of memory, before this patch we
see the enormous disparity in the number of page migration failures
because of the pinning (from /proc/vmstat):

	compact_pages_moved 8450
	compact_pagemigrate_failed 15614415

0.05% of pages isolated are successfully migrated and explicitly 
triggering memory compaction takes 102 seconds.  After the patch:

	compact_pages_moved 9197
	compact_pagemigrate_failed 7

99.9% of pages isolated are now successfully migrated in this 
configuration and memory compaction takes less than one second.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 v2: address page count issue per Joonsoo

 mm/compaction.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
+		/*
+		 * Migration will fail if an anonymous page is pinned in memory,
+		 * so avoid taking lru_lock and isolating it unnecessarily in an
+		 * admittedly racy check.
+		 */
+		if (!page_mapping(page) &&
+		    page_count(page) > page_mapcount(page))
+			continue;
+
 		/* Check if it is ok to still hold the lock */
 		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
 								locked, cc);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-05  2:44                 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes
@ 2014-02-05 20:56                   ` Hugh Dickins
  2014-02-06  0:05                     ` Joonsoo Kim
  2014-02-06 18:48                   ` Vlastimil Babka
  1 sibling, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2014-02-05 20:56 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Mel Gorman,
	Rik van Riel, Greg Thelen, linux-kernel, linux-mm

On Tue, 4 Feb 2014, David Rientjes wrote:

> Page migration will fail for memory that is pinned in memory with, for
> example, get_user_pages().  In this case, it is unnecessary to take
> zone->lru_lock or isolating the page and passing it to page migration
> which will ultimately fail.
> 
> This is a racy check, the page can still change from under us, but in
> that case we'll just fail later when attempting to move the page.
> 
> This avoids very expensive memory compaction when faulting transparent
> hugepages after pinning a lot of memory with a Mellanox driver.
> 
> On a 128GB machine and pinning ~120GB of memory, before this patch we
> see the enormous disparity in the number of page migration failures
> because of the pinning (from /proc/vmstat):
> 
> 	compact_pages_moved 8450
> 	compact_pagemigrate_failed 15614415
> 
> 0.05% of pages isolated are successfully migrated and explicitly 
> triggering memory compaction takes 102 seconds.  After the patch:
> 
> 	compact_pages_moved 9197
> 	compact_pagemigrate_failed 7
> 
> 99.9% of pages isolated are now successfully migrated in this 
> configuration and memory compaction takes less than one second.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  v2: address page count issue per Joonsoo
> 
>  mm/compaction.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  			continue;
>  		}
>  
> +		/*
> +		 * Migration will fail if an anonymous page is pinned in memory,
> +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> +		 * admittedly racy check.
> +		 */
> +		if (!page_mapping(page) &&
> +		    page_count(page) > page_mapcount(page))
> +			continue;
> +
>  		/* Check if it is ok to still hold the lock */
>  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,

Much better, maybe good enough as an internal patch to fix a particular
problem you're seeing; but not yet good enough to go upstream.

Anonymous pages are not the only pages which might be pinned,
and your test doesn't mention PageAnon, so does not match your comment.

I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
more assurance that a page_count - page_has_private test is appropriate,
whatever the filesystem and migrate method to be used.

So I think the test you're looking for is

		pincount = page_count(page) - page_mapcount(page);
		if (page_mapping(page))
			pincount -= 1 + page_has_private(page);
		if (pincount > 0)
			continue;

but please cross-check and test that out, it's easy to be off-by-one etc.

For a moment I thought a PageWriteback test would be useful too,
but no, that should already appear in the pincount.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-05 20:56                   ` Hugh Dickins
@ 2014-02-06  0:05                     ` Joonsoo Kim
  2014-02-06  1:16                       ` Hugh Dickins
  0 siblings, 1 reply; 18+ messages in thread
From: Joonsoo Kim @ 2014-02-06  0:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: David Rientjes, Andrew Morton, Mel Gorman, Rik van Riel,
	Greg Thelen, linux-kernel, linux-mm

On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> On Tue, 4 Feb 2014, David Rientjes wrote:
> 
> > Page migration will fail for memory that is pinned in memory with, for
> > example, get_user_pages().  In this case, it is unnecessary to take
> > zone->lru_lock or isolating the page and passing it to page migration
> > which will ultimately fail.
> > 
> > This is a racy check, the page can still change from under us, but in
> > that case we'll just fail later when attempting to move the page.
> > 
> > This avoids very expensive memory compaction when faulting transparent
> > hugepages after pinning a lot of memory with a Mellanox driver.
> > 
> > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > see the enormous disparity in the number of page migration failures
> > because of the pinning (from /proc/vmstat):
> > 
> > 	compact_pages_moved 8450
> > 	compact_pagemigrate_failed 15614415
> > 
> > 0.05% of pages isolated are successfully migrated and explicitly 
> > triggering memory compaction takes 102 seconds.  After the patch:
> > 
> > 	compact_pages_moved 9197
> > 	compact_pagemigrate_failed 7
> > 
> > 99.9% of pages isolated are now successfully migrated in this 
> > configuration and memory compaction takes less than one second.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  v2: address page count issue per Joonsoo
> > 
> >  mm/compaction.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> >  			continue;
> >  		}
> >  
> > +		/*
> > +		 * Migration will fail if an anonymous page is pinned in memory,
> > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > +		 * admittedly racy check.
> > +		 */
> > +		if (!page_mapping(page) &&
> > +		    page_count(page) > page_mapcount(page))
> > +			continue;
> > +
> >  		/* Check if it is ok to still hold the lock */
> >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> 
> Much better, maybe good enough as an internal patch to fix a particular
> problem you're seeing; but not yet good enough to go upstream.
> 
> Anonymous pages are not the only pages which might be pinned,
> and your test doesn't mention PageAnon, so does not match your comment.
> 
> I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> more assurance that a page_count - page_has_private test is appropriate,
> whatever the filesystem and migrate method to be used.
> 
> So I think the test you're looking for is
> 
> 		pincount = page_count(page) - page_mapcount(page);
> 		if (page_mapping(page))
> 			pincount -= 1 + page_has_private(page);
> 		if (pincount > 0)
> 			continue;
> 
> but please cross-check and test that out, it's easy to be off-by-one etc.

Hello, Hugh.

I don't think that this is right.
One of migratepage function, aio_migratepage(), pass extra count 1 to
migrate_page_move_mapping(). So it can be migrated when pincount == 1 in
above test.

I think that we should not be aggressive here. This is just for prediction
so that it is better not to skip apropriate pages at most. Just for anon case
that we are sure easily is the right solution for me.

Thanks.

> 
> For a moment I thought a PageWriteback test would be useful too,
> but no, that should already appear in the pincount.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-06  0:05                     ` Joonsoo Kim
@ 2014-02-06  1:16                       ` Hugh Dickins
  2014-02-06 13:53                         ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2014-02-06  1:16 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Hugh Dickins, David Rientjes, Andrew Morton, Mel Gorman,
	Rik van Riel, Greg Thelen, linux-kernel, linux-mm

On Thu, 6 Feb 2014, Joonsoo Kim wrote:
> On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> > On Tue, 4 Feb 2014, David Rientjes wrote:
> > 
> > > Page migration will fail for memory that is pinned in memory with, for
> > > example, get_user_pages().  In this case, it is unnecessary to take
> > > zone->lru_lock or isolating the page and passing it to page migration
> > > which will ultimately fail.
> > > 
> > > This is a racy check, the page can still change from under us, but in
> > > that case we'll just fail later when attempting to move the page.
> > > 
> > > This avoids very expensive memory compaction when faulting transparent
> > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > 
> > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > see the enormous disparity in the number of page migration failures
> > > because of the pinning (from /proc/vmstat):
> > > 
> > > 	compact_pages_moved 8450
> > > 	compact_pagemigrate_failed 15614415
> > > 
> > > 0.05% of pages isolated are successfully migrated and explicitly 
> > > triggering memory compaction takes 102 seconds.  After the patch:
> > > 
> > > 	compact_pages_moved 9197
> > > 	compact_pagemigrate_failed 7
> > > 
> > > 99.9% of pages isolated are now successfully migrated in this 
> > > configuration and memory compaction takes less than one second.
> > > 
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > ---
> > >  v2: address page count issue per Joonsoo
> > > 
> > >  mm/compaction.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > >  			continue;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > > +		 * admittedly racy check.
> > > +		 */
> > > +		if (!page_mapping(page) &&
> > > +		    page_count(page) > page_mapcount(page))
> > > +			continue;
> > > +
> > >  		/* Check if it is ok to still hold the lock */
> > >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> > 
> > Much better, maybe good enough as an internal patch to fix a particular
> > problem you're seeing; but not yet good enough to go upstream.
> > 
> > Anonymous pages are not the only pages which might be pinned,
> > and your test doesn't mention PageAnon, so does not match your comment.
> > 
> > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> > more assurance that a page_count - page_has_private test is appropriate,
> > whatever the filesystem and migrate method to be used.
> > 
> > So I think the test you're looking for is
> > 
> > 		pincount = page_count(page) - page_mapcount(page);
> > 		if (page_mapping(page))
> > 			pincount -= 1 + page_has_private(page);
> > 		if (pincount > 0)
> > 			continue;
> > 
> > but please cross-check and test that out, it's easy to be off-by-one etc.
> 
> Hello, Hugh.
> 
> I don't think that this is right.
> One of migratepage function, aio_migratepage(), pass extra count 1 to
> migrate_page_move_mapping(). So it can be migrated when pincount == 1 in
> above test.
> 
> I think that we should not be aggressive here. This is just for prediction
> so that it is better not to skip apropriate pages at most. Just for anon case
> that we are sure easily is the right solution for me.

Interesting, thank you for the pointer.  That's a pity!

I hope that later on we can modify fs/aio.c to set PagePrivate on
ring pages, revert the extra argument to migrate_page_move_mapping(),
and then let it appear the same as the other filesystems (but lacking
a writepage, reclaim won't try to free the pages).

But that's "later on" and may prove impossible in the implementation.
I agree it's beyond the scope of David's patch, and so only anonymous
should be dealt with in this way at present.

And since page_mapping() is non-NULL on PageAnon PageSwapCache pages,
those will fall through David's test and go on to try migration:
which is the correct default.  Although we could add code to handle
pinned swapcache, it would be rather an ugly excrescence, until the case
gets handled naturally when proper page_mapping() support is added later.

Okay, to David's current patch
Acked-by: Hugh Dickins <hughd@google.com>
though I'd like to hear whether Mel is happy with it.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-06  1:16                       ` Hugh Dickins
@ 2014-02-06 13:53                         ` Mel Gorman
  0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2014-02-06 13:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Joonsoo Kim, David Rientjes, Andrew Morton, Rik van Riel,
	Greg Thelen, linux-kernel, linux-mm

On Wed, Feb 05, 2014 at 05:16:06PM -0800, Hugh Dickins wrote:
> On Thu, 6 Feb 2014, Joonsoo Kim wrote:
> > On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> > > On Tue, 4 Feb 2014, David Rientjes wrote:
> > > 
> > > > Page migration will fail for memory that is pinned in memory with, for
> > > > example, get_user_pages().  In this case, it is unnecessary to take
> > > > zone->lru_lock or isolating the page and passing it to page migration
> > > > which will ultimately fail.
> > > > 
> > > > This is a racy check, the page can still change from under us, but in
> > > > that case we'll just fail later when attempting to move the page.
> > > > 
> > > > This avoids very expensive memory compaction when faulting transparent
> > > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > > 
> > > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > > see the enormous disparity in the number of page migration failures
> > > > because of the pinning (from /proc/vmstat):
> > > > 
> > > > 	compact_pages_moved 8450
> > > > 	compact_pagemigrate_failed 15614415
> > > > 
> > > > 0.05% of pages isolated are successfully migrated and explicitly 
> > > > triggering memory compaction takes 102 seconds.  After the patch:
> > > > 
> > > > 	compact_pages_moved 9197
> > > > 	compact_pagemigrate_failed 7
> > > > 
> > > > 99.9% of pages isolated are now successfully migrated in this 
> > > > configuration and memory compaction takes less than one second.
> > > > 
> > > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > > ---
> > > >  v2: address page count issue per Joonsoo
> > > > 
> > > >  mm/compaction.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > > --- a/mm/compaction.c
> > > > +++ b/mm/compaction.c
> > > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > > >  			continue;
> > > >  		}
> > > >  
> > > > +		/*
> > > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > > > +		 * admittedly racy check.
> > > > +		 */
> > > > +		if (!page_mapping(page) &&
> > > > +		    page_count(page) > page_mapcount(page))
> > > > +			continue;
> > > > +
> > > >  		/* Check if it is ok to still hold the lock */
> > > >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> > > 
> > > Much better, maybe good enough as an internal patch to fix a particular
> > > problem you're seeing; but not yet good enough to go upstream.
> > > 
> > > Anonymous pages are not the only pages which might be pinned,
> > > and your test doesn't mention PageAnon, so does not match your comment.
> > > 
> > > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> > > more assurance that a page_count - page_has_private test is appropriate,
> > > whatever the filesystem and migrate method to be used.
> > > 
> > > So I think the test you're looking for is
> > > 
> > > 		pincount = page_count(page) - page_mapcount(page);
> > > 		if (page_mapping(page))
> > > 			pincount -= 1 + page_has_private(page);
> > > 		if (pincount > 0)
> > > 			continue;
> > > 
> > > but please cross-check and test that out, it's easy to be off-by-one etc.
> > 
> > Hello, Hugh.
> > 
> > I don't think that this is right.
> > One of migratepage function, aio_migratepage(), pass extra count 1 to
> > migrate_page_move_mapping(). So it can be migrated when pincount == 1 in
> > above test.
> > 
> > I think that we should not be aggressive here. This is just for prediction
> > so that it is better not to skip apropriate pages at most. Just for anon case
> > that we are sure easily is the right solution for me.
> 
> Interesting, thank you for the pointer.  That's a pity!
> 
> I hope that later on we can modify fs/aio.c to set PagePrivate on
> ring pages, revert the extra argument to migrate_page_move_mapping(),
> and then let it appear the same as the other filesystems (but lacking
> a writepage, reclaim won't try to free the pages).
> 
> But that's "later on" and may prove impossible in the implementation.
> I agree it's beyond the scope of David's patch, and so only anonymous
> should be dealt with in this way at present.
> 
> And since page_mapping() is non-NULL on PageAnon PageSwapCache pages,
> those will fall through David's test and go on to try migration:
> which is the correct default.  Although we could add code to handle
> pinned swapcache, it would be rather an ugly excrescence, until the case
> gets handled naturally when proper page_mapping() support is added later.
> 
> Okay, to David's current patch
> Acked-by: Hugh Dickins <hughd@google.com>
> though I'd like to hear whether Mel is happy with it.
> 

I have nothing useful to add other than

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-05  2:44                 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes
  2014-02-05 20:56                   ` Hugh Dickins
@ 2014-02-06 18:48                   ` Vlastimil Babka
  2014-02-06 21:33                     ` David Rientjes
  1 sibling, 1 reply; 18+ messages in thread
From: Vlastimil Babka @ 2014-02-06 18:48 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Joonsoo Kim, Hugh Dickins, Mel Gorman, Rik van Riel, linux-kernel,
	linux-mm

On 5.2.2014 3:44, David Rientjes wrote:
> Page migration will fail for memory that is pinned in memory with, for
> example, get_user_pages().  In this case, it is unnecessary to take
> zone->lru_lock or isolating the page and passing it to page migration
> which will ultimately fail.
>
> This is a racy check, the page can still change from under us, but in
> that case we'll just fail later when attempting to move the page.
>
> This avoids very expensive memory compaction when faulting transparent
> hugepages after pinning a lot of memory with a Mellanox driver.
>
> On a 128GB machine and pinning ~120GB of memory, before this patch we
> see the enormous disparity in the number of page migration failures
> because of the pinning (from /proc/vmstat):
>
> 	compact_pages_moved 8450
> 	compact_pagemigrate_failed 15614415
>
> 0.05% of pages isolated are successfully migrated and explicitly
> triggering memory compaction takes 102 seconds.  After the patch:
>
> 	compact_pages_moved 9197
> 	compact_pagemigrate_failed 7
>
> 99.9% of pages isolated are now successfully migrated in this
> configuration and memory compaction takes less than one second.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>   v2: address page count issue per Joonsoo
>
>   mm/compaction.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>   			continue;
>   		}
>   
> +		/*
> +		 * Migration will fail if an anonymous page is pinned in memory,
> +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> +		 * admittedly racy check.
> +		 */
> +		if (!page_mapping(page) &&
> +		    page_count(page) > page_mapcount(page))
> +			continue;
> +

Hm this page_count() seems it could substantially increase the chance of 
race with prep_compound_page that your patch "mm, page_alloc: make 
first_page visible before PageTail" tries to fix :)

>   		/* Check if it is ok to still hold the lock */
>   		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
>   								locked, cc);
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch v2] mm, compaction: avoid isolating pinned pages
  2014-02-06 18:48                   ` Vlastimil Babka
@ 2014-02-06 21:33                     ` David Rientjes
  0 siblings, 0 replies; 18+ messages in thread
From: David Rientjes @ 2014-02-06 21:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Mel Gorman,
	Rik van Riel, linux-kernel, linux-mm

On Thu, 6 Feb 2014, Vlastimil Babka wrote:

> > Page migration will fail for memory that is pinned in memory with, for
> > example, get_user_pages().  In this case, it is unnecessary to take
> > zone->lru_lock or isolating the page and passing it to page migration
> > which will ultimately fail.
> > 
> > This is a racy check, the page can still change from under us, but in
> > that case we'll just fail later when attempting to move the page.
> > 
> > This avoids very expensive memory compaction when faulting transparent
> > hugepages after pinning a lot of memory with a Mellanox driver.
> > 
> > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > see the enormous disparity in the number of page migration failures
> > because of the pinning (from /proc/vmstat):
> > 
> > 	compact_pages_moved 8450
> > 	compact_pagemigrate_failed 15614415
> > 
> > 0.05% of pages isolated are successfully migrated and explicitly
> > triggering memory compaction takes 102 seconds.  After the patch:
> > 
> > 	compact_pages_moved 9197
> > 	compact_pagemigrate_failed 7
> > 
> > 99.9% of pages isolated are now successfully migrated in this
> > configuration and memory compaction takes less than one second.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >   v2: address page count issue per Joonsoo
> > 
> >   mm/compaction.c | 9 +++++++++
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct
> > compact_control *cc,
> >   			continue;
> >   		}
> >   +		/*
> > +		 * Migration will fail if an anonymous page is pinned in
> > memory,
> > +		 * so avoid taking lru_lock and isolating it unnecessarily in
> > an
> > +		 * admittedly racy check.
> > +		 */
> > +		if (!page_mapping(page) &&
> > +		    page_count(page) > page_mapcount(page))
> > +			continue;
> > +
> 
> Hm this page_count() seems it could substantially increase the chance of race
> with prep_compound_page that your patch "mm, page_alloc: make first_page
> visible before PageTail" tries to fix :)
> 

That's why I sent the fix for page_count().

The "racy check" the comment eludes to above concerns the fact that 
page_count() and page_mapcount() can change out from under us before 
isolation and if we had not avoided isolating them that they would have 
been migratable later.  We accept that as a consequence of doing this in a 
lockless way without page references.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch] mm, compaction: avoid isolating pinned pages
  2014-02-03 10:49   ` David Rientjes
  2014-02-04  0:02     ` Joonsoo Kim
@ 2014-02-04  2:44     ` Hugh Dickins
  1 sibling, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2014-02-04  2:44 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, Joonsoo Kim,
	linux-kernel, linux-mm

On Mon, 3 Feb 2014, David Rientjes wrote:
> On Mon, 3 Feb 2014, Mel Gorman wrote:
> 
> > > Page migration will fail for memory that is pinned in memory with, for
> > > example, get_user_pages().  In this case, it is unnecessary to take
> > > zone->lru_lock or isolating the page and passing it to page migration
> > > which will ultimately fail.
> > > 
> > > This is a racy check, the page can still change from under us, but in
> > > that case we'll just fail later when attempting to move the page.
> > > 
> > > This avoids very expensive memory compaction when faulting transparent
> > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > 
> > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > see the enormous disparity in the number of page migration failures
> > > because of the pinning (from /proc/vmstat):

120GB of memory on the active/inactive lrus but longterm pinned,
that's quite worrying: not just a great waste of time for compaction,
but for page reclaim also.  I suppose a fairly easy way around it would
be for the driver to use mlock too, moving them all to unevictable lru.

But in general, you may well  be right that, racy as this isolation/
migration procedure necessarily is, in the face of longterm pinning it
may make more sense to test page_count before proceding to isolation
rather than only after in migration.  We always took the view that it's
better to give up only at the last moment, but that may be a bad bet.

> > > 
> > > compact_blocks_moved 7609
> > > compact_pages_moved 3431
> > > compact_pagemigrate_failed 133219
> > > compact_stall 13
> > > 
> > > After the patch, it is much more efficient:
> > > 
> > > compact_blocks_moved 7998
> > > compact_pages_moved 6403
> > > compact_pagemigrate_failed 3
> > > compact_stall 15
> > > 
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > ---
> > >  mm/compaction.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > >  			continue;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > +		 * so avoid taking zone->lru_lock and isolating it unnecessarily
> > > +		 * in an admittedly racy check.
> > > +		 */
> > > +		if (!page_mapping(page) && page_count(page))
> > > +			continue;
> > > +
> > 
> > Are you sure about this? The page_count check migration does is this
> > 
> >         int expected_count = 1 + extra_count;
> >         if (!mapping) {
> >                 if (page_count(page) != expected_count)
> >                         return -EAGAIN;
> >                 return MIGRATEPAGE_SUCCESS;
> >         }
> > 
> >         spin_lock_irq(&mapping->tree_lock);
> > 
> >         pslot = radix_tree_lookup_slot(&mapping->page_tree,
> >                                         page_index(page));
> > 
> >         expected_count += 1 + page_has_private(page);
> > 
> > Migration expects and can migrate pages with no mapping and a page count
> > but you are now skipping them. I think you may have intended to split
> > migrations page count into a helper or copy the logic.
> > 
> 
> Thanks for taking a look!
> 
> The patch is correct, it just shows my lack of a complete commit message 

I don't think so.  I agree with Mel that you should be reconsidering
those tests that migrate_page_move_mapping() makes, but remembering that
it's called at a stage between try_to_unmap() and remove_migration_ptes(),
when page_mapcount has been brought down to 0 - not the case here.

> which I'm struggling with recently.  In the case that this is addressing, 
> get_user_pages() already gives page_count(page) == 1, then 

But get_user_pages() brings the pages into user address space (if not
already there), page_mapcount 1 and page_count 1, and does an additional
pin on the page, page_count 2.  Or if it's a page_mapping page (perhaps
even PageAnon in SwapCache) there's another +1; if page_has_buffers
another +1; mapped into more user address spaces, +more.

Your	if (!page_mapping(page) && page_count(page))
		continue;
is letting through any Anon SwapCache pages (probably no great concern
in your 120GB example; but I don't understand why you want to special-
case Anon anyway, beyond your specific testcase); and refusing to
isolate all those unpinned anonymous pages mapped into userspace which
migration is perfectly capable of migrating.  If 120GB out of 128GB is
pinned, that won't be a significant proportion, and of course your
change saves a lot of wasted time and lock contention; but for most
people it's a considerable proportion of their memory, and needs to
be migratable.

I think Joonsoo is making the same point (though I disagree with the
test he suggested); but I've not yet read the latest mails under a
separate subject (" fix" appended).

Hugh

> __isolate_lru_page() does another get_page() that is dropped in 
> putback_lru_page() after the call into migrate_pages().  So in the code 
> you quote above we always have page_count(page) == 2 and
> expected_count == 1.
> 
> So what we desperately need to do is avoid isolating any page where 
> page_count(page) is non-zero and !page_mapping(page) and do that before 
> the get_page() in __isolate_lru_page() because we want to avoid taking 
> zone->lru_lock.  On my 128GB machine filled with ~120GB of pinned memory 
> for the driver, this lock gets highly contended under compaction and even 
> reclaim if the rest of userspace is using a lot of memory.
> 
> It's not really relevant to the commit message, but I found that if all 
> that ~120GB is faulted and I manually invoke compaction with the procfs 
> trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets 
> taken ~450,000 times and only 0.05% of isolated pages are actually 
> successfully migrated.
> 
> Deferred compaction will certainly help for compaction that isn't induced 
> via procfs, but we've encountered massive amounts of lock contention in 
> this path and extremely low success to failure ratios of page migration on 
> average of 2-3 out of 60 runs and the fault path really does grind to a 
> halt without this patch (or simply doing MADV_NOHUGEPAGE before the driver 
> does ib_umem_get() for 120GB of memory, but we want those hugepages!).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-02-07  0:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-02  5:46 [patch] mm, compaction: avoid isolating pinned pages David Rientjes
2014-02-03  9:53 ` Mel Gorman
2014-02-03 10:49   ` David Rientjes
2014-02-04  0:02     ` Joonsoo Kim
2014-02-04  1:20       ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes
2014-02-04  1:53         ` Joonsoo Kim
2014-02-04  2:00           ` David Rientjes
2014-02-04  2:15             ` Joonsoo Kim
2014-02-04  2:50               ` David Rientjes
2014-02-04  3:47                 ` Hugh Dickins
2014-02-05  2:44                 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes
2014-02-05 20:56                   ` Hugh Dickins
2014-02-06  0:05                     ` Joonsoo Kim
2014-02-06  1:16                       ` Hugh Dickins
2014-02-06 13:53                         ` Mel Gorman
2014-02-06 18:48                   ` Vlastimil Babka
2014-02-06 21:33                     ` David Rientjes
2014-02-04  2:44     ` [patch] " Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).