* [patch] mm, compaction: avoid isolating pinned pages @ 2014-02-02 5:46 David Rientjes 2014-02-03 9:53 ` Mel Gorman 0 siblings, 1 reply; 18+ messages in thread From: David Rientjes @ 2014-02-02 5:46 UTC (permalink / raw) To: Andrew Morton; +Cc: Mel Gorman, Rik van Riel, linux-kernel, linux-mm Page migration will fail for memory that is pinned in memory with, for example, get_user_pages(). In this case, it is unnecessary to take zone->lru_lock or isolating the page and passing it to page migration which will ultimately fail. This is a racy check, the page can still change from under us, but in that case we'll just fail later when attempting to move the page. This avoids very expensive memory compaction when faulting transparent hugepages after pinning a lot of memory with a Mellanox driver. On a 128GB machine and pinning ~120GB of memory, before this patch we see the enormous disparity in the number of page migration failures because of the pinning (from /proc/vmstat): compact_blocks_moved 7609 compact_pages_moved 3431 compact_pagemigrate_failed 133219 compact_stall 13 After the patch, it is much more efficient: compact_blocks_moved 7998 compact_pages_moved 6403 compact_pagemigrate_failed 3 compact_stall 15 Signed-off-by: David Rientjes <rientjes@google.com> --- mm/compaction.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c --- a/mm/compaction.c +++ b/mm/compaction.c @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, continue; } + /* + * Migration will fail if an anonymous page is pinned in memory, + * so avoid taking zone->lru_lock and isolating it unnecessarily + * in an admittedly racy check. + */ + if (!page_mapping(page) && page_count(page)) + continue; + /* Check if it is ok to still hold the lock */ locked = compact_checklock_irqsave(&zone->lru_lock, &flags, locked, cc); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages 2014-02-02 5:46 [patch] mm, compaction: avoid isolating pinned pages David Rientjes @ 2014-02-03 9:53 ` Mel Gorman 2014-02-03 10:49 ` David Rientjes 0 siblings, 1 reply; 18+ messages in thread From: Mel Gorman @ 2014-02-03 9:53 UTC (permalink / raw) To: David Rientjes; +Cc: Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sat, Feb 01, 2014 at 09:46:26PM -0800, David Rientjes wrote: > Page migration will fail for memory that is pinned in memory with, for > example, get_user_pages(). In this case, it is unnecessary to take > zone->lru_lock or isolating the page and passing it to page migration > which will ultimately fail. > > This is a racy check, the page can still change from under us, but in > that case we'll just fail later when attempting to move the page. > > This avoids very expensive memory compaction when faulting transparent > hugepages after pinning a lot of memory with a Mellanox driver. > > On a 128GB machine and pinning ~120GB of memory, before this patch we > see the enormous disparity in the number of page migration failures > because of the pinning (from /proc/vmstat): > > compact_blocks_moved 7609 > compact_pages_moved 3431 > compact_pagemigrate_failed 133219 > compact_stall 13 > > After the patch, it is much more efficient: > > compact_blocks_moved 7998 > compact_pages_moved 6403 > compact_pagemigrate_failed 3 > compact_stall 15 > > Signed-off-by: David Rientjes <rientjes@google.com> > --- > mm/compaction.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/compaction.c b/mm/compaction.c > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > continue; > } > > + /* > + * Migration will fail if an anonymous page is pinned in memory, > + * so avoid taking zone->lru_lock and isolating it unnecessarily > + * in an admittedly racy check. > + */ > + if (!page_mapping(page) && page_count(page)) > + continue; > + Are you sure about this? The page_count check migration does is this int expected_count = 1 + extra_count; if (!mapping) { if (page_count(page) != expected_count) return -EAGAIN; return MIGRATEPAGE_SUCCESS; } spin_lock_irq(&mapping->tree_lock); pslot = radix_tree_lookup_slot(&mapping->page_tree, page_index(page)); expected_count += 1 + page_has_private(page); Migration expects and can migrate pages with no mapping and a page count but you are now skipping them. I think you may have intended to split migrations page count into a helper or copy the logic. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages 2014-02-03 9:53 ` Mel Gorman @ 2014-02-03 10:49 ` David Rientjes 2014-02-04 0:02 ` Joonsoo Kim 2014-02-04 2:44 ` [patch] " Hugh Dickins 0 siblings, 2 replies; 18+ messages in thread From: David Rientjes @ 2014-02-03 10:49 UTC (permalink / raw) To: Mel Gorman; +Cc: Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Mon, 3 Feb 2014, Mel Gorman wrote: > > Page migration will fail for memory that is pinned in memory with, for > > example, get_user_pages(). In this case, it is unnecessary to take > > zone->lru_lock or isolating the page and passing it to page migration > > which will ultimately fail. > > > > This is a racy check, the page can still change from under us, but in > > that case we'll just fail later when attempting to move the page. > > > > This avoids very expensive memory compaction when faulting transparent > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > see the enormous disparity in the number of page migration failures > > because of the pinning (from /proc/vmstat): > > > > compact_blocks_moved 7609 > > compact_pages_moved 3431 > > compact_pagemigrate_failed 133219 > > compact_stall 13 > > > > After the patch, it is much more efficient: > > > > compact_blocks_moved 7998 > > compact_pages_moved 6403 > > compact_pagemigrate_failed 3 > > compact_stall 15 > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > --- > > mm/compaction.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > > continue; > > } > > > > + /* > > + * Migration will fail if an anonymous page is pinned in memory, > > + * so avoid taking zone->lru_lock and isolating it unnecessarily > > + * in an admittedly racy check. > > + */ > > + if (!page_mapping(page) && page_count(page)) > > + continue; > > + > > Are you sure about this? The page_count check migration does is this > > int expected_count = 1 + extra_count; > if (!mapping) { > if (page_count(page) != expected_count) > return -EAGAIN; > return MIGRATEPAGE_SUCCESS; > } > > spin_lock_irq(&mapping->tree_lock); > > pslot = radix_tree_lookup_slot(&mapping->page_tree, > page_index(page)); > > expected_count += 1 + page_has_private(page); > > Migration expects and can migrate pages with no mapping and a page count > but you are now skipping them. I think you may have intended to split > migrations page count into a helper or copy the logic. > Thanks for taking a look! The patch is correct, it just shows my lack of a complete commit message which I'm struggling with recently. In the case that this is addressing, get_user_pages() already gives page_count(page) == 1, then __isolate_lru_page() does another get_page() that is dropped in putback_lru_page() after the call into migrate_pages(). So in the code you quote above we always have page_count(page) == 2 and expected_count == 1. So what we desperately need to do is avoid isolating any page where page_count(page) is non-zero and !page_mapping(page) and do that before the get_page() in __isolate_lru_page() because we want to avoid taking zone->lru_lock. On my 128GB machine filled with ~120GB of pinned memory for the driver, this lock gets highly contended under compaction and even reclaim if the rest of userspace is using a lot of memory. It's not really relevant to the commit message, but I found that if all that ~120GB is faulted and I manually invoke compaction with the procfs trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets taken ~450,000 times and only 0.05% of isolated pages are actually successfully migrated. Deferred compaction will certainly help for compaction that isn't induced via procfs, but we've encountered massive amounts of lock contention in this path and extremely low success to failure ratios of page migration on average of 2-3 out of 60 runs and the fault path really does grind to a halt without this patch (or simply doing MADV_NOHUGEPAGE before the driver does ib_umem_get() for 120GB of memory, but we want those hugepages!). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages 2014-02-03 10:49 ` David Rientjes @ 2014-02-04 0:02 ` Joonsoo Kim 2014-02-04 1:20 ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes 2014-02-04 2:44 ` [patch] " Hugh Dickins 1 sibling, 1 reply; 18+ messages in thread From: Joonsoo Kim @ 2014-02-04 0:02 UTC (permalink / raw) To: David Rientjes Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Mon, Feb 03, 2014 at 02:49:32AM -0800, David Rientjes wrote: > On Mon, 3 Feb 2014, Mel Gorman wrote: > > > > Page migration will fail for memory that is pinned in memory with, for > > > example, get_user_pages(). In this case, it is unnecessary to take > > > zone->lru_lock or isolating the page and passing it to page migration > > > which will ultimately fail. > > > > > > This is a racy check, the page can still change from under us, but in > > > that case we'll just fail later when attempting to move the page. > > > > > > This avoids very expensive memory compaction when faulting transparent > > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > > see the enormous disparity in the number of page migration failures > > > because of the pinning (from /proc/vmstat): > > > > > > compact_blocks_moved 7609 > > > compact_pages_moved 3431 > > > compact_pagemigrate_failed 133219 > > > compact_stall 13 > > > > > > After the patch, it is much more efficient: > > > > > > compact_blocks_moved 7998 > > > compact_pages_moved 6403 > > > compact_pagemigrate_failed 3 > > > compact_stall 15 > > > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > > --- > > > mm/compaction.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > > > continue; > > > } > > > > > > + /* > > > + * Migration will fail if an anonymous page is pinned in memory, > > > + * so avoid taking zone->lru_lock and isolating it unnecessarily > > > + * in an admittedly racy check. > > > + */ > > > + if (!page_mapping(page) && page_count(page)) > > > + continue; > > > + Hello, I think that you need more code to skip this type of page correctly. Without page_mapped() check, this code makes migratable pages be skipped, since if page_mapped() case, page_count() may be more than zero. So I think that you need following change. (!page_mapping(page) && !page_mapped(page) && page_count(page)) Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch] mm, compaction: avoid isolating pinned pages fix 2014-02-04 0:02 ` Joonsoo Kim @ 2014-02-04 1:20 ` David Rientjes 2014-02-04 1:53 ` Joonsoo Kim 0 siblings, 1 reply; 18+ messages in thread From: David Rientjes @ 2014-02-04 1:20 UTC (permalink / raw) To: Joonsoo Kim Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Tue, 4 Feb 2014, Joonsoo Kim wrote: > I think that you need more code to skip this type of page correctly. > Without page_mapped() check, this code makes migratable pages be skipped, > since if page_mapped() case, page_count() may be more than zero. > > So I think that you need following change. > > (!page_mapping(page) && !page_mapped(page) && page_count(page)) > These pages returned by get_user_pages() will have a mapcount of 1 so this wouldn't actually fix the massive lock contention. page_mapping() is only going to be NULL for pages off the lru like these are for PAGE_MAPPING_ANON. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages fix 2014-02-04 1:20 ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes @ 2014-02-04 1:53 ` Joonsoo Kim 2014-02-04 2:00 ` David Rientjes 0 siblings, 1 reply; 18+ messages in thread From: Joonsoo Kim @ 2014-02-04 1:53 UTC (permalink / raw) To: David Rientjes Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Mon, Feb 03, 2014 at 05:20:46PM -0800, David Rientjes wrote: > On Tue, 4 Feb 2014, Joonsoo Kim wrote: > > > I think that you need more code to skip this type of page correctly. > > Without page_mapped() check, this code makes migratable pages be skipped, > > since if page_mapped() case, page_count() may be more than zero. > > > > So I think that you need following change. > > > > (!page_mapping(page) && !page_mapped(page) && page_count(page)) > > > > These pages returned by get_user_pages() will have a mapcount of 1 so this > wouldn't actually fix the massive lock contention. page_mapping() is only > going to be NULL for pages off the lru like these are for > PAGE_MAPPING_ANON. Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped and have positive page_count(), so your code such as '!page_mapping(page) && page_count(page)' makes compaction skip these *normal* anon pages and this is incorrect behaviour. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages fix 2014-02-04 1:53 ` Joonsoo Kim @ 2014-02-04 2:00 ` David Rientjes 2014-02-04 2:15 ` Joonsoo Kim 0 siblings, 1 reply; 18+ messages in thread From: David Rientjes @ 2014-02-04 2:00 UTC (permalink / raw) To: Joonsoo Kim Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Tue, 4 Feb 2014, Joonsoo Kim wrote: > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped > and have positive page_count(), so your code such as > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal* > anon pages and this is incorrect behaviour. > So how does that work with migrate_page_move_mapping() which demands page_count(page) == 1 and the get_page_unless_zero() in __isolate_lru_page()? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages fix 2014-02-04 2:00 ` David Rientjes @ 2014-02-04 2:15 ` Joonsoo Kim 2014-02-04 2:50 ` David Rientjes 0 siblings, 1 reply; 18+ messages in thread From: Joonsoo Kim @ 2014-02-04 2:15 UTC (permalink / raw) To: David Rientjes Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Mon, Feb 03, 2014 at 06:00:56PM -0800, David Rientjes wrote: > On Tue, 4 Feb 2014, Joonsoo Kim wrote: > > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped > > and have positive page_count(), so your code such as > > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal* > > anon pages and this is incorrect behaviour. > > > > So how does that work with migrate_page_move_mapping() which demands > page_count(page) == 1 and the get_page_unless_zero() in > __isolate_lru_page()? Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all mapping is unmapped. Then, remained page_count() is 1 which is grabbed by __isolate_lru_page(). Am I missing something? Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages fix 2014-02-04 2:15 ` Joonsoo Kim @ 2014-02-04 2:50 ` David Rientjes 2014-02-04 3:47 ` Hugh Dickins 2014-02-05 2:44 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes 0 siblings, 2 replies; 18+ messages in thread From: David Rientjes @ 2014-02-04 2:50 UTC (permalink / raw) To: Joonsoo Kim Cc: Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Tue, 4 Feb 2014, Joonsoo Kim wrote: > > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped > > > and have positive page_count(), so your code such as > > > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal* > > > anon pages and this is incorrect behaviour. > > > > > > > So how does that work with migrate_page_move_mapping() which demands > > page_count(page) == 1 and the get_page_unless_zero() in > > __isolate_lru_page()? > > Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all > mapping is unmapped. Then, remained page_count() is 1 which is grabbed by > __isolate_lru_page(). Am I missing something? > Ah, good point. I wonder if we can get away with page_count(page) - page_mapcount(page) > 1 to avoid the get_user_pages() pin? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages fix 2014-02-04 2:50 ` David Rientjes @ 2014-02-04 3:47 ` Hugh Dickins 2014-02-05 2:44 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes 1 sibling, 0 replies; 18+ messages in thread From: Hugh Dickins @ 2014-02-04 3:47 UTC (permalink / raw) To: David Rientjes Cc: Joonsoo Kim, Mel Gorman, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Mon, 3 Feb 2014, David Rientjes wrote: > On Tue, 4 Feb 2014, Joonsoo Kim wrote: > > > > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped > > > > and have positive page_count(), so your code such as > > > > '!page_mapping(page) && page_count(page)' makes compaction skip these *normal* > > > > anon pages and this is incorrect behaviour. > > > > > > > > > > So how does that work with migrate_page_move_mapping() which demands > > > page_count(page) == 1 and the get_page_unless_zero() in > > > __isolate_lru_page()? > > > > Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all > > mapping is unmapped. Then, remained page_count() is 1 which is grabbed by > > __isolate_lru_page(). Am I missing something? > > > > Ah, good point. I wonder if we can get away with > page_count(page) - page_mapcount(page) > 1 to avoid the get_user_pages() > pin? Something like that. But please go back to migrate_page_move_mapping() to factor in what it's additionally considering. Whether you can share code with it, I don't know - it has to do some things under a lock you cannot take at the preliminary stage - you haven't isolated or locked the page yet. There is a separate issue, that a mapping may supply its own non-default mapping->a_ops->migratepage(): can we assume that the page_counting is the same whatever migratepage() is in use? I'm not sure. If you stick to special-casing PageAnon pages, you won't face that issue; but your proposed change would be a lot more satisfying if we can convince ourselves that it's good for !PageAnon too. May need a trawl through the different migratepage() methods that exist in tree. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-04 2:50 ` David Rientjes 2014-02-04 3:47 ` Hugh Dickins @ 2014-02-05 2:44 ` David Rientjes 2014-02-05 20:56 ` Hugh Dickins 2014-02-06 18:48 ` Vlastimil Babka 1 sibling, 2 replies; 18+ messages in thread From: David Rientjes @ 2014-02-05 2:44 UTC (permalink / raw) To: Andrew Morton Cc: Joonsoo Kim, Hugh Dickins, Mel Gorman, Rik van Riel, linux-kernel, linux-mm Page migration will fail for memory that is pinned in memory with, for example, get_user_pages(). In this case, it is unnecessary to take zone->lru_lock or isolating the page and passing it to page migration which will ultimately fail. This is a racy check, the page can still change from under us, but in that case we'll just fail later when attempting to move the page. This avoids very expensive memory compaction when faulting transparent hugepages after pinning a lot of memory with a Mellanox driver. On a 128GB machine and pinning ~120GB of memory, before this patch we see the enormous disparity in the number of page migration failures because of the pinning (from /proc/vmstat): compact_pages_moved 8450 compact_pagemigrate_failed 15614415 0.05% of pages isolated are successfully migrated and explicitly triggering memory compaction takes 102 seconds. After the patch: compact_pages_moved 9197 compact_pagemigrate_failed 7 99.9% of pages isolated are now successfully migrated in this configuration and memory compaction takes less than one second. Signed-off-by: David Rientjes <rientjes@google.com> --- v2: address page count issue per Joonsoo mm/compaction.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c --- a/mm/compaction.c +++ b/mm/compaction.c @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, continue; } + /* + * Migration will fail if an anonymous page is pinned in memory, + * so avoid taking lru_lock and isolating it unnecessarily in an + * admittedly racy check. + */ + if (!page_mapping(page) && + page_count(page) > page_mapcount(page)) + continue; + /* Check if it is ok to still hold the lock */ locked = compact_checklock_irqsave(&zone->lru_lock, &flags, locked, cc); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-05 2:44 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes @ 2014-02-05 20:56 ` Hugh Dickins 2014-02-06 0:05 ` Joonsoo Kim 2014-02-06 18:48 ` Vlastimil Babka 1 sibling, 1 reply; 18+ messages in thread From: Hugh Dickins @ 2014-02-05 20:56 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Mel Gorman, Rik van Riel, Greg Thelen, linux-kernel, linux-mm On Tue, 4 Feb 2014, David Rientjes wrote: > Page migration will fail for memory that is pinned in memory with, for > example, get_user_pages(). In this case, it is unnecessary to take > zone->lru_lock or isolating the page and passing it to page migration > which will ultimately fail. > > This is a racy check, the page can still change from under us, but in > that case we'll just fail later when attempting to move the page. > > This avoids very expensive memory compaction when faulting transparent > hugepages after pinning a lot of memory with a Mellanox driver. > > On a 128GB machine and pinning ~120GB of memory, before this patch we > see the enormous disparity in the number of page migration failures > because of the pinning (from /proc/vmstat): > > compact_pages_moved 8450 > compact_pagemigrate_failed 15614415 > > 0.05% of pages isolated are successfully migrated and explicitly > triggering memory compaction takes 102 seconds. After the patch: > > compact_pages_moved 9197 > compact_pagemigrate_failed 7 > > 99.9% of pages isolated are now successfully migrated in this > configuration and memory compaction takes less than one second. > > Signed-off-by: David Rientjes <rientjes@google.com> > --- > v2: address page count issue per Joonsoo > > mm/compaction.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/mm/compaction.c b/mm/compaction.c > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > continue; > } > > + /* > + * Migration will fail if an anonymous page is pinned in memory, > + * so avoid taking lru_lock and isolating it unnecessarily in an > + * admittedly racy check. > + */ > + if (!page_mapping(page) && > + page_count(page) > page_mapcount(page)) > + continue; > + > /* Check if it is ok to still hold the lock */ > locked = compact_checklock_irqsave(&zone->lru_lock, &flags, Much better, maybe good enough as an internal patch to fix a particular problem you're seeing; but not yet good enough to go upstream. Anonymous pages are not the only pages which might be pinned, and your test doesn't mention PageAnon, so does not match your comment. I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives more assurance that a page_count - page_has_private test is appropriate, whatever the filesystem and migrate method to be used. So I think the test you're looking for is pincount = page_count(page) - page_mapcount(page); if (page_mapping(page)) pincount -= 1 + page_has_private(page); if (pincount > 0) continue; but please cross-check and test that out, it's easy to be off-by-one etc. For a moment I thought a PageWriteback test would be useful too, but no, that should already appear in the pincount. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-05 20:56 ` Hugh Dickins @ 2014-02-06 0:05 ` Joonsoo Kim 2014-02-06 1:16 ` Hugh Dickins 0 siblings, 1 reply; 18+ messages in thread From: Joonsoo Kim @ 2014-02-06 0:05 UTC (permalink / raw) To: Hugh Dickins Cc: David Rientjes, Andrew Morton, Mel Gorman, Rik van Riel, Greg Thelen, linux-kernel, linux-mm On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote: > On Tue, 4 Feb 2014, David Rientjes wrote: > > > Page migration will fail for memory that is pinned in memory with, for > > example, get_user_pages(). In this case, it is unnecessary to take > > zone->lru_lock or isolating the page and passing it to page migration > > which will ultimately fail. > > > > This is a racy check, the page can still change from under us, but in > > that case we'll just fail later when attempting to move the page. > > > > This avoids very expensive memory compaction when faulting transparent > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > see the enormous disparity in the number of page migration failures > > because of the pinning (from /proc/vmstat): > > > > compact_pages_moved 8450 > > compact_pagemigrate_failed 15614415 > > > > 0.05% of pages isolated are successfully migrated and explicitly > > triggering memory compaction takes 102 seconds. After the patch: > > > > compact_pages_moved 9197 > > compact_pagemigrate_failed 7 > > > > 99.9% of pages isolated are now successfully migrated in this > > configuration and memory compaction takes less than one second. > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > --- > > v2: address page count issue per Joonsoo > > > > mm/compaction.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > > continue; > > } > > > > + /* > > + * Migration will fail if an anonymous page is pinned in memory, > > + * so avoid taking lru_lock and isolating it unnecessarily in an > > + * admittedly racy check. > > + */ > > + if (!page_mapping(page) && > > + page_count(page) > page_mapcount(page)) > > + continue; > > + > > /* Check if it is ok to still hold the lock */ > > locked = compact_checklock_irqsave(&zone->lru_lock, &flags, > > Much better, maybe good enough as an internal patch to fix a particular > problem you're seeing; but not yet good enough to go upstream. > > Anonymous pages are not the only pages which might be pinned, > and your test doesn't mention PageAnon, so does not match your comment. > > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives > more assurance that a page_count - page_has_private test is appropriate, > whatever the filesystem and migrate method to be used. > > So I think the test you're looking for is > > pincount = page_count(page) - page_mapcount(page); > if (page_mapping(page)) > pincount -= 1 + page_has_private(page); > if (pincount > 0) > continue; > > but please cross-check and test that out, it's easy to be off-by-one etc. Hello, Hugh. I don't think that this is right. One of migratepage function, aio_migratepage(), pass extra count 1 to migrate_page_move_mapping(). So it can be migrated when pincount == 1 in above test. I think that we should not be aggressive here. This is just for prediction so that it is better not to skip apropriate pages at most. Just for anon case that we are sure easily is the right solution for me. Thanks. > > For a moment I thought a PageWriteback test would be useful too, > but no, that should already appear in the pincount. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-06 0:05 ` Joonsoo Kim @ 2014-02-06 1:16 ` Hugh Dickins 2014-02-06 13:53 ` Mel Gorman 0 siblings, 1 reply; 18+ messages in thread From: Hugh Dickins @ 2014-02-06 1:16 UTC (permalink / raw) To: Joonsoo Kim Cc: Hugh Dickins, David Rientjes, Andrew Morton, Mel Gorman, Rik van Riel, Greg Thelen, linux-kernel, linux-mm On Thu, 6 Feb 2014, Joonsoo Kim wrote: > On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote: > > On Tue, 4 Feb 2014, David Rientjes wrote: > > > > > Page migration will fail for memory that is pinned in memory with, for > > > example, get_user_pages(). In this case, it is unnecessary to take > > > zone->lru_lock or isolating the page and passing it to page migration > > > which will ultimately fail. > > > > > > This is a racy check, the page can still change from under us, but in > > > that case we'll just fail later when attempting to move the page. > > > > > > This avoids very expensive memory compaction when faulting transparent > > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > > see the enormous disparity in the number of page migration failures > > > because of the pinning (from /proc/vmstat): > > > > > > compact_pages_moved 8450 > > > compact_pagemigrate_failed 15614415 > > > > > > 0.05% of pages isolated are successfully migrated and explicitly > > > triggering memory compaction takes 102 seconds. After the patch: > > > > > > compact_pages_moved 9197 > > > compact_pagemigrate_failed 7 > > > > > > 99.9% of pages isolated are now successfully migrated in this > > > configuration and memory compaction takes less than one second. > > > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > > --- > > > v2: address page count issue per Joonsoo > > > > > > mm/compaction.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > > > continue; > > > } > > > > > > + /* > > > + * Migration will fail if an anonymous page is pinned in memory, > > > + * so avoid taking lru_lock and isolating it unnecessarily in an > > > + * admittedly racy check. > > > + */ > > > + if (!page_mapping(page) && > > > + page_count(page) > page_mapcount(page)) > > > + continue; > > > + > > > /* Check if it is ok to still hold the lock */ > > > locked = compact_checklock_irqsave(&zone->lru_lock, &flags, > > > > Much better, maybe good enough as an internal patch to fix a particular > > problem you're seeing; but not yet good enough to go upstream. > > > > Anonymous pages are not the only pages which might be pinned, > > and your test doesn't mention PageAnon, so does not match your comment. > > > > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives > > more assurance that a page_count - page_has_private test is appropriate, > > whatever the filesystem and migrate method to be used. > > > > So I think the test you're looking for is > > > > pincount = page_count(page) - page_mapcount(page); > > if (page_mapping(page)) > > pincount -= 1 + page_has_private(page); > > if (pincount > 0) > > continue; > > > > but please cross-check and test that out, it's easy to be off-by-one etc. > > Hello, Hugh. > > I don't think that this is right. > One of migratepage function, aio_migratepage(), pass extra count 1 to > migrate_page_move_mapping(). So it can be migrated when pincount == 1 in > above test. > > I think that we should not be aggressive here. This is just for prediction > so that it is better not to skip apropriate pages at most. Just for anon case > that we are sure easily is the right solution for me. Interesting, thank you for the pointer. That's a pity! I hope that later on we can modify fs/aio.c to set PagePrivate on ring pages, revert the extra argument to migrate_page_move_mapping(), and then let it appear the same as the other filesystems (but lacking a writepage, reclaim won't try to free the pages). But that's "later on" and may prove impossible in the implementation. I agree it's beyond the scope of David's patch, and so only anonymous should be dealt with in this way at present. And since page_mapping() is non-NULL on PageAnon PageSwapCache pages, those will fall through David's test and go on to try migration: which is the correct default. Although we could add code to handle pinned swapcache, it would be rather an ugly excrescence, until the case gets handled naturally when proper page_mapping() support is added later. Okay, to David's current patch Acked-by: Hugh Dickins <hughd@google.com> though I'd like to hear whether Mel is happy with it. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-06 1:16 ` Hugh Dickins @ 2014-02-06 13:53 ` Mel Gorman 0 siblings, 0 replies; 18+ messages in thread From: Mel Gorman @ 2014-02-06 13:53 UTC (permalink / raw) To: Hugh Dickins Cc: Joonsoo Kim, David Rientjes, Andrew Morton, Rik van Riel, Greg Thelen, linux-kernel, linux-mm On Wed, Feb 05, 2014 at 05:16:06PM -0800, Hugh Dickins wrote: > On Thu, 6 Feb 2014, Joonsoo Kim wrote: > > On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote: > > > On Tue, 4 Feb 2014, David Rientjes wrote: > > > > > > > Page migration will fail for memory that is pinned in memory with, for > > > > example, get_user_pages(). In this case, it is unnecessary to take > > > > zone->lru_lock or isolating the page and passing it to page migration > > > > which will ultimately fail. > > > > > > > > This is a racy check, the page can still change from under us, but in > > > > that case we'll just fail later when attempting to move the page. > > > > > > > > This avoids very expensive memory compaction when faulting transparent > > > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > > > see the enormous disparity in the number of page migration failures > > > > because of the pinning (from /proc/vmstat): > > > > > > > > compact_pages_moved 8450 > > > > compact_pagemigrate_failed 15614415 > > > > > > > > 0.05% of pages isolated are successfully migrated and explicitly > > > > triggering memory compaction takes 102 seconds. After the patch: > > > > > > > > compact_pages_moved 9197 > > > > compact_pagemigrate_failed 7 > > > > > > > > 99.9% of pages isolated are now successfully migrated in this > > > > configuration and memory compaction takes less than one second. > > > > > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > > > --- > > > > v2: address page count issue per Joonsoo > > > > > > > > mm/compaction.c | 9 +++++++++ > > > > 1 file changed, 9 insertions(+) > > > > > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > > --- a/mm/compaction.c > > > > +++ b/mm/compaction.c > > > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > > > > continue; > > > > } > > > > > > > > + /* > > > > + * Migration will fail if an anonymous page is pinned in memory, > > > > + * so avoid taking lru_lock and isolating it unnecessarily in an > > > > + * admittedly racy check. > > > > + */ > > > > + if (!page_mapping(page) && > > > > + page_count(page) > page_mapcount(page)) > > > > + continue; > > > > + > > > > /* Check if it is ok to still hold the lock */ > > > > locked = compact_checklock_irqsave(&zone->lru_lock, &flags, > > > > > > Much better, maybe good enough as an internal patch to fix a particular > > > problem you're seeing; but not yet good enough to go upstream. > > > > > > Anonymous pages are not the only pages which might be pinned, > > > and your test doesn't mention PageAnon, so does not match your comment. > > > > > > I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives > > > more assurance that a page_count - page_has_private test is appropriate, > > > whatever the filesystem and migrate method to be used. > > > > > > So I think the test you're looking for is > > > > > > pincount = page_count(page) - page_mapcount(page); > > > if (page_mapping(page)) > > > pincount -= 1 + page_has_private(page); > > > if (pincount > 0) > > > continue; > > > > > > but please cross-check and test that out, it's easy to be off-by-one etc. > > > > Hello, Hugh. > > > > I don't think that this is right. > > One of migratepage function, aio_migratepage(), pass extra count 1 to > > migrate_page_move_mapping(). So it can be migrated when pincount == 1 in > > above test. > > > > I think that we should not be aggressive here. This is just for prediction > > so that it is better not to skip apropriate pages at most. Just for anon case > > that we are sure easily is the right solution for me. > > Interesting, thank you for the pointer. That's a pity! > > I hope that later on we can modify fs/aio.c to set PagePrivate on > ring pages, revert the extra argument to migrate_page_move_mapping(), > and then let it appear the same as the other filesystems (but lacking > a writepage, reclaim won't try to free the pages). > > But that's "later on" and may prove impossible in the implementation. > I agree it's beyond the scope of David's patch, and so only anonymous > should be dealt with in this way at present. > > And since page_mapping() is non-NULL on PageAnon PageSwapCache pages, > those will fall through David's test and go on to try migration: > which is the correct default. Although we could add code to handle > pinned swapcache, it would be rather an ugly excrescence, until the case > gets handled naturally when proper page_mapping() support is added later. > > Okay, to David's current patch > Acked-by: Hugh Dickins <hughd@google.com> > though I'd like to hear whether Mel is happy with it. > I have nothing useful to add other than Acked-by: Mel Gorman <mgorman@suse.de> -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-05 2:44 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes 2014-02-05 20:56 ` Hugh Dickins @ 2014-02-06 18:48 ` Vlastimil Babka 2014-02-06 21:33 ` David Rientjes 1 sibling, 1 reply; 18+ messages in thread From: Vlastimil Babka @ 2014-02-06 18:48 UTC (permalink / raw) To: David Rientjes, Andrew Morton Cc: Joonsoo Kim, Hugh Dickins, Mel Gorman, Rik van Riel, linux-kernel, linux-mm On 5.2.2014 3:44, David Rientjes wrote: > Page migration will fail for memory that is pinned in memory with, for > example, get_user_pages(). In this case, it is unnecessary to take > zone->lru_lock or isolating the page and passing it to page migration > which will ultimately fail. > > This is a racy check, the page can still change from under us, but in > that case we'll just fail later when attempting to move the page. > > This avoids very expensive memory compaction when faulting transparent > hugepages after pinning a lot of memory with a Mellanox driver. > > On a 128GB machine and pinning ~120GB of memory, before this patch we > see the enormous disparity in the number of page migration failures > because of the pinning (from /proc/vmstat): > > compact_pages_moved 8450 > compact_pagemigrate_failed 15614415 > > 0.05% of pages isolated are successfully migrated and explicitly > triggering memory compaction takes 102 seconds. After the patch: > > compact_pages_moved 9197 > compact_pagemigrate_failed 7 > > 99.9% of pages isolated are now successfully migrated in this > configuration and memory compaction takes less than one second. > > Signed-off-by: David Rientjes <rientjes@google.com> > --- > v2: address page count issue per Joonsoo > > mm/compaction.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/mm/compaction.c b/mm/compaction.c > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > continue; > } > > + /* > + * Migration will fail if an anonymous page is pinned in memory, > + * so avoid taking lru_lock and isolating it unnecessarily in an > + * admittedly racy check. > + */ > + if (!page_mapping(page) && > + page_count(page) > page_mapcount(page)) > + continue; > + Hm this page_count() seems it could substantially increase the chance of race with prep_compound_page that your patch "mm, page_alloc: make first_page visible before PageTail" tries to fix :) > /* Check if it is ok to still hold the lock */ > locked = compact_checklock_irqsave(&zone->lru_lock, &flags, > locked, cc); > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch v2] mm, compaction: avoid isolating pinned pages 2014-02-06 18:48 ` Vlastimil Babka @ 2014-02-06 21:33 ` David Rientjes 0 siblings, 0 replies; 18+ messages in thread From: David Rientjes @ 2014-02-06 21:33 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Mel Gorman, Rik van Riel, linux-kernel, linux-mm On Thu, 6 Feb 2014, Vlastimil Babka wrote: > > Page migration will fail for memory that is pinned in memory with, for > > example, get_user_pages(). In this case, it is unnecessary to take > > zone->lru_lock or isolating the page and passing it to page migration > > which will ultimately fail. > > > > This is a racy check, the page can still change from under us, but in > > that case we'll just fail later when attempting to move the page. > > > > This avoids very expensive memory compaction when faulting transparent > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > see the enormous disparity in the number of page migration failures > > because of the pinning (from /proc/vmstat): > > > > compact_pages_moved 8450 > > compact_pagemigrate_failed 15614415 > > > > 0.05% of pages isolated are successfully migrated and explicitly > > triggering memory compaction takes 102 seconds. After the patch: > > > > compact_pages_moved 9197 > > compact_pagemigrate_failed 7 > > > > 99.9% of pages isolated are now successfully migrated in this > > configuration and memory compaction takes less than one second. > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > --- > > v2: address page count issue per Joonsoo > > > > mm/compaction.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct > > compact_control *cc, > > continue; > > } > > + /* > > + * Migration will fail if an anonymous page is pinned in > > memory, > > + * so avoid taking lru_lock and isolating it unnecessarily in > > an > > + * admittedly racy check. > > + */ > > + if (!page_mapping(page) && > > + page_count(page) > page_mapcount(page)) > > + continue; > > + > > Hm this page_count() seems it could substantially increase the chance of race > with prep_compound_page that your patch "mm, page_alloc: make first_page > visible before PageTail" tries to fix :) > That's why I sent the fix for page_count(). The "racy check" the comment eludes to above concerns the fact that page_count() and page_mapcount() can change out from under us before isolation and if we had not avoided isolating them that they would have been migratable later. We accept that as a consequence of doing this in a lockless way without page references. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch] mm, compaction: avoid isolating pinned pages 2014-02-03 10:49 ` David Rientjes 2014-02-04 0:02 ` Joonsoo Kim @ 2014-02-04 2:44 ` Hugh Dickins 1 sibling, 0 replies; 18+ messages in thread From: Hugh Dickins @ 2014-02-04 2:44 UTC (permalink / raw) To: David Rientjes Cc: Mel Gorman, Andrew Morton, Rik van Riel, Joonsoo Kim, linux-kernel, linux-mm On Mon, 3 Feb 2014, David Rientjes wrote: > On Mon, 3 Feb 2014, Mel Gorman wrote: > > > > Page migration will fail for memory that is pinned in memory with, for > > > example, get_user_pages(). In this case, it is unnecessary to take > > > zone->lru_lock or isolating the page and passing it to page migration > > > which will ultimately fail. > > > > > > This is a racy check, the page can still change from under us, but in > > > that case we'll just fail later when attempting to move the page. > > > > > > This avoids very expensive memory compaction when faulting transparent > > > hugepages after pinning a lot of memory with a Mellanox driver. > > > > > > On a 128GB machine and pinning ~120GB of memory, before this patch we > > > see the enormous disparity in the number of page migration failures > > > because of the pinning (from /proc/vmstat): 120GB of memory on the active/inactive lrus but longterm pinned, that's quite worrying: not just a great waste of time for compaction, but for page reclaim also. I suppose a fairly easy way around it would be for the driver to use mlock too, moving them all to unevictable lru. But in general, you may well be right that, racy as this isolation/ migration procedure necessarily is, in the face of longterm pinning it may make more sense to test page_count before proceding to isolation rather than only after in migration. We always took the view that it's better to give up only at the last moment, but that may be a bad bet. > > > > > > compact_blocks_moved 7609 > > > compact_pages_moved 3431 > > > compact_pagemigrate_failed 133219 > > > compact_stall 13 > > > > > > After the patch, it is much more efficient: > > > > > > compact_blocks_moved 7998 > > > compact_pages_moved 6403 > > > compact_pagemigrate_failed 3 > > > compact_stall 15 > > > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > > --- > > > mm/compaction.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, > > > continue; > > > } > > > > > > + /* > > > + * Migration will fail if an anonymous page is pinned in memory, > > > + * so avoid taking zone->lru_lock and isolating it unnecessarily > > > + * in an admittedly racy check. > > > + */ > > > + if (!page_mapping(page) && page_count(page)) > > > + continue; > > > + > > > > Are you sure about this? The page_count check migration does is this > > > > int expected_count = 1 + extra_count; > > if (!mapping) { > > if (page_count(page) != expected_count) > > return -EAGAIN; > > return MIGRATEPAGE_SUCCESS; > > } > > > > spin_lock_irq(&mapping->tree_lock); > > > > pslot = radix_tree_lookup_slot(&mapping->page_tree, > > page_index(page)); > > > > expected_count += 1 + page_has_private(page); > > > > Migration expects and can migrate pages with no mapping and a page count > > but you are now skipping them. I think you may have intended to split > > migrations page count into a helper or copy the logic. > > > > Thanks for taking a look! > > The patch is correct, it just shows my lack of a complete commit message I don't think so. I agree with Mel that you should be reconsidering those tests that migrate_page_move_mapping() makes, but remembering that it's called at a stage between try_to_unmap() and remove_migration_ptes(), when page_mapcount has been brought down to 0 - not the case here. > which I'm struggling with recently. In the case that this is addressing, > get_user_pages() already gives page_count(page) == 1, then But get_user_pages() brings the pages into user address space (if not already there), page_mapcount 1 and page_count 1, and does an additional pin on the page, page_count 2. Or if it's a page_mapping page (perhaps even PageAnon in SwapCache) there's another +1; if page_has_buffers another +1; mapped into more user address spaces, +more. Your if (!page_mapping(page) && page_count(page)) continue; is letting through any Anon SwapCache pages (probably no great concern in your 120GB example; but I don't understand why you want to special- case Anon anyway, beyond your specific testcase); and refusing to isolate all those unpinned anonymous pages mapped into userspace which migration is perfectly capable of migrating. If 120GB out of 128GB is pinned, that won't be a significant proportion, and of course your change saves a lot of wasted time and lock contention; but for most people it's a considerable proportion of their memory, and needs to be migratable. I think Joonsoo is making the same point (though I disagree with the test he suggested); but I've not yet read the latest mails under a separate subject (" fix" appended). Hugh > __isolate_lru_page() does another get_page() that is dropped in > putback_lru_page() after the call into migrate_pages(). So in the code > you quote above we always have page_count(page) == 2 and > expected_count == 1. > > So what we desperately need to do is avoid isolating any page where > page_count(page) is non-zero and !page_mapping(page) and do that before > the get_page() in __isolate_lru_page() because we want to avoid taking > zone->lru_lock. On my 128GB machine filled with ~120GB of pinned memory > for the driver, this lock gets highly contended under compaction and even > reclaim if the rest of userspace is using a lot of memory. > > It's not really relevant to the commit message, but I found that if all > that ~120GB is faulted and I manually invoke compaction with the procfs > trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets > taken ~450,000 times and only 0.05% of isolated pages are actually > successfully migrated. > > Deferred compaction will certainly help for compaction that isn't induced > via procfs, but we've encountered massive amounts of lock contention in > this path and extremely low success to failure ratios of page migration on > average of 2-3 out of 60 runs and the fault path really does grind to a > halt without this patch (or simply doing MADV_NOHUGEPAGE before the driver > does ib_umem_get() for 120GB of memory, but we want those hugepages!). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2014-02-07 0:45 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-02 5:46 [patch] mm, compaction: avoid isolating pinned pages David Rientjes 2014-02-03 9:53 ` Mel Gorman 2014-02-03 10:49 ` David Rientjes 2014-02-04 0:02 ` Joonsoo Kim 2014-02-04 1:20 ` [patch] mm, compaction: avoid isolating pinned pages fix David Rientjes 2014-02-04 1:53 ` Joonsoo Kim 2014-02-04 2:00 ` David Rientjes 2014-02-04 2:15 ` Joonsoo Kim 2014-02-04 2:50 ` David Rientjes 2014-02-04 3:47 ` Hugh Dickins 2014-02-05 2:44 ` [patch v2] mm, compaction: avoid isolating pinned pages David Rientjes 2014-02-05 20:56 ` Hugh Dickins 2014-02-06 0:05 ` Joonsoo Kim 2014-02-06 1:16 ` Hugh Dickins 2014-02-06 13:53 ` Mel Gorman 2014-02-06 18:48 ` Vlastimil Babka 2014-02-06 21:33 ` David Rientjes 2014-02-04 2:44 ` [patch] " Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).