Date: Thu, 6 Feb 2014 13:53:37 +0000
From: Mel Gorman
Subject: Re: [patch v2] mm, compaction: avoid isolating pinned pages
Message-ID: <20140206135337.GV6732@suse.de>
References: <20140204000237.GA17331@lge.com> <20140204015332.GA14779@lge.com> <20140204021533.GA14924@lge.com> <20140206000504.GA17465@lge.com>
To: Hugh Dickins
Cc: Joonsoo Kim, David Rientjes, Andrew Morton, Rik van Riel, Greg Thelen, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Wed, Feb 05, 2014 at 05:16:06PM -0800, Hugh Dickins wrote:
> On Thu, 6 Feb 2014, Joonsoo Kim wrote:
> > On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> > > On Tue, 4 Feb 2014, David Rientjes wrote:
> > > >
> > > > Page migration will fail for memory that is pinned in memory with, for
> > > > example, get_user_pages(). In this case, it is unnecessary to take
> > > > zone->lru_lock or to isolate the page and pass it to page migration,
> > > > which will ultimately fail.
> > > >
> > > > This is a racy check; the page can still change from under us, but in
> > > > that case we'll just fail later when attempting to move the page.
> > > >
> > > > This avoids very expensive memory compaction when faulting transparent
> > > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > >
> > > > On a 128GB machine with ~120GB of memory pinned, before this patch we
> > > > see an enormous disparity in the number of page migration failures
> > > > because of the pinning (from /proc/vmstat):
> > > >
> > > > compact_pages_moved 8450
> > > > compact_pagemigrate_failed 15614415
> > > >
> > > > Only 0.05% of isolated pages are successfully migrated, and explicitly
> > > > triggering memory compaction takes 102 seconds. After the patch:
> > > >
> > > > compact_pages_moved 9197
> > > > compact_pagemigrate_failed 7
> > > >
> > > > 99.9% of isolated pages are now successfully migrated in this
> > > > configuration, and memory compaction takes less than one second.
> > > >
> > > > Signed-off-by: David Rientjes
> > > > ---
> > > >  v2: address page count issue per Joonsoo
> > > >
> > > >  mm/compaction.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > > --- a/mm/compaction.c
> > > > +++ b/mm/compaction.c
> > > > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > > >  			continue;
> > > >  		}
> > > >
> > > > +		/*
> > > > +		 * Migration will fail if an anonymous page is pinned in memory,
> > > > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > > > +		 * admittedly racy check.
> > > > +		 */
> > > > +		if (!page_mapping(page) &&
> > > > +		    page_count(page) > page_mapcount(page))
> > > > +			continue;
> > > > +
> > > >  		/* Check if it is ok to still hold the lock */
> > > >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> > >
> > > Much better, maybe good enough as an internal patch to fix a particular
> > > problem you're seeing; but not yet good enough to go upstream.
> > >
> > > Anonymous pages are not the only pages which might be pinned,
> > > and your test doesn't mention PageAnon, so it does not match your comment.
> > >
> > > I remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> > > more assurance that a page_count - page_has_private test is appropriate,
> > > whatever the filesystem and migration method to be used.
> > >
> > > So I think the test you're looking for is
> > >
> > >	pincount = page_count(page) - page_mapcount(page);
> > >	if (page_mapping(page))
> > >		pincount -= 1 + page_has_private(page);
> > >	if (pincount > 0)
> > >		continue;
> > >
> > > but please cross-check and test that out, it's easy to be off by one etc.
> >
> > Hello, Hugh.
> >
> > I don't think that this is right.
> > One of the migratepage functions, aio_migratepage(), passes an extra
> > count of 1 to migrate_page_move_mapping(), so such a page can still be
> > migrated even when pincount == 1 in the above test.
> >
> > I think that we should not be aggressive here. This is just a prediction,
> > so it is better not to skip pages that could in fact be migrated. Handling
> > just the anon case, where we can easily be sure, is the right solution for me.
>
> Interesting, thank you for the pointer.  That's a pity!
>
> I hope that later on we can modify fs/aio.c to set PagePrivate on
> ring pages, revert the extra argument to migrate_page_move_mapping(),
> and then let it appear the same as the other filesystems (but lacking
> a writepage, reclaim won't try to free the pages).
>
> But that's "later on" and may prove impossible in the implementation.
> I agree it's beyond the scope of David's patch, and so only anonymous
> pages should be dealt with in this way at present.
>
> And since page_mapping() is non-NULL on PageAnon PageSwapCache pages,
> those will fall through David's test and go on to try migration:
> which is the correct default.  Although we could add code to handle
> pinned swapcache, it would be rather an ugly excrescence, until the case
> gets handled naturally when proper page_mapping() support is added later.
>
> Okay, to David's current patch
> Acked-by: Hugh Dickins
> though I'd like to hear whether Mel is happy with it.
>

I have nothing useful to add other than

Acked-by: Mel Gorman

-- 
Mel Gorman
SUSE Labs
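For anyone following the arithmetic above, here is a minimal, self-contained C sketch of the two heuristics being compared: David's anon-only check and Hugh's broader pincount test. It is a userspace model only, not kernel code; struct fake_page and its fields are hypothetical stand-ins for page_count(), page_mapcount(), page_mapping() and page_has_private() as used in the quoted patch and in Hugh's suggestion.

	/*
	 * Standalone model of the pin-count heuristics discussed above --
	 * NOT kernel code.  struct fake_page and its helpers are simplified
	 * stand-ins for the real struct page accessors, so the arithmetic
	 * of the two tests can be checked in isolation.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	struct fake_page {
		int count;		/* stand-in for page_count()           */
		int mapcount;		/* stand-in for page_mapcount()        */
		bool has_mapping;	/* stand-in for page_mapping() != NULL */
		bool has_private;	/* stand-in for page_has_private()     */
	};

	/* David's v2 test: skip only pinned anonymous pages (no mapping). */
	static bool skip_anon_pinned(const struct fake_page *p)
	{
		return !p->has_mapping && p->count > p->mapcount;
	}

	/*
	 * Hugh's broader suggestion: subtract the references a mapped page
	 * legitimately holds (one for the mapping, one for private data)
	 * and treat anything left over as a pin.
	 */
	static bool skip_any_pinned(const struct fake_page *p)
	{
		int pincount = p->count - p->mapcount;

		if (p->has_mapping)
			pincount -= 1 + (p->has_private ? 1 : 0);
		return pincount > 0;
	}

	int main(void)
	{
		/* Anonymous page pinned by get_user_pages(). */
		struct fake_page anon_pinned = { .count = 2, .mapcount = 1 };
		/* Clean file-backed page: one mapping, no extra pins. */
		struct fake_page file_clean  = { .count = 2, .mapcount = 1,
						 .has_mapping = true };
		/* File-backed page holding one extra reference. */
		struct fake_page file_pinned = { .count = 3, .mapcount = 1,
						 .has_mapping = true };

		printf("anon pinned: anon-test=%d any-test=%d\n",
		       skip_anon_pinned(&anon_pinned), skip_any_pinned(&anon_pinned));
		printf("file clean:  anon-test=%d any-test=%d\n",
		       skip_anon_pinned(&file_clean), skip_any_pinned(&file_clean));
		printf("file pinned: anon-test=%d any-test=%d\n",
		       skip_anon_pinned(&file_pinned), skip_any_pinned(&file_pinned));
		return 0;
	}

With these three sample pages, the anon-only test skips just the pinned anonymous page, while the broader test also skips the file-backed page carrying an extra reference; as Joonsoo notes, that broader skip can be a false positive for aio ring pages, which hold one extra count yet are still migratable.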