linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Simon Jeons <simon.jeons@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Petr Holasek <pholasek@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Izik Eidus <izik.eidus@ravellosystems.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 7/11] ksm: make KSM page migration possible
Date: Sat, 26 Jan 2013 23:47:15 -0600	[thread overview]
Message-ID: <1359265635.6763.0.camel@kernel> (raw)
In-Reply-To: <alpine.LNX.2.00.1301251802050.29196@eggly.anvils>

On Fri, 2013-01-25 at 18:03 -0800, Hugh Dickins wrote:
> KSM page migration is already supported in the case of memory hotremove,
> which takes the ksm_thread_mutex across all its migrations to keep life
> simple.
> 
> But the new KSM NUMA merge_across_nodes knob introduces a problem, when
> it's set to non-default 0: if a KSM page is migrated to a different NUMA
> node, how do we migrate its stable node to the right tree?  And what if
> that collides with an existing stable node?
> 
> So far there's no provision for that, and this patch does not attempt
> to deal with it either.  But how will I test a solution, when I don't
> know how to hotremove memory?  The best answer is to enable KSM page
> migration in all cases now, and test more common cases.  With THP and
> compaction added since KSM came in, page migration is now mainstream,
> and it's a shame that a KSM page can frustrate freeing a page block.
> 
> Without worrying about merge_across_nodes 0 for now, this patch gets
> KSM page migration working reliably for default merge_across_nodes 1
> (but leave the patch enabling it until near the end of the series).
> 
> It's much simpler than I'd originally imagined, and does not require
> an additional tier of locking: page migration relies on the page lock,
> KSM page reclaim relies on the page lock, the page lock is enough for
> KSM page migration too.
> 
> Almost all the care has to be in get_ksm_page(): that's the function
> which worries about when a stable node is stale and should be freed,
> now it also has to worry about the KSM page being migrated.
> 
> The only new overhead is an additional put/get/lock/unlock_page when
> stable_tree_search() arrives at a matching node: to make sure migration
> respects the raised page count, and so does not migrate the page while
> we're busy with it here.  That's probably avoidable, either by changing
> internal interfaces from using kpage to stable_node, or by moving the
> ksm_migrate_page() callsite into a page_freeze_refs() section (even if
> not swapcache); but this works well, I've no urge to pull it apart now.
> 
> (Descents of the stable tree may pass through nodes whose KSM pages are
> under migration: being unlocked, the raised page count does not prevent
> that, nor need it: it's safe to memcmp against either old or new page.)
> 
> You might worry about mremap, and whether page migration's rmap_walk
> to remove migration entries will find all the KSM locations where it
> inserted earlier: that should already be handled, by the satisfyingly
> heavy hammer of move_vma()'s call to ksm_madvise(,,,MADV_UNMERGEABLE,).
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/ksm.c     |   94 ++++++++++++++++++++++++++++++++++++++-----------
>  mm/migrate.c |    5 ++
>  2 files changed, 77 insertions(+), 22 deletions(-)
> 
> --- mmotm.orig/mm/ksm.c	2013-01-25 14:37:00.768206145 -0800
> +++ mmotm/mm/ksm.c	2013-01-25 14:37:03.832206218 -0800
> @@ -499,6 +499,7 @@ static void remove_node_from_stable_tree
>   * In which case we can trust the content of the page, and it
>   * returns the gotten page; but if the page has now been zapped,
>   * remove the stale node from the stable tree and return NULL.
> + * But beware, the stable node's page might be being migrated.
>   *
>   * You would expect the stable_node to hold a reference to the ksm page.
>   * But if it increments the page's count, swapping out has to wait for
> @@ -509,44 +510,77 @@ static void remove_node_from_stable_tree
>   * pointing back to this stable node.  This relies on freeing a PageAnon
>   * page to reset its page->mapping to NULL, and relies on no other use of
>   * a page to put something that might look like our key in page->mapping.
> - *
> - * include/linux/pagemap.h page_cache_get_speculative() is a good reference,
> - * but this is different - made simpler by ksm_thread_mutex being held, but
> - * interesting for assuming that no other use of the struct page could ever
> - * put our expected_mapping into page->mapping (or a field of the union which
> - * coincides with page->mapping).
> - *
> - * Note: it is possible that get_ksm_page() will return NULL one moment,
> - * then page the next, if the page is in between page_freeze_refs() and
> - * page_unfreeze_refs(): this shouldn't be a problem anywhere, the page
>   * is on its way to being freed; but it is an anomaly to bear in mind.
>   */
>  static struct page *get_ksm_page(struct stable_node *stable_node, bool locked)
>  {
>  	struct page *page;
>  	void *expected_mapping;
> +	unsigned long kpfn;
>  
> -	page = pfn_to_page(stable_node->kpfn);
>  	expected_mapping = (void *)stable_node +
>  				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
> -	if (page->mapping != expected_mapping)
> -		goto stale;
> -	if (!get_page_unless_zero(page))
> +again:
> +	kpfn = ACCESS_ONCE(stable_node->kpfn);
> +	page = pfn_to_page(kpfn);
> +
> +	/*
> +	 * page is computed from kpfn, so on most architectures reading
> +	 * page->mapping is naturally ordered after reading node->kpfn,
> +	 * but on Alpha we need to be more careful.
> +	 */
> +	smp_read_barrier_depends();
> +	if (ACCESS_ONCE(page->mapping) != expected_mapping)
>  		goto stale;
> -	if (page->mapping != expected_mapping) {
> +
> +	/*
> +	 * We cannot do anything with the page while its refcount is 0.
> +	 * Usually 0 means free, or tail of a higher-order page: in which
> +	 * case this node is no longer referenced, and should be freed;
> +	 * however, it might mean that the page is under page_freeze_refs().
> +	 * The __remove_mapping() case is easy, again the node is now stale;
> +	 * but if page is swapcache in migrate_page_move_mapping(), it might
> +	 * still be our page, in which case it's essential to keep the node.
> +	 */
> +	while (!get_page_unless_zero(page)) {
> +		/*
> +		 * Another check for page->mapping != expected_mapping would
> +		 * work here too.  We have chosen the !PageSwapCache test to
> +		 * optimize the common case, when the page is or is about to
> +		 * be freed: PageSwapCache is cleared (under spin_lock_irq)
> +		 * in the freeze_refs section of __remove_mapping(); but Anon
> +		 * page->mapping reset to NULL later, in free_pages_prepare().
> +		 */
> +		if (!PageSwapCache(page))
> +			goto stale;
> +		cpu_relax();
> +	}
> +
> +	if (ACCESS_ONCE(page->mapping) != expected_mapping) {
>  		put_page(page);
>  		goto stale;
>  	}
> +
>  	if (locked) {
>  		lock_page(page);
> -		if (page->mapping != expected_mapping) {
> +		if (ACCESS_ONCE(page->mapping) != expected_mapping) {
>  			unlock_page(page);
>  			put_page(page);
>  			goto stale;
>  		}
>  	}

Could you explain why need check page->mapping twice after get page?

>  	return page;
> +
>  stale:
> +	/*
> +	 * We come here from above when page->mapping or !PageSwapCache
> +	 * suggests that the node is stale; but it might be under migration.
> +	 * We need smp_rmb(), matching the smp_wmb() in ksm_migrate_page(),
> +	 * before checking whether node->kpfn has been changed.
> +	 */
> +	smp_rmb();
> +	if (ACCESS_ONCE(stable_node->kpfn) != kpfn)
> +		goto again;
>  	remove_node_from_stable_tree(stable_node);
>  	return NULL;
>  }
> @@ -1103,15 +1137,25 @@ static struct page *stable_tree_search(s
>  			return NULL;
>  
>  		ret = memcmp_pages(page, tree_page);
> +		put_page(tree_page);
>  
> -		if (ret < 0) {
> -			put_page(tree_page);
> +		if (ret < 0)
>  			node = node->rb_left;
> -		} else if (ret > 0) {
> -			put_page(tree_page);
> +		else if (ret > 0)
>  			node = node->rb_right;
> -		} else
> +		else {
> +			/*
> +			 * Lock and unlock the stable_node's page (which
> +			 * might already have been migrated) so that page
> +			 * migration is sure to notice its raised count.
> +			 * It would be more elegant to return stable_node
> +			 * than kpage, but that involves more changes.
> +			 */
> +			tree_page = get_ksm_page(stable_node, true);
> +			if (tree_page)
> +				unlock_page(tree_page);
>  			return tree_page;
> +		}
>  	}
>  
>  	return NULL;
> @@ -1903,6 +1947,14 @@ void ksm_migrate_page(struct page *newpa
>  	if (stable_node) {
>  		VM_BUG_ON(stable_node->kpfn != page_to_pfn(oldpage));
>  		stable_node->kpfn = page_to_pfn(newpage);
> +		/*
> +		 * newpage->mapping was set in advance; now we need smp_wmb()
> +		 * to make sure that the new stable_node->kpfn is visible
> +		 * to get_ksm_page() before it can see that oldpage->mapping
> +		 * has gone stale (or that PageSwapCache has been cleared).
> +		 */
> +		smp_wmb();
> +		set_page_stable_node(oldpage, NULL);
>  	}
>  }
>  #endif /* CONFIG_MIGRATION */
> --- mmotm.orig/mm/migrate.c	2013-01-25 14:27:58.140193249 -0800
> +++ mmotm/mm/migrate.c	2013-01-25 14:37:03.832206218 -0800
> @@ -464,7 +464,10 @@ void migrate_page_copy(struct page *newp
>  
>  	mlock_migrate_page(newpage, page);
>  	ksm_migrate_page(newpage, page);
> -
> +	/*
> +	 * Please do not reorder this without considering how mm/ksm.c's
> +	 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
> +	 */
>  	ClearPageSwapCache(page);
>  	ClearPagePrivate(page);
>  	set_page_private(page, 0);
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-01-27  5:47 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-26  1:53 [PATCH 0/11] ksm: NUMA trees and page migration Hugh Dickins
2013-01-26  1:54 ` [PATCH 1/11] ksm: allow trees per NUMA node Hugh Dickins
2013-01-27  1:14   ` Simon Jeons
2013-01-27  2:54     ` Hugh Dickins
2013-01-27  3:16       ` Simon Jeons
2013-01-27 21:55         ` Hugh Dickins
2013-01-28 23:03   ` Andrew Morton
2013-01-29  1:17     ` Hugh Dickins
2013-01-28 23:08   ` Andrew Morton
2013-01-29  1:38     ` Hugh Dickins
2013-02-05 16:41   ` Mel Gorman
2013-02-07 23:57     ` Hugh Dickins
2013-01-26  1:56 ` [PATCH 2/11] ksm: add sysfs ABI Documentation Hugh Dickins
2013-01-26  1:58 ` [PATCH 3/11] ksm: trivial tidyups Hugh Dickins
2013-01-28 23:11   ` Andrew Morton
2013-01-29  1:44     ` Hugh Dickins
2013-01-26  1:59 ` [PATCH 4/11] ksm: reorganize ksm_check_stable_tree Hugh Dickins
2013-02-05 16:48   ` Mel Gorman
2013-02-08  0:07     ` Hugh Dickins
2013-02-14 11:30       ` Mel Gorman
2013-01-26  2:00 ` [PATCH 5/11] ksm: get_ksm_page locked Hugh Dickins
2013-01-27  2:36   ` Simon Jeons
2013-01-27 22:08     ` Hugh Dickins
2013-01-28  0:36       ` Simon Jeons
2013-01-28  3:35         ` Hugh Dickins
2013-01-27  2:48   ` Simon Jeons
2013-01-27 22:10     ` Hugh Dickins
2013-02-05 17:18   ` Mel Gorman
2013-02-08  0:33     ` Hugh Dickins
2013-02-14 11:34       ` Mel Gorman
2013-01-26  2:01 ` [PATCH 6/11] ksm: remove old stable nodes more thoroughly Hugh Dickins
2013-01-27  4:55   ` Simon Jeons
2013-01-27 23:05     ` Hugh Dickins
2013-01-28  1:42       ` Simon Jeons
2013-01-28  4:14         ` Hugh Dickins
2013-01-28  2:12   ` Simon Jeons
2013-01-28  4:19     ` Hugh Dickins
2013-01-28  6:36   ` Simon Jeons
2013-01-28 23:44   ` Andrew Morton
2013-01-29  2:03     ` Hugh Dickins
2013-02-05 17:55   ` Mel Gorman
2013-02-08 19:33     ` Hugh Dickins
2013-02-14 11:58       ` Mel Gorman
2013-02-14 22:19         ` Hugh Dickins
2013-01-26  2:03 ` [PATCH 7/11] ksm: make KSM page migration possible Hugh Dickins
2013-01-27  5:47   ` Simon Jeons [this message]
2013-01-27 23:12     ` Hugh Dickins
2013-01-28  0:41       ` Simon Jeons
2013-01-28  3:44         ` Hugh Dickins
2013-02-05 19:11   ` Mel Gorman
2013-02-08 20:52     ` Hugh Dickins
2013-01-26  2:05 ` [PATCH 8/11] ksm: make !merge_across_nodes migration safe Hugh Dickins
2013-01-27  8:49   ` Simon Jeons
2013-01-27 23:25     ` Hugh Dickins
2013-01-28  3:44   ` Simon Jeons
2013-01-26  2:06 ` [PATCH 9/11] ksm: enable KSM page migration Hugh Dickins
2013-01-26  2:07 ` [PATCH 10/11] mm: remove offlining arg to migrate_pages Hugh Dickins
2013-01-26  2:10 ` [PATCH 11/11] ksm: stop hotremove lockdep warning Hugh Dickins
2013-01-27  6:23   ` Simon Jeons
2013-01-27 23:35     ` Hugh Dickins
2013-02-08 18:45   ` Gerald Schaefer
2013-02-11 22:13     ` Hugh Dickins
2013-01-28 23:54 ` [PATCH 0/11] ksm: NUMA trees and page migration Andrew Morton
2013-01-29  0:49   ` Izik Eidus
2013-01-29  2:26     ` Izik Eidus
2013-01-29 16:51       ` Andrea Arcangeli
2013-01-31  0:05         ` Ric Mason
2013-01-29  1:07   ` Hugh Dickins
2013-01-29 10:45     ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1359265635.6763.0.camel@kernel \
    --to=simon.jeons@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=izik.eidus@ravellosystems.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pholasek@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).