From: Simon Jeons <simon.jeons@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Petr Holasek <pholasek@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Izik Eidus <izik.eidus@ravellosystems.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 7/11] ksm: make KSM page migration possible
Date: Sat, 26 Jan 2013 23:47:15 -0600 [thread overview]
Message-ID: <1359265635.6763.0.camel@kernel> (raw)
In-Reply-To: <alpine.LNX.2.00.1301251802050.29196@eggly.anvils>
On Fri, 2013-01-25 at 18:03 -0800, Hugh Dickins wrote:
> KSM page migration is already supported in the case of memory hotremove,
> which takes the ksm_thread_mutex across all its migrations to keep life
> simple.
>
> But the new KSM NUMA merge_across_nodes knob introduces a problem, when
> it's set to non-default 0: if a KSM page is migrated to a different NUMA
> node, how do we migrate its stable node to the right tree? And what if
> that collides with an existing stable node?
>
> So far there's no provision for that, and this patch does not attempt
> to deal with it either. But how will I test a solution, when I don't
> know how to hotremove memory? The best answer is to enable KSM page
> migration in all cases now, and test more common cases. With THP and
> compaction added since KSM came in, page migration is now mainstream,
> and it's a shame that a KSM page can frustrate freeing a page block.
>
> Without worrying about merge_across_nodes 0 for now, this patch gets
> KSM page migration working reliably for default merge_across_nodes 1
> (but leave the patch enabling it until near the end of the series).
>
> It's much simpler than I'd originally imagined, and does not require
> an additional tier of locking: page migration relies on the page lock,
> KSM page reclaim relies on the page lock, the page lock is enough for
> KSM page migration too.
>
> Almost all the care has to be in get_ksm_page(): that's the function
> which worries about when a stable node is stale and should be freed,
> now it also has to worry about the KSM page being migrated.
>
> The only new overhead is an additional put/get/lock/unlock_page when
> stable_tree_search() arrives at a matching node: to make sure migration
> respects the raised page count, and so does not migrate the page while
> we're busy with it here. That's probably avoidable, either by changing
> internal interfaces from using kpage to stable_node, or by moving the
> ksm_migrate_page() callsite into a page_freeze_refs() section (even if
> not swapcache); but this works well, I've no urge to pull it apart now.
>
> (Descents of the stable tree may pass through nodes whose KSM pages are
> under migration: being unlocked, the raised page count does not prevent
> that, nor need it: it's safe to memcmp against either old or new page.)
>
> You might worry about mremap, and whether page migration's rmap_walk
> to remove migration entries will find all the KSM locations where it
> inserted earlier: that should already be handled, by the satisfyingly
> heavy hammer of move_vma()'s call to ksm_madvise(,,,MADV_UNMERGEABLE,).
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> mm/ksm.c | 94 ++++++++++++++++++++++++++++++++++++++-----------
> mm/migrate.c | 5 ++
> 2 files changed, 77 insertions(+), 22 deletions(-)
>
> --- mmotm.orig/mm/ksm.c 2013-01-25 14:37:00.768206145 -0800
> +++ mmotm/mm/ksm.c 2013-01-25 14:37:03.832206218 -0800
> @@ -499,6 +499,7 @@ static void remove_node_from_stable_tree
> * In which case we can trust the content of the page, and it
> * returns the gotten page; but if the page has now been zapped,
> * remove the stale node from the stable tree and return NULL.
> + * But beware, the stable node's page might be being migrated.
> *
> * You would expect the stable_node to hold a reference to the ksm page.
> * But if it increments the page's count, swapping out has to wait for
> @@ -509,44 +510,77 @@ static void remove_node_from_stable_tree
> * pointing back to this stable node. This relies on freeing a PageAnon
> * page to reset its page->mapping to NULL, and relies on no other use of
> * a page to put something that might look like our key in page->mapping.
> - *
> - * include/linux/pagemap.h page_cache_get_speculative() is a good reference,
> - * but this is different - made simpler by ksm_thread_mutex being held, but
> - * interesting for assuming that no other use of the struct page could ever
> - * put our expected_mapping into page->mapping (or a field of the union which
> - * coincides with page->mapping).
> - *
> - * Note: it is possible that get_ksm_page() will return NULL one moment,
> - * then page the next, if the page is in between page_freeze_refs() and
> - * page_unfreeze_refs(): this shouldn't be a problem anywhere, the page
> * is on its way to being freed; but it is an anomaly to bear in mind.
> */
> static struct page *get_ksm_page(struct stable_node *stable_node, bool locked)
> {
> struct page *page;
> void *expected_mapping;
> + unsigned long kpfn;
>
> - page = pfn_to_page(stable_node->kpfn);
> expected_mapping = (void *)stable_node +
> (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
> - if (page->mapping != expected_mapping)
> - goto stale;
> - if (!get_page_unless_zero(page))
> +again:
> + kpfn = ACCESS_ONCE(stable_node->kpfn);
> + page = pfn_to_page(kpfn);
> +
> + /*
> + * page is computed from kpfn, so on most architectures reading
> + * page->mapping is naturally ordered after reading node->kpfn,
> + * but on Alpha we need to be more careful.
> + */
> + smp_read_barrier_depends();
> + if (ACCESS_ONCE(page->mapping) != expected_mapping)
> goto stale;
> - if (page->mapping != expected_mapping) {
> +
> + /*
> + * We cannot do anything with the page while its refcount is 0.
> + * Usually 0 means free, or tail of a higher-order page: in which
> + * case this node is no longer referenced, and should be freed;
> + * however, it might mean that the page is under page_freeze_refs().
> + * The __remove_mapping() case is easy, again the node is now stale;
> + * but if page is swapcache in migrate_page_move_mapping(), it might
> + * still be our page, in which case it's essential to keep the node.
> + */
> + while (!get_page_unless_zero(page)) {
> + /*
> + * Another check for page->mapping != expected_mapping would
> + * work here too. We have chosen the !PageSwapCache test to
> + * optimize the common case, when the page is or is about to
> + * be freed: PageSwapCache is cleared (under spin_lock_irq)
> + * in the freeze_refs section of __remove_mapping(); but Anon
> + * page->mapping reset to NULL later, in free_pages_prepare().
> + */
> + if (!PageSwapCache(page))
> + goto stale;
> + cpu_relax();
> + }
> +
> + if (ACCESS_ONCE(page->mapping) != expected_mapping) {
> put_page(page);
> goto stale;
> }
> +
> if (locked) {
> lock_page(page);
> - if (page->mapping != expected_mapping) {
> + if (ACCESS_ONCE(page->mapping) != expected_mapping) {
> unlock_page(page);
> put_page(page);
> goto stale;
> }
> }
Could you explain why need check page->mapping twice after get page?
> return page;
> +
> stale:
> + /*
> + * We come here from above when page->mapping or !PageSwapCache
> + * suggests that the node is stale; but it might be under migration.
> + * We need smp_rmb(), matching the smp_wmb() in ksm_migrate_page(),
> + * before checking whether node->kpfn has been changed.
> + */
> + smp_rmb();
> + if (ACCESS_ONCE(stable_node->kpfn) != kpfn)
> + goto again;
> remove_node_from_stable_tree(stable_node);
> return NULL;
> }
> @@ -1103,15 +1137,25 @@ static struct page *stable_tree_search(s
> return NULL;
>
> ret = memcmp_pages(page, tree_page);
> + put_page(tree_page);
>
> - if (ret < 0) {
> - put_page(tree_page);
> + if (ret < 0)
> node = node->rb_left;
> - } else if (ret > 0) {
> - put_page(tree_page);
> + else if (ret > 0)
> node = node->rb_right;
> - } else
> + else {
> + /*
> + * Lock and unlock the stable_node's page (which
> + * might already have been migrated) so that page
> + * migration is sure to notice its raised count.
> + * It would be more elegant to return stable_node
> + * than kpage, but that involves more changes.
> + */
> + tree_page = get_ksm_page(stable_node, true);
> + if (tree_page)
> + unlock_page(tree_page);
> return tree_page;
> + }
> }
>
> return NULL;
> @@ -1903,6 +1947,14 @@ void ksm_migrate_page(struct page *newpa
> if (stable_node) {
> VM_BUG_ON(stable_node->kpfn != page_to_pfn(oldpage));
> stable_node->kpfn = page_to_pfn(newpage);
> + /*
> + * newpage->mapping was set in advance; now we need smp_wmb()
> + * to make sure that the new stable_node->kpfn is visible
> + * to get_ksm_page() before it can see that oldpage->mapping
> + * has gone stale (or that PageSwapCache has been cleared).
> + */
> + smp_wmb();
> + set_page_stable_node(oldpage, NULL);
> }
> }
> #endif /* CONFIG_MIGRATION */
> --- mmotm.orig/mm/migrate.c 2013-01-25 14:27:58.140193249 -0800
> +++ mmotm/mm/migrate.c 2013-01-25 14:37:03.832206218 -0800
> @@ -464,7 +464,10 @@ void migrate_page_copy(struct page *newp
>
> mlock_migrate_page(newpage, page);
> ksm_migrate_page(newpage, page);
> -
> + /*
> + * Please do not reorder this without considering how mm/ksm.c's
> + * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
> + */
> ClearPageSwapCache(page);
> ClearPagePrivate(page);
> set_page_private(page, 0);
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-01-27 5:47 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-26 1:53 [PATCH 0/11] ksm: NUMA trees and page migration Hugh Dickins
2013-01-26 1:54 ` [PATCH 1/11] ksm: allow trees per NUMA node Hugh Dickins
2013-01-27 1:14 ` Simon Jeons
2013-01-27 2:54 ` Hugh Dickins
2013-01-27 3:16 ` Simon Jeons
2013-01-27 21:55 ` Hugh Dickins
2013-01-28 23:03 ` Andrew Morton
2013-01-29 1:17 ` Hugh Dickins
2013-01-28 23:08 ` Andrew Morton
2013-01-29 1:38 ` Hugh Dickins
2013-02-05 16:41 ` Mel Gorman
2013-02-07 23:57 ` Hugh Dickins
2013-01-26 1:56 ` [PATCH 2/11] ksm: add sysfs ABI Documentation Hugh Dickins
2013-01-26 1:58 ` [PATCH 3/11] ksm: trivial tidyups Hugh Dickins
2013-01-28 23:11 ` Andrew Morton
2013-01-29 1:44 ` Hugh Dickins
2013-01-26 1:59 ` [PATCH 4/11] ksm: reorganize ksm_check_stable_tree Hugh Dickins
2013-02-05 16:48 ` Mel Gorman
2013-02-08 0:07 ` Hugh Dickins
2013-02-14 11:30 ` Mel Gorman
2013-01-26 2:00 ` [PATCH 5/11] ksm: get_ksm_page locked Hugh Dickins
2013-01-27 2:36 ` Simon Jeons
2013-01-27 22:08 ` Hugh Dickins
2013-01-28 0:36 ` Simon Jeons
2013-01-28 3:35 ` Hugh Dickins
2013-01-27 2:48 ` Simon Jeons
2013-01-27 22:10 ` Hugh Dickins
2013-02-05 17:18 ` Mel Gorman
2013-02-08 0:33 ` Hugh Dickins
2013-02-14 11:34 ` Mel Gorman
2013-01-26 2:01 ` [PATCH 6/11] ksm: remove old stable nodes more thoroughly Hugh Dickins
2013-01-27 4:55 ` Simon Jeons
2013-01-27 23:05 ` Hugh Dickins
2013-01-28 1:42 ` Simon Jeons
2013-01-28 4:14 ` Hugh Dickins
2013-01-28 2:12 ` Simon Jeons
2013-01-28 4:19 ` Hugh Dickins
2013-01-28 6:36 ` Simon Jeons
2013-01-28 23:44 ` Andrew Morton
2013-01-29 2:03 ` Hugh Dickins
2013-02-05 17:55 ` Mel Gorman
2013-02-08 19:33 ` Hugh Dickins
2013-02-14 11:58 ` Mel Gorman
2013-02-14 22:19 ` Hugh Dickins
2013-01-26 2:03 ` [PATCH 7/11] ksm: make KSM page migration possible Hugh Dickins
2013-01-27 5:47 ` Simon Jeons [this message]
2013-01-27 23:12 ` Hugh Dickins
2013-01-28 0:41 ` Simon Jeons
2013-01-28 3:44 ` Hugh Dickins
2013-02-05 19:11 ` Mel Gorman
2013-02-08 20:52 ` Hugh Dickins
2013-01-26 2:05 ` [PATCH 8/11] ksm: make !merge_across_nodes migration safe Hugh Dickins
2013-01-27 8:49 ` Simon Jeons
2013-01-27 23:25 ` Hugh Dickins
2013-01-28 3:44 ` Simon Jeons
2013-01-26 2:06 ` [PATCH 9/11] ksm: enable KSM page migration Hugh Dickins
2013-01-26 2:07 ` [PATCH 10/11] mm: remove offlining arg to migrate_pages Hugh Dickins
2013-01-26 2:10 ` [PATCH 11/11] ksm: stop hotremove lockdep warning Hugh Dickins
2013-01-27 6:23 ` Simon Jeons
2013-01-27 23:35 ` Hugh Dickins
2013-02-08 18:45 ` Gerald Schaefer
2013-02-11 22:13 ` Hugh Dickins
2013-01-28 23:54 ` [PATCH 0/11] ksm: NUMA trees and page migration Andrew Morton
2013-01-29 0:49 ` Izik Eidus
2013-01-29 2:26 ` Izik Eidus
2013-01-29 16:51 ` Andrea Arcangeli
2013-01-31 0:05 ` Ric Mason
2013-01-29 1:07 ` Hugh Dickins
2013-01-29 10:45 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1359265635.6763.0.camel@kernel \
--to=simon.jeons@gmail.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=izik.eidus@ravellosystems.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pholasek@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).