[PATCH v2 2/2] free_pcppages_bulk: prefetch buddy while not holding lock


From: Aaron Lu <aaron.lu@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Huang Ying <ying.huang@intel.com>,
	Kemi Wang <kemi.wang@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Michal Hocko <mhocko@suse.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH v2 2/2] free_pcppages_bulk: prefetch buddy while not holding lock
Date: Thu, 25 Jan 2018 15:25:55 +0800
Message-ID: <20180125072555.GB27678@intel.com>
In-Reply-To: <20180124211228.3k7tuuji7a7mvyh2@techsingularity.net>

When a page is freed back to the global pool, its buddy is checked to
see whether the two can be merged. This requires accessing the buddy's
page structure, and that access can take a long time if it is cache cold.
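For readers unfamiliar with how the buddy is located: at order 0 it is
simply the neighbouring pfn that differs in bit 0. The user-space sketch
below mirrors the XOR used by the kernel's __find_buddy_pfn(); the names
find_buddy_pfn() and merged_pfn() are hypothetical, and nothing here
touches a real struct page.

```c
/* Hypothetical user-space mirror of the kernel's __find_buddy_pfn():
 * the buddy of a 2^order block differs from it only in bit 'order'. */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* Hypothetical helper: start pfn of the 2^(order+1) block produced by
 * merging a block with its buddy, i.e. the lower of the two pfns. */
static unsigned long merged_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & find_buddy_pfn(pfn, order);
}
```

For example, pfn 4 and pfn 5 are order-0 buddies and would merge into the
order-1 block starting at pfn 4.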

This patch adds a prefetch of the to-be-freed page's buddy outside of
zone->lock, in the hope that accessing the buddy's page structure later,
under zone->lock, will be faster. Since we *always* do buddy merging and
check an order-0 page's buddy to try to merge it when it goes into the
main allocator, the cacheline will always come in, i.e. the prefetched
data will never go unused.
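The pattern (issue the prefetches while not holding the lock, then do the
real accesses under the lock) can be sketched in user space as below.
This is an analogy only, assuming a pthread mutex in place of zone->lock
and GCC/Clang's __builtin_prefetch() in place of the kernel's prefetch();
struct fake_page and fake_zone_lock are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page, padded to roughly one cacheline. */
struct fake_page {
	unsigned long flags;
	char pad[56];
};

/* Hypothetical stand-in for zone->lock. */
static pthread_mutex_t fake_zone_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long sum_flags_prefetched(const struct fake_page *pages,
					  const size_t *idx, size_t n)
{
	unsigned long sum = 0;

	/* Phase 1, lock not held: start pulling in the cachelines we will
	 * need. A prefetch is only a hint; it never faults. */
	for (size_t i = 0; i < n; i++)
		__builtin_prefetch(&pages[idx[i]]);

	/* Phase 2, lock held: the real accesses, hopefully now cache hits,
	 * which shortens the time other threads wait for the lock. */
	pthread_mutex_lock(&fake_zone_lock);
	for (size_t i = 0; i < n; i++)
		sum += pages[idx[i]].flags;
	pthread_mutex_unlock(&fake_zone_lock);

	return sum;
}
```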

In the meantime, there is the concern that a prefetch could evict
existing cachelines. This can be true for the L1D cache since it is not
large. However, the prefetch instruction used is prefetchnta, which,
according to Intel's instruction set reference manual, only stores the
data in L2 on "Pentium 4 and Intel Xeon processors", so it is unlikely
to cause cache pollution. Other architectures may have this cache
pollution problem, though.
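In user space the same non-temporal hint can be requested portably with
__builtin_prefetch(addr, 0, 0), where the last argument (locality 0) means
"no temporal locality". On x86 with SSE, GCC and Clang typically lower
this to prefetchnta; other architectures may map it differently or ignore
it. The function name and the prefetch distance below are illustrative
assumptions, not anything taken from the patch.

```c
#include <stddef.h>

/* Illustrative prefetch distance, in elements; not tuned for any CPU. */
#define PREFETCH_AHEAD 8

/* Sum an array while prefetching ahead with the non-temporal hint. */
static unsigned long sum_with_nta_prefetch(const unsigned long *a, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++) {
		/* rw = 0 (read), locality = 0 (non-temporal). */
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
		sum += a[i];
	}
	return sum;
}
```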

There is also some additional instruction overhead, namely calculating
the buddy pfn twice. Since the calculation is an XOR on two local
variables, the cycles spent are expected, in many cases, to be offset by
reduced memory latency later. This is especially true on NUMA machines
where multiple CPUs contend on zone->lock and the most time-consuming
part under zone->lock is waiting for the 'struct page' cachelines of the
to-be-freed pages and their buddies.
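To show why the extra work is cheap: the buddy's descriptor address is
derived from the page pointer with one XOR and some pointer arithmetic,
without reading any memory, as in the patch's
"buddy = page + (buddy_pfn - pfn)". The sketch below assumes a
hypothetical flat array mem_map where the descriptor for pfn N is
mem_map[N]; it uses a signed difference so that it does not rely on
unsigned wraparound when the buddy is the lower pfn.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	unsigned long flags;
};

/* Locate the order-0 buddy of 'page' within a flat descriptor array. */
static struct fake_page *order0_buddy(struct fake_page *mem_map,
				       struct fake_page *page)
{
	unsigned long pfn = (unsigned long)(page - mem_map);
	unsigned long buddy_pfn = pfn ^ 1UL;	/* order-0 buddy */

	/* Only address arithmetic: the buddy's cacheline is not touched. */
	return page + ((long)buddy_pfn - (long)pfn);
}
```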

Tested with will-it-scale/page_fault1 at full load (higher is better):

kernel      Broadwell(2S)  Skylake(2S)   Broadwell(4S)  Skylake(4S)
v4.15-rc4   9037332        8000124       13642741       15728686
patch1/2    9608786 +6.3%  8368915 +4.6% 14042169 +2.9% 17433559 +10.8%
this patch 10462292 +8.9%  8602889 +2.8% 14802073 +5.4% 17624575 +1.1%

Note: the percentages on the "this patch" row are relative to patch1/2;
those on the "patch1/2" row are relative to v4.15-rc4.
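The percentages in the table are plain relative changes against the
stated baseline. A small helper (hypothetical, not part of the patch)
reproduces them from the raw numbers:

```c
/* Relative change of 'now' over 'base', in percent. */
static double pct_change(double base, double now)
{
	return (now - base) / base * 100.0;
}
```

For example, pct_change(9608786.0, 10462292.0) gives about 8.9, matching
the Broadwell(2S) entry on the "this patch" row. By the same arithmetic,
both patches together give roughly +15.8% over v4.15-rc4 on
Broadwell(2S).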

Please also note that the actual benefit of this patch will be
workload/CPU dependent.

[changelog stolen from Dave Hansen and Mel Gorman's comments]
https://lkml.org/lkml/2018/1/24/551
Suggested-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
v2:
Update the changelog according to Dave Hansen and Mel Gorman's comments.
Add more comments in the code to explain why the prefetch is done, as
requested by Mel Gorman.

 mm/page_alloc.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c9e5ded39b16..6566a4b5b124 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1138,6 +1138,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free = count;

 		do {
+			unsigned long pfn, buddy_pfn;
+			struct page *buddy;
+
 			page = list_last_entry(list, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
@@ -1146,6 +1149,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 				continue;

 			list_add_tail(&page->lru, &head);
+
+			/*
+			 * We are going to put the page back to the global
+			 * pool, prefetch its buddy to speed up later access
+			 * under zone->lock. It is believed the overhead of
+			 * calculating buddy_pfn here can be offset by reduced
+			 * memory latency later.
+			 */
+			pfn = page_to_pfn(page);
+			buddy_pfn = __find_buddy_pfn(pfn, 0);
+			buddy = page + (buddy_pfn - pfn);
+			prefetch(buddy);
 		} while (--count && --batch_free && !list_empty(list));
 	}

-- 
2.14.3
