From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm <linux-mm@kvack.org>, lkml <linux-kernel@vger.kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andi Kleen <ak@linux.intel.com>,
Huang Ying <ying.huang@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Kemi Wang <kemi.wang@intel.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>
Subject: [PATCH v2] mm/page_alloc.c: inline __rmqueue()
Date: Tue, 10 Oct 2017 10:56:01 +0800
Message-ID: <20171010025601.GE1798@intel.com>
In-Reply-To: <20171010025151.GD1798@intel.com>

__rmqueue() is called by rmqueue_bulk() and rmqueue() under zone->lock,
and both call sites are in very hot page allocator paths. Since
__rmqueue() is a small function, inlining it can save us some time.

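To illustrate the call pattern described above, here is a minimal userspace sketch, not kernel code. All names (toy_zone, toy_page, toy_rmqueue_locked(), toy_rmqueue(), toy_rmqueue_bulk()) are hypothetical, and a C11 atomic_flag spinlock stands in for zone->lock. A small helper that must be called with the lock held is marked static inline and is called from a single-item path and a bulk path, so the compiler may fold it into both lock-held sections. Note that inline is only a hint; the compiler is still free to emit an out-of-line call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct zone, not kernel types. */
struct toy_page {
	struct toy_page *next;
};

struct toy_zone {
	atomic_flag lock;		/* plays the role of zone->lock */
	struct toy_page *free_list;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Small helper in the spirit of __rmqueue(): pop one page off the free
 * list. The caller must hold z->lock. "inline" is only a hint.
 */
static inline struct toy_page *toy_rmqueue_locked(struct toy_zone *z)
{
	struct toy_page *page = z->free_list;

	if (page)
		z->free_list = page->next;
	return page;
}

/* Hot path 1: take one page under the lock (cf. rmqueue()). */
static struct toy_page *toy_rmqueue(struct toy_zone *z)
{
	struct toy_page *page;

	toy_lock(&z->lock);
	page = toy_rmqueue_locked(z);
	toy_unlock(&z->lock);
	return page;
}

/* Hot path 2: take up to count pages in one lock hold (cf. rmqueue_bulk()). */
static int toy_rmqueue_bulk(struct toy_zone *z, int count,
			    struct toy_page **out)
{
	int i;

	toy_lock(&z->lock);
	for (i = 0; i < count; i++) {
		out[i] = toy_rmqueue_locked(z);
		if (!out[i])
			break;
	}
	toy_unlock(&z->lock);
	return i;
}

/* Build a LIFO free list from a caller-supplied array of pages. */
static void toy_zone_init(struct toy_zone *z, struct toy_page *pages, int n)
{
	int i;

	atomic_flag_clear(&z->lock);
	z->free_list = NULL;
	for (i = 0; i < n; i++) {
		pages[i].next = z->free_list;
		z->free_list = &pages[i];
	}
}
```

Whether the helper really ends up inlined can only be confirmed by inspecting the generated code, since the keyword alone does not force it.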
With the will-it-scale/page_fault1/process benchmark, using nr_cpu
processes to stress the buddy allocator, this patch improved the
benchmark by 6.3% on a 2-socket Intel Skylake system and by 4.6% on a
4-socket Intel Skylake system. The benefit is smaller on the 4-socket
machine because lock contention there
(perf-profile/native_queued_spin_lock_slowpath=81%) is less severe than
on the 2-socket machine (84%).

The benchmark forks nr_cpu processes, and each process does the
following in a loop:
1. mmap() 128M of anonymous memory;
2. write to each page there to trigger actual page allocation;
3. munmap() it.
https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c
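
The loop above can be approximated by the following minimal C sketch of one benchmark worker. It is based only on the description here, not on the will-it-scale source, and it leaves out forking nr_cpu processes as well as the iteration counting and timing that the harness performs. fault_in_region(), run_worker() and BENCH_MEMSIZE are hypothetical names; BENCH_MEMSIZE is the 128M from step 1.

```c
#define _GNU_SOURCE		/* for MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: the 128M anonymous region from step 1. */
#define BENCH_MEMSIZE (128UL * 1024 * 1024)

/*
 * One iteration: mmap() len bytes of anonymous memory, write one byte per
 * page so every page is faulted in and allocated from the buddy allocator,
 * then munmap() it. Returns the number of pages touched, or -1 on error.
 */
static long fault_in_region(size_t len)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	long touched = 0;
	size_t off;
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	for (off = 0; off < len; off += page_size) {
		p[off] = 1;	/* first write to the page triggers the fault */
		touched++;
	}

	munmap(p, len);
	return touched;
}

/*
 * A worker repeats the iteration; here the loop is bounded so it
 * terminates. A real worker would loop indefinitely with BENCH_MEMSIZE.
 */
static long run_worker(size_t len, int iterations)
{
	long total = 0;
	int i;

	for (i = 0; i < iterations; i++) {
		long n = fault_in_region(len);

		if (n < 0)
			return -1;
		total += n;
	}
	return total;
}
```

With nr_cpu such workers running at once, most of the time goes into page allocation and freeing under zone->lock, which is why the lock slowpath dominates the profile quoted above.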

This patch only adds the inline keyword to __rmqueue(); according to
size(1), the size of vmlinux does not change.

without this patch:
   text    data      bss      dec     hex filename
9968576 5793372 17715200 33477148 1fed21c vmlinux

with this patch:
   text    data      bss      dec     hex filename
9968576 5793372 17715200 33477148 1fed21c vmlinux

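For reference, size(1) in its default Berkeley format reports dec as text + data + bss and hex as that same total in hexadecimal, so identical text columns mean the generated code did not change size. The hypothetical helpers below (struct size_row, size_dec(), size_hex()) only check that arithmetic against the vmlinux row above; they are not part of binutils.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of size(1) Berkeley-format output (hypothetical type). */
struct size_row {
	unsigned long text;	/* the "text" column */
	unsigned long data;	/* the "data" column */
	unsigned long bss;	/* zero-initialized data: takes memory, not file space */
};

/* The "dec" column: sum of the three section totals. */
static unsigned long size_dec(const struct size_row *r)
{
	return r->text + r->data + r->bss;
}

/* The "hex" column: the same total printed in lowercase hexadecimal. */
static void size_hex(const struct size_row *r, char *buf, size_t len)
{
	snprintf(buf, len, "%lx", size_dec(r));
}
```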
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
v2: change commit message according to Dave Hansen's suggestion.
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e309ce4a44a..c9605c7ebaf6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2291,7 +2291,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
*/
-static struct page *__rmqueue(struct zone *zone, unsigned int order,
+static inline struct page *__rmqueue(struct zone *zone, unsigned int order,
int migratetype)
{
struct page *page;
--
2.13.6