Re: [RFC PATCH 0/5] Support multiple pages allocation

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Zhang Yanfei <zhangyanfei.yes@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Glauber Costa <glommer@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Jiang Liu <jiang.liu@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] Support multiple pages allocation
Date: Thu, 4 Jul 2013 13:24:50 +0900
Message-ID: <20130704042450.GA7132@lge.com>
In-Reply-To: <51D44AE7.1090701@gmail.com>

On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> > On 07/03/2013 11:28 PM, Michal Hocko wrote:
> >> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> >> [...]
> >>> For one page allocation at once, this patchset makes allocator slower than
> >>> before (-5%). 
> >>
> >> Slowing down the most used path is a no-go. Where does this slow down
> >> come from?
> > 
> > I guess, it might be: for one page allocation at once, comparing to the original
> > code, this patch adds two parameters nr_pages and pages and will do extra checks
> > for the parameter nr_pages in the allocation path.
> > 
> 
> If so, adding a separate path for the multiple allocations seems better.

Hello, all.

I modified the code to optimize single-page allocation using the likely() macro.
The new patch is attached at the end of this mail.

With this change, the slowdown for single-page allocation is 2.5%.
I guess the remaining overhead comes from the two added parameters.
Is this an unreasonable cost for supporting the new feature?
I think the readahead path is one of the most heavily used paths, so this
penalty looks tolerable. And once this feature is supported, we can find more
use cases.
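
For readers who want to see the shape of that change outside the kernel, here
is a minimal userspace sketch of the single-page early exit. It is only a
model: struct toy_pool, toy_take_page() and toy_alloc() are hypothetical
stand-ins invented for illustration, not kernel functions, and the branch
hints assume GCC or Clang (__builtin_expect).

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints modelled on the kernel's likely()/unlikely(); GCC/Clang only. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical toy pool standing in for a zone's free list. */
struct toy_pool {
	int next_id;	/* id handed out for the next page, starting at 1 */
	int left;	/* pages still available */
};

/* Hypothetical stand-in for buffered_rmqueue(): returns 0 when empty. */
static int toy_take_page(struct toy_pool *p)
{
	if (p->left == 0)
		return 0;
	p->left--;
	return p->next_id++;
}

/*
 * Returns the first page id, or 0 on failure. A NULL nr_pages means
 * "one page only" and is hinted as the common case, so single-page
 * callers leave right after the first page is taken.
 */
static int toy_alloc(struct toy_pool *p, unsigned long *nr_pages, int *pages)
{
	unsigned long count = 0;
	int page = toy_take_page(p);

	if (unlikely(!page))
		return 0;
	if (likely(!nr_pages))
		return page;

	/* Batch request: keep taking pages until satisfied or empty. */
	assert(pages != NULL);
	pages[count++] = page;
	while (count < *nr_pages) {
		page = toy_take_page(p);
		if (!page)
			break;
		pages[count++] = page;
	}
	*nr_pages = count;	/* report how many pages were obtained */
	return pages[0];
}
```

Single-page callers pass NULL for both extra arguments, as __alloc_pages()
does in the patch. The two extra arguments are still passed on every call,
which is where I guess the remaining cost comes from.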

I will try adding a new function for multiple allocations and test it. But,
IMHO, adding a new function is not a good idea, because we would have to
duplicate various checks that are already in __alloc_pages_nodemask(), and
even if we introduce a new function, we cannot avoid passing two parameters
to get_page_from_freelist(), so a slight performance degradation for
single-page allocation is inevitable. Anyway, I will implement and test it.

Thanks.

-------------------------------8<----------------------------
From cee05ad3bcf1c5774fabf797b5dc8f78f812ca36 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Wed, 26 Jun 2013 13:37:57 +0900
Subject: [PATCH] mm, page_alloc: support multiple pages allocation

This patch adds a multiple-page allocation feature to the buddy
allocator. Currently there is no way to allocate multiple pages
at once, so callers must invoke the single-page allocation logic
repeatedly. This has overheads such as the cost of a function
call with many arguments and the cost of finding the proper
node and zone each time.

With this patchset, we can reduce these overheads. Device I/O is
rapidly getting faster, and the allocator should keep up with it.
This patch helps with that.

This patch adds two new arguments, nr_pages and pages, to the
core function of the allocator and tries to allocate multiple
pages in the first attempt (fast path). I think multiple-page
allocation is not appropriate for the slow path, so the current
implementation handles only the fast path.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb..8bfa87b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -298,13 +298,15 @@ static inline void arch_alloc_page(struct page *page, int order) { }
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
-		       struct zonelist *zonelist, nodemask_t *nodemask);
+		       struct zonelist *zonelist, nodemask_t *nodemask,
+		       unsigned long *nr_pages, struct page **pages);
 
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order,
 		struct zonelist *zonelist)
 {
-	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
+	return __alloc_pages_nodemask(gfp_mask, order,
+				zonelist, NULL, NULL, NULL);
 }
 
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7431001..b17e48c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2004,7 +2004,8 @@ retry_cpuset:
 	}
 	page = __alloc_pages_nodemask(gfp, order,
 				      policy_zonelist(gfp, pol, node),
-				      policy_nodemask(gfp, pol));
+				      policy_nodemask(gfp, pol),
+				      NULL, NULL);
 	if (unlikely(mpol_needs_cond_ref(pol)))
 		__mpol_put(pol);
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
@@ -2052,7 +2053,8 @@ retry_cpuset:
 	else
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_zonelist(gfp, pol, numa_node_id()),
-				policy_nodemask(gfp, pol));
+				policy_nodemask(gfp, pol),
+				NULL, NULL);
 
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3edb62..0ba9f63 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1846,7 +1846,8 @@ static inline void init_zone_allows_reclaim(int nid)
 static struct page *
 get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 		struct zonelist *zonelist, int high_zoneidx, int alloc_flags,
-		struct zone *preferred_zone, int migratetype)
+		struct zone *preferred_zone, int migratetype,
+		unsigned long *nr_pages, struct page **pages)
 {
 	struct zoneref *z;
 	struct page *page = NULL;
@@ -1968,8 +1969,33 @@ zonelist_scan:
 try_this_zone:
 		page = buffered_rmqueue(preferred_zone, zone, order,
 						gfp_mask, migratetype);
-		if (page)
+		if (page) {
+			unsigned long mark;
+			unsigned long count;
+			unsigned long nr;
+
+			if (likely(!nr_pages))
+				break;
+
+			count = 0;
+			pages[count++] = page;
+			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
+			nr = *nr_pages;
+			while (count < nr) {
+				if (!zone_watermark_ok(zone, order, mark,
+					classzone_idx, alloc_flags))
+					break;
+				page = buffered_rmqueue(preferred_zone, zone,
+						order, gfp_mask, migratetype);
+				if (!page)
+					break;
+				pages[count++] = page;
+			}
+			*nr_pages = count;
+			page = pages[0];
 			break;
+		}
+
 this_zone_full:
 		if (IS_ENABLED(CONFIG_NUMA))
 			zlc_mark_zone_full(zonelist, z);
@@ -2125,7 +2151,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,
 		order, zonelist, high_zoneidx,
 		ALLOC_WMARK_HIGH|ALLOC_CPUSET,
-		preferred_zone, migratetype);
+		preferred_zone, migratetype,
+		NULL, NULL);
 	if (page)
 		goto out;
 
@@ -2188,7 +2215,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		page = get_page_from_freelist(gfp_mask, nodemask,
 				order, zonelist, high_zoneidx,
 				alloc_flags & ~ALLOC_NO_WATERMARKS,
-				preferred_zone, migratetype);
+				preferred_zone, migratetype,
+				NULL, NULL);
 		if (page) {
 			preferred_zone->compact_blockskip_flush = false;
 			preferred_zone->compact_considered = 0;
@@ -2282,7 +2310,8 @@ retry:
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags & ~ALLOC_NO_WATERMARKS,
-					preferred_zone, migratetype);
+					preferred_zone, migratetype,
+					NULL, NULL);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
@@ -2312,7 +2341,8 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 	do {
 		page = get_page_from_freelist(gfp_mask, nodemask, order,
 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			NULL, NULL);
 
 		if (!page && gfp_mask & __GFP_NOFAIL)
 			wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
@@ -2449,7 +2479,8 @@ rebalance:
 	/* This is the last chance, in general, before the goto nopage. */
 	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
 			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			NULL, NULL);
 	if (page)
 		goto got_pg;
 
@@ -2598,7 +2629,8 @@ got_pg:
  */
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
-			struct zonelist *zonelist, nodemask_t *nodemask)
+			struct zonelist *zonelist, nodemask_t *nodemask,
+			unsigned long *nr_pages, struct page **pages)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
@@ -2647,9 +2679,11 @@ retry_cpuset:
 		alloc_flags |= ALLOC_CMA;
 #endif
 	/* First allocation attempt */
+	/* We only try to allocate nr_pages in first attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			nr_pages, pages);
 	if (unlikely(!page)) {
 		/*
 		 * Runtime PM, block IO and its error handling path
-- 
1.7.9.5
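
The batch loop in the get_page_from_freelist() hunk rechecks the zone
watermark before each additional page and writes the number of pages it
obtained back through nr_pages, so a caller can receive fewer pages than it
asked for. The sketch below models that contract in userspace. Everything in
it (struct toy_zone, toy_rmqueue(), toy_watermark_ok(), toy_alloc_batch()) is
a hypothetical simplification made up for illustration, and the watermark
test is reduced to a single comparison; the real zone_watermark_ok() also
considers the order and lowmem reserves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy zone: free page count plus the watermark to respect. */
struct toy_zone {
	unsigned long free;
	unsigned long watermark;
	unsigned long next_id;	/* fake page identifier, starting at 1 */
};

/* Stand-in for buffered_rmqueue(): returns 0 when nothing is free. */
static unsigned long toy_rmqueue(struct toy_zone *z)
{
	if (z->free == 0)
		return 0;
	z->free--;
	return z->next_id++;
}

/* Much simplified stand-in for zone_watermark_ok(). */
static int toy_watermark_ok(const struct toy_zone *z)
{
	return z->free > z->watermark;
}

/*
 * In: *nr is the number of pages wanted. Out: *nr is the number obtained.
 * Returns the first page id, or 0 on failure (*nr is then left untouched,
 * as in the patch).
 */
static unsigned long toy_alloc_batch(struct toy_zone *z, unsigned long *nr,
				     unsigned long *pages)
{
	unsigned long count = 0;
	unsigned long page;

	assert(nr != NULL && pages != NULL);

	/* The first page is also subject to the watermark check. */
	if (!toy_watermark_ok(z))
		return 0;
	page = toy_rmqueue(z);
	if (!page)
		return 0;
	pages[count++] = page;

	/* Recheck the watermark before every additional page. */
	while (count < *nr) {
		if (!toy_watermark_ok(z))
			break;
		page = toy_rmqueue(z);
		if (!page)
			break;
		pages[count++] = page;
	}
	*nr = count;
	return pages[0];
}
```

In this model, a request for 8 pages against a zone with 5 free pages and a
watermark of 2 comes back with 3 pages; the caller would have to get the rest
through ordinary single-page calls, which may enter the slow path.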

Thread overview: 28+ messages
2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
2013-07-03 15:57   ` Christoph Lameter
2013-07-04  4:29     ` Joonsoo Kim
2013-07-10 22:52   ` Dave Hansen
2013-07-11  1:02     ` Joonsoo Kim
2013-07-11  5:38       ` Dave Hansen
2013-07-11  6:12         ` Joonsoo Kim
2013-07-11 15:51           ` Dave Hansen
2013-07-16  0:26             ` Joonsoo Kim
2013-07-12 16:31           ` Dave Hansen
2013-07-16  0:37             ` Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 2/5] mm, page_alloc: introduce alloc_pages_exact_node_multiple() Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 3/5] radix-tree: introduce radix_tree_[next/prev]_present() Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 4/5] readahead: remove end range check Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 5/5] readhead: support multiple pages allocation for readahead Joonsoo Kim
2013-07-03 15:28 ` [RFC PATCH 0/5] Support multiple pages allocation Michal Hocko
2013-07-03 15:51   ` Zhang Yanfei
2013-07-03 16:01     ` Zhang Yanfei
2013-07-04  4:24       ` Joonsoo Kim [this message]
2013-07-04 10:00         ` Michal Hocko
2013-07-10  0:31           ` Joonsoo Kim
2013-07-10  1:20             ` Zhang Yanfei
2013-07-10  9:56               ` Joonsoo Kim
2013-07-10  9:17             ` Michal Hocko
2013-07-10  9:55               ` Joonsoo Kim
2013-07-10 11:27                 ` Michal Hocko
2013-07-11  1:05                   ` Joonsoo Kim