* [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions
@ 2015-01-05 17:17 Vlastimil Babka
  2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Aneesh Kumar K.V, David Rientjes,
	Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov, Mel Gorman,
	Michal Hocko, Minchan Kim, Rik van Riel, Zhang Yanfei

Changes since v3:
o Moved struct alloc_context definition to mm/internal.h
o Rebased on latest -next and re-measured. Sadly, the code/stack size
  improvements are smaller with the new baseline.

The possibility of replacing the numerous parameters of alloc_pages*
functions with a single structure was discussed when Minchan proposed to
expand the x86 kernel stack [1]. This series implements the change, along
with a few more cleanups/microoptimizations.

The series is based on next-20150105 and I used gcc 4.8.3 20140627 on
openSUSE 13.2 for compiling. Config includes NUMA and COMPACTION.

The core change is the introduction of a new struct alloc_context, which
looks like this:

struct alloc_context {
	struct zonelist *zonelist;
	nodemask_t *nodemask;
	struct zone *preferred_zone;
	int classzone_idx;
	int migratetype;
	enum zone_type high_zoneidx;
};

The contents are mostly constant, except that __alloc_pages_slowpath()
changes preferred_zone, classzone_idx and potentially zonelist. But that's
not a problem in case control returns to retry_cpuset: in
__alloc_pages_nodemask(), those will be reset to initial values again
(although it's a bit subtle). On the other hand, gfp_flags and alloc_info
mutate so much that it doesn't make sense to put them into alloc_context.
Still, the result is one parameter instead of up to 7. This is all in
Patch 2.

Patch 3 is a step towards expanding alloc_context usage out of page_alloc.c
itself. The function try_to_compact_pages() can also benefit a lot from the
parameter reduction, but it means the struct definition has to be moved to
a shared header.

Patch 1 should IMHO be included even if the rest is deemed not useful
enough. It improves maintainability and also has some code/stack reduction.
Patch 4 is OTOH a tiny optimization.
Overall bloat-o-meter results:

add/remove: 0/1 grow/shrink: 1/3 up/down: 1587/-1941 (-354)
function                                     old     new   delta
__alloc_pages_nodemask                       589    2176   +1587
nr_free_zone_pages                           129     115     -14
__alloc_pages_direct_compact                 329     256     -73
get_page_from_freelist                      2670    2576     -94
__alloc_pages_slowpath                      1760       -   -1760
try_to_compact_pages                         582     579      -3

Overall bloat-o-meter with forced inline in baseline, for fair comparison:

add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-512 (-512)
function                                     old     new   delta
nr_free_zone_pages                           129     115     -14
__alloc_pages_direct_compact                 329     256     -73
get_page_from_freelist                      2670    2576     -94
__alloc_pages_nodemask                      2507    2176    -331
try_to_compact_pages                         582     579      -3

Overall stack sizes per ./scripts/checkstack.pl:
                          old   new  delta
__alloc_pages_slowpath    152     -   -152
get_page_from_freelist:   184   184      0
__alloc_pages_nodemask    120   184    +64
__alloc_pages_direct_c     40    40    -40
try_to_compact_pages       72    72      0
                                       -128

Again with forced inline on baseline:
                          old   new  delta
get_page_from_freelist:   184   184      0
__alloc_pages_nodemask    216   184    -32
__alloc_pages_direct_c     40     -    -40
try_to_compact_pages       72    72      0
                                        -72

[1] http://marc.info/?l=linux-mm&m=140142462528257&w=2

Vlastimil Babka (4):
  mm: set page->pfmemalloc in prep_new_page()
  mm, page_alloc: reduce number of alloc_pages* functions' parameters
  mm: reduce try_to_compact_pages parameters
  mm: microoptimize zonelist operations

 include/linux/compaction.h |  17 ++--
 include/linux/mmzone.h     |  13 +--
 mm/compaction.c            |  23 ++---
 mm/internal.h              |  14 +++
 mm/mmzone.c                |   4 +-
 mm/page_alloc.c            | 245 +++++++++++++++++++--------------------
 6 files changed, 144 insertions(+), 172 deletions(-)

-- 
2.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 21+ messages in thread
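To make the cover letter's central point concrete, here is the prototype of
get_page_from_freelist() before and after the conversion, taken from the diff
of patch 2 later in this thread (reproduced here only as an illustration of
the parameter reduction):

/* Before: up to seven mostly-constant parameters threaded through
 * every caller on the allocation path. */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask,
		unsigned int order, struct zonelist *zonelist,
		int high_zoneidx, int alloc_flags,
		struct zone *preferred_zone, int classzone_idx,
		int migratetype);

/* After: gfp_mask, order and alloc_flags still vary per call site;
 * everything else travels in one read-mostly struct alloc_context. */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
		const struct alloc_context *ac);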
* [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 14:30 ` Michal Hocko 2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka ` (2 subsequent siblings) 3 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko The function prep_new_page() sets almost everything in the struct page of the page being allocated, except page->pfmemalloc. This is not obvious and has at least once led to a bug where page->pfmemalloc was forgotten to be set correctly, see commit 8fb74b9fb2b1 ("mm: compaction: partially revert capture of suitable high-order page"). This patch moves the pfmemalloc setting to prep_new_page(), which means it needs to gain alloc_flags parameter. The call to prep_new_page is moved from buffered_rmqueue() to get_page_from_freelist(), which also leads to simpler code. An obsolete comment for buffered_rmqueue() is replaced. In addition to better maintainability there is a small reduction of code and stack usage for get_page_from_freelist(), which inlines the other functions involved. add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145) function old new delta get_page_from_freelist 2670 2525 -145 Stack usage is reduced from 184 to 168 bytes. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- mm/page_alloc.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1bb65e6..0c77a97 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -970,7 +970,8 @@ static inline int check_new_page(struct page *page) return 0; } -static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags) +static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, + int alloc_flags) { int i; @@ -994,6 +995,14 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags) set_page_owner(page, order, gfp_flags); + /* + * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was necessary to + * allocate the page. The expectation is that the caller is taking + * steps that will free more memory. The caller should avoid the page + * being used for !PFMEMALLOC purposes. + */ + page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS); + return 0; } @@ -1642,9 +1651,7 @@ int split_free_page(struct page *page) } /* - * Really, prep_compound_page() should be called from __rmqueue_bulk(). But - * we cheat by calling it from here, in the order > 0 path. Saves a branch - * or two. + * Allocate a page from the given zone. Use pcplists for order-0 allocations. 
*/ static inline struct page *buffered_rmqueue(struct zone *preferred_zone, @@ -1655,7 +1662,6 @@ struct page *buffered_rmqueue(struct zone *preferred_zone, struct page *page; bool cold = ((gfp_flags & __GFP_COLD) != 0); -again: if (likely(order == 0)) { struct per_cpu_pages *pcp; struct list_head *list; @@ -1711,8 +1717,6 @@ again: local_irq_restore(flags); VM_BUG_ON_PAGE(bad_range(zone, page), page); - if (prep_new_page(page, order, gfp_flags)) - goto again; return page; failed: @@ -2177,25 +2181,16 @@ zonelist_scan: try_this_zone: page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask, migratetype); - if (page) - break; + if (page) { + if (prep_new_page(page, order, gfp_mask, alloc_flags)) + goto try_this_zone; + return page; + } this_zone_full: if (IS_ENABLED(CONFIG_NUMA) && zlc_active) zlc_mark_zone_full(zonelist, z); } - if (page) { - /* - * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was - * necessary to allocate the page. The expectation is - * that the caller is taking steps that will free more - * memory. The caller should avoid the page being used - * for !PFMEMALLOC purposes. - */ - page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS); - return page; - } - /* * The first pass makes sure allocations are spread fairly within the * local node. However, the local node might have free pages left -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
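For reference in the review discussion that follows, the success path of
get_page_from_freelist() after this patch reduces to the excerpt below
(taken from the hunk above, with comments added to spell out the retry
semantics):

try_this_zone:
		page = buffered_rmqueue(preferred_zone, zone, order,
						gfp_mask, migratetype);
		if (page) {
			/*
			 * prep_new_page() returns non-zero when it finds a
			 * bad page, so retry the same zone rather than hand
			 * the bad page out; this replaces the old
			 * 'goto again' loop inside buffered_rmqueue().
			 */
			if (prep_new_page(page, order, gfp_mask, alloc_flags))
				goto try_this_zone;
			return page;
		}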
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka @ 2015-01-06 14:30 ` Michal Hocko 2015-01-06 21:10 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:30 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:40, Vlastimil Babka wrote: > The function prep_new_page() sets almost everything in the struct page of the > page being allocated, except page->pfmemalloc. This is not obvious and has at > least once led to a bug where page->pfmemalloc was forgotten to be set > correctly, see commit 8fb74b9fb2b1 ("mm: compaction: partially revert capture > of suitable high-order page"). > > This patch moves the pfmemalloc setting to prep_new_page(), which means it > needs to gain alloc_flags parameter. The call to prep_new_page is moved from > buffered_rmqueue() to get_page_from_freelist(), which also leads to simpler > code. An obsolete comment for buffered_rmqueue() is replaced. > > In addition to better maintainability there is a small reduction of code and > stack usage for get_page_from_freelist(), which inlines the other functions > involved. > > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145) > function old new delta > get_page_from_freelist 2670 2525 -145 > > Stack usage is reduced from 184 to 168 bytes. > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> get_page_from_freelist has grown too hairy. I agree that it is tiny less confusing now because we are not breaking out of the loop in the successful case. Acked-by: Michal Hocko <mhocko@suse.cz> [...] > @@ -2177,25 +2181,16 @@ zonelist_scan: > try_this_zone: > page = buffered_rmqueue(preferred_zone, zone, order, > gfp_mask, migratetype); > - if (page) > - break; > + if (page) { > + if (prep_new_page(page, order, gfp_mask, alloc_flags)) > + goto try_this_zone; > + return page; > + } I would probably liked `do {} while ()' more because it wouldn't use the goto, but this is up to you: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1bb65e6f48dd..1682d766cb8e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2175,10 +2175,11 @@ zonelist_scan: } try_this_zone: - page = buffered_rmqueue(preferred_zone, zone, order, + do { + page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask, migratetype); - if (page) - break; + } while (page && prep_new_page(page, order, gfp_mask, + alloc_flags)); this_zone_full: if (IS_ENABLED(CONFIG_NUMA) && zlc_active) zlc_mark_zone_full(zonelist, z); [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-06 14:30 ` Michal Hocko @ 2015-01-06 21:10 ` Vlastimil Babka 2015-01-06 21:44 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-06 21:10 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 03:30 PM, Michal Hocko wrote: > On Mon 05-01-15 18:17:40, Vlastimil Babka wrote: >> The function prep_new_page() sets almost everything in the struct page of the >> page being allocated, except page->pfmemalloc. This is not obvious and has at >> least once led to a bug where page->pfmemalloc was forgotten to be set >> correctly, see commit 8fb74b9fb2b1 ("mm: compaction: partially revert capture >> of suitable high-order page"). >> >> This patch moves the pfmemalloc setting to prep_new_page(), which means it >> needs to gain alloc_flags parameter. The call to prep_new_page is moved from >> buffered_rmqueue() to get_page_from_freelist(), which also leads to simpler >> code. An obsolete comment for buffered_rmqueue() is replaced. >> >> In addition to better maintainability there is a small reduction of code and >> stack usage for get_page_from_freelist(), which inlines the other functions >> involved. >> >> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145) >> function old new delta >> get_page_from_freelist 2670 2525 -145 >> >> Stack usage is reduced from 184 to 168 bytes. >> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> >> Cc: Mel Gorman <mgorman@suse.de> >> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> >> Cc: Minchan Kim <minchan@kernel.org> >> Cc: David Rientjes <rientjes@google.com> >> Cc: Rik van Riel <riel@redhat.com> >> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> >> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> >> Cc: Johannes Weiner <hannes@cmpxchg.org> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> >> Cc: Michal Hocko <mhocko@suse.cz> > > get_page_from_freelist has grown too hairy. I agree that it is tiny less > confusing now because we are not breaking out of the loop in the > successful case. Well, we are returning instead. So there's no more code to follow by anyone reading the function. > Acked-by: Michal Hocko <mhocko@suse.cz> > > [...] >> @@ -2177,25 +2181,16 @@ zonelist_scan: >> try_this_zone: >> page = buffered_rmqueue(preferred_zone, zone, order, >> gfp_mask, migratetype); >> - if (page) >> - break; >> + if (page) { >> + if (prep_new_page(page, order, gfp_mask, alloc_flags)) >> + goto try_this_zone; >> + return page; >> + } > > I would probably liked `do {} while ()' more because it wouldn't use the > goto, but this is up to you: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 1bb65e6f48dd..1682d766cb8e 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2175,10 +2175,11 @@ zonelist_scan: > } > > try_this_zone: > - page = buffered_rmqueue(preferred_zone, zone, order, > + do { > + page = buffered_rmqueue(preferred_zone, zone, order, > gfp_mask, migratetype); > - if (page) > - break; > + } while (page && prep_new_page(page, order, gfp_mask, > + alloc_flags)); Hm but here we wouldn't return page on success. I wonder if you overlooked the return, hence your "not breaking out of the loop" remark? > this_zone_full: > if (IS_ENABLED(CONFIG_NUMA) && zlc_active) > zlc_mark_zone_full(zonelist, z); > > [...] 
> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-06 21:10 ` Vlastimil Babka @ 2015-01-06 21:44 ` Michal Hocko 2015-01-07 9:36 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 21:44 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Tue 06-01-15 22:10:55, Vlastimil Babka wrote: > On 01/06/2015 03:30 PM, Michal Hocko wrote: [...] > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 1bb65e6f48dd..1682d766cb8e 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2175,10 +2175,11 @@ zonelist_scan: > > } > > > > try_this_zone: > > - page = buffered_rmqueue(preferred_zone, zone, order, > > + do { > > + page = buffered_rmqueue(preferred_zone, zone, order, > > gfp_mask, migratetype); > > - if (page) > > - break; > > + } while (page && prep_new_page(page, order, gfp_mask, > > + alloc_flags)); > > Hm but here we wouldn't return page on success. Right. > I wonder if you overlooked the return, hence your "not breaking out of > the loop" remark? This was merely to show the intention. Sorry for not being clear enough. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-06 21:44 ` Michal Hocko @ 2015-01-07 9:36 ` Vlastimil Babka 2015-01-07 10:54 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:36 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 10:44 PM, Michal Hocko wrote: > On Tue 06-01-15 22:10:55, Vlastimil Babka wrote: >> On 01/06/2015 03:30 PM, Michal Hocko wrote: > [...] >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> > index 1bb65e6f48dd..1682d766cb8e 100644 >> > --- a/mm/page_alloc.c >> > +++ b/mm/page_alloc.c >> > @@ -2175,10 +2175,11 @@ zonelist_scan: >> > } >> > >> > try_this_zone: >> > - page = buffered_rmqueue(preferred_zone, zone, order, >> > + do { >> > + page = buffered_rmqueue(preferred_zone, zone, order, >> > gfp_mask, migratetype); >> > - if (page) >> > - break; >> > + } while (page && prep_new_page(page, order, gfp_mask, >> > + alloc_flags)); >> >> Hm but here we wouldn't return page on success. > > Right. > >> I wonder if you overlooked the return, hence your "not breaking out of >> the loop" remark? > > This was merely to show the intention. Sorry for not being clear enough. OK, but I don't see other way than to follow this do-while with another if (page) return page; So I think it would be more complicated than now. We wouldn't even be able to remove the 'try_this_zone' label, since it's used for goto from elsewhere as well. Now that I'm thinking of it, maybe we should have a "goto zonelist_scan" there instead. We discard a bad page and might come below the watermarks. But the chances of this mattering are tiny I guess. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-07 9:36 ` Vlastimil Babka @ 2015-01-07 10:54 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-07 10:54 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 10:36:58, Vlastimil Babka wrote: > On 01/06/2015 10:44 PM, Michal Hocko wrote: > > On Tue 06-01-15 22:10:55, Vlastimil Babka wrote: > >> On 01/06/2015 03:30 PM, Michal Hocko wrote: > > [...] > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> > index 1bb65e6f48dd..1682d766cb8e 100644 > >> > --- a/mm/page_alloc.c > >> > +++ b/mm/page_alloc.c > >> > @@ -2175,10 +2175,11 @@ zonelist_scan: > >> > } > >> > > >> > try_this_zone: > >> > - page = buffered_rmqueue(preferred_zone, zone, order, > >> > + do { > >> > + page = buffered_rmqueue(preferred_zone, zone, order, > >> > gfp_mask, migratetype); > >> > - if (page) > >> > - break; > >> > + } while (page && prep_new_page(page, order, gfp_mask, > >> > + alloc_flags)); > >> > >> Hm but here we wouldn't return page on success. > > > > Right. > > > >> I wonder if you overlooked the return, hence your "not breaking out of > >> the loop" remark? > > > > This was merely to show the intention. Sorry for not being clear enough. > > OK, but I don't see other way than to follow this do-while with another > > if (page) > return page; > > So I think it would be more complicated than now. We wouldn't even be able to > remove the 'try_this_zone' label, since it's used for goto from elsewhere as well. Getting rid of the label wasn't the intention. I just found the allocation retry easier to follow this way. I have no objection if you keep the code as is. > Now that I'm thinking of it, maybe we should have a "goto zonelist_scan" there > instead. We discard a bad page and might come below the watermarks. But the > chances of this mattering are tiny I guess. If anything this would be worth a separate patch. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
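For completeness, the do/while alternative discussed above would also need
the success return that Vlastimil points out, ending up roughly as below.
This is only a sketch of the variant that was not merged, assembled from
Michal's diff plus the missing check:

try_this_zone:
		do {
			page = buffered_rmqueue(preferred_zone, zone, order,
						gfp_mask, migratetype);
		} while (page && prep_new_page(page, order, gfp_mask,
						alloc_flags));
		/*
		 * Without this, a successfully prepped page would fall
		 * through to this_zone_full instead of being returned.
		 */
		if (page)
			return page;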
* [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 14:45 ` Michal Hocko 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka 3 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko Introduce struct alloc_context to accumulate the numerous parameters passed between the alloc_pages* family of functions and get_page_from_freelist(). This excludes gfp_flags and alloc_info, which mutate too much along the way, and allocation order, which is conceptually different. The result is shorter function signatures, as well as overal code size and stack usage reductions. bloat-o-meter: add/remove: 0/0 grow/shrink: 1/2 up/down: 127/-371 (-244) function old new delta get_page_from_freelist 2525 2652 +127 __alloc_pages_direct_compact 329 283 -46 __alloc_pages_nodemask 2507 2182 -325 checkstack.pl: function old new __alloc_pages_nodemask 216 184 get_page_from_freelist 168 184 __alloc_pages_direct_compact 40 24 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- mm/page_alloc.c | 221 +++++++++++++++++++++++++------------------------------- 1 file changed, 100 insertions(+), 121 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0c77a97..bf0359c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -232,6 +232,19 @@ EXPORT_SYMBOL(nr_node_ids); EXPORT_SYMBOL(nr_online_nodes); #endif +/* + * Structure for holding the mostly immutable allocation parameters passed + * between alloc_pages* family of functions. + */ +struct alloc_context { + struct zonelist *zonelist; + nodemask_t *nodemask; + struct zone *preferred_zone; + int classzone_idx; + int migratetype; + enum zone_type high_zoneidx; +}; + int page_group_by_mobility_disabled __read_mostly; void set_pageblock_migratetype(struct page *page, int migratetype) @@ -2037,10 +2050,10 @@ static void reset_alloc_batches(struct zone *preferred_zone) * a page. */ static struct page * -get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order, - struct zonelist *zonelist, int high_zoneidx, int alloc_flags, - struct zone *preferred_zone, int classzone_idx, int migratetype) +get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, + const struct alloc_context *ac) { + struct zonelist *zonelist = ac->zonelist; struct zoneref *z; struct page *page = NULL; struct zone *zone; @@ -2059,8 +2072,8 @@ zonelist_scan: * Scan zonelist, looking for a zone with enough free. * See also __cpuset_node_allowed() comment in kernel/cpuset.c. 
*/ - for_each_zone_zonelist_nodemask(zone, z, zonelist, - high_zoneidx, nodemask) { + for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx, + ac->nodemask) { unsigned long mark; if (IS_ENABLED(CONFIG_NUMA) && zlc_active && @@ -2077,7 +2090,7 @@ zonelist_scan: * time the page has in memory before being reclaimed. */ if (alloc_flags & ALLOC_FAIR) { - if (!zone_local(preferred_zone, zone)) + if (!zone_local(ac->preferred_zone, zone)) break; if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) { nr_fair_skipped++; @@ -2115,7 +2128,7 @@ zonelist_scan: mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; if (!zone_watermark_ok(zone, order, mark, - classzone_idx, alloc_flags)) { + ac->classzone_idx, alloc_flags)) { int ret; /* Checked here to keep the fast path fast */ @@ -2136,7 +2149,7 @@ zonelist_scan: } if (zone_reclaim_mode == 0 || - !zone_allows_reclaim(preferred_zone, zone)) + !zone_allows_reclaim(ac->preferred_zone, zone)) goto this_zone_full; /* @@ -2158,7 +2171,7 @@ zonelist_scan: default: /* did we reclaim enough */ if (zone_watermark_ok(zone, order, mark, - classzone_idx, alloc_flags)) + ac->classzone_idx, alloc_flags)) goto try_this_zone; /* @@ -2179,8 +2192,8 @@ zonelist_scan: } try_this_zone: - page = buffered_rmqueue(preferred_zone, zone, order, - gfp_mask, migratetype); + page = buffered_rmqueue(ac->preferred_zone, zone, order, + gfp_mask, ac->migratetype); if (page) { if (prep_new_page(page, order, gfp_mask, alloc_flags)) goto try_this_zone; @@ -2203,7 +2216,7 @@ this_zone_full: alloc_flags &= ~ALLOC_FAIR; if (nr_fair_skipped) { zonelist_rescan = true; - reset_alloc_batches(preferred_zone); + reset_alloc_batches(ac->preferred_zone); } if (nr_online_nodes > 1) zonelist_rescan = true; @@ -2325,9 +2338,7 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order, static inline struct page * __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype, unsigned long *did_some_progress) + const struct alloc_context *ac, unsigned long *did_some_progress) { struct page *page; @@ -2340,7 +2351,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, * Acquire the per-zone oom lock for each zone. If that * fails, somebody else is making progress for us. */ - if (!oom_zonelist_trylock(zonelist, gfp_mask)) { + if (!oom_zonelist_trylock(ac->zonelist, gfp_mask)) { *did_some_progress = 1; schedule_timeout_uninterruptible(1); return NULL; @@ -2359,10 +2370,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, * here, this is only to catch a parallel oom killing, we must fail if * we're still under heavy pressure. 
*/ - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, - order, zonelist, high_zoneidx, - ALLOC_WMARK_HIGH|ALLOC_CPUSET, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order, + ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac); if (page) goto out; @@ -2374,7 +2383,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, if (order > PAGE_ALLOC_COSTLY_ORDER) goto out; /* The OOM killer does not needlessly kill tasks for lowmem */ - if (high_zoneidx < ZONE_NORMAL) + if (ac->high_zoneidx < ZONE_NORMAL) goto out; /* The OOM killer does not compensate for light reclaim */ if (!(gfp_mask & __GFP_FS)) @@ -2390,10 +2399,10 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, goto out; } /* Exhausted what can be done so it's blamo time */ - out_of_memory(zonelist, gfp_mask, order, nodemask, false); + out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false); *did_some_progress = 1; out: - oom_zonelist_unlock(zonelist, gfp_mask); + oom_zonelist_unlock(ac->zonelist, gfp_mask); return page; } @@ -2401,10 +2410,9 @@ out: /* Try memory compaction for high-order allocations before reclaim */ static struct page * __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, - int classzone_idx, int migratetype, enum migrate_mode mode, - int *contended_compaction, bool *deferred_compaction) + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended_compaction, + bool *deferred_compaction) { unsigned long compact_result; struct page *page; @@ -2413,10 +2421,10 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, return NULL; current->flags |= PF_MEMALLOC; - compact_result = try_to_compact_pages(zonelist, order, gfp_mask, - nodemask, mode, + compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, + ac->nodemask, mode, contended_compaction, - alloc_flags, classzone_idx); + alloc_flags, ac->classzone_idx); current->flags &= ~PF_MEMALLOC; switch (compact_result) { @@ -2435,10 +2443,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, */ count_vm_event(COMPACTSTALL); - page = get_page_from_freelist(gfp_mask, nodemask, - order, zonelist, high_zoneidx, - alloc_flags & ~ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask, order, + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); if (page) { struct zone *zone = page_zone(page); @@ -2462,10 +2468,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, #else static inline struct page * __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, - int classzone_idx, int migratetype, enum migrate_mode mode, - int *contended_compaction, bool *deferred_compaction) + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended_compaction, + bool *deferred_compaction) { return NULL; } @@ -2473,8 +2478,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, /* Perform direct synchronous page reclaim */ static int -__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, - nodemask_t *nodemask) +__perform_reclaim(gfp_t gfp_mask, unsigned int order, + const struct alloc_context *ac) { struct reclaim_state reclaim_state; int progress; 
@@ -2488,7 +2493,8 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, reclaim_state.reclaimed_slab = 0; current->reclaim_state = &reclaim_state; - progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask); + progress = try_to_free_pages(ac->zonelist, order, gfp_mask, + ac->nodemask); current->reclaim_state = NULL; lockdep_clear_current_reclaim_state(); @@ -2502,28 +2508,23 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, /* The really slow allocator path where we enter direct reclaim */ static inline struct page * __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, - int classzone_idx, int migratetype, unsigned long *did_some_progress) + int alloc_flags, const struct alloc_context *ac, + unsigned long *did_some_progress) { struct page *page = NULL; bool drained = false; - *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, - nodemask); + *did_some_progress = __perform_reclaim(gfp_mask, order, ac); if (unlikely(!(*did_some_progress))) return NULL; /* After successful reclaim, reconsider all zones for allocation */ if (IS_ENABLED(CONFIG_NUMA)) - zlc_clear_zones_full(zonelist); + zlc_clear_zones_full(ac->zonelist); retry: - page = get_page_from_freelist(gfp_mask, nodemask, order, - zonelist, high_zoneidx, - alloc_flags & ~ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, - migratetype); + page = get_page_from_freelist(gfp_mask, order, + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); /* * If an allocation failed after direct reclaim, it could be because @@ -2544,36 +2545,30 @@ retry: */ static inline struct page * __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype) + const struct alloc_context *ac) { struct page *page; do { - page = get_page_from_freelist(gfp_mask, nodemask, order, - zonelist, high_zoneidx, ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask, order, + ALLOC_NO_WATERMARKS, ac); if (!page && gfp_mask & __GFP_NOFAIL) - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, + HZ/50); } while (!page && (gfp_mask & __GFP_NOFAIL)); return page; } -static void wake_all_kswapds(unsigned int order, - struct zonelist *zonelist, - enum zone_type high_zoneidx, - struct zone *preferred_zone, - nodemask_t *nodemask) +static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac) { struct zoneref *z; struct zone *zone; - for_each_zone_zonelist_nodemask(zone, z, zonelist, - high_zoneidx, nodemask) - wakeup_kswapd(zone, order, zone_idx(preferred_zone)); + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, + ac->high_zoneidx, ac->nodemask) + wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone)); } static inline int @@ -2632,9 +2627,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype) + struct alloc_context *ac) { const gfp_t wait = gfp_mask & __GFP_WAIT; struct page *page = NULL; @@ -2669,8 +2662,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned 
int order, goto nopage; if (!(gfp_mask & __GFP_NO_KSWAPD)) - wake_all_kswapds(order, zonelist, high_zoneidx, - preferred_zone, nodemask); + wake_all_kswapds(order, ac); /* * OK, we're below the kswapd watermark and have kicked background @@ -2683,18 +2675,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * Find the true preferred zone if the allocation is unconstrained by * cpusets. */ - if (!(alloc_flags & ALLOC_CPUSET) && !nodemask) { + if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) { struct zoneref *preferred_zoneref; - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, - NULL, &preferred_zone); - classzone_idx = zonelist_zone_idx(preferred_zoneref); + preferred_zoneref = first_zones_zonelist(ac->zonelist, + ac->high_zoneidx, NULL, &ac->preferred_zone); + ac->classzone_idx = zonelist_zone_idx(preferred_zoneref); } rebalance: /* This is the last chance, in general, before the goto nopage. */ - page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, - high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask, order, + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); if (page) goto got_pg; @@ -2705,11 +2696,10 @@ rebalance: * the allocation is high priority and these type of * allocations are system rather than user orientated */ - zonelist = node_zonelist(numa_node_id(), gfp_mask); + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); + + page = __alloc_pages_high_priority(gfp_mask, order, ac); - page = __alloc_pages_high_priority(gfp_mask, order, - zonelist, high_zoneidx, nodemask, - preferred_zone, classzone_idx, migratetype); if (page) { goto got_pg; } @@ -2738,11 +2728,9 @@ rebalance: * Try direct compaction. The first pass is asynchronous. Subsequent * attempts after direct reclaim are synchronous */ - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, - high_zoneidx, nodemask, alloc_flags, - preferred_zone, - classzone_idx, migratetype, - migration_mode, &contended_compaction, + page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, + migration_mode, + &contended_compaction, &deferred_compaction); if (page) goto got_pg; @@ -2788,12 +2776,8 @@ rebalance: migration_mode = MIGRATE_SYNC_LIGHT; /* Try direct reclaim and then allocating */ - page = __alloc_pages_direct_reclaim(gfp_mask, order, - zonelist, high_zoneidx, - nodemask, - alloc_flags, preferred_zone, - classzone_idx, migratetype, - &did_some_progress); + page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac, + &did_some_progress); if (page) goto got_pg; @@ -2807,10 +2791,8 @@ rebalance: * start OOM killing tasks. 
*/ if (!did_some_progress) { - page = __alloc_pages_may_oom(gfp_mask, order, zonelist, - high_zoneidx, nodemask, - preferred_zone, classzone_idx, - migratetype,&did_some_progress); + page = __alloc_pages_may_oom(gfp_mask, order, ac, + &did_some_progress); if (page) goto got_pg; if (!did_some_progress) { @@ -2819,7 +2801,7 @@ rebalance: } } /* Wait for some write requests to complete then retry */ - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50); goto rebalance; } else { /* @@ -2827,11 +2809,9 @@ rebalance: * direct reclaim and reclaim/compaction depends on compaction * being called after reclaim so call directly if necessary */ - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, - high_zoneidx, nodemask, alloc_flags, - preferred_zone, - classzone_idx, migratetype, - migration_mode, &contended_compaction, + page = __alloc_pages_direct_compact(gfp_mask, order, + alloc_flags, ac, migration_mode, + &contended_compaction, &deferred_compaction); if (page) goto got_pg; @@ -2854,15 +2834,16 @@ struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, nodemask_t *nodemask) { - enum zone_type high_zoneidx = gfp_zone(gfp_mask); - struct zone *preferred_zone; struct zoneref *preferred_zoneref; struct page *page = NULL; - int migratetype = gfpflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - int classzone_idx; gfp_t mask; + struct alloc_context ac = { + .high_zoneidx = gfp_zone(gfp_mask), + .nodemask = nodemask, + .migratetype = gfpflags_to_migratetype(gfp_mask), + }; gfp_mask &= gfp_allowed_mask; @@ -2881,25 +2862,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE) + if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE) alloc_flags |= ALLOC_CMA; retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); + /* We set it here, as __alloc_pages_slowpath might have changed it */ + ac.zonelist = zonelist; /* The preferred zone is used for statistics later */ - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, - nodemask ? : &cpuset_current_mems_allowed, - &preferred_zone); - if (!preferred_zone) + preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx, + ac.nodemask ? : &cpuset_current_mems_allowed, + &ac.preferred_zone); + if (!ac.preferred_zone) goto out; - classzone_idx = zonelist_zone_idx(preferred_zoneref); + ac.classzone_idx = zonelist_zone_idx(preferred_zoneref); /* First allocation attempt */ mask = gfp_mask|__GFP_HARDWALL; - page = get_page_from_freelist(mask, nodemask, order, zonelist, - high_zoneidx, alloc_flags, preferred_zone, - classzone_idx, migratetype); + page = get_page_from_freelist(mask, order, alloc_flags, &ac); if (unlikely(!page)) { /* * Runtime PM, block IO and its error handling path @@ -2908,12 +2889,10 @@ retry_cpuset: */ mask = memalloc_noio_flags(gfp_mask); - page = __alloc_pages_slowpath(mask, order, - zonelist, high_zoneidx, nodemask, - preferred_zone, classzone_idx, migratetype); + page = __alloc_pages_slowpath(mask, order, &ac); } - trace_mm_page_alloc(page, order, mask, migratetype); + trace_mm_page_alloc(page, order, mask, ac.migratetype); out: /* -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . ^ permalink raw reply related [flat|nested] 21+ messages in thread
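Pulling the scattered hunks together, the fast path of
__alloc_pages_nodemask() after this patch ends up roughly as below
(condensed from the diff above; the temporary 'mask' variable is inlined
here for brevity). The comments note which fields get reset on the
retry_cpuset path, a point raised in the review that follows:

	struct alloc_context ac = {
		.high_zoneidx = gfp_zone(gfp_mask),
		.nodemask = nodemask,
		.migratetype = gfpflags_to_migratetype(gfp_mask),
	};

retry_cpuset:
	cpuset_mems_cookie = read_mems_allowed_begin();

	/* Reset on every retry: __alloc_pages_slowpath() may have pointed
	 * this at the local node's zonelist for ALLOC_NO_WATERMARKS. */
	ac.zonelist = zonelist;

	/* preferred_zone and classzone_idx are likewise recomputed here,
	 * undoing any change made by the slowpath. */
	preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
				ac.nodemask ? : &cpuset_current_mems_allowed,
				&ac.preferred_zone);
	ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);

	/* First attempt, then the slow path; both now just take &ac. */
	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
					alloc_flags, &ac);
	if (unlikely(!page))
		page = __alloc_pages_slowpath(memalloc_noio_flags(gfp_mask),
						order, &ac);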
* Re: [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters 2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka @ 2015-01-06 14:45 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:45 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:41, Vlastimil Babka wrote: > Introduce struct alloc_context to accumulate the numerous parameters passed > between the alloc_pages* family of functions and get_page_from_freelist(). > This excludes gfp_flags and alloc_info, which mutate too much along the way, > and allocation order, which is conceptually different. > > The result is shorter function signatures, as well as overal code size and > stack usage reductions. > > bloat-o-meter: > > add/remove: 0/0 grow/shrink: 1/2 up/down: 127/-371 (-244) > function old new delta > get_page_from_freelist 2525 2652 +127 > __alloc_pages_direct_compact 329 283 -46 > __alloc_pages_nodemask 2507 2182 -325 > > checkstack.pl: > > function old new > __alloc_pages_nodemask 216 184 > get_page_from_freelist 168 184 > __alloc_pages_direct_compact 40 24 > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> Looks good to me. I would just mention fields which might be reseted and where. Acked-by: Michal Hocko <mhocko@suse.cz> > --- > mm/page_alloc.c | 221 +++++++++++++++++++++++++------------------------------- > 1 file changed, 100 insertions(+), 121 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0c77a97..bf0359c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -232,6 +232,19 @@ EXPORT_SYMBOL(nr_node_ids); > EXPORT_SYMBOL(nr_online_nodes); > #endif > > +/* > + * Structure for holding the mostly immutable allocation parameters passed > + * between alloc_pages* family of functions. > + */ > +struct alloc_context { > + struct zonelist *zonelist; > + nodemask_t *nodemask; > + struct zone *preferred_zone; > + int classzone_idx; > + int migratetype; > + enum zone_type high_zoneidx; > +}; > + > int page_group_by_mobility_disabled __read_mostly; > > void set_pageblock_migratetype(struct page *page, int migratetype) > @@ -2037,10 +2050,10 @@ static void reset_alloc_batches(struct zone *preferred_zone) > * a page. > */ > static struct page * > -get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order, > - struct zonelist *zonelist, int high_zoneidx, int alloc_flags, > - struct zone *preferred_zone, int classzone_idx, int migratetype) > +get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, > + const struct alloc_context *ac) > { > + struct zonelist *zonelist = ac->zonelist; > struct zoneref *z; > struct page *page = NULL; > struct zone *zone; > @@ -2059,8 +2072,8 @@ zonelist_scan: > * Scan zonelist, looking for a zone with enough free. > * See also __cpuset_node_allowed() comment in kernel/cpuset.c. 
> */ > - for_each_zone_zonelist_nodemask(zone, z, zonelist, > - high_zoneidx, nodemask) { > + for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx, > + ac->nodemask) { > unsigned long mark; > > if (IS_ENABLED(CONFIG_NUMA) && zlc_active && > @@ -2077,7 +2090,7 @@ zonelist_scan: > * time the page has in memory before being reclaimed. > */ > if (alloc_flags & ALLOC_FAIR) { > - if (!zone_local(preferred_zone, zone)) > + if (!zone_local(ac->preferred_zone, zone)) > break; > if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) { > nr_fair_skipped++; > @@ -2115,7 +2128,7 @@ zonelist_scan: > > mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; > if (!zone_watermark_ok(zone, order, mark, > - classzone_idx, alloc_flags)) { > + ac->classzone_idx, alloc_flags)) { > int ret; > > /* Checked here to keep the fast path fast */ > @@ -2136,7 +2149,7 @@ zonelist_scan: > } > > if (zone_reclaim_mode == 0 || > - !zone_allows_reclaim(preferred_zone, zone)) > + !zone_allows_reclaim(ac->preferred_zone, zone)) > goto this_zone_full; > > /* > @@ -2158,7 +2171,7 @@ zonelist_scan: > default: > /* did we reclaim enough */ > if (zone_watermark_ok(zone, order, mark, > - classzone_idx, alloc_flags)) > + ac->classzone_idx, alloc_flags)) > goto try_this_zone; > > /* > @@ -2179,8 +2192,8 @@ zonelist_scan: > } > > try_this_zone: > - page = buffered_rmqueue(preferred_zone, zone, order, > - gfp_mask, migratetype); > + page = buffered_rmqueue(ac->preferred_zone, zone, order, > + gfp_mask, ac->migratetype); > if (page) { > if (prep_new_page(page, order, gfp_mask, alloc_flags)) > goto try_this_zone; > @@ -2203,7 +2216,7 @@ this_zone_full: > alloc_flags &= ~ALLOC_FAIR; > if (nr_fair_skipped) { > zonelist_rescan = true; > - reset_alloc_batches(preferred_zone); > + reset_alloc_batches(ac->preferred_zone); > } > if (nr_online_nodes > 1) > zonelist_rescan = true; > @@ -2325,9 +2338,7 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order, > > static inline struct page * > __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype, unsigned long *did_some_progress) > + const struct alloc_context *ac, unsigned long *did_some_progress) > { > struct page *page; > > @@ -2340,7 +2351,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > * Acquire the per-zone oom lock for each zone. If that > * fails, somebody else is making progress for us. > */ > - if (!oom_zonelist_trylock(zonelist, gfp_mask)) { > + if (!oom_zonelist_trylock(ac->zonelist, gfp_mask)) { > *did_some_progress = 1; > schedule_timeout_uninterruptible(1); > return NULL; > @@ -2359,10 +2370,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > * here, this is only to catch a parallel oom killing, we must fail if > * we're still under heavy pressure. 
> */ > - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, > - order, zonelist, high_zoneidx, > - ALLOC_WMARK_HIGH|ALLOC_CPUSET, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order, > + ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac); > if (page) > goto out; > > @@ -2374,7 +2383,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > if (order > PAGE_ALLOC_COSTLY_ORDER) > goto out; > /* The OOM killer does not needlessly kill tasks for lowmem */ > - if (high_zoneidx < ZONE_NORMAL) > + if (ac->high_zoneidx < ZONE_NORMAL) > goto out; > /* The OOM killer does not compensate for light reclaim */ > if (!(gfp_mask & __GFP_FS)) > @@ -2390,10 +2399,10 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > goto out; > } > /* Exhausted what can be done so it's blamo time */ > - out_of_memory(zonelist, gfp_mask, order, nodemask, false); > + out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false); > *did_some_progress = 1; > out: > - oom_zonelist_unlock(zonelist, gfp_mask); > + oom_zonelist_unlock(ac->zonelist, gfp_mask); > return page; > } > > @@ -2401,10 +2410,9 @@ out: > /* Try memory compaction for high-order allocations before reclaim */ > static struct page * > __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, > - int classzone_idx, int migratetype, enum migrate_mode mode, > - int *contended_compaction, bool *deferred_compaction) > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended_compaction, > + bool *deferred_compaction) > { > unsigned long compact_result; > struct page *page; > @@ -2413,10 +2421,10 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > return NULL; > > current->flags |= PF_MEMALLOC; > - compact_result = try_to_compact_pages(zonelist, order, gfp_mask, > - nodemask, mode, > + compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, > + ac->nodemask, mode, > contended_compaction, > - alloc_flags, classzone_idx); > + alloc_flags, ac->classzone_idx); > current->flags &= ~PF_MEMALLOC; > > switch (compact_result) { > @@ -2435,10 +2443,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > */ > count_vm_event(COMPACTSTALL); > > - page = get_page_from_freelist(gfp_mask, nodemask, > - order, zonelist, high_zoneidx, > - alloc_flags & ~ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); > > if (page) { > struct zone *zone = page_zone(page); > @@ -2462,10 +2468,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > #else > static inline struct page * > __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, > - int classzone_idx, int migratetype, enum migrate_mode mode, > - int *contended_compaction, bool *deferred_compaction) > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended_compaction, > + bool *deferred_compaction) > { > return NULL; > } > @@ -2473,8 +2478,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > > /* Perform direct synchronous page reclaim */ > static int > -__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct 
zonelist *zonelist, > - nodemask_t *nodemask) > +__perform_reclaim(gfp_t gfp_mask, unsigned int order, > + const struct alloc_context *ac) > { > struct reclaim_state reclaim_state; > int progress; > @@ -2488,7 +2493,8 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, > reclaim_state.reclaimed_slab = 0; > current->reclaim_state = &reclaim_state; > > - progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask); > + progress = try_to_free_pages(ac->zonelist, order, gfp_mask, > + ac->nodemask); > > current->reclaim_state = NULL; > lockdep_clear_current_reclaim_state(); > @@ -2502,28 +2508,23 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, > /* The really slow allocator path where we enter direct reclaim */ > static inline struct page * > __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, > - int classzone_idx, int migratetype, unsigned long *did_some_progress) > + int alloc_flags, const struct alloc_context *ac, > + unsigned long *did_some_progress) > { > struct page *page = NULL; > bool drained = false; > > - *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, > - nodemask); > + *did_some_progress = __perform_reclaim(gfp_mask, order, ac); > if (unlikely(!(*did_some_progress))) > return NULL; > > /* After successful reclaim, reconsider all zones for allocation */ > if (IS_ENABLED(CONFIG_NUMA)) > - zlc_clear_zones_full(zonelist); > + zlc_clear_zones_full(ac->zonelist); > > retry: > - page = get_page_from_freelist(gfp_mask, nodemask, order, > - zonelist, high_zoneidx, > - alloc_flags & ~ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, > - migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); > > /* > * If an allocation failed after direct reclaim, it could be because > @@ -2544,36 +2545,30 @@ retry: > */ > static inline struct page * > __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype) > + const struct alloc_context *ac) > { > struct page *page; > > do { > - page = get_page_from_freelist(gfp_mask, nodemask, order, > - zonelist, high_zoneidx, ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + ALLOC_NO_WATERMARKS, ac); > > if (!page && gfp_mask & __GFP_NOFAIL) > - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, > + HZ/50); > } while (!page && (gfp_mask & __GFP_NOFAIL)); > > return page; > } > > -static void wake_all_kswapds(unsigned int order, > - struct zonelist *zonelist, > - enum zone_type high_zoneidx, > - struct zone *preferred_zone, > - nodemask_t *nodemask) > +static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac) > { > struct zoneref *z; > struct zone *zone; > > - for_each_zone_zonelist_nodemask(zone, z, zonelist, > - high_zoneidx, nodemask) > - wakeup_kswapd(zone, order, zone_idx(preferred_zone)); > + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, > + ac->high_zoneidx, ac->nodemask) > + wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone)); > } > > static inline int > @@ -2632,9 +2627,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > static inline 
struct page * > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype) > + struct alloc_context *ac) > { > const gfp_t wait = gfp_mask & __GFP_WAIT; > struct page *page = NULL; > @@ -2669,8 +2662,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > goto nopage; > > if (!(gfp_mask & __GFP_NO_KSWAPD)) > - wake_all_kswapds(order, zonelist, high_zoneidx, > - preferred_zone, nodemask); > + wake_all_kswapds(order, ac); > > /* > * OK, we're below the kswapd watermark and have kicked background > @@ -2683,18 +2675,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > * Find the true preferred zone if the allocation is unconstrained by > * cpusets. > */ > - if (!(alloc_flags & ALLOC_CPUSET) && !nodemask) { > + if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) { > struct zoneref *preferred_zoneref; > - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, > - NULL, &preferred_zone); > - classzone_idx = zonelist_zone_idx(preferred_zoneref); > + preferred_zoneref = first_zones_zonelist(ac->zonelist, > + ac->high_zoneidx, NULL, &ac->preferred_zone); > + ac->classzone_idx = zonelist_zone_idx(preferred_zoneref); > } > > rebalance: > /* This is the last chance, in general, before the goto nopage. */ > - page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, > - high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); > if (page) > goto got_pg; > > @@ -2705,11 +2696,10 @@ rebalance: > * the allocation is high priority and these type of > * allocations are system rather than user orientated > */ > - zonelist = node_zonelist(numa_node_id(), gfp_mask); > + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); > + > + page = __alloc_pages_high_priority(gfp_mask, order, ac); > > - page = __alloc_pages_high_priority(gfp_mask, order, > - zonelist, high_zoneidx, nodemask, > - preferred_zone, classzone_idx, migratetype); > if (page) { > goto got_pg; > } > @@ -2738,11 +2728,9 @@ rebalance: > * Try direct compaction. The first pass is asynchronous. Subsequent > * attempts after direct reclaim are synchronous > */ > - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, > - high_zoneidx, nodemask, alloc_flags, > - preferred_zone, > - classzone_idx, migratetype, > - migration_mode, &contended_compaction, > + page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, > + migration_mode, > + &contended_compaction, > &deferred_compaction); > if (page) > goto got_pg; > @@ -2788,12 +2776,8 @@ rebalance: > migration_mode = MIGRATE_SYNC_LIGHT; > > /* Try direct reclaim and then allocating */ > - page = __alloc_pages_direct_reclaim(gfp_mask, order, > - zonelist, high_zoneidx, > - nodemask, > - alloc_flags, preferred_zone, > - classzone_idx, migratetype, > - &did_some_progress); > + page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac, > + &did_some_progress); > if (page) > goto got_pg; > > @@ -2807,10 +2791,8 @@ rebalance: > * start OOM killing tasks. 
> */ > if (!did_some_progress) { > - page = __alloc_pages_may_oom(gfp_mask, order, zonelist, > - high_zoneidx, nodemask, > - preferred_zone, classzone_idx, > - migratetype,&did_some_progress); > + page = __alloc_pages_may_oom(gfp_mask, order, ac, > + &did_some_progress); > if (page) > goto got_pg; > if (!did_some_progress) { > @@ -2819,7 +2801,7 @@ rebalance: > } > } > /* Wait for some write requests to complete then retry */ > - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50); > goto rebalance; > } else { > /* > @@ -2827,11 +2809,9 @@ rebalance: > * direct reclaim and reclaim/compaction depends on compaction > * being called after reclaim so call directly if necessary > */ > - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, > - high_zoneidx, nodemask, alloc_flags, > - preferred_zone, > - classzone_idx, migratetype, > - migration_mode, &contended_compaction, > + page = __alloc_pages_direct_compact(gfp_mask, order, > + alloc_flags, ac, migration_mode, > + &contended_compaction, > &deferred_compaction); > if (page) > goto got_pg; > @@ -2854,15 +2834,16 @@ struct page * > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > struct zonelist *zonelist, nodemask_t *nodemask) > { > - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > - struct zone *preferred_zone; > struct zoneref *preferred_zoneref; > struct page *page = NULL; > - int migratetype = gfpflags_to_migratetype(gfp_mask); > unsigned int cpuset_mems_cookie; > int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; > - int classzone_idx; > gfp_t mask; > + struct alloc_context ac = { > + .high_zoneidx = gfp_zone(gfp_mask), > + .nodemask = nodemask, > + .migratetype = gfpflags_to_migratetype(gfp_mask), > + }; > > gfp_mask &= gfp_allowed_mask; > > @@ -2881,25 +2862,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > if (unlikely(!zonelist->_zonerefs->zone)) > return NULL; > > - if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE) > + if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE) > alloc_flags |= ALLOC_CMA; > > retry_cpuset: > cpuset_mems_cookie = read_mems_allowed_begin(); > > + /* We set it here, as __alloc_pages_slowpath might have changed it */ > + ac.zonelist = zonelist; > /* The preferred zone is used for statistics later */ > - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, > - nodemask ? : &cpuset_current_mems_allowed, > - &preferred_zone); > - if (!preferred_zone) > + preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx, > + ac.nodemask ? 
: &cpuset_current_mems_allowed, > + &ac.preferred_zone); > + if (!ac.preferred_zone) > goto out; > - classzone_idx = zonelist_zone_idx(preferred_zoneref); > + ac.classzone_idx = zonelist_zone_idx(preferred_zoneref); > > /* First allocation attempt */ > mask = gfp_mask|__GFP_HARDWALL; > - page = get_page_from_freelist(mask, nodemask, order, zonelist, > - high_zoneidx, alloc_flags, preferred_zone, > - classzone_idx, migratetype); > + page = get_page_from_freelist(mask, order, alloc_flags, &ac); > if (unlikely(!page)) { > /* > * Runtime PM, block IO and its error handling path > @@ -2908,12 +2889,10 @@ retry_cpuset: > */ > mask = memalloc_noio_flags(gfp_mask); > > - page = __alloc_pages_slowpath(mask, order, > - zonelist, high_zoneidx, nodemask, > - preferred_zone, classzone_idx, migratetype); > + page = __alloc_pages_slowpath(mask, order, &ac); > } > > - trace_mm_page_alloc(page, order, mask, migratetype); > + trace_mm_page_alloc(page, order, mask, ac.migratetype); > > out: > /* > -- > 2.1.2 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
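The hunks quoted above reduce the allocator's calling convention to a single shape: __alloc_pages_nodemask() builds one struct alloc_context and hands a pointer to it to both the fast path and the slow path, while gfp_mask, order and alloc_flags keep travelling separately. A compilable toy reduction of that shape follows, for reference only -- the field names mirror the quoted diff, but the types, the zone selection and the return values are stand-ins, not kernel code.

#include <stdio.h>

typedef unsigned int gfp_t;
struct zone { int id; };
struct zonelist { struct zone *head; };
typedef unsigned long nodemask_t;

/* Field names follow the quoted diff; everything else here is a stub. */
struct alloc_context {
        struct zonelist *zonelist;
        nodemask_t *nodemask;
        struct zone *preferred_zone;
        int classzone_idx;
        int migratetype;
        int high_zoneidx;
};

/* Fast path: the "mostly constant" arguments arrive through one pointer. */
static int get_page_from_freelist_sketch(gfp_t gfp_mask, unsigned int order,
                int alloc_flags, const struct alloc_context *ac)
{
        (void)gfp_mask; (void)alloc_flags;
        return order == 0 && ac->preferred_zone != NULL; /* pretend order-0 succeeds */
}

/* Slow path: same context, but it may legitimately rewrite parts of it. */
static int alloc_pages_slowpath_sketch(gfp_t gfp_mask, unsigned int order,
                struct alloc_context *ac)
{
        ac->classzone_idx = 0;  /* recomputed, as in the quoted diff */
        return get_page_from_freelist_sketch(gfp_mask, order, 0, ac);
}

static int alloc_pages_nodemask_sketch(gfp_t gfp_mask, unsigned int order,
                struct zonelist *zonelist, nodemask_t *nodemask)
{
        struct alloc_context ac = {
                .zonelist       = zonelist,
                .nodemask       = nodemask,
                .preferred_zone = zonelist->head,
                .classzone_idx  = 1,
                .migratetype    = 0,
                .high_zoneidx   = 1,
        };

        if (get_page_from_freelist_sketch(gfp_mask, order, 0, &ac))
                return 1;
        return alloc_pages_slowpath_sketch(gfp_mask, order, &ac);
}

int main(void)
{
        struct zone z = { .id = 0 };
        struct zonelist zl = { .head = &z };
        nodemask_t nm = ~0UL;

        printf("%d\n", alloc_pages_nodemask_sketch(0, 0, &zl, &nm));
        return 0;
}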
* [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 14:53 ` Michal Hocko 2015-01-06 14:57 ` Michal Hocko 2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka 3 siblings, 2 replies; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko Expand the usage of the struct alloc_context introduced in the previous patch also for calling try_to_compact_pages(), to reduce the number of its parameters. Since the function is in different compilation unit, we need to move alloc_context definition in the shared mm/internal.h header. With this change we get simpler code and small savings of code size and stack usage: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27) function old new delta __alloc_pages_direct_compact 283 256 -27 add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13) function old new delta try_to_compact_pages 582 569 -13 Stack usage of __alloc_pages_direct_compact goes from 24 to none (per scripts/checkstack.pl). Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- include/linux/compaction.h | 17 +++++++++-------- mm/compaction.c | 23 +++++++++++------------ mm/internal.h | 14 ++++++++++++++ mm/page_alloc.c | 19 ++----------------- 4 files changed, 36 insertions(+), 37 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 3238ffa..f2efda2 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -21,6 +21,8 @@ /* Zone lock or lru_lock was contended in async compaction */ #define COMPACT_CONTENDED_LOCK 2 +struct alloc_context; /* in mm/internal.h */ + #ifdef CONFIG_COMPACTION extern int sysctl_compact_memory; extern int sysctl_compaction_handler(struct ctl_table *table, int write, @@ -30,10 +32,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos); extern int fragmentation_index(struct zone *zone, unsigned int order); -extern unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *mask, - enum migrate_mode mode, int *contended, - int alloc_flags, int classzone_idx); +extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended); extern void compact_pgdat(pg_data_t *pgdat, int order); extern void reset_isolation_suitable(pg_data_t *pgdat); extern unsigned long compaction_suitable(struct zone *zone, int order, @@ -101,10 +102,10 @@ static inline bool compaction_restarting(struct zone *zone, int order) } #else -static inline unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *nodemask, - enum migrate_mode mode, int *contended, - int alloc_flags, int classzone_idx) +static inline unsigned long try_to_compact_pages(gfp_t gfp_mask, + unsigned int order, int alloc_flags, + const struct alloc_context *ac, + enum migrate_mode mode, int *contended) { return COMPACT_CONTINUE; } diff --git a/mm/compaction.c b/mm/compaction.c index 546e571..9c7e690 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1335,22 +1335,20 @@ int sysctl_extfrag_threshold = 500; /** * try_to_compact_pages - Direct compact to satisfy a high-order allocation - * @zonelist: The zonelist used for the current allocation - * @order: The order of the current allocation * @gfp_mask: The GFP mask of the current allocation - * @nodemask: The allowed nodes to allocate from + * @order: The order of the current allocation + * @alloc_flags: The allocation flags of the current allocation + * @ac: The context of current allocation * @mode: The migration mode for async, sync light, or sync migration * @contended: Return value that determines if compaction was aborted due to * need_resched() or lock contention * * This is the main entry point for direct page compaction. 
*/ -unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *nodemask, - enum migrate_mode mode, int *contended, - int alloc_flags, int classzone_idx) +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended) { - enum zone_type high_zoneidx = gfp_zone(gfp_mask); int may_enter_fs = gfp_mask & __GFP_FS; int may_perform_io = gfp_mask & __GFP_IO; struct zoneref *z; @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, return COMPACT_SKIPPED; /* Compact each zone in the list */ - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, - nodemask) { + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, + ac->nodemask) { int status; int zone_contended; @@ -1374,7 +1372,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, continue; status = compact_zone_order(zone, order, gfp_mask, mode, - &zone_contended, alloc_flags, classzone_idx); + &zone_contended, alloc_flags, + ac->classzone_idx); rc = max(status, rc); /* * It takes at least one zone that wasn't lock contended @@ -1384,7 +1383,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, /* If a normal allocation would succeed, stop compacting */ if (zone_watermark_ok(zone, order, low_wmark_pages(zone), - classzone_idx, alloc_flags)) { + ac->classzone_idx, alloc_flags)) { /* * We think the allocation will succeed in this zone, * but it is not certain, hence the false. The caller diff --git a/mm/internal.h b/mm/internal.h index efad241..cd5418b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -110,6 +110,20 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); */ /* + * Structure for holding the mostly immutable allocation parameters passed + * between functions involved in allocations, including the alloc_pages* + * family of functions. + */ +struct alloc_context { + struct zonelist *zonelist; + nodemask_t *nodemask; + struct zone *preferred_zone; + int classzone_idx; + int migratetype; + enum zone_type high_zoneidx; +}; + +/* * Locate the struct page for both the matching buddy in our * pair (buddy1) and the combined O(n+1) page they form (page). * diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bf0359c..f5f5e2a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -232,19 +232,6 @@ EXPORT_SYMBOL(nr_node_ids); EXPORT_SYMBOL(nr_online_nodes); #endif -/* - * Structure for holding the mostly immutable allocation parameters passed - * between alloc_pages* family of functions. - */ -struct alloc_context { - struct zonelist *zonelist; - nodemask_t *nodemask; - struct zone *preferred_zone; - int classzone_idx; - int migratetype; - enum zone_type high_zoneidx; -}; - int page_group_by_mobility_disabled __read_mostly; void set_pageblock_migratetype(struct page *page, int migratetype) @@ -2421,10 +2408,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, return NULL; current->flags |= PF_MEMALLOC; - compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, - ac->nodemask, mode, - contended_compaction, - alloc_flags, ac->classzone_idx); + compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, + mode, contended_compaction); current->flags &= ~PF_MEMALLOC; switch (compact_result) { -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
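One detail worth noting in the hunk above: include/linux/compaction.h gets only a forward declaration, struct alloc_context; /* in mm/internal.h */, which is sufficient because the header passes the structure strictly by pointer. A minimal self-contained sketch of that pattern is below; the names and the single-file layout are illustrative only, and in the kernel the full definition of course stays in mm/internal.h.

#include <stdio.h>

/* What a public header can get away with: an opaque forward declaration. */
struct alloc_context;
unsigned long try_compact_sketch(unsigned int order,
                                 const struct alloc_context *ac);

/* Only the translation units that dereference the members need this. */
struct alloc_context {
        int classzone_idx;
        int high_zoneidx;
};

unsigned long try_compact_sketch(unsigned int order,
                                 const struct alloc_context *ac)
{
        /* the full definition is visible here, so members can be used */
        return (unsigned long)(order + ac->classzone_idx);
}

int main(void)
{
        struct alloc_context ac = { .classzone_idx = 1, .high_zoneidx = 2 };

        printf("%lu\n", try_compact_sketch(3, &ac));
        return 0;
}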
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka @ 2015-01-06 14:53 ` Michal Hocko 2015-01-06 14:57 ` Michal Hocko 1 sibling, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:53 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: > Expand the usage of the struct alloc_context introduced in the previous patch > also for calling try_to_compact_pages(), to reduce the number of its > parameters. Since the function is in different compilation unit, we need to > move alloc_context definition in the shared mm/internal.h header. > > With this change we get simpler code and small savings of code size and stack > usage: > > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27) > function old new delta > __alloc_pages_direct_compact 283 256 -27 > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13) > function old new delta > try_to_compact_pages 582 569 -13 > > Stack usage of __alloc_pages_direct_compact goes from 24 to none (per > scripts/checkstack.pl). > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> Looks good as well. 
Acked-by: Michal Hocko <mhocko@suse.cz> > --- > include/linux/compaction.h | 17 +++++++++-------- > mm/compaction.c | 23 +++++++++++------------ > mm/internal.h | 14 ++++++++++++++ > mm/page_alloc.c | 19 ++----------------- > 4 files changed, 36 insertions(+), 37 deletions(-) > > diff --git a/include/linux/compaction.h b/include/linux/compaction.h > index 3238ffa..f2efda2 100644 > --- a/include/linux/compaction.h > +++ b/include/linux/compaction.h > @@ -21,6 +21,8 @@ > /* Zone lock or lru_lock was contended in async compaction */ > #define COMPACT_CONTENDED_LOCK 2 > > +struct alloc_context; /* in mm/internal.h */ > + > #ifdef CONFIG_COMPACTION > extern int sysctl_compact_memory; > extern int sysctl_compaction_handler(struct ctl_table *table, int write, > @@ -30,10 +32,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write, > void __user *buffer, size_t *length, loff_t *ppos); > > extern int fragmentation_index(struct zone *zone, unsigned int order); > -extern unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *mask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx); > +extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended); > extern void compact_pgdat(pg_data_t *pgdat, int order); > extern void reset_isolation_suitable(pg_data_t *pgdat); > extern unsigned long compaction_suitable(struct zone *zone, int order, > @@ -101,10 +102,10 @@ static inline bool compaction_restarting(struct zone *zone, int order) > } > > #else > -static inline unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *nodemask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx) > +static inline unsigned long try_to_compact_pages(gfp_t gfp_mask, > + unsigned int order, int alloc_flags, > + const struct alloc_context *ac, > + enum migrate_mode mode, int *contended) > { > return COMPACT_CONTINUE; > } > diff --git a/mm/compaction.c b/mm/compaction.c > index 546e571..9c7e690 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1335,22 +1335,20 @@ int sysctl_extfrag_threshold = 500; > > /** > * try_to_compact_pages - Direct compact to satisfy a high-order allocation > - * @zonelist: The zonelist used for the current allocation > - * @order: The order of the current allocation > * @gfp_mask: The GFP mask of the current allocation > - * @nodemask: The allowed nodes to allocate from > + * @order: The order of the current allocation > + * @alloc_flags: The allocation flags of the current allocation > + * @ac: The context of current allocation > * @mode: The migration mode for async, sync light, or sync migration > * @contended: Return value that determines if compaction was aborted due to > * need_resched() or lock contention > * > * This is the main entry point for direct page compaction. 
> */ > -unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *nodemask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx) > +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended) > { > - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > int may_enter_fs = gfp_mask & __GFP_FS; > int may_perform_io = gfp_mask & __GFP_IO; > struct zoneref *z; > @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > return COMPACT_SKIPPED; > > /* Compact each zone in the list */ > - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, > - nodemask) { > + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, > + ac->nodemask) { > int status; > int zone_contended; > > @@ -1374,7 +1372,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > continue; > > status = compact_zone_order(zone, order, gfp_mask, mode, > - &zone_contended, alloc_flags, classzone_idx); > + &zone_contended, alloc_flags, > + ac->classzone_idx); > rc = max(status, rc); > /* > * It takes at least one zone that wasn't lock contended > @@ -1384,7 +1383,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > > /* If a normal allocation would succeed, stop compacting */ > if (zone_watermark_ok(zone, order, low_wmark_pages(zone), > - classzone_idx, alloc_flags)) { > + ac->classzone_idx, alloc_flags)) { > /* > * We think the allocation will succeed in this zone, > * but it is not certain, hence the false. The caller > diff --git a/mm/internal.h b/mm/internal.h > index efad241..cd5418b 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -110,6 +110,20 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); > */ > > /* > + * Structure for holding the mostly immutable allocation parameters passed > + * between functions involved in allocations, including the alloc_pages* > + * family of functions. > + */ > +struct alloc_context { > + struct zonelist *zonelist; > + nodemask_t *nodemask; > + struct zone *preferred_zone; > + int classzone_idx; > + int migratetype; > + enum zone_type high_zoneidx; > +}; > + > +/* > * Locate the struct page for both the matching buddy in our > * pair (buddy1) and the combined O(n+1) page they form (page). > * > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index bf0359c..f5f5e2a 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -232,19 +232,6 @@ EXPORT_SYMBOL(nr_node_ids); > EXPORT_SYMBOL(nr_online_nodes); > #endif > > -/* > - * Structure for holding the mostly immutable allocation parameters passed > - * between alloc_pages* family of functions. 
> - */ > -struct alloc_context { > - struct zonelist *zonelist; > - nodemask_t *nodemask; > - struct zone *preferred_zone; > - int classzone_idx; > - int migratetype; > - enum zone_type high_zoneidx; > -}; > - > int page_group_by_mobility_disabled __read_mostly; > > void set_pageblock_migratetype(struct page *page, int migratetype) > @@ -2421,10 +2408,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > return NULL; > > current->flags |= PF_MEMALLOC; > - compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, > - ac->nodemask, mode, > - contended_compaction, > - alloc_flags, ac->classzone_idx); > + compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, > + mode, contended_compaction); > current->flags &= ~PF_MEMALLOC; > > switch (compact_result) { > -- > 2.1.2 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka 2015-01-06 14:53 ` Michal Hocko @ 2015-01-06 14:57 ` Michal Hocko 2015-01-07 9:11 ` Vlastimil Babka 1 sibling, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:57 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim Hmm, wait a minute On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: [...] > -unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *nodemask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx) > +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended) > { > - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > int may_enter_fs = gfp_mask & __GFP_FS; > int may_perform_io = gfp_mask & __GFP_IO; > struct zoneref *z; gfp_mask might change since the high_zoneidx was set up in the call chain. I guess this shouldn't change to the gfp_zone output but it is worth double checking. > @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > return COMPACT_SKIPPED; > > /* Compact each zone in the list */ > - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, > - nodemask) { > + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, > + ac->nodemask) { > int status; > int zone_contended; > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-06 14:57 ` Michal Hocko @ 2015-01-07 9:11 ` Vlastimil Babka 2015-01-07 9:18 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:11 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 03:57 PM, Michal Hocko wrote: > Hmm, wait a minute > > On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: > [...] >> -unsigned long try_to_compact_pages(struct zonelist *zonelist, >> - int order, gfp_t gfp_mask, nodemask_t *nodemask, >> - enum migrate_mode mode, int *contended, >> - int alloc_flags, int classzone_idx) >> +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, >> + int alloc_flags, const struct alloc_context *ac, >> + enum migrate_mode mode, int *contended) >> { >> - enum zone_type high_zoneidx = gfp_zone(gfp_mask); >> int may_enter_fs = gfp_mask & __GFP_FS; >> int may_perform_io = gfp_mask & __GFP_IO; >> struct zoneref *z; > > gfp_mask might change since the high_zoneidx was set up in the call > chain. I guess this shouldn't change to the gfp_zone output but it is > worth double checking. Yeah I checked that. gfp_zone() operates just on GFP_ZONEMASK part of the flags, and we don't change that. >> @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, >> return COMPACT_SKIPPED; >> >> /* Compact each zone in the list */ >> - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, >> - nodemask) { >> + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, >> + ac->nodemask) { >> int status; >> int zone_contended; >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
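The argument in the reply above -- gfp_zone() depends only on the GFP_ZONEMASK bits, so stripping flags such as __GFP_IO in the slow path cannot invalidate the high_zoneidx cached in the alloc_context -- can be checked with a toy model. The flag values, the zone numbering and both helper bodies below are simplified stand-ins, not the kernel's real encoding.

#include <assert.h>
#include <stdio.h>

typedef unsigned int gfp_t;

/* Illustrative flag bits -- NOT the kernel's real values. */
#define TOY_GFP_DMA       0x01u
#define TOY_GFP_HIGHMEM   0x02u
#define TOY_GFP_MOVABLE   0x04u
#define TOY_GFP_IO        0x40u
#define TOY_GFP_FS        0x80u
#define TOY_GFP_ZONEMASK  (TOY_GFP_DMA | TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE)

/* Toy gfp_zone(): the result depends only on bits inside the zone mask. */
static int toy_gfp_zone(gfp_t flags)
{
        switch (flags & TOY_GFP_ZONEMASK) {
        case TOY_GFP_DMA:       return 0;       /* "ZONE_DMA"     */
        case TOY_GFP_HIGHMEM:   return 2;       /* "ZONE_HIGHMEM" */
        case TOY_GFP_MOVABLE:   return 3;       /* "ZONE_MOVABLE" */
        default:                return 1;       /* "ZONE_NORMAL"  */
        }
}

/* Toy memalloc_noio_flags(): clears the IO bit, never touches the zone mask. */
static gfp_t toy_memalloc_noio_flags(gfp_t flags)
{
        return flags & ~TOY_GFP_IO;
}

int main(void)
{
        gfp_t gfp_mask = TOY_GFP_HIGHMEM | TOY_GFP_IO | TOY_GFP_FS;

        /* high_zoneidx is computed once when the context is set up ... */
        int high_zoneidx = toy_gfp_zone(gfp_mask);

        /* ... and stays valid after the slow path strips the IO bit. */
        assert(toy_gfp_zone(toy_memalloc_noio_flags(gfp_mask)) == high_zoneidx);
        printf("high_zoneidx stays %d\n", high_zoneidx);
        return 0;
}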
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-07 9:11 ` Vlastimil Babka @ 2015-01-07 9:18 ` Michal Hocko 2015-01-07 9:21 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-07 9:18 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 10:11:45, Vlastimil Babka wrote: > On 01/06/2015 03:57 PM, Michal Hocko wrote: > > Hmm, wait a minute > > > > On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: > > [...] > >> -unsigned long try_to_compact_pages(struct zonelist *zonelist, > >> - int order, gfp_t gfp_mask, nodemask_t *nodemask, > >> - enum migrate_mode mode, int *contended, > >> - int alloc_flags, int classzone_idx) > >> +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > >> + int alloc_flags, const struct alloc_context *ac, > >> + enum migrate_mode mode, int *contended) > >> { > >> - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > >> int may_enter_fs = gfp_mask & __GFP_FS; > >> int may_perform_io = gfp_mask & __GFP_IO; > >> struct zoneref *z; > > > > gfp_mask might change since the high_zoneidx was set up in the call > > chain. I guess this shouldn't change to the gfp_zone output but it is > > worth double checking. > > Yeah I checked that. gfp_zone() operates just on GFP_ZONEMASK part of the flags, > and we don't change that. That was my understanding as well. Maybe we want VM_BUG_ON(gfp_zone(gfp_mask) != ac->high_zoneidx); -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-07 9:18 ` Michal Hocko @ 2015-01-07 9:21 ` Vlastimil Babka 0 siblings, 0 replies; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:21 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/07/2015 10:18 AM, Michal Hocko wrote: > On Wed 07-01-15 10:11:45, Vlastimil Babka wrote: >> On 01/06/2015 03:57 PM, Michal Hocko wrote: >> > Hmm, wait a minute >> > >> > On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: >> > [...] >> >> -unsigned long try_to_compact_pages(struct zonelist *zonelist, >> >> - int order, gfp_t gfp_mask, nodemask_t *nodemask, >> >> - enum migrate_mode mode, int *contended, >> >> - int alloc_flags, int classzone_idx) >> >> +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, >> >> + int alloc_flags, const struct alloc_context *ac, >> >> + enum migrate_mode mode, int *contended) >> >> { >> >> - enum zone_type high_zoneidx = gfp_zone(gfp_mask); >> >> int may_enter_fs = gfp_mask & __GFP_FS; >> >> int may_perform_io = gfp_mask & __GFP_IO; >> >> struct zoneref *z; >> > >> > gfp_mask might change since the high_zoneidx was set up in the call >> > chain. I guess this shouldn't change to the gfp_zone output but it is >> > worth double checking. >> >> Yeah I checked that. gfp_zone() operates just on GFP_ZONEMASK part of the flags, >> and we don't change that. > > That was my understanding as well. Maybe we want > VM_BUG_ON(gfp_zone(gfp_mask) != ac->high_zoneidx); That sounds like an arbitrary overkill to me. I think that gfp_zone() was just used here to save an extra parameter. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka ` (2 preceding siblings ...) 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 15:09 ` Michal Hocko 3 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko The function next_zones_zonelist() returns zoneref pointer, as well as zone pointer via extra parameter. Since the latter can be trivially obtained by dereferencing the former, the overhead of the extra parameter is unjustified. This patch thus removes the zone parameter from next_zones_zonelist(). Both callers happen to be in the same header file, so it's simple to add the zoneref dereference inline. We save some bytes of code size. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-96 (-96) function old new delta __alloc_pages_nodemask 2182 2176 -6 nr_free_zone_pages 129 115 -14 get_page_from_freelist 2652 2576 -76 add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10) function old new delta try_to_compact_pages 569 579 +10 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- include/linux/mmzone.h | 13 +++++++------ mm/mmzone.c | 4 +--- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 2f0856d..a2884ef 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -970,7 +970,6 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) * @z - The cursor used as a starting point for the search * @highest_zoneidx - The zone index of the highest zone to return * @nodes - An optional nodemask to filter the zonelist with - * @zone - The first suitable zone found is returned via this parameter * * This function returns the next zone at or below a given zone index that is * within the allowed nodemask using a cursor as the starting point for the @@ -980,8 +979,7 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) */ struct zoneref *next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, - nodemask_t *nodes, - struct zone **zone); + nodemask_t *nodes); /** * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist @@ -1000,8 +998,10 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, nodemask_t *nodes, struct zone **zone) { - return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes, - zone); + struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs, + highest_zoneidx, nodes); + *zone = zonelist_zone(z); + return z; } /** @@ -1018,7 +1018,8 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ for (z = 
first_zones_zonelist(zlist, highidx, nodemask, &zone); \ zone; \ - z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \ + z = next_zones_zonelist(++z, highidx, nodemask), \ + zone = zonelist_zone(z)) \ /** * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index diff --git a/mm/mmzone.c b/mm/mmzone.c index bf34fb8..7d87ebb 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -54,8 +54,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes) /* Returns the next zone at or below highest_zoneidx in a zonelist */ struct zoneref *next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, - nodemask_t *nodes, - struct zone **zone) + nodemask_t *nodes) { /* * Find the next suitable zone to use for the allocation. @@ -69,7 +68,6 @@ struct zoneref *next_zones_zonelist(struct zoneref *z, (z->zone && !zref_in_nodemask(z, nodes))) z++; - *zone = zonelist_zone(z); return z; } -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
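The shape of the interface after this patch -- the iterator returns only the cursor, and callers recover the zone by dereferencing it -- in a self-contained toy form. The structures and the termination rule below are simplified stand-ins for the mmzone.h originals, and the macro is deliberately renamed so as not to suggest it is the kernel's.

#include <stdio.h>

struct zone { const char *name; };
struct zoneref { struct zone *zone; int zone_idx; };    /* NULL zone terminates */

static struct zone *toy_zonelist_zone(struct zoneref *z)
{
        return z->zone;         /* trivially recovered from the cursor */
}

/* Advance the cursor to the next entry at or below highest_idx. */
static struct zoneref *toy_next_zones_zonelist(struct zoneref *z, int highest_idx)
{
        while (z->zone && z->zone_idx > highest_idx)
                z++;
        return z;
}

/* "first" is just "next" starting at the head, dereferenced by the caller. */
#define toy_for_each_zone_below(zone, z, refs, highidx)                 \
        for (z = toy_next_zones_zonelist(refs, highidx),                \
             zone = toy_zonelist_zone(z);                               \
             zone;                                                      \
             z = toy_next_zones_zonelist(++z, highidx),                 \
             zone = toy_zonelist_zone(z))

int main(void)
{
        struct zone highmem = { "HighMem" }, normal = { "Normal" }, dma = { "DMA" };
        struct zoneref refs[] = {
                { &highmem, 2 }, { &normal, 1 }, { &dma, 0 }, { NULL, 0 },
        };
        struct zoneref *z;
        struct zone *zone;

        toy_for_each_zone_below(zone, z, refs, 1)       /* skip zones above idx 1 */
                printf("%s\n", zone->name);
        return 0;
}

Whether dropping the out parameter is worth the extra dereferences where zonelist_zone_idx() is used is exactly the trade-off weighed in the follow-ups below.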
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka @ 2015-01-06 15:09 ` Michal Hocko 2015-01-07 9:15 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 15:09 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > The function next_zones_zonelist() returns zoneref pointer, as well as zone > pointer via extra parameter. Since the latter can be trivially obtained by > dereferencing the former, the overhead of the extra parameter is unjustified. > > This patch thus removes the zone parameter from next_zones_zonelist(). Both > callers happen to be in the same header file, so it's simple to add the > zoneref dereference inline. We save some bytes of code size. Dunno. It makes first_zones_zonelist and next_zones_zonelist look different which might be a bit confusing. It's not a big deal but I am not sure it is worth it. > add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-96 (-96) > function old new delta > __alloc_pages_nodemask 2182 2176 -6 > nr_free_zone_pages 129 115 -14 > get_page_from_freelist 2652 2576 -76 > > add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10) > function old new delta > try_to_compact_pages 569 579 +10 > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> > --- > include/linux/mmzone.h | 13 +++++++------ > mm/mmzone.c | 4 +--- > 2 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 2f0856d..a2884ef 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -970,7 +970,6 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) > * @z - The cursor used as a starting point for the search > * @highest_zoneidx - The zone index of the highest zone to return > * @nodes - An optional nodemask to filter the zonelist with > - * @zone - The first suitable zone found is returned via this parameter > * > * This function returns the next zone at or below a given zone index that is > * within the allowed nodemask using a cursor as the starting point for the > @@ -980,8 +979,7 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) > */ > struct zoneref *next_zones_zonelist(struct zoneref *z, > enum zone_type highest_zoneidx, > - nodemask_t *nodes, > - struct zone **zone); > + nodemask_t *nodes); > > /** > * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist > @@ -1000,8 +998,10 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > nodemask_t *nodes, > struct zone **zone) > { > - return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes, > - zone); > + struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs, > + highest_zoneidx, nodes); > + *zone = zonelist_zone(z); > + return z; > } > > /** > @@ -1018,7 +1018,8 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ > for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \ > zone; \ > - z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \ > + z = next_zones_zonelist(++z, highidx, nodemask), \ > + zone = zonelist_zone(z)) \ > > /** > * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index > diff --git a/mm/mmzone.c b/mm/mmzone.c > index bf34fb8..7d87ebb 100644 > --- a/mm/mmzone.c > +++ b/mm/mmzone.c > @@ -54,8 +54,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes) > /* Returns the next zone at or below highest_zoneidx in a zonelist */ > struct zoneref *next_zones_zonelist(struct zoneref *z, > enum zone_type highest_zoneidx, > - nodemask_t *nodes, > - struct zone **zone) > + nodemask_t *nodes) > { > /* > * Find the next suitable zone to use for the allocation. > @@ -69,7 +68,6 @@ struct zoneref *next_zones_zonelist(struct zoneref *z, > (z->zone && !zref_in_nodemask(z, nodes))) > z++; > > - *zone = zonelist_zone(z); > return z; > } > > -- > 2.1.2 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-06 15:09 ` Michal Hocko @ 2015-01-07 9:15 ` Vlastimil Babka 2015-01-07 10:57 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:15 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 04:09 PM, Michal Hocko wrote: > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: >> The function next_zones_zonelist() returns zoneref pointer, as well as zone >> pointer via extra parameter. Since the latter can be trivially obtained by >> dereferencing the former, the overhead of the extra parameter is unjustified. >> >> This patch thus removes the zone parameter from next_zones_zonelist(). Both >> callers happen to be in the same header file, so it's simple to add the >> zoneref dereference inline. We save some bytes of code size. > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > different which might be a bit confusing. It's not a big deal but > I am not sure it is worth it. Yeah I thought that nobody uses them directly anyway thanks to for_each_zone_zonelist* so it's not a big deal. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-07 9:15 ` Vlastimil Babka @ 2015-01-07 10:57 ` Michal Hocko 2015-01-07 11:17 ` Mel Gorman 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-07 10:57 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 10:15:39, Vlastimil Babka wrote: > On 01/06/2015 04:09 PM, Michal Hocko wrote: > > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > >> The function next_zones_zonelist() returns zoneref pointer, as well as zone > >> pointer via extra parameter. Since the latter can be trivially obtained by > >> dereferencing the former, the overhead of the extra parameter is unjustified. > >> > >> This patch thus removes the zone parameter from next_zones_zonelist(). Both > >> callers happen to be in the same header file, so it's simple to add the > >> zoneref dereference inline. We save some bytes of code size. > > > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > > different which might be a bit confusing. It's not a big deal but > > I am not sure it is worth it. > > Yeah I thought that nobody uses them directly anyway thanks to > for_each_zone_zonelist* so it's not a big deal. OK, I have checked why we need the whole struct zoneref when it only caches zone_idx. dd1a239f6f2d (mm: have zonelist contains structs with both a zone pointer and zone_idx) claims this will reduce cache contention by reducing pointer chasing because we do not have to dereference pgdat so often in hot paths. Fair enough but I do not see any numbers in the changelog nor in the original discussion (https://lkml.org/lkml/2007/11/20/547 resp. https://lkml.org/lkml/2007/9/28/170). Maybe Mel remembers what was the benchmark which has shown the difference so that we can check whether this is still relevant and caching the index is still worth it. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
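The caching that dd1a239f6f2d introduced, and that the message above asks about, is easy to see in reduced form: deriving the index from the zone itself walks zone -> pgdat -> node_zones, while the copy stored in the zoneref is a plain load from the array the zonelist walk is already touching. The structures below are stripped-down stand-ins for the real ones, kept only to show the two access paths.

#include <stdio.h>

#define TOY_MAX_NR_ZONES 4

struct zone;

/* Stripped-down pg_data_t: just the per-node zone array. */
struct toy_pglist_data {
        struct zone *node_zones;
};

struct zone {
        struct toy_pglist_data *zone_pgdat;
        /* ... the real struct zone carries a lot more state ... */
};

/* Index derived from the zone: zone -> pgdat -> pointer arithmetic. */
static int toy_zone_idx(struct zone *zone)
{
        return (int)(zone - zone->zone_pgdat->node_zones);
}

/* Zonelist entry: caches the index right next to the zone pointer. */
struct toy_zoneref {
        struct zone *zone;
        int zone_idx;
};

static int toy_zonelist_zone_idx(struct toy_zoneref *z)
{
        return z->zone_idx;     /* one load, no pgdat dereference */
}

int main(void)
{
        struct zone zones[TOY_MAX_NR_ZONES];
        struct toy_pglist_data pgdat = { .node_zones = zones };
        struct toy_zoneref ref;
        int i;

        for (i = 0; i < TOY_MAX_NR_ZONES; i++)
                zones[i].zone_pgdat = &pgdat;

        ref.zone = &zones[2];
        ref.zone_idx = toy_zone_idx(ref.zone);  /* filled when the zonelist is built */

        printf("derived %d, cached %d\n",
               toy_zone_idx(ref.zone), toy_zonelist_zone_idx(&ref));
        return 0;
}

Whether the saved dereference is still measurable on current hardware is what the replies below weigh up.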
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-07 10:57 ` Michal Hocko @ 2015-01-07 11:17 ` Mel Gorman 2015-01-07 14:47 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Mel Gorman @ 2015-01-07 11:17 UTC (permalink / raw) To: Michal Hocko Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed, Jan 07, 2015 at 11:57:49AM +0100, Michal Hocko wrote: > On Wed 07-01-15 10:15:39, Vlastimil Babka wrote: > > On 01/06/2015 04:09 PM, Michal Hocko wrote: > > > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > > >> The function next_zones_zonelist() returns zoneref pointer, as well as zone > > >> pointer via extra parameter. Since the latter can be trivially obtained by > > >> dereferencing the former, the overhead of the extra parameter is unjustified. > > >> > > >> This patch thus removes the zone parameter from next_zones_zonelist(). Both > > >> callers happen to be in the same header file, so it's simple to add the > > >> zoneref dereference inline. We save some bytes of code size. > > > > > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > > > different which might be a bit confusing. It's not a big deal but > > > I am not sure it is worth it. > > > > Yeah I thought that nobody uses them directly anyway thanks to > > for_each_zone_zonelist* so it's not a big deal. > > OK, I have checked why we need the whole struct zoneref when it > only caches zone_idx. dd1a239f6f2d (mm: have zonelist contains > structs with both a zone pointer and zone_idx) claims this will > reduce cache contention by reducing pointer chasing because we > do not have to dereference pgdat so often in hot paths. Fair > enough but I do not see any numbers in the changelog nor in the > original discussion (https://lkml.org/lkml/2007/11/20/547 resp. > https://lkml.org/lkml/2007/9/28/170). Maybe Mel remembers what was the > benchmark which has shown the difference so that we can check whether > this is still relevant and caching the index is still worth it. > IIRC, the difference was a few percent on instruction profiles and cache profiles when driven from a systemtap microbenchmark but I no longer have the data and besides it would have been based on an ancient machine by todays standards. When zeroing of pages is taken into account it's going to be marginal so a userspace test would probably show nothing. Still, I see little motivation to replace a single deference with multiple dereferences and pointer arithmetic when zonelist_zone_idx() is called. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-07 11:17 ` Mel Gorman @ 2015-01-07 14:47 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-07 14:47 UTC (permalink / raw) To: Mel Gorman Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 11:17:07, Mel Gorman wrote: > On Wed, Jan 07, 2015 at 11:57:49AM +0100, Michal Hocko wrote: > > On Wed 07-01-15 10:15:39, Vlastimil Babka wrote: > > > On 01/06/2015 04:09 PM, Michal Hocko wrote: > > > > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > > > >> The function next_zones_zonelist() returns zoneref pointer, as well as zone > > > >> pointer via extra parameter. Since the latter can be trivially obtained by > > > >> dereferencing the former, the overhead of the extra parameter is unjustified. > > > >> > > > >> This patch thus removes the zone parameter from next_zones_zonelist(). Both > > > >> callers happen to be in the same header file, so it's simple to add the > > > >> zoneref dereference inline. We save some bytes of code size. > > > > > > > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > > > > different which might be a bit confusing. It's not a big deal but > > > > I am not sure it is worth it. > > > > > > Yeah I thought that nobody uses them directly anyway thanks to > > > for_each_zone_zonelist* so it's not a big deal. > > > > OK, I have checked why we need the whole struct zoneref when it > > only caches zone_idx. dd1a239f6f2d (mm: have zonelist contains > > structs with both a zone pointer and zone_idx) claims this will > > reduce cache contention by reducing pointer chasing because we > > do not have to dereference pgdat so often in hot paths. Fair > > enough but I do not see any numbers in the changelog nor in the > > original discussion (https://lkml.org/lkml/2007/11/20/547 resp. > > https://lkml.org/lkml/2007/9/28/170). Maybe Mel remembers what was the > > benchmark which has shown the difference so that we can check whether > > this is still relevant and caching the index is still worth it. > > > > IIRC, the difference was a few percent on instruction profiles and cache > profiles when driven from a systemtap microbenchmark but I no longer have > the data and besides it would have been based on an ancient machine by > todays standards. When zeroing of pages is taken into account it's going > to be marginal so a userspace test would probably show nothing. Still, > I see little motivation to replace a single deference with multiple > dereferences and pointer arithmetic when zonelist_zone_idx() is called. OK, fair enough. I have tried to convert back to simple zone * and it turned out we wouldn't save too much code so this is really not worth time and possible complications. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, newest: ~2015-01-07 14:54 UTC

Thread overview: 21+ messages
2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka
2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka
2015-01-06 14:30 ` Michal Hocko
2015-01-06 21:10 ` Vlastimil Babka
2015-01-06 21:44 ` Michal Hocko
2015-01-07 9:36 ` Vlastimil Babka
2015-01-07 10:54 ` Michal Hocko
2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka
2015-01-06 14:45 ` Michal Hocko
2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka
2015-01-06 14:53 ` Michal Hocko
2015-01-06 14:57 ` Michal Hocko
2015-01-07 9:11 ` Vlastimil Babka
2015-01-07 9:18 ` Michal Hocko
2015-01-07 9:21 ` Vlastimil Babka
2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka
2015-01-06 15:09 ` Michal Hocko
2015-01-07 9:15 ` Vlastimil Babka
2015-01-07 10:57 ` Michal Hocko
2015-01-07 11:17 ` Mel Gorman
2015-01-07 14:47 ` Michal Hocko