* [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions
@ 2015-01-05 17:17 Vlastimil Babka
  2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Aneesh Kumar K.V, David Rientjes,
	Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov, Mel Gorman,
	Michal Hocko, Minchan Kim, Rik van Riel, Zhang Yanfei

Changes since v3:
o Moved struct alloc_context definition to mm/internal.h
o Rebased on latest -next and re-measured. Sadly, the code/stack size
  improvements are smaller with the new baseline.

The possibility of replacing the numerous parameters of alloc_pages*
functions with a single structure was discussed when Minchan proposed to
expand the x86 kernel stack [1]. This series implements the change, along
with a few more cleanups/microoptimizations.

The series is based on next-20150105 and I used gcc 4.8.3 20140627 on
openSUSE 13.2 for compiling. Config includes NUMA and COMPACTION.

The core change is the introduction of a new struct alloc_context, which
looks like this:

struct alloc_context {
	struct zonelist *zonelist;
	nodemask_t *nodemask;
	struct zone *preferred_zone;
	int classzone_idx;
	int migratetype;
	enum zone_type high_zoneidx;
};

The contents are mostly constant, except that __alloc_pages_slowpath()
changes preferred_zone, classzone_idx and potentially zonelist. But that's
not a problem in case control returns to retry_cpuset: in
__alloc_pages_nodemask(), those will be reset to initial values again
(although it's a bit subtle). On the other hand, gfp_flags and alloc_info
mutate so much that it doesn't make sense to put them into alloc_context.
Still, the result is one parameter instead of up to 7. This is all in
Patch 2.

Patch 3 is a step towards expanding alloc_context usage out of page_alloc.c
itself. The function try_to_compact_pages() can also benefit a lot from the
parameter reduction, but it means the struct definition has to be moved to
a shared header.

Patch 1 should IMHO be included even if the rest is deemed not useful
enough. It improves maintainability and also has some code/stack reduction.
Patch 4 is OTOH a tiny optimization.
Overall bloat-o-meter results:

add/remove: 0/1 grow/shrink: 1/3 up/down: 1587/-1941 (-354)
function                                     old     new   delta
__alloc_pages_nodemask                       589    2176   +1587
nr_free_zone_pages                           129     115     -14
__alloc_pages_direct_compact                 329     256     -73
get_page_from_freelist                      2670    2576     -94
__alloc_pages_slowpath                      1760       -   -1760
try_to_compact_pages                         582     579      -3

Overall bloat-o-meter with forced inline in baseline, for fair comparison:

add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-512 (-512)
function                                     old     new   delta
nr_free_zone_pages                           129     115     -14
__alloc_pages_direct_compact                 329     256     -73
get_page_from_freelist                      2670    2576     -94
__alloc_pages_nodemask                      2507    2176    -331
try_to_compact_pages                         582     579      -3

Overall stack sizes per ./scripts/checkstack.pl:
                          old   new  delta
__alloc_pages_slowpath    152     -   -152
get_page_from_freelist:   184   184      0
__alloc_pages_nodemask    120   184    +64
__alloc_pages_direct_c     40    40    -40
try_to_compact_pages       72    72      0
                                       -128

Again with forced inline on baseline:
                          old   new  delta
get_page_from_freelist:   184   184      0
__alloc_pages_nodemask    216   184    -32
__alloc_pages_direct_c     40     -    -40
try_to_compact_pages       72    72      0
                                        -72

[1] http://marc.info/?l=linux-mm&m=140142462528257&w=2

Vlastimil Babka (4):
  mm: set page->pfmemalloc in prep_new_page()
  mm, page_alloc: reduce number of alloc_pages* functions' parameters
  mm: reduce try_to_compact_pages parameters
  mm: microoptimize zonelist operations

 include/linux/compaction.h |  17 ++--
 include/linux/mmzone.h     |  13 +--
 mm/compaction.c            |  23 ++---
 mm/internal.h              |  14 +++
 mm/mmzone.c                |   4 +-
 mm/page_alloc.c            | 245 +++++++++++++++++++--------------------
 6 files changed, 144 insertions(+), 172 deletions(-)

-- 
2.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 21+ messages in thread
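To make the cover letter's central point concrete, here is the prototype of
get_page_from_freelist() before and after the conversion, taken from the diff
of patch 2 later in this thread (reproduced here only as an illustration of
the parameter reduction):

/* Before: up to seven mostly-constant parameters threaded through
 * every caller on the allocation path. */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask,
		unsigned int order, struct zonelist *zonelist,
		int high_zoneidx, int alloc_flags,
		struct zone *preferred_zone, int classzone_idx,
		int migratetype);

/* After: gfp_mask, order and alloc_flags still vary per call site;
 * everything else travels in one read-mostly struct alloc_context. */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
		const struct alloc_context *ac);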
* [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 14:30 ` Michal Hocko 2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka ` (2 subsequent siblings) 3 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko The function prep_new_page() sets almost everything in the struct page of the page being allocated, except page->pfmemalloc. This is not obvious and has at least once led to a bug where page->pfmemalloc was forgotten to be set correctly, see commit 8fb74b9fb2b1 ("mm: compaction: partially revert capture of suitable high-order page"). This patch moves the pfmemalloc setting to prep_new_page(), which means it needs to gain alloc_flags parameter. The call to prep_new_page is moved from buffered_rmqueue() to get_page_from_freelist(), which also leads to simpler code. An obsolete comment for buffered_rmqueue() is replaced. In addition to better maintainability there is a small reduction of code and stack usage for get_page_from_freelist(), which inlines the other functions involved. add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145) function old new delta get_page_from_freelist 2670 2525 -145 Stack usage is reduced from 184 to 168 bytes. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- mm/page_alloc.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1bb65e6..0c77a97 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -970,7 +970,8 @@ static inline int check_new_page(struct page *page) return 0; } -static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags) +static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, + int alloc_flags) { int i; @@ -994,6 +995,14 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags) set_page_owner(page, order, gfp_flags); + /* + * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was necessary to + * allocate the page. The expectation is that the caller is taking + * steps that will free more memory. The caller should avoid the page + * being used for !PFMEMALLOC purposes. + */ + page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS); + return 0; } @@ -1642,9 +1651,7 @@ int split_free_page(struct page *page) } /* - * Really, prep_compound_page() should be called from __rmqueue_bulk(). But - * we cheat by calling it from here, in the order > 0 path. Saves a branch - * or two. + * Allocate a page from the given zone. Use pcplists for order-0 allocations. 
*/ static inline struct page *buffered_rmqueue(struct zone *preferred_zone, @@ -1655,7 +1662,6 @@ struct page *buffered_rmqueue(struct zone *preferred_zone, struct page *page; bool cold = ((gfp_flags & __GFP_COLD) != 0); -again: if (likely(order == 0)) { struct per_cpu_pages *pcp; struct list_head *list; @@ -1711,8 +1717,6 @@ again: local_irq_restore(flags); VM_BUG_ON_PAGE(bad_range(zone, page), page); - if (prep_new_page(page, order, gfp_flags)) - goto again; return page; failed: @@ -2177,25 +2181,16 @@ zonelist_scan: try_this_zone: page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask, migratetype); - if (page) - break; + if (page) { + if (prep_new_page(page, order, gfp_mask, alloc_flags)) + goto try_this_zone; + return page; + } this_zone_full: if (IS_ENABLED(CONFIG_NUMA) && zlc_active) zlc_mark_zone_full(zonelist, z); } - if (page) { - /* - * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was - * necessary to allocate the page. The expectation is - * that the caller is taking steps that will free more - * memory. The caller should avoid the page being used - * for !PFMEMALLOC purposes. - */ - page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS); - return page; - } - /* * The first pass makes sure allocations are spread fairly within the * local node. However, the local node might have free pages left -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
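For reference in the review discussion that follows, the success path of
get_page_from_freelist() after this patch reduces to the excerpt below
(taken from the hunk above, with comments added to spell out the retry
semantics):

try_this_zone:
		page = buffered_rmqueue(preferred_zone, zone, order,
						gfp_mask, migratetype);
		if (page) {
			/*
			 * prep_new_page() returns non-zero when it finds a
			 * bad page, so retry the same zone rather than hand
			 * the bad page out; this replaces the old
			 * 'goto again' loop inside buffered_rmqueue().
			 */
			if (prep_new_page(page, order, gfp_mask, alloc_flags))
				goto try_this_zone;
			return page;
		}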
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka @ 2015-01-06 14:30 ` Michal Hocko 2015-01-06 21:10 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:30 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:40, Vlastimil Babka wrote: > The function prep_new_page() sets almost everything in the struct page of the > page being allocated, except page->pfmemalloc. This is not obvious and has at > least once led to a bug where page->pfmemalloc was forgotten to be set > correctly, see commit 8fb74b9fb2b1 ("mm: compaction: partially revert capture > of suitable high-order page"). > > This patch moves the pfmemalloc setting to prep_new_page(), which means it > needs to gain alloc_flags parameter. The call to prep_new_page is moved from > buffered_rmqueue() to get_page_from_freelist(), which also leads to simpler > code. An obsolete comment for buffered_rmqueue() is replaced. > > In addition to better maintainability there is a small reduction of code and > stack usage for get_page_from_freelist(), which inlines the other functions > involved. > > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145) > function old new delta > get_page_from_freelist 2670 2525 -145 > > Stack usage is reduced from 184 to 168 bytes. > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> get_page_from_freelist has grown too hairy. I agree that it is tiny less confusing now because we are not breaking out of the loop in the successful case. Acked-by: Michal Hocko <mhocko@suse.cz> [...] > @@ -2177,25 +2181,16 @@ zonelist_scan: > try_this_zone: > page = buffered_rmqueue(preferred_zone, zone, order, > gfp_mask, migratetype); > - if (page) > - break; > + if (page) { > + if (prep_new_page(page, order, gfp_mask, alloc_flags)) > + goto try_this_zone; > + return page; > + } I would probably liked `do {} while ()' more because it wouldn't use the goto, but this is up to you: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1bb65e6f48dd..1682d766cb8e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2175,10 +2175,11 @@ zonelist_scan: } try_this_zone: - page = buffered_rmqueue(preferred_zone, zone, order, + do { + page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask, migratetype); - if (page) - break; + } while (page && prep_new_page(page, order, gfp_mask, + alloc_flags)); this_zone_full: if (IS_ENABLED(CONFIG_NUMA) && zlc_active) zlc_mark_zone_full(zonelist, z); [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-06 14:30 ` Michal Hocko @ 2015-01-06 21:10 ` Vlastimil Babka 2015-01-06 21:44 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-06 21:10 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 03:30 PM, Michal Hocko wrote: > On Mon 05-01-15 18:17:40, Vlastimil Babka wrote: >> The function prep_new_page() sets almost everything in the struct page of the >> page being allocated, except page->pfmemalloc. This is not obvious and has at >> least once led to a bug where page->pfmemalloc was forgotten to be set >> correctly, see commit 8fb74b9fb2b1 ("mm: compaction: partially revert capture >> of suitable high-order page"). >> >> This patch moves the pfmemalloc setting to prep_new_page(), which means it >> needs to gain alloc_flags parameter. The call to prep_new_page is moved from >> buffered_rmqueue() to get_page_from_freelist(), which also leads to simpler >> code. An obsolete comment for buffered_rmqueue() is replaced. >> >> In addition to better maintainability there is a small reduction of code and >> stack usage for get_page_from_freelist(), which inlines the other functions >> involved. >> >> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145) >> function old new delta >> get_page_from_freelist 2670 2525 -145 >> >> Stack usage is reduced from 184 to 168 bytes. >> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> >> Cc: Mel Gorman <mgorman@suse.de> >> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> >> Cc: Minchan Kim <minchan@kernel.org> >> Cc: David Rientjes <rientjes@google.com> >> Cc: Rik van Riel <riel@redhat.com> >> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> >> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> >> Cc: Johannes Weiner <hannes@cmpxchg.org> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> >> Cc: Michal Hocko <mhocko@suse.cz> > > get_page_from_freelist has grown too hairy. I agree that it is tiny less > confusing now because we are not breaking out of the loop in the > successful case. Well, we are returning instead. So there's no more code to follow by anyone reading the function. > Acked-by: Michal Hocko <mhocko@suse.cz> > > [...] >> @@ -2177,25 +2181,16 @@ zonelist_scan: >> try_this_zone: >> page = buffered_rmqueue(preferred_zone, zone, order, >> gfp_mask, migratetype); >> - if (page) >> - break; >> + if (page) { >> + if (prep_new_page(page, order, gfp_mask, alloc_flags)) >> + goto try_this_zone; >> + return page; >> + } > > I would probably liked `do {} while ()' more because it wouldn't use the > goto, but this is up to you: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 1bb65e6f48dd..1682d766cb8e 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2175,10 +2175,11 @@ zonelist_scan: > } > > try_this_zone: > - page = buffered_rmqueue(preferred_zone, zone, order, > + do { > + page = buffered_rmqueue(preferred_zone, zone, order, > gfp_mask, migratetype); > - if (page) > - break; > + } while (page && prep_new_page(page, order, gfp_mask, > + alloc_flags)); Hm but here we wouldn't return page on success. I wonder if you overlooked the return, hence your "not breaking out of the loop" remark? > this_zone_full: > if (IS_ENABLED(CONFIG_NUMA) && zlc_active) > zlc_mark_zone_full(zonelist, z); > > [...] 
> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-06 21:10 ` Vlastimil Babka @ 2015-01-06 21:44 ` Michal Hocko 2015-01-07 9:36 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 21:44 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Tue 06-01-15 22:10:55, Vlastimil Babka wrote: > On 01/06/2015 03:30 PM, Michal Hocko wrote: [...] > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 1bb65e6f48dd..1682d766cb8e 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2175,10 +2175,11 @@ zonelist_scan: > > } > > > > try_this_zone: > > - page = buffered_rmqueue(preferred_zone, zone, order, > > + do { > > + page = buffered_rmqueue(preferred_zone, zone, order, > > gfp_mask, migratetype); > > - if (page) > > - break; > > + } while (page && prep_new_page(page, order, gfp_mask, > > + alloc_flags)); > > Hm but here we wouldn't return page on success. Right. > I wonder if you overlooked the return, hence your "not breaking out of > the loop" remark? This was merely to show the intention. Sorry for not being clear enough. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-06 21:44 ` Michal Hocko @ 2015-01-07 9:36 ` Vlastimil Babka 2015-01-07 10:54 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:36 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 10:44 PM, Michal Hocko wrote: > On Tue 06-01-15 22:10:55, Vlastimil Babka wrote: >> On 01/06/2015 03:30 PM, Michal Hocko wrote: > [...] >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> > index 1bb65e6f48dd..1682d766cb8e 100644 >> > --- a/mm/page_alloc.c >> > +++ b/mm/page_alloc.c >> > @@ -2175,10 +2175,11 @@ zonelist_scan: >> > } >> > >> > try_this_zone: >> > - page = buffered_rmqueue(preferred_zone, zone, order, >> > + do { >> > + page = buffered_rmqueue(preferred_zone, zone, order, >> > gfp_mask, migratetype); >> > - if (page) >> > - break; >> > + } while (page && prep_new_page(page, order, gfp_mask, >> > + alloc_flags)); >> >> Hm but here we wouldn't return page on success. > > Right. > >> I wonder if you overlooked the return, hence your "not breaking out of >> the loop" remark? > > This was merely to show the intention. Sorry for not being clear enough. OK, but I don't see other way than to follow this do-while with another if (page) return page; So I think it would be more complicated than now. We wouldn't even be able to remove the 'try_this_zone' label, since it's used for goto from elsewhere as well. Now that I'm thinking of it, maybe we should have a "goto zonelist_scan" there instead. We discard a bad page and might come below the watermarks. But the chances of this mattering are tiny I guess. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() 2015-01-07 9:36 ` Vlastimil Babka @ 2015-01-07 10:54 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-07 10:54 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 10:36:58, Vlastimil Babka wrote: > On 01/06/2015 10:44 PM, Michal Hocko wrote: > > On Tue 06-01-15 22:10:55, Vlastimil Babka wrote: > >> On 01/06/2015 03:30 PM, Michal Hocko wrote: > > [...] > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> > index 1bb65e6f48dd..1682d766cb8e 100644 > >> > --- a/mm/page_alloc.c > >> > +++ b/mm/page_alloc.c > >> > @@ -2175,10 +2175,11 @@ zonelist_scan: > >> > } > >> > > >> > try_this_zone: > >> > - page = buffered_rmqueue(preferred_zone, zone, order, > >> > + do { > >> > + page = buffered_rmqueue(preferred_zone, zone, order, > >> > gfp_mask, migratetype); > >> > - if (page) > >> > - break; > >> > + } while (page && prep_new_page(page, order, gfp_mask, > >> > + alloc_flags)); > >> > >> Hm but here we wouldn't return page on success. > > > > Right. > > > >> I wonder if you overlooked the return, hence your "not breaking out of > >> the loop" remark? > > > > This was merely to show the intention. Sorry for not being clear enough. > > OK, but I don't see other way than to follow this do-while with another > > if (page) > return page; > > So I think it would be more complicated than now. We wouldn't even be able to > remove the 'try_this_zone' label, since it's used for goto from elsewhere as well. Getting rid of the label wasn't the intention. I just found the allocation retry easier to follow this way. I have no objection if you keep the code as is. > Now that I'm thinking of it, maybe we should have a "goto zonelist_scan" there > instead. We discard a bad page and might come below the watermarks. But the > chances of this mattering are tiny I guess. If anything this would be worth a separate patch. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
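For completeness, the do/while alternative discussed above would also need
the success return that Vlastimil points out, ending up roughly as below.
This is only a sketch of the variant that was not merged, assembled from
Michal's diff plus the missing check:

try_this_zone:
		do {
			page = buffered_rmqueue(preferred_zone, zone, order,
						gfp_mask, migratetype);
		} while (page && prep_new_page(page, order, gfp_mask,
						alloc_flags));
		/*
		 * Without this, a successfully prepped page would fall
		 * through to this_zone_full instead of being returned.
		 */
		if (page)
			return page;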
* [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 14:45 ` Michal Hocko 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka 3 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko Introduce struct alloc_context to accumulate the numerous parameters passed between the alloc_pages* family of functions and get_page_from_freelist(). This excludes gfp_flags and alloc_info, which mutate too much along the way, and allocation order, which is conceptually different. The result is shorter function signatures, as well as overal code size and stack usage reductions. bloat-o-meter: add/remove: 0/0 grow/shrink: 1/2 up/down: 127/-371 (-244) function old new delta get_page_from_freelist 2525 2652 +127 __alloc_pages_direct_compact 329 283 -46 __alloc_pages_nodemask 2507 2182 -325 checkstack.pl: function old new __alloc_pages_nodemask 216 184 get_page_from_freelist 168 184 __alloc_pages_direct_compact 40 24 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- mm/page_alloc.c | 221 +++++++++++++++++++++++++------------------------------- 1 file changed, 100 insertions(+), 121 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0c77a97..bf0359c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -232,6 +232,19 @@ EXPORT_SYMBOL(nr_node_ids); EXPORT_SYMBOL(nr_online_nodes); #endif +/* + * Structure for holding the mostly immutable allocation parameters passed + * between alloc_pages* family of functions. + */ +struct alloc_context { + struct zonelist *zonelist; + nodemask_t *nodemask; + struct zone *preferred_zone; + int classzone_idx; + int migratetype; + enum zone_type high_zoneidx; +}; + int page_group_by_mobility_disabled __read_mostly; void set_pageblock_migratetype(struct page *page, int migratetype) @@ -2037,10 +2050,10 @@ static void reset_alloc_batches(struct zone *preferred_zone) * a page. */ static struct page * -get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order, - struct zonelist *zonelist, int high_zoneidx, int alloc_flags, - struct zone *preferred_zone, int classzone_idx, int migratetype) +get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, + const struct alloc_context *ac) { + struct zonelist *zonelist = ac->zonelist; struct zoneref *z; struct page *page = NULL; struct zone *zone; @@ -2059,8 +2072,8 @@ zonelist_scan: * Scan zonelist, looking for a zone with enough free. * See also __cpuset_node_allowed() comment in kernel/cpuset.c. 
*/ - for_each_zone_zonelist_nodemask(zone, z, zonelist, - high_zoneidx, nodemask) { + for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx, + ac->nodemask) { unsigned long mark; if (IS_ENABLED(CONFIG_NUMA) && zlc_active && @@ -2077,7 +2090,7 @@ zonelist_scan: * time the page has in memory before being reclaimed. */ if (alloc_flags & ALLOC_FAIR) { - if (!zone_local(preferred_zone, zone)) + if (!zone_local(ac->preferred_zone, zone)) break; if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) { nr_fair_skipped++; @@ -2115,7 +2128,7 @@ zonelist_scan: mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; if (!zone_watermark_ok(zone, order, mark, - classzone_idx, alloc_flags)) { + ac->classzone_idx, alloc_flags)) { int ret; /* Checked here to keep the fast path fast */ @@ -2136,7 +2149,7 @@ zonelist_scan: } if (zone_reclaim_mode == 0 || - !zone_allows_reclaim(preferred_zone, zone)) + !zone_allows_reclaim(ac->preferred_zone, zone)) goto this_zone_full; /* @@ -2158,7 +2171,7 @@ zonelist_scan: default: /* did we reclaim enough */ if (zone_watermark_ok(zone, order, mark, - classzone_idx, alloc_flags)) + ac->classzone_idx, alloc_flags)) goto try_this_zone; /* @@ -2179,8 +2192,8 @@ zonelist_scan: } try_this_zone: - page = buffered_rmqueue(preferred_zone, zone, order, - gfp_mask, migratetype); + page = buffered_rmqueue(ac->preferred_zone, zone, order, + gfp_mask, ac->migratetype); if (page) { if (prep_new_page(page, order, gfp_mask, alloc_flags)) goto try_this_zone; @@ -2203,7 +2216,7 @@ this_zone_full: alloc_flags &= ~ALLOC_FAIR; if (nr_fair_skipped) { zonelist_rescan = true; - reset_alloc_batches(preferred_zone); + reset_alloc_batches(ac->preferred_zone); } if (nr_online_nodes > 1) zonelist_rescan = true; @@ -2325,9 +2338,7 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order, static inline struct page * __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype, unsigned long *did_some_progress) + const struct alloc_context *ac, unsigned long *did_some_progress) { struct page *page; @@ -2340,7 +2351,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, * Acquire the per-zone oom lock for each zone. If that * fails, somebody else is making progress for us. */ - if (!oom_zonelist_trylock(zonelist, gfp_mask)) { + if (!oom_zonelist_trylock(ac->zonelist, gfp_mask)) { *did_some_progress = 1; schedule_timeout_uninterruptible(1); return NULL; @@ -2359,10 +2370,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, * here, this is only to catch a parallel oom killing, we must fail if * we're still under heavy pressure. 
*/ - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, - order, zonelist, high_zoneidx, - ALLOC_WMARK_HIGH|ALLOC_CPUSET, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order, + ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac); if (page) goto out; @@ -2374,7 +2383,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, if (order > PAGE_ALLOC_COSTLY_ORDER) goto out; /* The OOM killer does not needlessly kill tasks for lowmem */ - if (high_zoneidx < ZONE_NORMAL) + if (ac->high_zoneidx < ZONE_NORMAL) goto out; /* The OOM killer does not compensate for light reclaim */ if (!(gfp_mask & __GFP_FS)) @@ -2390,10 +2399,10 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, goto out; } /* Exhausted what can be done so it's blamo time */ - out_of_memory(zonelist, gfp_mask, order, nodemask, false); + out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false); *did_some_progress = 1; out: - oom_zonelist_unlock(zonelist, gfp_mask); + oom_zonelist_unlock(ac->zonelist, gfp_mask); return page; } @@ -2401,10 +2410,9 @@ out: /* Try memory compaction for high-order allocations before reclaim */ static struct page * __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, - int classzone_idx, int migratetype, enum migrate_mode mode, - int *contended_compaction, bool *deferred_compaction) + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended_compaction, + bool *deferred_compaction) { unsigned long compact_result; struct page *page; @@ -2413,10 +2421,10 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, return NULL; current->flags |= PF_MEMALLOC; - compact_result = try_to_compact_pages(zonelist, order, gfp_mask, - nodemask, mode, + compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, + ac->nodemask, mode, contended_compaction, - alloc_flags, classzone_idx); + alloc_flags, ac->classzone_idx); current->flags &= ~PF_MEMALLOC; switch (compact_result) { @@ -2435,10 +2443,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, */ count_vm_event(COMPACTSTALL); - page = get_page_from_freelist(gfp_mask, nodemask, - order, zonelist, high_zoneidx, - alloc_flags & ~ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask, order, + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); if (page) { struct zone *zone = page_zone(page); @@ -2462,10 +2468,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, #else static inline struct page * __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, - int classzone_idx, int migratetype, enum migrate_mode mode, - int *contended_compaction, bool *deferred_compaction) + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended_compaction, + bool *deferred_compaction) { return NULL; } @@ -2473,8 +2478,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, /* Perform direct synchronous page reclaim */ static int -__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, - nodemask_t *nodemask) +__perform_reclaim(gfp_t gfp_mask, unsigned int order, + const struct alloc_context *ac) { struct reclaim_state reclaim_state; int progress; 
@@ -2488,7 +2493,8 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, reclaim_state.reclaimed_slab = 0; current->reclaim_state = &reclaim_state; - progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask); + progress = try_to_free_pages(ac->zonelist, order, gfp_mask, + ac->nodemask); current->reclaim_state = NULL; lockdep_clear_current_reclaim_state(); @@ -2502,28 +2508,23 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, /* The really slow allocator path where we enter direct reclaim */ static inline struct page * __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, - int classzone_idx, int migratetype, unsigned long *did_some_progress) + int alloc_flags, const struct alloc_context *ac, + unsigned long *did_some_progress) { struct page *page = NULL; bool drained = false; - *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, - nodemask); + *did_some_progress = __perform_reclaim(gfp_mask, order, ac); if (unlikely(!(*did_some_progress))) return NULL; /* After successful reclaim, reconsider all zones for allocation */ if (IS_ENABLED(CONFIG_NUMA)) - zlc_clear_zones_full(zonelist); + zlc_clear_zones_full(ac->zonelist); retry: - page = get_page_from_freelist(gfp_mask, nodemask, order, - zonelist, high_zoneidx, - alloc_flags & ~ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, - migratetype); + page = get_page_from_freelist(gfp_mask, order, + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); /* * If an allocation failed after direct reclaim, it could be because @@ -2544,36 +2545,30 @@ retry: */ static inline struct page * __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype) + const struct alloc_context *ac) { struct page *page; do { - page = get_page_from_freelist(gfp_mask, nodemask, order, - zonelist, high_zoneidx, ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask, order, + ALLOC_NO_WATERMARKS, ac); if (!page && gfp_mask & __GFP_NOFAIL) - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, + HZ/50); } while (!page && (gfp_mask & __GFP_NOFAIL)); return page; } -static void wake_all_kswapds(unsigned int order, - struct zonelist *zonelist, - enum zone_type high_zoneidx, - struct zone *preferred_zone, - nodemask_t *nodemask) +static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac) { struct zoneref *z; struct zone *zone; - for_each_zone_zonelist_nodemask(zone, z, zonelist, - high_zoneidx, nodemask) - wakeup_kswapd(zone, order, zone_idx(preferred_zone)); + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, + ac->high_zoneidx, ac->nodemask) + wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone)); } static inline int @@ -2632,9 +2627,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, enum zone_type high_zoneidx, - nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype) + struct alloc_context *ac) { const gfp_t wait = gfp_mask & __GFP_WAIT; struct page *page = NULL; @@ -2669,8 +2662,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned 
int order, goto nopage; if (!(gfp_mask & __GFP_NO_KSWAPD)) - wake_all_kswapds(order, zonelist, high_zoneidx, - preferred_zone, nodemask); + wake_all_kswapds(order, ac); /* * OK, we're below the kswapd watermark and have kicked background @@ -2683,18 +2675,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * Find the true preferred zone if the allocation is unconstrained by * cpusets. */ - if (!(alloc_flags & ALLOC_CPUSET) && !nodemask) { + if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) { struct zoneref *preferred_zoneref; - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, - NULL, &preferred_zone); - classzone_idx = zonelist_zone_idx(preferred_zoneref); + preferred_zoneref = first_zones_zonelist(ac->zonelist, + ac->high_zoneidx, NULL, &ac->preferred_zone); + ac->classzone_idx = zonelist_zone_idx(preferred_zoneref); } rebalance: /* This is the last chance, in general, before the goto nopage. */ - page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, - high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, - preferred_zone, classzone_idx, migratetype); + page = get_page_from_freelist(gfp_mask, order, + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); if (page) goto got_pg; @@ -2705,11 +2696,10 @@ rebalance: * the allocation is high priority and these type of * allocations are system rather than user orientated */ - zonelist = node_zonelist(numa_node_id(), gfp_mask); + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); + + page = __alloc_pages_high_priority(gfp_mask, order, ac); - page = __alloc_pages_high_priority(gfp_mask, order, - zonelist, high_zoneidx, nodemask, - preferred_zone, classzone_idx, migratetype); if (page) { goto got_pg; } @@ -2738,11 +2728,9 @@ rebalance: * Try direct compaction. The first pass is asynchronous. Subsequent * attempts after direct reclaim are synchronous */ - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, - high_zoneidx, nodemask, alloc_flags, - preferred_zone, - classzone_idx, migratetype, - migration_mode, &contended_compaction, + page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, + migration_mode, + &contended_compaction, &deferred_compaction); if (page) goto got_pg; @@ -2788,12 +2776,8 @@ rebalance: migration_mode = MIGRATE_SYNC_LIGHT; /* Try direct reclaim and then allocating */ - page = __alloc_pages_direct_reclaim(gfp_mask, order, - zonelist, high_zoneidx, - nodemask, - alloc_flags, preferred_zone, - classzone_idx, migratetype, - &did_some_progress); + page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac, + &did_some_progress); if (page) goto got_pg; @@ -2807,10 +2791,8 @@ rebalance: * start OOM killing tasks. 
*/ if (!did_some_progress) { - page = __alloc_pages_may_oom(gfp_mask, order, zonelist, - high_zoneidx, nodemask, - preferred_zone, classzone_idx, - migratetype,&did_some_progress); + page = __alloc_pages_may_oom(gfp_mask, order, ac, + &did_some_progress); if (page) goto got_pg; if (!did_some_progress) { @@ -2819,7 +2801,7 @@ rebalance: } } /* Wait for some write requests to complete then retry */ - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50); goto rebalance; } else { /* @@ -2827,11 +2809,9 @@ rebalance: * direct reclaim and reclaim/compaction depends on compaction * being called after reclaim so call directly if necessary */ - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, - high_zoneidx, nodemask, alloc_flags, - preferred_zone, - classzone_idx, migratetype, - migration_mode, &contended_compaction, + page = __alloc_pages_direct_compact(gfp_mask, order, + alloc_flags, ac, migration_mode, + &contended_compaction, &deferred_compaction); if (page) goto got_pg; @@ -2854,15 +2834,16 @@ struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, nodemask_t *nodemask) { - enum zone_type high_zoneidx = gfp_zone(gfp_mask); - struct zone *preferred_zone; struct zoneref *preferred_zoneref; struct page *page = NULL; - int migratetype = gfpflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - int classzone_idx; gfp_t mask; + struct alloc_context ac = { + .high_zoneidx = gfp_zone(gfp_mask), + .nodemask = nodemask, + .migratetype = gfpflags_to_migratetype(gfp_mask), + }; gfp_mask &= gfp_allowed_mask; @@ -2881,25 +2862,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE) + if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE) alloc_flags |= ALLOC_CMA; retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); + /* We set it here, as __alloc_pages_slowpath might have changed it */ + ac.zonelist = zonelist; /* The preferred zone is used for statistics later */ - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, - nodemask ? : &cpuset_current_mems_allowed, - &preferred_zone); - if (!preferred_zone) + preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx, + ac.nodemask ? : &cpuset_current_mems_allowed, + &ac.preferred_zone); + if (!ac.preferred_zone) goto out; - classzone_idx = zonelist_zone_idx(preferred_zoneref); + ac.classzone_idx = zonelist_zone_idx(preferred_zoneref); /* First allocation attempt */ mask = gfp_mask|__GFP_HARDWALL; - page = get_page_from_freelist(mask, nodemask, order, zonelist, - high_zoneidx, alloc_flags, preferred_zone, - classzone_idx, migratetype); + page = get_page_from_freelist(mask, order, alloc_flags, &ac); if (unlikely(!page)) { /* * Runtime PM, block IO and its error handling path @@ -2908,12 +2889,10 @@ retry_cpuset: */ mask = memalloc_noio_flags(gfp_mask); - page = __alloc_pages_slowpath(mask, order, - zonelist, high_zoneidx, nodemask, - preferred_zone, classzone_idx, migratetype); + page = __alloc_pages_slowpath(mask, order, &ac); } - trace_mm_page_alloc(page, order, mask, migratetype); + trace_mm_page_alloc(page, order, mask, ac.migratetype); out: /* -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . ^ permalink raw reply related [flat|nested] 21+ messages in thread
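Pulling the scattered hunks together, the fast path of
__alloc_pages_nodemask() after this patch ends up roughly as below
(condensed from the diff above; the temporary 'mask' variable is inlined
here for brevity). The comments note which fields get reset on the
retry_cpuset path, a point raised in the review that follows:

	struct alloc_context ac = {
		.high_zoneidx = gfp_zone(gfp_mask),
		.nodemask = nodemask,
		.migratetype = gfpflags_to_migratetype(gfp_mask),
	};

retry_cpuset:
	cpuset_mems_cookie = read_mems_allowed_begin();

	/* Reset on every retry: __alloc_pages_slowpath() may have pointed
	 * this at the local node's zonelist for ALLOC_NO_WATERMARKS. */
	ac.zonelist = zonelist;

	/* preferred_zone and classzone_idx are likewise recomputed here,
	 * undoing any change made by the slowpath. */
	preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
				ac.nodemask ? : &cpuset_current_mems_allowed,
				&ac.preferred_zone);
	ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);

	/* First attempt, then the slow path; both now just take &ac. */
	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
					alloc_flags, &ac);
	if (unlikely(!page))
		page = __alloc_pages_slowpath(memalloc_noio_flags(gfp_mask),
						order, &ac);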
* Re: [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters 2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka @ 2015-01-06 14:45 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:45 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:41, Vlastimil Babka wrote: > Introduce struct alloc_context to accumulate the numerous parameters passed > between the alloc_pages* family of functions and get_page_from_freelist(). > This excludes gfp_flags and alloc_info, which mutate too much along the way, > and allocation order, which is conceptually different. > > The result is shorter function signatures, as well as overal code size and > stack usage reductions. > > bloat-o-meter: > > add/remove: 0/0 grow/shrink: 1/2 up/down: 127/-371 (-244) > function old new delta > get_page_from_freelist 2525 2652 +127 > __alloc_pages_direct_compact 329 283 -46 > __alloc_pages_nodemask 2507 2182 -325 > > checkstack.pl: > > function old new > __alloc_pages_nodemask 216 184 > get_page_from_freelist 168 184 > __alloc_pages_direct_compact 40 24 > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> Looks good to me. I would just mention fields which might be reseted and where. Acked-by: Michal Hocko <mhocko@suse.cz> > --- > mm/page_alloc.c | 221 +++++++++++++++++++++++++------------------------------- > 1 file changed, 100 insertions(+), 121 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0c77a97..bf0359c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -232,6 +232,19 @@ EXPORT_SYMBOL(nr_node_ids); > EXPORT_SYMBOL(nr_online_nodes); > #endif > > +/* > + * Structure for holding the mostly immutable allocation parameters passed > + * between alloc_pages* family of functions. > + */ > +struct alloc_context { > + struct zonelist *zonelist; > + nodemask_t *nodemask; > + struct zone *preferred_zone; > + int classzone_idx; > + int migratetype; > + enum zone_type high_zoneidx; > +}; > + > int page_group_by_mobility_disabled __read_mostly; > > void set_pageblock_migratetype(struct page *page, int migratetype) > @@ -2037,10 +2050,10 @@ static void reset_alloc_batches(struct zone *preferred_zone) > * a page. > */ > static struct page * > -get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order, > - struct zonelist *zonelist, int high_zoneidx, int alloc_flags, > - struct zone *preferred_zone, int classzone_idx, int migratetype) > +get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, > + const struct alloc_context *ac) > { > + struct zonelist *zonelist = ac->zonelist; > struct zoneref *z; > struct page *page = NULL; > struct zone *zone; > @@ -2059,8 +2072,8 @@ zonelist_scan: > * Scan zonelist, looking for a zone with enough free. > * See also __cpuset_node_allowed() comment in kernel/cpuset.c. 
> */ > - for_each_zone_zonelist_nodemask(zone, z, zonelist, > - high_zoneidx, nodemask) { > + for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx, > + ac->nodemask) { > unsigned long mark; > > if (IS_ENABLED(CONFIG_NUMA) && zlc_active && > @@ -2077,7 +2090,7 @@ zonelist_scan: > * time the page has in memory before being reclaimed. > */ > if (alloc_flags & ALLOC_FAIR) { > - if (!zone_local(preferred_zone, zone)) > + if (!zone_local(ac->preferred_zone, zone)) > break; > if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) { > nr_fair_skipped++; > @@ -2115,7 +2128,7 @@ zonelist_scan: > > mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; > if (!zone_watermark_ok(zone, order, mark, > - classzone_idx, alloc_flags)) { > + ac->classzone_idx, alloc_flags)) { > int ret; > > /* Checked here to keep the fast path fast */ > @@ -2136,7 +2149,7 @@ zonelist_scan: > } > > if (zone_reclaim_mode == 0 || > - !zone_allows_reclaim(preferred_zone, zone)) > + !zone_allows_reclaim(ac->preferred_zone, zone)) > goto this_zone_full; > > /* > @@ -2158,7 +2171,7 @@ zonelist_scan: > default: > /* did we reclaim enough */ > if (zone_watermark_ok(zone, order, mark, > - classzone_idx, alloc_flags)) > + ac->classzone_idx, alloc_flags)) > goto try_this_zone; > > /* > @@ -2179,8 +2192,8 @@ zonelist_scan: > } > > try_this_zone: > - page = buffered_rmqueue(preferred_zone, zone, order, > - gfp_mask, migratetype); > + page = buffered_rmqueue(ac->preferred_zone, zone, order, > + gfp_mask, ac->migratetype); > if (page) { > if (prep_new_page(page, order, gfp_mask, alloc_flags)) > goto try_this_zone; > @@ -2203,7 +2216,7 @@ this_zone_full: > alloc_flags &= ~ALLOC_FAIR; > if (nr_fair_skipped) { > zonelist_rescan = true; > - reset_alloc_batches(preferred_zone); > + reset_alloc_batches(ac->preferred_zone); > } > if (nr_online_nodes > 1) > zonelist_rescan = true; > @@ -2325,9 +2338,7 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order, > > static inline struct page * > __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype, unsigned long *did_some_progress) > + const struct alloc_context *ac, unsigned long *did_some_progress) > { > struct page *page; > > @@ -2340,7 +2351,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > * Acquire the per-zone oom lock for each zone. If that > * fails, somebody else is making progress for us. > */ > - if (!oom_zonelist_trylock(zonelist, gfp_mask)) { > + if (!oom_zonelist_trylock(ac->zonelist, gfp_mask)) { > *did_some_progress = 1; > schedule_timeout_uninterruptible(1); > return NULL; > @@ -2359,10 +2370,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > * here, this is only to catch a parallel oom killing, we must fail if > * we're still under heavy pressure. 
> */ > - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, > - order, zonelist, high_zoneidx, > - ALLOC_WMARK_HIGH|ALLOC_CPUSET, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order, > + ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac); > if (page) > goto out; > > @@ -2374,7 +2383,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > if (order > PAGE_ALLOC_COSTLY_ORDER) > goto out; > /* The OOM killer does not needlessly kill tasks for lowmem */ > - if (high_zoneidx < ZONE_NORMAL) > + if (ac->high_zoneidx < ZONE_NORMAL) > goto out; > /* The OOM killer does not compensate for light reclaim */ > if (!(gfp_mask & __GFP_FS)) > @@ -2390,10 +2399,10 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > goto out; > } > /* Exhausted what can be done so it's blamo time */ > - out_of_memory(zonelist, gfp_mask, order, nodemask, false); > + out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false); > *did_some_progress = 1; > out: > - oom_zonelist_unlock(zonelist, gfp_mask); > + oom_zonelist_unlock(ac->zonelist, gfp_mask); > return page; > } > > @@ -2401,10 +2410,9 @@ out: > /* Try memory compaction for high-order allocations before reclaim */ > static struct page * > __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, > - int classzone_idx, int migratetype, enum migrate_mode mode, > - int *contended_compaction, bool *deferred_compaction) > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended_compaction, > + bool *deferred_compaction) > { > unsigned long compact_result; > struct page *page; > @@ -2413,10 +2421,10 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > return NULL; > > current->flags |= PF_MEMALLOC; > - compact_result = try_to_compact_pages(zonelist, order, gfp_mask, > - nodemask, mode, > + compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, > + ac->nodemask, mode, > contended_compaction, > - alloc_flags, classzone_idx); > + alloc_flags, ac->classzone_idx); > current->flags &= ~PF_MEMALLOC; > > switch (compact_result) { > @@ -2435,10 +2443,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > */ > count_vm_event(COMPACTSTALL); > > - page = get_page_from_freelist(gfp_mask, nodemask, > - order, zonelist, high_zoneidx, > - alloc_flags & ~ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); > > if (page) { > struct zone *zone = page_zone(page); > @@ -2462,10 +2468,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > #else > static inline struct page * > __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, > - int classzone_idx, int migratetype, enum migrate_mode mode, > - int *contended_compaction, bool *deferred_compaction) > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended_compaction, > + bool *deferred_compaction) > { > return NULL; > } > @@ -2473,8 +2478,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > > /* Perform direct synchronous page reclaim */ > static int > -__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct 
zonelist *zonelist, > - nodemask_t *nodemask) > +__perform_reclaim(gfp_t gfp_mask, unsigned int order, > + const struct alloc_context *ac) > { > struct reclaim_state reclaim_state; > int progress; > @@ -2488,7 +2493,8 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, > reclaim_state.reclaimed_slab = 0; > current->reclaim_state = &reclaim_state; > > - progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask); > + progress = try_to_free_pages(ac->zonelist, order, gfp_mask, > + ac->nodemask); > > current->reclaim_state = NULL; > lockdep_clear_current_reclaim_state(); > @@ -2502,28 +2508,23 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, > /* The really slow allocator path where we enter direct reclaim */ > static inline struct page * > __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, > - int classzone_idx, int migratetype, unsigned long *did_some_progress) > + int alloc_flags, const struct alloc_context *ac, > + unsigned long *did_some_progress) > { > struct page *page = NULL; > bool drained = false; > > - *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, > - nodemask); > + *did_some_progress = __perform_reclaim(gfp_mask, order, ac); > if (unlikely(!(*did_some_progress))) > return NULL; > > /* After successful reclaim, reconsider all zones for allocation */ > if (IS_ENABLED(CONFIG_NUMA)) > - zlc_clear_zones_full(zonelist); > + zlc_clear_zones_full(ac->zonelist); > > retry: > - page = get_page_from_freelist(gfp_mask, nodemask, order, > - zonelist, high_zoneidx, > - alloc_flags & ~ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, > - migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); > > /* > * If an allocation failed after direct reclaim, it could be because > @@ -2544,36 +2545,30 @@ retry: > */ > static inline struct page * > __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype) > + const struct alloc_context *ac) > { > struct page *page; > > do { > - page = get_page_from_freelist(gfp_mask, nodemask, order, > - zonelist, high_zoneidx, ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + ALLOC_NO_WATERMARKS, ac); > > if (!page && gfp_mask & __GFP_NOFAIL) > - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, > + HZ/50); > } while (!page && (gfp_mask & __GFP_NOFAIL)); > > return page; > } > > -static void wake_all_kswapds(unsigned int order, > - struct zonelist *zonelist, > - enum zone_type high_zoneidx, > - struct zone *preferred_zone, > - nodemask_t *nodemask) > +static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac) > { > struct zoneref *z; > struct zone *zone; > > - for_each_zone_zonelist_nodemask(zone, z, zonelist, > - high_zoneidx, nodemask) > - wakeup_kswapd(zone, order, zone_idx(preferred_zone)); > + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, > + ac->high_zoneidx, ac->nodemask) > + wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone)); > } > > static inline int > @@ -2632,9 +2627,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > static inline 
struct page * > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > - struct zonelist *zonelist, enum zone_type high_zoneidx, > - nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype) > + struct alloc_context *ac) > { > const gfp_t wait = gfp_mask & __GFP_WAIT; > struct page *page = NULL; > @@ -2669,8 +2662,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > goto nopage; > > if (!(gfp_mask & __GFP_NO_KSWAPD)) > - wake_all_kswapds(order, zonelist, high_zoneidx, > - preferred_zone, nodemask); > + wake_all_kswapds(order, ac); > > /* > * OK, we're below the kswapd watermark and have kicked background > @@ -2683,18 +2675,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > * Find the true preferred zone if the allocation is unconstrained by > * cpusets. > */ > - if (!(alloc_flags & ALLOC_CPUSET) && !nodemask) { > + if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) { > struct zoneref *preferred_zoneref; > - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, > - NULL, &preferred_zone); > - classzone_idx = zonelist_zone_idx(preferred_zoneref); > + preferred_zoneref = first_zones_zonelist(ac->zonelist, > + ac->high_zoneidx, NULL, &ac->preferred_zone); > + ac->classzone_idx = zonelist_zone_idx(preferred_zoneref); > } > > rebalance: > /* This is the last chance, in general, before the goto nopage. */ > - page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, > - high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, > - preferred_zone, classzone_idx, migratetype); > + page = get_page_from_freelist(gfp_mask, order, > + alloc_flags & ~ALLOC_NO_WATERMARKS, ac); > if (page) > goto got_pg; > > @@ -2705,11 +2696,10 @@ rebalance: > * the allocation is high priority and these type of > * allocations are system rather than user orientated > */ > - zonelist = node_zonelist(numa_node_id(), gfp_mask); > + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); > + > + page = __alloc_pages_high_priority(gfp_mask, order, ac); > > - page = __alloc_pages_high_priority(gfp_mask, order, > - zonelist, high_zoneidx, nodemask, > - preferred_zone, classzone_idx, migratetype); > if (page) { > goto got_pg; > } > @@ -2738,11 +2728,9 @@ rebalance: > * Try direct compaction. The first pass is asynchronous. Subsequent > * attempts after direct reclaim are synchronous > */ > - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, > - high_zoneidx, nodemask, alloc_flags, > - preferred_zone, > - classzone_idx, migratetype, > - migration_mode, &contended_compaction, > + page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, > + migration_mode, > + &contended_compaction, > &deferred_compaction); > if (page) > goto got_pg; > @@ -2788,12 +2776,8 @@ rebalance: > migration_mode = MIGRATE_SYNC_LIGHT; > > /* Try direct reclaim and then allocating */ > - page = __alloc_pages_direct_reclaim(gfp_mask, order, > - zonelist, high_zoneidx, > - nodemask, > - alloc_flags, preferred_zone, > - classzone_idx, migratetype, > - &did_some_progress); > + page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac, > + &did_some_progress); > if (page) > goto got_pg; > > @@ -2807,10 +2791,8 @@ rebalance: > * start OOM killing tasks. 
> */ > if (!did_some_progress) { > - page = __alloc_pages_may_oom(gfp_mask, order, zonelist, > - high_zoneidx, nodemask, > - preferred_zone, classzone_idx, > - migratetype,&did_some_progress); > + page = __alloc_pages_may_oom(gfp_mask, order, ac, > + &did_some_progress); > if (page) > goto got_pg; > if (!did_some_progress) { > @@ -2819,7 +2801,7 @@ rebalance: > } > } > /* Wait for some write requests to complete then retry */ > - wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > + wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50); > goto rebalance; > } else { > /* > @@ -2827,11 +2809,9 @@ rebalance: > * direct reclaim and reclaim/compaction depends on compaction > * being called after reclaim so call directly if necessary > */ > - page = __alloc_pages_direct_compact(gfp_mask, order, zonelist, > - high_zoneidx, nodemask, alloc_flags, > - preferred_zone, > - classzone_idx, migratetype, > - migration_mode, &contended_compaction, > + page = __alloc_pages_direct_compact(gfp_mask, order, > + alloc_flags, ac, migration_mode, > + &contended_compaction, > &deferred_compaction); > if (page) > goto got_pg; > @@ -2854,15 +2834,16 @@ struct page * > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > struct zonelist *zonelist, nodemask_t *nodemask) > { > - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > - struct zone *preferred_zone; > struct zoneref *preferred_zoneref; > struct page *page = NULL; > - int migratetype = gfpflags_to_migratetype(gfp_mask); > unsigned int cpuset_mems_cookie; > int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; > - int classzone_idx; > gfp_t mask; > + struct alloc_context ac = { > + .high_zoneidx = gfp_zone(gfp_mask), > + .nodemask = nodemask, > + .migratetype = gfpflags_to_migratetype(gfp_mask), > + }; > > gfp_mask &= gfp_allowed_mask; > > @@ -2881,25 +2862,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > if (unlikely(!zonelist->_zonerefs->zone)) > return NULL; > > - if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE) > + if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE) > alloc_flags |= ALLOC_CMA; > > retry_cpuset: > cpuset_mems_cookie = read_mems_allowed_begin(); > > + /* We set it here, as __alloc_pages_slowpath might have changed it */ > + ac.zonelist = zonelist; > /* The preferred zone is used for statistics later */ > - preferred_zoneref = first_zones_zonelist(zonelist, high_zoneidx, > - nodemask ? : &cpuset_current_mems_allowed, > - &preferred_zone); > - if (!preferred_zone) > + preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx, > + ac.nodemask ? 
: &cpuset_current_mems_allowed, > + &ac.preferred_zone); > + if (!ac.preferred_zone) > goto out; > - classzone_idx = zonelist_zone_idx(preferred_zoneref); > + ac.classzone_idx = zonelist_zone_idx(preferred_zoneref); > > /* First allocation attempt */ > mask = gfp_mask|__GFP_HARDWALL; > - page = get_page_from_freelist(mask, nodemask, order, zonelist, > - high_zoneidx, alloc_flags, preferred_zone, > - classzone_idx, migratetype); > + page = get_page_from_freelist(mask, order, alloc_flags, &ac); > if (unlikely(!page)) { > /* > * Runtime PM, block IO and its error handling path > @@ -2908,12 +2889,10 @@ retry_cpuset: > */ > mask = memalloc_noio_flags(gfp_mask); > > - page = __alloc_pages_slowpath(mask, order, > - zonelist, high_zoneidx, nodemask, > - preferred_zone, classzone_idx, migratetype); > + page = __alloc_pages_slowpath(mask, order, &ac); > } > > - trace_mm_page_alloc(page, order, mask, migratetype); > + trace_mm_page_alloc(page, order, mask, ac.migratetype); > > out: > /* > -- > 2.1.2 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
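The hunks quoted above reduce the allocator's calling convention to a single shape: __alloc_pages_nodemask() builds one struct alloc_context and hands a pointer to it to both the fast path and the slow path, while gfp_mask, order and alloc_flags keep travelling separately. A compilable toy reduction of that shape follows, for reference only -- the field names mirror the quoted diff, but the types, the zone selection and the return values are stand-ins, not kernel code.

#include <stdio.h>

typedef unsigned int gfp_t;
struct zone { int id; };
struct zonelist { struct zone *head; };
typedef unsigned long nodemask_t;

/* Field names follow the quoted diff; everything else here is a stub. */
struct alloc_context {
        struct zonelist *zonelist;
        nodemask_t *nodemask;
        struct zone *preferred_zone;
        int classzone_idx;
        int migratetype;
        int high_zoneidx;
};

/* Fast path: the "mostly constant" arguments arrive through one pointer. */
static int get_page_from_freelist_sketch(gfp_t gfp_mask, unsigned int order,
                int alloc_flags, const struct alloc_context *ac)
{
        (void)gfp_mask; (void)alloc_flags;
        return order == 0 && ac->preferred_zone != NULL; /* pretend order-0 succeeds */
}

/* Slow path: same context, but it may legitimately rewrite parts of it. */
static int alloc_pages_slowpath_sketch(gfp_t gfp_mask, unsigned int order,
                struct alloc_context *ac)
{
        ac->classzone_idx = 0;  /* recomputed, as in the quoted diff */
        return get_page_from_freelist_sketch(gfp_mask, order, 0, ac);
}

static int alloc_pages_nodemask_sketch(gfp_t gfp_mask, unsigned int order,
                struct zonelist *zonelist, nodemask_t *nodemask)
{
        struct alloc_context ac = {
                .zonelist       = zonelist,
                .nodemask       = nodemask,
                .preferred_zone = zonelist->head,
                .classzone_idx  = 1,
                .migratetype    = 0,
                .high_zoneidx   = 1,
        };

        if (get_page_from_freelist_sketch(gfp_mask, order, 0, &ac))
                return 1;
        return alloc_pages_slowpath_sketch(gfp_mask, order, &ac);
}

int main(void)
{
        struct zone z = { .id = 0 };
        struct zonelist zl = { .head = &z };
        nodemask_t nm = ~0UL;

        printf("%d\n", alloc_pages_nodemask_sketch(0, 0, &zl, &nm));
        return 0;
}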
* [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka 2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 14:53 ` Michal Hocko 2015-01-06 14:57 ` Michal Hocko 2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka 3 siblings, 2 replies; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko Expand the usage of the struct alloc_context introduced in the previous patch also for calling try_to_compact_pages(), to reduce the number of its parameters. Since the function is in different compilation unit, we need to move alloc_context definition in the shared mm/internal.h header. With this change we get simpler code and small savings of code size and stack usage: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27) function old new delta __alloc_pages_direct_compact 283 256 -27 add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13) function old new delta try_to_compact_pages 582 569 -13 Stack usage of __alloc_pages_direct_compact goes from 24 to none (per scripts/checkstack.pl). Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- include/linux/compaction.h | 17 +++++++++-------- mm/compaction.c | 23 +++++++++++------------ mm/internal.h | 14 ++++++++++++++ mm/page_alloc.c | 19 ++----------------- 4 files changed, 36 insertions(+), 37 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 3238ffa..f2efda2 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -21,6 +21,8 @@ /* Zone lock or lru_lock was contended in async compaction */ #define COMPACT_CONTENDED_LOCK 2 +struct alloc_context; /* in mm/internal.h */ + #ifdef CONFIG_COMPACTION extern int sysctl_compact_memory; extern int sysctl_compaction_handler(struct ctl_table *table, int write, @@ -30,10 +32,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos); extern int fragmentation_index(struct zone *zone, unsigned int order); -extern unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *mask, - enum migrate_mode mode, int *contended, - int alloc_flags, int classzone_idx); +extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended); extern void compact_pgdat(pg_data_t *pgdat, int order); extern void reset_isolation_suitable(pg_data_t *pgdat); extern unsigned long compaction_suitable(struct zone *zone, int order, @@ -101,10 +102,10 @@ static inline bool compaction_restarting(struct zone *zone, int order) } #else -static inline unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *nodemask, - enum migrate_mode mode, int *contended, - int alloc_flags, int classzone_idx) +static inline unsigned long try_to_compact_pages(gfp_t gfp_mask, + unsigned int order, int alloc_flags, + const struct alloc_context *ac, + enum migrate_mode mode, int *contended) { return COMPACT_CONTINUE; } diff --git a/mm/compaction.c b/mm/compaction.c index 546e571..9c7e690 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1335,22 +1335,20 @@ int sysctl_extfrag_threshold = 500; /** * try_to_compact_pages - Direct compact to satisfy a high-order allocation - * @zonelist: The zonelist used for the current allocation - * @order: The order of the current allocation * @gfp_mask: The GFP mask of the current allocation - * @nodemask: The allowed nodes to allocate from + * @order: The order of the current allocation + * @alloc_flags: The allocation flags of the current allocation + * @ac: The context of current allocation * @mode: The migration mode for async, sync light, or sync migration * @contended: Return value that determines if compaction was aborted due to * need_resched() or lock contention * * This is the main entry point for direct page compaction. 
*/ -unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *nodemask, - enum migrate_mode mode, int *contended, - int alloc_flags, int classzone_idx) +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, + int alloc_flags, const struct alloc_context *ac, + enum migrate_mode mode, int *contended) { - enum zone_type high_zoneidx = gfp_zone(gfp_mask); int may_enter_fs = gfp_mask & __GFP_FS; int may_perform_io = gfp_mask & __GFP_IO; struct zoneref *z; @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, return COMPACT_SKIPPED; /* Compact each zone in the list */ - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, - nodemask) { + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, + ac->nodemask) { int status; int zone_contended; @@ -1374,7 +1372,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, continue; status = compact_zone_order(zone, order, gfp_mask, mode, - &zone_contended, alloc_flags, classzone_idx); + &zone_contended, alloc_flags, + ac->classzone_idx); rc = max(status, rc); /* * It takes at least one zone that wasn't lock contended @@ -1384,7 +1383,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, /* If a normal allocation would succeed, stop compacting */ if (zone_watermark_ok(zone, order, low_wmark_pages(zone), - classzone_idx, alloc_flags)) { + ac->classzone_idx, alloc_flags)) { /* * We think the allocation will succeed in this zone, * but it is not certain, hence the false. The caller diff --git a/mm/internal.h b/mm/internal.h index efad241..cd5418b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -110,6 +110,20 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); */ /* + * Structure for holding the mostly immutable allocation parameters passed + * between functions involved in allocations, including the alloc_pages* + * family of functions. + */ +struct alloc_context { + struct zonelist *zonelist; + nodemask_t *nodemask; + struct zone *preferred_zone; + int classzone_idx; + int migratetype; + enum zone_type high_zoneidx; +}; + +/* * Locate the struct page for both the matching buddy in our * pair (buddy1) and the combined O(n+1) page they form (page). * diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bf0359c..f5f5e2a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -232,19 +232,6 @@ EXPORT_SYMBOL(nr_node_ids); EXPORT_SYMBOL(nr_online_nodes); #endif -/* - * Structure for holding the mostly immutable allocation parameters passed - * between alloc_pages* family of functions. - */ -struct alloc_context { - struct zonelist *zonelist; - nodemask_t *nodemask; - struct zone *preferred_zone; - int classzone_idx; - int migratetype; - enum zone_type high_zoneidx; -}; - int page_group_by_mobility_disabled __read_mostly; void set_pageblock_migratetype(struct page *page, int migratetype) @@ -2421,10 +2408,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, return NULL; current->flags |= PF_MEMALLOC; - compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, - ac->nodemask, mode, - contended_compaction, - alloc_flags, ac->classzone_idx); + compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, + mode, contended_compaction); current->flags &= ~PF_MEMALLOC; switch (compact_result) { -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
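One detail worth noting in the hunk above: include/linux/compaction.h gets only a forward declaration, struct alloc_context; /* in mm/internal.h */, which is sufficient because the header passes the structure strictly by pointer. A minimal self-contained sketch of that pattern is below; the names and the single-file layout are illustrative only, and in the kernel the full definition of course stays in mm/internal.h.

#include <stdio.h>

/* What a public header can get away with: an opaque forward declaration. */
struct alloc_context;
unsigned long try_compact_sketch(unsigned int order,
                                 const struct alloc_context *ac);

/* Only the translation units that dereference the members need this. */
struct alloc_context {
        int classzone_idx;
        int high_zoneidx;
};

unsigned long try_compact_sketch(unsigned int order,
                                 const struct alloc_context *ac)
{
        /* the full definition is visible here, so members can be used */
        return (unsigned long)(order + ac->classzone_idx);
}

int main(void)
{
        struct alloc_context ac = { .classzone_idx = 1, .high_zoneidx = 2 };

        printf("%lu\n", try_compact_sketch(3, &ac));
        return 0;
}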
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka @ 2015-01-06 14:53 ` Michal Hocko 2015-01-06 14:57 ` Michal Hocko 1 sibling, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:53 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: > Expand the usage of the struct alloc_context introduced in the previous patch > also for calling try_to_compact_pages(), to reduce the number of its > parameters. Since the function is in different compilation unit, we need to > move alloc_context definition in the shared mm/internal.h header. > > With this change we get simpler code and small savings of code size and stack > usage: > > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27) > function old new delta > __alloc_pages_direct_compact 283 256 -27 > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13) > function old new delta > try_to_compact_pages 582 569 -13 > > Stack usage of __alloc_pages_direct_compact goes from 24 to none (per > scripts/checkstack.pl). > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> Looks good as well. 
Acked-by: Michal Hocko <mhocko@suse.cz> > --- > include/linux/compaction.h | 17 +++++++++-------- > mm/compaction.c | 23 +++++++++++------------ > mm/internal.h | 14 ++++++++++++++ > mm/page_alloc.c | 19 ++----------------- > 4 files changed, 36 insertions(+), 37 deletions(-) > > diff --git a/include/linux/compaction.h b/include/linux/compaction.h > index 3238ffa..f2efda2 100644 > --- a/include/linux/compaction.h > +++ b/include/linux/compaction.h > @@ -21,6 +21,8 @@ > /* Zone lock or lru_lock was contended in async compaction */ > #define COMPACT_CONTENDED_LOCK 2 > > +struct alloc_context; /* in mm/internal.h */ > + > #ifdef CONFIG_COMPACTION > extern int sysctl_compact_memory; > extern int sysctl_compaction_handler(struct ctl_table *table, int write, > @@ -30,10 +32,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write, > void __user *buffer, size_t *length, loff_t *ppos); > > extern int fragmentation_index(struct zone *zone, unsigned int order); > -extern unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *mask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx); > +extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended); > extern void compact_pgdat(pg_data_t *pgdat, int order); > extern void reset_isolation_suitable(pg_data_t *pgdat); > extern unsigned long compaction_suitable(struct zone *zone, int order, > @@ -101,10 +102,10 @@ static inline bool compaction_restarting(struct zone *zone, int order) > } > > #else > -static inline unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *nodemask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx) > +static inline unsigned long try_to_compact_pages(gfp_t gfp_mask, > + unsigned int order, int alloc_flags, > + const struct alloc_context *ac, > + enum migrate_mode mode, int *contended) > { > return COMPACT_CONTINUE; > } > diff --git a/mm/compaction.c b/mm/compaction.c > index 546e571..9c7e690 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1335,22 +1335,20 @@ int sysctl_extfrag_threshold = 500; > > /** > * try_to_compact_pages - Direct compact to satisfy a high-order allocation > - * @zonelist: The zonelist used for the current allocation > - * @order: The order of the current allocation > * @gfp_mask: The GFP mask of the current allocation > - * @nodemask: The allowed nodes to allocate from > + * @order: The order of the current allocation > + * @alloc_flags: The allocation flags of the current allocation > + * @ac: The context of current allocation > * @mode: The migration mode for async, sync light, or sync migration > * @contended: Return value that determines if compaction was aborted due to > * need_resched() or lock contention > * > * This is the main entry point for direct page compaction. 
> */ > -unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *nodemask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx) > +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended) > { > - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > int may_enter_fs = gfp_mask & __GFP_FS; > int may_perform_io = gfp_mask & __GFP_IO; > struct zoneref *z; > @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > return COMPACT_SKIPPED; > > /* Compact each zone in the list */ > - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, > - nodemask) { > + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, > + ac->nodemask) { > int status; > int zone_contended; > > @@ -1374,7 +1372,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > continue; > > status = compact_zone_order(zone, order, gfp_mask, mode, > - &zone_contended, alloc_flags, classzone_idx); > + &zone_contended, alloc_flags, > + ac->classzone_idx); > rc = max(status, rc); > /* > * It takes at least one zone that wasn't lock contended > @@ -1384,7 +1383,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > > /* If a normal allocation would succeed, stop compacting */ > if (zone_watermark_ok(zone, order, low_wmark_pages(zone), > - classzone_idx, alloc_flags)) { > + ac->classzone_idx, alloc_flags)) { > /* > * We think the allocation will succeed in this zone, > * but it is not certain, hence the false. The caller > diff --git a/mm/internal.h b/mm/internal.h > index efad241..cd5418b 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -110,6 +110,20 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); > */ > > /* > + * Structure for holding the mostly immutable allocation parameters passed > + * between functions involved in allocations, including the alloc_pages* > + * family of functions. > + */ > +struct alloc_context { > + struct zonelist *zonelist; > + nodemask_t *nodemask; > + struct zone *preferred_zone; > + int classzone_idx; > + int migratetype; > + enum zone_type high_zoneidx; > +}; > + > +/* > * Locate the struct page for both the matching buddy in our > * pair (buddy1) and the combined O(n+1) page they form (page). > * > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index bf0359c..f5f5e2a 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -232,19 +232,6 @@ EXPORT_SYMBOL(nr_node_ids); > EXPORT_SYMBOL(nr_online_nodes); > #endif > > -/* > - * Structure for holding the mostly immutable allocation parameters passed > - * between alloc_pages* family of functions. 
> - */ > -struct alloc_context { > - struct zonelist *zonelist; > - nodemask_t *nodemask; > - struct zone *preferred_zone; > - int classzone_idx; > - int migratetype; > - enum zone_type high_zoneidx; > -}; > - > int page_group_by_mobility_disabled __read_mostly; > > void set_pageblock_migratetype(struct page *page, int migratetype) > @@ -2421,10 +2408,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > return NULL; > > current->flags |= PF_MEMALLOC; > - compact_result = try_to_compact_pages(ac->zonelist, order, gfp_mask, > - ac->nodemask, mode, > - contended_compaction, > - alloc_flags, ac->classzone_idx); > + compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, > + mode, contended_compaction); > current->flags &= ~PF_MEMALLOC; > > switch (compact_result) { > -- > 2.1.2 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka 2015-01-06 14:53 ` Michal Hocko @ 2015-01-06 14:57 ` Michal Hocko 2015-01-07 9:11 ` Vlastimil Babka 1 sibling, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 14:57 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim Hmm, wait a minute On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: [...] > -unsigned long try_to_compact_pages(struct zonelist *zonelist, > - int order, gfp_t gfp_mask, nodemask_t *nodemask, > - enum migrate_mode mode, int *contended, > - int alloc_flags, int classzone_idx) > +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > + int alloc_flags, const struct alloc_context *ac, > + enum migrate_mode mode, int *contended) > { > - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > int may_enter_fs = gfp_mask & __GFP_FS; > int may_perform_io = gfp_mask & __GFP_IO; > struct zoneref *z; gfp_mask might change since the high_zoneidx was set up in the call chain. I guess this shouldn't change to the gfp_zone output but it is worth double checking. > @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, > return COMPACT_SKIPPED; > > /* Compact each zone in the list */ > - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, > - nodemask) { > + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, > + ac->nodemask) { > int status; > int zone_contended; > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-06 14:57 ` Michal Hocko @ 2015-01-07 9:11 ` Vlastimil Babka 2015-01-07 9:18 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:11 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 03:57 PM, Michal Hocko wrote: > Hmm, wait a minute > > On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: > [...] >> -unsigned long try_to_compact_pages(struct zonelist *zonelist, >> - int order, gfp_t gfp_mask, nodemask_t *nodemask, >> - enum migrate_mode mode, int *contended, >> - int alloc_flags, int classzone_idx) >> +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, >> + int alloc_flags, const struct alloc_context *ac, >> + enum migrate_mode mode, int *contended) >> { >> - enum zone_type high_zoneidx = gfp_zone(gfp_mask); >> int may_enter_fs = gfp_mask & __GFP_FS; >> int may_perform_io = gfp_mask & __GFP_IO; >> struct zoneref *z; > > gfp_mask might change since the high_zoneidx was set up in the call > chain. I guess this shouldn't change to the gfp_zone output but it is > worth double checking. Yeah I checked that. gfp_zone() operates just on GFP_ZONEMASK part of the flags, and we don't change that. >> @@ -1365,8 +1363,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, >> return COMPACT_SKIPPED; >> >> /* Compact each zone in the list */ >> - for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, >> - nodemask) { >> + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, >> + ac->nodemask) { >> int status; >> int zone_contended; >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
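The argument in the reply above -- gfp_zone() depends only on the GFP_ZONEMASK bits, so stripping flags such as __GFP_IO in the slow path cannot invalidate the high_zoneidx cached in the alloc_context -- can be checked with a toy model. The flag values, the zone numbering and both helper bodies below are simplified stand-ins, not the kernel's real encoding.

#include <assert.h>
#include <stdio.h>

typedef unsigned int gfp_t;

/* Illustrative flag bits -- NOT the kernel's real values. */
#define TOY_GFP_DMA       0x01u
#define TOY_GFP_HIGHMEM   0x02u
#define TOY_GFP_MOVABLE   0x04u
#define TOY_GFP_IO        0x40u
#define TOY_GFP_FS        0x80u
#define TOY_GFP_ZONEMASK  (TOY_GFP_DMA | TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE)

/* Toy gfp_zone(): the result depends only on bits inside the zone mask. */
static int toy_gfp_zone(gfp_t flags)
{
        switch (flags & TOY_GFP_ZONEMASK) {
        case TOY_GFP_DMA:       return 0;       /* "ZONE_DMA"     */
        case TOY_GFP_HIGHMEM:   return 2;       /* "ZONE_HIGHMEM" */
        case TOY_GFP_MOVABLE:   return 3;       /* "ZONE_MOVABLE" */
        default:                return 1;       /* "ZONE_NORMAL"  */
        }
}

/* Toy memalloc_noio_flags(): clears the IO bit, never touches the zone mask. */
static gfp_t toy_memalloc_noio_flags(gfp_t flags)
{
        return flags & ~TOY_GFP_IO;
}

int main(void)
{
        gfp_t gfp_mask = TOY_GFP_HIGHMEM | TOY_GFP_IO | TOY_GFP_FS;

        /* high_zoneidx is computed once when the context is set up ... */
        int high_zoneidx = toy_gfp_zone(gfp_mask);

        /* ... and stays valid after the slow path strips the IO bit. */
        assert(toy_gfp_zone(toy_memalloc_noio_flags(gfp_mask)) == high_zoneidx);
        printf("high_zoneidx stays %d\n", high_zoneidx);
        return 0;
}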
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-07 9:11 ` Vlastimil Babka @ 2015-01-07 9:18 ` Michal Hocko 2015-01-07 9:21 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-07 9:18 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 10:11:45, Vlastimil Babka wrote: > On 01/06/2015 03:57 PM, Michal Hocko wrote: > > Hmm, wait a minute > > > > On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: > > [...] > >> -unsigned long try_to_compact_pages(struct zonelist *zonelist, > >> - int order, gfp_t gfp_mask, nodemask_t *nodemask, > >> - enum migrate_mode mode, int *contended, > >> - int alloc_flags, int classzone_idx) > >> +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > >> + int alloc_flags, const struct alloc_context *ac, > >> + enum migrate_mode mode, int *contended) > >> { > >> - enum zone_type high_zoneidx = gfp_zone(gfp_mask); > >> int may_enter_fs = gfp_mask & __GFP_FS; > >> int may_perform_io = gfp_mask & __GFP_IO; > >> struct zoneref *z; > > > > gfp_mask might change since the high_zoneidx was set up in the call > > chain. I guess this shouldn't change to the gfp_zone output but it is > > worth double checking. > > Yeah I checked that. gfp_zone() operates just on GFP_ZONEMASK part of the flags, > and we don't change that. That was my understanding as well. Maybe we want VM_BUG_ON(gfp_zone(gfp_mask) != ac->high_zoneidx); -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters 2015-01-07 9:18 ` Michal Hocko @ 2015-01-07 9:21 ` Vlastimil Babka 0 siblings, 0 replies; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:21 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/07/2015 10:18 AM, Michal Hocko wrote: > On Wed 07-01-15 10:11:45, Vlastimil Babka wrote: >> On 01/06/2015 03:57 PM, Michal Hocko wrote: >> > Hmm, wait a minute >> > >> > On Mon 05-01-15 18:17:42, Vlastimil Babka wrote: >> > [...] >> >> -unsigned long try_to_compact_pages(struct zonelist *zonelist, >> >> - int order, gfp_t gfp_mask, nodemask_t *nodemask, >> >> - enum migrate_mode mode, int *contended, >> >> - int alloc_flags, int classzone_idx) >> >> +unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, >> >> + int alloc_flags, const struct alloc_context *ac, >> >> + enum migrate_mode mode, int *contended) >> >> { >> >> - enum zone_type high_zoneidx = gfp_zone(gfp_mask); >> >> int may_enter_fs = gfp_mask & __GFP_FS; >> >> int may_perform_io = gfp_mask & __GFP_IO; >> >> struct zoneref *z; >> > >> > gfp_mask might change since the high_zoneidx was set up in the call >> > chain. I guess this shouldn't change to the gfp_zone output but it is >> > worth double checking. >> >> Yeah I checked that. gfp_zone() operates just on GFP_ZONEMASK part of the flags, >> and we don't change that. > > That was my understanding as well. Maybe we want > VM_BUG_ON(gfp_zone(gfp_mask) != ac->high_zoneidx); That sounds like an arbitrary overkill to me. I think that gfp_zone() was just used here to save an extra parameter. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka ` (2 preceding siblings ...) 2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka @ 2015-01-05 17:17 ` Vlastimil Babka 2015-01-06 15:09 ` Michal Hocko 3 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-05 17:17 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim, Michal Hocko The function next_zones_zonelist() returns zoneref pointer, as well as zone pointer via extra parameter. Since the latter can be trivially obtained by dereferencing the former, the overhead of the extra parameter is unjustified. This patch thus removes the zone parameter from next_zones_zonelist(). Both callers happen to be in the same header file, so it's simple to add the zoneref dereference inline. We save some bytes of code size. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-96 (-96) function old new delta __alloc_pages_nodemask 2182 2176 -6 nr_free_zone_pages 129 115 -14 get_page_from_freelist 2652 2576 -76 add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10) function old new delta try_to_compact_pages 569 579 +10 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> --- include/linux/mmzone.h | 13 +++++++------ mm/mmzone.c | 4 +--- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 2f0856d..a2884ef 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -970,7 +970,6 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) * @z - The cursor used as a starting point for the search * @highest_zoneidx - The zone index of the highest zone to return * @nodes - An optional nodemask to filter the zonelist with - * @zone - The first suitable zone found is returned via this parameter * * This function returns the next zone at or below a given zone index that is * within the allowed nodemask using a cursor as the starting point for the @@ -980,8 +979,7 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) */ struct zoneref *next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, - nodemask_t *nodes, - struct zone **zone); + nodemask_t *nodes); /** * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist @@ -1000,8 +998,10 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, nodemask_t *nodes, struct zone **zone) { - return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes, - zone); + struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs, + highest_zoneidx, nodes); + *zone = zonelist_zone(z); + return z; } /** @@ -1018,7 +1018,8 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ for (z = 
first_zones_zonelist(zlist, highidx, nodemask, &zone); \ zone; \ - z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \ + z = next_zones_zonelist(++z, highidx, nodemask), \ + zone = zonelist_zone(z)) \ /** * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index diff --git a/mm/mmzone.c b/mm/mmzone.c index bf34fb8..7d87ebb 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -54,8 +54,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes) /* Returns the next zone at or below highest_zoneidx in a zonelist */ struct zoneref *next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, - nodemask_t *nodes, - struct zone **zone) + nodemask_t *nodes) { /* * Find the next suitable zone to use for the allocation. @@ -69,7 +68,6 @@ struct zoneref *next_zones_zonelist(struct zoneref *z, (z->zone && !zref_in_nodemask(z, nodes))) z++; - *zone = zonelist_zone(z); return z; } -- 2.1.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 21+ messages in thread
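The shape of the interface after this patch -- the iterator returns only the cursor, and callers recover the zone by dereferencing it -- in a self-contained toy form. The structures and the termination rule below are simplified stand-ins for the mmzone.h originals, and the macro is deliberately renamed so as not to suggest it is the kernel's.

#include <stdio.h>

struct zone { const char *name; };
struct zoneref { struct zone *zone; int zone_idx; };    /* NULL zone terminates */

static struct zone *toy_zonelist_zone(struct zoneref *z)
{
        return z->zone;         /* trivially recovered from the cursor */
}

/* Advance the cursor to the next entry at or below highest_idx. */
static struct zoneref *toy_next_zones_zonelist(struct zoneref *z, int highest_idx)
{
        while (z->zone && z->zone_idx > highest_idx)
                z++;
        return z;
}

/* "first" is just "next" starting at the head, dereferenced by the caller. */
#define toy_for_each_zone_below(zone, z, refs, highidx)                 \
        for (z = toy_next_zones_zonelist(refs, highidx),                \
             zone = toy_zonelist_zone(z);                               \
             zone;                                                      \
             z = toy_next_zones_zonelist(++z, highidx),                 \
             zone = toy_zonelist_zone(z))

int main(void)
{
        struct zone highmem = { "HighMem" }, normal = { "Normal" }, dma = { "DMA" };
        struct zoneref refs[] = {
                { &highmem, 2 }, { &normal, 1 }, { &dma, 0 }, { NULL, 0 },
        };
        struct zoneref *z;
        struct zone *zone;

        toy_for_each_zone_below(zone, z, refs, 1)       /* skip zones above idx 1 */
                printf("%s\n", zone->name);
        return 0;
}

Whether dropping the out parameter is worth the extra dereferences where zonelist_zone_idx() is used is exactly the trade-off weighed in the follow-ups below.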
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka @ 2015-01-06 15:09 ` Michal Hocko 2015-01-07 9:15 ` Vlastimil Babka 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-06 15:09 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > The function next_zones_zonelist() returns zoneref pointer, as well as zone > pointer via extra parameter. Since the latter can be trivially obtained by > dereferencing the former, the overhead of the extra parameter is unjustified. > > This patch thus removes the zone parameter from next_zones_zonelist(). Both > callers happen to be in the same header file, so it's simple to add the > zoneref dereference inline. We save some bytes of code size. Dunno. It makes first_zones_zonelist and next_zones_zonelist look different which might be a bit confusing. It's not a big deal but I am not sure it is worth it. > add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-96 (-96) > function old new delta > __alloc_pages_nodemask 2182 2176 -6 > nr_free_zone_pages 129 115 -14 > get_page_from_freelist 2652 2576 -76 > > add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10) > function old new delta > try_to_compact_pages 569 579 +10 > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > Cc: "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Michal Hocko <mhocko@suse.cz> > --- > include/linux/mmzone.h | 13 +++++++------ > mm/mmzone.c | 4 +--- > 2 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 2f0856d..a2884ef 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -970,7 +970,6 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) > * @z - The cursor used as a starting point for the search > * @highest_zoneidx - The zone index of the highest zone to return > * @nodes - An optional nodemask to filter the zonelist with > - * @zone - The first suitable zone found is returned via this parameter > * > * This function returns the next zone at or below a given zone index that is > * within the allowed nodemask using a cursor as the starting point for the > @@ -980,8 +979,7 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) > */ > struct zoneref *next_zones_zonelist(struct zoneref *z, > enum zone_type highest_zoneidx, > - nodemask_t *nodes, > - struct zone **zone); > + nodemask_t *nodes); > > /** > * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist > @@ -1000,8 +998,10 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > nodemask_t *nodes, > struct zone **zone) > { > - return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes, > - zone); > + struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs, > + highest_zoneidx, nodes); > + *zone = zonelist_zone(z); > + return z; > } > > /** > @@ -1018,7 +1018,8 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ > for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \ > zone; \ > - z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \ > + z = next_zones_zonelist(++z, highidx, nodemask), \ > + zone = zonelist_zone(z)) \ > > /** > * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index > diff --git a/mm/mmzone.c b/mm/mmzone.c > index bf34fb8..7d87ebb 100644 > --- a/mm/mmzone.c > +++ b/mm/mmzone.c > @@ -54,8 +54,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes) > /* Returns the next zone at or below highest_zoneidx in a zonelist */ > struct zoneref *next_zones_zonelist(struct zoneref *z, > enum zone_type highest_zoneidx, > - nodemask_t *nodes, > - struct zone **zone) > + nodemask_t *nodes) > { > /* > * Find the next suitable zone to use for the allocation. > @@ -69,7 +68,6 @@ struct zoneref *next_zones_zonelist(struct zoneref *z, > (z->zone && !zref_in_nodemask(z, nodes))) > z++; > > - *zone = zonelist_zone(z); > return z; > } > > -- > 2.1.2 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-06 15:09 ` Michal Hocko @ 2015-01-07 9:15 ` Vlastimil Babka 2015-01-07 10:57 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Vlastimil Babka @ 2015-01-07 9:15 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On 01/06/2015 04:09 PM, Michal Hocko wrote: > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: >> The function next_zones_zonelist() returns zoneref pointer, as well as zone >> pointer via extra parameter. Since the latter can be trivially obtained by >> dereferencing the former, the overhead of the extra parameter is unjustified. >> >> This patch thus removes the zone parameter from next_zones_zonelist(). Both >> callers happen to be in the same header file, so it's simple to add the >> zoneref dereference inline. We save some bytes of code size. > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > different which might be a bit confusing. It's not a big deal but > I am not sure it is worth it. Yeah I thought that nobody uses them directly anyway thanks to for_each_zone_zonelist* so it's not a big deal. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-07 9:15 ` Vlastimil Babka @ 2015-01-07 10:57 ` Michal Hocko 2015-01-07 11:17 ` Mel Gorman 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2015-01-07 10:57 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 10:15:39, Vlastimil Babka wrote: > On 01/06/2015 04:09 PM, Michal Hocko wrote: > > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > >> The function next_zones_zonelist() returns zoneref pointer, as well as zone > >> pointer via extra parameter. Since the latter can be trivially obtained by > >> dereferencing the former, the overhead of the extra parameter is unjustified. > >> > >> This patch thus removes the zone parameter from next_zones_zonelist(). Both > >> callers happen to be in the same header file, so it's simple to add the > >> zoneref dereference inline. We save some bytes of code size. > > > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > > different which might be a bit confusing. It's not a big deal but > > I am not sure it is worth it. > > Yeah I thought that nobody uses them directly anyway thanks to > for_each_zone_zonelist* so it's not a big deal. OK, I have checked why we need the whole struct zoneref when it only caches zone_idx. dd1a239f6f2d (mm: have zonelist contains structs with both a zone pointer and zone_idx) claims this will reduce cache contention by reducing pointer chasing because we do not have to dereference pgdat so often in hot paths. Fair enough but I do not see any numbers in the changelog nor in the original discussion (https://lkml.org/lkml/2007/11/20/547 resp. https://lkml.org/lkml/2007/9/28/170). Maybe Mel remembers what was the benchmark which has shown the difference so that we can check whether this is still relevant and caching the index is still worth it. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
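The caching that dd1a239f6f2d introduced, and that the message above asks about, is easy to see in reduced form: deriving the index from the zone itself walks zone -> pgdat -> node_zones, while the copy stored in the zoneref is a plain load from the array the zonelist walk is already touching. The structures below are stripped-down stand-ins for the real ones, kept only to show the two access paths.

#include <stdio.h>

#define TOY_MAX_NR_ZONES 4

struct zone;

/* Stripped-down pg_data_t: just the per-node zone array. */
struct toy_pglist_data {
        struct zone *node_zones;
};

struct zone {
        struct toy_pglist_data *zone_pgdat;
        /* ... the real struct zone carries a lot more state ... */
};

/* Index derived from the zone: zone -> pgdat -> pointer arithmetic. */
static int toy_zone_idx(struct zone *zone)
{
        return (int)(zone - zone->zone_pgdat->node_zones);
}

/* Zonelist entry: caches the index right next to the zone pointer. */
struct toy_zoneref {
        struct zone *zone;
        int zone_idx;
};

static int toy_zonelist_zone_idx(struct toy_zoneref *z)
{
        return z->zone_idx;     /* one load, no pgdat dereference */
}

int main(void)
{
        struct zone zones[TOY_MAX_NR_ZONES];
        struct toy_pglist_data pgdat = { .node_zones = zones };
        struct toy_zoneref ref;
        int i;

        for (i = 0; i < TOY_MAX_NR_ZONES; i++)
                zones[i].zone_pgdat = &pgdat;

        ref.zone = &zones[2];
        ref.zone_idx = toy_zone_idx(ref.zone);  /* filled when the zonelist is built */

        printf("derived %d, cached %d\n",
               toy_zone_idx(ref.zone), toy_zonelist_zone_idx(&ref));
        return 0;
}

Whether the saved dereference is still measurable on current hardware is what the replies below weigh up.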
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-07 10:57 ` Michal Hocko @ 2015-01-07 11:17 ` Mel Gorman 2015-01-07 14:47 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Mel Gorman @ 2015-01-07 11:17 UTC (permalink / raw) To: Michal Hocko Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed, Jan 07, 2015 at 11:57:49AM +0100, Michal Hocko wrote: > On Wed 07-01-15 10:15:39, Vlastimil Babka wrote: > > On 01/06/2015 04:09 PM, Michal Hocko wrote: > > > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > > >> The function next_zones_zonelist() returns zoneref pointer, as well as zone > > >> pointer via extra parameter. Since the latter can be trivially obtained by > > >> dereferencing the former, the overhead of the extra parameter is unjustified. > > >> > > >> This patch thus removes the zone parameter from next_zones_zonelist(). Both > > >> callers happen to be in the same header file, so it's simple to add the > > >> zoneref dereference inline. We save some bytes of code size. > > > > > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > > > different which might be a bit confusing. It's not a big deal but > > > I am not sure it is worth it. > > > > Yeah I thought that nobody uses them directly anyway thanks to > > for_each_zone_zonelist* so it's not a big deal. > > OK, I have checked why we need the whole struct zoneref when it > only caches zone_idx. dd1a239f6f2d (mm: have zonelist contains > structs with both a zone pointer and zone_idx) claims this will > reduce cache contention by reducing pointer chasing because we > do not have to dereference pgdat so often in hot paths. Fair > enough but I do not see any numbers in the changelog nor in the > original discussion (https://lkml.org/lkml/2007/11/20/547 resp. > https://lkml.org/lkml/2007/9/28/170). Maybe Mel remembers what was the > benchmark which has shown the difference so that we can check whether > this is still relevant and caching the index is still worth it. > IIRC, the difference was a few percent on instruction profiles and cache profiles when driven from a systemtap microbenchmark but I no longer have the data and besides it would have been based on an ancient machine by todays standards. When zeroing of pages is taken into account it's going to be marginal so a userspace test would probably show nothing. Still, I see little motivation to replace a single deference with multiple dereferences and pointer arithmetic when zonelist_zone_idx() is called. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V4 4/4] mm: microoptimize zonelist operations 2015-01-07 11:17 ` Mel Gorman @ 2015-01-07 14:47 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2015-01-07 14:47 UTC (permalink / raw) To: Mel Gorman Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, Zhang Yanfei, Minchan Kim, David Rientjes, Rik van Riel, Aneesh Kumar K.V, Kirill A. Shutemov, Johannes Weiner, Joonsoo Kim On Wed 07-01-15 11:17:07, Mel Gorman wrote: > On Wed, Jan 07, 2015 at 11:57:49AM +0100, Michal Hocko wrote: > > On Wed 07-01-15 10:15:39, Vlastimil Babka wrote: > > > On 01/06/2015 04:09 PM, Michal Hocko wrote: > > > > On Mon 05-01-15 18:17:43, Vlastimil Babka wrote: > > > >> The function next_zones_zonelist() returns zoneref pointer, as well as zone > > > >> pointer via extra parameter. Since the latter can be trivially obtained by > > > >> dereferencing the former, the overhead of the extra parameter is unjustified. > > > >> > > > >> This patch thus removes the zone parameter from next_zones_zonelist(). Both > > > >> callers happen to be in the same header file, so it's simple to add the > > > >> zoneref dereference inline. We save some bytes of code size. > > > > > > > > Dunno. It makes first_zones_zonelist and next_zones_zonelist look > > > > different which might be a bit confusing. It's not a big deal but > > > > I am not sure it is worth it. > > > > > > Yeah I thought that nobody uses them directly anyway thanks to > > > for_each_zone_zonelist* so it's not a big deal. > > > > OK, I have checked why we need the whole struct zoneref when it > > only caches zone_idx. dd1a239f6f2d (mm: have zonelist contains > > structs with both a zone pointer and zone_idx) claims this will > > reduce cache contention by reducing pointer chasing because we > > do not have to dereference pgdat so often in hot paths. Fair > > enough but I do not see any numbers in the changelog nor in the > > original discussion (https://lkml.org/lkml/2007/11/20/547 resp. > > https://lkml.org/lkml/2007/9/28/170). Maybe Mel remembers what was the > > benchmark which has shown the difference so that we can check whether > > this is still relevant and caching the index is still worth it. > > > > IIRC, the difference was a few percent on instruction profiles and cache > profiles when driven from a systemtap microbenchmark but I no longer have > the data and besides it would have been based on an ancient machine by > todays standards. When zeroing of pages is taken into account it's going > to be marginal so a userspace test would probably show nothing. Still, > I see little motivation to replace a single deference with multiple > dereferences and pointer arithmetic when zonelist_zone_idx() is called. OK, fair enough. I have tried to convert back to simple zone * and it turned out we wouldn't save too much code so this is really not worth time and possible complications. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, newest: ~2015-01-07 14:54 UTC

Thread overview: 21+ messages
2015-01-05 17:17 [PATCH V4 0/4] Reducing parameters of alloc_pages* family of functions Vlastimil Babka
2015-01-05 17:17 ` [PATCH V4 1/4] mm: set page->pfmemalloc in prep_new_page() Vlastimil Babka
2015-01-06 14:30 ` Michal Hocko
2015-01-06 21:10 ` Vlastimil Babka
2015-01-06 21:44 ` Michal Hocko
2015-01-07 9:36 ` Vlastimil Babka
2015-01-07 10:54 ` Michal Hocko
2015-01-05 17:17 ` [PATCH V4 2/4] mm, page_alloc: reduce number of alloc_pages* functions' parameters Vlastimil Babka
2015-01-06 14:45 ` Michal Hocko
2015-01-05 17:17 ` [PATCH V4 3/4] mm: reduce try_to_compact_pages parameters Vlastimil Babka
2015-01-06 14:53 ` Michal Hocko
2015-01-06 14:57 ` Michal Hocko
2015-01-07 9:11 ` Vlastimil Babka
2015-01-07 9:18 ` Michal Hocko
2015-01-07 9:21 ` Vlastimil Babka
2015-01-05 17:17 ` [PATCH V4 4/4] mm: microoptimize zonelist operations Vlastimil Babka
2015-01-06 15:09 ` Michal Hocko
2015-01-07 9:15 ` Vlastimil Babka
2015-01-07 10:57 ` Michal Hocko
2015-01-07 11:17 ` Mel Gorman
2015-01-07 14:47 ` Michal Hocko