* [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: akpm @ 2014-12-15 23:03 UTC (permalink / raw)
To: linux-mm, akpm, hannes, mel, ming.lei
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
__alloc_pages_nodemask() strips __GFP_IO when retrying the page
allocation. But it does this by altering the function-wide variable
gfp_mask. This will cause subsequent allocation attempts to inadvertently
use the modified gfp_mask.
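To make the failure mode concrete, here is a minimal user-space sketch of the buggy pattern versus the fix. The types and flag values are hypothetical stand-ins, not the kernel's, and the NOIO context is modelled as a plain argument rather than a task flag:

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel types and flags, for illustration only. */
typedef unsigned int gfp_t;
#define __GFP_IO 0x40u
#define __GFP_FS 0x80u

/* Mimics memalloc_noio_flags(): strip I/O-related flags when the task is
 * in PF_MEMALLOC_NOIO context (modelled here as a plain argument). */
static gfp_t memalloc_noio_flags(gfp_t gfp_mask, int noio)
{
	if (noio)
		gfp_mask &= ~(__GFP_IO | __GFP_FS);
	return gfp_mask;
}

/* Buggy shape: the function-wide variable is overwritten, so when control
 * jumps back (the real function does "goto retry_cpuset") the retry sees
 * the stripped mask instead of the caller's original one. */
static gfp_t mask_seen_by_retry_buggy(gfp_t gfp_mask, int noio)
{
	gfp_mask = memalloc_noio_flags(gfp_mask, noio);	/* clobbers the arg */
	return gfp_mask;
}

/* Fixed shape: keep the stripped mask in a local; gfp_mask stays intact
 * for any later retry. */
static gfp_t mask_seen_by_retry_fixed(gfp_t gfp_mask, int noio)
{
	gfp_t mask = memalloc_noio_flags(gfp_mask, noio);
	(void)mask;		/* would be passed to the slowpath */
	return gfp_mask;	/* the original survives */
}
```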
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
+++ a/mm/page_alloc.c
@@ -2918,8 +2918,9 @@ retry_cpuset:
* can deadlock because I/O on the device might not
* complete.
*/
- gfp_mask = memalloc_noio_flags(gfp_mask);
- page = __alloc_pages_slowpath(gfp_mask, order,
+ gfp_t mask = memalloc_noio_flags(gfp_mask);
+
+ page = __alloc_pages_slowpath(mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, classzone_idx, migratetype);
}
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Yasuaki Ishimatsu @ 2014-12-15 23:32 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, hannes, mel, ming.lei
(2014/12/16 8:03), akpm@linux-foundation.org wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>
> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
> allocation. But it does this by altering the function-wide variable
> gfp_mask. This will cause subsequent allocation attempts to inadvertently
> use the modified gfp_mask.
>
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/page_alloc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
> +++ a/mm/page_alloc.c
> @@ -2918,8 +2918,9 @@ retry_cpuset:
> * can deadlock because I/O on the device might not
> * complete.
> */
> - gfp_mask = memalloc_noio_flags(gfp_mask);
> - page = __alloc_pages_slowpath(gfp_mask, order,
> + gfp_t mask = memalloc_noio_flags(gfp_mask);
> +
> + page = __alloc_pages_slowpath(mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, classzone_idx, migratetype);
> }
After allocating the page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
is called. But mask is not passed to it, so trace_mm_page_alloc traces the wrong
gfp_mask.
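This observation can be sketched as follows (hypothetical stand-in types and flag values, not kernel code): with the patch as posted, the slowpath allocates with the stripped local `mask`, but the tracepoint still receives the caller's gfp_mask, so the two disagree whenever the NOIO context removed flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel types and flags. */
typedef unsigned int gfp_t;
#define __GFP_IO 0x40u
#define __GFP_FS 0x80u

static gfp_t memalloc_noio_flags(gfp_t m, int noio)
{
	return noio ? (m & ~(__GFP_IO | __GFP_FS)) : m;
}

struct slowpath_attempt {
	gfp_t used_for_alloc;		/* what __alloc_pages_slowpath() got */
	gfp_t reported_by_trace;	/* what trace_mm_page_alloc() got */
};

/* Models the patched retry branch: the allocation uses the local `mask`,
 * but the tracepoint is still handed the unmodified argument. */
static struct slowpath_attempt retry_branch(gfp_t gfp_mask, int noio)
{
	gfp_t mask = memalloc_noio_flags(gfp_mask, noio);
	struct slowpath_attempt a = {
		.used_for_alloc = mask,
		.reported_by_trace = gfp_mask,	/* stale under NOIO */
	};
	return a;
}
```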
Thanks,
Yasuaki Ishimatsu
> _
>
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Andrew Morton @ 2014-12-15 23:43 UTC (permalink / raw)
To: Yasuaki Ishimatsu; +Cc: linux-mm, hannes, mel, ming.lei
On Tue, 16 Dec 2014 08:32:36 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
> (2014/12/16 8:03), akpm@linux-foundation.org wrote:
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
> >
> > __alloc_pages_nodemask() strips __GFP_IO when retrying the page
> > allocation. But it does this by altering the function-wide variable
> > gfp_mask. This will cause subsequent allocation attempts to inadvertently
> > use the modified gfp_mask.
> >
> > Cc: Ming Lei <ming.lei@canonical.com>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >
> > mm/page_alloc.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
> > --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
> > +++ a/mm/page_alloc.c
> > @@ -2918,8 +2918,9 @@ retry_cpuset:
> > * can deadlock because I/O on the device might not
> > * complete.
> > */
> > - gfp_mask = memalloc_noio_flags(gfp_mask);
> > - page = __alloc_pages_slowpath(gfp_mask, order,
>
> > + gfp_t mask = memalloc_noio_flags(gfp_mask);
> > +
> > + page = __alloc_pages_slowpath(mask, order,
> > zonelist, high_zoneidx, nodemask,
> > preferred_zone, classzone_idx, migratetype);
> > }
>
> After allocating page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
> is called. But mask is not passed to it. So trace_mm_page_alloc traces wrong
> gfp_mask.
Well it was already wrong because the first allocation attempt uses
gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
This?
--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
+++ a/mm/page_alloc.c
@@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
unsigned int cpuset_mems_cookie;
int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
int classzone_idx;
+ gfp_t mask;
gfp_mask &= gfp_allowed_mask;
@@ -2910,23 +2911,24 @@ retry_cpuset:
classzone_idx = zonelist_zone_idx(preferred_zoneref);
/* First allocation attempt */
- page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
- zonelist, high_zoneidx, alloc_flags,
- preferred_zone, classzone_idx, migratetype);
+ mask = gfp_mask|__GFP_HARDWALL;
+ page = get_page_from_freelist(mask, nodemask, order, zonelist,
+ high_zoneidx, alloc_flags, preferred_zone,
+ classzone_idx, migratetype);
if (unlikely(!page)) {
/*
* Runtime PM, block IO and its error handling path
* can deadlock because I/O on the device might not
* complete.
*/
- gfp_t mask = memalloc_noio_flags(gfp_mask);
+ mask = memalloc_noio_flags(gfp_mask);
page = __alloc_pages_slowpath(mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, classzone_idx, migratetype);
}
- trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+ trace_mm_page_alloc(page, order, mask, migratetype);
out:
/*
_
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Yasuaki Ishimatsu @ 2014-12-16 0:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, hannes, mel, ming.lei
(2014/12/16 8:43), Andrew Morton wrote:
> On Tue, 16 Dec 2014 08:32:36 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> (2014/12/16 8:03), akpm@linux-foundation.org wrote:
>>> From: Andrew Morton <akpm@linux-foundation.org>
>>> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>>>
>>> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
>>> allocation. But it does this by altering the function-wide variable
>>> gfp_mask. This will cause subsequent allocation attempts to inadvertently
>>> use the modified gfp_mask.
>>>
>>> Cc: Ming Lei <ming.lei@canonical.com>
>>> Cc: Mel Gorman <mel@csn.ul.ie>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>> ---
>>>
>>> mm/page_alloc.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
>>> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
>>> +++ a/mm/page_alloc.c
>>> @@ -2918,8 +2918,9 @@ retry_cpuset:
>>> * can deadlock because I/O on the device might not
>>> * complete.
>>> */
>>> - gfp_mask = memalloc_noio_flags(gfp_mask);
>>> - page = __alloc_pages_slowpath(gfp_mask, order,
>>
>>> + gfp_t mask = memalloc_noio_flags(gfp_mask);
>>> +
>>> + page = __alloc_pages_slowpath(mask, order,
>>> zonelist, high_zoneidx, nodemask,
>>> preferred_zone, classzone_idx, migratetype);
>>> }
>>
>> After allocating page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
>> is called. But mask is not passed to it. So trace_mm_page_alloc traces wrong
>> gfp_mask.
>
> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
>
> This?
Looks good to me.
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Thanks,
Yasuaki Ishimatsu
>
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> unsigned int cpuset_mems_cookie;
> int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> int classzone_idx;
> + gfp_t mask;
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2910,23 +2911,24 @@ retry_cpuset:
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
> /* First allocation attempt */
> - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> - zonelist, high_zoneidx, alloc_flags,
> - preferred_zone, classzone_idx, migratetype);
> + mask = gfp_mask|__GFP_HARDWALL;
> + page = get_page_from_freelist(mask, nodemask, order, zonelist,
> + high_zoneidx, alloc_flags, preferred_zone,
> + classzone_idx, migratetype);
> if (unlikely(!page)) {
> /*
> * Runtime PM, block IO and its error handling path
> * can deadlock because I/O on the device might not
> * complete.
> */
> - gfp_t mask = memalloc_noio_flags(gfp_mask);
> + mask = memalloc_noio_flags(gfp_mask);
>
> page = __alloc_pages_slowpath(mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, classzone_idx, migratetype);
> }
>
> - trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> + trace_mm_page_alloc(page, order, mask, migratetype);
>
> out:
> /*
> _
>
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Vlastimil Babka @ 2014-12-17 10:47 UTC (permalink / raw)
To: Andrew Morton, Yasuaki Ishimatsu; +Cc: linux-mm, hannes, mel, ming.lei
On 12/16/2014 12:43 AM, Andrew Morton wrote:
> On Tue, 16 Dec 2014 08:32:36 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> (2014/12/16 8:03), akpm@linux-foundation.org wrote:
>>> From: Andrew Morton <akpm@linux-foundation.org>
>>> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>>>
>>> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
>>> allocation. But it does this by altering the function-wide variable
>>> gfp_mask. This will cause subsequent allocation attempts to inadvertently
>>> use the modified gfp_mask.
>>>
>>> Cc: Ming Lei <ming.lei@canonical.com>
>>> Cc: Mel Gorman <mel@csn.ul.ie>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>> ---
>>>
>>> mm/page_alloc.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
>>> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
>>> +++ a/mm/page_alloc.c
>>> @@ -2918,8 +2918,9 @@ retry_cpuset:
>>> * can deadlock because I/O on the device might not
>>> * complete.
>>> */
>>> - gfp_mask = memalloc_noio_flags(gfp_mask);
>>> - page = __alloc_pages_slowpath(gfp_mask, order,
>>
>>> + gfp_t mask = memalloc_noio_flags(gfp_mask);
>>> +
>>> + page = __alloc_pages_slowpath(mask, order,
>>> zonelist, high_zoneidx, nodemask,
>>> preferred_zone, classzone_idx, migratetype);
>>> }
>>
>> After allocating page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
>> is called. But mask is not passed to it. So trace_mm_page_alloc traces wrong
>> gfp_mask.
>
> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
If we wanted to be 100% correct with the tracepoint, then there's also
__alloc_pages_may_oom(), which also appends __GFP_HARDWALL on the fly:
page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,
But fixing that would be ugly :/ I guess it's not worth the trouble.
> This?
>
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> unsigned int cpuset_mems_cookie;
> int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> int classzone_idx;
> + gfp_t mask;
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2910,23 +2911,24 @@ retry_cpuset:
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
> /* First allocation attempt */
> - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> - zonelist, high_zoneidx, alloc_flags,
> - preferred_zone, classzone_idx, migratetype);
> + mask = gfp_mask|__GFP_HARDWALL;
> + page = get_page_from_freelist(mask, nodemask, order, zonelist,
> + high_zoneidx, alloc_flags, preferred_zone,
> + classzone_idx, migratetype);
> if (unlikely(!page)) {
> /*
> * Runtime PM, block IO and its error handling path
> * can deadlock because I/O on the device might not
> * complete.
> */
> - gfp_t mask = memalloc_noio_flags(gfp_mask);
> + mask = memalloc_noio_flags(gfp_mask);
>
> page = __alloc_pages_slowpath(mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, classzone_idx, migratetype);
> }
>
> - trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> + trace_mm_page_alloc(page, order, mask, migratetype);
>
> out:
> /*
> _
>
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: David Rientjes @ 2014-12-18 0:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei
On Mon, 15 Dec 2014, Andrew Morton wrote:
> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> unsigned int cpuset_mems_cookie;
> int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> int classzone_idx;
> + gfp_t mask;
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2910,23 +2911,24 @@ retry_cpuset:
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
> /* First allocation attempt */
> - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> - zonelist, high_zoneidx, alloc_flags,
> - preferred_zone, classzone_idx, migratetype);
> + mask = gfp_mask|__GFP_HARDWALL;
> + page = get_page_from_freelist(mask, nodemask, order, zonelist,
> + high_zoneidx, alloc_flags, preferred_zone,
> + classzone_idx, migratetype);
> if (unlikely(!page)) {
> /*
> * Runtime PM, block IO and its error handling path
> * can deadlock because I/O on the device might not
> * complete.
> */
> - gfp_t mask = memalloc_noio_flags(gfp_mask);
> + mask = memalloc_noio_flags(gfp_mask);
>
> page = __alloc_pages_slowpath(mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, classzone_idx, migratetype);
> }
>
> - trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> + trace_mm_page_alloc(page, order, mask, migratetype);
>
> out:
> /*
I'm not sure I understand why we need a local variable to hold the context
mask vs. what was passed to the function. We should only be allocating
with the single gfp_mask that is passed to the function, modifying it as
necessary; that then becomes the context mask that can be traced.
The above is wrong because it unconditionally sets __GFP_HARDWALL in the
gfp mask for __alloc_pages_slowpath(), when we actually only want that for
the first allocation attempt; it's needed for the implementation of
__cpuset_node_allowed().
The page allocator slowpath is always called from the fastpath if the
first allocation didn't succeed, so we don't know from which we allocated
the page at this tracepoint.
I'm afraid the original code before either of these patches was more
correct. The use of memalloc_noio_flags() for "subsequent allocation
attempts" doesn't really matter since neither __GFP_FS nor __GFP_IO
matters for fastpath allocation (we aren't reclaiming).
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Andrew Morton @ 2014-12-18 0:29 UTC (permalink / raw)
To: David Rientjes; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei
On Wed, 17 Dec 2014 16:22:30 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
> On Mon, 15 Dec 2014, Andrew Morton wrote:
>
> > Well it was already wrong because the first allocation attempt uses
> > gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
> >
> > This?
> >
> > --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> > +++ a/mm/page_alloc.c
> > @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> > unsigned int cpuset_mems_cookie;
> > int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> > int classzone_idx;
> > + gfp_t mask;
> >
> > gfp_mask &= gfp_allowed_mask;
> >
> > @@ -2910,23 +2911,24 @@ retry_cpuset:
> > classzone_idx = zonelist_zone_idx(preferred_zoneref);
> >
> > /* First allocation attempt */
> > - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> > - zonelist, high_zoneidx, alloc_flags,
> > - preferred_zone, classzone_idx, migratetype);
> > + mask = gfp_mask|__GFP_HARDWALL;
> > + page = get_page_from_freelist(mask, nodemask, order, zonelist,
> > + high_zoneidx, alloc_flags, preferred_zone,
> > + classzone_idx, migratetype);
> > if (unlikely(!page)) {
> > /*
> > * Runtime PM, block IO and its error handling path
> > * can deadlock because I/O on the device might not
> > * complete.
> > */
> > - gfp_t mask = memalloc_noio_flags(gfp_mask);
> > + mask = memalloc_noio_flags(gfp_mask);
> >
> > page = __alloc_pages_slowpath(mask, order,
> > zonelist, high_zoneidx, nodemask,
> > preferred_zone, classzone_idx, migratetype);
> > }
> >
> > - trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> > + trace_mm_page_alloc(page, order, mask, migratetype);
> >
> > out:
> > /*
>
> I'm not sure I understand why we need a local variable to hold the context
> mask vs. what was passed to the function. We should only be allocating
> with a single gfp_mask that is passed to the function and modify it as
> necessary, and that becomes the context mask that can be traced.
>
> The above is wrong because it unconditionally sets __GFP_HARDWALL as the
> gfp mask for __alloc_pages_slowpath() when we actually only want that for
> the first allocation attempt, it's needed for the implementation of
> __cpuset_node_allowed().
no,
: /* First allocation attempt */
: mask = gfp_mask|__GFP_HARDWALL;
: page = get_page_from_freelist(mask, nodemask, order, zonelist,
: high_zoneidx, alloc_flags, preferred_zone,
: classzone_idx, migratetype);
: if (unlikely(!page)) {
: /*
: * Runtime PM, block IO and its error handling path
: * can deadlock because I/O on the device might not
: * complete.
: */
: mask = memalloc_noio_flags(gfp_mask);
^^ this
: page = __alloc_pages_slowpath(mask, order,
: zonelist, high_zoneidx, nodemask,
: preferred_zone, classzone_idx, migratetype);
: }
:
: trace_mm_page_alloc(page, order, mask, migratetype);
> The page allocator slowpath is always called from the fastpath if the
> first allocation didn't succeed, so we don't know from which we allocated
> the page at this tracepoint.
True, but the idea is that when we call trace_mm_page_alloc(), local
var `mask' holds the gfp_t which was used in the most recent allocation
attempt.
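The intended semantics can be sketched in a small user-space model (made-up flag values; `fastpath_succeeds` stands in for get_page_from_freelist() finding a page): `mask` ends up holding whatever the most recent attempt actually used.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel types and flags. */
typedef unsigned int gfp_t;
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u
#define __GFP_HARDWALL	0x100u

static gfp_t memalloc_noio_flags(gfp_t m, int noio)
{
	return noio ? (m & ~(__GFP_IO | __GFP_FS)) : m;
}

/* Returns the mask that reaches the tracepoint: after a fast-path success
 * it is gfp_mask|__GFP_HARDWALL; after falling back to the slowpath it is
 * the NOIO-adjusted mask. */
static gfp_t last_attempt_mask(gfp_t gfp_mask, int noio, int fastpath_succeeds)
{
	gfp_t mask = gfp_mask | __GFP_HARDWALL;		/* first attempt */
	if (!fastpath_succeeds)
		mask = memalloc_noio_flags(gfp_mask, noio);	/* slowpath */
	return mask;	/* what trace_mm_page_alloc() would report */
}
```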
> I'm afraid the original code before either of these patches was more
> correct. The use of memalloc_noio_flags() for "subsequent allocation
> attempts" doesn't really matter since neither __GFP_FS nor __GFP_IO
> matters for fastpath allocation (we aren't reclaiming).
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: David Rientjes @ 2014-12-18 0:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei
On Wed, 17 Dec 2014, Andrew Morton wrote:
> > The above is wrong because it unconditionally sets __GFP_HARDWALL as the
> > gfp mask for __alloc_pages_slowpath() when we actually only want that for
> > the first allocation attempt, it's needed for the implementation of
> > __cpuset_node_allowed().
>
> no,
>
> : /* First allocation attempt */
> : mask = gfp_mask|__GFP_HARDWALL;
> : page = get_page_from_freelist(mask, nodemask, order, zonelist,
> : high_zoneidx, alloc_flags, preferred_zone,
> : classzone_idx, migratetype);
> : if (unlikely(!page)) {
> : /*
> : * Runtime PM, block IO and its error handling path
> : * can deadlock because I/O on the device might not
> : * complete.
> : */
> : mask = memalloc_noio_flags(gfp_mask);
>
> ^^ this
>
> : page = __alloc_pages_slowpath(mask, order,
> : zonelist, high_zoneidx, nodemask,
> : preferred_zone, classzone_idx, migratetype);
> : }
> :
> : trace_mm_page_alloc(page, order, mask, migratetype);
>
Sorry, I should have applied the patch locally to look at it.
> > The page allocator slowpath is always called from the fastpath if the
> > first allocation didn't succeed, so we don't know from which we allocated
> > the page at this tracepoint.
>
> True, but the idea is that when we call trace_mm_page_alloc(), local
> var `mask' holds the gfp_t which was used in the most recent allocation
> attempt.
>
So if the fastpath succeeds, which should be the majority of the time,
then we get a tracepoint here that says we allocated with
__GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set. So if
page != NULL, we can know that either the fastpath succeeded or we don't
have PF_MEMALLOC_NOIO and were allowed to reclaim. Not sure that's very
helpful.
Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we
clear everything not in gfp_allowed_mask, but that's pointless if the
fastpath succeeds. I'm not sure it's worth restructuring the code with a
possible performance overhead for the benefit of a tracepoint.
And then there's the call to lockdep_trace_alloc() which does care about
__GFP_FS. That looks broken because we need to clear __GFP_FS with
PF_MEMALLOC_NOIO.
I think that should be
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2871,7 +2873,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
gfp_mask &= gfp_allowed_mask;
- lockdep_trace_alloc(gfp_mask);
+ lockdep_trace_alloc(memalloc_noio_flags(gfp_mask));
might_sleep_if(gfp_mask & __GFP_WAIT);
Wow. I think we should just trace gfp_mask & gfp_allowed_mask and pass a
bit to say whether PF_MEMALLOC_NOIO is set or not.
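That suggestion could look roughly like this sketch (invented names and flag values; the real tracepoint plumbing would differ): record the sanitized caller mask plus one bit for PF_MEMALLOC_NOIO, rather than guessing which internal mask the last attempt happened to use.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel types and flags. */
typedef unsigned int gfp_t;
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u
#define __GFP_HIGHMEM	0x02u

/* Stand-in for the global gfp_allowed_mask. */
static const gfp_t gfp_allowed_mask = __GFP_IO | __GFP_FS | __GFP_HIGHMEM;

struct alloc_trace {
	gfp_t gfp_mask;		/* caller's mask, restricted to allowed bits */
	int memalloc_noio;	/* PF_MEMALLOC_NOIO at allocation time */
};

/* One unambiguous record per allocation: the caller's intent plus the
 * context bit, independent of which attempt produced the page. */
static struct alloc_trace make_trace(gfp_t gfp_mask, int pf_memalloc_noio)
{
	struct alloc_trace t = {
		.gfp_mask = gfp_mask & gfp_allowed_mask,
		.memalloc_noio = pf_memalloc_noio,
	};
	return t;
}
```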
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Andrew Morton @ 2014-12-18 21:10 UTC (permalink / raw)
To: David Rientjes; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei
On Wed, 17 Dec 2014 16:51:21 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
> > > The page allocator slowpath is always called from the fastpath if the
> > > first allocation didn't succeed, so we don't know from which we allocated
> > > the page at this tracepoint.
> >
> > True, but the idea is that when we call trace_mm_page_alloc(), local
> > var `mask' holds the gfp_t which was used in the most recent allocation
> > attempt.
> >
>
> So if the fastpath succeeds, which should be the majority of the time,
> then we get a tracepoint here that says we allocated with
> __GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set. So if
> page != NULL, we can know that either the fastpath succeeded or we don't
> have PF_MEMALLOC_NOIO and were allowed to reclaim. Not sure that's very
> helpful.
>
> Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we
> clear everything not in gfp_allowed_mask, but that's pointless if the
> fastpath succeeds. I'm not sure it's worth to restructure the code with a
> possible performance overhead for the benefit of a tracepoint.
>
> And then there's the call to lockdep_trace_alloc() which does care about
> __GFP_FS. That looks broken because we need to clear __GFP_FS with
> PF_MEMALLOC_NOIO.
<head spinning>
I'm not particularly concerned about the tracepoint and we can change
that later. The main intent here is to restore the allocation mask
when __alloc_pages_nodemask() does the "goto retry_cpuset".
(I renamed `mask' to `alloc_mask' and documented it a bit)
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
__alloc_pages_nodemask() strips __GFP_IO when retrying the page
allocation. But it does this by altering the function-wide variable
gfp_mask. This will cause subsequent allocation attempts to inadvertently
use the modified gfp_mask.
Also, pass the correct mask (the mask we actually used) into
trace_mm_page_alloc().
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
+++ a/mm/page_alloc.c
@@ -2865,6 +2865,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
unsigned int cpuset_mems_cookie;
int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
int classzone_idx;
+ gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
gfp_mask &= gfp_allowed_mask;
@@ -2898,22 +2899,24 @@ retry_cpuset:
classzone_idx = zonelist_zone_idx(preferred_zoneref);
/* First allocation attempt */
- page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
- zonelist, high_zoneidx, alloc_flags,
- preferred_zone, classzone_idx, migratetype);
+ alloc_mask = gfp_mask|__GFP_HARDWALL;
+ page = get_page_from_freelist(alloc_mask, nodemask, order, zonelist,
+ high_zoneidx, alloc_flags, preferred_zone,
+ classzone_idx, migratetype);
if (unlikely(!page)) {
/*
* Runtime PM, block IO and its error handling path
* can deadlock because I/O on the device might not
* complete.
*/
- gfp_mask = memalloc_noio_flags(gfp_mask);
- page = __alloc_pages_slowpath(gfp_mask, order,
+ alloc_mask = memalloc_noio_flags(gfp_mask);
+
+ page = __alloc_pages_slowpath(alloc_mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, classzone_idx, migratetype);
}
- trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+ trace_mm_page_alloc(page, order, alloc_mask, migratetype);
out:
/*
_
* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
From: Vlastimil Babka @ 2015-01-06 17:58 UTC (permalink / raw)
To: Andrew Morton, David Rientjes
Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei, Ingo Molnar,
Peter Zijlstra
On 12/18/2014 10:10 PM, Andrew Morton wrote:
> On Wed, 17 Dec 2014 16:51:21 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
>
>> > > The page allocator slowpath is always called from the fastpath if the
>> > > first allocation didn't succeed, so we don't know from which we allocated
>> > > the page at this tracepoint.
>> >
>> > True, but the idea is that when we call trace_mm_page_alloc(), local
>> > var `mask' holds the gfp_t which was used in the most recent allocation
>> > attempt.
>> >
>>
>> So if the fastpath succeeds, which should be the majority of the time,
>> then we get a tracepoint here that says we allocated with
>> __GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set. So if
>> page != NULL, we can know that either the fastpath succeeded or we don't
>> have PF_MEMALLOC_NOIO and were allowed to reclaim. Not sure that's very
>> helpful.
>>
>> Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we
>> clear everything not in gfp_allowed_mask, but that's pointless if the
>> fastpath succeeds. I'm not sure it's worth restructuring the code with a
>> possible performance overhead for the benefit of a tracepoint.
>>
>> And then there's the call to lockdep_trace_alloc() which does care about
>> __GFP_FS. That looks broken because we need to clear __GFP_FS with
>> PF_MEMALLOC_NOIO.
>
> <head spinning>
>
> I'm not particularly concerned about the tracepoint and we can change
> that later. The main intent here is to restore the allocation mask
> when __alloc_pages_nodemask() does the "goto retry_cpuset".
Hmm, David seems to be correct that the initial get_page_from_freelist() does not
care about __GFP_FS/__GFP_IO, since it won't be reclaiming. Well, there might
be zone reclaim enabled, but __zone_reclaim() seems to call
memalloc_noio_flags() itself, so that's fine.
Anyway, it's subtle and fragile, so even though your patch now seems to affect
just the tracepoint (which you are not so concerned about), it's IMHO better
than the current state. So feel free to add my acked-by.
I also agree that lockdep_trace_alloc() seems broken if it doesn't take
PF_MEMALLOC_NOIO into account, but that could be solved within
lockdep_trace_alloc()? Adding Peter and Ingo to CC.
> (I renamed `mask' to `alloc_mask' and documented it a bit)
>
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>
> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
> allocation. But it does this by altering the function-wide variable
> gfp_mask. This will cause subsequent allocation attempts to inadvertently
> use the modified gfp_mask.
>
> Also, pass the correct mask (the mask we actually used) into
> trace_mm_page_alloc().
>
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: David Rientjes <rientjes@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/page_alloc.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
> +++ a/mm/page_alloc.c
> @@ -2865,6 +2865,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> unsigned int cpuset_mems_cookie;
> int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> int classzone_idx;
> + gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2898,22 +2899,24 @@ retry_cpuset:
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
> /* First allocation attempt */
> - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> - zonelist, high_zoneidx, alloc_flags,
> - preferred_zone, classzone_idx, migratetype);
> + alloc_mask = gfp_mask|__GFP_HARDWALL;
> + page = get_page_from_freelist(alloc_mask, nodemask, order, zonelist,
> + high_zoneidx, alloc_flags, preferred_zone,
> + classzone_idx, migratetype);
> if (unlikely(!page)) {
> /*
> * Runtime PM, block IO and its error handling path
> * can deadlock because I/O on the device might not
> * complete.
> */
> - gfp_mask = memalloc_noio_flags(gfp_mask);
> - page = __alloc_pages_slowpath(gfp_mask, order,
> + alloc_mask = memalloc_noio_flags(gfp_mask);
> +
> + page = __alloc_pages_slowpath(alloc_mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, classzone_idx, migratetype);
> }
>
> - trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> + trace_mm_page_alloc(page, order, alloc_mask, migratetype);
>
> out:
> /*
> _
>
Thread overview: 10+ messages
2014-12-15 23:03 [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask akpm
2014-12-15 23:32 ` Yasuaki Ishimatsu
2014-12-15 23:43 ` Andrew Morton
2014-12-16 0:08 ` Yasuaki Ishimatsu
2014-12-17 10:47 ` Vlastimil Babka
2014-12-18 0:22 ` David Rientjes
2014-12-18 0:29 ` Andrew Morton
2014-12-18 0:51 ` David Rientjes
2014-12-18 21:10 ` Andrew Morton
2015-01-06 17:58 ` Vlastimil Babka