linux-mm.kvack.org archive mirror
* [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
@ 2014-12-15 23:03 akpm
  2014-12-15 23:32 ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 10+ messages in thread
From: akpm @ 2014-12-15 23:03 UTC (permalink / raw)
  To: linux-mm, akpm, hannes, mel, ming.lei

From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask

__alloc_pages_nodemask() strips __GFP_IO when retrying the page
allocation.  But it does this by altering the function-wide variable
gfp_mask.  This will cause subsequent allocation attempts to inadvertently
use the modified gfp_mask.

Cc: Ming Lei <ming.lei@canonical.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
+++ a/mm/page_alloc.c
@@ -2918,8 +2918,9 @@ retry_cpuset:
 		 * can deadlock because I/O on the device might not
 		 * complete.
 		 */
-		gfp_mask = memalloc_noio_flags(gfp_mask);
-		page = __alloc_pages_slowpath(gfp_mask, order,
+		gfp_t mask = memalloc_noio_flags(gfp_mask);
+
+		page = __alloc_pages_slowpath(mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, classzone_idx, migratetype);
 	}
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-15 23:03 [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask akpm
@ 2014-12-15 23:32 ` Yasuaki Ishimatsu
  2014-12-15 23:43   ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-15 23:32 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, hannes, mel, ming.lei

(2014/12/16 8:03), akpm@linux-foundation.org wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>
> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
> allocation.  But it does this by altering the function-wide variable
> gfp_mask.  This will cause subsequent allocation attempts to inadvertently
> use the modified gfp_mask.
>
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>   mm/page_alloc.c |    5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
> +++ a/mm/page_alloc.c
> @@ -2918,8 +2918,9 @@ retry_cpuset:
>   		 * can deadlock because I/O on the device might not
>   		 * complete.
>   		 */
> -		gfp_mask = memalloc_noio_flags(gfp_mask);
> -		page = __alloc_pages_slowpath(gfp_mask, order,

> +		gfp_t mask = memalloc_noio_flags(gfp_mask);
> +
> +		page = __alloc_pages_slowpath(mask, order,
>   				zonelist, high_zoneidx, nodemask,
>   				preferred_zone, classzone_idx, migratetype);
>   	}

After allocating a page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
is called, but the new `mask' is not passed to it, so trace_mm_page_alloc traces
the wrong gfp_mask.

Thanks,
Yasuaki Ishimatsu

> _
>




* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-15 23:32 ` Yasuaki Ishimatsu
@ 2014-12-15 23:43   ` Andrew Morton
  2014-12-16  0:08     ` Yasuaki Ishimatsu
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Morton @ 2014-12-15 23:43 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: linux-mm, hannes, mel, ming.lei

On Tue, 16 Dec 2014 08:32:36 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:

> (2014/12/16 8:03), akpm@linux-foundation.org wrote:
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
> >
> > __alloc_pages_nodemask() strips __GFP_IO when retrying the page
> > allocation.  But it does this by altering the function-wide variable
> > gfp_mask.  This will cause subsequent allocation attempts to inadvertently
> > use the modified gfp_mask.
> >
> > Cc: Ming Lei <ming.lei@canonical.com>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >
> >   mm/page_alloc.c |    5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
> > --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
> > +++ a/mm/page_alloc.c
> > @@ -2918,8 +2918,9 @@ retry_cpuset:
> >   		 * can deadlock because I/O on the device might not
> >   		 * complete.
> >   		 */
> > -		gfp_mask = memalloc_noio_flags(gfp_mask);
> > -		page = __alloc_pages_slowpath(gfp_mask, order,
> 
> > +		gfp_t mask = memalloc_noio_flags(gfp_mask);
> > +
> > +		page = __alloc_pages_slowpath(mask, order,
> >   				zonelist, high_zoneidx, nodemask,
> >   				preferred_zone, classzone_idx, migratetype);
> >   	}
> 
> After allocating page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
> is called. But mask is not passed to it. So trace_mm_page_alloc traces wrong
> gfp_mask.

Well it was already wrong because the first allocation attempt uses
gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.

This?

--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
+++ a/mm/page_alloc.c
@@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
 	unsigned int cpuset_mems_cookie;
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
 	int classzone_idx;
+	gfp_t mask;
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2910,23 +2911,24 @@ retry_cpuset:
 	classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
 	/* First allocation attempt */
-	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-			zonelist, high_zoneidx, alloc_flags,
-			preferred_zone, classzone_idx, migratetype);
+	mask = gfp_mask|__GFP_HARDWALL;
+	page = get_page_from_freelist(mask, nodemask, order, zonelist,
+			high_zoneidx, alloc_flags, preferred_zone,
+			classzone_idx, migratetype);
 	if (unlikely(!page)) {
 		/*
 		 * Runtime PM, block IO and its error handling path
 		 * can deadlock because I/O on the device might not
 		 * complete.
 		 */
-		gfp_t mask = memalloc_noio_flags(gfp_mask);
+		mask = memalloc_noio_flags(gfp_mask);
 
 		page = __alloc_pages_slowpath(mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, classzone_idx, migratetype);
 	}
 
-	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+	trace_mm_page_alloc(page, order, mask, migratetype);
 
 out:
 	/*
_



* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-15 23:43   ` Andrew Morton
@ 2014-12-16  0:08     ` Yasuaki Ishimatsu
  2014-12-17 10:47     ` Vlastimil Babka
  2014-12-18  0:22     ` David Rientjes
  2 siblings, 0 replies; 10+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-16  0:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, hannes, mel, ming.lei

(2014/12/16 8:43), Andrew Morton wrote:
> On Tue, 16 Dec 2014 08:32:36 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> (2014/12/16 8:03), akpm@linux-foundation.org wrote:
>>> From: Andrew Morton <akpm@linux-foundation.org>
>>> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>>>
>>> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
>>> allocation.  But it does this by altering the function-wide variable
>>> gfp_mask.  This will cause subsequent allocation attempts to inadvertently
>>> use the modified gfp_mask.
>>>
>>> Cc: Ming Lei <ming.lei@canonical.com>
>>> Cc: Mel Gorman <mel@csn.ul.ie>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>> ---
>>>
>>>    mm/page_alloc.c |    5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
>>> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
>>> +++ a/mm/page_alloc.c
>>> @@ -2918,8 +2918,9 @@ retry_cpuset:
>>>    		 * can deadlock because I/O on the device might not
>>>    		 * complete.
>>>    		 */
>>> -		gfp_mask = memalloc_noio_flags(gfp_mask);
>>> -		page = __alloc_pages_slowpath(gfp_mask, order,
>>
>>> +		gfp_t mask = memalloc_noio_flags(gfp_mask);
>>> +
>>> +		page = __alloc_pages_slowpath(mask, order,
>>>    				zonelist, high_zoneidx, nodemask,
>>>    				preferred_zone, classzone_idx, migratetype);
>>>    	}
>>
>> After allocating page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
>> is called. But mask is not passed to it. So trace_mm_page_alloc traces wrong
>> gfp_mask.
>
> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
>
> This?

Looks good to me.

Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu

>
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
>   	unsigned int cpuset_mems_cookie;
>   	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
>   	int classzone_idx;
> +	gfp_t mask;
>
>   	gfp_mask &= gfp_allowed_mask;
>
> @@ -2910,23 +2911,24 @@ retry_cpuset:
>   	classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
>   	/* First allocation attempt */
> -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> -			zonelist, high_zoneidx, alloc_flags,
> -			preferred_zone, classzone_idx, migratetype);
> +	mask = gfp_mask|__GFP_HARDWALL;
> +	page = get_page_from_freelist(mask, nodemask, order, zonelist,
> +			high_zoneidx, alloc_flags, preferred_zone,
> +			classzone_idx, migratetype);
>   	if (unlikely(!page)) {
>   		/*
>   		 * Runtime PM, block IO and its error handling path
>   		 * can deadlock because I/O on the device might not
>   		 * complete.
>   		 */
> -		gfp_t mask = memalloc_noio_flags(gfp_mask);
> +		mask = memalloc_noio_flags(gfp_mask);
>
>   		page = __alloc_pages_slowpath(mask, order,
>   				zonelist, high_zoneidx, nodemask,
>   				preferred_zone, classzone_idx, migratetype);
>   	}
>
> -	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> +	trace_mm_page_alloc(page, order, mask, migratetype);
>
>   out:
>   	/*
> _
>




* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-15 23:43   ` Andrew Morton
  2014-12-16  0:08     ` Yasuaki Ishimatsu
@ 2014-12-17 10:47     ` Vlastimil Babka
  2014-12-18  0:22     ` David Rientjes
  2 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2014-12-17 10:47 UTC (permalink / raw)
  To: Andrew Morton, Yasuaki Ishimatsu; +Cc: linux-mm, hannes, mel, ming.lei

On 12/16/2014 12:43 AM, Andrew Morton wrote:
> On Tue, 16 Dec 2014 08:32:36 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> (2014/12/16 8:03), akpm@linux-foundation.org wrote:
>>> From: Andrew Morton <akpm@linux-foundation.org>
>>> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>>>
>>> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
>>> allocation.  But it does this by altering the function-wide variable
>>> gfp_mask.  This will cause subsequent allocation attempts to inadvertently
>>> use the modified gfp_mask.
>>>
>>> Cc: Ming Lei <ming.lei@canonical.com>
>>> Cc: Mel Gorman <mel@csn.ul.ie>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>> ---
>>>
>>>    mm/page_alloc.c |    5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
>>> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
>>> +++ a/mm/page_alloc.c
>>> @@ -2918,8 +2918,9 @@ retry_cpuset:
>>>    		 * can deadlock because I/O on the device might not
>>>    		 * complete.
>>>    		 */
>>> -		gfp_mask = memalloc_noio_flags(gfp_mask);
>>> -		page = __alloc_pages_slowpath(gfp_mask, order,
>>
>>> +		gfp_t mask = memalloc_noio_flags(gfp_mask);
>>> +
>>> +		page = __alloc_pages_slowpath(mask, order,
>>>    				zonelist, high_zoneidx, nodemask,
>>>    				preferred_zone, classzone_idx, migratetype);
>>>    	}
>>
>> After allocating page, trace_mm_page_alloc(page, order, gfp_mask, migratetype)
>> is called. But mask is not passed to it. So trace_mm_page_alloc traces wrong
>> gfp_mask.
>
> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.

If we wanted to be 100% correct with the tracepoint, then there's also 
__alloc_pages_may_oom(), which also appends __GFP_HARDWALL on the fly

         page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,

But fixing that would be ugly :/ I guess it's not worth the trouble.

> This?
>
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
>   	unsigned int cpuset_mems_cookie;
>   	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
>   	int classzone_idx;
> +	gfp_t mask;
>
>   	gfp_mask &= gfp_allowed_mask;
>
> @@ -2910,23 +2911,24 @@ retry_cpuset:
>   	classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
>   	/* First allocation attempt */
> -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> -			zonelist, high_zoneidx, alloc_flags,
> -			preferred_zone, classzone_idx, migratetype);
> +	mask = gfp_mask|__GFP_HARDWALL;
> +	page = get_page_from_freelist(mask, nodemask, order, zonelist,
> +			high_zoneidx, alloc_flags, preferred_zone,
> +			classzone_idx, migratetype);
>   	if (unlikely(!page)) {
>   		/*
>   		 * Runtime PM, block IO and its error handling path
>   		 * can deadlock because I/O on the device might not
>   		 * complete.
>   		 */
> -		gfp_t mask = memalloc_noio_flags(gfp_mask);
> +		mask = memalloc_noio_flags(gfp_mask);
>
>   		page = __alloc_pages_slowpath(mask, order,
>   				zonelist, high_zoneidx, nodemask,
>   				preferred_zone, classzone_idx, migratetype);
>   	}
>
> -	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> +	trace_mm_page_alloc(page, order, mask, migratetype);
>
>   out:
>   	/*
> _
>



* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-15 23:43   ` Andrew Morton
  2014-12-16  0:08     ` Yasuaki Ishimatsu
  2014-12-17 10:47     ` Vlastimil Babka
@ 2014-12-18  0:22     ` David Rientjes
  2014-12-18  0:29       ` Andrew Morton
  2 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2014-12-18  0:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei

On Mon, 15 Dec 2014, Andrew Morton wrote:

> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
> 
> This?
> 
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
>  	unsigned int cpuset_mems_cookie;
>  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
>  	int classzone_idx;
> +	gfp_t mask;
>  
>  	gfp_mask &= gfp_allowed_mask;
>  
> @@ -2910,23 +2911,24 @@ retry_cpuset:
>  	classzone_idx = zonelist_zone_idx(preferred_zoneref);
>  
>  	/* First allocation attempt */
> -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> -			zonelist, high_zoneidx, alloc_flags,
> -			preferred_zone, classzone_idx, migratetype);
> +	mask = gfp_mask|__GFP_HARDWALL;
> +	page = get_page_from_freelist(mask, nodemask, order, zonelist,
> +			high_zoneidx, alloc_flags, preferred_zone,
> +			classzone_idx, migratetype);
>  	if (unlikely(!page)) {
>  		/*
>  		 * Runtime PM, block IO and its error handling path
>  		 * can deadlock because I/O on the device might not
>  		 * complete.
>  		 */
> -		gfp_t mask = memalloc_noio_flags(gfp_mask);
> +		mask = memalloc_noio_flags(gfp_mask);
>  
>  		page = __alloc_pages_slowpath(mask, order,
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, classzone_idx, migratetype);
>  	}
>  
> -	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> +	trace_mm_page_alloc(page, order, mask, migratetype);
>  
>  out:
>  	/*

I'm not sure I understand why we need a local variable to hold the context 
mask vs. what was passed to the function.  We should only be allocating 
with a single gfp_mask that is passed to the function and modify it as 
necessary, and that becomes the context mask that can be traced.

The above is wrong because it unconditionally sets __GFP_HARDWALL as the
gfp mask for __alloc_pages_slowpath() when we actually only want that for
the first allocation attempt; it's needed for the implementation of
__cpuset_node_allowed().

The page allocator slowpath is always called from the fastpath if the
first allocation didn't succeed, so at this tracepoint we don't know which
path the page was allocated from.

I'm afraid the original code before either of these patches was more 
correct.  The use of memalloc_noio_flags() for "subsequent allocation 
attempts" doesn't really matter since neither __GFP_FS nor __GFP_IO 
matters for fastpath allocation (we aren't reclaiming).



* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-18  0:22     ` David Rientjes
@ 2014-12-18  0:29       ` Andrew Morton
  2014-12-18  0:51         ` David Rientjes
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2014-12-18  0:29 UTC (permalink / raw)
  To: David Rientjes; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei

On Wed, 17 Dec 2014 16:22:30 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> On Mon, 15 Dec 2014, Andrew Morton wrote:
> 
> > Well it was already wrong because the first allocation attempt uses
> > gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
> > 
> > This?
> > 
> > --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> > +++ a/mm/page_alloc.c
> > @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> >  	unsigned int cpuset_mems_cookie;
> >  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> >  	int classzone_idx;
> > +	gfp_t mask;
> >  
> >  	gfp_mask &= gfp_allowed_mask;
> >  
> > @@ -2910,23 +2911,24 @@ retry_cpuset:
> >  	classzone_idx = zonelist_zone_idx(preferred_zoneref);
> >  
> >  	/* First allocation attempt */
> > -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> > -			zonelist, high_zoneidx, alloc_flags,
> > -			preferred_zone, classzone_idx, migratetype);
> > +	mask = gfp_mask|__GFP_HARDWALL;
> > +	page = get_page_from_freelist(mask, nodemask, order, zonelist,
> > +			high_zoneidx, alloc_flags, preferred_zone,
> > +			classzone_idx, migratetype);
> >  	if (unlikely(!page)) {
> >  		/*
> >  		 * Runtime PM, block IO and its error handling path
> >  		 * can deadlock because I/O on the device might not
> >  		 * complete.
> >  		 */
> > -		gfp_t mask = memalloc_noio_flags(gfp_mask);
> > +		mask = memalloc_noio_flags(gfp_mask);
> >  
> >  		page = __alloc_pages_slowpath(mask, order,
> >  				zonelist, high_zoneidx, nodemask,
> >  				preferred_zone, classzone_idx, migratetype);
> >  	}
> >  
> > -	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> > +	trace_mm_page_alloc(page, order, mask, migratetype);
> >  
> >  out:
> >  	/*
> 
> I'm not sure I understand why we need a local variable to hold the context 
> mask vs. what was passed to the function.  We should only be allocating 
> with a single gfp_mask that is passed to the function and modify it as 
> necessary, and that becomes the context mask that can be traced.
> 
> The above is wrong because it unconditionally sets __GFP_HARDWALL as the 
> gfp mask for __alloc_pages_slowpath() when we actually only want that for 
> the first allocation attempt, it's needed for the implementation of 
> __cpuset_node_allowed().

no,

: 	/* First allocation attempt */
: 	mask = gfp_mask|__GFP_HARDWALL;
: 	page = get_page_from_freelist(mask, nodemask, order, zonelist,
: 			high_zoneidx, alloc_flags, preferred_zone,
: 			classzone_idx, migratetype);
: 	if (unlikely(!page)) {
: 		/*
: 		 * Runtime PM, block IO and its error handling path
: 		 * can deadlock because I/O on the device might not
: 		 * complete.
: 		 */
: 		mask = memalloc_noio_flags(gfp_mask);

^^ this

: 		page = __alloc_pages_slowpath(mask, order,
: 				zonelist, high_zoneidx, nodemask,
: 				preferred_zone, classzone_idx, migratetype);
: 	}
: 
: 	trace_mm_page_alloc(page, order, mask, migratetype);

> The page allocator slowpath is always called from the fastpath if the 
> first allocation didn't succeed, so we don't know from which we allocated 
> the page at this tracepoint.

True, but the idea is that when we call trace_mm_page_alloc(), local
var `mask' holds the gfp_t which was used in the most recent allocation
attempt.

> I'm afraid the original code before either of these patches was more 
> correct.  The use of memalloc_noio_flags() for "subsequent allocation 
> attempts" doesn't really matter since neither __GFP_FS nor __GFP_IO 
> matters for fastpath allocation (we aren't reclaiming).



* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-18  0:29       ` Andrew Morton
@ 2014-12-18  0:51         ` David Rientjes
  2014-12-18 21:10           ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2014-12-18  0:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei

On Wed, 17 Dec 2014, Andrew Morton wrote:

> > The above is wrong because it unconditionally sets __GFP_HARDWALL as the 
> > gfp mask for __alloc_pages_slowpath() when we actually only want that for 
> > the first allocation attempt, it's needed for the implementation of 
> > __cpuset_node_allowed().
> 
> no,
> 
> : 	/* First allocation attempt */
> : 	mask = gfp_mask|__GFP_HARDWALL;
> : 	page = get_page_from_freelist(mask, nodemask, order, zonelist,
> : 			high_zoneidx, alloc_flags, preferred_zone,
> : 			classzone_idx, migratetype);
> : 	if (unlikely(!page)) {
> : 		/*
> : 		 * Runtime PM, block IO and its error handling path
> : 		 * can deadlock because I/O on the device might not
> : 		 * complete.
> : 		 */
> : 		mask = memalloc_noio_flags(gfp_mask);
> 
> ^^ this
> 
> : 		page = __alloc_pages_slowpath(mask, order,
> : 				zonelist, high_zoneidx, nodemask,
> : 				preferred_zone, classzone_idx, migratetype);
> : 	}
> : 
> : 	trace_mm_page_alloc(page, order, mask, migratetype);
> 

Sorry, I should have applied the patch locally to look at it.

> > The page allocator slowpath is always called from the fastpath if the 
> > first allocation didn't succeed, so we don't know from which we allocated 
> > the page at this tracepoint.
> 
> True, but the idea is that when we call trace_mm_page_alloc(), local
> var `mask' holds the gfp_t which was used in the most recent allocation
> attempt.
> 

So if the fastpath succeeds, which should be the majority of the time, 
then we get a tracepoint here that says we allocated with 
__GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set.  So if 
page != NULL, we can know that either the fastpath succeeded or we don't 
have PF_MEMALLOC_NOIO and were allowed to reclaim.  Not sure that's very 
helpful.

Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we 
clear everything not in gfp_allowed_mask, but that's pointless if the 
fastpath succeeds.  I'm not sure it's worth restructuring the code with a
possible performance overhead for the benefit of a tracepoint.

And then there's the call to lockdep_trace_alloc() which does care about 
__GFP_FS.  That looks broken because we need to clear __GFP_FS with 
PF_MEMALLOC_NOIO.

I think that should be

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2871,7 +2873,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 
 	gfp_mask &= gfp_allowed_mask;
 
-	lockdep_trace_alloc(gfp_mask);
+	lockdep_trace_alloc(memalloc_noio_flags(gfp_mask));
 
 	might_sleep_if(gfp_mask & __GFP_WAIT);
 
Wow.  I think we should just trace gfp_mask & gfp_allowed_mask and pass a 
bit to say whether PF_MEMALLOC_NOIO is set or not.



* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-18  0:51         ` David Rientjes
@ 2014-12-18 21:10           ` Andrew Morton
  2015-01-06 17:58             ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2014-12-18 21:10 UTC (permalink / raw)
  To: David Rientjes; +Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei

On Wed, 17 Dec 2014 16:51:21 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> > > The page allocator slowpath is always called from the fastpath if the 
> > > first allocation didn't succeed, so we don't know from which we allocated 
> > > the page at this tracepoint.
> > 
> > True, but the idea is that when we call trace_mm_page_alloc(), local
> > var `mask' holds the gfp_t which was used in the most recent allocation
> > attempt.
> > 
> 
> So if the fastpath succeeds, which should be the majority of the time, 
> then we get a tracepoint here that says we allocated with 
> __GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set.  So if 
> page != NULL, we can know that either the fastpath succeeded or we don't 
> have PF_MEMALLOC_NOIO and were allowed to reclaim.  Not sure that's very 
> helpful.
> 
> Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we 
> clear everything not in gfp_allowed_mask, but that's pointless if the 
> fastpath succeeds.  I'm not sure it's worth to restructure the code with a 
> possible performance overhead for the benefit of a tracepoint.
> 
> And then there's the call to lockdep_trace_alloc() which does care about 
> __GFP_FS.  That looks broken because we need to clear __GFP_FS with 
> PF_MEMALLOC_NOIO.

<head spinning>

I'm not particularly concerned about the tracepoint and we can change
that later.  The main intent here is to restore the allocation mask
when __alloc_pages_nodemask() does the "goto retry_cpuset".

(I renamed `mask' to `alloc_mask' and documented it a bit)
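
The retry_cpuset control flow can be condensed into a compilable sketch. The flag values and the allocator stubs below are invented; only the shape of the retry logic mirrors the patch: gfp_mask is never written, so a "goto retry_cpuset" re-runs the fast path with the caller's full mask, while alloc_mask always holds whatever mask the most recent attempt actually used.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int gfp_t;
/* made-up flag values, for illustration only */
#define FAKE_GFP_IO		0x40u
#define FAKE_GFP_HARDWALL	0x100u

static int fail_fast_path;		/* knob: force the slowpath */

static int fast_alloc(gfp_t mask) { (void)mask; return !fail_fast_path; }
static int slow_alloc(gfp_t mask) { (void)mask; return 1; }

/* shape of the fixed __alloc_pages_nodemask(): the argument stays
 * intact across retries; alloc_mask is what a tracepoint would see */
static gfp_t alloc(gfp_t gfp_mask)
{
	gfp_t alloc_mask;
	int page;

retry_cpuset:
	alloc_mask = gfp_mask | FAKE_GFP_HARDWALL;
	page = fast_alloc(alloc_mask);
	if (!page) {
		alloc_mask = gfp_mask & ~FAKE_GFP_IO;	/* NOIO stand-in */
		page = slow_alloc(alloc_mask);
	}
	if (!page)
		goto retry_cpuset;	/* gfp_mask is still intact here */
	return alloc_mask;		/* mask of the most recent attempt */
}

int main(void)
{
	fail_fast_path = 0;
	assert(alloc(FAKE_GFP_IO) == (FAKE_GFP_IO | FAKE_GFP_HARDWALL));

	fail_fast_path = 1;
	assert(alloc(FAKE_GFP_IO) == 0u);	/* slowpath mask, IO stripped */
	printf("ok\n");
	return 0;
}
```
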



From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask

__alloc_pages_nodemask() strips __GFP_IO when retrying the page
allocation.  But it does this by altering the function-wide variable
gfp_mask.  This will cause subsequent allocation attempts to inadvertently
use the modified gfp_mask.

Also, pass the correct mask (the mask we actually used) into
trace_mm_page_alloc().

Cc: Ming Lei <ming.lei@canonical.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
+++ a/mm/page_alloc.c
@@ -2865,6 +2865,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
 	unsigned int cpuset_mems_cookie;
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
 	int classzone_idx;
+	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2898,22 +2899,24 @@ retry_cpuset:
 	classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
 	/* First allocation attempt */
-	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-			zonelist, high_zoneidx, alloc_flags,
-			preferred_zone, classzone_idx, migratetype);
+	alloc_mask = gfp_mask|__GFP_HARDWALL;
+	page = get_page_from_freelist(alloc_mask, nodemask, order, zonelist,
+			high_zoneidx, alloc_flags, preferred_zone,
+			classzone_idx, migratetype);
 	if (unlikely(!page)) {
 		/*
 		 * Runtime PM, block IO and its error handling path
 		 * can deadlock because I/O on the device might not
 		 * complete.
 		 */
-		gfp_mask = memalloc_noio_flags(gfp_mask);
-		page = __alloc_pages_slowpath(gfp_mask, order,
+		alloc_mask = memalloc_noio_flags(gfp_mask);
+
+		page = __alloc_pages_slowpath(alloc_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, classzone_idx, migratetype);
 	}
 
-	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+	trace_mm_page_alloc(page, order, alloc_mask, migratetype);
 
 out:
 	/*
_
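To see why clobbering the function-wide mask matters, here is a small
user-space model of the retry loop (hypothetical names throughout; this is
not kernel code). The buggy variant overwrites its gfp-mask argument on the
slow path, so a second pass of the `goto retry_cpuset' loop starts from the
already-narrowed mask; the fixed variant keeps the narrowed mask in its own
local, as the patch does with alloc_mask:

```c
#include <assert.h>

typedef unsigned int gfp_like_t;

#define FLAG_IO       0x1u  /* stand-in for __GFP_IO */
#define FLAG_HARDWALL 0x2u  /* stand-in for __GFP_HARDWALL */

/* Stand-in for memalloc_noio_flags(): drop the IO bit. */
static gfp_like_t noio(gfp_like_t flags)
{
	return flags & ~FLAG_IO;
}

/*
 * Before the fix: the slow path overwrites the function-wide mask, so
 * every later pass of the retry loop starts from the narrowed mask and
 * the fast path silently loses FLAG_IO.  Returns the mask the fast
 * path saw on the final pass.
 */
static gfp_like_t fastpath_mask_buggy(gfp_like_t gfp_mask, int passes)
{
	gfp_like_t fast = 0;
	int i;

	for (i = 0; i < passes; i++) {		/* models goto retry_cpuset */
		fast = gfp_mask | FLAG_HARDWALL; /* first allocation attempt */
		gfp_mask = noio(gfp_mask);	/* slow path clobbers the arg */
	}
	return fast;
}

/*
 * After the fix: the narrowed mask lives in its own local, so each
 * retry derives both masks from the caller's untouched gfp_mask.
 */
static gfp_like_t fastpath_mask_fixed(gfp_like_t gfp_mask, int passes)
{
	gfp_like_t fast = 0;
	int i;

	for (i = 0; i < passes; i++) {
		gfp_like_t alloc_mask = gfp_mask | FLAG_HARDWALL;

		fast = alloc_mask;
		alloc_mask = noio(gfp_mask);	/* used only by the slow path */
		(void)alloc_mask;
	}
	return fast;
}
```

With the buggy variant, a second pass never sees the IO bit on its fast
path; the fixed variant always does.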

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
  2014-12-18 21:10           ` Andrew Morton
@ 2015-01-06 17:58             ` Vlastimil Babka
  0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2015-01-06 17:58 UTC (permalink / raw)
  To: Andrew Morton, David Rientjes
  Cc: Yasuaki Ishimatsu, linux-mm, hannes, mel, ming.lei, Ingo Molnar,
	Peter Zijlstra

On 12/18/2014 10:10 PM, Andrew Morton wrote:
> On Wed, 17 Dec 2014 16:51:21 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
> 
>> > > The page allocator slowpath is always called from the fastpath if the 
>> > > first allocation didn't succeed, so we don't know from which we allocated 
>> > > the page at this tracepoint.
>> > 
>> > True, but the idea is that when we call trace_mm_page_alloc(), local
>> > var `mask' holds the gfp_t which was used in the most recent allocation
>> > attempt.
>> > 
>> 
>> So if the fastpath succeeds, which should be the majority of the time, 
>> then we get a tracepoint here that says we allocated with 
>> __GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set.  So if 
>> page != NULL, we can know that either the fastpath succeeded or we don't 
>> have PF_MEMALLOC_NOIO and were allowed to reclaim.  Not sure that's very 
>> helpful.
>> 
>> Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we 
>> clear everything not in gfp_allowed_mask, but that's pointless if the 
>> fastpath succeeds.  I'm not sure it's worth restructuring the code with a 
>> possible performance overhead for the benefit of a tracepoint.
>> 
>> And then there's the call to lockdep_trace_alloc() which does care about 
>> __GFP_FS.  That looks broken because we need to clear __GFP_FS with 
>> PF_MEMALLOC_NOIO.
> 
> <head spinning>
> 
> I'm not particularly concerned about the tracepoint and we can change
> that later.  The main intent here is to restore the allocation mask
> when __alloc_pages_nodemask() does the "goto retry_cpuset".

Hmm, David seems to be correct that the initial get_page_from_freelist() doesn't
care about __GFP_FS/__GFP_IO, since it won't be reclaiming. Well, there might
be zone reclaim enabled, but __zone_reclaim() seems to call
memalloc_noio_flags() itself, so that case is fine.
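For context on the helper being discussed, here is a user-space model of
memalloc_noio_flags() (the real helper reads current->flags; the names below
are stand-ins, and the exact set of bits cleared has varied across kernel
versions, which is part of the debate above):

```c
#include <assert.h>

typedef unsigned int gfp_like_t;

#define GFPL_IO           0x1u  /* stand-in for __GFP_IO */
#define GFPL_FS           0x2u  /* stand-in for __GFP_FS */
#define PFL_MEMALLOC_NOIO 0x4u  /* stand-in for PF_MEMALLOC_NOIO */

/*
 * Model of memalloc_noio_flags(): when the task has marked itself
 * PF_MEMALLOC_NOIO, strip __GFP_IO from the allocation mask.  The
 * model deliberately leaves __GFP_FS alone, which is the gap David
 * notes above for lockdep_trace_alloc().
 */
static gfp_like_t memalloc_noio_flags_model(gfp_like_t flags,
					    unsigned int task_flags)
{
	if (task_flags & PFL_MEMALLOC_NOIO)
		flags &= ~GFPL_IO;
	return flags;
}
```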

Anyway, it's subtle and fragile, so even though your patch now seems to really
affect just the tracepoint (which you are not so concerned about), it's IMHO
better than the current state. So feel free to add my acked-by.

I also agree that lockdep_trace_alloc() seems broken if it doesn't take
PF_MEMALLOC_NOIO into account, but that could be solved within
lockdep_trace_alloc()? Adding Peter and Ingo to CC.



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-01-06 17:58 UTC | newest]

Thread overview: 10+ messages
2014-12-15 23:03 [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask akpm
2014-12-15 23:32 ` Yasuaki Ishimatsu
2014-12-15 23:43   ` Andrew Morton
2014-12-16  0:08     ` Yasuaki Ishimatsu
2014-12-17 10:47     ` Vlastimil Babka
2014-12-18  0:22     ` David Rientjes
2014-12-18  0:29       ` Andrew Morton
2014-12-18  0:51         ` David Rientjes
2014-12-18 21:10           ` Andrew Morton
2015-01-06 17:58             ` Vlastimil Babka
