From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
linux-mm@kvack.org, hannes@cmpxchg.org, mel@csn.ul.ie,
ming.lei@canonical.com, Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
Date: Tue, 06 Jan 2015 18:58:23 +0100 [thread overview]
Message-ID: <54AC223F.80307@suse.cz> (raw)
In-Reply-To: <20141218131041.76391e96a6bd8b071db45962@linux-foundation.org>
On 12/18/2014 10:10 PM, Andrew Morton wrote:
> On Wed, 17 Dec 2014 16:51:21 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
>
>> > > The page allocator slowpath is always called from the fastpath if the
>> > > first allocation didn't succeed, so we don't know from which we allocated
>> > > the page at this tracepoint.
>> >
>> > True, but the idea is that when we call trace_mm_page_alloc(), local
>> > var `mask' holds the gfp_t which was used in the most recent allocation
>> > attempt.
>> >
>>
>> So if the fastpath succeeds, which should be the majority of the time,
>> then we get a tracepoint here that says we allocated with
>> __GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set. So if
>> page != NULL, we can know that either the fastpath succeeded or we don't
>> have PF_MEMALLOC_NOIO and were allowed to reclaim. Not sure that's very
>> helpful.
>>
>> Easiest thing to do would be to just clear __GFP_FS and __GFP_IO when we
>> clear everything not in gfp_allowed_mask, but that's pointless if the
>> fastpath succeeds. I'm not sure it's worth to restructure the code with a
>> possible performance overhead for the benefit of a tracepoint.
>>
>> And then there's the call to lockdep_trace_alloc() which does care about
>> __GFP_FS. That looks broken because we need to clear __GFP_FS with
>> PF_MEMALLOC_NOIO.
>
> <head spinning>
>
> I'm not particuarly concerned about the tracepoint and we can change
> that later. The main intent here is to restore the allocation mask
> when __alloc_pages_nodemask() does the "goto retry_cpuset".
Hmm David seems to be correct about the initial get_page_from_freelist() not
caring about __GFP_FS/__GFP_IO, since it won't be reclaiming. Well, there might
be zone reclaim enabled, but __zone_reclaim() seems to be calling
memalloc_noio_flags() itself so it's fine.
Anyway it's subtle and fragile, so even though your patch seems now to really
affect just the tracepoint (which you are not so concerned about), it's IMHO
better than current state. So feel free to add my acked-by.
I also agree that lockdep_trace_alloc() seems broken if it doesn't take
PF_MEMALLOC_NOIO into account, but that could be solved within
lockdep_trace_alloc()? Adding Peter and Ingo to CC.
> (I renamed `mask' to `alloc_mask' and documented it a bit)
>
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
>
> __alloc_pages_nodemask() strips __GFP_IO when retrying the page
> allocation. But it does this by altering the function-wide variable
> gfp_mask. This will cause subsequent allocation attempts to inadvertently
> use the modified gfp_mask.
>
> Also, pass the correct mask (the mask we actually used) into
> trace_mm_page_alloc().
>
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: David Rientjes <rientjes@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/page_alloc.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
> +++ a/mm/page_alloc.c
> @@ -2865,6 +2865,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
> unsigned int cpuset_mems_cookie;
> int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> int classzone_idx;
> + gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2898,22 +2899,24 @@ retry_cpuset:
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
> /* First allocation attempt */
> - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> - zonelist, high_zoneidx, alloc_flags,
> - preferred_zone, classzone_idx, migratetype);
> + alloc_mask = gfp_mask|__GFP_HARDWALL;
> + page = get_page_from_freelist(alloc_mask, nodemask, order, zonelist,
> + high_zoneidx, alloc_flags, preferred_zone,
> + classzone_idx, migratetype);
> if (unlikely(!page)) {
> /*
> * Runtime PM, block IO and its error handling path
> * can deadlock because I/O on the device might not
> * complete.
> */
> - gfp_mask = memalloc_noio_flags(gfp_mask);
> - page = __alloc_pages_slowpath(gfp_mask, order,
> + alloc_mask = memalloc_noio_flags(gfp_mask);
> +
> + page = __alloc_pages_slowpath(alloc_mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, classzone_idx, migratetype);
> }
>
> - trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> + trace_mm_page_alloc(page, order, alloc_mask, migratetype);
>
> out:
> /*
> _
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2015-01-06 17:58 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-15 23:03 [patch 2/6] mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask akpm
2014-12-15 23:32 ` Yasuaki Ishimatsu
2014-12-15 23:43 ` Andrew Morton
2014-12-16 0:08 ` Yasuaki Ishimatsu
2014-12-17 10:47 ` Vlastimil Babka
2014-12-18 0:22 ` David Rientjes
2014-12-18 0:29 ` Andrew Morton
2014-12-18 0:51 ` David Rientjes
2014-12-18 21:10 ` Andrew Morton
2015-01-06 17:58 ` Vlastimil Babka [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54AC223F.80307@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=isimatu.yasuaki@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=ming.lei@canonical.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).