Re: [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
Date: Wed, 18 Nov 2015 15:15:29 +0100	[thread overview]
Message-ID: <564C8801.2090202@suse.cz> (raw)
In-Reply-To: <20151110125101.GA8440@dhcp22.suse.cz>

On 11/10/2015 01:51 PM, Michal Hocko wrote:
> On Mon 09-11-15 23:04:15, Vlastimil Babka wrote:
>> On 5.11.2015 17:15, mhocko@kernel.org wrote:
>> > From: Michal Hocko <mhocko@suse.com>
>> > 
>> > __GFP_REPEAT has a rather weak semantic but since it has been introduced
>> > around 2.6.12 it has been ignored for low order allocations. Yet we have
>> > the full kernel tree with its usage for apparently order-0 allocations.
>> > This is really confusing because __GFP_REPEAT is explicitly documented
>> > to allow allocation failures which is a weaker semantic than the current
>> > order-0 has (basically nofail).
>> > 
>> > Let's simply reap out __GFP_REPEAT from those places. This would allow
>> > to identify place which really need allocator to retry harder and
>> > formulate a more specific semantic for what the flag is supposed to do
>> > actually.
>> 
>> So at first I thought "yeah that's obvious", but then after some more thinking,
>> I'm not so sure anymore.
> 
> Thanks for looking into this! The primary purpose of this patch series was
> to start the discussion. I've only now realized I forgot to add RFC, sorry
> about that.
> 
>> I think we should formulate the semantic first, then do any changes. Also, let's
>> look at the flag description (which comes from pre-git):
> 
> It's rather hard to formulate one without examining the current users...

Sure, but changing existing users is a different thing :)

>>  * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
>>  * _might_ fail.  This depends upon the particular VM implementation.
>> 
>> So we say it's implementation detail, and IIRC the same is said about which
>> orders are considered costly and which not, and the associated rules. So, can we
>> blame callers that happen to use __GFP_REPEAT essentially as a no-op in the
>> current implementation? And is it a problem that they do that?
> 
> Well, I think that many users simply copy&pasted the code along with the
> flag. I have failed to find any justification for adding this flag for
> basically all the cases I've checked.
> 
> My understanding is that the overal motivation for the flag was to
> fortify the allocation requests rather than weaken them. But if we were
> literal then __GFP_REPEAT is in fact weaker than GFP_KERNEL for lower
> orders. It is true that the later one is so only implicitly - and as an
> implementation detail.

OK I admit I didn't realize fully that __GFP_REPEAT is supposed to be weaker,
although you did write it quite explicitly in the changelog. It's just
completely counterintuitive given the name of the flag!

> Anyway I think that getting rid of those users which couldn't ever see a
> difference is a good start.
> 
>> So I think we should answer the following questions:
>> 
>> * What is the semantic of __GFP_REPEAT?
>>   - My suggestion would be something like "I would really like this allocation
>> to succeed. I still have some fallback but it's so suboptimal I'd rather wait
>> for this allocation."
> 
> This is very close to the current wording.
> 
>> And then we could e.g. change some heuristics to take that
>> into account - e.g. direct compaction could ignore the deferred state and
>> pageblock skip bits, to make sure it's as thorough as possible. Right now, that
>> sort of happens, but not quite - given enough reclaim/compact attempts, the
>> compact attempts might break out of deferred state. But pages_reclaimed might
>> reach 1 << order before compaction "undefers", and then it breaks out of the
>> loop. Is any such heuristic change possible for reclaim as well?
> 
> I am not familiar with the compaction code enough to comment on this but
> the reclaim part is already having something in should_continue_reclaim.

Yeah in this function having the flag means to try harder. Which is
counterintuitive to the "allow failure" semantics.

> For low order allocations this doesn't make too much of a difference
> because the reclaim is retried anyway from the page allocator path.
> 
>> As part of this question we should also keep in mind/rethink __GFP_NORETRY as
>> that's supposed to be the opposite flag to __GFP_REPEAT.
>> 
>> * Can it ever happen that __GFP_REPEAT could make some difference for order-0?
> 
> Yes, if you want to try hard but eventually allow to fail the request.

Right, I missed the "eventually fail" part.

> Why just not use __GFP_NORETRY for that purpose? Well, that context is
> much weaker. We give up after the first round of the reclaim and we do
> not trigger the OOM killer in that context. __GFP_REPEAT should be about
> finit retrying.
> 
> I am pretty sure there are users who would like to have this semantic.
> None of the current low-order users seem to fall into this cathegory
> AFAICS though.
> 
>>   - Certainly not wrt compaction, how about reclaim?
> 
> We can decide to allow the allocation to fail if reclaim progress was
> not sufficient _and_ the OOM killer haven't helped rather than start
> looping again.

Makes sense.

>>   - If it couldn't possibly affect order-0, then yeah proceed with Patch 1.
> 
> I've split up obviously order-0 from the rest because I think order-0
> are really easy to understand. Patch 1 is a bit harder to grasp but I
> think it should be safe as well. I am open to discussion of course.
> 
>> * Is PAGE_ALLOC_COSTLY_ORDER considered an implementation detail?
> 
> Yes, more or less, but I doubt we can change it much considering all the
> legacy code which might rely on it. I think we should simply remove the
> dependency on the order and act for all orders same semantically.

That's a nice goal, but probably a long-term one? Looks like making failure
possibility the default isn't viable. So that leaves us with making non-failure
the default. Which means checking all currently-costly-oder allocating sites to
pass some of the flags allowing failure.

>>   - I would think so, and if yes, then we probably shouldn't remove
>> __GFP_NORETRY for order-1+ allocations that happen to be not costly in the
>> current implementation?
> 
> I guess you meant __GFP_REPEAT here

Ah, yes.

> but even then we should really think
> about the reason why the flag has been added. Is it to fortify the
> request? If yes it never worked that way so it is hard to justify it
> that way.

Yep, we should definitely think of a less misleading name.

> Thanks!
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
Date: Wed, 18 Nov 2015 15:15:29 +0100	[thread overview]
Message-ID: <564C8801.2090202@suse.cz> (raw)
In-Reply-To: <20151110125101.GA8440@dhcp22.suse.cz>

On 11/10/2015 01:51 PM, Michal Hocko wrote:
> On Mon 09-11-15 23:04:15, Vlastimil Babka wrote:
>> On 5.11.2015 17:15, mhocko@kernel.org wrote:
>> > From: Michal Hocko <mhocko@suse.com>
>> > 
>> > __GFP_REPEAT has a rather weak semantic but since it has been introduced
>> > around 2.6.12 it has been ignored for low order allocations. Yet we have
>> > the full kernel tree with its usage for apparently order-0 allocations.
>> > This is really confusing because __GFP_REPEAT is explicitly documented
>> > to allow allocation failures which is a weaker semantic than the current
>> > order-0 has (basically nofail).
>> > 
>> > Let's simply reap out __GFP_REPEAT from those places. This would allow
>> > to identify place which really need allocator to retry harder and
>> > formulate a more specific semantic for what the flag is supposed to do
>> > actually.
>> 
>> So at first I thought "yeah that's obvious", but then after some more thinking,
>> I'm not so sure anymore.
> 
> Thanks for looking into this! The primary purpose of this patch series was
> to start the discussion. I've only now realized I forgot to add RFC, sorry
> about that.
> 
>> I think we should formulate the semantic first, then do any changes. Also, let's
>> look at the flag description (which comes from pre-git):
> 
> It's rather hard to formulate one without examining the current users...

Sure, but changing existing users is a different thing :)

>>  * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
>>  * _might_ fail.  This depends upon the particular VM implementation.
>> 
>> So we say it's implementation detail, and IIRC the same is said about which
>> orders are considered costly and which not, and the associated rules. So, can we
>> blame callers that happen to use __GFP_REPEAT essentially as a no-op in the
>> current implementation? And is it a problem that they do that?
> 
> Well, I think that many users simply copy&pasted the code along with the
> flag. I have failed to find any justification for adding this flag for
> basically all the cases I've checked.
> 
> My understanding is that the overal motivation for the flag was to
> fortify the allocation requests rather than weaken them. But if we were
> literal then __GFP_REPEAT is in fact weaker than GFP_KERNEL for lower
> orders. It is true that the later one is so only implicitly - and as an
> implementation detail.

OK I admit I didn't realize fully that __GFP_REPEAT is supposed to be weaker,
although you did write it quite explicitly in the changelog. It's just
completely counterintuitive given the name of the flag!

> Anyway I think that getting rid of those users which couldn't ever see a
> difference is a good start.
> 
>> So I think we should answer the following questions:
>> 
>> * What is the semantic of __GFP_REPEAT?
>>   - My suggestion would be something like "I would really like this allocation
>> to succeed. I still have some fallback but it's so suboptimal I'd rather wait
>> for this allocation."
> 
> This is very close to the current wording.
> 
>> And then we could e.g. change some heuristics to take that
>> into account - e.g. direct compaction could ignore the deferred state and
>> pageblock skip bits, to make sure it's as thorough as possible. Right now, that
>> sort of happens, but not quite - given enough reclaim/compact attempts, the
>> compact attempts might break out of deferred state. But pages_reclaimed might
>> reach 1 << order before compaction "undefers", and then it breaks out of the
>> loop. Is any such heuristic change possible for reclaim as well?
> 
> I am not familiar with the compaction code enough to comment on this but
> the reclaim part is already having something in should_continue_reclaim.

Yeah in this function having the flag means to try harder. Which is
counterintuitive to the "allow failure" semantics.

> For low order allocations this doesn't make too much of a difference
> because the reclaim is retried anyway from the page allocator path.
> 
>> As part of this question we should also keep in mind/rethink __GFP_NORETRY as
>> that's supposed to be the opposite flag to __GFP_REPEAT.
>> 
>> * Can it ever happen that __GFP_REPEAT could make some difference for order-0?
> 
> Yes, if you want to try hard but eventually allow to fail the request.

Right, I missed the "eventually fail" part.

> Why just not use __GFP_NORETRY for that purpose? Well, that context is
> much weaker. We give up after the first round of the reclaim and we do
> not trigger the OOM killer in that context. __GFP_REPEAT should be about
> finit retrying.
> 
> I am pretty sure there are users who would like to have this semantic.
> None of the current low-order users seem to fall into this cathegory
> AFAICS though.
> 
>>   - Certainly not wrt compaction, how about reclaim?
> 
> We can decide to allow the allocation to fail if reclaim progress was
> not sufficient _and_ the OOM killer haven't helped rather than start
> looping again.

Makes sense.

>>   - If it couldn't possibly affect order-0, then yeah proceed with Patch 1.
> 
> I've split up obviously order-0 from the rest because I think order-0
> are really easy to understand. Patch 1 is a bit harder to grasp but I
> think it should be safe as well. I am open to discussion of course.
> 
>> * Is PAGE_ALLOC_COSTLY_ORDER considered an implementation detail?
> 
> Yes, more or less, but I doubt we can change it much considering all the
> legacy code which might rely on it. I think we should simply remove the
> dependency on the order and act for all orders same semantically.

That's a nice goal, but probably a long-term one? Looks like making failure
possibility the default isn't viable. So that leaves us with making non-failure
the default. Which means checking all currently-costly-oder allocating sites to
pass some of the flags allowing failure.

>>   - I would think so, and if yes, then we probably shouldn't remove
>> __GFP_NORETRY for order-1+ allocations that happen to be not costly in the
>> current implementation?
> 
> I guess you meant __GFP_REPEAT here

Ah, yes.

> but even then we should really think
> about the reason why the flag has been added. Is it to fortify the
> request? If yes it never worked that way so it is hard to justify it
> that way.

Yep, we should definitely think of a less misleading name.

> Thanks!
>

next prev parent reply	other threads:[~2015-11-18 14:15 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-05 16:15 [PATCH 0/3] __GFP_REPEAT cleanup mhocko
2015-11-05 16:15 ` mhocko
2015-11-05 16:15 ` [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I mhocko
2015-11-05 16:15   ` mhocko
2015-11-09 22:04   ` Vlastimil Babka
2015-11-09 22:04     ` Vlastimil Babka
2015-11-10 12:51     ` Michal Hocko
2015-11-10 12:51       ` Michal Hocko
2015-11-18 14:15       ` Vlastimil Babka [this message]
2015-11-18 14:15         ` Vlastimil Babka
2015-11-27  9:38         ` Michal Hocko
2015-11-27  9:38           ` Michal Hocko
2015-11-28 10:08           ` Michal Hocko
2015-11-28 10:08             ` Michal Hocko
2015-11-30 17:02           ` Vlastimil Babka
2015-11-30 17:02             ` Vlastimil Babka
2015-12-01 16:27             ` Michal Hocko
2015-12-01 16:27               ` Michal Hocko
2015-12-21 12:18               ` Vlastimil Babka
2015-12-21 12:18                 ` Vlastimil Babka
2015-11-05 16:15 ` [PATCH 2/3] tree wide: get rid of __GFP_REPEAT for small order requests mhocko
2015-11-05 16:15   ` mhocko
2015-11-05 16:15   ` mhocko
2015-11-05 16:16 ` [PATCH 3/3] jbd2: get rid of superfluous __GFP_REPEAT mhocko
2015-11-05 16:16   ` mhocko
2015-11-06 16:17   ` [PATCH] " mhocko
2015-11-06 16:17     ` mhocko
2015-11-07  1:22     ` Tetsuo Handa
2015-11-07  1:22       ` Tetsuo Handa
2015-11-08  5:08       ` Theodore Ts'o
2015-11-08  5:08         ` Theodore Ts'o
2015-11-09  8:16         ` Michal Hocko
2015-11-09  8:16           ` Michal Hocko
2015-11-26 15:10           ` Michal Hocko
2015-11-26 15:10             ` Michal Hocko
2015-11-26 20:18             ` Theodore Ts'o
2015-11-26 20:18               ` Theodore Ts'o
2015-11-27  7:56               ` Michal Hocko
2015-11-27  7:56                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=564C8801.2090202@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.