linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
Date: Wed, 18 Nov 2015 15:15:29 +0100	[thread overview]
Message-ID: <564C8801.2090202@suse.cz> (raw)
In-Reply-To: <20151110125101.GA8440@dhcp22.suse.cz>

On 11/10/2015 01:51 PM, Michal Hocko wrote:
> On Mon 09-11-15 23:04:15, Vlastimil Babka wrote:
>> On 5.11.2015 17:15, mhocko@kernel.org wrote:
>> > From: Michal Hocko <mhocko@suse.com>
>> > 
>> > __GFP_REPEAT has a rather weak semantic but since it has been introduced
>> > around 2.6.12 it has been ignored for low order allocations. Yet we have
>> > the full kernel tree with its usage for apparently order-0 allocations.
>> > This is really confusing because __GFP_REPEAT is explicitly documented
>> > to allow allocation failures which is a weaker semantic than the current
>> > order-0 has (basically nofail).
>> > 
>> > Let's simply reap out __GFP_REPEAT from those places. This would allow
>> > to identify place which really need allocator to retry harder and
>> > formulate a more specific semantic for what the flag is supposed to do
>> > actually.
>> 
>> So at first I thought "yeah that's obvious", but then after some more thinking,
>> I'm not so sure anymore.
> 
> Thanks for looking into this! The primary purpose of this patch series was
> to start the discussion. I've only now realized I forgot to add RFC, sorry
> about that.
> 
>> I think we should formulate the semantic first, then do any changes. Also, let's
>> look at the flag description (which comes from pre-git):
> 
> It's rather hard to formulate one without examining the current users...

Sure, but changing existing users is a different thing :)

>>  * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
>>  * _might_ fail.  This depends upon the particular VM implementation.
>> 
>> So we say it's implementation detail, and IIRC the same is said about which
>> orders are considered costly and which not, and the associated rules. So, can we
>> blame callers that happen to use __GFP_REPEAT essentially as a no-op in the
>> current implementation? And is it a problem that they do that?
> 
> Well, I think that many users simply copy&pasted the code along with the
> flag. I have failed to find any justification for adding this flag for
> basically all the cases I've checked.
> 
> My understanding is that the overal motivation for the flag was to
> fortify the allocation requests rather than weaken them. But if we were
> literal then __GFP_REPEAT is in fact weaker than GFP_KERNEL for lower
> orders. It is true that the later one is so only implicitly - and as an
> implementation detail.

OK I admit I didn't realize fully that __GFP_REPEAT is supposed to be weaker,
although you did write it quite explicitly in the changelog. It's just
completely counterintuitive given the name of the flag!

> Anyway I think that getting rid of those users which couldn't ever see a
> difference is a good start.
> 
>> So I think we should answer the following questions:
>> 
>> * What is the semantic of __GFP_REPEAT?
>>   - My suggestion would be something like "I would really like this allocation
>> to succeed. I still have some fallback but it's so suboptimal I'd rather wait
>> for this allocation."
> 
> This is very close to the current wording.
> 
>> And then we could e.g. change some heuristics to take that
>> into account - e.g. direct compaction could ignore the deferred state and
>> pageblock skip bits, to make sure it's as thorough as possible. Right now, that
>> sort of happens, but not quite - given enough reclaim/compact attempts, the
>> compact attempts might break out of deferred state. But pages_reclaimed might
>> reach 1 << order before compaction "undefers", and then it breaks out of the
>> loop. Is any such heuristic change possible for reclaim as well?
> 
> I am not familiar with the compaction code enough to comment on this but
> the reclaim part is already having something in should_continue_reclaim.

Yeah in this function having the flag means to try harder. Which is
counterintuitive to the "allow failure" semantics.

> For low order allocations this doesn't make too much of a difference
> because the reclaim is retried anyway from the page allocator path.
> 
>> As part of this question we should also keep in mind/rethink __GFP_NORETRY as
>> that's supposed to be the opposite flag to __GFP_REPEAT.
>> 
>> * Can it ever happen that __GFP_REPEAT could make some difference for order-0?
> 
> Yes, if you want to try hard but eventually allow to fail the request.

Right, I missed the "eventually fail" part.

> Why just not use __GFP_NORETRY for that purpose? Well, that context is
> much weaker. We give up after the first round of the reclaim and we do
> not trigger the OOM killer in that context. __GFP_REPEAT should be about
> finit retrying.
> 
> I am pretty sure there are users who would like to have this semantic.
> None of the current low-order users seem to fall into this cathegory
> AFAICS though.
> 
>>   - Certainly not wrt compaction, how about reclaim?
> 
> We can decide to allow the allocation to fail if reclaim progress was
> not sufficient _and_ the OOM killer haven't helped rather than start
> looping again.

Makes sense.

>>   - If it couldn't possibly affect order-0, then yeah proceed with Patch 1.
> 
> I've split up obviously order-0 from the rest because I think order-0
> are really easy to understand. Patch 1 is a bit harder to grasp but I
> think it should be safe as well. I am open to discussion of course.
> 
>> * Is PAGE_ALLOC_COSTLY_ORDER considered an implementation detail?
> 
> Yes, more or less, but I doubt we can change it much considering all the
> legacy code which might rely on it. I think we should simply remove the
> dependency on the order and act for all orders same semantically.

That's a nice goal, but probably a long-term one? Looks like making failure
possibility the default isn't viable. So that leaves us with making non-failure
the default. Which means checking all currently-costly-oder allocating sites to
pass some of the flags allowing failure.

>>   - I would think so, and if yes, then we probably shouldn't remove
>> __GFP_NORETRY for order-1+ allocations that happen to be not costly in the
>> current implementation?
> 
> I guess you meant __GFP_REPEAT here

Ah, yes.

> but even then we should really think
> about the reason why the flag has been added. Is it to fortify the
> request? If yes it never worked that way so it is hard to justify it
> that way.

Yep, we should definitely think of a less misleading name.

> Thanks!
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-11-18 14:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-05 16:15 [PATCH 0/3] __GFP_REPEAT cleanup mhocko
2015-11-05 16:15 ` [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I mhocko
2015-11-09 22:04   ` Vlastimil Babka
2015-11-10 12:51     ` Michal Hocko
2015-11-18 14:15       ` Vlastimil Babka [this message]
2015-11-27  9:38         ` Michal Hocko
2015-11-28 10:08           ` Michal Hocko
2015-11-30 17:02           ` Vlastimil Babka
2015-12-01 16:27             ` Michal Hocko
2015-12-21 12:18               ` Vlastimil Babka
2015-11-05 16:15 ` [PATCH 2/3] tree wide: get rid of __GFP_REPEAT for small order requests mhocko
2015-11-05 16:16 ` [PATCH 3/3] jbd2: get rid of superfluous __GFP_REPEAT mhocko
2015-11-06 16:17   ` [PATCH] " mhocko
2015-11-07  1:22     ` Tetsuo Handa
2015-11-08  5:08       ` Theodore Ts'o
2015-11-09  8:16         ` Michal Hocko
2015-11-26 15:10           ` Michal Hocko
2015-11-26 20:18             ` Theodore Ts'o
2015-11-27  7:56               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=564C8801.2090202@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).