From: Mel Gorman <mgorman@techsingularity.net>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
    David Rientjes <rientjes@google.com>,
    Michal Hocko <mhocko@kernel.org>, Linux-MM <linux-mm@kvack.org>,
    LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 12/12] mm, page_alloc: Only enforce watermarks for order-0 allocations
Date: Mon, 21 Sep 2015 11:51:41 +0100
Message-ID: <20150921105141.GB3068@techsingularity.net>
In-Reply-To: <20150918065621.GC7769@js1304-P5Q-DELUXE>
On Fri, Sep 18, 2015 at 03:56:21PM +0900, Joonsoo Kim wrote:
> On Wed, Sep 09, 2015 at 01:39:01PM +0100, Mel Gorman wrote:
> > On Tue, Sep 08, 2015 at 05:26:13PM +0900, Joonsoo Kim wrote:
> > > 2015-08-24 21:30 GMT+09:00 Mel Gorman <mgorman@techsingularity.net>:
> > > > The primary purpose of watermarks is to ensure that reclaim can always
> > > > make forward progress in PF_MEMALLOC context (kswapd and direct reclaim).
> > > > These assume that order-0 allocations are all that is necessary for
> > > > forward progress.
> > > >
> > > > High-order watermarks serve a different purpose. Kswapd had no high-order
> > > > awareness before they were introduced (https://lkml.org/lkml/2004/9/5/9).
> > > > This was particularly important when there were high-order atomic requests.
> > > > The watermarks both gave kswapd awareness and made a reserve for those
> > > > atomic requests.
> > > >
> > > > There are two important side-effects of this. The most important is that
> > > > a non-atomic high-order request can fail even though free pages are available
> > > > and the order-0 watermarks are ok. The second is that high-order watermark
> > > > checks are expensive as the free list counts up to the requested order must
> > > > be examined.
> > > >
> > > > With the introduction of MIGRATE_HIGHATOMIC it is no longer necessary to
> > > > have high-order watermarks. Kswapd and compaction still need high-order
> > > > awareness which is handled by checking that at least one suitable high-order
> > > > page is free.
> > >
> > > I still don't think that this one suitable high-order page is enough.
> > > If fragmentation happens, there would be no order-2 freepage. If kswapd
> > > prepares only 1 order-2 freepage, one of two successive process forks
> > > (AFAIK, fork in x86 and ARM require order 2 page) must go to direct reclaim
> > > to make order-2 freepage. Kswapd cannot make order-2 freepage in that
> > > short time. It causes latency to many high-order freepage requestor
> > > in fragmented situation.
> > >
> >
> > So what do you suggest instead? A fixed number, some other heuristic?
> > You have pushed several times now for the series to focus on the latency
> > of standard high-order allocations but again I will say that it is outside
> > the scope of this series. If you want to take steps to reduce the latency
> > of ordinary high-order allocation requests that can sleep then it should
> > be a separate series.
>
> I don't understand why you think it should be a separate series.
Because atomic high-order allocation success and normal high-order
allocation stall latency are different problems. Atomic high-order
allocation success is about reserves; normal high-order allocation
latency is about reclaim.
> I don't know exact reason why high order watermark check is
> introduced, but, based on your description, it is for high-order
> allocation request in atomic context.
Mostly yes, the initial motivation is described in the linked mail --
give kswapd high-order awareness because otherwise (higher-order && !wait)
allocations that fail would wake kswapd but it would go back to sleep.
> And, it would accidently take care
> about latency.
Except all it does is defer the problem. If kswapd frees N high-order
pages then it disrupts the system to satisfy requests that *may* occur,
potentially reclaiming hot pages to do so, and allocations will still
stall if there are N+1 requests.

Kswapd reclaiming additional pages is definite system disruption and
potentially increases thrashing *now* to help an event that *might* occur
in the future.
> It is used for a long time and your patch try to remove it
> and it only takes care about success rate. That means that your patch
> could cause regression. I think that if this happens actually, it is handled
> in this patchset instead of separate series.
>
Except it doesn't really.

Current situation
  o A high-order watermark check might fail for a normal high-order
    allocation request. On failure, stall to reclaim more pages which may
    or may not succeed
  o An atomic allocation may use a lower watermark but it can still fail
    even if there are free pages on the list

Patched situation
  o A watermark check might fail for a normal high-order allocation
    request and cannot use one of the reserved pages. On failure, stall to
    reclaim more pages which may or may not succeed.
    Functionally, this is very similar to current behaviour
  o An atomic allocation may use the reserves so if a free page exists, it
    will be used
    Functionally, this is more reliable than current behaviour as there is
    still potential for disruption
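
To make the difference between the two checks concrete, here is a minimal
userspace sketch in C. It is a toy model that only loosely follows the shape
of the watermark check and is not the kernel code: toy_zone,
old_watermark_ok() and new_watermark_ok() are hypothetical names, and lowmem
reserves, the ALLOC_* flags, CMA, migrate types and the high-atomic reserve
are all left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER 11	/* assumption: number of buddy orders modelled */

/* Toy zone: nr_free[o] is the number of free blocks of order o. */
struct toy_zone {
	unsigned long nr_free[TOY_MAX_ORDER];
};

/* Total free base pages across all orders. */
static long toy_free_pages(const struct toy_zone *z)
{
	long total = 0;

	for (unsigned int o = 0; o < TOY_MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/*
 * Old-style check (simplified): after the order-0 test, walk every order
 * below the request, discount blocks too small to satisfy it, and compare
 * what is left against a watermark that is halved at each step.
 */
static bool old_watermark_ok(const struct toy_zone *z, unsigned int order,
			     long mark)
{
	long free_pages;
	long min = mark;

	assert(order < TOY_MAX_ORDER);
	free_pages = toy_free_pages(z) - ((1L << order) - 1);

	if (free_pages <= min)
		return false;

	for (unsigned int o = 0; o < order; o++) {
		/* Blocks of this order cannot satisfy the request. */
		free_pages -= (long)(z->nr_free[o] << o);
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

/*
 * New-style check (simplified): enforce only the order-0 watermark, then
 * require that at least one free block of a suitable order exists.
 */
static bool new_watermark_ok(const struct toy_zone *z, unsigned int order,
			     long mark)
{
	long free_pages;

	assert(order < TOY_MAX_ORDER);
	free_pages = toy_free_pages(z) - ((1L << order) - 1);

	if (free_pages <= mark)
		return false;
	if (order == 0)
		return true;

	for (unsigned int o = order; o < TOY_MAX_ORDER; o++) {
		if (z->nr_free[o] > 0)
			return true;
	}
	return false;
}
```

In this model, a zone with 1000 free order-0 pages, a single free order-2
block and a mark of 128 is rejected by old_watermark_ok() for an order-2
request but accepted by new_watermark_ok(). Both agree for order-0 requests,
and both reject the order-2 request once the single order-2 block is gone.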
> In review of previous version, I suggested that removing watermark
> check only for higher than PAGE_ALLOC_COSTLY_ORDER.
It increases complexity for reasons that are not quantified.
> You didn't accept
> that and I still don't agree with your approach. You can show me that
> my concern is wrong via some number.
>
> One candidate test for this is that making system fragmented and
> run hackbench which uses a lot of high-order allocation and measure
> elapsed-time.
>
  o There is no difference in high-order success rates for normal
    allocations with this series applied
  o With the series applied, such tests complete in approximately the same
    time
  o For the tests with parallel high-order allocation requests, there was
    no significant difference in the elapsed times although success rates
    were slightly higher

Each full run of the tests takes about 4 days to complete on this
series and so far no problems of the type you describe have been found.
If such a test case is found then there would be a clear workload to
justify either having kswapd reclaim multiple pages or applying the old
watermark scheme for lower orders.
--
Mel Gorman
SUSE Labs