From: Mel Gorman <mgorman@techsingularity.net>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.com>, Linux-MM <linux-mm@kvack.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
Pintu Kumar <pintu.k@samsung.com>,
Xishi Qiu <qiuxishi@huawei.com>, Gioh Kim <gioh.kim@lge.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand
Date: Fri, 31 Jul 2015 08:11:13 +0100
Message-ID: <20150731071113.GA5840@techsingularity.net>
In-Reply-To: <20150731055407.GA15912@js1304-P5Q-DELUXE>

On Fri, Jul 31, 2015 at 02:54:07PM +0900, Joonsoo Kim wrote:
> Hello, Mel.
>
> On Mon, Jul 20, 2015 at 09:00:18AM +0100, Mel Gorman wrote:
> > From: Mel Gorman <mgorman@suse.de>
> >
> > High-order watermark checking exists for two reasons -- kswapd high-order
> > awareness and protection for high-order atomic requests. Historically we
> > depended on MIGRATE_RESERVE to preserve min_free_kbytes as high-order free
> > pages for as long as possible. This patch introduces MIGRATE_HIGHATOMIC
> > that reserves pageblocks for high-order atomic allocations. This is expected
> > to be more reliable than MIGRATE_RESERVE was.
>
> I have some concerns on this patch.
>
> 1) This patch breaks intention of __GFP_WAIT.
> __GFP_WAIT is used when we want to succeed allocation even if we need
> to do some reclaim/compaction work. That implies importance of
> allocation success. But, reserved pageblock for MIGRATE_HIGHATOMIC makes
> atomic allocation (~__GFP_WAIT) more successful than allocation with
> __GFP_WAIT in many situation. It breaks basic assumption of gfp flags
> and doesn't make any sense.
>
Currently, allocation requests that do not specify __GFP_WAIT get the
ALLOC_HARDER flag, which allows them to dip further into the watermark
reserves. There are already corner cases where a high-order atomic
allocation can succeed when a non-atomic allocation would have to reclaim.

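
A rough userspace sketch of that watermark effect, modelled loosely on the
min-watermark adjustment in __zone_watermark_ok() as of 4.2. The flag
values, the function names and the omission of lowmem_reserve, CMA and the
per-order loop are simplifications assumed for illustration; this is not
the kernel implementation.

```python
# Simplified userspace model of the watermark adjustment; not kernel code.
# Flag values are illustrative.
ALLOC_HARDER = 0x10  # set for requests that cannot sleep (no __GFP_WAIT)
ALLOC_HIGH = 0x20    # set for __GFP_HIGH requests


def effective_min(mark: int, alloc_flags: int) -> int:
    """Return the free-page threshold the request has to stay above."""
    min_pages = mark
    if alloc_flags & ALLOC_HIGH:
        min_pages -= min_pages // 2
    if alloc_flags & ALLOC_HARDER:
        min_pages -= min_pages // 4
    return min_pages


def watermark_ok(free_pages: int, order: int, mark: int, alloc_flags: int) -> bool:
    """Check whether the zone stays above the (possibly lowered) threshold."""
    return free_pages - ((1 << order) - 1) > effective_min(mark, alloc_flags)


# Same zone state, same order; only the allocation context differs.
print(watermark_ok(900, 0, 1000, 0))             # request that may sleep
print(watermark_ok(900, 0, 1000, ALLOC_HARDER))  # request that may not sleep
```

With a min watermark of 1000 pages and 900 pages free, the plain request
fails the check and would go on to reclaim, while the ALLOC_HARDER request
passes against the lowered threshold of 750.
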
> 2) Who care about success of high-order atomic allocation with this
> reliability?

Historically, network configurations with large MTUs that could not use
scatter/gather. These days the network stack also attempts atomic order-3
allocations to reduce overhead, and SLUB attempts atomic high-order
allocations for the same reason. This is why MIGRATE_RESERVE exists at all,
so the intent of the patch is to preserve what MIGRATE_RESERVE was for but
do it better.

> In case of allocation without __GFP_WAIT, requestor prepares sufficient
> fallback method. They just want to success if it is easily successful.
> They don't want to succeed allocation with paying great cost that slow
> down general workload by this patch that can be accidentally reserve
> too much memory.
>
Not necessarily true. In the historical case, the network request was atomic
because it came from IRQ context and could not sleep.

> > A MIGRATE_HIGHORDER pageblock is created when an allocation request steals
> > a pageblock but limits the total number to 10% of the zone.
>
> When steals happens, pageblock already can be fragmented and we can't
> fully utilize this pageblock without allowing order-0 allocation. This
> is very waste.
>
If the pageblock was stolen, it implies there was at least 1 usable page
of the correct order. As the pageblock is then reserved, any pages that
are later freed in that block stay free for use by high-order atomic
allocations. Otherwise, the number of reserved pageblocks will increase
again until the 10% limit is hit.

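
The reservation cap can be pictured with a small model. This is only a
sketch: the Zone class, its field and method names and the 512-page
pageblock size are assumptions for illustration, and the patch itself does
this accounting inside the page allocator.

```python
PAGEBLOCK_NR_PAGES = 512  # assumption: order-9 pageblocks of 4 KiB pages


class Zone:
    """Hypothetical stand-in for the zone's high-atomic reserve accounting."""

    def __init__(self, managed_pages: int) -> None:
        self.managed_pages = managed_pages
        self.nr_reserved_highatomic = 0

    def reserve_pageblock(self) -> bool:
        """Reserve one more pageblock unless that would exceed 10% of the zone."""
        limit = self.managed_pages // 10
        if self.nr_reserved_highatomic + PAGEBLOCK_NR_PAGES > limit:
            return False
        self.nr_reserved_highatomic += PAGEBLOCK_NR_PAGES
        return True

    def unreserve_pageblock(self) -> bool:
        """Give one pageblock back, e.g. after an allocation fails post-reclaim."""
        if self.nr_reserved_highatomic < PAGEBLOCK_NR_PAGES:
            return False
        self.nr_reserved_highatomic -= PAGEBLOCK_NR_PAGES
        return True


zone = Zone(managed_pages=10240)  # limit is 1024 pages, i.e. two pageblocks
results = [zone.reserve_pageblock() for _ in range(3)]
print(results, zone.nr_reserved_highatomic)
```

In this model the third reservation is refused, so the reserve stops
growing once the limit is reached and only shrinks again when a pageblock
is unreserved.
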
> > The pageblocks are unreserved if an allocation fails after a direct
> > reclaim attempt.
> >
> > The watermark checks account for the reserved pageblocks when the allocation
> > request is not a high-order atomic allocation.
> >
> > The stutter benchmark was used to evaluate this but while it was running
> > there was a systemtap script that randomly allocated between 1 and 1G worth
> > of order-3 pages using GFP_ATOMIC. In kernel 4.2-rc1 running this workload
> > on a single-node machine there were 339574 allocation failures. With this
> > patch applied there were 28798 failures -- a 92% reduction. On a 4-node
> > machine, allocation failures went from 76917 to 0 failures.
>
> There is some missing information to justify benchmark result.
> Especially, I'd like to know:
>
> 1) Detailed system setup (CPU, MEMORY, etc...)

CPUs were 8 core Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz with 8G of RAM.

> 2) Total number of attempt of GFP_ATOMIC allocation request
>
Each attempt allocated a random amount between 1 and 1G worth of order-3
pages, as already described.

> I don't know how you modify stutter benchmark in mmtests but it
> looks like there is no delay when continually requesting GFP_ATOMIC
> allocation.
> 1G of order-3 allocation request without delay seems insane
> to me. Could you tell me how you modify that benchmark for this patch?
>
The stutter benchmark was not modified. The watch-stress-highorder-atomic
monitor was run in parallel and that's what is doing the allocation. It's
true that up to 1G of order-3 allocations without delay would be insane
in a normal situation. The point was to show an extreme case where atomic
allocations were used and to test whether the reserves held up or not.

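
To put the numbers in scale, a quick back-of-the-envelope calculation. It
assumes 4 KiB base pages and reads "1G" as 1 GiB; the failure counts are
the single-node figures from the changelog quoted above.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB base pages
ORDER = 3
ONE_G = 1 << 30    # assumption: "1G" read as 1 GiB

bytes_per_alloc = PAGE_SIZE << ORDER           # 32 KiB per order-3 page
max_allocs_per_burst = ONE_G // bytes_per_alloc

failures_before = 339574  # 4.2-rc1, single-node machine
failures_after = 28798    # same workload with the patch applied
reduction = (failures_before - failures_after) / failures_before

print(max_allocs_per_burst)
print(f"{reduction:.1%}")
```

Under those assumptions a single burst is at most 32768 order-3
allocations, and the single-node failure count drops by about 91.5%, which
the changelog rounds to 92%.
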
> > There are minor theoritical side-effects. If the system is intensively
> > making large numbers of long-lived high-order atomic allocations then
> > there will be a lot of reserved pageblocks. This may push some workloads
> > into reclaim until the number of reserved pageblocks is reduced again. This
> > problem was not observed in reclaim intensive workloads but such workloads
> > are also not atomic high-order intensive.
>
> I don't think this is theoritical side-effects. It can happen easily.
> Recently, network subsystem makes some of their high-order allocation
> request ~__GFP_WAIT (fb05e7a89f50: net: don't wait for order-3 page
> allocation). And, I've submitted similar patch for slub today
> (mm/slub: don't wait for high-order page allocation). That
> makes system atomic high-order allocation request more and this side-effect
> can be possible in many situation.
>
The key is long-lived allocations. The network subsystem frees theirs. I
was not able to trigger a situation in a variety of workloads where this
happened, which is why I classified it as theoretical.

--
Mel Gorman
SUSE Labs