Re: [RFC PATCH 00/10] redesign compaction algorithm

From: Mel Gorman <mgorman@suse.de>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Vlastimil Babka <vbabka@suse.cz>, Rik van Riel <riel@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Minchan Kim <minchan@kernel.org>
Subject: Re: [RFC PATCH 00/10] redesign compaction algorithm
Date: Thu, 25 Jun 2015 19:41:35 +0100
Message-ID: <20150625184135.GB26927@suse.de>
In-Reply-To: <CAAmzW4PMWOaAa0bd7xVr5Jz=xVgqMw8G=UFOwhUGuyLL9EFbHA@mail.gmail.com>

On Fri, Jun 26, 2015 at 03:14:39AM +0900, Joonsoo Kim wrote:
> > It could though. Reclaim/compaction is entered for orders higher than
> > PAGE_ALLOC_COSTLY_ORDER and when scan priority is sufficiently high.
> > That could be adjusted if you have a viable case where orders <
> > PAGE_ALLOC_COSTLY_ORDER must succeed and currently requires excessive
> > reclaim instead of relying on compaction.
> 
> Yes. I saw this problem in real situation. In ARM, order-2 allocation
> is requested in fork(), so it should be succeed. But, there is not
> enough order-2 freepage, so reclaim/compaction begins. Compaction fails
> repeatedly although I didn't check exact reason.

The reason compaction fails there should be identified and repaired
prior to reimplementing compaction because it's important.

> >> >> 3) Compaction capability is highly depends on migratetype of memory,
> >> >> because freepage scanner doesn't scan unmovable pageblock.
> >> >>
> >> >
> >> > For a very good reason. Unmovable allocation requests that fall back to
> >> > other pageblocks are the worst in terms of fragmentation avoidance. The
> >> > more of these events there are, the more the system will decay. If there
> >> > are many of these events then a compaction benchmark may start with high
> >> > success rates but decay over time.
> >> >
> >> > Very broadly speaking, the more the mm_page_alloc_extfrag tracepoint
> >> > triggers with alloc_migratetype == MIGRATE_UNMOVABLE, the faster the
> >> > system is decaying. Having the freepage scanner select unmovable
> >> > pageblocks will trigger this event more frequently.
> >> >
> >> > The unfortunate impact is that selecting unmovable blocks from the free
> >> > scanner will improve compaction success rates for high-order kernel
> >> > allocations early in the lifetime of the system but later fail high-order
> >> > allocation requests as more pageblocks get converted to unmovable. It
> >> > might be ok for kernel allocations but THP will eventually have a 100%
> >> > failure rate.
> >>
> >> I wrote rationale in the patch itself. We already use non-movable pageblock
> >> for migration scanner. It empties non-movable pageblock so number of
> >> freepage on non-movable pageblock will increase. Using non-movable
> >> pageblock for freepage scanner negates this effect so number of freepage
> >> on non-movable pageblock will be balanced. Could you tell me in detail
> >> how freepage scanner select unmovable pageblocks will cause
> >> more fragmentation? Possibly, I don't understand effect of this patch
> >> correctly and need some investigation. :)
> >>
> >
> > The long-term success rate of fragmentation avoidance depends on
> > minimising the number of UNMOVABLE allocation requests that use a
> > pageblock belonging to another migratetype. Once such a fallback occurs,
> > that pageblock potentially can never be used for a THP allocation again.
> >
> > Let's say there is an unmovable pageblock with 500 free pages in it. If
> > the freepage scanner uses that pageblock and allocates all 500 free
> > pages then the next unmovable allocation request needs a new pageblock.
> > If one is not completely free then it will fall back to using a
> > RECLAIMABLE or MOVABLE pageblock, forever contaminating it.
> 
> Yes, I can imagine that situation. But, as I said above, we already use
> non-movable pageblock for migration scanner. While unmovable
> pageblock with 500 free pages fills, some other unmovable pageblock
> with some movable pages will be emptied. Number of freepage
> on non-movable would be maintained so fallback doesn't happen.
> 
> Anyway, it is better to investigate this effect. I will do it and attach
> result on next submission.
> 

Let's say we have X unmovable pageblocks and Y pageblocks overall. If the
migration scanner takes movable pages from X then there is more space for
unmovable allocations without having to increase X -- this is good. If
the free scanner uses the X pageblocks as targets then they can fill. The
next unmovable allocation then falls back to another pageblock and we
either have X+1 unmovable pageblocks (full steal) or a mixed pageblock
(partial steal) that cannot be used for THP. Do this enough times and
X == Y and all THP allocations fail.
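
As a rough illustration of the X/Y argument above, here is a toy model in
Python. It is not kernel code and is only a sketch under stated
assumptions: the names simulate() and PAGES_PER_BLOCK are hypothetical,
pageblocks are assumed to hold 512 pages and start half full, every
fallback is treated as a full steal, and the free scanner is assumed to
fill the unmovable pageblocks completely each round (the worst case,
ignoring any pages the migration scanner frees).

```python
PAGES_PER_BLOCK = 512  # assumption: 2 MiB pageblock made of 4 KiB pages


def simulate(total_blocks, unmovable_blocks, rounds, free_scanner_uses_unmovable):
    """Return X, the number of unmovable pageblocks, after `rounds` rounds."""
    # Free pages left in each unmovable pageblock (assumed to start half full).
    free = [PAGES_PER_BLOCK // 2] * unmovable_blocks
    for _ in range(rounds):
        if free_scanner_uses_unmovable:
            # Worst case: the free scanner uses every free page in the
            # unmovable pageblocks as a migration target.
            free = [0] * len(free)
        # One single-page unmovable allocation arrives.
        for i, pages in enumerate(free):
            if pages > 0:
                free[i] -= 1
                break
        else:
            # No room in any unmovable pageblock: fall back and convert a
            # whole pageblock of another type ("full steal"), if any is left.
            if len(free) < total_blocks:
                free.append(PAGES_PER_BLOCK - 1)
    return len(free)


if __name__ == "__main__":
    y = 16
    for uses_unmovable in (False, True):
        x = simulate(y, 4, 100, uses_unmovable)
        print(f"free scanner uses unmovable={uses_unmovable}: "
              f"X={x}, Y={y}, THP candidates={y - x}")
```

In this model X stays at its starting value when the free scanner leaves
unmovable pageblocks alone, and grows until X == Y when it does not. How
closely a real workload approaches that worst case is exactly what needs
to be measured.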

-- 
Mel Gorman
SUSE Labs
