From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>, Joonsoo Kim <js1304@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: Regression in mobility grouping?
Date: Wed, 28 Sep 2016 11:39:25 -0400
Message-ID: <20160928153925.GA24966@cmpxchg.org>
In-Reply-To: <8c3b7dd8-ef6f-6666-2f60-8168d41202cf@suse.cz>

Hi Vlastimil,

On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> > Hi guys,
> >
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> >
> > Number of blocks type  Unmovable  Reclaimable  Movable  Reserve  Isolate
> > Node 1, zone Normal          815          433    31518        2        0
> >
> > and on 4.0 like this:
> >
> > Number of blocks type  Unmovable  Reclaimable  Movable  Reserve  CMA  Isolate
> > Node 1, zone Normal         3880         3530    25356        2    0        0
>
> It's worth to keep in mind that this doesn't reflect where the actual
> unmovable pages reside. It might be that in 3.10 they are spread within
> the movable pages. IIRC enabling page_owner (not sure if in 4.0, there
> were some later fixes I think) can augment pagetypeinfo with at least
> some statistics of polluted pageblocks.

Thanks, I'll look at the mixed block counts. I failed to make clear:
we saw that issue in the switch from 3.10 to 4.0, and I mentioned
those two kernels as last known good / first known bad. But later
kernels - we tried with 4.6 - look the same. This appears to be a
regression in (higher-order) allocation service quality somewhere
after 3.10 that persists into current kernels.

> Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory
> there should be allocated and if it would fill the respective
> pageblocks, or if they are poorly utilized?

They are very poorly utilized. On a machine where anon and cache pages
alone account for 90% of memory, we saw 50% of the page blocks marked
unmovable.
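
For reference, the block-type shares in the pagetypeinfo summary lines
quoted above can be worked out with a small script. This is only a
sketch: block_shares() is a hypothetical helper, not an existing tool,
and it assumes the two-line "Number of blocks type" layout shown in
this thread.

```python
def block_shares(header, counts):
    """Map each migratetype to (block count, share of all blocks in the zone)."""
    names = header.split()[4:]  # drop the "Number of blocks type" label
    # Keep only the trailing numeric columns, dropping "Node N, zone X".
    values = [int(v) for v in counts.split()[-len(names):]]
    total = sum(values)
    return {name: (n, n / total) for name, n in zip(names, values)}


# Summary lines as quoted earlier in this thread.
old = block_shares(
    "Number of blocks type Unmovable Reclaimable Movable Reserve Isolate",
    "Node 1, zone Normal 815 433 31518 2 0",
)
new = block_shares(
    "Number of blocks type Unmovable Reclaimable Movable Reserve CMA Isolate",
    "Node 1, zone Normal 3880 3530 25356 2 0 0",
)

for name in ("Unmovable", "Reclaimable", "Movable"):
    print(f"{name:12s} 3.10: {old[name][1]:6.1%}   4.0: {new[name][1]:6.1%}")
```

On the quoted numbers, both kernels report 32768 pageblocks in that
zone. Unmovable goes from roughly 2.5% of blocks on 3.10 to roughly
11.8% on 4.0, and reclaimable from roughly 1.3% to roughly 10.8%.
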
> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> >
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> >
> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> >
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
>
> Check also the changelogs for mentions of earlier commits, e.g. 99592d5
> should be restoring behavior that changed in 3.12-3.13 and you are
> upgrading from 3.10.

Good point.

> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, and so
> > less falling back and stealing freepages on behalf of movable, won't
> > this mean that we could expect exactly that result - growing numbers
> > of unmovable blocks, while rarely stealing them back in movable alloc
> > fallbacks? And the expansion of !MOVABLE blocks would over time make
> > compaction less and less effective too, seeing as it doesn't consider
> > anything !MOVABLE suitable migration targets?
>
> Yeah this is an issue with compaction that was brought up recently and I
> want to tackle next.

Agreed, it would be nice if compaction could reclaim unmovable and
reclaimable blocks whose polluting allocations have since been freed.

But there is a limit to how lazy mobility grouping can be and still
expect compaction to fix it up. If 50% of the page blocks are marked
unmovable, we don't pack incoming polluting allocations. When spread
out the right way, even just a few of those can have a devastating
impact on overall compactability.

So regardless of future compaction improvements, we need to get
anti-frag accuracy in the allocator closer to 3.10 levels again.
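
To put rough numbers on the packing argument, here is a toy model. It
is not a description of what the allocator actually does. It assumes
512 pages per pageblock (order-9 blocks with 4K pages, as is typical
on x86-64) and a hypothetical 16384 long-lived unmovable pages; only
the 32768 block total comes from the pagetypeinfo output above.
polluted_blocks() is a made-up helper.

```python
import math

PAGES_PER_BLOCK = 512    # assumption: order-9 pageblocks, 4K pages
TOTAL_BLOCKS = 32768     # sum of the block counts quoted in this thread
UNMOVABLE_PAGES = 16384  # hypothetical number of long-lived unmovable pages


def polluted_blocks(pages, packed):
    """Count pageblocks that hold at least one unmovable page.

    packed=True fills blocks completely before touching the next one;
    packed=False puts every page into a different block while blocks last.
    """
    if packed:
        blocks = math.ceil(pages / PAGES_PER_BLOCK)
    else:
        blocks = pages
    return min(blocks, TOTAL_BLOCKS)


share_of_memory = UNMOVABLE_PAGES / (TOTAL_BLOCKS * PAGES_PER_BLOCK)
best = polluted_blocks(UNMOVABLE_PAGES, packed=True)
worst = polluted_blocks(UNMOVABLE_PAGES, packed=False)

print(f"unmovable pages as share of zone: {share_of_memory:.2%}")
print(f"blocks polluted when packed:      {best} ({best / TOTAL_BLOCKS:.2%})")
print(f"blocks polluted when spread out:  {worst} ({worst / TOTAL_BLOCKS:.2%})")
```

In this model, pages amounting to about 0.1% of the zone pin 32 blocks
when packed, but half of all blocks when each one lands in a different
block.
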
> > Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
> > kernels on machines with similar uptimes and directly after invoking
> > compaction. As you can see, the buddy lists are much more fragmented
> > on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> >
> > Any thoughts on this would be greatly appreciated. I can test patches.
>
> I guess testing revert of 9c0415e could give us some idea. Commit
> 3a1086f shouldn't result in pageblock marking differences and as I said
> above, 99592d5 should be just restoring to what 3.10 did.

I can give this a shot, but note that this commit makes only unmovable
stealing more aggressive. We see the number of reclaimable blocks go
up as well.

The workload is fairly variable, so it'll take about a day to smooth
out a meaningful average.

Thanks for your insights, Vlastimil!