* alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9
@ 2021-04-21 18:36 Florian Fainelli
2021-04-22 7:49 ` Michal Hocko
0 siblings, 1 reply; 3+ messages in thread
From: Florian Fainelli @ 2021-04-21 18:36 UTC (permalink / raw)
To: mhocko, Vlastimil Babka, Mel Gorman, Minchan Kim, Johannes Weiner
Cc: l.stach, LKML, linux-mmc, Jaewon Kim, Michal Nazarewicz,
Joonsoo Kim
Hi all,
I have been trying for the past few days to identify the source of a
performance regression that we are seeing with the 5.4 kernel but not
with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
challenging at the moment but will happen eventually.
What we are seeing is a ~3x increase in the time needed for
alloc_contig_range() to allocate 1GB in 2MB blocks. The system
is idle at the time and there are no contenders for memory other
than the user-space programs already started (DHCP client, shell, etc.).
I have tried playing with the compact_control structure settings but
have not found anything that would bring us back to the performance of
4.9. More often than not, we see test_pages_isolated() returning a
non-zero error code, which would explain the slowdown, since we have
some logic that retries the allocation if alloc_contig_range() returns
-EBUSY. If I remove the retry logic, however, we don't get -EBUSY and we
get the results below:
4.9 shows this:
[ 457.537634] allocating: size: 1024MB avg: 59172 (us), max: 137306 (us), min: 44859 (us), total: 591723 (us), pages: 512, per-page: 115 (us)
[ 457.550222] freeing: size: 1024MB avg: 67397 (us), max: 151408 (us), min: 52630 (us), total: 673974 (us), pages: 512, per-page: 131 (us)
5.4 shows this:
[ 222.388758] allocating: size: 1024MB avg: 156739 (us), max: 157254 (us), min: 155915 (us), total: 1567394 (us), pages: 512, per-page: 306 (us)
[ 222.401601] freeing: size: 1024MB avg: 209899 (us), max: 210085 (us), min: 209749 (us), total: 2098999 (us), pages: 512, per-page: 409 (us)
This regression is not seen when MIGRATE_CMA is specified instead of
MIGRATE_MOVABLE.
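As a sanity check on the figures in the logs, the per-page numbers and the size of the slowdown can be recomputed from the averages. This is only an illustrative sketch: the names below are hypothetical, and it assumes that per-page is the average time divided by the 512 blocks, truncated to an integer, which is consistent with the logged values.

```python
# Average times in microseconds, copied from the 4.9 and 5.4 log lines.
PAGES = 512  # 512 blocks of 2MB = 1024MB

avg_us = {
    "4.9": {"allocating": 59172, "freeing": 67397},
    "5.4": {"allocating": 156739, "freeing": 209899},
}


def per_page_us(avg, pages=PAGES):
    # Assumption: the test reports avg / pages with integer truncation.
    return avg // pages


def slowdown(op):
    # Ratio of the 5.4 average to the 4.9 average for one operation.
    return avg_us["5.4"][op] / avg_us["4.9"][op]


for op in ("allocating", "freeing"):
    print(
        op,
        per_page_us(avg_us["4.9"][op]),
        per_page_us(avg_us["5.4"][op]),
        round(slowdown(op), 2),
    )
```

On these numbers the average allocation time is roughly 2.6x higher on 5.4 and the average freeing time roughly 3.1x higher, in line with the ~3x figure quoted above.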
A few characteristics that you should probably be aware of:
- There is 4GB of memory populated, mapped into the CPU's address
space starting at 0x4000_0000 (1GB); PAGE_SIZE is 4KB
- There is a ZONE_DMA32 that starts at 0x4000_0000 and ends at
0xE480_0000; from there on we have a ZONE_MOVABLE which is composed of
0xE480_0000 - 0xFDC0_0000 and another range spanning 0x1_0000_0000 -
0x1_4000_0000
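For reference, the zone sizes implied by these ranges can be worked out directly from the addresses. The snippet below is a small sketch with hypothetical names; it assumes each range is half-open, i.e. the end address is the first byte past the range.

```python
MB = 1 << 20

# Physical ranges as listed above; end addresses are assumed exclusive.
zones = {
    "ZONE_DMA32": [(0x4000_0000, 0xE480_0000)],
    "ZONE_MOVABLE": [
        (0xE480_0000, 0xFDC0_0000),
        (0x1_0000_0000, 0x1_4000_0000),
    ],
}


def zone_size_mb(ranges):
    # Sum the length of every range in the zone and convert to MB.
    return sum(end - start for start, end in ranges) // MB


sizes = {name: zone_size_mb(ranges) for name, ranges in zones.items()}
for name, mb in sizes.items():
    print(f"{name}: {mb} MB")
```

With these addresses, ZONE_DMA32 covers 2632MB and ZONE_MOVABLE covers 1428MB (404MB plus 1024MB).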
Attached is the kernel configuration.
Thanks!
--
Florian
[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 32742 bytes --]
* Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9
2021-04-21 18:36 alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9 Florian Fainelli
@ 2021-04-22 7:49 ` Michal Hocko
2021-04-22 8:56 ` David Hildenbrand
0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2021-04-22 7:49 UTC (permalink / raw)
To: Florian Fainelli
Cc: Vlastimil Babka, Mel Gorman, Minchan Kim, Johannes Weiner,
l.stach, LKML, linux-mmc, Jaewon Kim, Michal Nazarewicz,
Joonsoo Kim, David Hildenbrand, Oscar Salvador
Cc David and Oscar who are familiar with this code as well.
On Wed 21-04-21 11:36:01, Florian Fainelli wrote:
> Hi all,
>
> I have been trying for the past few days to identify the source of a
> performance regression that we are seeing with the 5.4 kernel but not
> with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
> challenging at the moment but will happen eventually.
>
> What we are seeing is a ~3x increase in the time needed for
> alloc_contig_range() to allocate 1GB in 2MB blocks. The system
> is idle at the time and there are no contenders for memory other
> than the user-space programs already started (DHCP client, shell, etc.).
>
> I have tried playing with the compact_control structure settings but
> have not found anything that would bring us back to the performance of
> 4.9. More often than not, we see test_pages_isolated() returning a
> non-zero error code, which would explain the slowdown, since we have
> some logic that retries the allocation if alloc_contig_range() returns
> -EBUSY. If I remove the retry logic, however, we don't get -EBUSY and we
> get the results below:
>
> 4.9 shows this:
>
> [ 457.537634] allocating: size: 1024MB avg: 59172 (us), max: 137306 (us), min: 44859 (us), total: 591723 (us), pages: 512, per-page: 115 (us)
> [ 457.550222] freeing: size: 1024MB avg: 67397 (us), max: 151408 (us), min: 52630 (us), total: 673974 (us), pages: 512, per-page: 131 (us)
>
> 5.4 shows this:
>
> [ 222.388758] allocating: size: 1024MB avg: 156739 (us), max: 157254 (us), min: 155915 (us), total: 1567394 (us), pages: 512, per-page: 306 (us)
> [ 222.401601] freeing: size: 1024MB avg: 209899 (us), max: 210085 (us), min: 209749 (us), total: 2098999 (us), pages: 512, per-page: 409 (us)
>
> This regression is not seen when MIGRATE_CMA is specified instead of
> MIGRATE_MOVABLE.
>
> A few characteristics that you should probably be aware of:
>
> - There is 4GB of memory populated, mapped into the CPU's address
> space starting at 0x4000_0000 (1GB); PAGE_SIZE is 4KB
>
> - There is a ZONE_DMA32 that starts at 0x4000_0000 and ends at
> 0xE480_0000; from there on we have a ZONE_MOVABLE which is composed of
> 0xE480_0000 - 0xFDC0_0000 and another range spanning 0x1_0000_0000 -
> 0x1_4000_0000
>
> Attached is the kernel configuration.
>
> Thanks!
> --
> Florian
--
Michal Hocko
SUSE Labs
* Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9
2021-04-22 7:49 ` Michal Hocko
@ 2021-04-22 8:56 ` David Hildenbrand
0 siblings, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2021-04-22 8:56 UTC (permalink / raw)
To: Michal Hocko, Florian Fainelli
Cc: Vlastimil Babka, Mel Gorman, Minchan Kim, Johannes Weiner,
l.stach, LKML, linux-mmc, Jaewon Kim, Michal Nazarewicz,
Joonsoo Kim, Oscar Salvador, linux-mm@kvack.org
On 22.04.21 09:49, Michal Hocko wrote:
> Cc David and Oscar who are familiar with this code as well.
>
> On Wed 21-04-21 11:36:01, Florian Fainelli wrote:
>> Hi all,
>>
>> I have been trying for the past few days to identify the source of a
>> performance regression that we are seeing with the 5.4 kernel but not
>> with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
>> challenging at the moment but will happen eventually.
>>
>> What we are seeing is a ~3x increase in the time needed for
>> alloc_contig_range() to allocate 1GB in 2MB blocks. The system
>> is idle at the time and there are no contenders for memory other
>> than the user-space programs already started (DHCP client, shell, etc.).
Hi,
If you can easily reproduce it, it might be worth just trying to bisect;
that could be faster than manually poking around in the code.
Also, it would be worth having a look at the state of upstream Linux.
Upstream Linux developers tend not to care about minor performance
regressions on oldish kernels.
There has been work on improving exactly the situation you are
describing -- a "fail fast" / "no retry" mode for alloc_contig_range().
Maybe it tackles exactly this issue.
https://lkml.kernel.org/r/20210121175502.274391-3-minchan@kernel.org
Minchan is already on cc.
(next time, please cc linux-mm on core-mm questions; maybe you tried,
but ended up with linux-mmc :) )
>>
>> I have tried playing with the compact_control structure settings but
>> have not found anything that would bring us back to the performance of
>> 4.9. More often than not, we see test_pages_isolated() returning a
>> non-zero error code, which would explain the slowdown, since we have
>> some logic that retries the allocation if alloc_contig_range() returns
>> -EBUSY. If I remove the retry logic, however, we don't get -EBUSY and we
>> get the results below:
>>
>> 4.9 shows this:
>>
>> [ 457.537634] allocating: size: 1024MB avg: 59172 (us), max: 137306 (us), min: 44859 (us), total: 591723 (us), pages: 512, per-page: 115 (us)
>> [ 457.550222] freeing: size: 1024MB avg: 67397 (us), max: 151408 (us), min: 52630 (us), total: 673974 (us), pages: 512, per-page: 131 (us)
>>
>> 5.4 shows this:
>>
>> [ 222.388758] allocating: size: 1024MB avg: 156739 (us), max: 157254 (us), min: 155915 (us), total: 1567394 (us), pages: 512, per-page: 306 (us)
>> [ 222.401601] freeing: size: 1024MB avg: 209899 (us), max: 210085 (us), min: 209749 (us), total: 2098999 (us), pages: 512, per-page: 409 (us)
>>
>> This regression is not seen when MIGRATE_CMA is specified instead of
>> MIGRATE_MOVABLE.
>>
>> A few characteristics that you should probably be aware of:
>>
>> - There is 4GB of memory populated, mapped into the CPU's address
>> space starting at 0x4000_0000 (1GB); PAGE_SIZE is 4KB
>>
>> - There is a ZONE_DMA32 that starts at 0x4000_0000 and ends at
>> 0xE480_0000; from there on we have a ZONE_MOVABLE which is composed of
>> 0xE480_0000 - 0xFDC0_0000 and another range spanning 0x1_0000_0000 -
>> 0x1_4000_0000
>>
>> Attached is the kernel configuration.
>>
>> Thanks!
>> --
>> Florian
>
>
>
--
Thanks,
David / dhildenb