From: Vlastimil Babka <vbabka@suse.cz>
To: Gioh Kim <gioh.kim@lge.com>,
akpm@linux-foundation.org, mgorman@suse.de, riel@redhat.com,
hannes@cmpxchg.org, rientjes@google.com, vdavydov@parallels.com,
iamjoonsoo.kim@lge.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, gunho.lee@lge.com
Subject: Re: [RFCv2] mm: page allocation for less fragmentation
Date: Wed, 01 Apr 2015 14:05:30 +0200
Message-ID: <551BDF0A.2090503@suse.cz>
In-Reply-To: <551343E3.3050709@lge.com>
On 03/26/2015 12:25 AM, Gioh Kim wrote:
>
>
> On 2015-03-26 7:16 AM, Vlastimil Babka wrote:
>> On 25.3.2015 3:39, Gioh Kim wrote:
>>> My driver allocates more than 40MB of pages via alloc_page() at a time and
>>> maps them to virtual addresses. In total it uses 300~400MB of pages.
>>>
>>> If I run a heavy load test for a few days on a system with 1GB of memory, I cannot allocate even order=3 pages
>>> because of external fragmentation.
>>>
>>> I thought I needed an anti-fragmentation solution for my driver.
>>> But there is no allocation function that considers fragmentation.
>>> Compaction does not help because it works only on movable pages, not unmovable pages.
>>>
>>> This patch proposes an allocation function that allocates pages only from the same pageblock.
>>>
>>> I tested this patch as follows:
>>>
>>> 1. When the driver has allocated about 400MB, I run "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>>
>>> Free pages count per migrate type at order     0     1     2     3     4     5     6     7     8     9    10
>>> Node 0, zone Normal, type Unmovable         3864   728   394   216   129    47    18     9     1     0     0
>>> Node 0, zone Normal, type Reclaimable        902    96    68    17     3     0     1     0     0     0     0
>>> Node 0, zone Normal, type Movable           5146   663   178    91    43    16     4     0     0     0     0
>>> Node 0, zone Normal, type Reserve              1     4     6     6     2     1     1     1     0     1     1
>>> Node 0, zone Normal, type CMA                  0     0     0     0     0     0     0     0     0     0     0
>>> Node 0, zone Normal, type Isolate              0     0     0     0     0     0     0     0     0     0     0
>>>
>>> Number of blocks type   Unmovable Reclaimable     Movable     Reserve         CMA     Isolate
>>> Node 0, zone Normal           135           3         124           2           0           0
>>>
>>> Node 0, zone Normal 9880 1489 647 332 177 64 24 10 1 1 1
>>>
>>> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.
>>
>> This is not a good test setup. You shouldn't switch the allocation types during
>> a single system boot. You should compare results from a boot where the common
>> allocation is used and from a boot where your new allocation is used.
>
> The new allocator is slower, so I don't think it can replace the current allocator.
> I don't aim to change the general allocator.
I'm not saying you should replace the current allocator for everything. Use it
just for your driver, that's fine. But when you perform/simulate your
driver allocation, use either the general allocator or the new
allocator; don't change from one to the other during a single boot.
> The main purpose of the new allocator is to serve as a special-purpose allocator when the system is too fragmented.
> If a driver consumes a lot of memory and causes fragmentation, it can use the new allocator instead at that point.
> I want to make a kind of compaction for drivers that allocate unmovable pages.
>
> That is why I tested it that way.
> I first generated fragmentation and then called the new allocator.
> I wanted to check whether the fragmentation was caused by my driver
> and whether the driver's pages could be compacted.
> I thought the pages were compacted.
>
> If I freed the pages and called the common allocator again,
> it could reduce fragmentation a little (not as much as the new allocator).
> But there would be no page compaction, and fragmentation would soon increase again.
Yes, we need data comparing the common and new allocators in the same scenario.
Presumably that's what you have in the v3 submission.
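To make such a comparison easier to read, a /proc/buddyinfo line can be reduced to a couple of numbers. The following is only a rough sketch in Python, not part of the patch: the helper names (parse_buddyinfo_counts, free_pages) are made up for illustration, and it assumes the line layout shown in the listings quoted in this thread (four label fields followed by the counts for orders 0-10).

```python
# Illustrative sketch only: helper names are hypothetical, not kernel or patch code.

def parse_buddyinfo_counts(line):
    """Return the per-order free block counts from one /proc/buddyinfo line."""
    # Assumed layout: "Node 0, zone Normal <order 0 count> <order 1 count> ..."
    fields = line.split()
    return [int(f) for f in fields[4:]]


def free_pages(counts, min_order=0):
    """Total free pages held in free blocks of at least min_order."""
    # A free block of order k contains 2**k pages.
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


# The two buddyinfo lines quoted in this thread.
first = parse_buddyinfo_counts(
    "Node 0, zone Normal 9880 1489 647 332 177 64 24 10 1 1 1")
second = parse_buddyinfo_counts(
    "Node 0, zone Normal 5693 877 266 544 320 108 43 12 1 1 1")

for name, counts in (("first", first), ("second", second)):
    total = free_pages(counts)
    high = free_pages(counts, min_order=3)
    print(f"{name}: {total} free pages, {high} of them in order>=3 blocks")
```

For the two quoted listings this gives nearly the same total (27590 vs 27519 free pages) but 12144 vs 19008 pages in order>=3 blocks. Both listings come from the same boot, so the numbers only illustrate the calculation; the same reduction would apply to listings taken from two separate boots under the same workload.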
>
>
>>
>>> This is a kind of compaction of the driver.
>>> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>>
>>> Free pages count per migrate type at order     0     1     2     3     4     5     6     7     8     9    10
>>> Node 0, zone Normal, type Unmovable            8     5     1   432   272    91    37    11     1     0     0
>>> Node 0, zone Normal, type Reclaimable        901    96    68    17     3     0     1     0     0     0     0
>>> Node 0, zone Normal, type Movable           4790   776   192    91    43    16     4     0     0     0     0
>>> Node 0, zone Normal, type Reserve              1     4     6     6     2     1     1     1     0     1     1
>>> Node 0, zone Normal, type CMA                  0     0     0     0     0     0     0     0     0     0     0
>>> Node 0, zone Normal, type Isolate              0     0     0     0     0     0     0     0     0     0     0
>>>
>>> Number of blocks type   Unmovable Reclaimable     Movable     Reserve         CMA     Isolate
>>> Node 0, zone Normal           135           3         124           2           0           0
>>>
>>> Node 0, zone Normal 5693 877 266 544 320 108 43 12 1 1 1
>>
>> The number of unmovable pageblocks didn't change here. The stats for free
>> unmovable pages do look better for higher orders than in the first listing
>> above, but even the common allocation logic would give you that result, if you
>> allocated your 400 MB using (many) order-0 allocations (since you apparently
>> don't care about physically contiguous memory). That would also prefer order-0
>> free pages before splitting higher orders. So this doesn't demonstrate benefits
>> of the alloc_pages_compact() approach I'm afraid. The results suggest that the
>> system was in a worse state when the first allocation happened, and meanwhile
>> some pages were freed, creating the large numbers of order-0 unmovable free
>> pages. Or maybe the system got fragmented in the first allocation because your
>> driver tries to allocate the memory with high-order allocations before falling
>> back to lower orders? That would probably defeat the natural anti-fragmentation
>> of the buddy system.
>
> My driver allocates pages only with alloc_page, not with high-order alloc_pages.
>
> Yes, if I freed the pages and called alloc_page again, it could reduce fragmentation at that time.
> But there would be no compaction, and fragmentation would soon increase again,
> because the allocated pages were scattered all over the system.
>
> The new allocator compacts pages. I believe it can reduce fragmentation for a long time.
If that's what v3 shows, ok. Let me check.
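For reference, the earlier point that order-0 allocations prefer order-0 free pages before splitting higher orders can be illustrated with a toy model. This is a simplified sketch, not the kernel's implementation: it tracks only per-order free counts for a single free list, ignores migratetype fallback, per-cpu lists and freeing/merging, and the function name alloc_order0 is made up.

```python
# Toy model only; alloc_order0 is a hypothetical name, not a kernel function.

def alloc_order0(free_counts):
    """Allocate one page from per-order free block counts, in place.

    Takes from the lowest order that has a free block. Splitting an
    order-k block to hand out one page leaves one free buddy at each
    order below k. Returns the order of the block that was consumed.
    """
    for order, count in enumerate(free_counts):
        if count > 0:
            free_counts[order] -= 1
            for lower in range(order):
                free_counts[lower] += 1
            return order
    raise MemoryError("no free pages")


# Unmovable free counts from the first pagetypeinfo listing in this thread.
unmovable = [3864, 728, 394, 216, 129, 47, 18, 9, 1, 0, 0]
before = list(unmovable)

# Allocate as many single pages as there are free order-0 blocks.
for _ in range(before[0]):
    alloc_order0(unmovable)

# Only the order-0 list was drained; the higher orders are untouched.
print(unmovable[0], unmovable[1:] == before[1:])
```

In this model the higher-order free lists only start shrinking once the order-0 list is empty, which is why a run of order-0 allocations by itself tends to leave the high-order counts looking good.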
>>
>> So a proper test could be based on this:
>>
>>> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
>>> because of the external fragmentation.
>>
>> With this patch, is the situation quantifiably better? Can you post the
>> pagetype/buddyinfo for system boot where all driver allocations use the common
>> allocator, and system boot with the patch? That should be comparable if the
>> workload is the same for both boots.
>>
>
> OK, I will. It can be a good test.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
>