From: Vlastimil Babka <vbabka@suse.cz>
To: Gioh Kim <gioh.kim@lge.com>,
akpm@linux-foundation.org, mgorman@suse.de, riel@redhat.com,
hannes@cmpxchg.org, rientjes@google.com, vdavydov@parallels.com,
iamjoonsoo.kim@lge.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, gunho.lee@lge.com
Subject: Re: [RFCv2] mm: page allocation for less fragmentation
Date: Wed, 25 Mar 2015 23:16:54 +0100
Message-ID: <551333D6.20708@suse.cz>
In-Reply-To: <1427251155-12322-1-git-send-email-gioh.kim@lge.com>
On 25.3.2015 3:39, Gioh Kim wrote:
> My driver allocates more than 40MB pages via alloc_page() at a time and
> maps them at virtual address. Totally it uses 300~400MB pages.
>
> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
> because-of the external fragmentation.
>
> I thought I needed a anti-fragmentation solution for my driver.
> But there is no allocation function that considers fragmentation.
> The compaction is not helpful because it is only for movable pages, not unmovable pages.
>
> This patch proposes a allocation function allocates only pages in the same pageblock.
>
> I tested this patch like following:
>
> 1. When the driver allocates about 400MB and do "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>
> Free pages count per migrate type at order     0     1     2     3     4     5     6     7     8     9    10
> Node 0, zone Normal, type Unmovable         3864   728   394   216   129    47    18     9     1     0     0
> Node 0, zone Normal, type Reclaimable        902    96    68    17     3     0     1     0     0     0     0
> Node 0, zone Normal, type Movable           5146   663   178    91    43    16     4     0     0     0     0
> Node 0, zone Normal, type Reserve              1     4     6     6     2     1     1     1     0     1     1
> Node 0, zone Normal, type CMA                  0     0     0     0     0     0     0     0     0     0     0
> Node 0, zone Normal, type Isolate              0     0     0     0     0     0     0     0     0     0     0
>
> Number of blocks type   Unmovable Reclaimable     Movable     Reserve         CMA     Isolate
> Node 0, zone Normal           135           3         124           2           0           0
>
> Node 0, zone Normal   9880  1489   647   332   177    64    24    10     1     1     1
>
> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.
This is not a good test setup. You shouldn't switch the allocation types during
a single system boot. You should compare results from a boot where the common
allocation is used with results from a boot where your new allocation is used.
> This is a kind of compaction of the driver.
> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>
> Free pages count per migrate type at order     0     1     2     3     4     5     6     7     8     9    10
> Node 0, zone Normal, type Unmovable            8     5     1   432   272    91    37    11     1     0     0
> Node 0, zone Normal, type Reclaimable        901    96    68    17     3     0     1     0     0     0     0
> Node 0, zone Normal, type Movable           4790   776   192    91    43    16     4     0     0     0     0
> Node 0, zone Normal, type Reserve              1     4     6     6     2     1     1     1     0     1     1
> Node 0, zone Normal, type CMA                  0     0     0     0     0     0     0     0     0     0     0
> Node 0, zone Normal, type Isolate              0     0     0     0     0     0     0     0     0     0     0
>
> Number of blocks type   Unmovable Reclaimable     Movable     Reserve         CMA     Isolate
> Node 0, zone Normal           135           3         124           2           0           0
>
> Node 0, zone Normal   5693   877   266   544   320   108    43    12     1     1     1
The number of unmovable pageblocks didn't change here. The stats for free
unmovable pages do look better for higher orders than in the first listing
above, but even the common allocation logic would give you that result if you
allocated your 400 MB using (many) order-0 allocations (since you apparently
don't care about physically contiguous memory). That would also prefer order-0
free pages before splitting higher orders. So this doesn't demonstrate the
benefits of the alloc_pages_compact() approach, I'm afraid.

The results suggest that the system was in a worse state when the first
allocation happened, and that meanwhile some pages were freed, creating the
large numbers of order-0 unmovable free pages. Or maybe the system got
fragmented in the first allocation because your driver tries to allocate the
memory with high-order allocations before falling back to lower orders? That
would probably defeat the natural anti-fragmentation of the buddy system.
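To illustrate the point about order-0 allocations, here is a toy model in
Python, not kernel code. It tracks only per-order free block counts for one
migratetype and ignores per-cpu lists, migratetype fallback, freeing and
coalescing; the function name alloc_order0 is made up for this sketch. The
starting counts are the free Unmovable counts from your first listing.

```python
def alloc_order0(free):
    """Take one order-0 page from per-order free counts (toy model).

    Returns the order of the block that had to be used. A higher-order
    block is split only when every lower order is empty.
    """
    for order, count in enumerate(free):
        if count > 0:
            free[order] -= 1
            # Splitting an order-k block leaves one free buddy at each
            # order below k; the remaining order-0 page is the allocation.
            for lower in range(order):
                free[lower] += 1
            return order
    raise MemoryError("no free pages in this toy model")


# Free Unmovable block counts for orders 0..10, from the first listing.
unmovable = [3864, 728, 394, 216, 129, 47, 18, 9, 1, 0, 0]

free = list(unmovable)
for _ in range(3864):
    alloc_order0(free)
# All order-0 pages are gone, higher orders are untouched.
after_order0_drained = list(free)

# Only the next allocation has to split an order-1 block.
split_from = alloc_order0(free)
print(after_order0_drained, split_from, free)
```

In this model the higher-order free counts stay intact until the order-0 list
is exhausted, which is why many order-0 requests alone tend to leave the
higher orders looking good.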
So a proper test could be based on this:
> If I run a heavy load test for a few days in 1GB memory system, I cannot
> allocate even order=3 pages
> because-of the external fragmentation.
With this patch, is the situation quantifiably better? Can you post the
pagetypeinfo/buddyinfo for a system boot where all driver allocations use the
common allocator, and for a system boot with the patch? The two should be
comparable if the workload is the same for both boots.
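One possible way to put a number on "quantifiably better" is the share of free
pages sitting in blocks of order 3 or higher. Below is a minimal sketch in
Python; the helper names are made up for illustration, and it assumes the
single-line "Node N, zone NAME count..." layout of /proc/buddyinfo as quoted
above. The two sample lines are the buddyinfo lines from your mail, so they
come from the same boot and only demonstrate the calculation. A real
comparison would feed in snapshots from the two separate boots.

```python
def parse_buddyinfo_line(line):
    """Return per-order free block counts from one /proc/buddyinfo line."""
    # Layout assumed: "Node 0, zone Normal c0 c1 ... c10".
    fields = line.split()
    zone_idx = fields.index("zone")
    # Skip the "zone" keyword and the zone name; the rest are counts.
    return [int(x) for x in fields[zone_idx + 2:]]


def free_pages(counts, min_order=0):
    """Total free pages held in blocks of at least min_order."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


def high_order_share(counts, min_order=3):
    """Fraction of all free pages that sit in blocks of order >= min_order."""
    return free_pages(counts, min_order) / free_pages(counts)


first = parse_buddyinfo_line(
    "Node 0, zone Normal 9880 1489 647 332 177 64 24 10 1 1 1")
second = parse_buddyinfo_line(
    "Node 0, zone Normal 5693 877 266 544 320 108 43 12 1 1 1")

for name, counts in (("first listing", first), ("second listing", second)):
    print(name, free_pages(counts), free_pages(counts, 3),
          round(high_order_share(counts), 3))
```

Whether order 3 is the right threshold depends on what the driver and the rest
of the workload actually request; it is used here only because order=3 is the
failing case you mentioned.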