From: Luiz Capitulino <lcapitulino@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
mtosatti@redhat.com, aarcange@redhat.com, mgorman@suse.de,
andi@firstfloor.org, davidlohr@hp.com, rientjes@google.com,
isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
riel@redhat.com, n-horiguchi@ah.jp.nec.com, kirill@shutemov.name
Subject: Re: [PATCH 5/5] hugetlb: add support for gigantic page allocation at runtime
Date: Tue, 22 Apr 2014 17:19:46 -0400
Message-ID: <20140422171946.081df5ca@redhat.com>
In-Reply-To: <20140417160039.28e031760e7546ee54c6fc7b@linux-foundation.org>

On Thu, 17 Apr 2014 16:00:39 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 10 Apr 2014 13:58:45 -0400 Luiz Capitulino <lcapitulino@redhat.com> wrote:
>
> > HugeTLB is limited to allocating hugepages whose size is less than
> > MAX_ORDER order. This is so because HugeTLB allocates hugepages via
> > the buddy allocator. Gigantic pages (that is, pages whose size is
> > greater than MAX_ORDER order) have to be allocated at boottime.
> >
> > However, boottime allocation has at least two serious problems. First,
> > it doesn't support NUMA and second, gigantic pages allocated at
> > boottime can't be freed.
> >
> > This commit solves both issues by adding support for allocating gigantic
> > pages during runtime. It works just like regular sized hugepages,
> > meaning that the interface in sysfs is the same, it supports NUMA,
> > and gigantic pages can be freed.
> >
> > For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G
> > gigantic pages on node 1, one can do:
> >
> > # echo 2 > \
> > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
> >
> > And to free them all:
> >
> > # echo 0 > \
> > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
> >
> > The one problem with gigantic page allocation at runtime is that it
> > can't be serviced by the buddy allocator. To overcome that problem, this
> > commit scans all zones from a node looking for a large enough contiguous
> > region. When one is found, it's allocated by using CMA, that is, we call
> > alloc_contig_range() to do the actual allocation. For example, on x86_64
> > we scan all zones looking for a 1GB contiguous region. When one is found,
> > it's allocated by alloc_contig_range().
> >
> > One expected issue with that approach is that such gigantic contiguous
> > regions tend to vanish as runtime goes by. The best way to avoid this for
> > now is to make gigantic page allocations very early during system boot, say
> > from an init script. Other possible optimizations include using compaction,
> > which is supported by CMA but is not explicitly used by this commit.
>
> Why aren't we using compaction?

The main reason is that I'm not sure what the best way to use it is in the
context of a 1GB allocation. The most obvious way (which seems to be what
the DMA subsystem does) is trial and error: just pass a gigantic PFN range
to alloc_contig_range() and, if it fails, go to the next range (or try
again in certain cases). This might work, but to be honest I'm not sure
what the implications of doing that for a 1GB range are, especially
because compaction (as implemented by CMA) is synchronous.
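
To make the trial-and-error idea concrete, here is a minimal user-space
sketch in C. It is not the code from this series: struct sim_zone,
sim_alloc_contig_range() and sim_alloc_gigantic() are hypothetical names,
and the stand-in for alloc_contig_range() only checks a boolean array,
whereas the real function may also try to migrate pages out of the range.
The sketch assumes the gigantic page size is a power-of-two number of page
frames and that candidate ranges are aligned to that size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NO_PFN (~0UL)	/* hypothetical "no range found" marker */

/* Hypothetical simulated zone: in_use[i] describes PFN start_pfn + i. */
struct sim_zone {
	unsigned long start_pfn;
	unsigned long nr_pages;
	bool *in_use;
};

/*
 * Hypothetical stand-in for alloc_contig_range(): succeeds only if every
 * page frame in [start, end) is free, then marks the range as in use.
 */
static int sim_alloc_contig_range(struct sim_zone *z,
				  unsigned long start, unsigned long end)
{
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn++)
		if (z->in_use[pfn - z->start_pfn])
			return -1;
	for (pfn = start; pfn < end; pfn++)
		z->in_use[pfn - z->start_pfn] = true;
	return 0;
}

/* Round pfn up to the next multiple of nr_pages (a power of two). */
static unsigned long sim_align_up(unsigned long pfn, unsigned long nr_pages)
{
	return (pfn + nr_pages - 1) & ~(nr_pages - 1);
}

/*
 * Trial and error: walk the zone in aligned steps of nr_pages and try to
 * allocate each candidate range; on failure, move on to the next one.
 * Returns the first PFN of the allocated range, or SIM_NO_PFN.
 */
static unsigned long sim_alloc_gigantic(struct sim_zone *z,
					unsigned long nr_pages)
{
	unsigned long zone_end = z->start_pfn + z->nr_pages;
	unsigned long pfn;

	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);

	for (pfn = sim_align_up(z->start_pfn, nr_pages);
	     pfn + nr_pages <= zone_end; pfn += nr_pages)
		if (sim_alloc_contig_range(z, pfn, pfn + nr_pages) == 0)
			return pfn;

	return SIM_NO_PFN;
}
```

For a simulated zone spanning PFNs 4 to 35 with PFN 10 in use, a request
for 8 pages fails on [8, 16) and is satisfied by [16, 24). Each failed
attempt costs a full pass over the candidate range, which is the part
whose cost at 1GB granularity is unclear.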

As I see compaction usage as an optimization, I've opted for submitting the
simplest implementation that works. I've tested this series on two NUMA
machines and it worked just fine. Future improvements can be done on top.

Also note that this is about HugeTLB making use of compaction automatically.
Nothing in this series prevents the user from manually compacting
memory by writing to /sys/devices/system/node/nodeN/compact. As HugeTLB
page reservation is a manual procedure anyway, I don't think that manually
starting compaction is that bad.
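
The manual step amounts to a single write to the per-node compact file. A
minimal sketch in C, assuming a kernel that exposes that file and a caller
with enough privilege to write to it; node_compact_path(), write_one() and
compact_node() are hypothetical helper names and not part of this series.
The value written here is "1", which I assume is fine since only the write
itself should matter.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "/sys/devices/system/node/node<nid>/compact" into buf. */
static int node_compact_path(char *buf, size_t len, int nid)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/node/node%d/compact", nid);

	/* Fail on encoding errors and on truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to the given file; returns 0 on success, -1 on error. */
static int write_one(const char *path)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs("1\n", f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}

/* Ask the kernel to compact one NUMA node (needs root on a real system). */
static int compact_node(int nid)
{
	char path[128];

	if (node_compact_path(path, sizeof(path), nid) != 0)
		return -1;
	return write_one(path);
}
```

Calling compact_node(1) before writing to node1's nr_hugepages file would
be the programmatic equivalent of doing both steps by hand from a shell.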