linux-mm.kvack.org archive mirror
* [PATCH 0/4] hugetlb: add support gigantic page allocation at runtime
@ 2014-04-02 18:08 Luiz Capitulino
From: Luiz Capitulino @ 2014-04-02 18:08 UTC
  To: linux-mm
  Cc: linux-kernel, mtosatti, aarcange, mgorman, akpm, andi, davidlohr,
	rientjes, isimatu.yasuaki, yinghai, riel

The HugeTLB subsystem uses the buddy allocator to allocate hugepages at
runtime. This means that runtime hugepage allocation is limited to what
MAX_ORDER permits. For archs supporting gigantic pages (that is, pages whose
order exceeds that limit), this in turn means that those pages can't be
allocated at runtime.
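
To make the limit concrete, here is a rough sketch of the arithmetic. It
assumes 4K base pages and a MAX_ORDER of 11, neither of which is stated
above; page_order is a hypothetical helper written for this illustration,
not a kernel interface.

```shell
# Hypothetical helper: order of a page of $1 kB built from $2 kB base pages,
# i.e. log2($1 / $2). Both sizes are assumed to be powers of two.
page_order() {
    pages=$(( $1 / $2 ))
    order=0
    while [ "$pages" -gt 1 ]; do
        pages=$(( pages / 2 ))
        order=$(( order + 1 ))
    done
    echo "$order"
}

# Assumption: MAX_ORDER is 11, so the largest buddy order is MAX_ORDER - 1.
MAX_ORDER=11

for size_kb in 2048 1048576; do
    order=$(page_order "$size_kb" 4)
    if [ "$order" -lt "$MAX_ORDER" ]; then
        echo "${size_kb}kB: order $order, within reach of the buddy allocator"
    else
        echo "${size_kb}kB: order $order, gigantic"
    fi
done
```

Under those assumptions a 2M hugepage is order 9 and a 1G page is order 18,
which is why only the former can come from the buddy allocator.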

HugeTLB supports gigantic page allocation during boottime, via the boot
allocator. To this end the kernel provides the command-line options
hugepagesz= and hugepages=, which can be used to instruct the kernel to
allocate N gigantic pages during boot.

For example, x86_64 supports 2M and 1G hugepages, but only 2M hugepages can
be allocated and freed at runtime. If one wants to allocate 1G gigantic pages,
this has to be done at boot via the hugepagesz= and hugepages= command-line
options.
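
As an illustration, a kernel command-line fragment asking for four 1G pages
at boot might look like the following. The count of four is arbitrary, and
hugepagesz= is placed before hugepages= so that the count applies to the 1G
size.

```text
hugepagesz=1G hugepages=4
```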

Now, gigantic page allocation at boottime has two serious problems:

 1. Boottime allocation is not NUMA aware. On a NUMA machine the kernel
    evenly distributes boottime allocated hugepages among nodes.

    For example, suppose you have a four-node NUMA machine and want
    to allocate four 1G gigantic pages at boottime. The kernel will
    allocate one gigantic page per node.

    On the other hand, we do have users who want to be able to specify
    which NUMA node gigantic pages should be allocated from, so that they
    can place virtual machines on a specific NUMA node.

 2. Gigantic pages allocated at boottime can't be freed.
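
The even spread described in item 1 can be modelled with a few lines of
shell. This is only a toy model of "distribute N pages over the nodes"; the
distribute function is hypothetical and does not reproduce the kernel's
actual boot-time code.

```shell
# Hypothetical model: spread $1 pages over $2 nodes as evenly as possible.
distribute() {
    count=$1
    nodes=$2
    node=0
    while [ "$node" -lt "$nodes" ]; do
        share=$(( count / nodes ))
        # The first (count % nodes) nodes receive one extra page.
        if [ "$node" -lt $(( count % nodes )) ]; then
            share=$(( share + 1 ))
        fi
        echo "node$node: $share"
        node=$(( node + 1 ))
    done
}

# Four 1G pages on a four-node machine: one page lands on every node,
# with no way to ask for all four on a single node.
distribute 4 4
```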

At this point it's important to observe that regular hugepages allocated
at runtime don't have those problems. This is so because the HugeTLB
interface for runtime allocation in sysfs supports NUMA, and runtime
allocated pages can be freed just fine via the buddy allocator.

This series adds support for allocating gigantic pages at runtime. It does
so by allocating gigantic pages via CMA instead of the buddy allocator.
Releasing gigantic pages is also supported via CMA. As this series builds
on top of the existing HugeTLB interface, gigantic pages are allocated and
released just like regular sized hugepages. This also means that NUMA
support just works.

For example, to allocate two 1G gigantic pages on node 1, one can do:

 # echo 2 > \
   /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

And, to release all gigantic pages on the same node:

 # echo 0 > \
   /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
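
Since a request can be satisfied only partially when contiguous memory is
scarce, it may help to read the counters back after writing nr_hugepages.
The sketch below does that for one node. show_gigantic and the SYSFS_ROOT
override are hypothetical names introduced for this example; the sysfs files
it reads are the per-node nr_hugepages and free_hugepages counters.

```shell
# Hypothetical helper: print the 1G page counters of NUMA node $1.
# SYSFS_ROOT is an illustrative override so the function can be tried
# against a scratch directory; it defaults to the real sysfs mount.
show_gigantic() {
    dir="${SYSFS_ROOT:-/sys}/devices/system/node/node$1/hugepages/hugepages-1048576kB"
    for f in nr_hugepages free_hugepages; do
        if [ -r "$dir/$f" ]; then
            echo "$f: $(cat "$dir/$f")"
        else
            # Node missing, or 1G pages not supported on this machine.
            echo "$f: unavailable"
        fi
    done
}

show_gigantic 1
```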

Please refer to patch 4/4 for full technical details.

Finally, please note that this series is a follow-up to a previous series
that tried to extend the command-line options to be NUMA aware:

 http://marc.info/?l=linux-mm&m=139593335312191&w=2

During the discussion of that series it was agreed that having runtime
allocation support for gigantic pages was a better solution.

Luiz Capitulino (4):
  hugetlb: add hstate_is_gigantic()
  hugetlb: update_and_free_page(): don't clear PG_reserved bit
  hugetlb: move helpers up in the file
  hugetlb: add support for gigantic page allocation at runtime

 arch/x86/include/asm/hugetlb.h |  10 ++
 include/linux/hugetlb.h        |   5 +
 mm/hugetlb.c                   | 344 ++++++++++++++++++++++++++++++-----------
 3 files changed, 265 insertions(+), 94 deletions(-)

-- 
1.8.1.4


Thread overview: 20+ messages
2014-04-02 18:08 [PATCH 0/4] hugetlb: add support gigantic page allocation at runtime Luiz Capitulino
2014-04-02 18:08 ` [PATCH 1/4] hugetlb: add hstate_is_gigantic() Luiz Capitulino
2014-04-07 17:57   ` Naoya Horiguchi
2014-04-08  2:00   ` Yasuaki Ishimatsu
2014-04-02 18:08 ` [PATCH 2/4] hugetlb: update_and_free_page(): don't clear PG_reserved bit Luiz Capitulino
2014-04-07 17:58   ` Naoya Horiguchi
2014-04-08  2:01   ` Yasuaki Ishimatsu
2014-04-02 18:08 ` [PATCH 3/4] hugetlb: move helpers up in the file Luiz Capitulino
2014-04-07 17:58   ` Naoya Horiguchi
2014-04-08  2:01   ` Yasuaki Ishimatsu
2014-04-02 18:08 ` [PATCH 4/4] hugetlb: add support for gigantic page allocation at runtime Luiz Capitulino
2014-04-04  3:05   ` Yasuaki Ishimatsu
2014-04-04 13:30     ` Luiz Capitulino
2014-04-08  1:58       ` Yasuaki Ishimatsu
2014-04-07 17:58   ` Naoya Horiguchi
     [not found]   ` <1396893509-x52fgnka@n-horiguchi@ah.jp.nec.com>
2014-04-07 18:49     ` Luiz Capitulino
2014-04-07 19:03       ` Naoya Horiguchi
2014-04-08 22:51       ` Andrew Morton
2014-04-09  0:29         ` Luiz Capitulino
2014-04-03 15:33 ` [PATCH 0/4] hugetlb: add support " Andrea Arcangeli
