Re: [PATCH 4/4] hugetlb: add support for gigantic page allocation at runtime

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Luiz Capitulino <lcapitulino@redhat.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	mtosatti@redhat.com, aarcange@redhat.com, mgorman@suse.de,
	akpm@linux-foundation.org, andi@firstfloor.org, davidlohr@hp.com,
	rientjes@google.com, isimatu.yasuaki@jp.fujitsu.com,
	yinghai@kernel.org, riel@redhat.com
Subject: Re: [PATCH 4/4] hugetlb: add support for gigantic page allocation at runtime
Date: Mon, 7 Apr 2014 14:49:35 -0400	[thread overview]
Message-ID: <20140407144935.259d4301@redhat.com> (raw)
In-Reply-To: <1396893509-x52fgnka@n-horiguchi@ah.jp.nec.com>

On Mon, 07 Apr 2014 13:58:29 -0400
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> On Wed, Apr 02, 2014 at 02:08:48PM -0400, Luiz Capitulino wrote:
> > HugeTLB is limited to allocating hugepages whose size are less than
> > MAX_ORDER order. This is so because HugeTLB allocates hugepages via
> > the buddy allocator. Gigantic pages (that is, pages whose size is
> > greater than MAX_ORDER order) have to be allocated at boottime.
> > 
> > However, boottime allocation has at least two serious problems. First,
> > it doesn't support NUMA and second, gigantic pages allocated at
> > boottime can't be freed.
> > 
> > This commit solves both issues by adding support for allocating gigantic
> > pages during runtime. It works just like regular sized hugepages,
> > meaning that the interface in sysfs is the same, it supports NUMA,
> > and gigantic pages can be freed.
> > 
> > For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G
> > gigantic pages on node 1, one can do:
> > 
> >  # echo 2 > \
> >    /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
> > 
> > And to free them later:
> > 
> >  # echo 0 > \
> >    /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
> > 
> > The one problem with gigantic page allocation at runtime is that it
> > can't be serviced by the buddy allocator. To overcome that problem, this
> > series scans all zones from a node looking for a large enough contiguous
> > region. When one is found, it's allocated by using CMA, that is, we call
> > alloc_contig_range() to do the actual allocation. For example, on x86_64
> > we scan all zones looking for a 1GB contiguous region. When one is found
> > it's allocated by alloc_contig_range().
> > 
> > One expected issue with that approach is that such gigantic contiguous
> > regions tend to vanish as time goes by. The best way to avoid this for
> > now is to make gigantic page allocations very early during boot, say
> > from a init script. Other possible optimization include using compaction,
> > which is supported by CMA but is not explicitly used by this commit.
> > 
> > It's also important to note the following:
> > 
> >  1. My target systems are x86_64 machines, so I have only tested 1GB
> >     pages allocation/release. I did try to make this arch indepedent
> >     and expect it to work on other archs but didn't try it myself
> > 
> >  2. I didn't add support for hugepage overcommit, that is allocating
> >     a gigantic page on demand when
> >    /proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't
> >    think it's reasonable to do the hard and long work required for
> >    allocating a gigantic page at fault time. But it should be simple
> >    to add this if wanted
> > 
> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> 
> I agree to the basic idea. One question below ...

Good to hear that.

> > ---
> >  arch/x86/include/asm/hugetlb.h |  10 +++
> >  mm/hugetlb.c                   | 177 ++++++++++++++++++++++++++++++++++++++---
> >  2 files changed, 176 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
> > index a809121..2b262f7 100644
> > --- a/arch/x86/include/asm/hugetlb.h
> > +++ b/arch/x86/include/asm/hugetlb.h
> > @@ -91,6 +91,16 @@ static inline void arch_release_hugepage(struct page *page)
> >  {
> >  }
> >  
> > +static inline int arch_prepare_gigantic_page(struct page *page)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void arch_release_gigantic_page(struct page *page)
> > +{
> > +}
> > +
> > +
> >  static inline void arch_clear_hugepage_flags(struct page *page)
> >  {
> >  }
> 
> These are defined only on arch/x86, but called in generic code.
> Does it cause build failure on other archs?

Hmm, probably. The problem here is that I'm unable to test this
code in other archs. So I think the best solution for the first
merge is to make the build of this feature conditional to x86_64?
Then the first person interested in making this work in other
archs add the generic code. Sounds reasonable?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2014-04-07 18:50 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-02 18:08 [PATCH 0/4] hugetlb: add support gigantic page allocation at runtime Luiz Capitulino
2014-04-02 18:08 ` Luiz Capitulino
2014-04-02 18:08 ` [PATCH 1/4] hugetlb: add hstate_is_gigantic() Luiz Capitulino
2014-04-02 18:08   ` Luiz Capitulino
2014-04-07 17:57   ` Naoya Horiguchi
2014-04-08  2:00   ` Yasuaki Ishimatsu
2014-04-08  2:00     ` Yasuaki Ishimatsu
2014-04-02 18:08 ` [PATCH 2/4] hugetlb: update_and_free_page(): don't clear PG_reserved bit Luiz Capitulino
2014-04-02 18:08   ` Luiz Capitulino
2014-04-07 17:58   ` Naoya Horiguchi
2014-04-08  2:01   ` Yasuaki Ishimatsu
2014-04-08  2:01     ` Yasuaki Ishimatsu
2014-04-02 18:08 ` [PATCH 3/4] hugetlb: move helpers up in the file Luiz Capitulino
2014-04-02 18:08   ` Luiz Capitulino
2014-04-07 17:58   ` Naoya Horiguchi
2014-04-08  2:01   ` Yasuaki Ishimatsu
2014-04-08  2:01     ` Yasuaki Ishimatsu
2014-04-02 18:08 ` [PATCH 4/4] hugetlb: add support for gigantic page allocation at runtime Luiz Capitulino
2014-04-02 18:08   ` Luiz Capitulino
2014-04-04  3:05   ` Yasuaki Ishimatsu
2014-04-04  3:05     ` Yasuaki Ishimatsu
2014-04-04 13:30     ` Luiz Capitulino
2014-04-04 13:30       ` Luiz Capitulino
2014-04-08  1:58       ` Yasuaki Ishimatsu
2014-04-08  1:58         ` Yasuaki Ishimatsu
2014-04-07 17:58   ` Naoya Horiguchi
     [not found]   ` <1396893509-x52fgnka@n-horiguchi@ah.jp.nec.com>
2014-04-07 18:49     ` Luiz Capitulino [this message]
2014-04-07 19:03       ` Naoya Horiguchi
2014-04-08 22:51       ` Andrew Morton
2014-04-08 22:51         ` Andrew Morton
2014-04-09  0:29         ` Luiz Capitulino
2014-04-09  0:29           ` Luiz Capitulino
2014-04-03 15:33 ` [PATCH 0/4] hugetlb: add support " Andrea Arcangeli
2014-04-03 15:33   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140407144935.259d4301@redhat.com \
    --to=lcapitulino@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=davidlohr@hp.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mtosatti@redhat.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.