kernelnewbies.kernelnewbies.org archive mirror
* HugePage by default
@ 2014-07-30 19:41 Xin Tong
  2014-07-30 19:57 ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 9+ messages in thread
From: Xin Tong @ 2014-07-30 19:41 UTC (permalink / raw)
  To: kernelnewbies

Hi

Is there any way for me to turn on HugePage by default in the Linux x86
kernel, i.e. allocate a 2MB page by default in place of the current 4KB?
Thanks,
Xin

* HugePage by default
  2014-07-30 19:41 HugePage by default Xin Tong
@ 2014-07-30 19:57 ` Valdis.Kletnieks at vt.edu
  2014-07-30 20:06   ` Xin Tong
  0 siblings, 1 reply; 9+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-07-30 19:57 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 30 Jul 2014 14:41:26 -0500, Xin Tong said:

> Is there any way for me to turn on HugePage by default in the Linux x86
> kernel, i.e. allocate a 2MB page by default in place of the current 4KB?

Possibly related config entries to research:

CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set

However, if you allocate a single 4K page, that *won't* automatically
promote it to a hugepage - you need to allocate 2M of contiguous virtual
address space with the same access flags for it to coalesce into a hugepage.
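That alignment constraint is easy to model numerically; a small sketch (the helper and the example mapping are illustrative, not kernel code):

```python
PAGE = 4096                 # base page size
HUGE = 2 * 1024 * 1024      # x86 huge page size

def huge_aligned_span(start, length, huge=HUGE):
    """Bytes of [start, start+length) that could be backed by 2MB pages:
    only the part that is 2MB-aligned at both ends qualifies."""
    lo = -(-start // huge) * huge            # round start up to a 2MB boundary
    hi = ((start + length) // huge) * huge   # round end down to a 2MB boundary
    return max(0, hi - lo)

# A 6MB mapping that starts 4KB past a 2MB boundary only yields two
# full huge pages; the ragged 4KB edges stay as base pages.
print(huge_aligned_span(HUGE + PAGE, 6 * 1024 * 1024) // HUGE)  # → 2
```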

* HugePage by default
  2014-07-30 19:57 ` Valdis.Kletnieks at vt.edu
@ 2014-07-30 20:06   ` Xin Tong
  2014-07-30 22:22     ` Valdis.Kletnieks at vt.edu
  2014-07-31  4:06     ` Rik van Riel
  0 siblings, 2 replies; 9+ messages in thread
From: Xin Tong @ 2014-07-30 20:06 UTC (permalink / raw)
  To: kernelnewbies

I see two ways to do this.

1. Allocate 512 contiguous 4KB pages every time handle_mm_fault is called
and have the THP kernel thread coalesce them into a huge page.
2. Modify the kernel (maybe extensively) to allocate 2MB pages by default.

I like 1 better because it requires fewer modifications, but it is not as
reliable. Any suggestions?

Xin



On Wed, Jul 30, 2014 at 2:57 PM, <Valdis.Kletnieks@vt.edu> wrote:

> On Wed, 30 Jul 2014 14:41:26 -0500, Xin Tong said:
>
> > Is there any way for me to turn on HugePage by default in the Linux x86
> > kernel, i.e. allocate a 2MB page by default in place of the current 4KB?
>
> Possibly related config entries to research:
>
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
>
> However, if you allocate a single 4K page, that *won't* automatically
> promote it to a hugepage - you need to allocate 2M of contiguous virtual
> address space with the same access flags for it to coalesce into a
> hugepage.
>

* HugePage by default
  2014-07-30 20:06   ` Xin Tong
@ 2014-07-30 22:22     ` Valdis.Kletnieks at vt.edu
  2014-07-30 23:26       ` Xin Tong
  2014-07-31  4:06     ` Rik van Riel
  1 sibling, 1 reply; 9+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-07-30 22:22 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 30 Jul 2014 15:06:39 -0500, Xin Tong said:


> 2. modify the kernel (maybe extensively) to allocate 2MB pages by default.

How fast do you run out of memory if you do that every time you actually
only need a few 4K pages?  (In other words - think about why that isn't the
default behavior already :)

* HugePage by default
  2014-07-30 22:22     ` Valdis.Kletnieks at vt.edu
@ 2014-07-30 23:26       ` Xin Tong
  2014-07-31  0:26         ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 9+ messages in thread
From: Xin Tong @ 2014-07-30 23:26 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Jul 30, 2014 at 5:22 PM, <Valdis.Kletnieks@vt.edu> wrote:

> On Wed, 30 Jul 2014 15:06:39 -0500, Xin Tong said:
>
>
> > 2. modify the kernel (maybe extensively) to allocate 2MB pages by default.
>
> How fast do you run out of memory if you do that every time you actually
> only need a few 4K pages?  (In other words - think about why that isn't
> the default behavior already :)
>

I am planning to use this only for workloads with very large memory
footprints, e.g. Hadoop, TPC-C, etc.

BTW, I see the Linux kernel uses hugetlbfs to manage huge pages. Every API
call (mmap, shmget, etc.) has to go through hugetlbfs before huge pages can
be allocated. Why can't huge pages be allocated the same way as 4K pages?
What's the point of having hugetlbfs?
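For what it's worth, the reserved hugetlb pool is visible in /proc/meminfo; here is a sketch that parses those counters (run against a made-up sample so it does not depend on a configured pool):

```python
SAMPLE = """\
HugePages_Total:      16
HugePages_Free:       12
Hugepagesize:       2048 kB
"""

def hugetlb_stats(meminfo_text):
    """Extract the hugetlb pool counters from /proc/meminfo-style text."""
    stats = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("HugePages") or key == "Hugepagesize":
            stats[key] = int(rest.split()[0])   # drop the trailing "kB" unit
    return stats

stats = hugetlb_stats(SAMPLE)   # on a real system: open("/proc/meminfo").read()
print(stats["HugePages_Total"] - stats["HugePages_Free"])  # pages in use → 4
```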

Xin

* HugePage by default
  2014-07-30 23:26       ` Xin Tong
@ 2014-07-31  0:26         ` Valdis.Kletnieks at vt.edu
  2014-07-31  0:49           ` Xin Tong
  0 siblings, 1 reply; 9+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-07-31  0:26 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 30 Jul 2014 18:26:39 -0500, Xin Tong said:

> I am planning to use this only for workloads with very large memory
> footprints, e.g. hadoop, tpcc, etc.

You might want to look at how your system gets booted.  I think you'll find
that you burn through 800 to 2000 or so processes, all of which are currently
tiny, but if you make every 4K allocation grab 2M instead, you're quite likely
to find yourself tripping the OOM killer before hadoop ever gets launched.

You're probably *much* better off letting the current code do its work,
since you'll only pay the coalesce cost once for each 2M that hadoop uses.
And let's face it, that's only going to sum up to fractions of a second, and
then hadoop is going to be banging on the TLB for hours or days.

Don't spend time optimizing the wrong thing....

* HugePage by default
  2014-07-31  0:26         ` Valdis.Kletnieks at vt.edu
@ 2014-07-31  0:49           ` Xin Tong
  2014-07-31  0:53             ` Xin Tong
  0 siblings, 1 reply; 9+ messages in thread
From: Xin Tong @ 2014-07-31  0:49 UTC (permalink / raw)
  To: kernelnewbies

How bad is the internal fragmentation going to be if 2M pages are used?
Some of the small VMAs are the stack, shared libraries, and user-mmapped
files. I assume the heap is going to be at least 2M, which is somewhat
reasonable.

Shared-library VMAs can be merged to form large VMAs, as they mostly have
the same permissions. Only one stack is needed per thread. I think the big
culprit for internal fragmentation here is user-mmapped files.

Am I right to think as above?
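One way to bound it: each VMA wastes at most one partially-filled page at its tail, so the overhead scales with the number of VMAs rather than with total footprint. A quick sketch, using made-up illustrative VMA sizes:

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024

def waste(vma_bytes, page):
    """Internal fragmentation if the VMA is rounded up to whole pages."""
    return -vma_bytes % page

# Hypothetical small per-process VMAs: a thread stack, two shared-library
# segments, and a small user-mmapped file.
vmas = [8 * 1024 * 1024, 180 * 1024, 52 * 1024, 6 * 1024]

print(sum(waste(v, PAGE_4K) for v in vmas))  # → 2048 bytes wasted
print(sum(waste(v, PAGE_2M) for v in vmas))  # → 6047744 bytes (~5.8 MB) wasted
```

So even four small VMAs cost a few megabytes with 2M pages, versus a couple of kilobytes with 4K pages; multiplied across hundreds of processes, that is where the OOM concern comes from.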

Xin
On Wed, Jul 30, 2014 at 7:26 PM, <Valdis.Kletnieks@vt.edu> wrote:

> On Wed, 30 Jul 2014 18:26:39 -0500, Xin Tong said:
>
> > I am planning to use this only for workloads with very large memory
> > footprints, e.g. hadoop, tpcc, etc.
>
> You might want to look at how your system gets booted.  I think you'll find
> that you burn through 800 to 2000 or so processes, all of which are currently
> tiny, but if you make every 4K allocation grab 2M instead, you're quite likely
> to find yourself tripping the OOM before hadoop ever gets launched.
>
> You're probably *much* better off letting the current code do its work,
> since you'll only pay the coalesce cost once for each 2M that hadoop uses.
> And let's face it, that's only going to sum up to fractions of a second, and
> then hadoop is going to be banging on the TLB for hours or days.
>
> Don't spend time optimizing the wrong thing....
>

* HugePage by default
  2014-07-31  0:49           ` Xin Tong
@ 2014-07-31  0:53             ` Xin Tong
  0 siblings, 0 replies; 9+ messages in thread
From: Xin Tong @ 2014-07-31  0:53 UTC (permalink / raw)
  To: kernelnewbies

Just to clarify, I am doing this for the possibility of a research project.
The feeling is that maybe 4KB is no longer the best page size. Here is what
I wrote some time ago:

"Memory size has increased significantly since the introduction of the x86
virtual memory system in 1985. However, the size of a page has stayed at
4KB. Most virtual memory systems make use of a Translation Look-aside
Buffer (TLB) to reduce the cost of translation. Because the TLB sits in the
critical path of every memory access, its size is limited by its strict
latency requirement. Over the years, the size of the L1 TLB has stayed well
below 256 entries in most commercial processors."
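The TLB-reach argument is easy to quantify; a back-of-the-envelope sketch (the 128-entry L1 DTLB is an assumed, typical figure, not a measurement):

```python
ENTRIES = 128                       # assumed L1 DTLB entry count
KB = 1024

reach_4k = ENTRIES * 4 * KB         # reach with 4KB pages
reach_2m = ENTRIES * 2 * KB * KB    # reach with 2MB pages

print(reach_4k // KB, "KB")         # → 512 KB
print(reach_2m // (KB * KB), "MB")  # → 256 MB
```

Moving to 2MB pages multiplies per-entry reach by 512 without growing the TLB itself, which is the effect the paragraph is arguing for.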


On Wed, Jul 30, 2014 at 7:49 PM, Xin Tong <trent.tong@gmail.com> wrote:

> How bad is the internal fragmentation going to be if 2M pages are used?
> Some of the small VMAs are the stack, shared libraries, and user-mmapped
> files. I assume the heap is going to be at least 2M, which is somewhat
> reasonable.
>
> Shared-library VMAs can be merged to form large VMAs, as they mostly have
> the same permissions. Only one stack is needed per thread. I think the big
> culprit for internal fragmentation here is user-mmapped files.
>
> Am I right to think as above?
>
> Xin
> On Wed, Jul 30, 2014 at 7:26 PM, <Valdis.Kletnieks@vt.edu> wrote:
>
>> On Wed, 30 Jul 2014 18:26:39 -0500, Xin Tong said:
>>
>> > I am planning to use this only for workloads with very large memory
>> > footprints, e.g. hadoop, tpcc, etc.
>>
>> You might want to look at how your system gets booted.  I think you'll find
>> that you burn through 800 to 2000 or so processes, all of which are currently
>> tiny, but if you make every 4K allocation grab 2M instead, you're quite likely
>> to find yourself tripping the OOM before hadoop ever gets launched.
>>
>> You're probably *much* better off letting the current code do its work,
>> since you'll only pay the coalesce cost once for each 2M that hadoop uses.
>> And let's face it, that's only going to sum up to fractions of a second, and
>> then hadoop is going to be banging on the TLB for hours or days.
>>
>> Don't spend time optimizing the wrong thing....
>>
>
>

* HugePage by default
  2014-07-30 20:06   ` Xin Tong
  2014-07-30 22:22     ` Valdis.Kletnieks at vt.edu
@ 2014-07-31  4:06     ` Rik van Riel
  1 sibling, 0 replies; 9+ messages in thread
From: Rik van Riel @ 2014-07-31  4:06 UTC (permalink / raw)
  To: kernelnewbies


On 07/30/2014 04:06 PM, Xin Tong wrote:
> I see 2 ways to do this.
> 
> 1. allocate 512 contiguous 4KB pages every time handle_mm_fault is
> called and have the THP kernel thread coalesce them into a huge page.
> 2. modify the kernel (maybe extensively) to allocate 2MB pages by default.
> 
> I like 1 better because it requires fewer modifications, but it is
> not as reliable. any suggestions

The kernel already does both of the above when
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y

-- 
All rights reversed.
