* HugePage by default
From: Xin Tong @ 2014-07-30 19:41 UTC
To: kernelnewbies

Hi,

Is there any way for me to turn on huge pages by default in the Linux x86
kernel, i.e. allocate a 2 MB page by default in place of the current 4 KB?

Thanks,
Xin
* HugePage by default
From: Valdis.Kletnieks at vt.edu @ 2014-07-30 19:57 UTC
To: kernelnewbies

On Wed, 30 Jul 2014 14:41:26 -0500, Xin Tong said:

> Is there any way for me to turn on huge pages by default in the Linux x86
> kernel, i.e. allocate a 2 MB page by default in place of the current 4 KB?

Possibly related config entries to research:

CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set

However, if you allocate a single 4K page, it *won't* automatically be
promoted to a hugepage - you need to allocate 2M of contiguous virtual
address space with the same access flags for it to coalesce into a hugepage.
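As a concrete illustration of that last point, here is a minimal user-space
sketch of the madvise() opt-in path that the CONFIG_TRANSPARENT_HUGEPAGE_MADVISE
option refers to. Assumptions: Linux with glibc and a THP-enabled kernel; the
4 MiB size and the page-step touch loop are arbitrary choices for the example,
not anything from the thread.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define LEN (4UL << 20)          /* 4 MiB: room for two 2 MiB huge pages */

int main(void)
{
    /* Anonymous private mapping; THP can only back the parts of the
     * range that happen to be 2 MiB aligned. */
    char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Opt this range in to transparent huge pages.  With the "madvise"
     * THP mode this is what makes the kernel consider 2 MiB pages here;
     * with the "always" mode it is not needed. */
    if (madvise(p, LEN, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* Touch the memory so it is actually faulted in. */
    for (size_t i = 0; i < LEN; i += 4096)
        p[i] = 1;

    /* AnonHugePages in /proc/self/smaps shows whether THP backed it. */
    return 0;
}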
* HugePage by default
From: Xin Tong @ 2014-07-30 20:06 UTC
To: kernelnewbies

I see two ways to do this:

1. Allocate 512 contiguous 4 KB pages every time handle_mm_fault is called
   and have the THP kernel thread coalesce them into a huge page.
2. Modify the kernel (maybe extensively) to allocate a 2 MB page by default.

I like 1 better because it requires fewer modifications, but it is not as
reliable. Any suggestions?

Xin

On Wed, Jul 30, 2014 at 2:57 PM, <Valdis.Kletnieks@vt.edu> wrote:
> On Wed, 30 Jul 2014 14:41:26 -0500, Xin Tong said:
>
> > Is there any way for me to turn on huge pages by default in the Linux x86
> > kernel, i.e. allocate a 2 MB page by default in place of the current 4 KB?
>
> Possibly related config entries to research:
>
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
>
> However, if you allocate a single 4K page, it *won't* automatically be
> promoted to a hugepage - you need to allocate 2M of contiguous virtual
> address space with the same access flags for it to coalesce into a
> hugepage.
* HugePage by default
From: Valdis.Kletnieks at vt.edu @ 2014-07-30 22:22 UTC
To: kernelnewbies

On Wed, 30 Jul 2014 15:06:39 -0500, Xin Tong said:

> 2. Modify the kernel (maybe extensively) to allocate a 2 MB page by default.

How fast do you run out of memory if you do that every time you actually
only need a few 4K pages? (In other words - think about why that isn't the
default behavior already :)
* HugePage by default
From: Xin Tong @ 2014-07-30 23:26 UTC
To: kernelnewbies

On Wed, Jul 30, 2014 at 5:22 PM, <Valdis.Kletnieks@vt.edu> wrote:
> On Wed, 30 Jul 2014 15:06:39 -0500, Xin Tong said:
>
> > 2. Modify the kernel (maybe extensively) to allocate a 2 MB page by default.
>
> How fast do you run out of memory if you do that every time you actually
> only need a few 4K pages? (In other words - think about why that isn't the
> default behavior already :)

I am planning to use this only for workloads with very large memory
footprints, e.g. Hadoop, TPC-C, etc.

By the way, I see the Linux kernel uses hugetlbfs to manage huge pages.
Every API call - mmap, shmget, etc. - has to go through a hugetlbfs mount
before the huge pages can be allocated. Why can't huge pages be allocated
the same way as 4 KB pages? What's the point of having hugetlbfs?

Xin
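As a side note on the hugetlbfs question: on reasonably recent kernels an
explicit mount is not strictly required - mmap(MAP_HUGETLB) and
shmget(SHM_HUGETLB) allocate straight from the reserved huge page pool. A
minimal sketch, assuming huge pages have already been reserved (e.g. via
/proc/sys/vm/nr_hugepages) and a kernel new enough to provide MAP_HUGETLB:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define LEN (2UL << 20)          /* one 2 MiB huge page */

int main(void)
{
    /* Map one huge page directly from the hugetlb pool; no hugetlbfs
     * mount is involved.  This fails if the pool is empty, so reserve
     * pages first (e.g. echo 64 > /proc/sys/vm/nr_hugepages, as root). */
    char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }

    p[0] = 1;                    /* fault the huge page in */
    munmap(p, LEN);
    return 0;
}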
* HugePage by default
From: Valdis.Kletnieks at vt.edu @ 2014-07-31 0:26 UTC
To: kernelnewbies

On Wed, 30 Jul 2014 18:26:39 -0500, Xin Tong said:

> I am planning to use this only for workloads with very large memory
> footprints, e.g. Hadoop, TPC-C, etc.

You might want to look at how your system gets booted. I think you'll find
that you burn through 800 to 2000 or so processes, all of which are
currently tiny, but if you make every 4K allocation grab 2M instead, you're
quite likely to find yourself tripping the OOM killer before hadoop ever
gets launched.

You're probably *much* better off letting the current code do its work,
since you'll only pay the coalesce cost once for each 2M that hadoop uses.
And let's face it, that's only going to sum up to fractions of a second, and
then hadoop is going to be banging on the TLB for hours or days.

Don't spend time optimizing the wrong thing....
* HugePage by default
From: Xin Tong @ 2014-07-31 0:49 UTC
To: kernelnewbies

How bad is the internal fragmentation going to be if 2 MB pages are used?
Some of the small VMAs are the stack, shared libraries, and user-mmapped
files. I assume the heap is going to be at least 2 MB, which is somewhat
reasonable. Shared-library VMAs can mostly be merged to form large VMAs, as
they largely have the same permissions, and only one stack is needed per
thread. I think the big culprit for internal fragmentation here is the
user-mmapped files.

Am I right to think as above?

Xin

On Wed, Jul 30, 2014 at 7:26 PM, <Valdis.Kletnieks@vt.edu> wrote:
> On Wed, 30 Jul 2014 18:26:39 -0500, Xin Tong said:
>
> > I am planning to use this only for workloads with very large memory
> > footprints, e.g. Hadoop, TPC-C, etc.
>
> You might want to look at how your system gets booted. I think you'll find
> that you burn through 800 to 2000 or so processes, all of which are
> currently tiny, but if you make every 4K allocation grab 2M instead, you're
> quite likely to find yourself tripping the OOM killer before hadoop ever
> gets launched.
>
> You're probably *much* better off letting the current code do its work,
> since you'll only pay the coalesce cost once for each 2M that hadoop uses.
> And let's face it, that's only going to sum up to fractions of a second,
> and then hadoop is going to be banging on the TLB for hours or days.
>
> Don't spend time optimizing the wrong thing....
* HugePage by default
From: Xin Tong @ 2014-07-31 0:53 UTC
To: kernelnewbies

Just to clarify: I am doing this for a possible research project. The
feeling is that maybe 4 KB is no longer the best page size. Here is what I
wrote some time ago:

"Memory size has increased significantly since the introduction of the x86
virtual memory system in 1985. However, the size of a page has stayed at
4 KB. Most virtual memory systems make use of a Translation Lookaside
Buffer (TLB) to reduce the cost of translation. Because the TLB sits in the
critical path of every memory access, its size is limited by its strict
latency requirement. Over the years, the size of the L1 TLB has stayed well
below 256 entries in most commercial processors."

On Wed, Jul 30, 2014 at 7:49 PM, Xin Tong <trent.tong@gmail.com> wrote:
> How bad is the internal fragmentation going to be if 2 MB pages are used?
> Some of the small VMAs are the stack, shared libraries, and user-mmapped
> files. I assume the heap is going to be at least 2 MB, which is somewhat
> reasonable. Shared-library VMAs can mostly be merged to form large VMAs,
> as they largely have the same permissions, and only one stack is needed
> per thread. I think the big culprit for internal fragmentation here is
> the user-mmapped files.
>
> Am I right to think as above?
>
> Xin
>
> On Wed, Jul 30, 2014 at 7:26 PM, <Valdis.Kletnieks@vt.edu> wrote:
>> On Wed, 30 Jul 2014 18:26:39 -0500, Xin Tong said:
>>
>> > I am planning to use this only for workloads with very large memory
>> > footprints, e.g. Hadoop, TPC-C, etc.
>>
>> You might want to look at how your system gets booted. I think you'll
>> find that you burn through 800 to 2000 or so processes, all of which are
>> currently tiny, but if you make every 4K allocation grab 2M instead,
>> you're quite likely to find yourself tripping the OOM killer before
>> hadoop ever gets launched.
>>
>> You're probably *much* better off letting the current code do its work,
>> since you'll only pay the coalesce cost once for each 2M that hadoop
>> uses. And let's face it, that's only going to sum up to fractions of a
>> second, and then hadoop is going to be banging on the TLB for hours or
>> days.
>>
>> Don't spend time optimizing the wrong thing....
* HugePage by default
From: Rik van Riel @ 2014-07-31 4:06 UTC
To: kernelnewbies

On 07/30/2014 04:06 PM, Xin Tong wrote:
> I see two ways to do this:
>
> 1. Allocate 512 contiguous 4 KB pages every time handle_mm_fault is
>    called and have the THP kernel thread coalesce them into a huge page.
> 2. Modify the kernel (maybe extensively) to allocate a 2 MB page by
>    default.
>
> I like 1 better because it requires fewer modifications, but it is not as
> reliable. Any suggestions?

The kernel already does both of the above when

CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y

--
All rights reversed.
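A related runtime knob: with CONFIG_TRANSPARENT_HUGEPAGE=y already built in,
the default mode selected by the *_ALWAYS/*_MADVISE options can also be
changed on a running system through sysfs, without rebuilding the kernel. A
minimal sketch, assuming the standard sysfs path for THP and root privileges:

#include <stdio.h>

int main(void)
{
    const char *path = "/sys/kernel/mm/transparent_hugepage/enabled";
    char mode[128];
    FILE *f;

    /* Show the current THP mode, e.g. "always [madvise] never". */
    f = fopen(path, "r");
    if (!f) {
        perror(path);            /* kernel built without THP support */
        return 1;
    }
    if (fgets(mode, sizeof(mode), f))
        printf("current: %s", mode);
    fclose(f);

    /* Switch to "always" - the same behaviour the ALWAYS Kconfig option
     * selects as the boot-time default.  Needs root. */
    f = fopen(path, "w");
    if (!f) {
        perror("open for write");
        return 1;
    }
    fputs("always", f);
    fclose(f);
    return 0;
}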