* Re: RFC: Transparent Hugepage support
From: Ingo Molnar @ 2009-10-29 9:43 UTC
To: Andi Kleen
Cc: Andrea Arcangeli, linux-mm, Marcelo Tosatti, Adam Litke, Avi Kivity,
    Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton, linux-kernel

* Andi Kleen <andi@firstfloor.org> wrote:

> > 1GB pages can't be handled by this code, and clearly it's not
> > practical to hope 1G pages to materialize in the buddy (even if we
>
> That seems short-sighted. You do this because 2MB pages give you x%
> performance advantage, but then it's likely that 1GB pages will give
> another y% improvement and why should people stop at the smaller
> improvement?
>
> Ignoring the gigantic pages now would just mean that this would need
> to be revised later again or that users still need to use hacks like
> libhugetlbfs.

I've read the patch and have read through this discussion and you are
missing the big point that it's best to do such things gradually - one
step at a time.

Just like we went from 2 level pagetables to 3 level pagetables, then to
4 level pagetables - and we might go to 5 level pagetables in the
future. We didn't go from 2 level pagetables to 5 level pagetables in
one go, despite predictions clearly pointing out the exponentially
increasing need for RAM.

So your obsession with 1GB pages is misguided. If indeed transparent
largepages give us real benefits we can extend it to do transparent
gbpages as well - should we ever want to. There's nothing 'shortsighted'
about being gradual - the change is already ambitious enough as-is, and
brings very clear benefits to a difficult, decade-old problem no other
person was able to address.

In fact introducing transparent 2MB pages makes 1GB pages support
_easier_ to merge: as at that point we'll already have a (finally..)
successful hugetlb facility happily used by an increasing range of
applications.

Hugetlbfs's big problem was always that it wasn't transparent and hence
wasn't gradual for applications. It was an opt-in and constituted an
interface/ABI change - that is always a big barrier to app adoption.

So i give Andrea's patch a very big thumbs up - i hope it gets reviewed
in fine detail and added to -mm ASAP. Our lack of decent, automatic
hugepage support is sticking out like a sore thumb and is hurting us in
high-performance setups. If largepage support within Linux has a chance,
this might be the way to do it.

A small comment regarding the patch itself: i think it could be
simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
making it a natural feature of hugepage support. If the code is correct
i cannot see any scenario under which i would want a hugepage enabled
kernel i'm booting to not have transparent hugepage support as well.

Thanks,

	Ingo

* Re: RFC: Transparent Hugepage support
From: Andrea Arcangeli @ 2009-10-29 10:36 UTC
To: Ingo Molnar
Cc: Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke, Avi Kivity,
    Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton, linux-kernel

Hello Ingo, Andi, everyone,

On Thu, Oct 29, 2009 at 10:43:44AM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <andi@firstfloor.org> wrote:
>
> > > 1GB pages can't be handled by this code, and clearly it's not
> > > practical to hope 1G pages to materialize in the buddy (even if we
> >
> > That seems short-sighted. You do this because 2MB pages give you x%
> > performance advantage, but then it's likely that 1GB pages will give
> > another y% improvement and why should people stop at the smaller
> > improvement?
> >
> > Ignoring the gigantic pages now would just mean that this would need
> > to be revised later again or that users still need to use hacks like
> > libhugetlbfs.
>
> I've read the patch and have read through this discussion and you are
> missing the big point that it's best to do such things gradually - one
> step at a time.
>
> Just like we went from 2 level pagetables to 3 level pagetables, then to
> 4 level pagetables - and we might go to 5 level pagetables in the
> future. We didn't go from 2 level pagetables to 5 level pagetables in
> one go, despite predictions clearly pointing out the exponentially
> increasing need for RAM.

I totally agree with your assessment.

> So your obsession with 1GB pages is misguided. If indeed transparent
> largepages give us real benefits we can extend it to do transparent
> gbpages as well - should we ever want to. There's nothing 'shortsighted'
> about being gradual - the change is already ambitious enough as-is, and
> brings very clear benefits to a difficult, decade-old problem no other
> person was able to address.
>
> In fact introducing transparent 2MB pages makes 1GB pages support
> _easier_ to merge: as at that point we'll already have a (finally..)
> successful hugetlb facility happily used by an increasing range of
> applications.

Agreed.

> Hugetlbfs's big problem was always that it wasn't transparent and hence
> wasn't gradual for applications. It was an opt-in and constituted an
> interface/ABI change - that is always a big barrier to app adoption.
>
> So i give Andrea's patch a very big thumbs up - i hope it gets reviewed
> in fine detail and added to -mm ASAP. Our lack of decent, automatic
> hugepage support is sticking out like a sore thumb and is hurting us in
> high-performance setups. If largepage support within Linux has a chance,
> this might be the way to do it.

Thanks a lot for your review!

> A small comment regarding the patch itself: i think it could be
> simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
> making it a natural feature of hugepage support. If the code is correct
> i cannot see any scenario under which i would want a hugepage enabled
> kernel i'm booting to not have transparent hugepage support as well.

The two reasons why I added a config option are:

1) because it was easy enough, gcc is smart enough to eliminate the
   external calls so I didn't need to add ifdefs with the exception of
   returning 0 from pmd_trans_huge and pmd_trans_frozen. I only had to
   make the exports of huge_memory.c visible unconditionally so it doesn't
   warn, after that I don't need to build and link huge_memory.o.

2) to avoid breaking the build of archs not implementing pmd_trans_huge
   and that may never be able to take advantage of it

But we could move CONFIG_TRANSPARENT_HUGEPAGE to an arch define forced
to Y on x86-64 and N on power.

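The config-option pattern described in point 1) can be sketched in userspace C. This is only an illustration under stated assumptions, not code from the patch: `pmd_t` here is a stand-in struct, and `TOY_PMD_HUGE`, `toy_handle_huge_pmd` and `toy_fault` are hypothetical names. The idea shown is that a stub returning a constant 0 lets callers stay free of ifdefs while the huge-page branch becomes dead code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's pmd_t; the real type is architecture-specific. */
typedef struct { uint64_t val; } pmd_t;

/* Hypothetical flag marking a huge mapping in this toy pmd_t. */
#define TOY_PMD_HUGE UINT64_C(0x80)

enum toy_path { TOY_PATH_REGULAR, TOY_PATH_HUGE };

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* Feature enabled: test the (hypothetical) huge flag. */
static inline int pmd_trans_huge(pmd_t pmd)
{
	return (pmd.val & TOY_PMD_HUGE) != 0;
}
#else
/* Feature compiled out: a constant 0 makes every huge branch dead code. */
static inline int pmd_trans_huge(pmd_t pmd)
{
	(void)pmd;
	return 0;
}
#endif

/*
 * Hypothetical huge-page handler. It is defined here so that the sketch
 * links at any optimization level; the approach described in the mail
 * instead relies on the optimizer dropping the dead call, so the object
 * file holding the real handlers need not be built at all.
 */
enum toy_path toy_handle_huge_pmd(pmd_t pmd)
{
	assert(pmd.val & TOY_PMD_HUGE);
	return TOY_PATH_HUGE;
}

/* The caller contains no #ifdef: the stub alone decides which path runs. */
enum toy_path toy_fault(pmd_t pmd)
{
	if (pmd_trans_huge(pmd))
		return toy_handle_huge_pmd(pmd);
	return TOY_PATH_REGULAR;
}
```

Built without -DCONFIG_TRANSPARENT_HUGEPAGE, toy_fault() takes the regular path even for an entry carrying the huge flag; built with it, the same caller dispatches to the huge handler.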
* Re: RFC: Transparent Hugepage support
From: Mike Travis @ 2009-10-29 16:50 UTC
To: Andrea Arcangeli
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
    Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
    linux-kernel, Karl Feind, Jack Steiner

Hi Andrea,

I will find some time soon to test out your patch on a (relatively)
huge machine and let you know the results. The memory size on this
machine:

    480,700,399,616 bytes of system memory tested OK

This translates to ~240k available 2MB pages.

Thanks,
Mike

Andrea Arcangeli wrote:
> Hello Ingo, Andi, everyone,
>
> On Thu, Oct 29, 2009 at 10:43:44AM +0100, Ingo Molnar wrote:
>> * Andi Kleen <andi@firstfloor.org> wrote:
>>
>>>> 1GB pages can't be handled by this code, and clearly it's not
>>>> practical to hope 1G pages to materialize in the buddy (even if we
>>> That seems short-sighted. You do this because 2MB pages give you x%
>>> performance advantage, but then it's likely that 1GB pages will give
>>> another y% improvement and why should people stop at the smaller
>>> improvement?
>>>
>>> Ignoring the gigantic pages now would just mean that this would need
>>> to be revised later again or that users still need to use hacks like
>>> libhugetlbfs.
>> I've read the patch and have read through this discussion and you are
>> missing the big point that it's best to do such things gradually - one
>> step at a time.
>>
>> Just like we went from 2 level pagetables to 3 level pagetables, then to
>> 4 level pagetables - and we might go to 5 level pagetables in the
>> future. We didn't go from 2 level pagetables to 5 level pagetables in
>> one go, despite predictions clearly pointing out the exponentially
>> increasing need for RAM.
>
> I totally agree with your assessment.
>
>> So your obsession with 1GB pages is misguided. If indeed transparent
>> largepages give us real benefits we can extend it to do transparent
>> gbpages as well - should we ever want to. There's nothing 'shortsighted'
>> about being gradual - the change is already ambitious enough as-is, and
>> brings very clear benefits to a difficult, decade-old problem no other
>> person was able to address.
>>
>> In fact introducing transparent 2MB pages makes 1GB pages support
>> _easier_ to merge: as at that point we'll already have a (finally..)
>> successful hugetlb facility happily used by an increasing range of
>> applications.
>
> Agreed.
>
>> Hugetlbfs's big problem was always that it wasn't transparent and hence
>> wasn't gradual for applications. It was an opt-in and constituted an
>> interface/ABI change - that is always a big barrier to app adoption.
>>
>> So i give Andrea's patch a very big thumbs up - i hope it gets reviewed
>> in fine detail and added to -mm ASAP. Our lack of decent, automatic
>> hugepage support is sticking out like a sore thumb and is hurting us in
>> high-performance setups. If largepage support within Linux has a chance,
>> this might be the way to do it.
>
> Thanks a lot for your review!
>
>> A small comment regarding the patch itself: i think it could be
>> simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
>> making it a natural feature of hugepage support. If the code is correct
>> i cannot see any scenario under which i would want a hugepage enabled
>> kernel i'm booting to not have transparent hugepage support as well.
>
> The two reasons why I added a config option are:
>
> 1) because it was easy enough, gcc is smart enough to eliminate the
>    external calls so I didn't need to add ifdefs with the exception of
>    returning 0 from pmd_trans_huge and pmd_trans_frozen. I only had to
>    make the exports of huge_memory.c visible unconditionally so it doesn't
>    warn, after that I don't need to build and link huge_memory.o.
>
> 2) to avoid breaking the build of archs not implementing pmd_trans_huge
>    and that may never be able to take advantage of it
>
> But we could move CONFIG_TRANSPARENT_HUGEPAGE to an arch define forced
> to Y on x86-64 and N on power.

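As a quick check of the page count in Mike's mail, the arithmetic can be sketched as below. The helper name `whole_pages` and the constants are made up for this illustration, and which convention the "~240k" estimate used is an assumption: it matches a decimal 2,000,000-byte unit, whereas an x86-64 huge page is 2 MiB (2,097,152 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Memory size reported in the mail above, in bytes. */
#define TOY_TOTAL_BYTES    UINT64_C(480700399616)

/* An x86-64 huge page: 2 MiB. */
#define TOY_HUGE_PAGE_MIB  (UINT64_C(2) << 20)

/* A decimal "2 MB", shown only for comparison with the ~240k estimate. */
#define TOY_HUGE_PAGE_DEC  UINT64_C(2000000)

/* Number of whole pages of page_size bytes that fit in total_bytes. */
uint64_t whole_pages(uint64_t total_bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return total_bytes / page_size;
}
```

With these inputs, whole_pages(TOY_TOTAL_BYTES, TOY_HUGE_PAGE_MIB) gives 229215 and whole_pages(TOY_TOTAL_BYTES, TOY_HUGE_PAGE_DEC) gives 240350, so at most roughly 229k hardware 2 MiB pages fit in the reported memory, before subtracting anything the kernel keeps for itself.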
* Re: RFC: Transparent Hugepage support
From: KAMEZAWA Hiroyuki @ 2009-10-30 0:40 UTC
To: Andrea Arcangeli
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
    Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
    linux-kernel

On Thu, 29 Oct 2009 11:36:58 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> > A small comment regarding the patch itself: i think it could be
> > simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
> > making it a natural feature of hugepage support. If the code is correct
> > i cannot see any scenario under which i would want a hugepage enabled
> > kernel i'm booting to not have transparent hugepage support as well.
>
> The two reasons why I added a config option are:
>
> 1) because it was easy enough, gcc is smart enough to eliminate the
>    external calls so I didn't need to add ifdefs with the exception of
>    returning 0 from pmd_trans_huge and pmd_trans_frozen. I only had to
>    make the exports of huge_memory.c visible unconditionally so it doesn't
>    warn, after that I don't need to build and link huge_memory.o.
>
> 2) to avoid breaking the build of archs not implementing pmd_trans_huge
>    and that may never be able to take advantage of it
>
> But we could move CONFIG_TRANSPARENT_HUGEPAGE to an arch define forced
> to Y on x86-64 and N on power.

Ah, please keep CONFIG_TRANSPARENT_HUGEPAGE for a while.
Now, memcg doesn't handle hugetlbfs because it's special and cannot be
freed by the kernel, only users can free it. But this new
transparent-hugepage seems to be designed so that the kernel can free it
for memory reclaiming. So, I'd like to handle this in memcg transparently.

But it seems I need several changes to support this new rule.
I'd be glad if this new huge page depends on !CONFIG_CGROUP_MEM_RES_CTRL
for a while.

Thanks,
-Kame

* Re: RFC: Transparent Hugepage support
From: Andrea Arcangeli @ 2009-11-03 10:55 UTC
To: KAMEZAWA Hiroyuki
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
    Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
    linux-kernel

On Fri, Oct 30, 2009 at 09:40:37AM +0900, KAMEZAWA Hiroyuki wrote:
> Ah, please keep CONFIG_TRANSPARENT_HUGEPAGE for a while.
> Now, memcg doesn't handle hugetlbfs because it's special and cannot be
> freed by the kernel, only users can free it. But this new
> transparent-hugepage seems to be designed so that the kernel can free it
> for memory reclaiming. So, I'd like to handle this in memcg transparently.
>
> But it seems I need several changes to support this new rule.
> I'd be glad if this new huge page depends on !CONFIG_CGROUP_MEM_RES_CTRL
> for a while.

Yeah the accounting (not just memcg) should be checked.. I didn't pay
too much attention to stats at this point.

But we want to fix it fast instead of making the two options mutually
exclusive.. Where are the pages de-accounted when they are freed?
Accounting seems to require just two one liners
calling mem_cgroup_newpage_charge.

* Re: RFC: Transparent Hugepage support
From: KAMEZAWA Hiroyuki @ 2009-11-04 0:36 UTC
To: Andrea Arcangeli
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
    Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
    linux-kernel

On Tue, 3 Nov 2009 11:55:43 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Fri, Oct 30, 2009 at 09:40:37AM +0900, KAMEZAWA Hiroyuki wrote:
> > Ah, please keep CONFIG_TRANSPARENT_HUGEPAGE for a while.
> > Now, memcg doesn't handle hugetlbfs because it's special and cannot be
> > freed by the kernel, only users can free it. But this new
> > transparent-hugepage seems to be designed so that the kernel can free it
> > for memory reclaiming. So, I'd like to handle this in memcg transparently.
> >
> > But it seems I need several changes to support this new rule.
> > I'd be glad if this new huge page depends on !CONFIG_CGROUP_MEM_RES_CTRL
> > for a while.
>
> Yeah the accounting (not just memcg) should be checked.. I didn't pay
> too much attention to stats at this point.
>
> But we want to fix it fast instead of making the two options mutually
> exclusive.. Where are the pages de-accounted when they are freed?

It's de-accounted at page_remove_rmap() in the typical anon case.
But the swap-cache/batched-uncharge related part is complicated, maybe.
...because of me ;(

Okay, I don't request !CONFIG_CGROUP_MEM_RES_CTRL, I'd be glad if you CC me.

> Accounting seems to require just two one liners
> calling mem_cgroup_newpage_charge.

Yes, maybe so.

Thanks,
-Kame
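
The charge/de-account pairing discussed in the last two mails can be pictured with a toy userspace model. It is not the kernel's memcg code and does not call mem_cgroup_newpage_charge or page_remove_rmap(); `struct toy_memcg`, `toy_charge` and `toy_uncharge` are hypothetical names. Charging a transparent huge page as 512 base pages (2 MiB / 4 KiB on x86-64) is also an assumption of this sketch, not something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 2 MiB huge page expressed in 4 KiB base pages (x86-64 sizes). */
#define TOY_BASE_PAGES_PER_HUGE 512

/* Toy per-group counter; usage never exceeds limit. */
struct toy_memcg {
	size_t usage;	/* charged base pages */
	size_t limit;	/* maximum base pages allowed */
};

/* Charge at allocation/fault time; refuse if the limit would be exceeded. */
bool toy_charge(struct toy_memcg *cg, size_t nr_pages)
{
	assert(cg != NULL && cg->usage <= cg->limit);
	if (nr_pages > cg->limit - cg->usage)
		return false;
	cg->usage += nr_pages;
	return true;
}

/* De-account when the mapping goes away; must mirror an earlier charge. */
void toy_uncharge(struct toy_memcg *cg, size_t nr_pages)
{
	assert(cg != NULL && nr_pages <= cg->usage);
	cg->usage -= nr_pages;
}
```

In this model a group limited to 600 base pages can hold one huge page (512) but not a second one until the first is uncharged, which is the kind of reclaim-driven behaviour that hugetlbfs pages, freeable only by users, never needed.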