* Re: RFC: Transparent Hugepage support
[not found] ` <20091028042805.GJ7744@basil.fritz.box>
@ 2009-10-29 9:43 ` Ingo Molnar
2009-10-29 10:36 ` Andrea Arcangeli
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2009-10-29 9:43 UTC (permalink / raw)
To: Andi Kleen
Cc: Andrea Arcangeli, linux-mm, Marcelo Tosatti, Adam Litke,
Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
linux-kernel
* Andi Kleen <andi@firstfloor.org> wrote:
> > 1GB pages can't be handled by this code, and clearly it's not
> > practical to hope for 1G pages to materialize in the buddy (even if we
>
> That seems short-sighted. You do this because 2MB pages give you x%
> performance advantage, but then it's likely that 1GB pages will give
> another y% improvement and why should people stop at the smaller
> improvement?
>
> Ignoring the gigantic pages now would just mean that this would need
> to be revised later again or that users still need to use hacks like
> libhugetlbfs.
I've read the patch and have read through this discussion and you are
missing the big point that it's best to do such things gradually - one
step at a time.
Just like we went from 2-level pagetables to 3-level pagetables, then to
4-level pagetables - and we might go to 5-level pagetables in the
future. We didn't go from 2-level pagetables to 5-level pagetables in
one go, despite predictions clearly pointing out the exponentially
increasing need for RAM.
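To make the 2MB/1GB sizes in this discussion concrete, here is a minimal sketch assuming the x86-64 4-level layout: 4KB base pages (12 offset bits) and 512-entry tables (9 index bits per level). Under those assumptions an entry one level above the PTE (the PMD) maps 2MB, and an entry two levels above (the PUD) maps 1GB. The constant and function names are hypothetical, chosen for this illustration.

```c
#include <stdint.h>

/* Assumed x86-64 parameters: 4KB base pages, 512 entries (9 bits) per table. */
#define BASE_PAGE_SHIFT 12
#define BITS_PER_LEVEL  9

/*
 * Bytes mapped by a single entry at a given level of the walk:
 *   level 0 = PTE (4KB), 1 = PMD (2MB), 2 = PUD (1GB), 3 = PGD (512GB).
 * A huge page is a mapping that ends the walk at a higher level.
 */
static uint64_t bytes_mapped_per_entry(unsigned int level)
{
    return UINT64_C(1) << (BASE_PAGE_SHIFT + BITS_PER_LEVEL * level);
}

/* Virtual address bits translated by a walk with `levels` table levels. */
static unsigned int va_bits(unsigned int levels)
{
    return BASE_PAGE_SHIFT + BITS_PER_LEVEL * levels;
}
```

Under the same assumptions, va_bits(4) is 48 and va_bits(5) is 57: each extra level of pagetables extends the addressable virtual space by 9 bits.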
So your obsession with 1GB pages is misguided. If indeed transparent
largepages give us real benefits we can extend it to do transparent
gbpages as well - should we ever want to. There's nothing 'shortsighted'
about being gradual - the change is already ambitious enough as-is, and
brings very clear benefits to a difficult, decade-old problem no other
person was able to address.
In fact introducing transparent 2MB pages makes 1GB pages support
_easier_ to merge: at that point we'll already have a (finally..)
successful hugetlb facility happily used by an increasing range of
applications.
Hugetlbfs's big problem was always that it wasn't transparent and hence
wasn't gradual for applications. It was an opt-in and constituted an
interface/ABI change - that is always a big barrier to app adoption.
So I give Andrea's patch a very big thumbs up - I hope it gets reviewed
in fine detail and added to -mm ASAP. Our lack of decent, automatic
hugepage support is sticking out like a sore thumb and is hurting us in
high-performance setups. If largepage support within Linux has a chance,
this might be the way to do it.
A small comment regarding the patch itself: I think it could be
simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
making it a natural feature of hugepage support. If the code is correct,
I cannot see any scenario in which I would want a hugepage-enabled
kernel I'm booting to lack transparent hugepage support.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: RFC: Transparent Hugepage support
From: Andrea Arcangeli @ 2009-10-29 10:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke, Avi Kivity,
Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
linux-kernel
Hello Ingo, Andi, everyone,
On Thu, Oct 29, 2009 at 10:43:44AM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <andi@firstfloor.org> wrote:
>
> > > 1GB pages can't be handled by this code, and clearly it's not
> > > practical to hope for 1G pages to materialize in the buddy (even if we
> >
> > That seems short-sighted. You do this because 2MB pages give you x%
> > performance advantage, but then it's likely that 1GB pages will give
> > another y% improvement and why should people stop at the smaller
> > improvement?
> >
> > Ignoring the gigantic pages now would just mean that this would need
> > to be revised later again or that users still need to use hacks like
> > libhugetlbfs.
>
> I've read the patch and have read through this discussion and you are
> missing the big point that it's best to do such things gradually - one
> step at a time.
>
> Just like we went from 2-level pagetables to 3-level pagetables, then to
> 4-level pagetables - and we might go to 5-level pagetables in the
> future. We didn't go from 2-level pagetables to 5-level pagetables in
> one go, despite predictions clearly pointing out the exponentially
> increasing need for RAM.
I totally agree with your assessment.
> So your obsession with 1GB pages is misguided. If indeed transparent
> largepages give us real benefits we can extend it to do transparent
> gbpages as well - should we ever want to. There's nothing 'shortsighted'
> about being gradual - the change is already ambitious enough as-is, and
> brings very clear benefits to a difficult, decade-old problem no other
> person was able to address.
>
> In fact introducing transparent 2MB pages makes 1GB pages support
> _easier_ to merge: at that point we'll already have a (finally..)
> successful hugetlb facility happily used by an increasing range of
> applications.
Agreed.
> Hugetlbfs's big problem was always that it wasn't transparent and hence
> wasn't gradual for applications. It was an opt-in and constituted an
> interface/ABI change - that is always a big barrier to app adoption.
>
> So I give Andrea's patch a very big thumbs up - I hope it gets reviewed
> in fine detail and added to -mm ASAP. Our lack of decent, automatic
> hugepage support is sticking out like a sore thumb and is hurting us in
> high-performance setups. If largepage support within Linux has a chance,
> this might be the way to do it.
Thanks a lot for your review!
> A small comment regarding the patch itself: I think it could be
> simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
> making it a natural feature of hugepage support. If the code is correct,
> I cannot see any scenario in which I would want a hugepage-enabled
> kernel I'm booting to lack transparent hugepage support.
The two reasons why I added a config option are:
1) because it was easy enough: gcc is smart enough to eliminate the
external calls, so I didn't need to add ifdefs, except for
returning 0 from pmd_trans_huge and pmd_trans_frozen. I only had to
make the exports of huge_memory.c visible unconditionally so it doesn't
warn; after that I don't need to build and link huge_memory.o.
2) to avoid breaking the build of archs that don't implement pmd_trans_huge
and may never be able to take advantage of it.
But we could turn CONFIG_TRANSPARENT_HUGEPAGE into an arch define, forced
to Y on x86-64 and N on power.
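Point 1) can be sketched as follows. This is a minimal, self-contained illustration, not the patch itself: pmd_t, the PMD_HUGE_FLAG bit, handle_huge_pmd() and walk_pmd() are hypothetical stand-ins. When CONFIG_TRANSPARENT_HUGEPAGE is unset, pmd_trans_huge() is a constant 0, so an optimizing compiler can discard the huge-page branch and the call inside it, and generic callers need no #ifdef of their own. In the real patch that is what lets huge_memory.o stay out of the link; here handle_huge_pmd() is defined locally so the sketch builds at any optimization level.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the arch-specific pmd_t. */
typedef struct { unsigned long val; } pmd_t;

/* Hypothetical "this PMD maps a huge page" bit, for illustration only. */
#define PMD_HUGE_FLAG 0x80UL

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline int pmd_trans_huge(pmd_t pmd)
{
    return (pmd.val & PMD_HUGE_FLAG) != 0;
}
#else
/* Option off: constant 0, so huge-page branches in callers are dead code. */
static inline int pmd_trans_huge(pmd_t pmd)
{
    (void)pmd;
    return 0;
}
#endif

/* Hypothetical stand-in for code that would live in huge_memory.c. */
static int huge_pmds_handled;

static void handle_huge_pmd(pmd_t pmd)
{
    (void)pmd;
    huge_pmds_handled++;
}

/* Generic caller with no #ifdef. Returns true if the huge path ran. */
static bool walk_pmd(pmd_t pmd)
{
    if (pmd_trans_huge(pmd)) {
        handle_huge_pmd(pmd);
        return true;
    }
    /* ...otherwise continue down to the 4KB page table... */
    return false;
}
```

Building with -DCONFIG_TRANSPARENT_HUGEPAGE turns pmd_trans_huge() into a real flag test, and the same walk_pmd() call then takes the huge path for a PMD with the flag set.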
* Re: RFC: Transparent Hugepage support
From: Mike Travis @ 2009-10-29 16:50 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
linux-kernel, Karl Feind, Jack Steiner
Hi Andrea,
I will find some time soon to test out your patch on a
(relatively) huge machine and let you know the results.
The memory size on this machine:
480,700,399,616 bytes of system memory tested OK
This translates to ~229k available 2MB pages (~240k if 2MB is taken as 2,000,000 bytes).
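The page count is a division of the tested memory size by the huge-page size. A minimal sketch, assuming the x86-64 page sizes of 2^21 bytes (2MB) and 2^30 bytes (1GB); the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Assumed x86-64 huge page sizes: 2MB = 2^21 bytes, 1GB = 2^30 bytes. */
#define HPAGE_2M_SHIFT 21
#define HPAGE_1G_SHIFT 30

/* Number of whole huge pages of size 2^shift that fit in `bytes`. */
static uint64_t whole_hugepages(uint64_t bytes, unsigned int shift)
{
    return bytes >> shift;
}
```

Here whole_hugepages(UINT64_C(480700399616), HPAGE_2M_SHIFT) evaluates to 229215, and the 1GB variant to 447. Both are upper bounds: memory used by the kernel itself and fragmentation reduce how many huge pages can actually be allocated.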
Thanks,
Mike
* Re: RFC: Transparent Hugepage support
From: KAMEZAWA Hiroyuki @ 2009-10-30 0:40 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
linux-kernel
On Thu, 29 Oct 2009 11:36:58 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:
> > A small comment regarding the patch itself: I think it could be
> > simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
> > making it a natural feature of hugepage support. If the code is correct,
> > I cannot see any scenario in which I would want a hugepage-enabled
> > kernel I'm booting to lack transparent hugepage support.
>
> The two reasons why I added a config option are:
>
> 1) because it was easy enough: gcc is smart enough to eliminate the
> external calls, so I didn't need to add ifdefs, except for
> returning 0 from pmd_trans_huge and pmd_trans_frozen. I only had to
> make the exports of huge_memory.c visible unconditionally so it doesn't
> warn; after that I don't need to build and link huge_memory.o.
>
> 2) to avoid breaking the build of archs that don't implement pmd_trans_huge
> and may never be able to take advantage of it.
>
> But we could turn CONFIG_TRANSPARENT_HUGEPAGE into an arch define, forced
> to Y on x86-64 and N on power.
Ah, please keep CONFIG_TRANSPARENT_HUGEPAGE for a while.
Now, memcg doesn't handle hugetlbfs because it's special and cannot be freed by
the kernel; only users can free it. But this new transparent hugepage support seems to
be designed so that the kernel can free it for memory reclaim.
So, I'd like to handle this in memcg transparently.
But it seems I need several changes to support this new rule.
I'd be glad if this new huge page support depended on !CONFIG_CGROUP_MEM_RES_CTRL for a
while.
Thanks,
-Kame
* Re: RFC: Transparent Hugepage support
From: Andrea Arcangeli @ 2009-11-03 10:55 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
linux-kernel
On Fri, Oct 30, 2009 at 09:40:37AM +0900, KAMEZAWA Hiroyuki wrote:
> Ah, please keep CONFIG_TRANSPARENT_HUGEPAGE for a while.
> Now, memcg doesn't handle hugetlbfs because it's special and cannot be freed by
> the kernel; only users can free it. But this new transparent hugepage support seems to
> be designed so that the kernel can free it for memory reclaim.
> So, I'd like to handle this in memcg transparently.
>
> But it seems I need several changes to support this new rule.
> I'd be glad if this new huge page support depended on !CONFIG_CGROUP_MEM_RES_CTRL for a
> while.
Yeah, the accounting (not just memcg) should be checked. I didn't pay
too much attention to stats at this point.
But we want to fix it fast instead of making the two options mutually
exclusive. Where are the pages de-accounted when they are freed?
Accounting seems to require just two one-liners
calling mem_cgroup_newpage_charge.
* Re: RFC: Transparent Hugepage support
From: KAMEZAWA Hiroyuki @ 2009-11-04 0:36 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Andi Kleen, linux-mm, Marcelo Tosatti, Adam Litke,
Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton,
linux-kernel
On Tue, 3 Nov 2009 11:55:43 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Fri, Oct 30, 2009 at 09:40:37AM +0900, KAMEZAWA Hiroyuki wrote:
> > Ah, please keep CONFIG_TRANSPARENT_HUGEPAGE for a while.
> > Now, memcg doesn't handle hugetlbfs because it's special and cannot be freed by
> > the kernel; only users can free it. But this new transparent hugepage support seems to
> > be designed so that the kernel can free it for memory reclaim.
> > So, I'd like to handle this in memcg transparently.
> >
> > But it seems I need several changes to support this new rule.
> > I'd be glad if this new huge page support depended on !CONFIG_CGROUP_MEM_RES_CTRL for a
> > while.
>
> Yeah, the accounting (not just memcg) should be checked. I didn't pay
> too much attention to stats at this point.
>
> But we want to fix it fast instead of making the two options mutually
> exclusive. Where are the pages de-accounted when they are freed?
It's de-accounted at page_remove_rmap() in the typical case of anonymous pages.
But the swap-cache/batched-uncharge part is complicated, maybe.
...because of me ;(
Okay, I won't request !CONFIG_CGROUP_MEM_RES_CTRL; I'd be glad if you CC me.
> Accounting seems to require just two one-liners
> calling mem_cgroup_newpage_charge.
Yes, maybe so.
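The accounting concern in this exchange is that whatever is charged when a huge page is faulted in must be uncharged, with the same size, when it is freed (for anonymous memory, at page_remove_rmap() per Kame). A toy model of that invariant follows. struct toy_memcg, toy_charge() and toy_uncharge() are hypothetical and are not the memcg API, and counting a 2MB page as 512 base pages (2MB / 4KB) is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical toy cgroup: usage and limit counted in 4KB base pages. */
struct toy_memcg {
    unsigned long usage;
    unsigned long limit;
};

/* Assumed size of one 2MB huge page in 4KB base pages. */
#define BASE_PAGES_PER_2M_PAGE 512UL

/* Charge nr base pages; charge nothing and fail if it would exceed the limit. */
static bool toy_charge(struct toy_memcg *cg, unsigned long nr)
{
    if (cg->usage + nr > cg->limit)
        return false;
    cg->usage += nr;
    return true;
}

/* The uncharge must mirror the charge exactly, or usage drifts over time. */
static void toy_uncharge(struct toy_memcg *cg, unsigned long nr)
{
    cg->usage -= nr;
}
```

The design point the sketch isolates: if the fault path and the free path disagree about a huge page's size (one base page on one side, 512 on the other), the cgroup's usage drifts with every allocate/free cycle.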
Thanks,
-Kame