* Re: [patch 10/17] mm: fix bootmem alignment
[not found] ` <20080410171101.395469000@nick.local0.net>
@ 2008-04-10 17:33 ` Yinghai Lu
2008-04-10 17:39 ` Nick Piggin
2008-04-11 11:58 ` Nick Piggin
0 siblings, 2 replies; 10+ messages in thread
From: Yinghai Lu @ 2008-04-10 17:33 UTC (permalink / raw)
To: npiggin, Andrew Morton, Andi Kleen; +Cc: linux-kernel, linux-mm, pj, kniht
On Thu, Apr 10, 2008 at 10:02 AM, <npiggin@suse.de> wrote:
> Without this fix bootmem can return unaligned addresses when the start of a
> node is not aligned to the align value. Needed for reliably allocating
> gigabyte pages.
>
> I removed the offset variable because all tests should align themselves correctly
> now. Slight drawback might be that the bootmem allocator will spend
> some more time skipping bits in the bitmap initially, but that shouldn't
> be a big issue.
>
this patch from Andi was obsoleted by the one in -mm
The patch titled
mm: offset align in alloc_bootmem
has been added to the -mm tree. Its filename is
mm-offset-align-in-alloc_bootmem.patch
------------------------------------------------------
Subject: mm: offset align in alloc_bootmem
From: Yinghai Lu <yhlu.kernel.send@gmail.com>
Offset alignment is needed when node_boot_start's alignment is less than the
required align.
Use a local node_boot_start to match the alignment, so no extra operation is
added in the search loop.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
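The offset-alignment idea in the changelog above can be sketched in plain C. This is an illustrative model only, not the kernel code; the helper name and signature are hypothetical:

```c
#include <assert.h>

/* Illustrative model of the changelog above: when the node base
 * (node_boot_start) is less aligned than the requested align, aligning
 * an offset relative to the base is not enough -- the absolute address
 * must be rounded up, then converted back to an offset from the base.
 * 'align' is assumed to be a power of two. */
static unsigned long align_off(unsigned long node_boot_start,
                               unsigned long offset, unsigned long align)
{
	unsigned long addr = node_boot_start + offset;
	unsigned long aligned = (addr + align - 1) & ~(align - 1);

	return aligned - node_boot_start;
}
```

For example, with a node base at 0x4000 and a 64KB (0x10000) align, the returned offset compensates for the base's misalignment so that base + offset is 64KB-aligned in absolute terms.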
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch 10/17] mm: fix bootmem alignment
2008-04-10 17:33 ` [patch 10/17] mm: fix bootmem alignment Yinghai Lu
@ 2008-04-10 17:39 ` Nick Piggin
2008-04-11 11:58 ` Nick Piggin
1 sibling, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2008-04-10 17:39 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Andrew Morton, Andi Kleen, linux-kernel, linux-mm, pj, kniht
On Thu, Apr 10, 2008 at 10:33:50AM -0700, Yinghai Lu wrote:
> On Thu, Apr 10, 2008 at 10:02 AM, <npiggin@suse.de> wrote:
> > Without this fix bootmem can return unaligned addresses when the start of a
> > node is not aligned to the align value. Needed for reliably allocating
> > gigabyte pages.
> >
> > I removed the offset variable because all tests should align themselves correctly
> > now. Slight drawback might be that the bootmem allocator will spend
> > some more time skipping bits in the bitmap initially, but that shouldn't
> > be a big issue.
> >
>
>
> this patch from Andi was obsoleted by the one in -mm
Ah, great thanks for letting me know.
> The patch titled
> mm: offset align in alloc_bootmem
> has been added to the -mm tree. Its filename is
> mm-offset-align-in-alloc_bootmem.patch
>
> ------------------------------------------------------
> Subject: mm: offset align in alloc_bootmem
> From: Yinghai Lu <yhlu.kernel.send@gmail.com>
>
> Offset alignment is needed when node_boot_start's alignment is less than the
> required align.
>
> Use a local node_boot_start to match the alignment, so no extra operation is
> added in the search loop.
>
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
> Cc: Andi Kleen <ak@suse.de>
> Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Christoph Lameter <clameter@sgi.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Re: [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86
[not found] <20080410170232.015351000@nick.local0.net>
[not found] ` <20080410171101.395469000@nick.local0.net>
@ 2008-04-10 23:59 ` Nish Aravamudan
2008-04-11 8:28 ` Nick Piggin
[not found] ` <20080410171101.551336000@nick.local0.net>
[not found] ` <20080410171100.425293000@nick.local0.net>
3 siblings, 1 reply; 10+ messages in thread
From: Nish Aravamudan @ 2008-04-10 23:59 UTC (permalink / raw)
To: npiggin@suse.de
Cc: akpm, Andi Kleen, linux-kernel, linux-mm, pj, andi, kniht,
Adam Litke
Hi Nick,
On 4/10/08, npiggin@suse.de <npiggin@suse.de> wrote:
> Hi,
>
> I'm taking care of Andi's hugetlb patchset now. I've taken a while to appear
> to do anything with it because I have had other things to do and also needed
> some time to get up to speed on it.
>
> Anyway, from my reviewing of the patchset, I didn't find a great deal
> wrong with it in the technical aspects. Taking hstate out of the hugetlbfs
> inode and vma is really the main thing I did.
Have you tested with the libhugetlbfs test suite? We're gearing up for
libhugetlbfs 1.3, so most of the tests are up to date and expected to run
cleanly, even with giant hugetlb page support (Jon has been working
diligently to test with his 16G page support for power). I'm planning
on pushing the last bits out today for Adam to pick up before we start
stabilizing for 1.3, so I'm hoping if you grab tomorrow's development
snapshot from libhugetlbfs.ozlabs.org, things should run ok. Probably
only with 1G hugepages for now, though; we haven't yet taught
libhugetlbfs about multiple hugepage size availability at run-time,
but that shouldn't be hard.
> However on the less technical side, I think a few things could be improved,
> eg. to do with the configuring and reporting, as well as the "administrative"
> type of code. I tried to make improvements to things in the last patch of
> the series. I will end up folding this properly into the rest of the patchset
> where possible.
I've got a few ideas here. Are we sure that
/proc/sys/vm/nr_{,overcommit}_hugepages is the pool allocation
interface we want going forward? I'm fairly sure we don't. I think
we're best off moving to a sysfs-based allocator scheme, while keeping
/proc/sys/vm/nr_{,overcommit}_hugepages around for the default
hugepage size (which may be the only one for many folks for now).
I'm thinking something like:
/sys/devices/system/[DIRNAME]/nr_hugepages ->
nr_hugepages_{default_hugepagesize}
/sys/devices/system/[DIRNAME]/nr_hugepages_default_hugepagesize
/sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize1
/sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize2
/sys/devices/system/[DIRNAME]/nr_overcommit_hugepages ->
nr_overcommit_hugepages_{default_hugepagesize}
/sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_default_hugepagesize
/sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize1
/sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize2
That is, nr_hugepages in the directory (should it be called vm?
memory? hugepages specifically? I'm looking for ideas!) will just be a
symlink to the underlying default hugepagesize allocator. The files
themselves would probably be named along the lines of:
nr_hugepages_2M
nr_hugepages_1G
nr_hugepages_64K
etc?
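A hypothetical helper for deriving those suffixes from the page size in bytes (illustrative only; nothing like this appears in the posted patches, and the function name is made up):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: format a hugepage size in bytes as the short suffix
 * used in the file names proposed above, e.g. 2M, 1G, 64K. Sizes are
 * assumed to be whole multiples of 1K. */
static void size_suffix(unsigned long long bytes, char *buf, size_t len)
{
	if (bytes >= (1ULL << 30) && bytes % (1ULL << 30) == 0)
		snprintf(buf, len, "%lluG", bytes >> 30);
	else if (bytes >= (1ULL << 20) && bytes % (1ULL << 20) == 0)
		snprintf(buf, len, "%lluM", bytes >> 20);
	else
		snprintf(buf, len, "%lluK", bytes >> 10);
}
```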
We'd want to have a similar layout on a per-node basis, I think (see
my patchsets to add a per-node interface).
> The other thing I did was try to shuffle the patches around a bit. There
> were one or two (pretty trivial) points where it wasn't bisectable, and also
> merge a couple of patches.
>
> I will try to get this patchset merged in -mm soon if feedback is positive.
> I would also like to take patches for other architectures or any other
> patches or suggestions for improvements.
There are definitely going to be conflicts between my per-node stack
and your set, but if you agree the interface should be cleaned up for
multiple hugepage size support, then I'd like to get my sysfs bits
into -mm and work on putting the global allocator into sysfs properly
for you to base off. I think there's enough room for discussion that
-mm may be a bit premature, but that's just my opinion.
Thanks for keeping the patchset up to date, I hope to do a more careful
review next week of the individual patches.
Thanks,
Nish
* Re: [patch 11/17] hugetlbfs: support larger than MAX_ORDER
[not found] ` <20080410171101.551336000@nick.local0.net>
@ 2008-04-11 8:13 ` Andi Kleen
2008-04-11 8:59 ` Nick Piggin
0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2008-04-11 8:13 UTC (permalink / raw)
To: npiggin; +Cc: akpm, Andi Kleen, linux-kernel, linux-mm, pj, andi, kniht
> spin_lock(&hugetlb_lock);
> - if (h->surplus_huge_pages_node[nid]) {
> + if (h->surplus_huge_pages_node[nid] && h->order <= MAX_ORDER) {
As Andrew Hastings pointed out earlier, this all needs to be h->order < MAX_ORDER
(I got pretty much all the checks off by one). It won't affect anything
on x86-64, but it might cause problems on archs which have exactly
MAX_ORDER-sized huge pages.
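The boundary can be illustrated with a toy check (the MAX_ORDER value here is just an example; in the buddy allocator of this era, valid orders run from 0 to MAX_ORDER - 1, so the test must be strict):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11	/* example value only */

/* Correct: the buddy allocator can satisfy orders 0 .. MAX_ORDER - 1. */
static bool buddy_can_alloc(unsigned int order)
{
	return order < MAX_ORDER;
}

/* The off-by-one variant: wrongly accepts order == MAX_ORDER, which is
 * exactly the case of an arch whose huge page is MAX_ORDER sized. */
static bool buggy_check(unsigned int order)
{
	return order <= MAX_ORDER;
}
```

The two checks disagree only at order == MAX_ORDER, which is why the bug only bites archs with exactly MAX_ORDER-sized huge pages.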
> update_and_free_page(h, page);
> h->surplus_huge_pages--;
> h->surplus_huge_pages_node[nid]--;
> @@ -220,6 +221,9 @@ static struct page *alloc_fresh_huge_pag
> {
> struct page *page;
>
> + if (h->order > MAX_ORDER)
>= etc.
-Andi
* Re: [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86
2008-04-10 23:59 ` [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86 Nish Aravamudan
@ 2008-04-11 8:28 ` Nick Piggin
2008-04-11 19:57 ` Nish Aravamudan
0 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2008-04-11 8:28 UTC (permalink / raw)
To: Nish Aravamudan
Cc: akpm, Andi Kleen, linux-kernel, linux-mm, pj, andi, kniht,
Adam Litke
On Thu, Apr 10, 2008 at 04:59:15PM -0700, Nish Aravamudan wrote:
> Hi Nick,
>
> On 4/10/08, npiggin@suse.de <npiggin@suse.de> wrote:
> > Hi,
> >
> > I'm taking care of Andi's hugetlb patchset now. I've taken a while to appear
> > to do anything with it because I have had other things to do and also needed
> > some time to get up to speed on it.
> >
> > Anyway, from my reviewing of the patchset, I didn't find a great deal
> > wrong with it in the technical aspects. Taking hstate out of the hugetlbfs
> > inode and vma is really the main thing I did.
>
> Have you tested with the libhugetlbfs test suite? We're gearing up for
> libhugetlbfs 1.3, so most of the tests are up to date and expected to run
> cleanly, even with giant hugetlb page support (Jon has been working
> diligently to test with his 16G page support for power). I'm planning
> on pushing the last bits out today for Adam to pick up before we start
> stabilizing for 1.3, so I'm hoping if you grab tomorrow's development
> snapshot from libhugetlbfs.ozlabs.org, things should run ok. Probably
> only with 1G hugepages for now, though; we haven't yet taught
> libhugetlbfs about multiple hugepage size availability at run-time,
> but that shouldn't be hard.
Yeah, it should be easy to disable the 2MB default and just make it
look exactly the same but with 1G pages.
Thanks a lot for your suggestion, I'll pull the snapshot over the
weekend and try to make it pass on x86 and work with Jon to ensure it
is working with powerpc...
> > However on the less technical side, I think a few things could be improved,
> > eg. to do with the configuring and reporting, as well as the "administrative"
> > type of code. I tried to make improvements to things in the last patch of
> > the series. I will end up folding this properly into the rest of the patchset
> > where possible.
>
> I've got a few ideas here. Are we sure that
> /proc/sys/vm/nr_{,overcommit}_hugepages is the pool allocation
> interface we want going forward? I'm fairly sure we don't. I think
> we're best off moving to a sysfs-based allocator scheme, while keeping
> /proc/sys/vm/nr_{,overcommit}_hugepages around for the default
> hugepage size (which may be the only one for many folks for now).
>
> I'm thinking something like:
>
> /sys/devices/system/[DIRNAME]/nr_hugepages ->
> nr_hugepages_{default_hugepagesize}
> /sys/devices/system/[DIRNAME]/nr_hugepages_default_hugepagesize
> /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize1
> /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize2
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages ->
> nr_overcommit_hugepages_{default_hugepagesize}
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_default_hugepagesize
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize1
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize2
>
> That is, nr_hugepages in the directory (should it be called vm?
> memory? hugepages specifically? I'm looking for ideas!) will just be a
> symlink to the underlying default hugepagesize allocator. The files
> themselves would probably be named along the lines of:
>
> nr_hugepages_2M
> nr_hugepages_1G
> nr_hugepages_64K
>
> etc?
Yes, I don't like the proc interface, nor the way it has been extended
(although that's not Andi's fault; it's just a limitation of the old
API).
I think actually we should have individual directories for each hstate
size, and we can put all other stuff (reservations and per-node stuff
etc) under those directories. Leave the proc stuff just for the default
page size.
I think it should go in /sys/kernel/, because I think /sys/devices is
more of the hardware side of the system (so it makes sense for
reporting eg the actual supported TLB sizes, but for configuring your
page reserves, I think it makes more sense under /sys/kernel/). But
we'll ask the sysfs folk for guidance there.
> We'd want to have a similar layout on a per-node basis, I think (see
> my patchsets to add a per-node interface).
>
> > The other thing I did was try to shuffle the patches around a bit. There
> > were one or two (pretty trivial) points where it wasn't bisectable, and also
> > merge a couple of patches.
> >
> > I will try to get this patchset merged in -mm soon if feedback is positive.
> > I would also like to take patches for other architectures or any other
> > patches or suggestions for improvements.
>
> There are definitely going to be conflicts between my per-node stack
> and your set, but if you agree the interface should be cleaned up for
> multiple hugepage size support, then I'd like to get my sysfs bits
> into -mm and work on putting the global allocator into sysfs properly
> for you to base off. I think there's enough room for discussion that
> -mm may be a bit premature, but that's just my opinion.
>
> Thanks for keeping the patchset up to date, I hope to do a more careful
> review next week of the individual patches.
Sure, I haven't seen your work but it shouldn't be terribly hard to merge
either way. It should be easy if we work together ;)
Thanks,
Nick
* Re: [patch 11/17] hugetlbfs: support larger than MAX_ORDER
2008-04-11 8:13 ` [patch 11/17] hugetlbfs: support larger than MAX_ORDER Andi Kleen
@ 2008-04-11 8:59 ` Nick Piggin
0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2008-04-11 8:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: akpm, Andi Kleen, linux-kernel, linux-mm, pj, kniht
On Fri, Apr 11, 2008 at 10:13:17AM +0200, Andi Kleen wrote:
> > spin_lock(&hugetlb_lock);
> > - if (h->surplus_huge_pages_node[nid]) {
> > + if (h->surplus_huge_pages_node[nid] && h->order <= MAX_ORDER) {
>
> As Andrew Hastings pointed out earlier, this all needs to be h->order < MAX_ORDER
> (I got pretty much all the checks off by one). It won't affect anything
> on x86-64, but it might cause problems on archs which have exactly
> MAX_ORDER-sized huge pages.
Ah, hmm, I might have missed a couple of emails worth of feedback when
you last posted. Thanks for pointing this out, I'll read over them again.
* Re: [patch 10/17] mm: fix bootmem alignment
2008-04-10 17:33 ` [patch 10/17] mm: fix bootmem alignment Yinghai Lu
2008-04-10 17:39 ` Nick Piggin
@ 2008-04-11 11:58 ` Nick Piggin
1 sibling, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2008-04-11 11:58 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Andrew Morton, Andi Kleen, linux-kernel, linux-mm, pj, kniht
On Thu, Apr 10, 2008 at 10:33:50AM -0700, Yinghai Lu wrote:
> On Thu, Apr 10, 2008 at 10:02 AM, <npiggin@suse.de> wrote:
> > Without this fix bootmem can return unaligned addresses when the start of a
> > node is not aligned to the align value. Needed for reliably allocating
> > gigabyte pages.
> >
> > I removed the offset variable because all tests should align themselves correctly
> > now. Slight drawback might be that the bootmem allocator will spend
> > some more time skipping bits in the bitmap initially, but that shouldn't
> > be a big issue.
> >
>
>
> this patch from Andi was obsoleted by the one in -mm
>
>
> The patch titled
> mm: offset align in alloc_bootmem
> has been added to the -mm tree. Its filename is
> mm-offset-align-in-alloc_bootmem.patch
>
> ------------------------------------------------------
> Subject: mm: offset align in alloc_bootmem
> From: Yinghai Lu <yhlu.kernel.send@gmail.com>
>
> Offset alignment is needed when node_boot_start's alignment is less than the
> required align.
>
> Use a local node_boot_start to match the alignment, so no extra operation is
> added in the search loop.
Ah, with this patch I'm actually able to allocate two 1GB pages (on my
4GB box), so it must be doing something right ;) Will be helpful for my
testing, thanks.
* Re: [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86
2008-04-11 8:28 ` Nick Piggin
@ 2008-04-11 19:57 ` Nish Aravamudan
0 siblings, 0 replies; 10+ messages in thread
From: Nish Aravamudan @ 2008-04-11 19:57 UTC (permalink / raw)
To: Nick Piggin
Cc: akpm, linux-kernel, linux-mm, pj, andi, kniht, Adam Litke,
Greg KH
[Trimming Andi's SUSE address, as it gave me permanent failures on my
last message]
On 4/11/08, Nick Piggin <npiggin@suse.de> wrote:
> On Thu, Apr 10, 2008 at 04:59:15PM -0700, Nish Aravamudan wrote:
> > Hi Nick,
> >
> > On 4/10/08, npiggin@suse.de <npiggin@suse.de> wrote:
> > > Hi,
> > >
> > > I'm taking care of Andi's hugetlb patchset now. I've taken a while to appear
> > > to do anything with it because I have had other things to do and also needed
> > > some time to get up to speed on it.
> > >
> > > Anyway, from my reviewing of the patchset, I didn't find a great deal
> > > wrong with it in the technical aspects. Taking hstate out of the hugetlbfs
> > > inode and vma is really the main thing I did.
> >
> > Have you tested with the libhugetlbfs test suite? We're gearing up for
> > libhugetlbfs 1.3, so most of the tests are up to date and expected to run
> > cleanly, even with giant hugetlb page support (Jon has been working
> > diligently to test with his 16G page support for power). I'm planning
> > on pushing the last bits out today for Adam to pick up before we start
> > stabilizing for 1.3, so I'm hoping if you grab tomorrow's development
> > snapshot from libhugetlbfs.ozlabs.org, things should run ok. Probably
> > only with 1G hugepages for now, though; we haven't yet taught
> > libhugetlbfs about multiple hugepage size availability at run-time,
> > but that shouldn't be hard.
>
>
> Yeah, it should be easy to disable the 2MB default and just make it
> look exactly the same but with 1G pages.
Exactly.
> Thanks a lot for your suggestion, I'll pull the snapshot over the
> weekend and try to make it pass on x86 and work with Jon to ensure it
> is working with powerpc...
Just FYI, we tagged 1.3-pre1 today and it's out now:
http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.3-pre1.tar.gz.
The kernel tests should work fine on x86 as is, even with 1G pages. I
expect some of the linker script testcases to fail, though, as they
will require alignment changes, I think (Adam is actually reworking
the segment remapping code for libhugetlbfs 2.0, which will release
shortly after 1.3, under our current plans).
> > > However on the less technical side, I think a few things could be improved,
> > > eg. to do with the configuring and reporting, as well as the "administrative"
> > > type of code. I tried to make improvements to things in the last patch of
> > > the series. I will end up folding this properly into the rest of the patchset
> > > where possible.
> >
> > I've got a few ideas here. Are we sure that
> > /proc/sys/vm/nr_{,overcommit}_hugepages is the pool allocation
> > interface we want going forward? I'm fairly sure we don't. I think
> > we're best off moving to a sysfs-based allocator scheme, while keeping
> > /proc/sys/vm/nr_{,overcommit}_hugepages around for the default
> > hugepage size (which may be the only one for many folks for now).
> >
> > I'm thinking something like:
> >
> > /sys/devices/system/[DIRNAME]/nr_hugepages ->
> > nr_hugepages_{default_hugepagesize}
> > /sys/devices/system/[DIRNAME]/nr_hugepages_default_hugepagesize
> > /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize1
> > /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize2
> > /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages ->
> > nr_overcommit_hugepages_{default_hugepagesize}
> > /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_default_hugepagesize
> > /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize1
> > /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize2
> >
> > That is, nr_hugepages in the directory (should it be called vm?
> > memory? hugepages specifically? I'm looking for ideas!) will just be a
> > symlink to the underlying default hugepagesize allocator. The files
> > themselves would probably be named along the lines of:
> >
> > nr_hugepages_2M
> > nr_hugepages_1G
> > nr_hugepages_64K
> >
> > etc?
>
>
> Yes, I don't like the proc interface, nor the way it has been extended
> (although that's not Andi's fault; it's just a limitation of the old
> API).
Agreed, I wasn't trying to blame you or Andi for the choice. Just
suggesting we nip the extension in the bud :)
> I think actually we should have individual directories for each hstate
> size, and we can put all other stuff (reservations and per-node stuff
> etc) under those directories. Leave the proc stuff just for the default
> page size.
>
> I think it should go in /sys/kernel/, because I think /sys/devices is
> more of the hardware side of the system (so it makes sense for
> reporting eg the actual supported TLB sizes, but for configuring your
> page reserves, I think it makes more sense under /sys/kernel/). But
> we'll ask the sysfs folk for guidance there.
That's a good point. I've added Greg explicitly to the Cc, to see if
he has any input. Greg, for something like an allocator interface for
hugepages, where would you expect to see that put in the sysfs
hierarchy? /sys/devices/system or /sys/kernel ?
The reason I was suggesting /sys/devices/system is that we already
have the NUMA topology laid out there (and is where I currently have
the per-node nr_hugepages). If we put per-node allocations in
/sys/kernel, we would have to duplicate some of that information (or
have really long filenames), and I'm not sure which is better.
Also, for reference, can we not use "reservations" for the pool
allocators? Reserved huge pages have a special meaning (are used to
satisfy MAP_SHARED mmap()s -- see
http://linux-mm.org/DynamicHugetlbPool). I'm not sure of a better
terminology, beyond perhaps "hugetlb pool interfaces" or something. I
know what you mean, but it got me confused for a second or two :)
> > We'd want to have a similar layout on a per-node basis, I think (see
> > my patchsets to add a per-node interface).
> >
> > > The other thing I did was try to shuffle the patches around a bit. There
> > > were one or two (pretty trivial) points where it wasn't bisectable, and also
> > > merge a couple of patches.
> > >
> > > I will try to get this patchset merged in -mm soon if feedback is positive.
> > > I would also like to take patches for other architectures or any other
> > > patches or suggestions for improvements.
> >
> > There are definitely going to be conflicts between my per-node stack
> > and your set, but if you agree the interface should be cleaned up for
> > multiple hugepage size support, then I'd like to get my sysfs bits
> > into -mm and work on putting the global allocator into sysfs properly
> > for you to base off. I think there's enough room for discussion that
> > -mm may be a bit premature, but that's just my opinion.
> >
> > Thanks for keeping the patchset up to date, I hope to do a more careful
> > review next week of the individual patches.
>
>
> Sure, I haven't seen your work but it shouldn't be terribly hard to merge
> either way. It should be easy if we work together ;)
I'll make sure to Cc you on the patches that will conflict. If we
decide that /sys/kernel is the right place for the per-node interface
to live, too, then I will need to respin them anyways.
As a side note, I don't think I saw any patches for Documentation in
the last posted set :) Could you update that? It might help with
understanding the changes a bit, although most are pretty
straightforward. It would also be great to update
http://linux-mm.org/PageTableStructure for the 1G case (and eventually
the power 16G case, Jon).
Thanks,
Nish
* Re: [patch 01/17] hugetlb: modular state
[not found] ` <20080410171100.425293000@nick.local0.net>
@ 2008-04-21 20:51 ` Jon Tollefson
2008-04-22 6:45 ` Nick Piggin
0 siblings, 1 reply; 10+ messages in thread
From: Jon Tollefson @ 2008-04-21 20:51 UTC (permalink / raw)
To: npiggin; +Cc: akpm, Andi Kleen, linux-kernel, linux-mm, pj, andi, kniht
On Fri, 2008-04-11 at 03:02 +1000, npiggin@suse.de wrote:
<snip>
> Index: linux-2.6/include/linux/hugetlb.h
> ===================================================================
> --- linux-2.6.orig/include/linux/hugetlb.h
> +++ linux-2.6/include/linux/hugetlb.h
> @@ -40,7 +40,7 @@ extern int sysctl_hugetlb_shm_group;
>
> /* arch callbacks */
>
> -pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr);
> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, int sz);
<snip>
The sz here needs to be a long to handle sizes such as 16G on powerpc.
There are other places in hugetlb.c where the size also needs to be a
long, but this one affects the arch code too since it is public.
Jon
Tollefson
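Jon's point can be illustrated in a couple of lines. The helper below is hypothetical; the substance is that 16G is 2^34 bytes, which cannot be represented in a 32-bit int:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical check: does a hugepage size in bytes fit in an int?
 * 16G = 2^34 does not, so the sz parameter has to be a long (64-bit
 * on the affected powerpc configurations). */
static int size_fits_in_int(unsigned long sz)
{
	return sz <= (unsigned long)INT_MAX;
}
```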
* Re: [patch 01/17] hugetlb: modular state
2008-04-21 20:51 ` [patch 01/17] hugetlb: modular state Jon Tollefson
@ 2008-04-22 6:45 ` Nick Piggin
0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2008-04-22 6:45 UTC (permalink / raw)
To: Jon Tollefson; +Cc: akpm, Andi Kleen, linux-kernel, linux-mm, pj, andi
On Mon, Apr 21, 2008 at 03:51:24PM -0500, Jon Tollefson wrote:
>
> On Fri, 2008-04-11 at 03:02 +1000, npiggin@suse.de wrote:
>
> <snip>
>
> > Index: linux-2.6/include/linux/hugetlb.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/hugetlb.h
> > +++ linux-2.6/include/linux/hugetlb.h
> > @@ -40,7 +40,7 @@ extern int sysctl_hugetlb_shm_group;
> >
> > /* arch callbacks */
> >
> > -pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr);
> > +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, int sz);
>
> <snip>
>
> The sz here needs to be a long to handle sizes such as 16G on powerpc.
>
> There are other places in hugetlb.c where the size also needs to be a
> long, but this one affects the arch code too since it is public.
Thanks, I've fixed that and found (hopefully) the rest of the ones
in the hugetlb.c code.
Thanks,
Nick
end of thread, other threads:[~2008-04-22 6:45 UTC | newest]
Thread overview: 10+ messages
[not found] <20080410170232.015351000@nick.local0.net>
[not found] ` <20080410171101.395469000@nick.local0.net>
2008-04-10 17:33 ` [patch 10/17] mm: fix bootmem alignment Yinghai Lu
2008-04-10 17:39 ` Nick Piggin
2008-04-11 11:58 ` Nick Piggin
2008-04-10 23:59 ` [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86 Nish Aravamudan
2008-04-11 8:28 ` Nick Piggin
2008-04-11 19:57 ` Nish Aravamudan
[not found] ` <20080410171101.551336000@nick.local0.net>
2008-04-11 8:13 ` [patch 11/17] hugetlbfs: support larger than MAX_ORDER Andi Kleen
2008-04-11 8:59 ` Nick Piggin
[not found] ` <20080410171100.425293000@nick.local0.net>
2008-04-21 20:51 ` [patch 01/17] hugetlb: modular state Jon Tollefson
2008-04-22 6:45 ` Nick Piggin