discontig patch question

Linux IA64 platform development

* discontig patch question
From: Van Maren, Kevin @ 2003-11-10 15:52 UTC
  To: linux-ia64

Hi,

I tried running the discontig patch on a more "normal" machine
(one that wasn't fully loaded with memory), and I found the results
surprising.

The EFI memory map is simple, and looks like:
 0- 4G Node 0 (2G + 2G hole)
 4- 8G Node 1
 8-12G Node 2
12-16G Node 3
16-20G Node 0 (2 G memory-map I/O reclaim)
with 4G per node, 16GB total.

Because of ORDERROUNDDOWN in count_pages (arch/ia64/mm/init.c),
the memory ended up being assigned like this:

 0- 8G Node 1 (6G, 2GB hole)
 8-16G Node 3 (8G)
16-20G Node 0 (2G)
       Node 2 (0G)

Which was not at all what I wanted.

ORDERROUNDDOWN causes the kernel to assign all memory starting at the
(PAGE_SIZE << MAX_ORDER) boundary to the current node, which in my case
is 16KB << 19 (hard-coded for IA64), or 8GB.
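
As a rough illustration of the arithmetic above, the following user-space C sketch evaluates the rounding for the node start addresses in this memory map. The macro body is inferred from the description in this message, not copied from the discontig patch, and chunk_bytes(), rounded_start() and GB() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the message: 16KB pages and a MAX_ORDER of 19. */
#define PAGE_SHIFT 14
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define MAX_ORDER  19

/* Assumed form of the rounding: clear all bits below PAGE_SIZE << MAX_ORDER. */
#define ORDERROUNDDOWN(n) ((n) & ~((PAGE_SIZE << MAX_ORDER) - 1))

#define GB(x) ((uint64_t)(x) << 30)

/* Hypothetical helper: size in bytes of one rounding chunk. */
static uint64_t chunk_bytes(void)
{
    uint64_t chunk = PAGE_SIZE << MAX_ORDER;

    assert((chunk & (chunk - 1)) == 0);  /* the mask trick needs a power of two */
    return chunk;
}

/* Hypothetical helper: address at which a range starting at `start` is treated as beginning. */
static uint64_t rounded_start(uint64_t start)
{
    return ORDERROUNDDOWN(start);
}
```

With these values chunk_bytes() is 8GB, rounded_start(GB(4)) is 0 and rounded_start(GB(12)) is GB(8), which is consistent with nodes 1 and 3 absorbing the 4GB ranges directly below them.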

I understand the GRANULE rounding, but is there a compelling reason that
we need 8GB node chunks on IA64 Linux (with 16KB pages)?

Thanks,
Kevin Van Maren

* Re: discontig patch question
From: Jesse Barnes @ 2003-11-10 17:23 UTC
  To: linux-ia64

On Mon, Nov 10, 2003 at 09:52:49AM -0600, Van Maren, Kevin wrote:
> The EFI memory map is simple, and looks like:
>  0- 4G Node 0 (2G + 2G hole)
>  4- 8G Node 1
>  8-12G Node 2
> 12-16G Node 3
> 16-20G Node 0 (2 G memory-map I/O reclaim)
> with 4G per node, 16GB total.
> 
> Because of ORDERROUNDDOWN in count_pages (arch/ia64/mm/init.c),
> the memory ended up being assigned like this:
> 
>  0- 8G Node 1 (6G, 2GB hole)
>  8-16G Node 3 (8G)
> 16-20G Node 0 (2G)
>        Node 2 (0G)
> 
> Which was not at all what I wanted.

I guess I didn't see this because the nodes on sn2 are so large (64GB).

> ORDERROUNDDOWN causes the kernel to assign all memory starting at the
> (PAGE_SIZE << MAX_ORDER) boundary to the current node, which in my case
> is 16KB << 19 (hard-coded for IA64), or 8GB.

I wonder if that shouldn't be simply 1UL<<MAX_ORDER...  That's all that
mm/page_alloc.c seems to care about.

> I understand the GRANULE rounding, but is there a compelling reason that
> we need 8GB node chunks on IA64 Linux (with 16KB pages)?

I don't think so.

Jesse

* RE: discontig patch question
From: Van Maren, Kevin @ 2003-11-10 17:38 UTC
  To: linux-ia64

> From: Jesse Barnes [mailto:jbarnes@sgi.com]
> 
> On Mon, Nov 10, 2003 at 09:52:49AM -0600, Van Maren, Kevin wrote:
> > The EFI memory map is simple, and looks like:
> >  0- 4G Node 0 (2G + 2G hole)
> >  4- 8G Node 1
> >  8-12G Node 2
> > 12-16G Node 3
> > 16-20G Node 0 (2 G memory-map I/O reclaim)
> > with 4G per node, 16GB total.
> > 
> > Because of ORDERROUNDDOWN in count_pages (arch/ia64/mm/init.c),
> > the memory ended up being assigned like this:
> > 
> >  0- 8G Node 1 (6G, 2GB hole)
> >  8-16G Node 3 (8G)
> > 16-20G Node 0 (2G)
> >        Node 2 (0G)
> > 
> > Which was not at all what I wanted.
> 
> I guess I didn't see this because the nodes on sn2 are so 
> large (64GB).

I've never run with so little memory before either :-(

> > ORDERROUNDDOWN causes the kernel to assign all memory starting at the
> > (PAGE_SIZE << MAX_ORDER) boundary to the current node, which in my case
> > is 16KB << 19 (hard-coded for IA64), or 8GB.
> 
> I wonder if that shouldn't be simply 1UL<<MAX_ORDER...  
> That's all that
> mm/page_alloc.c seems to care about.

But doesn't it deal with page-sized chunks?

It makes sense if all the memory chunks have to start on a "MAX_ORDER"
boundary, but is that really the case?  That's pretty restrictive, at least
with such a large MAX_ORDER.

Why is MAX_ORDER 19 on IA64?

> > I understand the GRANULE rounding, but is there a compelling reason that
> > we need 8GB node chunks on IA64 Linux (with 16KB pages)?
>
> I don't think so.

Thanks,
Kevin

* Re: discontig patch question
From: Jesse Barnes @ 2003-11-10 17:56 UTC
  To: linux-ia64

On Mon, Nov 10, 2003 at 11:38:34AM -0600, Van Maren, Kevin wrote:
> > I wonder if that shouldn't be simply 1UL<<MAX_ORDER...  
> > That's all that
> > mm/page_alloc.c seems to care about.
> 
> But doesn't it deal with page-sized chunks?

Yeah, but 1UL<<MAX_ORDER will always be page aligned, right?

> It makes sense if all the memory chunks have to start on a "MAX_ORDER"
> boundary, but is that really the case?  That's pretty restrictive, at least
> with such a large MAX_ORDER.
> 
> Why is MAX_ORDER 19 on IA64?

For hugetlbfs, afaik.  But it should only be 18 according to
arch/ia64/Kconfig.

Jesse

* RE: discontig patch question
From: Seth, Rohit @ 2003-11-10 18:02 UTC
  To: linux-ia64



> > Why is MAX_ORDER 19 on IA64?
> 
> For hugetlbfs, afaik.  But it should only be 18 according to 
> arch/ia64/Kconfig.

That is correct: it is there to support the 4G hugepage size (when the
normal page size is 16K).  The kernel actually looks at MAX_ORDER - 1
(and not MAX_ORDER).
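
To spell out that relationship, here is a small C sketch (an illustration, not kernel code) of the size of the largest buddy block for a given MAX_ORDER with 16K pages. The helper names block_bytes() and largest_block_bytes() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 14  /* 16K pages, as discussed in this thread */

/* Bytes covered by one buddy block of the given order (2^order pages). */
static uint64_t block_bytes(unsigned int order)
{
    assert(order < 64 - PAGE_SHIFT);
    return UINT64_C(1) << (PAGE_SHIFT + order);
}

/* The largest order the allocator hands out is MAX_ORDER - 1. */
static uint64_t largest_block_bytes(unsigned int max_order)
{
    assert(max_order >= 1);
    return block_bytes(max_order - 1);
}
```

Under these assumptions largest_block_bytes(19) is 4G, the hugepage size mentioned above, while block_bytes(19) is the 8G figure from the start of the thread.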

* RE: discontig patch question
From: Van Maren, Kevin @ 2003-11-10 18:34 UTC
  To: linux-ia64

> Yeah, but 1UL<<MAX_ORDER will always be page aligned, right?

Only if MAX_ORDER >= PAGE_SHIFT.
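
A minimal C sketch of that condition, assuming 16KB pages (PAGE_SHIFT of 14) and treating 1UL<<MAX_ORDER as a byte address; order_bytes_page_aligned() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 14                          /* 16KB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Returns 1 if addr is a multiple of the page size. */
static int is_page_aligned(uint64_t addr)
{
    return (addr & (PAGE_SIZE - 1)) == 0;
}

/* Hypothetical helper: is (1 << order), read as a byte address, page aligned? */
static int order_bytes_page_aligned(unsigned int order)
{
    assert(order < 64);
    return is_page_aligned(UINT64_C(1) << order);
}
```

With an order of 19 the value is 512KB and therefore page aligned; with any order below 14 it is not.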

But page alignment isn't the question: it is already aligned to
the 16MB or 64MB granules.

But you are saying that the address doesn't have to be as strict:
even if allocating 2^MAX_ORDER _pages_, the start doesn't have to
be aligned at a natural (PAGE_SIZE<<MAX_ORDER) boundary, and that
we can change the ORDERROUNDDOWN to not be as aggressive.

But then doesn't it also make sense to have a smaller MAX_ORDER when not
using 4GB hugepages?  I'm happy with <= 256MB hugepages with 16GB of RAM,
so I guess I'd rather MAX_ORDER was normally smaller, and increased
only with very large hugepages.

Thanks,
Kevin Van Maren

* Re: discontig patch question
From: Jesse Barnes @ 2003-11-10 19:08 UTC
  To: linux-ia64

On Mon, Nov 10, 2003 at 12:34:47PM -0600, Van Maren, Kevin wrote:
> > Yeah, but 1UL<<MAX_ORDER will always be page aligned, right?
> 
> Only if MAX_ORDER >= PAGE_SHIFT.
> 
> But page alignment isn't the question: it is already aligned to
> the 16MB or 64MB granules.

Right.  Spaced out there for a minute...

> But you are saying that the address doesn't have to be as strict:
> even if allocating 2^MAX_ORDER _pages_, the start doesn't have to
> be aligned at a natural (PAGE_SIZE<<MAX_ORDER) boundary, and that
> we can change the ORDERROUNDDOWN to not be as aggressive.

Well, strictly speaking I don't think start _has_ to align on those
conditions, but the hugetlb stuff may that it does (I haven't looked).

> But then it also makes sense to have a smaller MAX_ORDER when not
> using 4GB hugepages?  I'm happy with <= 256MB hugepages with 16GB ram,
> so I guess I'd rather MAX_ORDER was normally smaller, and increased
> only with very large hugepage pages.

That makes sense to me.  It seems like FORCE_MAX_ZONEORDER should depend
on HUGETLB_PAGE_SIZE_* so that we don't apply unnecessary alignment
constraints.  Of course, there's probably something I'm missing; Rohit
might know more.
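
As a sketch of what such a dependency would compute (an assumption for illustration, not the actual Kconfig logic), the smallest MAX_ORDER that still lets the buddy allocator hand out one hugepage follows from the two page shifts. min_max_order() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Hypothetical helper: smallest MAX_ORDER for which a block of order
 * MAX_ORDER - 1 covers one hugepage of 2^hpage_shift bytes, given base
 * pages of 2^page_shift bytes.
 */
static unsigned int min_max_order(unsigned int hpage_shift,
                                  unsigned int page_shift)
{
    assert(hpage_shift >= page_shift);
    return hpage_shift - page_shift + 1;
}
```

For 16KB pages this gives 19 for 4GB hugepages (shift 32) and 15 for 256MB hugepages (shift 28), so a 256MB configuration would only need 256MB alignment under the MAX_ORDER - 1 rule.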

Jesse

* Re: discontig patch question
From: Jesse Barnes @ 2003-11-10 19:10 UTC
  To: linux-ia64

On Mon, Nov 10, 2003 at 11:08:31AM -0800, Jesse Barnes wrote:
> Well, strictly speaking I don't think start _has_ to align on those
> conditions, but the hugetlb stuff may that it does (I haven't looked).

s/may/may assume/

* RE: discontig patch question
From: Seth, Rohit @ 2003-11-10 20:20 UTC
  To: linux-ia64


In my opinion, since the other zone build functions depend on
(1UL << (MAX_ORDER - 1)) for forcing the zone alignment, the same should
be used here in ORDERROUNDDOWN as well.  If that seems too aggressive for
any reason, then you should at minimum change ORDERROUNDDOWN to use
(PAGE_SIZE << (MAX_ORDER - 1)).  That should align the modules at a 4G
boundary for a 16K PAGE_SIZE.

Hugetlb needs a certain order (of NORMAL_PAGE_SIZE pages) of contiguous
pages to be available for allocation.  The start of this allocation needs
to be at least HUGE_PAGE_SIZE aligned, and I think that part is guaranteed
by the buddy page allocator (when allocating a certain order of pages).
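
The alignment property can be sketched with the usual buddy index arithmetic. This is a simplified model, not the code in mm/page_alloc.c: indices are page numbers relative to the start of the zone, and buddy_index() and merged_index() are hypothetical names.

```c
#include <assert.h>

/* Index of the buddy of the order-`order` block that starts at page idx. */
static unsigned long buddy_index(unsigned long idx, unsigned int order)
{
    /* A block of a given order always starts on a multiple of 2^order. */
    assert((idx & ((1UL << order) - 1)) == 0);
    return idx ^ (1UL << order);
}

/* Start of the order + 1 block formed when a block and its buddy merge. */
static unsigned long merged_index(unsigned long idx, unsigned int order)
{
    return idx & ~(1UL << order);
}
```

Because the indices are relative to the zone, the physical address of a block is only naturally aligned if the zone itself starts on a suitably aligned boundary, which is where the rounding discussed in this thread comes in.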


rohit

* RE: discontig patch question
From: Van Maren, Kevin @ 2003-11-12 18:14 UTC
  To: linux-ia64

The system boots and functions properly with 4GB nodes, using
#define ORDERROUNDDOWN(n) ((n) & ~((PAGE_SIZE << (MAX_ORDER - 1)) - 1))
Leaving it out (1GB boundary), the kernel doesn't boot (memory
allocation problem).  But I'm (obviously) not using 4GB pages.

So either that fix needs to be made or MAX_ORDER needs to be lowered.
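
A small user-space C sketch of that macro applied to the node start addresses from the original memory map, assuming 16KB pages and MAX_ORDER of 19. rounded_start() and GB() are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 16KB pages and a MAX_ORDER of 19. */
#define PAGE_SHIFT 14
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define MAX_ORDER  19

/* The macro from this message, with the shift count parenthesized. */
#define ORDERROUNDDOWN(n) ((n) & ~((PAGE_SIZE << (MAX_ORDER - 1)) - 1))

#define GB(x) ((uint64_t)(x) << 30)

/* Hypothetical helper: start address after rounding down to the 4GB chunk. */
static uint64_t rounded_start(uint64_t start)
{
    uint64_t chunk = PAGE_SIZE << (MAX_ORDER - 1);

    assert((chunk & (chunk - 1)) == 0);  /* the mask trick needs a power of two */
    return ORDERROUNDDOWN(start);
}
```

With a 4GB chunk, the starts at 4G, 8G, 12G and 16G are left unchanged, so each node keeps its own range; an address such as 6G would still round down to 4G.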

Kevin Van Maren
