linuxppc-dev.lists.ozlabs.org archive mirror
* using different format for hugetlbfs
@ 2009-12-04  7:18 Kumar Gala
  2009-12-04  8:58 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-04  7:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: linux-ppc list

Ben, David,

If we want to support true 4G/4G split on ppc32, using the MSB of the
address to determine if the pgd_t is for hugetlbfs isn't going to
work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
at least a 4K page, I would think the low-order 12 bits should always
be 0.

Could we use something like:

addr[0:51] || shift [52:59] || flags [60:63]

with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled  
pointer'.  Seems like shift will at most be 64 so 8-bits should cover  
it.
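
As a rough illustration of the layout above, here is a minimal sketch in C. It is not code from this thread or from the kernel: the macro and function names are hypothetical, and it assumes IBM bit numbering (bit 0 is the MSB, bit 63 the LSB), a 64-bit entry, and a 4K-aligned address so that the low 12 bits are free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, IBM bit numbering (bit 0 = MSB, bit 63 = LSB):
 *   addr[0:51] || shift[52:59] || flags[60:63]
 */
#define HUGEPD_FLAG_HUGE   0x1ULL                          /* LSB flag: 1 = hugetlbfs mangled pointer */
#define HUGEPD_SHIFT_SHIFT 4                               /* shift field sits above the 4 flag bits */
#define HUGEPD_SHIFT_MASK  (0xffULL << HUGEPD_SHIFT_SHIFT) /* bits 52:59 */
#define HUGEPD_ADDR_MASK   (~0xfffULL)                     /* bits 0:51 */

/* Pack a 4K-aligned address and a shift value into a tagged entry. */
static inline uint64_t hugepd_pack(uint64_t addr, unsigned shift)
{
    assert((addr & ~HUGEPD_ADDR_MASK) == 0); /* low 12 bits must be free */
    assert(shift <= 0xff);                   /* must fit in the 8-bit field */
    return addr | ((uint64_t)shift << HUGEPD_SHIFT_SHIFT) | HUGEPD_FLAG_HUGE;
}

/* A normal pointer has the LSB flag clear. */
static inline bool entry_is_hugepd(uint64_t entry)
{
    return (entry & HUGEPD_FLAG_HUGE) != 0;
}

static inline uint64_t hugepd_addr(uint64_t entry)
{
    return entry & HUGEPD_ADDR_MASK;
}

static inline unsigned hugepd_shift(uint64_t entry)
{
    return (unsigned)((entry & HUGEPD_SHIFT_MASK) >> HUGEPD_SHIFT_SHIFT);
}
```

With this sketch, hugepd_pack(0xC0000000, 24) keeps the full address, including its top bit, and only the low 12 bits carry the shift and the flag. The whole scheme depends on the 4K-alignment assumption holding for every level.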

- k


* Re: using different format for hugetlbfs
  2009-12-04  7:18 using different format for hugetlbfs Kumar Gala
@ 2009-12-04  8:58 ` Benjamin Herrenschmidt
  2009-12-04 14:09   ` Kumar Gala
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2009-12-04  8:58 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linux-ppc list, David Gibson

On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
> Ben, David,
> 
> If we want to support true 4G/4G split on ppc32, using the MSB of the
> address to determine if the pgd_t is for hugetlbfs isn't going to
> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
> at least a 4K page, I would think the low-order 12 bits should always
> be 0.

On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
want to have a different encoding for both types though.

> Could we use something like:
> 
> addr[0:51] || shift [52:59] || flags [60:63]
> 
> with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled  
> pointer'.  Seems like shift will at most be 64 so 8-bits should cover  
> it.

Cheers,
Ben.


* Re: using different format for hugetlbfs
  2009-12-04  8:58 ` Benjamin Herrenschmidt
@ 2009-12-04 14:09   ` Kumar Gala
  2009-12-04 21:25     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-04 14:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-ppc list, David Gibson


On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:

> On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
>> Ben, David,
>>
>> If we want to support true 4G/4G split on ppc32, using the MSB of the
>> address to determine if the pgd_t is for hugetlbfs isn't going to
>> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
>> at least a 4K page, I would think the low-order 12 bits should always
>> be 0.
>
> On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
> want to have a different encoding for both types though.

What do you mean they can be smaller?  Is there some scenario where we
don't allocate a full page?  I agree having the encodings be different
would be bad.  I'm trying to avoid having it be different between 32-bit
and 64-bit (but maybe that will be impossible).

>> Could we use something like:
>>
>> addr[0:51] || shift [52:59] || flags [60:63]
>>
>> with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled
>> pointer'.  Seems like shift will at most be 64 so 8-bits should cover
>> it.
>
> Cheers,
> Ben.
>

- k


* Re: using different format for hugetlbfs
  2009-12-04 14:09   ` Kumar Gala
@ 2009-12-04 21:25     ` Benjamin Herrenschmidt
  2009-12-06  3:05       ` Kumar Gala
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2009-12-04 21:25 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linux-ppc list, David Gibson

On Fri, 2009-12-04 at 08:09 -0600, Kumar Gala wrote:
> On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
> 
> > On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
> >> Ben, David,
> >>
> >> If we want to support true 4G/4G split on ppc32, using the MSB of the
> >> address to determine if the pgd_t is for hugetlbfs isn't going to
> >> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
> >> at least a 4K page, I would think the low-order 12 bits should always
> >> be 0.
> >
> > On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
> > want to have a different encoding for both types though.
> 
> What do you mean they can be smaller?  Is there some scenario where we
> don't allocate a full page?  I agree having the encodings be different
> would be bad.  I'm trying to avoid having it be different between 32-bit
> and 64-bit (but maybe that will be impossible).

Yes. The intermediary levels are smaller on 64-bit. Also, hugetlbfs
can create special levels of various sizes depending on the
requirements to fit a given huge page size. And that would be true of
both 32-bit and 64-bit in fact.

Cheers,
Ben.


* Re: using different format for hugetlbfs
  2009-12-04 21:25     ` Benjamin Herrenschmidt
@ 2009-12-06  3:05       ` Kumar Gala
  2009-12-07  1:04         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-06  3:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-ppc list, David Gibson


On Dec 4, 2009, at 3:25 PM, Benjamin Herrenschmidt wrote:

> On Fri, 2009-12-04 at 08:09 -0600, Kumar Gala wrote:
>> On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
>>
>>> On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
>>>> Ben, David,
>>>>
>>>> If we want to support true 4G/4G split on ppc32, using the MSB of the
>>>> address to determine if the pgd_t is for hugetlbfs isn't going to
>>>> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
>>>> at least a 4K page, I would think the low-order 12 bits should always
>>>> be 0.
>>>
>>> On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
>>> want to have a different encoding for both types though.
>>
>> What do you mean they can be smaller?  Is there some scenario where we
>> don't allocate a full page?  I agree having the encodings be different
>> would be bad.  I'm trying to avoid having it be different between 32-bit
>> and 64-bit (but maybe that will be impossible).
>
> Yes. The intermediary levels are smaller on 64-bit. Also, hugetlbfs
> can create special levels of various sizes depending on the
> requirements to fit a given huge page size. And that would be true of
> both 32-bit and 64-bit in fact.

Even then, does that preclude the format I suggested?  I'm assuming
that pgd_t/pud_t/pmd_t are always a double word, so the low-order 4
bits should be 0 (on 64-bit), so using the lsb as the flag between
hugetlb and normal pointer should still work.

- k


* Re: using different format for hugetlbfs
  2009-12-06  3:05       ` Kumar Gala
@ 2009-12-07  1:04         ` Benjamin Herrenschmidt
  2009-12-08  2:28           ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2009-12-07  1:04 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linux-ppc list, David Gibson


> 
> Even then, does that preclude the format I suggested?  I'm assuming
> that pgd_t/pud_t/pmd_t are always a double word, so the low-order 4
> bits should be 0 (on 64-bit), so using the lsb as the flag between
> hugetlb and normal pointer should still work.

Might do, depends if David has enough bits ...  David ?

Cheers,
Ben.


* Re: using different format for hugetlbfs
  2009-12-07  1:04         ` Benjamin Herrenschmidt
@ 2009-12-08  2:28           ` David Gibson
  2009-12-08 15:44             ` Kumar Gala
  0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2009-12-08  2:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-ppc list

On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt wrote:
> 
> > 
> > Even then, does that preclude the format I suggested?  I'm assuming
> > that pgd_t/pud_t/pmd_t are always a double word, so the low-order 4
> > bits should be 0 (on 64-bit),

Double word alignment only gives us 3 low bits.

> so using the lsb as the flag between  
> > hugetlb and normal pointer should still work.
> 
> Might do, depends if David has enough bits ...  David ?

Well, the flag can go at the bottom, but that will mean grabbing more
bits at the bottom.  At the moment to cover all the page table sizes
that are wanted on the various setups we have, I need 5 bits, this
would push it to 6.  At present, I just force up the minimum alignment
of any page directory (even if its natural alignment is smaller) so
as to make sure I have those bits.  That's pretty easy to adjust, but
pushing it up too high will start wasting memory, of course.

If we move to a variable sized encoding, as Ben and I have discussed
on a couple of occasions, I think we could do this though.
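
To make the bit-counting above concrete, here is a small sketch in C (not code from this thread; the function names are hypothetical). It only shows the general relationship between a power-of-two alignment and the number of low pointer bits that alignment guarantees to be zero.

```c
#include <assert.h>
#include <stddef.h>

/* Number of low pointer bits guaranteed zero for a power-of-two alignment. */
static unsigned free_low_bits(size_t align)
{
    unsigned bits = 0;

    assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */
    while (align > 1) {
        align >>= 1;
        bits++;
    }
    return bits;
}

/* Minimum alignment (in bytes) needed to keep `bits` low bits free. */
static size_t min_align_for_bits(unsigned bits)
{
    assert(bits < sizeof(size_t) * 8);
    return (size_t)1 << bits;
}
```

By this arithmetic, 8-byte (double word) alignment yields 3 free bits, 5 bits require 32-byte alignment, and a sixth bit raises the minimum alignment to 64 bytes.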

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: using different format for hugetlbfs
  2009-12-08  2:28           ` David Gibson
@ 2009-12-08 15:44             ` Kumar Gala
  2009-12-09  2:00               ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-08 15:44 UTC (permalink / raw)
  To: David Gibson; +Cc: linux-ppc list


On Dec 7, 2009, at 8:28 PM, David Gibson wrote:

> On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt  
> wrote:
>>
>>>
>>> Even then, does that preclude the format I suggested?  I'm assuming
>>> that pgd_t/pud_t/pmd_t are always a double word, so the low-order 4
>>> bits should be 0 (on 64-bit),
>
> Double word alignment only gives us 3 low bits.
>
>> so using the lsb as the flag between
>>> hugetlb and normal pointer should still work.
>>
>> Might do, depends if David has enough bits ...  David ?
>
> Well, the flag can go at the bottom, but that will mean grabbing more
> bits at the bottom.  At the moment to cover all the page table sizes
> that are wanted on the various setups we have, I need 5 bits, this
> would push it to 6.  At present, I just force up the minimum alignment
> of any page directory (even if its natural alignment is smaller) so
> as to make sure I have those bits.  That's pretty easy to adjust, but
> pushing it up too high will start wasting memory, of course.
>
> If we move to a variable sized encoding, as Ben and I have discussed
> on a couple of occasions, I think we could do this though.

I don't understand.  It seems like the flag bit for normal pointer
vs hugetlb is the only thing that we need to distinguish.  Once we've
done that, all the other bits are free to use as we see fit.  So the
least significant bit can be used for that purpose, and we are free to
do what we want with the size encoding, etc.

Am I missing something?

- k


* Re: using different format for hugetlbfs
  2009-12-08 15:44             ` Kumar Gala
@ 2009-12-09  2:00               ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2009-12-09  2:00 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linux-ppc list

On Tue, Dec 08, 2009 at 09:44:55AM -0600, Kumar Gala wrote:
> 
> On Dec 7, 2009, at 8:28 PM, David Gibson wrote:
> 
> >On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt
> >wrote:
> >>
> >>>
> >>>Even then, does that preclude the format I suggested?  I'm assuming
> >>>that pgd_t/pud_t/pmd_t are always a double word, so the low-order 4
> >>>bits should be 0 (on 64-bit),
> >
> >Double word alignment only gives us 3 low bits.
> >
> >>so using the lsb as the flag between
> >>>hugetlb and normal pointer should still work.
> >>
> >>Might do, depends if David has enough bits ...  David ?
> >
> >Well, the flag can go at the bottom, but that will mean grabbing more
> >bits at the bottom.  At the moment to cover all the page table sizes
> >that are wanted on the various setups we have, I need 5 bits, this
> >would push it to 6.  At present, I just force up the minimum alignment
> >of any page directory (even if its natural alignment is smaller) so
> >as to make sure I have those bits.  That's pretty easy to adjust, but
> >pushing it up too high will start wasting memory, of course.
> >
> >If we move to a variable sized encoding, as Ben and I have discussed
> >on a couple of occasions, I think we could do this though.
> 
> I don't understand.  It seems like the flag bit for normal pointer
> vs hugetlb is the only thing that we need to distinguish.  Once
> we've done that, all the other bits are free to use as we see fit.
> So the least significant bit can be used for that purpose, and we
> are free to do what we want with the size encoding, etc.

Well, yes, but the huge page directory pointers are still pointers, so
this is one extra bit at the bottom which counts against our minimum
alignment for those pointers.  There's no natural lower bound on the
size of the hugepte directories, and with existing setups they already
go as low as 4 entries, which we already pad out to meet our minimum
alignment.
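
The memory cost of an extra tag bit can be sketched as follows. This is an illustration only, not code from this thread: the function names are hypothetical, and the entry sizes passed in are assumptions chosen for the example rather than figures for any particular setup.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size a directory occupies once it is padded so that its base address
 * leaves `tag_bits` low bits free (i.e. aligned to 1 << tag_bits bytes).
 */
static size_t padded_dir_size(size_t nentries, size_t entry_size,
                              unsigned tag_bits)
{
    size_t align, natural;

    assert(tag_bits < sizeof(size_t) * 8);
    align = (size_t)1 << tag_bits;
    natural = nentries * entry_size;
    return (natural + align - 1) & ~(align - 1); /* round up to alignment */
}

/* Bytes lost to padding for one directory. */
static size_t padding_waste(size_t nentries, size_t entry_size,
                            unsigned tag_bits)
{
    return padded_dir_size(nentries, entry_size, tag_bits)
           - nentries * entry_size;
}
```

Under these assumptions, a 4-entry directory of 8-byte entries fits exactly with 5 tag bits (32 bytes), but with 6 tag bits it is padded to 64 bytes, so half of each such directory would be wasted.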

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
