* using different format for hugetlbfs
@ 2009-12-04 7:18 Kumar Gala
2009-12-04 8:58 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-04 7:18 UTC
To: Benjamin Herrenschmidt, David Gibson; +Cc: linux-ppc list
Ben, David,
If we want to support a true 4G/4G split on ppc32, using the MSB of the
address to determine if the pgd_t is for hugetlbfs isn't going to
work. Since every pointer in the pgd_t -> pud_t -> pmd_t chain points to
at least a 4K page, I would think the low-order 12 bits should always
be 0.
Could we use something like:
addr[0:51] || shift [52:59] || flags [60:63]
with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled
pointer'. Seems like shift will at most be 64 so 8-bits should cover
it.
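A minimal sketch of the proposed layout, assuming the bit ranges follow the PowerPC convention in which bit 0 is the most significant bit of a 64-bit doubleword, so addr[0:51] is the 4K-aligned address, shift[52:59] sits just above the flags, and flags[60:63] are the four lowest bits. All macro and function names here are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define HUGEPD_FLAG_HUGE   UINT64_C(0x1)      /* LSB of flags[60:63]: 1 = hugetlbfs mangled pointer */
#define HUGEPD_SHIFT_POS   4                  /* shift[52:59] starts above the 4 flag bits */
#define HUGEPD_SHIFT_MASK  UINT64_C(0xff)     /* 8 bits of shift */
#define HUGEPD_ADDR_MASK   (~UINT64_C(0xfff)) /* addr[0:51]: everything above the low 12 bits */

/* Pack a 4K-aligned address and a shift into one tagged entry. */
static inline uint64_t hugepd_encode(uint64_t addr, unsigned shift)
{
    assert((addr & ~HUGEPD_ADDR_MASK) == 0); /* low 12 bits must be free */
    assert(shift <= HUGEPD_SHIFT_MASK);
    return addr | ((uint64_t)shift << HUGEPD_SHIFT_POS) | HUGEPD_FLAG_HUGE;
}

/* A normal pointer to a 4K-aligned table has the LSB clear. */
static inline int entry_is_hugepd(uint64_t entry)
{
    return (entry & HUGEPD_FLAG_HUGE) != 0;
}

static inline uint64_t hugepd_addr(uint64_t entry)
{
    return entry & HUGEPD_ADDR_MASK;
}

static inline unsigned hugepd_shift(uint64_t entry)
{
    return (unsigned)((entry >> HUGEPD_SHIFT_POS) & HUGEPD_SHIFT_MASK);
}
```

The sketch only works if every table pointer really has 12 free low-order bits, which is the assumption stated in the message above.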
- k
* Re: using different format for hugetlbfs
2009-12-04 7:18 using different format for hugetlbfs Kumar Gala
@ 2009-12-04 8:58 ` Benjamin Herrenschmidt
2009-12-04 14:09 ` Kumar Gala
0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2009-12-04 8:58 UTC
To: Kumar Gala; +Cc: linux-ppc list, David Gibson
On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
> Ben, David,
>
> If we want to support true 4G/4G split on ppc32 using the MSB of the
> address to determine of the pgd_t is for hugetlbfs isn't going to
> work. Since every pointer in the pgd_t -> pud_t -> pmd_t is point to
> at least a 4K page I would think the low order 12-bits should always
> be 0.
On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
want to have a different encoding for both types though.
> Could we use something like:
>
> addr[0:51] || shift [52:59] || flags [60:63]
>
> with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled
> pointer'. Seems like shift will at most be 64 so 8-bits should cover
> it.
Cheers,
Ben.
* Re: using different format for hugetlbfs
2009-12-04 8:58 ` Benjamin Herrenschmidt
@ 2009-12-04 14:09 ` Kumar Gala
2009-12-04 21:25 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-04 14:09 UTC
To: Benjamin Herrenschmidt; +Cc: linux-ppc list, David Gibson
On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
>> Ben, David,
>>
>> If we want to support true 4G/4G split on ppc32 using the MSB of the
>> address to determine of the pgd_t is for hugetlbfs isn't going to
>> work. Since every pointer in the pgd_t -> pud_t -> pmd_t is point to
>> at least a 4K page I would think the low order 12-bits should always
>> be 0.
>
> On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
> want to have a different encoding for both types though.
What do you mean they can be smaller? Do we have some scenario where we
don't allocate a full page? I agree having the encodings be different
would be bad. I'm trying to avoid having it be different between 32-bit
and 64-bit (but maybe that will be impossible).
>> Could we use something like:
>>
>> addr[0:51] || shift [52:59] || flags [60:63]
>>
>> with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled
>> pointer'. Seems like shift will at most be 64 so 8-bits should cover
>> it.
>
> Cheers,
> Ben.
>
- k
* Re: using different format for hugetlbfs
2009-12-04 14:09 ` Kumar Gala
@ 2009-12-04 21:25 ` Benjamin Herrenschmidt
2009-12-06 3:05 ` Kumar Gala
0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2009-12-04 21:25 UTC
To: Kumar Gala; +Cc: linux-ppc list, David Gibson
On Fri, 2009-12-04 at 08:09 -0600, Kumar Gala wrote:
> On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
>
> > On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
> >> Ben, David,
> >>
> >> If we want to support true 4G/4G split on ppc32 using the MSB of the
> >> address to determine of the pgd_t is for hugetlbfs isn't going to
> >> work. Since every pointer in the pgd_t -> pud_t -> pmd_t is point to
> >> at least a 4K page I would think the low order 12-bits should always
> >> be 0.
> >
> > On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
> > want to have a different encoding for both types though.
>
> What do you mean they can be smaller? We have some scenario when we
> dont allocate a full page? I agree having the encodings be different
> would be bad. I'm trying to avoid having it be different between 32
> bit and 64 (but maybe that will be impossible).
Yes. The intermediary levels are smaller on 64-bit. Also, hugetlbfs
can create special levels of various sizes depending on the
requirements to fit a given huge page size. And that would be true of
both 32-bit and 64-bit, in fact.
Cheers,
Ben.
* Re: using different format for hugetlbfs
2009-12-04 21:25 ` Benjamin Herrenschmidt
@ 2009-12-06 3:05 ` Kumar Gala
2009-12-07 1:04 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-06 3:05 UTC
To: Benjamin Herrenschmidt; +Cc: linux-ppc list, David Gibson
On Dec 4, 2009, at 3:25 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2009-12-04 at 08:09 -0600, Kumar Gala wrote:
>> On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
>>
>>> On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
>>>> Ben, David,
>>>>
>>>> If we want to support true 4G/4G split on ppc32 using the MSB of
>>>> the
>>>> address to determine of the pgd_t is for hugetlbfs isn't going to
>>>> work. Since every pointer in the pgd_t -> pud_t -> pmd_t is
>>>> point to
>>>> at least a 4K page I would think the low order 12-bits should
>>>> always
>>>> be 0.
>>>
>>> On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
>>> want to have a different encoding for both types though.
>>
>> What do you mean they can be smaller? We have some scenario when we
>> dont allocate a full page? I agree having the encodings be different
>> would be bad. I'm trying to avoid having it be different between 32
>> bit and 64 (but maybe that will be impossible).
>
> Yes. The intermediary levels are smaller on 64-bit. Also, with
> hugetlbfs
> it can create special levels of various sizes depending on the
> requirements to fit a given huge page size. And that would be true of
> both 32 and 64-bit in fact.
Even then, does that preclude the format I suggested? I'm assuming
that pgd_t/pud_t/pmd_t are always a double word so the low-order 4
bits should be 0 (on 64-bit), so using the LSB as the flag between
hugetlb and normal pointer should still work.
- k
* Re: using different format for hugetlbfs
2009-12-06 3:05 ` Kumar Gala
@ 2009-12-07 1:04 ` Benjamin Herrenschmidt
2009-12-08 2:28 ` David Gibson
0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2009-12-07 1:04 UTC
To: Kumar Gala; +Cc: linux-ppc list, David Gibson
>
> Even than, does that preclude the format I suggested? I'm assuming
> that pgd_t/pud_t/pmd_t are always a double word so the low order 4-
> bits should be 0 (on 64-bit), so using the lsb as the flag between
> hugetlb and normal pointer should still work.
Might do, depends if David has enough bits ... David ?
Cheers,
Ben.
* Re: using different format for hugetlbfs
2009-12-07 1:04 ` Benjamin Herrenschmidt
@ 2009-12-08 2:28 ` David Gibson
2009-12-08 15:44 ` Kumar Gala
0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2009-12-08 2:28 UTC
To: Benjamin Herrenschmidt; +Cc: linux-ppc list
On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt wrote:
>
> >
> > Even than, does that preclude the format I suggested? I'm assuming
> > that pgd_t/pud_t/pmd_t are always a double word so the low order 4-
> > bits should be 0 (on 64-bit),
Double word alignment only gives us 3 low bits.
> so using the lsb as the flag between
> > hugetlb and normal pointer should still work.
>
> Might do, depends if David has enough bits ... David ?
Well, the flag can go at the bottom, but that will mean grabbing more
bits at the bottom. At the moment, to cover all the page table sizes
that are wanted on the various setups we have, I need 5 bits; this
would push it to 6. At present, I just force up the minimum alignment
of any page directory (even if its natural alignment is smaller) so
as to make sure I have those bits. That's pretty easy to adjust, but
pushing it up too high will start wasting memory, of course.
If we move to a variable sized encoding, as Ben and I have discussed
on a couple of occasions, I think we could do this though.
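The relationship between alignment and spare pointer bits can be sketched as follows. This is only an illustration of the arithmetic: a pointer aligned to 2^n bytes has n low-order bits that are always zero, so doubleword (8-byte) alignment yields 3 bits, and 5 or 6 tag bits require 32-byte or 64-byte alignment. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of low-order bits guaranteed to be zero in a pointer that is
 * aligned to `align` bytes. `align` must be a power of two. */
static inline unsigned free_low_bits(size_t align)
{
    unsigned bits = 0;

    assert(align != 0 && (align & (align - 1)) == 0);
    while (align > 1) {
        align >>= 1;
        bits++;
    }
    return bits;
}

/* Smallest alignment, in bytes, that leaves `bits` low-order bits free. */
static inline size_t min_align_for_bits(unsigned bits)
{
    return (size_t)1 << bits;
}
```

Going from 5 to 6 tag bits therefore doubles the minimum alignment of every directory that carries the tag.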
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: using different format for hugetlbfs
2009-12-08 2:28 ` David Gibson
@ 2009-12-08 15:44 ` Kumar Gala
2009-12-09 2:00 ` David Gibson
0 siblings, 1 reply; 9+ messages in thread
From: Kumar Gala @ 2009-12-08 15:44 UTC
To: David Gibson; +Cc: linux-ppc list
On Dec 7, 2009, at 8:28 PM, David Gibson wrote:
> On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt
> wrote:
>>
>>>
>>> Even than, does that preclude the format I suggested? I'm assuming
>>> that pgd_t/pud_t/pmd_t are always a double word so the low order 4-
>>> bits should be 0 (on 64-bit),
>
> Double word alignment only gives us 3 low bits.
>
>> so using the lsb as the flag between
>>> hugetlb and normal pointer should still work.
>>
>> Might do, depends if David has enough bits ... David ?
>
> Well, the flag can go at the bottom, but that will mean grabbing more
> bits at the bottom. At the moment to cover all the page table sizes
> that are wanted on the various setups we have, I need 5 bits, this
> would push it to 6. At present, I just force up the minimum alignment
> of any page directory (even if it's natural alignment is smaller) so
> as to make sure I have those bits. That's pretty easy to adjust, but
> pushing it up too high will start wasting memory, of course.
>
> If we move to a variable sized encoding, as Ben and I have discussed
> on a couple of occasions, I think we could do this though.
I don't understand. It seems like the flag bit of normal pointer vs
hugetlb is the only thing that we need to distinguish. Once we've
done that, all the other bits are free to use as we see fit. So the
least significant bit can be used for that purpose, and we are free to
do what we want with the size encoding, etc.
Am I missing something?
- k
* Re: using different format for hugetlbfs
2009-12-08 15:44 ` Kumar Gala
@ 2009-12-09 2:00 ` David Gibson
0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2009-12-09 2:00 UTC
To: Kumar Gala; +Cc: linux-ppc list
On Tue, Dec 08, 2009 at 09:44:55AM -0600, Kumar Gala wrote:
>
> On Dec 7, 2009, at 8:28 PM, David Gibson wrote:
>
> >On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt
> >wrote:
> >>
> >>>
> >>>Even than, does that preclude the format I suggested? I'm assuming
> >>>that pgd_t/pud_t/pmd_t are always a double word so the low order 4-
> >>>bits should be 0 (on 64-bit),
> >
> >Double word alignment only gives us 3 low bits.
> >
> >>so using the lsb as the flag between
> >>>hugetlb and normal pointer should still work.
> >>
> >>Might do, depends if David has enough bits ... David ?
> >
> >Well, the flag can go at the bottom, but that will mean grabbing more
> >bits at the bottom. At the moment to cover all the page table sizes
> >that are wanted on the various setups we have, I need 5 bits, this
> >would push it to 6. At present, I just force up the minimum alignment
> >of any page directory (even if it's natural alignment is smaller) so
> >as to make sure I have those bits. That's pretty easy to adjust, but
> >pushing it up too high will start wasting memory, of course.
> >
> >If we move to a variable sized encoding, as Ben and I have discussed
> >on a couple of occasions, I think we could do this though.
>
> I don't understand. It seems like only the flag bit of normal
> pointer vs hugetlb is the only thing that we need to distinguish.
> Once we've done that all the other bits are free to use as we see
> fit. So the less significant bit can be used for that purpose and
> the size encoding, etc we are free to do what we want with.
Well, yes, but the huge page directory pointers are still pointers, so
this is one extra bit at the bottom which counts against our minimum
alignment for those pointers. There's no natural lower bound on the
size of the hugepte directories, and with existing setups they already
go as low as 4 entries, which we already pad out to meet our minimum
alignment.
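A rough sketch of the memory cost being described, assuming power-of-two directory sizes and that a directory smaller than the minimum alignment is padded up to it. The helper name and the entry sizes used in the note below are assumptions for illustration, not values from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes consumed by a directory of `nentries` entries
 * of `entry_size` bytes each when its pointer must keep `tag_bits`
 * low-order bits free. Assumes nentries * entry_size is a power of two. */
static inline size_t padded_dir_size(size_t nentries, size_t entry_size,
                                     unsigned tag_bits)
{
    size_t natural = nentries * entry_size;
    size_t min_align = (size_t)1 << tag_bits;

    /* A directory smaller than the minimum alignment is padded up to it. */
    return natural < min_align ? min_align : natural;
}
```

Under these assumptions, a 4-entry directory of 8-byte entries occupies 32 bytes with 5 tag bits but 64 bytes with 6, while a 512-entry directory is unaffected either way.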
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Thread overview: 9+ messages
2009-12-04 7:18 using different format for hugetlbfs Kumar Gala
2009-12-04 8:58 ` Benjamin Herrenschmidt
2009-12-04 14:09 ` Kumar Gala
2009-12-04 21:25 ` Benjamin Herrenschmidt
2009-12-06 3:05 ` Kumar Gala
2009-12-07 1:04 ` Benjamin Herrenschmidt
2009-12-08 2:28 ` David Gibson
2009-12-08 15:44 ` Kumar Gala
2009-12-09 2:00 ` David Gibson