* using different format for hugetlbfs
From: Kumar Gala @ 2009-12-04 7:18 UTC
To: Benjamin Herrenschmidt, David Gibson; +Cc: linux-ppc list

Ben, David,

If we want to support a true 4G/4G split on ppc32, using the MSB of the
address to determine if the pgd_t is for hugetlbfs isn't going to work.
Since every pointer in the pgd_t -> pud_t -> pmd_t chain points to at
least a 4K page, I would think the low-order 12 bits should always be 0.

Could we use something like:

	addr[0:51] || shift[52:59] || flags[60:63]

with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled pointer'?
Seems like shift will at most be 64, so 8 bits should cover it.

- k
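The layout proposed above can be sketched in C roughly as follows. This is
only an illustration, assuming IBM bit numbering (bit 0 is the most
significant bit of a 64-bit word), so that flags[60:63] are the four lowest
bits and shift[52:59] the eight bits above them, and assuming, as the message
does, that every table pointer is at least 4K-aligned. All names (the `ex_`
functions and `EX_` macros) are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, IBM bit numbering (bit 0 = MSB of a 64-bit word):
 *   addr[0:51] || shift[52:59] || flags[60:63]
 * In LSB-0 terms: flags in bits 3..0, shift in bits 11..4,
 * address in bits 63..12.
 */
#define EX_FLAG_HUGEPD   0x1ULL   /* LSB set = hugetlbfs mangled pointer */
#define EX_SHIFT_SHIFT   4
#define EX_SHIFT_MASK    (0xffULL << EX_SHIFT_SHIFT)
#define EX_ADDR_MASK     (~0xfffULL)

/* Pack a 4K-aligned address and a shift value into one entry. */
static uint64_t ex_encode_hugepd(uint64_t addr, unsigned int shift)
{
	assert((addr & ~EX_ADDR_MASK) == 0);	/* low 12 bits must be free */
	assert(shift <= 0xff);			/* shift must fit in 8 bits */
	return addr | ((uint64_t)shift << EX_SHIFT_SHIFT) | EX_FLAG_HUGEPD;
}

/* A normal pointer has the LSB clear, a mangled one has it set. */
static int ex_is_hugepd(uint64_t entry)
{
	return (entry & EX_FLAG_HUGEPD) != 0;
}

static uint64_t ex_addr(uint64_t entry)
{
	return entry & EX_ADDR_MASK;
}

static unsigned int ex_shift(uint64_t entry)
{
	return (unsigned int)((entry & EX_SHIFT_MASK) >> EX_SHIFT_SHIFT);
}
```

With this sketch, an ordinary 4K-aligned pointer is never mistaken for a
mangled one, because its low bit is zero. The scheme depends entirely on the
alignment assumption holding for every directory level.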
* Re: using different format for hugetlbfs
From: Benjamin Herrenschmidt @ 2009-12-04 8:58 UTC
To: Kumar Gala; +Cc: linux-ppc list, David Gibson

On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
> Ben, David,
>
> If we want to support true 4G/4G split on ppc32 using the MSB of the
> address to determine if the pgd_t is for hugetlbfs isn't going to
> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
> at least a 4K page I would think the low order 12-bits should always
> be 0.

On 32-bit, maybe. On 64-bit, the pg/u/md's can be smaller. I don't really
want to have a different encoding for both types though.

> Could we use something like:
>
> addr[0:51] || shift [52:59] || flags [60:63]
>
> with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled
> pointer'.  Seems like shift will at most be 64 so 8-bits should cover
> it.

Cheers,
Ben.
* Re: using different format for hugetlbfs
From: Kumar Gala @ 2009-12-04 14:09 UTC
To: Benjamin Herrenschmidt; +Cc: linux-ppc list, David Gibson

On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:

> On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
>> Ben, David,
>>
>> If we want to support true 4G/4G split on ppc32 using the MSB of the
>> address to determine if the pgd_t is for hugetlbfs isn't going to
>> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
>> at least a 4K page I would think the low order 12-bits should always
>> be 0.
>
> On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
> want to have a different encoding for both types though.

What do you mean they can be smaller? Do we have some scenario where we
don't allocate a full page? I agree having the encodings be different
would be bad. I'm trying to avoid having it be different between 32-bit
and 64-bit (but maybe that will be impossible).

>> Could we use something like:
>>
>> addr[0:51] || shift [52:59] || flags [60:63]
>>
>> with the LSB flag being 'normal pointer' vs 'hugetlbfs mangled
>> pointer'.  Seems like shift will at most be 64 so 8-bits should cover
>> it.
>
> Cheers,
> Ben.

- k
* Re: using different format for hugetlbfs
From: Benjamin Herrenschmidt @ 2009-12-04 21:25 UTC
To: Kumar Gala; +Cc: linux-ppc list, David Gibson

On Fri, 2009-12-04 at 08:09 -0600, Kumar Gala wrote:
> On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
>
> > On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
> >> Ben, David,
> >>
> >> If we want to support true 4G/4G split on ppc32 using the MSB of the
> >> address to determine if the pgd_t is for hugetlbfs isn't going to
> >> work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points to
> >> at least a 4K page I would think the low order 12-bits should always
> >> be 0.
> >
> > On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
> > want to have a different encoding for both types though.
>
> What do you mean they can be smaller?  We have some scenario when we
> don't allocate a full page?  I agree having the encodings be different
> would be bad.  I'm trying to avoid having it be different between 32
> bit and 64 (but maybe that will be impossible).

Yes. The intermediary levels are smaller on 64-bit. Also, hugetlbfs can
create special levels of various sizes, depending on what is required to
fit a given huge page size. And that would be true of both 32-bit and
64-bit, in fact.

Cheers,
Ben.
* Re: using different format for hugetlbfs
From: Kumar Gala @ 2009-12-06 3:05 UTC
To: Benjamin Herrenschmidt; +Cc: linux-ppc list, David Gibson

On Dec 4, 2009, at 3:25 PM, Benjamin Herrenschmidt wrote:

> On Fri, 2009-12-04 at 08:09 -0600, Kumar Gala wrote:
>> On Dec 4, 2009, at 2:58 AM, Benjamin Herrenschmidt wrote:
>>
>>> On Fri, 2009-12-04 at 01:18 -0600, Kumar Gala wrote:
>>>> Ben, David,
>>>>
>>>> If we want to support true 4G/4G split on ppc32 using the MSB of
>>>> the address to determine if the pgd_t is for hugetlbfs isn't going
>>>> to work.  Since every pointer in the pgd_t -> pud_t -> pmd_t points
>>>> to at least a 4K page I would think the low order 12-bits should
>>>> always be 0.
>>>
>>> On 32 bit maybe. On 64, the pg/u/md's can be smaller. I don't really
>>> want to have a different encoding for both types though.
>>
>> What do you mean they can be smaller?  We have some scenario when we
>> don't allocate a full page?  I agree having the encodings be different
>> would be bad.  I'm trying to avoid having it be different between 32
>> bit and 64 (but maybe that will be impossible).
>
> Yes. The intermediary levels are smaller on 64-bit. Also, with
> hugetlbfs it can create special levels of various sizes depending on
> the requirements to fit a given huge page size. And that would be true
> of both 32 and 64-bit in fact.

Even then, does that preclude the format I suggested? I'm assuming that
pgd_t/pud_t/pmd_t are always a double word, so the low-order 4 bits
should be 0 (on 64-bit), so using the LSB as the flag between hugetlb
and normal pointer should still work.

- k
* Re: using different format for hugetlbfs
From: Benjamin Herrenschmidt @ 2009-12-07 1:04 UTC
To: Kumar Gala; +Cc: linux-ppc list, David Gibson

> Even then, does that preclude the format I suggested?  I'm assuming
> that pgd_t/pud_t/pmd_t are always a double word so the low order 4
> bits should be 0 (on 64-bit), so using the lsb as the flag between
> hugetlb and normal pointer should still work.

Might do, depends if David has enough bits ... David?

Cheers,
Ben.
* Re: using different format for hugetlbfs
From: David Gibson @ 2009-12-08 2:28 UTC
To: Benjamin Herrenschmidt; +Cc: linux-ppc list

On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt wrote:
> > Even then, does that preclude the format I suggested?  I'm assuming
> > that pgd_t/pud_t/pmd_t are always a double word so the low order 4
> > bits should be 0 (on 64-bit),

Double word alignment only gives us 3 low bits.

> > so using the lsb as the flag between
> > hugetlb and normal pointer should still work.
>
> Might do, depends if David has enough bits ... David ?

Well, the flag can go at the bottom, but that will mean grabbing more
bits at the bottom. At the moment, to cover all the page table sizes
that are wanted on the various setups we have, I need 5 bits; this
would push it to 6. At present, I just force up the minimum alignment
of any page directory (even if its natural alignment is smaller) so
as to make sure I have those bits. That's pretty easy to adjust, but
pushing it up too high will start wasting memory, of course.

If we move to a variable-sized encoding, as Ben and I have discussed
on a couple of occasions, I think we could do this though.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
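The relationship between alignment and spare low bits mentioned above can be
checked with a small sketch: a pointer aligned to 2^n bytes has n low bits
that are always zero, so a double word (8 bytes) gives 3 bits, while 5 or 6
spare bits need 32-byte or 64-byte alignment. The function names here are
hypothetical, made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of always-zero low bits in a pointer aligned to `align` bytes.
 * `align` must be a non-zero power of two.
 */
static unsigned int ex_free_low_bits(size_t align)
{
	unsigned int bits = 0;

	assert(align != 0 && (align & (align - 1)) == 0);
	while (align > 1) {
		align >>= 1;
		bits++;
	}
	return bits;
}

/* Smallest power-of-two alignment that leaves `bits` low bits free. */
static size_t ex_min_align_for_bits(unsigned int bits)
{
	return (size_t)1 << bits;
}
```

Going from 5 to 6 reserved bits therefore doubles the minimum alignment
every directory has to satisfy, which is where the memory cost comes from.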
* Re: using different format for hugetlbfs
From: Kumar Gala @ 2009-12-08 15:44 UTC
To: David Gibson; +Cc: linux-ppc list

On Dec 7, 2009, at 8:28 PM, David Gibson wrote:

> On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt wrote:
>>> Even then, does that preclude the format I suggested?  I'm assuming
>>> that pgd_t/pud_t/pmd_t are always a double word so the low order 4
>>> bits should be 0 (on 64-bit),
>
> Double word alignment only gives us 3 low bits.
>
>>> so using the lsb as the flag between
>>> hugetlb and normal pointer should still work.
>>
>> Might do, depends if David has enough bits ... David ?
>
> Well, the flag can go at the bottom, but that will mean grabbing more
> bits at the bottom.  At the moment to cover all the page table sizes
> that are wanted on the various setups we have, I need 5 bits, this
> would push it to 6.  At present, I just force up the minimum alignment
> of any page directory (even if its natural alignment is smaller) so
> as to make sure I have those bits.  That's pretty easy to adjust, but
> pushing it up too high will start wasting memory, of course.
>
> If we move to a variable sized encoding, as Ben and I have discussed
> on a couple of occasions, I think we could do this though.

I don't understand. It seems like the flag bit for normal pointer vs
hugetlb is the only thing that we need to distinguish. Once we've done
that, all the other bits are free to use as we see fit. So the least
significant bit can be used for that purpose, and we are free to do
what we want with the size encoding, etc.

Am I missing something?

- k
* Re: using different format for hugetlbfs
From: David Gibson @ 2009-12-09 2:00 UTC
To: Kumar Gala; +Cc: linux-ppc list

On Tue, Dec 08, 2009 at 09:44:55AM -0600, Kumar Gala wrote:
> On Dec 7, 2009, at 8:28 PM, David Gibson wrote:
>
> > On Mon, Dec 07, 2009 at 12:04:37PM +1100, Benjamin Herrenschmidt wrote:
> >>> Even then, does that preclude the format I suggested?  I'm assuming
> >>> that pgd_t/pud_t/pmd_t are always a double word so the low order 4
> >>> bits should be 0 (on 64-bit),
> >
> > Double word alignment only gives us 3 low bits.
> >
> >>> so using the lsb as the flag between
> >>> hugetlb and normal pointer should still work.
> >>
> >> Might do, depends if David has enough bits ... David ?
> >
> > Well, the flag can go at the bottom, but that will mean grabbing more
> > bits at the bottom.  At the moment to cover all the page table sizes
> > that are wanted on the various setups we have, I need 5 bits, this
> > would push it to 6.  At present, I just force up the minimum alignment
> > of any page directory (even if its natural alignment is smaller) so
> > as to make sure I have those bits.  That's pretty easy to adjust, but
> > pushing it up too high will start wasting memory, of course.
> >
> > If we move to a variable sized encoding, as Ben and I have discussed
> > on a couple of occasions, I think we could do this though.
>
> I don't understand.  It seems like only the flag bit of normal
> pointer vs hugetlb is the only thing that we need to distinguish.
> Once we've done that all the other bits are free to use as we see
> fit.  So the less significant bit can be used for that purpose and
> the size encoding, etc we are free to do what we want with.

Well, yes, but the huge page directory pointers are still pointers, so
this is one extra bit at the bottom which counts against our minimum
alignment for those pointers. There's no natural lower bound on the
size of the hugepte directories, and with existing setups they already
go as low as 4 entries, which we already pad out to meet our minimum
alignment.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
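The padding cost described above can be illustrated with a small
calculation. The numbers are assumptions chosen for illustration, not values
taken from the kernel: 8-byte entries (a 64-bit build) and a minimum
alignment of 32 or 64 bytes, corresponding to 5 or 6 reserved low bits. The
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size of a directory of `nentries` entries after rounding up to
 * `min_align` bytes. `min_align` must be a power of two.
 */
static size_t ex_padded_size(size_t nentries, size_t entry_size,
			     size_t min_align)
{
	size_t size = nentries * entry_size;

	assert(min_align != 0 && (min_align & (min_align - 1)) == 0);
	return (size + min_align - 1) & ~(min_align - 1);
}

/* Bytes lost to padding for one such directory. */
static size_t ex_padding_waste(size_t nentries, size_t entry_size,
			       size_t min_align)
{
	return ex_padded_size(nentries, entry_size, min_align)
		- nentries * entry_size;
}
```

Under these assumptions, a 4-entry directory occupies 32 bytes and fits a
32-byte minimum alignment exactly, but a 64-byte minimum alignment pads it to
64 bytes, so half of the allocation is unused.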