linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Jon Tollefson <kniht@linux.vnet.ibm.com>
To: libhugetlbfs-devel@lists.sourceforge.net,
	linuxppc-dev@ozlabs.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [Libhugetlbfs-devel] Buglet in 16G page handling
Date: Thu, 04 Sep 2008 16:08:30 -0500	[thread overview]
Message-ID: <48C04E4E.6090408@linux.vnet.ibm.com> (raw)
In-Reply-To: <48BF0D6F.2010602@linux.vnet.ibm.com>

Jon Tollefson wrote:
> David Gibson wrote:
>   
>> On Tue, Sep 02, 2008 at 12:12:27PM -0500, Jon Tollefson wrote:
>>   
>>     
>>> David Gibson wrote:
>>>     
>>>       
>>>> When BenH and I were looking at the new code for handling 16G pages,
>>>> we noticed a small bug.  It doesn't actually break anything user
>>>> visible, but it's certainly not the way things are supposed to be.
>>>> The 16G patches didn't update the huge_pte_offset() and
>>>> huge_pte_alloc() functions, which means that the hugepte tables for
>>>> 16G pages will be allocated much further down the page table tree than
>>>> they should be - allocating several levels of page table with a single
>>>> entry in them along the way.
>>>>
>>>> The patch below is supposed to fix this, cleaning up the existing
>>>> handling of 64k vs 16M pages while its at it.  However, it needs some
>>>> testing.
>>>>
>>>> I've checked that it doesn't break existing 16M support, either with
>>>> 4k or 64k base pages.  I haven't figured out how to test with 64k
>>>> pages yet, at least until the multisize support goes into
>>>> libhugetlbfs.  For 16G pages, I just don't have access to a machine
>>>> with enough memory to test.  Jon, presumably you must have found such
>>>> a machine when you did the 16G page support in the first place.  Do
>>>> you still have access, and can you test this patch?
>>>>   
>>>>       
>>>>         
>>> I do have access to a machine to test it.  I applied the patch to -rc4
>>> and used a pseries_defconfig.  I boot with
>>> default_hugepagesz=16G... in order to test huge page sizes other then
>>> 16M at this point.
>>>
>>> Running the libhugetlbfs test suite it gets as far as   Readback (64):  
>>> PASS
>>> before it hits the following program check.
>>>     
>>>       
>> Ah, yes, oops, forgot to fix up the pagetable freeing path in line
>> with the other changes.  Try the revised version below.
>>   
>>     
> I have run through the tests twice now with this new patch using a 4k
> base page size(and 16G huge page size) and there are no program checks
> or spin lock issues.  So looking good.
>
> I will run it next a couple of times with 64K base pages.
>   
I have run through the libhugetest suite 3 times each now with both
combinations(4k and 64K base page) and have not seen the spin lock
problem or any other problems.

Acked-by: Jon Tollefson <kniht@linux.vnet.ibm.com>


> Jon
>
>
>
>
>   
>> Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
>> ===================================================================
>> --- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2008-09-02 11:50:12.000000000 +1000
>> +++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2008-09-03 10:10:54.000000000 +1000
>> @@ -128,29 +128,37 @@ static int __hugepte_alloc(struct mm_str
>>  	return 0;
>>  }
>>
>> -/* Base page size affects how we walk hugetlb page tables */
>> -#ifdef CONFIG_PPC_64K_PAGES
>> -#define hpmd_offset(pud, addr, h)	pmd_offset(pud, addr)
>> -#define hpmd_alloc(mm, pud, addr, h)	pmd_alloc(mm, pud, addr)
>> -#else
>> -static inline
>> -pmd_t *hpmd_offset(pud_t *pud, unsigned long addr, struct hstate *hstate)
>> +
>> +static pud_t *hpud_offset(pgd_t *pgd, unsigned long addr, struct hstate *hstate)
>> +{
>> +	if (huge_page_shift(hstate) < PUD_SHIFT)
>> +		return pud_offset(pgd, addr);
>> +	else
>> +		return (pud_t *) pgd;
>> +}
>> +static pud_t *hpud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
>> +			 struct hstate *hstate)
>>  {
>> -	if (huge_page_shift(hstate) == PAGE_SHIFT_64K)
>> +	if (huge_page_shift(hstate) < PUD_SHIFT)
>> +		return pud_alloc(mm, pgd, addr);
>> +	else
>> +		return (pud_t *) pgd;
>> +}
>> +static pmd_t *hpmd_offset(pud_t *pud, unsigned long addr, struct hstate *hstate)
>> +{
>> +	if (huge_page_shift(hstate) < PMD_SHIFT)
>>  		return pmd_offset(pud, addr);
>>  	else
>>  		return (pmd_t *) pud;
>>  }
>> -static inline
>> -pmd_t *hpmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
>> -		  struct hstate *hstate)
>> +static pmd_t *hpmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
>> +			 struct hstate *hstate)
>>  {
>> -	if (huge_page_shift(hstate) == PAGE_SHIFT_64K)
>> +	if (huge_page_shift(hstate) < PMD_SHIFT)
>>  		return pmd_alloc(mm, pud, addr);
>>  	else
>>  		return (pmd_t *) pud;
>>  }
>> -#endif
>>
>>  /* Build list of addresses of gigantic pages.  This function is used in early
>>   * boot before the buddy or bootmem allocator is setup.
>> @@ -204,7 +212,7 @@ pte_t *huge_pte_offset(struct mm_struct 
>>
>>  	pg = pgd_offset(mm, addr);
>>  	if (!pgd_none(*pg)) {
>> -		pu = pud_offset(pg, addr);
>> +		pu = hpud_offset(pg, addr, hstate);
>>  		if (!pud_none(*pu)) {
>>  			pm = hpmd_offset(pu, addr, hstate);
>>  			if (!pmd_none(*pm))
>> @@ -233,7 +241,7 @@ pte_t *huge_pte_alloc(struct mm_struct *
>>  	addr &= hstate->mask;
>>
>>  	pg = pgd_offset(mm, addr);
>> -	pu = pud_alloc(mm, pg, addr);
>> +	pu = hpud_alloc(mm, pg, addr, hstate);
>>
>>  	if (pu) {
>>  		pm = hpmd_alloc(mm, pu, addr, hstate);
>> @@ -316,13 +324,7 @@ static void hugetlb_free_pud_range(struc
>>  	pud = pud_offset(pgd, addr);
>>  	do {
>>  		next = pud_addr_end(addr, end);
>> -#ifdef CONFIG_PPC_64K_PAGES
>> -		if (pud_none_or_clear_bad(pud))
>> -			continue;
>> -		hugetlb_free_pmd_range(tlb, pud, addr, next, floor, ceiling,
>> -				       psize);
>> -#else
>> -		if (shift == PAGE_SHIFT_64K) {
>> +		if (shift < PMD_SHIFT) {
>>  			if (pud_none_or_clear_bad(pud))
>>  				continue;
>>  			hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
>> @@ -332,7 +334,6 @@ static void hugetlb_free_pud_range(struc
>>  				continue;
>>  			free_hugepte_range(tlb, (hugepd_t *)pud, psize);
>>  		}
>> -#endif
>>  	} while (pud++, addr = next, addr != end);
>>
>>  	start &= PGDIR_MASK;
>> @@ -422,9 +423,15 @@ void hugetlb_free_pgd_range(struct mmu_g
>>  		psize = get_slice_psize(tlb->mm, addr);
>>  		BUG_ON(!mmu_huge_psizes[psize]);
>>  		next = pgd_addr_end(addr, end);
>> -		if (pgd_none_or_clear_bad(pgd))
>> -			continue;
>> -		hugetlb_free_pud_range(tlb, pgd, addr, next, floor, ceiling);
>> +		if (mmu_psize_to_shift(psize) < PUD_SHIFT) {
>> +			if (pgd_none_or_clear_bad(pgd))
>> +				continue;
>> +			hugetlb_free_pud_range(tlb, pgd, addr, next, floor, ceiling);
>> +		} else {
>> +			if (pgd_none(*pgd))
>> +				continue;
>> +			free_hugepte_range(tlb, (hugepd_t *)pgd, psize);
>> +		}
>>  	} while (pgd++, addr = next, addr != end);
>>  }
>>     

  parent reply	other threads:[~2008-09-04 21:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-02  5:05 Buglet in 16G page handling David Gibson
2008-09-02 12:44 ` [Libhugetlbfs-devel] " Mel Gorman
2008-09-02 16:25   ` Nishanth Aravamudan
2008-09-02 21:05   ` Benjamin Herrenschmidt
2008-09-02 22:16     ` Jon Tollefson
2008-09-02 22:53       ` Benjamin Herrenschmidt
2008-09-03 14:11         ` Jon Tollefson
2008-09-02 17:12 ` Jon Tollefson
2008-09-03  0:20   ` [Libhugetlbfs-devel] " David Gibson
2008-09-03 22:19     ` Jon Tollefson
2008-09-04  6:22       ` David Gibson
2008-09-04 21:08       ` Jon Tollefson [this message]
2008-09-05  1:36         ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48C04E4E.6090408@linux.vnet.ibm.com \
    --to=kniht@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=libhugetlbfs-devel@lists.sourceforge.net \
    --cc=linuxppc-dev@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).