linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Paul Mackerras <paulus@samba.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
Date: Fri, 22 Feb 2013 16:23:51 +1100	[thread overview]
Message-ID: <20130222052351.GE6139@drongo> (raw)
In-Reply-To: <1361465248-10867-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We now have PTE page consuming only 2K of the 64K page.This is in order to

In fact the PTE page together with the hash table indexes occupies 4k,
doesn't it?  The comments in the code are similarly confusing since
they talk about 2k but actually allocate 4k.

> facilitate transparent huge page support, which works much better if our PMDs
> cover 16MB instead of 256MB.
> 
> Inorder to reduce the wastage, we now have multiple PTE page fragment
  ^ In order (two words)

> from the same PTE page.

A patch like this needs a more complete description and explanation
than you have given.  For instance, you could mention that the code
that you're adding for the 32-bit and non-64k cases are just copies of
the previously generic code from pgalloc.h (actually, this movement
might be something that could be split out as a separate patch).
Also, you should describe in outline how you keep a list of pages that
aren't fully allocated and have a bitmap of which 4k sections are in
use, and also how your scheme interacts with RCU.

[snip]

> +#ifdef CONFIG_PPC_64K_PAGES
> +/*
> + * we support 15 fragments per PTE page. This is limited by how many

Why only 15?  Don't we get 16 fragments per page?

> + * bits we can pack in page->_mapcount. We use the first half for
> + * tracking the usage for rcu page table free.

What does "first" mean?  The high half or the low half?

> +unsigned long *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr)
> +{
> +	struct page *page;
> +	unsigned int mask, bit;
> +	unsigned long *table;
> +
> +	/* Allocate fragments of a 4K page as 1K/2K page table */

A 4k page?  Do you mean a 64k page?  And what is 1K to do with
anything?

> +#ifdef CONFIG_SMP
> +static void __page_table_free_rcu(void *table)
> +{
> +	unsigned int bit;
> +	struct page *page;
> +	/*
> +	 * this is a PTE page free 2K page table
> +	 * fragment of a 64K page.
> +	 */
> +	page = virt_to_page(table);
> +	bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
> +	bit <<= FRAG_MASK_BITS;
> +	/*
> +	 * clear the higher half and if nobody used the page in
> +	 * between, even lower half would be zero.
> +	 */
> +	if (atomic_xor_bits(&page->_mapcount, bit) == 0) {
> +		pgtable_page_dtor(page);
> +		atomic_set(&page->_mapcount, -1);
> +		__free_page(page);
> +	}
> +}
> +
> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table)
> +{
> +	struct page *page;
> +	struct mm_struct *mm;
> +	unsigned int bit, mask;
> +
> +	mm = tlb->mm;
> +	/* Free 2K page table fragment of a 64K page */
> +	page = virt_to_page(table);
> +	bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
> +	spin_lock(&mm->page_table_lock);
> +	/*
> +	 * stash the actual mask in higher half, and clear the lower half
> +	 * and selectively, add remove from pgtable list
> +	 */
> +	mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
> +	if (!(mask & FRAG_MASK))
> +		list_del(&page->lru);
> +	else {
> +		/*
> +		 * Add the page table page to pgtable_list so that
> +		 * the free fragment can be used by the next alloc
> +		 */
> +		list_del_init(&page->lru);
> +		list_add_tail(&page->lru, &mm->context.pgtable_list);
> +	}
> +	spin_unlock(&mm->page_table_lock);
> +	tlb_remove_table(tlb, table);
> +}

This looks like you're allowing a fragment that is being freed to be
reallocated and used again during the grace period when we are waiting
for any references to the fragment to disappear.  Doesn't that allow a
race where one CPU traversing the page table and using the fragment in
its old location in the tree could see a PTE created after the
fragment was reallocated?  In other words, why is it safe to allow the
fragment to be used during the grace period?  If it is safe, it at
least needs a comment explaining why.

Paul.

  parent reply	other threads:[~2013-02-22  5:23 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-21 16:47 [RFC PATCH -V2 00/21] THP support for PPC64 Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 01/21] powerpc: Use signed formatting when printing error Aneesh Kumar K.V
2013-02-22  5:00   ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 02/21] powerpc: Save DAR and DSISR in pt_regs on MCE Aneesh Kumar K.V
2013-02-22  5:03   ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 03/21] powerpc: Don't hard code the size of pte page Aneesh Kumar K.V
2013-02-22  5:06   ` Paul Mackerras
2013-02-23 16:17     ` Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 04/21] powerpc: Reduce the PTE_INDEX_SIZE Aneesh Kumar K.V
2013-02-22  5:07   ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage Aneesh Kumar K.V
2013-02-22  0:32   ` David Gibson
2013-02-22  5:14     ` Aneesh Kumar K.V
2013-02-22  5:23   ` Paul Mackerras [this message]
2013-02-22 17:20     ` Aneesh Kumar K.V
2013-02-23 16:38     ` Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 06/21] powerpc: Add size argument to pgtable_cache_add Aneesh Kumar K.V
2013-02-22  5:27   ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 07/21] powerpc: Use encode avpn where we need only avpn values Aneesh Kumar K.V
2013-02-22  5:28   ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 08/21] powerpc: Decode the pte-lp-encoding bits correctly Aneesh Kumar K.V
2013-02-22  5:37   ` Paul Mackerras
2013-02-24 16:51     ` Aneesh Kumar K.V
2013-02-24 17:45     ` Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 09/21] powerpc: Update tlbie/tlbiel as per ISA doc Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 10/21] powerpc: print both base and actual page size on hash failure Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 11/21] powerpc: Print page size info during boot Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 12/21] powerpc: Fix hpte_decode to use the correct decoding for page sizes Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 13/21] mm/THP: HPAGE_SHIFT is not a #define on some arch Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 14/21] mm/THP: Add pmd args to pgtable deposit and withdraw APIs Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 15/21] mm/THP: support for zerout withdraw Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 16/21] powerpc/THP: Implement transparent huge pages for ppc64 Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 17/21] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 18/21] powerpc/THP: Add code to handle HPTE faults for large pages Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 19/21] powerpc/THP: hypervisor require few WIMG bit set Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 20/21] powerpc/THP: get_user_pages_fast changes Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 21/21] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-03-21  8:17 ` [RFC PATCH -V2 00/21] THP support for PPC64 Simon Jeons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130222052351.GE6139@drongo \
    --to=paulus@samba.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).