linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Paul Mackerras <paulus@samba.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [PATCH -V1 06/24] powerpc: Reduce PTE table memory wastage
Date: Mon, 4 Mar 2013 15:58:53 +1100	[thread overview]
Message-ID: <20130304045853.GB27523@drongo> (raw)
In-Reply-To: <1361865914-13911-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, Feb 26, 2013 at 01:34:56PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We allocate one page for the last level of linux page table. With THP and
> large page size of 16MB, that would mean we are be wasting large part
> of that page. To map 16MB area, we only need a PTE space of 2K with 64K
> page size. This patch reduce the space wastage by sharing the page
> allocated for the last level of linux page table with multiple pmd
> entries. We call these smaller chunks PTE page fragments and allocated
> page, PTE page. We use the page->_mapcount as bitmap to indicate which
> PTE fragments are free.
> 
> page->_mapcount is divided into two halves. The upper half is used for
> tracking the freed page framents in the RCU grace period.
> 
> In order to support systems which doesn't have 64K HPTE support, we also
> add another 2K to PTE page fragment. The second half of the PTE fragments
> is used for storing slot and secondary bit information of an HPTE. With this
> we now have a 4K PTE fragment.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

This one has taken me hours to review.  Perhaps it's partly because of
the way that diff has matched things up, but it's difficult to see
what's moved where, what's common code that is now the 4k page case,
etc.  For example, pmd_alloc_one() and pmd_free() are unchanged, but
the diff shows them as removed in one place and added in another.

The other general comment I have is that it's not really clear when a
page will be on the mm->context.pgtable_list and when it won't.  I
would like to see an invariant that says something like "the page is
on the pgtable_list if and only if (page->_mapcount & FRAG_MASK) is
neither 0 nor FRAG_MASK".  But that doesn't seem to be the case
exactly, and I can't see any consistent rule, which makes me think
there are going to be bugs in corner cases.

Consider, for example, the case where a page has two fragments still
in use, and one of them gets queued up by RCU for freeing via a call
to page_table_free_rcu, and then the other one gets freed through
page_table_free().  Neither the call to page_table_free_rcu nor the
call to page_table_free will take the page off the list AFAICS, and
then __page_table_free_rcu() will free the page while it's still on
the pgtable_list.

More specific comments below...

> -static inline void pgtable_free(void *table, unsigned index_size)
> -{
> -	if (!index_size)
> -		free_page((unsigned long)table);
> -	else {
> -		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
> -		kmem_cache_free(PGT_CACHE(index_size), table);
> -	}
> -}

This is still used in the UP case, both for 4k and 64k, and UP configs
now fail to build.

>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
>  	free_page((unsigned long)pte);
> @@ -156,7 +118,12 @@ static inline void __tlb_remove_table(void *_table)
>  	void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
>  	unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
>  
> -	pgtable_free(table, shift);
> +	if (!shift)
> +		free_page((unsigned long)table);
> +	else {
> +		BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
> +		kmem_cache_free(PGT_CACHE(shift), table);
> +	}

Any particular reason for open-coding pgtable_free() here?

> +/*
> + * we support 15 fragments per PTE page. This is limited by how many
> + * bits we can pack in page->_mapcount. We use the first half for
> + * tracking the usage for rcu page table free.
> + */
> +#define FRAG_MASK_BITS	15
> +#define FRAG_MASK ((1 << FRAG_MASK_BITS) - 1)

Atomic_t variables are 32-bit, so you really should be able to make
FRAG_MASK_BITS be 16 and avoid wasting the last fragment of each page.

Paul.

  reply	other threads:[~2013-03-04  4:58 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-26  8:04 [PATCH -V1 00/24] THP support for PPC64 Aneesh Kumar K.V
2013-02-26  8:04 ` [PATCH -V1 01/24] powerpc: Use signed formatting when printing error Aneesh Kumar K.V
2013-02-26  8:04 ` [PATCH -V1 02/24] powerpc: Save DAR and DSISR in pt_regs on MCE Aneesh Kumar K.V
2013-02-26  8:04 ` [PATCH -V1 03/24] powerpc: Don't hard code the size of pte page Aneesh Kumar K.V
2013-02-27 23:09   ` Paul Mackerras
2013-02-26  8:04 ` [PATCH -V1 04/24] powerpc: Reduce the PTE_INDEX_SIZE Aneesh Kumar K.V
2013-02-26  8:04 ` [PATCH -V1 05/24] powerpc: Move the pte free routines from common header Aneesh Kumar K.V
2013-02-28  8:36   ` Paul Mackerras
2013-02-26  8:04 ` [PATCH -V1 06/24] powerpc: Reduce PTE table memory wastage Aneesh Kumar K.V
2013-03-04  4:58   ` Paul Mackerras [this message]
2013-03-04 10:58     ` Aneesh Kumar K.V
2013-03-04 23:36       ` Benjamin Herrenschmidt
2013-03-06  4:01         ` Aneesh Kumar K.V
2013-03-05  2:12       ` Paul Mackerras
2013-03-06  4:08         ` Aneesh Kumar K.V
2013-03-06  5:03         ` Aneesh Kumar K.V
2013-02-26  8:04 ` [PATCH -V1 07/24] powerpc: Add size argument to pgtable_cache_add Aneesh Kumar K.V
2013-03-04  5:13   ` Paul Mackerras
2013-03-04 11:02     ` Aneesh Kumar K.V
2013-03-05  1:50       ` Paul Mackerras
2013-03-06  4:23         ` Aneesh Kumar K.V
2013-02-26  8:04 ` [PATCH -V1 08/24] powerpc: Use encode avpn where we need only avpn values Aneesh Kumar K.V
2013-03-04  5:15   ` Paul Mackerras
2013-02-26  8:04 ` [PATCH -V1 09/24] powerpc: Decode the pte-lp-encoding bits correctly Aneesh Kumar K.V
2013-03-04  5:48   ` Paul Mackerras
2013-03-04 11:41     ` Aneesh Kumar K.V
2013-03-05  2:02       ` Paul Mackerras
2013-03-06  4:30         ` Aneesh Kumar K.V
2013-03-04 11:52     ` Aneesh Kumar K.V
2013-03-04 18:11       ` Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 10/24] powerpc: Return all the valid pte ecndoing in KVM_PPC_GET_SMMU_INFO ioctl Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 11/24] powerpc: Update tlbie/tlbiel as per ISA doc Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 12/24] powerpc: print both base and actual page size on hash failure Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 13/24] powerpc: Print page size info during boot Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 14/24] powerpc: Fix hpte_decode to use the correct decoding for page sizes Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 15/24] mm/THP: HPAGE_SHIFT is not a #define on some arch Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 16/24] mm/THP: Add pmd args to pgtable deposit and withdraw APIs Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 17/24] mm/THP: withdraw the pgtable after pmdp related operations Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 18/24] powerpc/THP: Implement transparent huge pages for ppc64 Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 19/24] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 20/24] powerpc/THP: Add code to handle HPTE faults for large pages Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 21/24] powerpc: Handle huge page in perf callchain Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 22/24] powerpc/THP: hypervisor require few WIMG bit set Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 23/24] powerpc/THP: get_user_pages_fast changes Aneesh Kumar K.V
2013-02-26  8:05 ` [PATCH -V1 24/24] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130304045853.GB27523@drongo \
    --to=paulus@samba.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).