From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
Date: Sat, 23 Feb 2013 22:08:10 +0530 [thread overview]
Message-ID: <8738wnc7f1.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130222052351.GE6139@drongo>
Paul Mackerras <paulus@samba.org> writes:
> On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> We now have PTE page consuming only 2K of the 64K page.This is in order to
>
> In fact the PTE page together with the hash table indexes occupies 4k,
> doesn't it? The comments in the code are similarly confusing since
> they talk about 2k but actually allocate 4k.
>
>> facilitate transparent huge page support, which works much better if our PMDs
>> cover 16MB instead of 256MB.
>>
>> Inorder to reduce the wastage, we now have multiple PTE page fragment
> ^ In order (two words)
>
>> from the same PTE page.
>
> A patch like this needs a more complete description and explanation
> than you have given. For instance, you could mention that the code
> that you're adding for the 32-bit and non-64k cases are just copies of
> the previously generic code from pgalloc.h (actually, this movement
> might be something that could be split out as a separate patch).
> Also, you should describe in outline how you keep a list of pages that
> aren't fully allocated and have a bitmap of which 4k sections are in
> use, and also how your scheme interacts with RCU.
will do
>
> [snip]
>
>> +#ifdef CONFIG_PPC_64K_PAGES
>> +/*
>> + * we support 15 fragments per PTE page. This is limited by how many
>
> Why only 15? Don't we get 16 fragments per page?
>
That was one of the details I wanted to come back and closely look at
before posting. But missed that in the excitement of getting this
all working :). ._mapcount is a signed value and hence I was not sure
whether setting the top bit have any impact on how we deal with the page
in other part of VM.
>> + * bits we can pack in page->_mapcount. We use the first half for
>> + * tracking the usage for rcu page table free.
>
> What does "first" mean? The high half or the low half?
>
high half.
>> +unsigned long *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr)
>> +{
>> + struct page *page;
>> + unsigned int mask, bit;
>> + unsigned long *table;
>> +
>> + /* Allocate fragments of a 4K page as 1K/2K page table */
>
> A 4k page? Do you mean a 64k page? And what is 1K to do with
> anything?
>
That should be completely dropped, Cut-paste from s390 code :)
>> +#ifdef CONFIG_SMP
>> +static void __page_table_free_rcu(void *table)
>> +{
>> + unsigned int bit;
>> + struct page *page;
>> + /*
>> + * this is a PTE page free 2K page table
>> + * fragment of a 64K page.
>> + */
>> + page = virt_to_page(table);
>> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> + bit <<= FRAG_MASK_BITS;
>> + /*
>> + * clear the higher half and if nobody used the page in
>> + * between, even lower half would be zero.
>> + */
>> + if (atomic_xor_bits(&page->_mapcount, bit) == 0) {
>> + pgtable_page_dtor(page);
>> + atomic_set(&page->_mapcount, -1);
>> + __free_page(page);
>> + }
>> +}
>> +
>> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table)
>> +{
>> + struct page *page;
>> + struct mm_struct *mm;
>> + unsigned int bit, mask;
>> +
>> + mm = tlb->mm;
>> + /* Free 2K page table fragment of a 64K page */
>> + page = virt_to_page(table);
>> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> + spin_lock(&mm->page_table_lock);
>> + /*
>> + * stash the actual mask in higher half, and clear the lower half
>> + * and selectively, add remove from pgtable list
>> + */
>> + mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
>> + if (!(mask & FRAG_MASK))
>> + list_del(&page->lru);
>> + else {
>> + /*
>> + * Add the page table page to pgtable_list so that
>> + * the free fragment can be used by the next alloc
>> + */
>> + list_del_init(&page->lru);
>> + list_add_tail(&page->lru, &mm->context.pgtable_list);
>> + }
>> + spin_unlock(&mm->page_table_lock);
>> + tlb_remove_table(tlb, table);
>> +}
-aneesh
next prev parent reply other threads:[~2013-02-23 16:38 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-21 16:47 [RFC PATCH -V2 00/21] THP support for PPC64 Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 01/21] powerpc: Use signed formatting when printing error Aneesh Kumar K.V
2013-02-22 5:00 ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 02/21] powerpc: Save DAR and DSISR in pt_regs on MCE Aneesh Kumar K.V
2013-02-22 5:03 ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 03/21] powerpc: Don't hard code the size of pte page Aneesh Kumar K.V
2013-02-22 5:06 ` Paul Mackerras
2013-02-23 16:17 ` Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 04/21] powerpc: Reduce the PTE_INDEX_SIZE Aneesh Kumar K.V
2013-02-22 5:07 ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage Aneesh Kumar K.V
2013-02-22 0:32 ` David Gibson
2013-02-22 5:14 ` Aneesh Kumar K.V
2013-02-22 5:23 ` Paul Mackerras
2013-02-22 17:20 ` Aneesh Kumar K.V
2013-02-23 16:38 ` Aneesh Kumar K.V [this message]
2013-02-21 16:47 ` [RFC PATCH -V2 06/21] powerpc: Add size argument to pgtable_cache_add Aneesh Kumar K.V
2013-02-22 5:27 ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 07/21] powerpc: Use encode avpn where we need only avpn values Aneesh Kumar K.V
2013-02-22 5:28 ` Paul Mackerras
2013-02-21 16:47 ` [RFC PATCH -V2 08/21] powerpc: Decode the pte-lp-encoding bits correctly Aneesh Kumar K.V
2013-02-22 5:37 ` Paul Mackerras
2013-02-24 16:51 ` Aneesh Kumar K.V
2013-02-24 17:45 ` Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 09/21] powerpc: Update tlbie/tlbiel as per ISA doc Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 10/21] powerpc: print both base and actual page size on hash failure Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 11/21] powerpc: Print page size info during boot Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 12/21] powerpc: Fix hpte_decode to use the correct decoding for page sizes Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 13/21] mm/THP: HPAGE_SHIFT is not a #define on some arch Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 14/21] mm/THP: Add pmd args to pgtable deposit and withdraw APIs Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 15/21] mm/THP: support for zerout withdraw Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 16/21] powerpc/THP: Implement transparent huge pages for ppc64 Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 17/21] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 18/21] powerpc/THP: Add code to handle HPTE faults for large pages Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 19/21] powerpc/THP: hypervisor require few WIMG bit set Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 20/21] powerpc/THP: get_user_pages_fast changes Aneesh Kumar K.V
2013-02-21 16:47 ` [RFC PATCH -V2 21/21] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-03-21 8:17 ` [RFC PATCH -V2 00/21] THP support for PPC64 Simon Jeons
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8738wnc7f1.fsf@linux.vnet.ibm.com \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).