From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
Date: Fri, 22 Feb 2013 22:50:49 +0530
Message-ID: <87y5eguuxa.fsf@linux.vnet.ibm.com>
In-Reply-To: <20130222052351.GE6139@drongo>

Paul Mackerras <paulus@samba.org> writes:

I will reply to the other parts in a separate email, but on the point below:

>> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table)
>> +{
>> +	struct page *page;
>> +	struct mm_struct *mm;
>> +	unsigned int bit, mask;
>> +
>> +	mm = tlb->mm;
>> +	/* Free 2K page table fragment of a 64K page */
>> +	page = virt_to_page(table);
>> +	bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> +	spin_lock(&mm->page_table_lock);
>> +	/*
>> +	 * stash the actual mask in higher half, and clear the lower half
>> +	 * and selectively, add remove from pgtable list
>> +	 */
>> +	mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
>> +	if (!(mask & FRAG_MASK))
>> +		list_del(&page->lru);
>> +	else {
>> +		/*
>> +		 * Add the page table page to pgtable_list so that
>> +		 * the free fragment can be used by the next alloc
>> +		 */
>> +		list_del_init(&page->lru);
>> +		list_add_tail(&page->lru, &mm->context.pgtable_list);
>> +	}
>> +	spin_unlock(&mm->page_table_lock);
>> +	tlb_remove_table(tlb, table);
>> +}
>
> This looks like you're allowing a fragment that is being freed to be
> reallocated and used again during the grace period when we are waiting
> for any references to the fragment to disappear.  Doesn't that allow a
> race where one CPU traversing the page table and using the fragment in
> its old location in the tree could see a PTE created after the
> fragment was reallocated?  In other words, why is it safe to allow the
> fragment to be used during the grace period?  If it is safe, it at
> least needs a comment explaining why.
>

We don't allow it to be reallocated during the grace period. The trick
is in the following lines of page_table_alloc():

		/*
		 * Update with the higher order mask bits accumulated,
		 * added as a part of rcu free.
		 */
		mask = mask | (mask >> FRAG_MASK_BITS);

When checking the mask for a free fragment, we also look at the
higher-order bits, so a fragment whose bit is still set in the upper
half is treated as in use.

We add the page back to &mm->context.pgtable_list in
page_table_free_rcu() because doing so requires access to struct
mm_struct, which we don't have in the RCU callback. So we add the page
early and make sure its freed fragments are not reallocated until the
grace period is over.
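
To make the mask handling concrete, here is a minimal user-space sketch of
the idea. It is not the patch code: the value of FRAG_MASK_BITS, the plain
(non-atomic) mapcount variable, and the helper names frag_alloc(),
frag_free_rcu() and frag_rcu_done() are all assumptions made for
illustration, and locking and list handling are left out.

```c
/* Illustrative value only; the real constant lives in the patch. */
#define FRAG_MASK_BITS 16u
#define FRAG_MASK ((1u << FRAG_MASK_BITS) - 1u)

/*
 * Stand-in for page->_mapcount. Lower half: fragments in use.
 * Upper half: fragments freed but still waiting for the grace period.
 */
static unsigned int mapcount;

/* Hypothetical allocator: returns a fragment index, or -1 if none is free. */
static int frag_alloc(void)
{
	unsigned int mask = mapcount;
	unsigned int i;

	/* Fold the pending-free bits into the in-use bits before searching. */
	mask = mask | (mask >> FRAG_MASK_BITS);
	for (i = 0; i < FRAG_MASK_BITS; i++) {
		if (!(mask & (1u << i))) {
			mapcount |= 1u << i;
			return (int)i;
		}
	}
	return -1;
}

/* Hypothetical free path: clear the lower bit, stash it in the upper half. */
static void frag_free_rcu(int idx)
{
	unsigned int bit = 1u << idx;

	mapcount ^= bit | (bit << FRAG_MASK_BITS);
}

/* Hypothetical RCU callback: the grace period is over, drop the upper bit. */
static void frag_rcu_done(int idx)
{
	mapcount ^= (1u << idx) << FRAG_MASK_BITS;
}
```

With this sketch, two allocations return fragments 0 and 1. After
frag_free_rcu(0), the next allocation returns 2 rather than 0, because the
upper-half bit still marks fragment 0 as busy. Only after frag_rcu_done(0)
is fragment 0 handed out again.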

I will definitely add more comments around the code to clarify these details.

-aneesh

