linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH V3] powerpc/thp: Serialize pmd clear against a linux page table walk.
From: Aneesh Kumar K.V @ 2015-05-11  6:26 UTC
  To: benh, paulus, mpe, kirill.shutemov, aarcange, akpm
  Cc: linux-mm, linuxppc-dev, linux-kernel, Aneesh Kumar K.V

Serialize against find_linux_pte_or_hugepte, which does a lock-less
lookup in page tables with local interrupts disabled. For huge pages
it casts pmd_t to pte_t. Since the format of pte_t is different from
pmd_t, we want to prevent a transition from a pmd pointing to a page
table to a pmd pointing to a huge page (and back) while interrupts are
disabled. We clear the pmd to possibly replace it with a page table
pointer in different code paths, so make sure we wait for any parallel
find_linux_pte_or_hugepte to finish.

Without this patch, a find_linux_pte_or_hugepte running in parallel with
__split_huge_zero_page_pmd, do_huge_pmd_wp_page_fallback or zap_huge_pmd
can run into the above issue. With __split_huge_zero_page_pmd and
do_huge_pmd_wp_page_fallback we clear the hugepage pte before inserting
the pmd entry with a regular pgtable address. Such a clear needs to
wait for the parallel find_linux_pte_or_hugepte to finish.

With zap_huge_pmd, we can run into issues when a hugepage pte
gets zapped due to MADV_DONTNEED while another CPU faults it
in as small pages.

Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Changes from V2:
* Drop the cleanup patch
  Will send this as a separate patch and not as a bug fix.
* Update commit message
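
For readers unfamiliar with why kick_all_cpus_sync() is enough here, the
following is a minimal userspace sketch of the idea, not kernel code. It
assumes the usual behaviour that kick_all_cpus_sync() sends an IPI to
every CPU and waits for all of them to handle it, and that a CPU running
with interrupts disabled cannot handle the IPI until it re-enables them.
All names (toy_cpu, toy_walk_begin, toy_kick_send, ...) are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model of one CPU: only the state relevant to IPI delivery. */
struct toy_cpu {
	bool irqs_disabled;	/* set while a lock-less walk is in progress */
	bool ipi_pending;	/* an IPI arrived but could not be handled yet */
};

static struct toy_cpu cpus[NR_CPUS];

/* Walker side: models disabling local interrupts before the lookup. */
static void toy_walk_begin(int cpu)
{
	cpus[cpu].irqs_disabled = true;
}

/* Walker side: models re-enabling interrupts; a pending IPI is handled now. */
static void toy_walk_end(int cpu)
{
	cpus[cpu].irqs_disabled = false;
	cpus[cpu].ipi_pending = false;
}

/*
 * Updater side, step 1: send an IPI to every CPU. CPUs with interrupts
 * enabled handle it immediately; the others keep it pending.
 */
static void toy_kick_send(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		cpus[i].ipi_pending = cpus[i].irqs_disabled;
}

/* Updater side, step 2: how many CPUs have not handled the IPI yet. */
static int toy_kick_outstanding(void)
{
	int n = 0;

	for (int i = 0; i < NR_CPUS; i++)
		n += cpus[i].ipi_pending;
	return n;
}

/* Small walk-through; returns 0 if the model behaves as described. */
static int toy_demo(void)
{
	toy_walk_begin(2);			/* CPU 2 starts a lock-less walk */
	toy_kick_send();			/* updater cleared the pmd, now kicks */
	assert(toy_kick_outstanding() == 1);	/* updater must keep waiting */
	toy_walk_end(2);			/* walk done, IPI gets handled */
	assert(toy_kick_outstanding() == 0);	/* safe to install the new pmd */
	return 0;
}
```

In this model the updater may only proceed once toy_kick_outstanding()
returns 0, which mirrors pmdp_get_and_clear() not returning until every
walker that could have seen the old pmd has finished.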

 arch/powerpc/mm/pgtable_64.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index b651179ac4da..1325be89e670 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -845,6 +845,17 @@ pmd_t pmdp_get_and_clear(struct mm_struct *mm,
 	 * hash fault look at them.
 	 */
 	memset(pgtable, 0, PTE_FRAG_SIZE);
+	/*
+	 * Serialize against find_linux_pte_or_hugepte which does lock-less
+	 * lookup in page tables with local interrupts disabled. For huge pages
+	 * it casts pmd_t to pte_t. Since format of pte_t is different from
+	 * pmd_t we want to prevent transit from pmd pointing to page table
+	 * to pmd pointing to huge page (and back) while interrupts are disabled.
+	 * We clear pmd to possibly replace it with page table pointer in
+	 * different code paths. So make sure we wait for the parallel
+	 * find_linux_pte_or_hugepte to finish.
+	 */
+	kick_all_cpus_sync();
 	return old_pmd;
 }
 
-- 
2.1.4


* Re: [PATCH V3] powerpc/thp: Serialize pmd clear against a linux page table walk.
From: Kirill A. Shutemov @ 2015-05-11  7:46 UTC
  To: Aneesh Kumar K.V
  Cc: aarcange, linux-kernel, linux-mm, paulus, akpm, linuxppc-dev,
	kirill.shutemov

On Mon, May 11, 2015 at 11:56:01AM +0530, Aneesh Kumar K.V wrote:
> Serialize against find_linux_pte_or_hugepte which does lock-less
> lookup in page tables with local interrupts disabled. For huge pages
> it casts pmd_t to pte_t. Since format of pte_t is different from
> pmd_t we want to prevent transit from pmd pointing to page table
> to pmd pointing to huge page (and back) while interrupts are disabled.
> We clear pmd to possibly replace it with page table pointer in
> different code paths. So make sure we wait for the parallel
> find_linux_pte_or_hugepage to finish.
> 
> Without this patch, a find_linux_pte_or_hugepte running in parallel to
> __split_huge_zero_page_pmd or do_huge_pmd_wp_page_fallback or zap_huge_pmd
> can run into the above issue. With __split_huge_zero_page_pmd and
> do_huge_pmd_wp_page_fallback we clear the hugepage pte before inserting
> the pmd entry with a regular pgtable address. Such a clear need to
> wait for the parallel find_linux_pte_or_hugepte to finish.
> 
> With zap_huge_pmd, we can run into issues, with a hugepage pte
> getting zapped due to a MADV_DONTNEED while other cpu fault it
> in as small pages.
> 
> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

CC: stable@ ?

-- 
 Kirill A. Shutemov


* Re: [PATCH V3] powerpc/thp: Serialize pmd clear against a linux page table walk.
From: Aneesh Kumar K.V @ 2015-05-11  8:54 UTC
  To: Kirill A. Shutemov
  Cc: aarcange, linux-kernel, linux-mm, paulus, akpm, linuxppc-dev,
	kirill.shutemov

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Mon, May 11, 2015 at 11:56:01AM +0530, Aneesh Kumar K.V wrote:
>> Serialize against find_linux_pte_or_hugepte which does lock-less
>> lookup in page tables with local interrupts disabled. For huge pages
>> it casts pmd_t to pte_t. Since format of pte_t is different from
>> pmd_t we want to prevent transit from pmd pointing to page table
>> to pmd pointing to huge page (and back) while interrupts are disabled.
>> We clear pmd to possibly replace it with page table pointer in
>> different code paths. So make sure we wait for the parallel
>> find_linux_pte_or_hugepage to finish.
>> 
>> Without this patch, a find_linux_pte_or_hugepte running in parallel to
>> __split_huge_zero_page_pmd or do_huge_pmd_wp_page_fallback or zap_huge_pmd
>> can run into the above issue. With __split_huge_zero_page_pmd and
>> do_huge_pmd_wp_page_fallback we clear the hugepage pte before inserting
>> the pmd entry with a regular pgtable address. Such a clear need to
>> wait for the parallel find_linux_pte_or_hugepte to finish.
>> 
>> With zap_huge_pmd, we can run into issues, with a hugepage pte
>> getting zapped due to a MADV_DONTNEED while other cpu fault it
>> in as small pages.
>> 
>> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> CC: stable@ ?

Yes. We also need to pick:

dac5657067919161eb3273ca787d8ae9814801e7
691e95fd7396905a38d98919e9c150dbc3ea21a3
7d6e7f7ffaba4e013c7a0589140431799bc17985

But that may need me to do a backport, because we have dependencies in
kvm and a cherry-pick may not work.

Will work with Michael Ellerman to find out what needs to be done.

-aneesh

