From: Balbir Singh <bsingharora@gmail.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
	akpm@linux-foundation.org,
	Mel Gorman <mgorman@techsingularity.net>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] powerpc/mm: Fix Multi hit ERAT cause by recent THP update
Date: Mon, 15 Feb 2016 13:44:38 +1100
Message-ID: <1455504278.16012.18.camel@gmail.com>
In-Reply-To: <1454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, 2016-02-09 at 06:50 +0530, Aneesh Kumar K.V wrote:
> 
> Also make sure we wait for irq-disabled sections on other cpus to finish
> before replacing a huge pte entry with a regular pmd entry. Code paths
> like find_linux_pte_or_hugepte depend on irq disabling to get
> a stable pte_t pointer. A parallel thp split needs to make sure we
> don't convert a pmd pte to a regular pmd entry without waiting for the
> irq-disabled section to finish.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  4 ++++
>  arch/powerpc/mm/pgtable_64.c                 | 35 +++++++++++++++++++++++++++-
>  include/asm-generic/pgtable.h                |  8 +++++++
>  mm/huge_memory.c                             |  1 +
>  4 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 8d1c41d28318..ac07a30a7934 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -281,6 +281,10 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
>  extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  			    pmd_t *pmdp);
>  
> +#define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
> +extern void pmdp_huge_split_prepare(struct vm_area_struct *vma,
> +				    unsigned long address, pmd_t *pmdp);
> +
>  #define pmd_move_must_withdraw pmd_move_must_withdraw
>  struct spinlock;
>  static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 3124a20d0fab..c8a00da39969 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -646,6 +646,30 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>  	return pgtable;
>  }
>  
> +void pmdp_huge_split_prepare(struct vm_area_struct *vma,
> +			     unsigned long address, pmd_t *pmdp)
> +{
> +	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> +
> +#ifdef CONFIG_DEBUG_VM
> +	BUG_ON(REGION_ID(address) != USER_REGION_ID);
> +#endif
> +	/*
> +	 * We can't mark the pmd none here, because that will cause a race
> +	 * against exit_mmap. We need to continue to mark the pmd TRANS HUGE
> +	 * while we split, but at the same time we want the rest of the ppc64
> +	 * code not to insert a hash pte on this, because we will be modifying
> +	 * the deposited pgtable in the caller of this function. Hence
> +	 * clear _PAGE_USER so that we move the fault handling to a
> +	 * higher level function, and that will serialize against the ptl.
> +	 * We need to flush existing hash pte entries here even though
> +	 * the translation is still valid, because we will withdraw
> +	 * pgtable_t after this.
> +	 */
> +	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_USER, 0);

Can this break any checks for _PAGE_USER in other paths?
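A toy model of what the quoted pmd_hugepage_update(..., _PAGE_USER, 0) call is meant to achieve, as described in the patch comment: the entry still looks like a present, trans-huge pmd to generic code, but a user access can no longer be satisfied by the hash fast path, so the fault falls through to the ptl-serialized handler. The flag values, the model_* helpers and the "all requested bits present" fast-path test are hypothetical simplifications, not the real ppc64 bit layout or hash code; the sketch does not answer which other paths test _PAGE_USER.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real ppc64
 * book3s bit assignments differ. */
#define MODEL_PAGE_PRESENT   0x1UL
#define MODEL_PAGE_USER      0x2UL
#define MODEL_PAGE_RW        0x4UL
#define MODEL_PAGE_THP_HUGE  0x8UL

typedef uint64_t model_pmd_t;

/* Models the clear/set contract of pmd_hugepage_update(): the kernel
 * does this atomically and flushes hash entries; here it is a plain
 * read-modify-write. Returns the old value. */
static model_pmd_t model_pmd_update(model_pmd_t *pmdp, uint64_t clr,
				    uint64_t set)
{
	model_pmd_t old = *pmdp;

	*pmdp = (old & ~clr) | set;
	return old;
}

/* Hypothetical stand-in for the hash fault fast path: it inserts a
 * hash PTE only if every requested access bit is set in the pmd. */
static bool model_hash_fast_path_ok(model_pmd_t pmd, uint64_t access)
{
	return (access & ~pmd) == 0;
}

/* Generic code still treats the entry as a huge pmd mapping as long
 * as the present and huge bits survive. */
static bool model_pmd_trans_huge(model_pmd_t pmd)
{
	const uint64_t need = MODEL_PAGE_PRESENT | MODEL_PAGE_THP_HUGE;

	return (pmd & need) == need;
}
```

Calling the same helper with clr = ~0UL, as the new pmdp_invalidate() does, leaves a zero entry that no longer looks like a huge pmd at all.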

> +}
> +
> +
>  /*
>   * set a new huge pmd. We should not be called for updating
>   * an existing pmd entry. That should go via pmd_hugepage_update.
> @@ -663,10 +687,19 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>  	return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
>  }
>  
> +/*
> + * We use this to invalidate a pmdp entry before switching from a
> + * hugepte to regular pmd entry.
> + */
>  void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  		     pmd_t *pmdp)
>  {
> -	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
> +	pmd_hugepage_update(vma->vm_mm, address, pmdp, ~0UL, 0);
> +	/*
> +	 * This ensures that generic code that relies on IRQ disabling
> +	 * to prevent a parallel THP split works as expected.
> +	 */
> +	kick_all_cpus_sync();
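A toy, single-threaded model of why a synchronous kick such as kick_all_cpus_sync() doubles as a wait for irq-disabled sections: an IPI handler cannot run on a CPU while it has interrupts disabled, so the caller cannot finish waiting until every CPU that was in a lockless walk (for example inside find_linux_pte_or_hugepte) has re-enabled interrupts. The struct, the model_* helpers and the 4-CPU count are hypothetical; real IPI delivery and memory ordering are not modelled.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Toy per-CPU state; not kernel code. */
struct model_cpu {
	bool irqs_disabled;	/* inside a lockless page-table walk */
	bool ipi_pending;	/* IPI sent, handler not yet run */
};

/* An IPI handler can only run on a CPU with interrupts enabled. */
static void model_try_deliver(struct model_cpu *cpu)
{
	if (!cpu->irqs_disabled && cpu->ipi_pending)
		cpu->ipi_pending = false;	/* the empty handler ran */
}

static void model_local_irq_disable(struct model_cpu *cpu)
{
	cpu->irqs_disabled = true;
}

static void model_local_irq_enable(struct model_cpu *cpu)
{
	cpu->irqs_disabled = false;
	model_try_deliver(cpu);		/* a pending IPI fires on re-enable */
}

/* Writer side: post an IPI to every other CPU; CPUs with interrupts
 * enabled run the handler immediately. */
static void model_kick_others(struct model_cpu cpus[], int self)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		if (i == self)
			continue;
		cpus[i].ipi_pending = true;
		model_try_deliver(&cpus[i]);
	}
}

/* A synchronous kick may only return once this is true. */
static bool model_all_kicked(const struct model_cpu cpus[])
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		if (cpus[i].ipi_pending)
			return false;
	return true;
}
```

In the scenario the patch cares about, CPU 1 is mid-walk with interrupts off when CPU 0 invalidates the pmd; the wait stays incomplete until CPU 1 leaves its walk, after which it can no longer hold a stale pte_t pointer.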

This seems expensive. I think the right thing is to do something like the
following, or a wrapper for it:

on_each_cpu_mask(mm_cpumask(vma->vm_mm), do_nothing, NULL, 1);

do_nothing is not exported, but that can be fixed :)
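To put a rough number on "expensive": a toy count of remote CPUs interrupted under each approach, assuming cost scales with the number of IPIs sent. kick_all_cpus_sync() targets every other online CPU; on_each_cpu_mask() with mm_cpumask() targets only CPUs in the mm's mask (running the function locally if the calling CPU is in the mask). The 64-bit masks and model_* helpers are hypothetical, and the model says nothing about whether mm_cpumask() covers every CPU that might walk this mm's page tables locklessly, which is what correctness depends on.

```c
#include <stdint.h>

/* Toy model: bit i set means CPU i is in the mask. */
typedef uint64_t model_cpumask_t;

static int model_popcount(model_cpumask_t m)
{
	int n = 0;

	while (m) {
		m &= m - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Remote CPUs interrupted by a broadcast such as kick_all_cpus_sync():
 * every online CPU except the caller. */
static int model_broadcast_ipis(model_cpumask_t online, int self)
{
	return model_popcount(online & ~(1ULL << self));
}

/* Remote CPUs interrupted when targeting only the mm's cpumask. The
 * local CPU runs the callback directly, so it is not counted. */
static int model_mm_targeted_ipis(model_cpumask_t online,
				  model_cpumask_t mm_mask, int self)
{
	return model_popcount(online & mm_mask & ~(1ULL << self));
}
```

For example, with 8 CPUs online and an mm that has only run on CPUs 0 and 2, a split on CPU 0 interrupts 7 remote CPUs with the broadcast but only 1 with the targeted call.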

Balbir Singh
