linux-mm.kvack.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org,  akpm@linux-foundation.org,
	mpe@ellerman.id.au,  linuxppc-dev@lists.ozlabs.org,
	kaleshsingh@google.com, npiggin@gmail.com,
	 joel@joelfernandes.org,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	 "Kirill A . Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH v7 01/11] mm/mremap: Fix race between MOVE_PMD mremap and pageout
Date: Tue, 8 Jun 2021 13:39:36 -0700 (PDT)
Message-ID: <295f37a-655f-81fb-7935-be652b8c655@google.com>
In-Reply-To: <87o8cgokso.fsf@linux.ibm.com>

On Tue, 8 Jun 2021, Aneesh Kumar K.V wrote:
> 
>     mm/mremap: hold the rmap lock in write mode when moving page table entries.
>     
>     To avoid a race between rmap walk and mremap, mremap does take_rmap_locks().
>     The lock is taken to ensure that the rmap walk doesn't miss a page table entry
>     due to PTE moves via move_page_tables(). The kernel further optimizes this
>     locking: if we are going to find the newly added vma after the old vma,
>     the rmap lock is not taken. This is because the rmap walk would find the
>     vmas in the same order, and if we don't find the page table attached to the
>     older vma we would find it with the new vma, which we would iterate later.
>     The actual lifetime of the page is still controlled by the PTE lock.
>     
>     This patch updates the locking requirement to handle another race condition
>     explained below with optimized mremap::
>     
>     Optimized PMD move:
>     
>         CPU 1                           CPU 2                                   CPU 3
>     
>         mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one
>     
>         mmap_write_lock_killable()
>     
>                                         addr = old_addr
>                                         lock(pte_ptl)
>         lock(pmd_ptl)
>         pmd = *old_pmd
>         pmd_clear(old_pmd)
>         flush_tlb_range(old_addr)
>     
>         *new_pmd = pmd
>                                                                                 *new_addr = 10; and fills
>                                                                                 TLB with new addr
>                                                                                 and old pfn
>     
>         unlock(pmd_ptl)
>                                         ptep_clear_flush()
>                                         old pfn is free.
>                                                                                 Stale TLB entry
>     

The PUD example below is mainly a waste of space and time:
"Optimized PUD move suffers from a similar race." would be better.

>     Optimized PUD move:
>     
>         CPU 1                           CPU 2                                   CPU 3
>     
>         mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one
>     
>         mmap_write_lock_killable()
>     
>                                         addr = old_addr
>                                         lock(pte_ptl)
>         lock(pud_ptl)
>         pud = *old_pud
>         pud_clear(old_pud)
>         flush_tlb_range(old_addr)
>     
>         *new_pud = pud
>                                                                                 *new_addr = 10; and fills
>                                                                                 TLB with new addr
>                                                                                 and old pfn
>     
>         unlock(pud_ptl)
>                                         ptep_clear_flush()
>                                         old pfn is free.
>                                                                                 Stale TLB entry
>     
>     Both of the above race conditions can be fixed if we force the mremap path to take the rmap lock.
>     

Don't forget the Fixes and Link you had in the previous version:
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com

>     Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Thanks, this is orders of magnitude better!
Acked-by: Hugh Dickins <hughd@google.com>

> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 9cd352fb9cf8..f12df630fb37 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -517,7 +517,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
>  
>  			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
> -					   old_pud, new_pud, need_rmap_locks))
> +					   old_pud, new_pud, true))
>  				continue;
>  		}
>  
> @@ -544,7 +544,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  			 * moving at the PMD level if possible.
>  			 */
>  			if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
> -					   old_pmd, new_pmd, need_rmap_locks))
> +					   old_pmd, new_pmd, true))
>  				continue;
>  		}
>  
> 

