All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Zack Rusin <zackr@vmware.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Christoph Hellwig <hch@infradead.org>, Song Liu <song@kernel.org>,
	Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 01/31] mm: use pmdp_get_lockless() without surplus barrier()
Date: Wed, 24 May 2023 18:29:20 -0400	[thread overview]
Message-ID: <ZG6PwAvIO4Z7lpkq@x1n> (raw)
In-Reply-To: <34467cca-58b6-3e64-1ee7-e3dc43257a@google.com>

On Sun, May 21, 2023 at 09:49:45PM -0700, Hugh Dickins wrote:
> Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more
> reliable result with PAE (or READ_ONCE as before without PAE); and remove
> the unnecessary extra barrier()s which got left behind in its callers.

Pure question: does it mean that some of below path (missing barrier()
ones) could have problem when CONFIG_PAE, hence this can be seen as a
(potential) bug fix?

Thanks,

> 
> HOWEVER: Note the small print in linux/pgtable.h, where it was designed
> specifically for fast GUP, and depends on interrupts being disabled for
> its full guarantee: most callers which have been added (here and before)
> do NOT have interrupts disabled, so there is still some need for caution.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  fs/userfaultfd.c        | 10 +---------
>  include/linux/pgtable.h | 17 -----------------
>  mm/gup.c                |  6 +-----
>  mm/hmm.c                |  2 +-
>  mm/khugepaged.c         |  5 -----
>  mm/ksm.c                |  3 +--
>  mm/memory.c             | 14 ++------------
>  mm/mprotect.c           |  5 -----
>  mm/page_vma_mapped.c    |  2 +-
>  9 files changed, 7 insertions(+), 57 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 0fd96d6e39ce..f7a0817b1ec0 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -349,15 +349,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
>  	if (!pud_present(*pud))
>  		goto out;
>  	pmd = pmd_offset(pud, address);
> -	/*
> -	 * READ_ONCE must function as a barrier with narrower scope
> -	 * and it must be equivalent to:
> -	 *	_pmd = *pmd; barrier();
> -	 *
> -	 * This is to deal with the instability (as in
> -	 * pmd_trans_unstable) of the pmd.
> -	 */
> -	_pmd = READ_ONCE(*pmd);
> +	_pmd = pmdp_get_lockless(pmd);
>  	if (pmd_none(_pmd))
>  		goto out;
>  
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..8ec27fe69dc8 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1344,23 +1344,6 @@ static inline int pud_trans_unstable(pud_t *pud)
>  static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
>  {
>  	pmd_t pmdval = pmdp_get_lockless(pmd);
> -	/*
> -	 * The barrier will stabilize the pmdval in a register or on
> -	 * the stack so that it will stop changing under the code.
> -	 *
> -	 * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
> -	 * pmdp_get_lockless is allowed to return a not atomic pmdval
> -	 * (for example pointing to an hugepage that has never been
> -	 * mapped in the pmd). The below checks will only care about
> -	 * the low part of the pmd with 32bit PAE x86 anyway, with the
> -	 * exception of pmd_none(). So the important thing is that if
> -	 * the low part of the pmd is found null, the high part will
> -	 * be also null or the pmd_none() check below would be
> -	 * confused.
> -	 */
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	barrier();
> -#endif
>  	/*
>  	 * !pmd_present() checks for pmd migration entries
>  	 *
> diff --git a/mm/gup.c b/mm/gup.c
> index bbe416236593..3bd5d3854c51 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -653,11 +653,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
>  	struct mm_struct *mm = vma->vm_mm;
>  
>  	pmd = pmd_offset(pudp, address);
> -	/*
> -	 * The READ_ONCE() will stabilize the pmdval in a register or
> -	 * on the stack so that it will stop changing under the code.
> -	 */
> -	pmdval = READ_ONCE(*pmd);
> +	pmdval = pmdp_get_lockless(pmd);
>  	if (pmd_none(pmdval))
>  		return no_page_table(vma, flags);
>  	if (!pmd_present(pmdval))
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 6a151c09de5e..e23043345615 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -332,7 +332,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  	pmd_t pmd;
>  
>  again:
> -	pmd = READ_ONCE(*pmdp);
> +	pmd = pmdp_get_lockless(pmdp);
>  	if (pmd_none(pmd))
>  		return hmm_vma_walk_hole(start, end, -1, walk);
>  
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6b9d39d65b73..732f9ac393fc 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -961,11 +961,6 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
>  		return SCAN_PMD_NULL;
>  
>  	pmde = pmdp_get_lockless(*pmd);
> -
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
> -	barrier();
> -#endif
>  	if (pmd_none(pmde))
>  		return SCAN_PMD_NONE;
>  	if (!pmd_present(pmde))
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 0156bded3a66..df2aa281d49d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1194,8 +1194,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
>  	 * without holding anon_vma lock for write.  So when looking for a
>  	 * genuine pmde (in which to find pte), test present and !THP together.
>  	 */
> -	pmde = *pmd;
> -	barrier();
> +	pmde = pmdp_get_lockless(pmd);
>  	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
>  		goto out;
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index f69fbc251198..2eb54c0d5d3c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4925,18 +4925,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  		 * So now it's safe to run pte_offset_map().
>  		 */
>  		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> -		vmf->orig_pte = *vmf->pte;
> +		vmf->orig_pte = ptep_get_lockless(vmf->pte);
>  		vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
>  
> -		/*
> -		 * some architectures can have larger ptes than wordsize,
> -		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
> -		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
> -		 * accesses.  The code below just needs a consistent view
> -		 * for the ifs and we later double check anyway with the
> -		 * ptl lock held. So here a barrier will do.
> -		 */
> -		barrier();
>  		if (pte_none(vmf->orig_pte)) {
>  			pte_unmap(vmf->pte);
>  			vmf->pte = NULL;
> @@ -5060,9 +5051,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  		if (!(ret & VM_FAULT_FALLBACK))
>  			return ret;
>  	} else {
> -		vmf.orig_pmd = *vmf.pmd;
> +		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
>  
> -		barrier();
>  		if (unlikely(is_swap_pmd(vmf.orig_pmd))) {
>  			VM_BUG_ON(thp_migration_supported() &&
>  					  !is_pmd_migration_entry(vmf.orig_pmd));
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 92d3d3ca390a..c5a13c0f1017 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -309,11 +309,6 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
>  {
>  	pmd_t pmdval = pmdp_get_lockless(pmd);
>  
> -	/* See pmd_none_or_trans_huge_or_clear_bad for info on barrier */
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	barrier();
> -#endif
> -
>  	if (pmd_none(pmdval))
>  		return 1;
>  	if (pmd_trans_huge(pmdval))
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 4e448cfbc6ef..64aff6718bdb 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -210,7 +210,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  		 * compiler and used as a stale value after we've observed a
>  		 * subsequent update.
>  		 */
> -		pmde = READ_ONCE(*pvmw->pmd);
> +		pmde = pmdp_get_lockless(pvmw->pmd);
>  
>  		if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
>  		    (pmd_present(pmde) && pmd_devmap(pmde))) {
> -- 
> 2.35.3
> 

-- 
Peter Xu



  reply	other threads:[~2023-05-24 22:29 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-22  4:46 [PATCH 00/31] mm: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-22  4:49 ` [PATCH 01/31] mm: use pmdp_get_lockless() without surplus barrier() Hugh Dickins
2023-05-24 22:29   ` Peter Xu [this message]
2023-05-25 22:35     ` Hugh Dickins
2023-05-26 16:48       ` Peter Xu
2023-06-02  2:31         ` Hugh Dickins
2023-05-24 22:54   ` Yu Zhao
2023-05-22  4:51 ` [PATCH 02/31] mm/migrate: remove cruft from migration_entry_wait()s Hugh Dickins
2023-05-23  1:45   ` Alistair Popple
2023-05-24  1:57     ` Hugh Dickins
2023-05-22  4:52 ` [PATCH 03/31] mm/pgtable: kmap_local_page() instead of kmap_atomic() Hugh Dickins
2023-05-26 22:22   ` Peter Xu
2023-05-26 22:42     ` Peter Xu
2023-05-22  4:53 ` [PATCH 04/31] mm/pgtable: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-22 11:17   ` Qi Zheng
2023-05-24  2:22     ` Hugh Dickins
2023-05-24  3:11       ` Qi Zheng
2023-07-05 14:48   ` Aneesh Kumar K.V
2023-07-05 22:26     ` Hugh Dickins
2023-05-22  4:54 ` [PATCH 05/31] mm/filemap: allow pte_offset_map_lock() " Hugh Dickins
2023-05-22 11:23   ` Qi Zheng
2023-05-24  2:35     ` Hugh Dickins
2023-05-24  3:14       ` Qi Zheng
2023-05-22  4:55 ` [PATCH 06/31] mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk() Hugh Dickins
2023-05-22  4:57 ` [PATCH 07/31] mm/page_vma_mapped: reformat map_pte() with less indentation Hugh Dickins
2023-05-22  4:58 ` [PATCH 08/31] mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr() Hugh Dickins
2023-05-22 11:41   ` Qi Zheng
2023-05-24  2:44     ` Hugh Dickins
2023-05-22  5:00 ` [PATCH 09/31] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails Hugh Dickins
2023-05-23 18:07   ` SeongJae Park
2023-05-22  5:01 ` [PATCH 10/31] mm/pagewalk: walk_pte_range() allow for pte_offset_map() Hugh Dickins
2023-05-22  5:03 ` [PATCH 11/31] mm/vmwgfx: simplify pmd & pud mapping dirty helpers Hugh Dickins
2023-05-22  5:04 ` [PATCH 12/31] mm/vmalloc: vmalloc_to_page() use pte_offset_kernel() Hugh Dickins
2023-05-22  7:27   ` Lorenzo Stoakes
2023-05-22  5:05 ` [PATCH 13/31] mm/hmm: retry if pte_offset_map() fails Hugh Dickins
2023-05-22 12:11   ` Qi Zheng
2023-05-23  2:39     ` Alistair Popple
2023-05-23  6:06       ` Qi Zheng
2023-05-24  2:50         ` Hugh Dickins
2023-05-24  5:16           ` Alistair Popple
2023-05-22  5:06 ` [PATCH 14/31] fs/userfaultfd: " Hugh Dickins
2023-05-24 22:31   ` Peter Xu
2023-05-22  5:07 ` [PATCH 15/31] mm/userfaultfd: allow pte_offset_map_lock() to fail Hugh Dickins
2023-05-24 22:44   ` Peter Xu
2023-05-25 22:06     ` Hugh Dickins
2023-05-26 16:25       ` Peter Xu
2023-05-22  5:08 ` [PATCH 16/31] mm/debug_vm_pgtable,page_table_check: warn pte map fails Hugh Dickins
2023-05-22  5:10 ` [PATCH 17/31] mm/various: give up if pte_offset_map[_lock]() fails Hugh Dickins
2023-05-22 12:24   ` Qi Zheng
2023-05-22 12:37     ` Qi Zheng
2023-05-24  3:20       ` Hugh Dickins
2023-05-22  5:12 ` [PATCH 18/31] mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge() Hugh Dickins
2023-05-22  5:13 ` [PATCH 19/31] mm/mremap: retry if either pte_offset_map_*lock() fails Hugh Dickins
2023-05-22  5:15 ` [PATCH 20/31] mm/madvise: clean up pte_offset_map_lock() scans Hugh Dickins
2023-05-22  5:17 ` [PATCH 21/31] mm/madvise: clean up force_shm_swapin_readahead() Hugh Dickins
2023-05-22  5:18 ` [PATCH 22/31] mm/swapoff: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-22  5:19 ` [PATCH 23/31] mm/mglru: allow pte_offset_map_nolock() " Hugh Dickins
2023-05-22  5:26   ` Yu Zhao
2023-05-22  5:20 ` [PATCH 24/31] mm/migrate_device: allow pte_offset_map_lock() " Hugh Dickins
2023-05-23  2:23   ` Alistair Popple
2023-05-24  3:45     ` Hugh Dickins
2023-05-24  5:11       ` Alistair Popple
2023-05-22  5:22 ` [PATCH 25/31] mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable() Hugh Dickins
2023-05-23  2:26   ` Yang Shi
2023-05-23  2:44     ` Yang Shi
2023-05-24  4:26       ` Hugh Dickins
2023-05-24 22:45         ` Yang Shi
2023-05-25 21:16           ` Hugh Dickins
2023-05-25 22:33             ` Yang Shi
2023-05-22  5:23 ` [PATCH 26/31] mm/huge_memory: split huge pmd under one pte_offset_map() Hugh Dickins
2023-05-22 23:35   ` Yang Shi
2023-05-22  5:24 ` [PATCH 27/31] mm/khugepaged: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-05-22 23:54   ` Yang Shi
2023-05-24  4:44     ` Hugh Dickins
2023-05-24 21:59       ` Yang Shi
2023-05-22  5:25 ` [PATCH 28/31] mm/memory: " Hugh Dickins
2023-05-22  5:26 ` [PATCH 29/31] mm/memory: handle_pte_fault() use pte_offset_map_nolock() Hugh Dickins
2023-05-22 12:52   ` Qi Zheng
2023-05-24  4:54     ` Hugh Dickins
2023-05-22  5:27 ` [PATCH 30/31] mm/pgtable: delete pmd_trans_unstable() and friends Hugh Dickins
2023-05-22  5:29 ` [PATCH 31/31] perf/core: Allow pte_offset_map() to fail Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZG6PwAvIO4Z7lpkq@x1n \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=david@redhat.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=ira.weiny@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rppt@kernel.org \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=song@kernel.org \
    --cc=steven.price@arm.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    --cc=zackr@vmware.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.