* [PATCH V2 1/7] m68k/mm: Change pmd_val()
From: Anshuman Khandual @ 2024-09-17 7:31 UTC
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Geert Uytterhoeven, Guo Ren
This changes the platform's pmd_val() to access the pmd_t element directly,
like other architectures, rather than via the current pointer-address-based
dereferencing, which prevents the transition to pmdp_get().
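
To make the constraint concrete, here is a minimal standalone sketch (not
kernel code; pmdp_get() is modelled as a plain function returning the entry
by value) showing why the old macro form cannot accept such a return value:

typedef struct { unsigned long pmd; } pmd_t;

static pmd_t pmdp_get(pmd_t *pmdp) { return *pmdp; }

#define pmd_val_old(x) ((&x)->pmd)	/* requires an lvalue */
#define pmd_val_new(x) ((x).pmd)	/* works on any expression */

unsigned long demo(pmd_t *pmdp)
{
	/*
	 * pmd_val_old(pmdp_get(pmdp)) would expand to
	 * (&pmdp_get(pmdp))->pmd and fail to build with
	 * "error: lvalue required as unary '&' operand".
	 */
	return pmd_val_new(pmdp_get(pmdp));	/* builds fine */
}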
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
arch/m68k/include/asm/page.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/m68k/include/asm/page.h b/arch/m68k/include/asm/page.h
index 8cfb84b49975..be3f2c2a656c 100644
--- a/arch/m68k/include/asm/page.h
+++ b/arch/m68k/include/asm/page.h
@@ -19,7 +19,7 @@
*/
#if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
typedef struct { unsigned long pmd; } pmd_t;
-#define pmd_val(x) ((&x)->pmd)
+#define pmd_val(x) ((x).pmd)
#define __pmd(x) ((pmd_t) { (x) } )
#endif
--
2.25.1
* Re: [PATCH V2 1/7] m68k/mm: Change pmd_val()
From: Ryan Roberts @ 2024-09-17 8:40 UTC
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, David Hildenbrand, Mike Rapoport (IBM),
Arnd Bergmann, x86, linux-m68k, linux-fsdevel, kasan-dev,
linux-kernel, linux-perf-users, Geert Uytterhoeven, Guo Ren
On 17/09/2024 08:31, Anshuman Khandual wrote:
> This changes the platform's pmd_val() to access the pmd_t element directly,
> like other architectures, rather than via the current pointer-address-based
> dereferencing, which prevents the transition to pmdp_get().
>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
I know very little about m68k, but for what it's worth:
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> arch/m68k/include/asm/page.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/m68k/include/asm/page.h b/arch/m68k/include/asm/page.h
> index 8cfb84b49975..be3f2c2a656c 100644
> --- a/arch/m68k/include/asm/page.h
> +++ b/arch/m68k/include/asm/page.h
> @@ -19,7 +19,7 @@
> */
> #if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
> typedef struct { unsigned long pmd; } pmd_t;
> -#define pmd_val(x) ((&x)->pmd)
> +#define pmd_val(x) ((x).pmd)
> #define __pmd(x) ((pmd_t) { (x) } )
> #endif
>
* Re: [PATCH V2 1/7] m68k/mm: Change pmd_val()
From: David Hildenbrand @ 2024-09-17 10:20 UTC
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Geert Uytterhoeven, Guo Ren, Peter Zijlstra
On 17.09.24 09:31, Anshuman Khandual wrote:
> This changes the platform's pmd_val() to access the pmd_t element directly,
> like other architectures, rather than via the current pointer-address-based
> dereferencing, which prevents the transition to pmdp_get().
>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> arch/m68k/include/asm/page.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/m68k/include/asm/page.h b/arch/m68k/include/asm/page.h
> index 8cfb84b49975..be3f2c2a656c 100644
> --- a/arch/m68k/include/asm/page.h
> +++ b/arch/m68k/include/asm/page.h
> @@ -19,7 +19,7 @@
> */
> #if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
> typedef struct { unsigned long pmd; } pmd_t;
> -#define pmd_val(x) ((&x)->pmd)
> +#define pmd_val(x) ((x).pmd)
> #define __pmd(x) ((pmd_t) { (x) } )
> #endif
>
Trying to understand what's happening here, I stumbled over
commit ef22d8abd876e805b604e8f655127de2beee2869
Author: Peter Zijlstra <peterz@infradead.org>
Date: Fri Jan 31 13:45:36 2020 +0100
m68k: mm: Restructure Motorola MMU page-table layout
The Motorola 68xxx MMUs, 040 (and later) have a fixed 7,7,{5,6}
page-table setup, where the last depends on the page-size selected (8k
vs 4k resp.), and head.S selects 4K pages. For 030 (and earlier) we
explicitly program 7,7,6 and 4K pages in %tc.
However, the current code implements this mightily weird. What it does
is group 16 of those (6 bit) pte tables into one 4k page to not waste
space. The down-side is that that forces pmd_t to be a 16-tuple
pointing to consecutive pte tables.
This breaks the generic code which assumes READ_ONCE(*pmd) will be
word sized.
Where we did
#if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
-typedef struct { unsigned long pmd[16]; } pmd_t;
-#define pmd_val(x) ((&x)->pmd[0])
-#define __pmd(x) ((pmd_t) { { (x) }, })
+typedef struct { unsigned long pmd; } pmd_t;
+#define pmd_val(x) ((&x)->pmd)
+#define __pmd(x) ((pmd_t) { (x) } )
#endif
So I assume this should be fine
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH V2 1/7] m68k/mm: Change pmd_val()
From: Ryan Roberts @ 2024-09-17 10:27 UTC
To: David Hildenbrand, Anshuman Khandual, linux-mm
Cc: Andrew Morton, Mike Rapoport (IBM), Arnd Bergmann, x86,
linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Geert Uytterhoeven, Guo Ren, Peter Zijlstra
On 17/09/2024 11:20, David Hildenbrand wrote:
> On 17.09.24 09:31, Anshuman Khandual wrote:
>> This changes the platform's pmd_val() to access the pmd_t element directly,
>> like other architectures, rather than via the current pointer-address-based
>> dereferencing, which prevents the transition to pmdp_get().
>>
>> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
>> Cc: Guo Ren <guoren@kernel.org>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: linux-m68k@lists.linux-m68k.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> arch/m68k/include/asm/page.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/m68k/include/asm/page.h b/arch/m68k/include/asm/page.h
>> index 8cfb84b49975..be3f2c2a656c 100644
>> --- a/arch/m68k/include/asm/page.h
>> +++ b/arch/m68k/include/asm/page.h
>> @@ -19,7 +19,7 @@
>> */
>> #if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
>> typedef struct { unsigned long pmd; } pmd_t;
>> -#define pmd_val(x) ((&x)->pmd)
>> +#define pmd_val(x) ((x).pmd)
>> #define __pmd(x) ((pmd_t) { (x) } )
>> #endif
>>
>
> Trying to understand what's happening here, I stumbled over
>
> commit ef22d8abd876e805b604e8f655127de2beee2869
> Author: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Jan 31 13:45:36 2020 +0100
>
> m68k: mm: Restructure Motorola MMU page-table layout
> The Motorola 68xxx MMUs, 040 (and later) have a fixed 7,7,{5,6}
> page-table setup, where the last depends on the page-size selected (8k
> vs 4k resp.), and head.S selects 4K pages. For 030 (and earlier) we
> explicitly program 7,7,6 and 4K pages in %tc.
> However, the current code implements this mightily weird. What it does
> is group 16 of those (6 bit) pte tables into one 4k page to not waste
> space. The down-side is that that forces pmd_t to be a 16-tuple
> pointing to consecutive pte tables.
> This breaks the generic code which assumes READ_ONCE(*pmd) will be
> word sized.
>
> Where we did
>
> #if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
> -typedef struct { unsigned long pmd[16]; } pmd_t;
> -#define pmd_val(x) ((&x)->pmd[0])
> -#define __pmd(x) ((pmd_t) { { (x) }, })
> +typedef struct { unsigned long pmd; } pmd_t;
> +#define pmd_val(x) ((&x)->pmd)
> +#define __pmd(x) ((pmd_t) { (x) } )
> #endif
>
> So I assume this should be fine
I think you're implying that taking the address and then using the arrow
operator was needed when pmd was an array? If so, I don't really understand
that. Surely:
((x).pmd[0])
would have worked too? I traced back further, and a version of that macro has
existed with the "address of" and arrow operator since the beginning of (git)
time.
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
* Re: [PATCH V2 1/7] m68k/mm: Change pmd_val()
From: David Hildenbrand @ 2024-09-17 10:30 UTC
To: Ryan Roberts, Anshuman Khandual, linux-mm
Cc: Andrew Morton, Mike Rapoport (IBM), Arnd Bergmann, x86,
linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Geert Uytterhoeven, Guo Ren, Peter Zijlstra
>> #if !defined(CONFIG_MMU) || CONFIG_PGTABLE_LEVELS == 3
>> -typedef struct { unsigned long pmd[16]; } pmd_t;
>> -#define pmd_val(x) ((&x)->pmd[0])
>> -#define __pmd(x) ((pmd_t) { { (x) }, })
>> +typedef struct { unsigned long pmd; } pmd_t;
>> +#define pmd_val(x) ((&x)->pmd)
>> +#define __pmd(x) ((pmd_t) { (x) } )
>> #endif
>>
>> So I assume this should be fine
>
> I think you're implying that taking the address and then using the arrow
> operator was needed when pmd was an array? If so, I don't really understand
> that. Surely:
>
> ((x).pmd[0])
>
> would have worked too?
I think you're right; I guess one suspects that there is more magic to it
than there actually is ... :)
--
Cheers,
David / dhildenb
* [PATCH V2 2/7] x86/mm: Drop page table entry address output from pxd_ERROR()
From: Anshuman Khandual @ 2024-09-17 7:31 UTC
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
This drops the page table entry address output from all pxd_ERROR()
definitions, which now match other architectures. This also prevents build
issues while transitioning to pxdp_get() based page table entry accesses.

The mentioned build error occurs because, with the changed macros, pxd_ERROR()
ends up doing &pxdp_get(pxd), which does not make sense and generates an
"error: lvalue required as unary '&' operand" diagnostic.
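
As a minimal standalone sketch of the failure mode (not tree code; printf()
stands in for pr_err() and all names are illustrative):

#include <stdio.h>

typedef struct { unsigned long pmd; } pmd_t;
#define pmd_val(x) ((x).pmd)

static pmd_t pmdp_get(pmd_t *pmdp) { return *pmdp; }

/* Old style: prints the entry's address via &(e). */
#define pmd_ERROR_old(e) \
	printf("bad pmd %p(%016lx)\n", (void *)&(e), pmd_val(e))

/* New style: value only, safe to pass an rvalue. */
#define pmd_ERROR_new(e) \
	printf("bad pmd (%016lx)\n", pmd_val(e))

void demo(pmd_t *pmdp)
{
	/*
	 * pmd_ERROR_old(pmdp_get(pmdp)) would expand to &(pmdp_get(pmdp))
	 * and fail with "error: lvalue required as unary '&' operand".
	 */
	pmd_ERROR_new(pmdp_get(pmdp));	/* builds fine */
}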
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
arch/x86/include/asm/pgtable-3level.h | 12 ++++++------
arch/x86/include/asm/pgtable_64.h | 20 ++++++++++----------
2 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index dabafba957ea..e1fa4dd87753 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -10,14 +10,14 @@
*/
#define pte_ERROR(e) \
- pr_err("%s:%d: bad pte %p(%08lx%08lx)\n", \
- __FILE__, __LINE__, &(e), (e).pte_high, (e).pte_low)
+ pr_err("%s:%d: bad pte (%08lx%08lx)\n", \
+ __FILE__, __LINE__, (e).pte_high, (e).pte_low)
#define pmd_ERROR(e) \
- pr_err("%s:%d: bad pmd %p(%016Lx)\n", \
- __FILE__, __LINE__, &(e), pmd_val(e))
+ pr_err("%s:%d: bad pmd (%016Lx)\n", \
+ __FILE__, __LINE__, pmd_val(e))
#define pgd_ERROR(e) \
- pr_err("%s:%d: bad pgd %p(%016Lx)\n", \
- __FILE__, __LINE__, &(e), pgd_val(e))
+ pr_err("%s:%d: bad pgd (%016Lx)\n", \
+ __FILE__, __LINE__, pgd_val(e))
#define pxx_xchg64(_pxx, _ptr, _val) ({ \
_pxx##val_t *_p = (_pxx##val_t *)_ptr; \
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 3c4407271d08..4e462c825cab 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -32,24 +32,24 @@ extern void paging_init(void);
static inline void sync_initial_page_table(void) { }
#define pte_ERROR(e) \
- pr_err("%s:%d: bad pte %p(%016lx)\n", \
- __FILE__, __LINE__, &(e), pte_val(e))
+ pr_err("%s:%d: bad pte (%016lx)\n", \
+ __FILE__, __LINE__, pte_val(e))
#define pmd_ERROR(e) \
- pr_err("%s:%d: bad pmd %p(%016lx)\n", \
- __FILE__, __LINE__, &(e), pmd_val(e))
+ pr_err("%s:%d: bad pmd (%016lx)\n", \
+ __FILE__, __LINE__, pmd_val(e))
#define pud_ERROR(e) \
- pr_err("%s:%d: bad pud %p(%016lx)\n", \
- __FILE__, __LINE__, &(e), pud_val(e))
+ pr_err("%s:%d: bad pud (%016lx)\n", \
+ __FILE__, __LINE__, pud_val(e))
#if CONFIG_PGTABLE_LEVELS >= 5
#define p4d_ERROR(e) \
- pr_err("%s:%d: bad p4d %p(%016lx)\n", \
- __FILE__, __LINE__, &(e), p4d_val(e))
+ pr_err("%s:%d: bad p4d (%016lx)\n", \
+ __FILE__, __LINE__, p4d_val(e))
#endif
#define pgd_ERROR(e) \
- pr_err("%s:%d: bad pgd %p(%016lx)\n", \
- __FILE__, __LINE__, &(e), pgd_val(e))
+ pr_err("%s:%d: bad pgd (%016lx)\n", \
+ __FILE__, __LINE__, pgd_val(e))
struct mm_struct;
--
2.25.1
* Re: [PATCH V2 2/7] x86/mm: Drop page table entry address output from pxd_ERROR()
From: David Hildenbrand @ 2024-09-17 10:22 UTC
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
On 17.09.24 09:31, Anshuman Khandual wrote:
> This drops the page table entry address output from all pxd_ERROR()
> definitions, which now match other architectures. This also prevents build
> issues while transitioning to pxdp_get() based page table entry accesses.
>
> The mentioned build error occurs because, with the changed macros, pxd_ERROR()
> ends up doing &pxdp_get(pxd), which does not make sense and generates an
> "error: lvalue required as unary '&' operand" diagnostic.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
Not a big fan of all these "bad PTE" thingies ...
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH V2 2/7] x86/mm: Drop page table entry address output from pxd_ERROR()
From: Dave Hansen @ 2024-09-17 11:19 UTC
To: David Hildenbrand, Anshuman Khandual, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
On 9/17/24 03:22, David Hildenbrand wrote:
> Not a big fan of all these "bad PTE" thingies ...
In general?
Or not a big fan of the fact that every architecture has their own
(mostly) copied-and-pasted set?
* Re: [PATCH V2 2/7] x86/mm: Drop page table entry address output from pxd_ERROR()
From: Anshuman Khandual @ 2024-09-17 11:25 UTC
To: Dave Hansen, David Hildenbrand, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
On 9/17/24 16:49, Dave Hansen wrote:
> On 9/17/24 03:22, David Hildenbrand wrote:
>> Not a big fan of all these "bad PTE" thingies ...
>
> In general?
>
> Or not a big fan of the fact that every architecture has their own
> (mostly) copied-and-pasted set?
Right, these pxd_ERROR() helpers have similar (often exactly the same)
definitions across platforms, something that could be converged into common
generic ones instead.
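
A hypothetical generic fallback (purely a sketch, nothing like this exists
in the tree today) could look like:

/*
 * Hypothetical asm-generic fallback, one per page table level; an
 * architecture wanting extra detail would simply define its own.
 */
#ifndef pmd_ERROR
#define pmd_ERROR(e) \
	pr_err("%s:%d: bad pmd (%016llx)\n", \
	       __FILE__, __LINE__, (unsigned long long)pmd_val(e))
#endif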
* Re: [PATCH V2 2/7] x86/mm: Drop page table entry address output from pxd_ERROR()
From: David Hildenbrand @ 2024-09-17 11:31 UTC
To: Dave Hansen, Anshuman Khandual, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
On 17.09.24 13:19, Dave Hansen wrote:
> On 9/17/24 03:22, David Hildenbrand wrote:
>> Not a big fan of all these "bad PTE" thingies ...
>
> In general?
In general, after I learned that pmd_bad() fires on perfectly fine
pmd_large() entries, which makes things like pmd_none_or_clear_bad()
absolutely dangerous to use in code where we could have THPs ...
Consequently, we stopped using them in THP code, so what's the whole
point of having them ...
>
> Or not a big fan of the fact that every architecture has their own
> (mostly) copied-and-pasted set?
Well, most certainly that as well :)
--
Cheers,
David / dhildenb
* [PATCH V2 3/7] mm: Use ptep_get() for accessing PTE entries
From: Anshuman Khandual @ 2024-09-17 7:31 UTC
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users
Convert PTE accesses to go via the ptep_get() helper, which defaults to
READ_ONCE() but also provides the platform an opportunity to override it when
required. This stores the page table entry value read in a local variable
which can be reused in multiple places thereafter. This helps in avoiding
multiple memory load operations as well as possible race conditions.
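
For reference, the generic helper being relied on here is essentially the
following (quoting include/linux/pgtable.h from memory, so treat it as a
sketch):

#ifndef ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	return READ_ONCE(*ptep);
}
#endif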
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
include/linux/pgtable.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2a6a3cccfc36..547eeae8c43f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1060,7 +1060,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
*/
#define set_pte_safe(ptep, pte) \
({ \
- WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
+ pte_t __old = ptep_get(ptep); \
+ WARN_ON_ONCE(pte_present(__old) && !pte_same(__old, pte)); \
set_pte(ptep, pte); \
})
--
2.25.1
* Re: [PATCH V2 3/7] mm: Use ptep_get() for accessing PTE entries
From: Ryan Roberts @ 2024-09-17 8:44 UTC
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, David Hildenbrand, Mike Rapoport (IBM),
Arnd Bergmann, x86, linux-m68k, linux-fsdevel, kasan-dev,
linux-kernel, linux-perf-users
On 17/09/2024 08:31, Anshuman Khandual wrote:
> Convert PTE accesses to go via the ptep_get() helper, which defaults to
> READ_ONCE() but also provides the platform an opportunity to override it when
> required. This stores the page table entry value read in a local variable
> which can be reused in multiple places thereafter. This helps in avoiding
> multiple memory load operations as well as possible race conditions.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> include/linux/pgtable.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 2a6a3cccfc36..547eeae8c43f 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1060,7 +1060,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
> */
> #define set_pte_safe(ptep, pte) \
> ({ \
> - WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
> + pte_t __old = ptep_get(ptep); \
> + WARN_ON_ONCE(pte_present(__old) && !pte_same(__old, pte)); \
> set_pte(ptep, pte); \
> })
>
* Re: [PATCH V2 3/7] mm: Use ptep_get() for accessing PTE entries
From: David Hildenbrand @ 2024-09-17 10:28 UTC
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users
On 17.09.24 09:31, Anshuman Khandual wrote:
> Convert PTE accesses to go via the ptep_get() helper, which defaults to
> READ_ONCE() but also provides the platform an opportunity to override it when
> required. This stores the page table entry value read in a local variable
> which can be reused in multiple places thereafter. This helps in avoiding
> multiple memory load operations as well as possible race conditions.
>
Please make it clearer in the subject+description that this really only
involves set_pte_safe().
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> include/linux/pgtable.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 2a6a3cccfc36..547eeae8c43f 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1060,7 +1060,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
> */
> #define set_pte_safe(ptep, pte) \
> ({ \
> - WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
> + pte_t __old = ptep_get(ptep); \
> + WARN_ON_ONCE(pte_present(__old) && !pte_same(__old, pte)); \
> set_pte(ptep, pte); \
> })
>
I don't think this is necessary. PTE present cannot flip concurrently;
that's the whole reason for the "safe" part, after all.

Can we just move this weird set_pte/pmd_safe() stuff to x86 init code
and be done with it? Then it's also clear *where* it is getting used and
for which reason.
--
Cheers,
David / dhildenb
* Re: [PATCH V2 3/7] mm: Use ptep_get() for accessing PTE entries
From: Anshuman Khandual @ 2024-09-18 6:32 UTC
To: David Hildenbrand, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users
On 9/17/24 15:58, David Hildenbrand wrote:
> On 17.09.24 09:31, Anshuman Khandual wrote:
>> Convert PTE accesses to go via the ptep_get() helper, which defaults to
>> READ_ONCE() but also provides the platform an opportunity to override it when
>> required. This stores the page table entry value read in a local variable
>> which can be reused in multiple places thereafter. This helps in avoiding
>> multiple memory load operations as well as possible race conditions.
>>
>
> Please make it clearer in the subject+description that this really only involves set_pte_safe().
I will update the commit message with something like this.

mm: Use ptep_get() in set_pte_safe()

This converts the PTE accesses in set_pte_safe() to go via the ptep_get()
helper, which defaults to READ_ONCE() but also provides the platform an
opportunity to override it when required. This stores the page table entry
value read in a local variable which can be reused in multiple places
thereafter. This helps in avoiding multiple memory load operations as well
as some possible race conditions.
>
>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> include/linux/pgtable.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 2a6a3cccfc36..547eeae8c43f 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -1060,7 +1060,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
>> */
>> #define set_pte_safe(ptep, pte) \
>> ({ \
>> - WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
>> + pte_t __old = ptep_get(ptep); \
>> + WARN_ON_ONCE(pte_present(__old) && !pte_same(__old, pte)); \
>> set_pte(ptep, pte); \
>> })
>>
>
> I don't think this is necessary. PTE present cannot flip concurrently; that's the whole reason for the "safe" part, after all.
Which part is not necessary? Converting the dereferences to ptep_get() OR
caching the page table value read into a local variable? The ptep_get()
conversion also serves the purpose of providing the platform an opportunity
to override.
>
> Can we just move this weird set_pte/pmd_safe() stuff to x86 init code and be done with it? Then it's also clear *where* it is getting used and for which reason.
>
set_pte/pmd_safe() can be moved to the x86 platform, as that is currently
the sole user of these helpers. But because set_pgd_safe() gets used on the
riscv platform, I am just wondering whether it would be worth moving only the
pte/pmd helpers but not the pgd one?
* Re: [PATCH V2 3/7] mm: Use ptep_get() for accessing PTE entries
From: David Hildenbrand @ 2024-09-19 8:04 UTC
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users
On 18.09.24 08:32, Anshuman Khandual wrote:
>
>
> On 9/17/24 15:58, David Hildenbrand wrote:
>> On 17.09.24 09:31, Anshuman Khandual wrote:
>>> Convert PTE accesses to go via the ptep_get() helper, which defaults to
>>> READ_ONCE() but also provides the platform an opportunity to override it when
>>> required. This stores the page table entry value read in a local variable
>>> which can be reused in multiple places thereafter. This helps in avoiding
>>> multiple memory load operations as well as possible race conditions.
>>>
>>
>> Please make it clearer in the subject+description that this really only involves set_pte_safe().
>
> I will update the commit message with something like this.
>
> mm: Use ptep_get() in set_pte_safe()
>
> This converts the PTE accesses in set_pte_safe() to go via the ptep_get()
> helper, which defaults to READ_ONCE() but also provides the platform an
> opportunity to override it when required. This stores the page table entry
> value read in a local variable which can be reused in multiple places
> thereafter. This helps in avoiding multiple memory load operations as well
> as some possible race conditions.
>
>>
>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
>>> Cc: linux-mm@kvack.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>> include/linux/pgtable.h | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>> index 2a6a3cccfc36..547eeae8c43f 100644
>>> --- a/include/linux/pgtable.h
>>> +++ b/include/linux/pgtable.h
>>> @@ -1060,7 +1060,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
>>> */
>>> #define set_pte_safe(ptep, pte) \
>>> ({ \
>>> - WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
>>> + pte_t __old = ptep_get(ptep); \
>>> + WARN_ON_ONCE(pte_present(__old) && !pte_same(__old, pte)); \
>>> set_pte(ptep, pte); \
>>> })
>>>
>>
>> I don't think this is necessary. PTE present cannot flip concurrently; that's the whole reason for the "safe" part, after all.
>
> Which part is not necessary? Converting the dereferences to ptep_get() OR
> caching the page table value read into a local variable? The ptep_get()
> conversion also serves the purpose of providing the platform an opportunity
> to override.
Which arch override are you thinking of where this change here would
make a real difference? Would it even make a difference with cont-pte on
arm?
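
For context, arm64's cont-pte override looks roughly like the following
(simplified and from memory, so treat the details as indicative only):

#define ptep_get ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = __ptep_get(ptep);	/* the plain READ_ONCE() read */

	if (likely(!pte_valid_cont(pte)))
		return pte;

	/* Fold access/dirty state from the whole contig block back in. */
	return contpte_ptep_get(ptep, pte);
}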
>
>>
>> Can we just move this weird set_pte/pmd_safe() stuff to x86 init code and be done with it? Then it's also clear *where* it is getting used and for which reason.
>>
>> set_pte/pmd_safe() can be moved to the x86 platform, as that is currently
>> the sole user of these helpers. But because set_pgd_safe() gets used on the
>> riscv platform, I am just wondering whether it would be worth moving only the
>> pte/pmd helpers but not the pgd one?
My take would be just to move them to where they are used, and possibly
even inline them.

The point is that it's absolutely underdocumented what "_safe" is
supposed to mean here, and I don't really see the reason to have this in
common code (making the common API more complicated).
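
A sketch of what an x86-local replacement might look like (hypothetical
placement and naming, with the body derived from the existing macro):

/* e.g. somewhere in the x86 init code */
static inline void set_pte_init_safe(pte_t *ptep, pte_t pte)
{
	pte_t old = ptep_get(ptep);

	/* Early init only: an already-present entry must not change. */
	WARN_ON_ONCE(pte_present(old) && !pte_same(old, pte));
	set_pte(ptep, pte);
}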
--
Cheers,
David / dhildenb
* Re: [PATCH V2 3/7] mm: Use ptep_get() for accessing PTE entries
From: Anshuman Khandual @ 2024-09-19 9:20 UTC
To: David Hildenbrand, linux-mm
Cc: Andrew Morton, Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann,
x86, linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users
On 9/19/24 13:34, David Hildenbrand wrote:
> On 18.09.24 08:32, Anshuman Khandual wrote:
>>
>>
>> On 9/17/24 15:58, David Hildenbrand wrote:
>>> On 17.09.24 09:31, Anshuman Khandual wrote:
>>>> Convert PTE accesses to go via the ptep_get() helper, which defaults to
>>>> READ_ONCE() but also provides the platform an opportunity to override it when
>>>> required. This stores the page table entry value read in a local variable
>>>> which can be reused in multiple places thereafter. This helps in avoiding
>>>> multiple memory load operations as well as possible race conditions.
>>>>
>>>
>>> Please make it clearer in the subject+description that this really only involves set_pte_safe().
>>
>> I will update the commit message with something like this.
>>
>> mm: Use ptep_get() in set_pte_safe()
>>
>> This converts the PTE accesses in set_pte_safe() to go via the ptep_get()
>> helper, which defaults to READ_ONCE() but also provides the platform an
>> opportunity to override it when required. This stores the page table entry
>> value read in a local variable which can be reused in multiple places
>> thereafter. This helps in avoiding multiple memory load operations as well
>> as some possible race conditions.
>>
>>>
>>>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>>> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> ---
>>>> include/linux/pgtable.h | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>>> index 2a6a3cccfc36..547eeae8c43f 100644
>>>> --- a/include/linux/pgtable.h
>>>> +++ b/include/linux/pgtable.h
>>>> @@ -1060,7 +1060,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
>>>> */
>>>> #define set_pte_safe(ptep, pte) \
>>>> ({ \
>>>> - WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
>>>> + pte_t __old = ptep_get(ptep); \
>>>> + WARN_ON_ONCE(pte_present(__old) && !pte_same(__old, pte)); \
>>>> set_pte(ptep, pte); \
>>>> })
>>>>
>>>
>>> I don't think this is necessary. PTE present cannot flip concurrently; that's the whole reason for the "safe" part, after all.
>>
>> Which part is not necessary? Converting the dereferences to ptep_get() OR
>> caching the page table value read into a local variable? The ptep_get()
>> conversion also serves the purpose of providing the platform an opportunity
>> to override.
>
> Which arch override are you thinking of where this change here would make a real difference? Would it even make a difference with cont-pte on arm?
As we already figured out, this code is not used anywhere other than the x86
platform. So changing this won't make a difference for arm64, unless I am
missing something.

The idea behind the series is to ensure that there is no direct dereferencing
of page table entries in generic MM code and that all accesses go via the
available helpers instead. But if we move these set_pxd_safe() helpers into
platform code as you suggested earlier, those changes will not be necessary
anymore.
>
>>
>>>
>>> Can we just move this weird set_pte/pmd_safe() stuff to x86 init code and be done with it? Then it's also clear *where* it is getting used and for which reason.
>>>
>> set_pte/pmd_safe() can be moved to the x86 platform, as that is currently
>> the sole user of these helpers. But because set_pgd_safe() gets used on the
>> riscv platform, I am just wondering whether it would be worth moving only the
>> pte/pmd helpers but not the pgd one?
>
> My take would be just to move them to where they are used, and possibly even inline them.
>
> The point is that it's absolutely underdocumented what "_safe" is supposed to mean here, and I don't really see the reason to have this in common code (making the common API more complicated).
Agreed, it makes sense for these helpers to live in the platform code where
they get used (x86, riscv) instead. Will move them as required.
* [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
From: Anshuman Khandual @ 2024-09-17 7:31 UTC
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Muchun Song, Andrey Ryabinin, Miaohe Lin,
Naoya Horiguchi, Pasha Tatashin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
Convert PMD accesses to go via the pmdp_get() helper, which defaults to
READ_ONCE() but also provides the platform an opportunity to override it when
required. This stores the page table entry value read in a local variable
which can be reused in multiple places thereafter. This helps in avoiding
multiple memory load operations as well as possible race conditions.
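
A minimal sketch of the conversion pattern applied throughout (illustrative
helper, not tree code):

/* Read the PMD once and reuse the snapshot for every check. */
static bool pmd_is_mapped(pmd_t *pmdp)
{
	pmd_t old_pmd = pmdp_get(pmdp);	/* single READ_ONCE() load */

	/*
	 * Both checks below see the same snapshot, so a concurrent
	 * update cannot make them disagree with each other.
	 */
	return pmd_present(old_pmd) && !pmd_bad(old_pmd);
}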
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: kasan-dev@googlegroups.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
drivers/misc/sgi-gru/grufault.c | 7 ++--
fs/proc/task_mmu.c | 28 +++++++-------
include/linux/huge_mm.h | 4 +-
include/linux/mm.h | 2 +-
include/linux/pgtable.h | 15 ++++----
mm/gup.c | 14 +++----
mm/huge_memory.c | 66 +++++++++++++++++----------------
mm/hugetlb_vmemmap.c | 4 +-
mm/kasan/init.c | 10 ++---
mm/kasan/shadow.c | 4 +-
mm/khugepaged.c | 4 +-
mm/madvise.c | 6 +--
mm/memory-failure.c | 6 +--
mm/memory.c | 25 +++++++------
mm/mempolicy.c | 4 +-
mm/migrate.c | 4 +-
mm/migrate_device.c | 10 ++---
mm/mlock.c | 6 +--
mm/mprotect.c | 2 +-
mm/mremap.c | 4 +-
mm/page_table_check.c | 2 +-
mm/pagewalk.c | 4 +-
mm/percpu.c | 2 +-
mm/pgtable-generic.c | 20 +++++-----
mm/ptdump.c | 2 +-
mm/rmap.c | 4 +-
mm/sparse-vmemmap.c | 4 +-
mm/vmalloc.c | 15 ++++----
28 files changed, 145 insertions(+), 133 deletions(-)
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 3557d78ee47a..804f275ece99 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -208,7 +208,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
pgd_t *pgdp;
p4d_t *p4dp;
pud_t *pudp;
- pmd_t *pmdp;
+ pmd_t *pmdp, old_pmd;
pte_t pte;
pgdp = pgd_offset(vma->vm_mm, vaddr);
@@ -224,10 +224,11 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
goto err;
pmdp = pmd_offset(pudp, vaddr);
- if (unlikely(pmd_none(*pmdp)))
+ old_pmd = pmdp_get(pmdp);
+ if (unlikely(pmd_none(old_pmd)))
goto err;
#ifdef CONFIG_X86_64
- if (unlikely(pmd_leaf(*pmdp)))
+ if (unlikely(pmd_leaf(old_pmd)))
pte = ptep_get((pte_t *)pmdp);
else
#endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5f171ad7b436..f0c63884d008 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -861,12 +861,13 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
struct page *page = NULL;
bool present = false;
struct folio *folio;
+ pmd_t old_pmd = pmdp_get(pmd);
- if (pmd_present(*pmd)) {
- page = vm_normal_page_pmd(vma, addr, *pmd);
+ if (pmd_present(old_pmd)) {
+ page = vm_normal_page_pmd(vma, addr, old_pmd);
present = true;
- } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
- swp_entry_t entry = pmd_to_swp_entry(*pmd);
+ } else if (unlikely(thp_migration_supported() && is_swap_pmd(old_pmd))) {
+ swp_entry_t entry = pmd_to_swp_entry(old_pmd);
if (is_pfn_swap_entry(entry))
page = pfn_swap_entry_to_page(entry);
@@ -883,7 +884,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
else
mss->file_thp += HPAGE_PMD_SIZE;
- smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd),
+ smaps_account(mss, page, true, pmd_young(old_pmd), pmd_dirty(old_pmd),
locked, present);
}
#else
@@ -1426,7 +1427,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- pmd_t old, pmd = *pmdp;
+ pmd_t old, pmd = pmdp_get(pmdp);
if (pmd_present(pmd)) {
/* See comment in change_huge_pmd() */
@@ -1468,10 +1469,10 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
goto out;
}
- if (!pmd_present(*pmd))
+ if (!pmd_present(pmdp_get(pmd)))
goto out;
- folio = pmd_folio(*pmd);
+ folio = pmd_folio(pmdp_get(pmd));
/* Clear accessed and referenced bits. */
pmdp_test_and_clear_young(vma, addr, pmd);
@@ -1769,7 +1770,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
if (ptl) {
unsigned int idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;
u64 flags = 0, frame = 0;
- pmd_t pmd = *pmdp;
+ pmd_t pmd = pmdp_get(pmdp);
struct page *page = NULL;
struct folio *folio = NULL;
@@ -2189,7 +2190,7 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
static void make_uffd_wp_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- pmd_t old, pmd = *pmdp;
+ pmd_t old, pmd = pmdp_get(pmdp);
if (pmd_present(pmd)) {
old = pmdp_invalidate_ad(vma, addr, pmdp);
@@ -2416,7 +2417,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, unsigned long start,
return -ENOENT;
categories = p->cur_vma_category |
- pagemap_thp_category(p, vma, start, *pmd);
+ pagemap_thp_category(p, vma, start, pmdp_get(pmd));
if (!pagemap_scan_is_interesting_page(categories, p))
goto out_unlock;
@@ -2946,10 +2947,11 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
struct page *page;
+ pmd_t old_pmd = pmdp_get(pmd);
- page = can_gather_numa_stats_pmd(*pmd, vma, addr);
+ page = can_gather_numa_stats_pmd(old_pmd, vma, addr);
if (page)
- gather_stats(page, md, pmd_dirty(*pmd),
+ gather_stats(page, md, pmd_dirty(old_pmd),
HPAGE_PMD_SIZE/PAGE_SIZE);
spin_unlock(ptl);
return 0;
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..38b5de040d02 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -369,7 +369,9 @@ static inline int is_swap_pmd(pmd_t pmd)
static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
struct vm_area_struct *vma)
{
- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
+ pmd_t old_pmd = pmdp_get(pmd);
+
+ if (is_swap_pmd(old_pmd) || pmd_trans_huge(old_pmd) || pmd_devmap(old_pmd))
return __pmd_trans_huge_lock(pmd, vma);
else
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 147073601716..258e49323306 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2921,7 +2921,7 @@ static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
- return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
+ return ptlock_ptr(page_ptdesc(pmd_page(pmdp_get(pmd))));
}
static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 547eeae8c43f..ea283ce958a7 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -367,7 +367,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address,
pmd_t *pmdp)
{
- pmd_t pmd = *pmdp;
+ pmd_t pmd = pmdp_get(pmdp);
int r = 1;
if (!pmd_young(pmd))
r = 0;
@@ -598,7 +598,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long address,
pmd_t *pmdp)
{
- pmd_t pmd = *pmdp;
+ pmd_t pmd = pmdp_get(pmdp);
pmd_clear(pmdp);
page_table_check_pmd_clear(mm, pmd);
@@ -876,7 +876,7 @@ static inline pte_t pte_sw_mkyoung(pte_t pte)
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
unsigned long address, pmd_t *pmdp)
{
- pmd_t old_pmd = *pmdp;
+ pmd_t old_pmd = pmdp_get(pmdp);
set_pmd_at(mm, address, pmdp, pmd_wrprotect(old_pmd));
}
#else
@@ -945,7 +945,7 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
static inline pmd_t generic_pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
{
- pmd_t old_pmd = *pmdp;
+ pmd_t old_pmd = pmdp_get(pmdp);
set_pmd_at(vma->vm_mm, address, pmdp, pmd);
return old_pmd;
}
@@ -1067,7 +1067,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
#define set_pmd_safe(pmdp, pmd) \
({ \
- WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \
+ pmd_t __old = pmdp_get(pmdp); \
+ WARN_ON_ONCE(pmd_present(__old) && !pmd_same(__old, pmd)); \
set_pmd(pmdp, pmd); \
})
@@ -1271,9 +1272,9 @@ static inline int pud_none_or_clear_bad(pud_t *pud)
static inline int pmd_none_or_clear_bad(pmd_t *pmd)
{
- if (pmd_none(*pmd))
+ if (pmd_none(pmdp_get(pmd)))
return 1;
- if (unlikely(pmd_bad(*pmd))) {
+ if (unlikely(pmd_bad(pmdp_get(pmd)))) {
pmd_clear_bad(pmd);
return 1;
}
diff --git a/mm/gup.c b/mm/gup.c
index 54d0dc3831fb..aeeac0a54944 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -699,7 +699,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
struct follow_page_context *ctx)
{
struct mm_struct *mm = vma->vm_mm;
- pmd_t pmdval = *pmd;
+ pmd_t pmdval = pmdp_get(pmd);
struct page *page;
int ret;
@@ -714,7 +714,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
if ((flags & FOLL_DUMP) && is_huge_zero_pmd(pmdval))
return ERR_PTR(-EFAULT);
- if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
+ if (pmd_protnone(pmdp_get(pmd)) && !gup_can_follow_protnone(vma, flags))
return NULL;
if (!pmd_write(pmdval) && gup_must_unshare(vma, flags, page))
@@ -957,7 +957,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
return no_page_table(vma, flags, address);
ptl = pmd_lock(mm, pmd);
- pmdval = *pmd;
+ pmdval = pmdp_get(pmd);
if (unlikely(!pmd_present(pmdval))) {
spin_unlock(ptl);
return no_page_table(vma, flags, address);
@@ -1120,7 +1120,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
if (pud_none(*pud))
return -EFAULT;
pmd = pmd_offset(pud, address);
- if (!pmd_present(*pmd))
+ if (!pmd_present(pmdp_get(pmd)))
return -EFAULT;
pte = pte_offset_map(pmd, address);
if (!pte)
@@ -2898,7 +2898,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
if (!folio)
goto pte_unmap;
- if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
+ if (unlikely(pmd_val(pmd) != pmd_val(pmdp_get(pmdp))) ||
unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
gup_put_folio(folio, 1, flags);
goto pte_unmap;
@@ -3007,7 +3007,7 @@ static int gup_fast_devmap_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (!gup_fast_devmap_leaf(fault_pfn, addr, end, flags, pages, nr))
return 0;
- if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+ if (unlikely(pmd_val(orig) != pmd_val(pmdp_get(pmdp)))) {
gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
return 0;
}
@@ -3074,7 +3074,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (!folio)
return 0;
- if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+ if (unlikely(pmd_val(orig) != pmd_val(pmdp_get(pmdp)))) {
gup_put_folio(folio, refs, flags);
return 0;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 67c86a5d64a6..bb63de935937 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1065,7 +1065,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
struct folio *zero_folio)
{
pmd_t entry;
- if (!pmd_none(*pmd))
+ if (!pmd_none(pmdp_get(pmd)))
return;
entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
entry = pmd_mkhuge(entry);
@@ -1144,17 +1144,17 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
pgtable_t pgtable)
{
struct mm_struct *mm = vma->vm_mm;
- pmd_t entry;
+ pmd_t entry, old_pmd = pmdp_get(pmd);
spinlock_t *ptl;
ptl = pmd_lock(mm, pmd);
- if (!pmd_none(*pmd)) {
+ if (!pmd_none(old_pmd)) {
if (write) {
- if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
- WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
+ if (pmd_pfn(old_pmd) != pfn_t_to_pfn(pfn)) {
+ WARN_ON_ONCE(!is_huge_zero_pmd(old_pmd));
goto out_unlock;
}
- entry = pmd_mkyoung(*pmd);
+ entry = pmd_mkyoung(old_pmd);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
update_mmu_cache_pmd(vma, addr, pmd);
@@ -1318,7 +1318,7 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
{
pmd_t _pmd;
- _pmd = pmd_mkyoung(*pmd);
+ _pmd = pmd_mkyoung(pmdp_get(pmd));
if (write)
_pmd = pmd_mkdirty(_pmd);
if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
@@ -1329,17 +1329,18 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
{
- unsigned long pfn = pmd_pfn(*pmd);
+ pmd_t old_pmd = pmdp_get(pmd);
+ unsigned long pfn = pmd_pfn(old_pmd);
struct mm_struct *mm = vma->vm_mm;
struct page *page;
int ret;
assert_spin_locked(pmd_lockptr(mm, pmd));
- if (flags & FOLL_WRITE && !pmd_write(*pmd))
+ if (flags & FOLL_WRITE && !pmd_write(old_pmd))
return NULL;
- if (pmd_present(*pmd) && pmd_devmap(*pmd))
+ if (pmd_present(old_pmd) && pmd_devmap(old_pmd))
/* pass */;
else
return NULL;
@@ -1772,7 +1773,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (!ptl)
goto out_unlocked;
- orig_pmd = *pmd;
+ orig_pmd = pmdp_get(pmd);
if (is_huge_zero_pmd(orig_pmd))
goto out;
@@ -1990,7 +1991,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
{
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
- pmd_t oldpmd, entry;
+ pmd_t oldpmd, entry, old_pmd;
bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
@@ -2005,13 +2006,14 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (!ptl)
return 0;
+ old_pmd = pmdp_get(pmd);
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
- if (is_swap_pmd(*pmd)) {
- swp_entry_t entry = pmd_to_swp_entry(*pmd);
+ if (is_swap_pmd(old_pmd)) {
+ swp_entry_t entry = pmd_to_swp_entry(old_pmd);
struct folio *folio = pfn_swap_entry_folio(entry);
pmd_t newpmd;
- VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+ VM_BUG_ON(!is_pmd_migration_entry(old_pmd));
if (is_writable_migration_entry(entry)) {
/*
* A protection check is difficult so
@@ -2022,17 +2024,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
else
entry = make_readable_migration_entry(swp_offset(entry));
newpmd = swp_entry_to_pmd(entry);
- if (pmd_swp_soft_dirty(*pmd))
+ if (pmd_swp_soft_dirty(old_pmd))
newpmd = pmd_swp_mksoft_dirty(newpmd);
} else {
- newpmd = *pmd;
+ newpmd = old_pmd;
}
if (uffd_wp)
newpmd = pmd_swp_mkuffd_wp(newpmd);
else if (uffd_wp_resolve)
newpmd = pmd_swp_clear_uffd_wp(newpmd);
- if (!pmd_same(*pmd, newpmd))
+ if (!pmd_same(old_pmd, newpmd))
set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
@@ -2046,13 +2048,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
* data is likely to be read-cached on the local CPU and
* local/remote hits to the zero page are not interesting.
*/
- if (is_huge_zero_pmd(*pmd))
+ if (is_huge_zero_pmd(old_pmd))
goto unlock;
- if (pmd_protnone(*pmd))
+ if (pmd_protnone(old_pmd))
goto unlock;
- folio = pmd_folio(*pmd);
+ folio = pmd_folio(old_pmd);
toptier = node_is_toptier(folio_nid(folio));
/*
* Skip scanning top tier node if normal numa
@@ -2266,8 +2268,8 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
{
spinlock_t *ptl;
ptl = pmd_lock(vma->vm_mm, pmd);
- if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
- pmd_devmap(*pmd)))
+ if (likely(is_swap_pmd(pmdp_get(pmd)) || pmd_trans_huge(pmdp_get(pmd)) ||
+ pmd_devmap(pmdp_get(pmd))))
return ptl;
spin_unlock(ptl);
return NULL;
@@ -2404,8 +2406,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
- VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
- && !pmd_devmap(*pmd));
+ VM_BUG_ON(!is_pmd_migration_entry(pmdp_get(pmd)) && !pmd_trans_huge(pmdp_get(pmd))
+ && !pmd_devmap(pmdp_get(pmd)));
count_vm_event(THP_SPLIT_PMD);
@@ -2438,7 +2440,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
return;
}
- if (is_huge_zero_pmd(*pmd)) {
+ if (is_huge_zero_pmd(pmdp_get(pmd))) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_arch_invalidate_secondary_tlbs() see comments below
@@ -2451,11 +2453,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
return __split_huge_zero_page_pmd(vma, haddr, pmd);
}
- pmd_migration = is_pmd_migration_entry(*pmd);
+ pmd_migration = is_pmd_migration_entry(pmdp_get(pmd));
if (unlikely(pmd_migration)) {
swp_entry_t entry;
- old_pmd = *pmd;
+ old_pmd = pmdp_get(pmd);
entry = pmd_to_swp_entry(old_pmd);
page = pfn_swap_entry_to_page(entry);
write = is_writable_migration_entry(entry);
@@ -2620,9 +2622,9 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
* require a folio to check the PMD against. Otherwise, there
* is a risk of replacing the wrong folio.
*/
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd)) {
- if (folio && folio != pmd_folio(*pmd))
+ if (pmd_trans_huge(pmdp_get(pmd)) || pmd_devmap(pmdp_get(pmd)) ||
+ is_pmd_migration_entry(pmdp_get(pmd))) {
+ if (folio && folio != pmd_folio(pmdp_get(pmd)))
return;
__split_huge_pmd_locked(vma, pmd, address, freeze);
}
@@ -2719,7 +2721,7 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
{
struct mm_struct *mm = vma->vm_mm;
int ref_count, map_count;
- pmd_t orig_pmd = *pmdp;
+ pmd_t orig_pmd = pmdp_get(pmdp);
if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
return false;
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 0c3f56b3578e..9deb82654d5b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -70,7 +70,7 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
}
spin_lock(&init_mm.page_table_lock);
- if (likely(pmd_leaf(*pmd))) {
+ if (likely(pmd_leaf(pmdp_get(pmd)))) {
/*
* Higher order allocations from buddy allocator must be able to
* be treated as indepdenent small pages (as they can be freed
@@ -104,7 +104,7 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
walk->action = ACTION_CONTINUE;
spin_lock(&init_mm.page_table_lock);
- head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
+ head = pmd_leaf(pmdp_get(pmd)) ? pmd_page(pmdp_get(pmd)) : NULL;
/*
* Due to HugeTLB alignment requirements and the vmemmap
* pages being at the start of the hotplugged memory
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 89895f38f722..4418bcdcb2aa 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -121,7 +121,7 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
continue;
}
- if (pmd_none(*pmd)) {
+ if (pmd_none(pmdp_get(pmd))) {
pte_t *p;
if (slab_is_available())
@@ -300,7 +300,7 @@ static void kasan_free_pte(pte_t *pte_start, pmd_t *pmd)
return;
}
- pte_free_kernel(&init_mm, (pte_t *)page_to_virt(pmd_page(*pmd)));
+ pte_free_kernel(&init_mm, (pte_t *)page_to_virt(pmd_page(pmdp_get(pmd))));
pmd_clear(pmd);
}
@@ -311,7 +311,7 @@ static void kasan_free_pmd(pmd_t *pmd_start, pud_t *pud)
for (i = 0; i < PTRS_PER_PMD; i++) {
pmd = pmd_start + i;
- if (!pmd_none(*pmd))
+ if (!pmd_none(pmdp_get(pmd)))
return;
}
@@ -381,10 +381,10 @@ static void kasan_remove_pmd_table(pmd_t *pmd, unsigned long addr,
next = pmd_addr_end(addr, end);
- if (!pmd_present(*pmd))
+ if (!pmd_present(pmdp_get(pmd)))
continue;
- if (kasan_pte_table(*pmd)) {
+ if (kasan_pte_table(pmdp_get(pmd))) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
pmd_clear(pmd);
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index d6210ca48dda..aec16a7236f7 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -202,9 +202,9 @@ static bool shadow_mapped(unsigned long addr)
if (pud_leaf(*pud))
return true;
pmd = pmd_offset(pud, addr);
- if (pmd_none(*pmd))
+ if (pmd_none(pmdp_get(pmd)))
return false;
- if (pmd_leaf(*pmd))
+ if (pmd_leaf(pmdp_get(pmd)))
return true;
pte = pte_offset_kernel(pmd, addr);
return !pte_none(ptep_get(pte));
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cdd1d8655a76..793da996313f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1192,7 +1192,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
if (pte)
pte_unmap(pte);
spin_lock(pmd_ptl);
- BUG_ON(!pmd_none(*pmd));
+ BUG_ON(!pmd_none(pmdp_get(pmd)));
/*
* We can only use set_pmd_at when establishing
* hugepmds and never for establishing regular pmds that
@@ -1229,7 +1229,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
spin_lock(pmd_ptl);
- BUG_ON(!pmd_none(*pmd));
+ BUG_ON(!pmd_none(pmdp_get(pmd)));
folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
folio_add_lru_vma(folio, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
diff --git a/mm/madvise.c b/mm/madvise.c
index 89089d84f8df..382c55d2ec94 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -357,7 +357,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
!can_do_file_pageout(vma);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (pmd_trans_huge(*pmd)) {
+ if (pmd_trans_huge(pmdp_get(pmd))) {
pmd_t orig_pmd;
unsigned long next = pmd_addr_end(addr, end);
@@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
if (!ptl)
return 0;
- orig_pmd = *pmd;
+ orig_pmd = pmdp_get(pmd);
if (is_huge_zero_pmd(orig_pmd))
goto huge_unlock;
@@ -655,7 +655,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
int nr, max_nr;
next = pmd_addr_end(addr, end);
- if (pmd_trans_huge(*pmd))
+ if (pmd_trans_huge(pmdp_get(pmd)))
if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
return 0;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7066fc84f351..305dbef3cc4d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -422,9 +422,9 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
if (pud_devmap(*pud))
return PUD_SHIFT;
pmd = pmd_offset(pud, address);
- if (!pmd_present(*pmd))
+ if (!pmd_present(pmdp_get(pmd)))
return 0;
- if (pmd_devmap(*pmd))
+ if (pmd_devmap(pmdp_get(pmd)))
return PMD_SHIFT;
pte = pte_offset_map(pmd, address);
if (!pte)
@@ -775,7 +775,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
static int check_hwpoisoned_pmd_entry(pmd_t *pmdp, unsigned long addr,
struct hwpoison_walk *hwp)
{
- pmd_t pmd = *pmdp;
+ pmd_t pmd = pmdp_get(pmdp);
unsigned long pfn;
unsigned long hwpoison_vaddr;
diff --git a/mm/memory.c b/mm/memory.c
index ebfc9768f801..5520e1f6a1b9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -189,7 +189,7 @@ void mm_trace_rss_stat(struct mm_struct *mm, int member)
static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
unsigned long addr)
{
- pgtable_t token = pmd_pgtable(*pmd);
+ pgtable_t token = pmd_pgtable(pmdp_get(pmd));
pmd_clear(pmd);
pte_free_tlb(tlb, token, addr);
mm_dec_nr_ptes(tlb->mm);
@@ -421,7 +421,7 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
{
spinlock_t *ptl = pmd_lock(mm, pmd);
- if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
+ if (likely(pmd_none(pmdp_get(pmd)))) { /* Has another populated it ? */
mm_inc_nr_ptes(mm);
/*
* Ensure all pte setup (eg. pte page lock and page clearing) are
@@ -462,7 +462,7 @@ int __pte_alloc_kernel(pmd_t *pmd)
return -ENOMEM;
spin_lock(&init_mm.page_table_lock);
- if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
+ if (likely(pmd_none(pmdp_get(pmd)))) { /* Has another populated it ? */
smp_wmb(); /* See comment in pmd_install() */
pmd_populate_kernel(&init_mm, pmd, new);
new = NULL;
@@ -1710,7 +1710,8 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+ if (is_swap_pmd(pmdp_get(pmd)) || pmd_trans_huge(pmdp_get(pmd)) ||
+ pmd_devmap(pmdp_get(pmd))) {
if (next - addr != HPAGE_PMD_SIZE)
__split_huge_pmd(vma, pmd, addr, false, NULL);
else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
@@ -1720,7 +1721,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
/* fall through */
} else if (details && details->single_folio &&
folio_test_pmd_mappable(details->single_folio) &&
- next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
+ next - addr == HPAGE_PMD_SIZE && pmd_none(pmdp_get(pmd))) {
spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
/*
* Take and drop THP pmd lock so that we cannot return
@@ -1729,7 +1730,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
*/
spin_unlock(ptl);
}
- if (pmd_none(*pmd)) {
+ if (pmd_none(pmdp_get(pmd))) {
addr = next;
continue;
}
@@ -1975,7 +1976,7 @@ static pmd_t *walk_to_pmd(struct mm_struct *mm, unsigned long addr)
if (!pmd)
return NULL;
- VM_BUG_ON(pmd_trans_huge(*pmd));
+ VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
return pmd;
}
@@ -2577,7 +2578,7 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
pmd = pmd_alloc(mm, pud, addr);
if (!pmd)
return -ENOMEM;
- VM_BUG_ON(pmd_trans_huge(*pmd));
+ VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
do {
next = pmd_addr_end(addr, end);
err = remap_pte_range(mm, pmd, addr, next,
@@ -2846,11 +2847,11 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
}
do {
next = pmd_addr_end(addr, end);
- if (pmd_none(*pmd) && !create)
+ if (pmd_none(pmdp_get(pmd)) && !create)
continue;
- if (WARN_ON_ONCE(pmd_leaf(*pmd)))
+ if (WARN_ON_ONCE(pmd_leaf(pmdp_get(pmd))))
return -EINVAL;
- if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
+ if (!pmd_none(pmdp_get(pmd)) && WARN_ON_ONCE(pmd_bad(pmdp_get(pmd)))) {
if (!create)
continue;
pmd_clear_bad(pmd);
@@ -6167,7 +6168,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
goto out;
pmd = pmd_offset(pud, address);
- VM_BUG_ON(pmd_trans_huge(*pmd));
+ VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
if (!ptep)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b858e22b259d..03f2df44b07f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -505,11 +505,11 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk)
struct folio *folio;
struct queue_pages *qp = walk->private;
- if (unlikely(is_pmd_migration_entry(*pmd))) {
+ if (unlikely(is_pmd_migration_entry(pmdp_get(pmd)))) {
qp->nr_failed++;
return;
}
- folio = pmd_folio(*pmd);
+ folio = pmd_folio(pmdp_get(pmd));
if (is_huge_zero_folio(folio)) {
walk->action = ACTION_CONTINUE;
return;
diff --git a/mm/migrate.c b/mm/migrate.c
index 923ea80ba744..a1dd5c8f88dd 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -369,9 +369,9 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
spinlock_t *ptl;
ptl = pmd_lock(mm, pmd);
- if (!is_pmd_migration_entry(*pmd))
+ if (!is_pmd_migration_entry(pmdp_get(pmd)))
goto unlock;
- migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), ptl);
+ migration_entry_wait_on_locked(pmd_to_swp_entry(pmdp_get(pmd)), ptl);
return;
unlock:
spin_unlock(ptl);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6d66dc1c6ffa..3a08cef6cd39 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -67,19 +67,19 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
pte_t *ptep;
again:
- if (pmd_none(*pmdp))
+ if (pmd_none(pmdp_get(pmdp)))
return migrate_vma_collect_hole(start, end, -1, walk);
- if (pmd_trans_huge(*pmdp)) {
+ if (pmd_trans_huge(pmdp_get(pmdp))) {
struct folio *folio;
ptl = pmd_lock(mm, pmdp);
- if (unlikely(!pmd_trans_huge(*pmdp))) {
+ if (unlikely(!pmd_trans_huge(pmdp_get(pmdp)))) {
spin_unlock(ptl);
goto again;
}
- folio = pmd_folio(*pmdp);
+ folio = pmd_folio(pmdp_get(pmdp));
if (is_huge_zero_folio(folio)) {
spin_unlock(ptl);
split_huge_pmd(vma, pmdp, addr);
@@ -596,7 +596,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
pmdp = pmd_alloc(mm, pudp, addr);
if (!pmdp)
goto abort;
- if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
+ if (pmd_trans_huge(pmdp_get(pmdp)) || pmd_devmap(pmdp_get(pmdp)))
goto abort;
if (pte_alloc(mm, pmdp))
goto abort;
diff --git a/mm/mlock.c b/mm/mlock.c
index e3e3dc2b2956..c3c479e9d0f8 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -363,11 +363,11 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
- if (!pmd_present(*pmd))
+ if (!pmd_present(pmdp_get(pmd)))
goto out;
- if (is_huge_zero_pmd(*pmd))
+ if (is_huge_zero_pmd(pmdp_get(pmd)))
goto out;
- folio = pmd_folio(*pmd);
+ folio = pmd_folio(pmdp_get(pmd));
if (vma->vm_flags & VM_LOCKED)
mlock_folio(folio);
else
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 222ab434da54..121fb448b0db 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -381,7 +381,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
break;
}
- if (pmd_none(*pmd))
+ if (pmd_none(pmdp_get(pmd)))
goto next;
/* invoke the mmu notifier if the pmd is populated */
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc640..d42ac62bd34e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -63,7 +63,7 @@ static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
return NULL;
pmd = pmd_offset(pud, addr);
- if (pmd_none(*pmd))
+ if (pmd_none(pmdp_get(pmd)))
return NULL;
return pmd;
@@ -97,7 +97,7 @@ static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pmd)
return NULL;
- VM_BUG_ON(pmd_trans_huge(*pmd));
+ VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
return pmd;
}
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 509c6ef8de40..48a2cf56c80e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -241,7 +241,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
page_table_check_pmd_flags(pmd);
- __page_table_check_pmd_clear(mm, *pmdp);
+ __page_table_check_pmd_clear(mm, pmdp_get(pmdp));
if (pmd_user_accessible_page(pmd)) {
page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
pmd_write(pmd));
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index ae2f08ce991b..c3019a160e77 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -86,7 +86,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
do {
again:
next = pmd_addr_end(addr, end);
- if (pmd_none(*pmd)) {
+ if (pmd_none(pmdp_get(pmd))) {
if (ops->pte_hole)
err = ops->pte_hole(addr, next, depth, walk);
if (err)
@@ -112,7 +112,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
* Check this here so we only break down trans_huge
* pages when we _need_ to
*/
- if ((!walk->vma && (pmd_leaf(*pmd) || !pmd_present(*pmd))) ||
+ if ((!walk->vma && (pmd_leaf(pmdp_get(pmd)) || !pmd_present(pmdp_get(pmd)))) ||
walk->action == ACTION_CONTINUE ||
!(ops->pte_entry))
continue;
diff --git a/mm/percpu.c b/mm/percpu.c
index 20d91af8c033..7ee77c0fd5e3 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3208,7 +3208,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
}
pmd = pmd_offset(pud, addr);
- if (!pmd_present(*pmd)) {
+ if (!pmd_present(pmdp_get(pmd))) {
pte_t *new;
new = memblock_alloc(PTE_TABLE_SIZE, PTE_TABLE_SIZE);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index a78a4adf711a..920947bb76cd 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -51,7 +51,7 @@ void pud_clear_bad(pud_t *pud)
*/
void pmd_clear_bad(pmd_t *pmd)
{
- pmd_ERROR(*pmd);
+ pmd_ERROR(pmdp_get(pmd));
pmd_clear(pmd);
}
@@ -110,7 +110,7 @@ int pmdp_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp,
pmd_t entry, int dirty)
{
- int changed = !pmd_same(*pmdp, entry);
+ int changed = !pmd_same(pmdp_get(pmdp), entry);
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
if (changed) {
set_pmd_at(vma->vm_mm, address, pmdp, entry);
@@ -137,10 +137,10 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
- pmd_t pmd;
+ pmd_t pmd, old_pmd = pmdp_get(pmdp);
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
- !pmd_devmap(*pmdp));
+ VM_BUG_ON(pmd_present(old_pmd) && !pmd_trans_huge(old_pmd) &&
+ !pmd_devmap(old_pmd));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return pmd;
@@ -198,8 +198,10 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
- VM_WARN_ON_ONCE(!pmd_present(*pmdp));
- pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+ pmd_t old_pmd = pmdp_get(pmdp);
+
+ VM_WARN_ON_ONCE(!pmd_present(old_pmd));
+ pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(old_pmd));
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return old;
}
@@ -209,7 +211,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
- VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+ VM_WARN_ON_ONCE(!pmd_present(pmdp_get(pmdp)));
return pmdp_invalidate(vma, address, pmdp);
}
#endif
@@ -225,7 +227,7 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
pmd_t pmd;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- VM_BUG_ON(pmd_trans_huge(*pmdp));
+ VM_BUG_ON(pmd_trans_huge(pmdp_get(pmdp)));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
/* collapse entails shooting down ptes not pmd */
diff --git a/mm/ptdump.c b/mm/ptdump.c
index 106e1d66e9f9..e17588a32012 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -99,7 +99,7 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
unsigned long next, struct mm_walk *walk)
{
struct ptdump_state *st = walk->private;
- pmd_t val = READ_ONCE(*pmd);
+ pmd_t val = pmdp_get(pmd);
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
if (pmd_page(val) == virt_to_page(lm_alias(kasan_early_shadow_pte)))
diff --git a/mm/rmap.c b/mm/rmap.c
index 2490e727e2dc..32e4920e419d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1034,9 +1034,9 @@ static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
} else {
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
pmd_t *pmd = pvmw->pmd;
- pmd_t entry;
+ pmd_t entry, old_pmd = pmdp_get(pmd);
- if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
+ if (!pmd_dirty(old_pmd) && !pmd_write(old_pmd))
continue;
flush_cache_range(vma, address,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index edcc7a6b0f6f..c89706e107ce 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -187,7 +187,7 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
{
pmd_t *pmd = pmd_offset(pud, addr);
- if (pmd_none(*pmd)) {
+ if (pmd_none(pmdp_get(pmd))) {
void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
@@ -332,7 +332,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
return -ENOMEM;
pmd = pmd_offset(pud, addr);
- if (pmd_none(READ_ONCE(*pmd))) {
+ if (pmd_none(pmdp_get(pmd))) {
void *p;
p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a0df1e2e155a..1da56cbe5feb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -150,7 +150,7 @@ static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
if (!IS_ALIGNED(phys_addr, PMD_SIZE))
return 0;
- if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+ if (pmd_present(pmdp_get(pmd)) && !pmd_free_pte_page(pmd, addr))
return 0;
return pmd_set_huge(pmd, phys_addr, prot);
@@ -371,7 +371,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
next = pmd_addr_end(addr, end);
cleared = pmd_clear_huge(pmd);
- if (cleared || pmd_bad(*pmd))
+ if (cleared || pmd_bad(pmdp_get(pmd)))
*mask |= PGTBL_PMD_MODIFIED;
if (cleared)
@@ -743,7 +743,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
pgd_t *pgd = pgd_offset_k(addr);
p4d_t *p4d;
pud_t *pud;
- pmd_t *pmd;
+ pmd_t *pmd, old_pmd;
pte_t *ptep, pte;
/*
@@ -776,11 +776,12 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
return NULL;
pmd = pmd_offset(pud, addr);
- if (pmd_none(*pmd))
+ old_pmd = pmdp_get(pmd);
+ if (pmd_none(old_pmd))
return NULL;
- if (pmd_leaf(*pmd))
- return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- if (WARN_ON_ONCE(pmd_bad(*pmd)))
+ if (pmd_leaf(old_pmd))
+ return pmd_page(old_pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ if (WARN_ON_ONCE(pmd_bad(old_pmd)))
return NULL;
ptep = pte_offset_kernel(pmd, addr);
--
2.25.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
2024-09-17 7:31 ` [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries Anshuman Khandual
@ 2024-09-17 10:05 ` Ryan Roberts
2024-09-18 18:57 ` kernel test robot
2024-09-18 19:07 ` kernel test robot
2 siblings, 0 replies; 37+ messages in thread
From: Ryan Roberts @ 2024-09-17 10:05 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, David Hildenbrand, Mike Rapoport (IBM),
Arnd Bergmann, x86, linux-m68k, linux-fsdevel, kasan-dev,
linux-kernel, linux-perf-users, Dimitri Sivanich, Muchun Song,
Andrey Ryabinin, Miaohe Lin, Naoya Horiguchi, Pasha Tatashin,
Dennis Zhou, Tejun Heo, Christoph Lameter, Uladzislau Rezki,
Christoph Hellwig
On 17/09/2024 08:31, Anshuman Khandual wrote:
> Convert PMD accesses to use the pmdp_get() helper, which defaults to
> READ_ONCE() but also provides the platform an opportunity to override it
> when required. This stores the read page table entry value in a local
> variable which can then be used in multiple places thereafter, helping to
> avoid multiple memory load operations as well as possible race conditions.
>
> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: kasan-dev@googlegroups.com
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> drivers/misc/sgi-gru/grufault.c | 7 ++--
> fs/proc/task_mmu.c | 28 +++++++-------
> include/linux/huge_mm.h | 4 +-
> include/linux/mm.h | 2 +-
> include/linux/pgtable.h | 15 ++++----
> mm/gup.c | 14 +++----
> mm/huge_memory.c | 66 +++++++++++++++++----------------
> mm/hugetlb_vmemmap.c | 4 +-
> mm/kasan/init.c | 10 ++---
> mm/kasan/shadow.c | 4 +-
> mm/khugepaged.c | 4 +-
> mm/madvise.c | 6 +--
> mm/memory-failure.c | 6 +--
> mm/memory.c | 25 +++++++------
> mm/mempolicy.c | 4 +-
> mm/migrate.c | 4 +-
> mm/migrate_device.c | 10 ++---
> mm/mlock.c | 6 +--
> mm/mprotect.c | 2 +-
> mm/mremap.c | 4 +-
> mm/page_table_check.c | 2 +-
> mm/pagewalk.c | 4 +-
> mm/percpu.c | 2 +-
> mm/pgtable-generic.c | 20 +++++-----
> mm/ptdump.c | 2 +-
> mm/rmap.c | 4 +-
> mm/sparse-vmemmap.c | 4 +-
> mm/vmalloc.c | 15 ++++----
> 28 files changed, 145 insertions(+), 133 deletions(-)
>
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 3557d78ee47a..804f275ece99 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -208,7 +208,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> pgd_t *pgdp;
> p4d_t *p4dp;
> pud_t *pudp;
> - pmd_t *pmdp;
> + pmd_t *pmdp, old_pmd;
> pte_t pte;
>
> pgdp = pgd_offset(vma->vm_mm, vaddr);
> @@ -224,10 +224,11 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> goto err;
>
> pmdp = pmd_offset(pudp, vaddr);
> - if (unlikely(pmd_none(*pmdp)))
> + old_pmd = pmdp_get(pmdp);
> + if (unlikely(pmd_none(old_pmd)))
> goto err;
> #ifdef CONFIG_X86_64
> - if (unlikely(pmd_leaf(*pmdp)))
> + if (unlikely(pmd_leaf(old_pmd)))
> pte = ptep_get((pte_t *)pmdp);
> else
> #endif
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 5f171ad7b436..f0c63884d008 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -861,12 +861,13 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> struct page *page = NULL;
> bool present = false;
> struct folio *folio;
> + pmd_t old_pmd = pmdp_get(pmd);
nit: This function is not modifying the pmd entry, so I'm not sure the "old" name
is appropriate? (the same comment applies generally).
The new style is to use ptep, pmdp, pudp, ... for pointers and pte, pmd,
pud, ... for actual entries. The old style was to use pte, pmd, pud, ... for
pointers too, and where the actual entry was also needed, ptent was often
used. I don't see any instances of "pmdnt" though. I'm not sure there is a
standard-ish name for when pmd is already taken by the pointer? Perhaps "pmdval"?
nit: where "old" is appropriate, "orig_pmd" looks like the more standard name?
>
> - if (pmd_present(*pmd)) {
> - page = vm_normal_page_pmd(vma, addr, *pmd);
> + if (pmd_present(old_pmd)) {
> + page = vm_normal_page_pmd(vma, addr, old_pmd);
> present = true;
> - } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
> - swp_entry_t entry = pmd_to_swp_entry(*pmd);
> + } else if (unlikely(thp_migration_supported() && is_swap_pmd(old_pmd))) {
> + swp_entry_t entry = pmd_to_swp_entry(old_pmd);
>
> if (is_pfn_swap_entry(entry))
> page = pfn_swap_entry_to_page(entry);
> @@ -883,7 +884,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> else
> mss->file_thp += HPAGE_PMD_SIZE;
>
> - smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd),
> + smaps_account(mss, page, true, pmd_young(old_pmd), pmd_dirty(old_pmd),
> locked, present);
> }
> #else
> @@ -1426,7 +1427,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
> static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
> unsigned long addr, pmd_t *pmdp)
> {
> - pmd_t old, pmd = *pmdp;
> + pmd_t old, pmd = pmdp_get(pmdp);
>
> if (pmd_present(pmd)) {
> /* See comment in change_huge_pmd() */
> @@ -1468,10 +1469,10 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
> goto out;
> }
>
> - if (!pmd_present(*pmd))
> + if (!pmd_present(pmdp_get(pmd)))
> goto out;
>
> - folio = pmd_folio(*pmd);
> + folio = pmd_folio(pmdp_get(pmd));
Why 2 separate gets in this function?
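i.e. a single read shared by both users would do (sketch only):

	pmd_t pmdval = pmdp_get(pmd);

	if (!pmd_present(pmdval))
		goto out;

	folio = pmd_folio(pmdval);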
>
> /* Clear accessed and referenced bits. */
> pmdp_test_and_clear_young(vma, addr, pmd);
> @@ -1769,7 +1770,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
> if (ptl) {
> unsigned int idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;
> u64 flags = 0, frame = 0;
> - pmd_t pmd = *pmdp;
> + pmd_t pmd = pmdp_get(pmdp);
> struct page *page = NULL;
> struct folio *folio = NULL;
>
> @@ -2189,7 +2190,7 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
> static void make_uffd_wp_pmd(struct vm_area_struct *vma,
> unsigned long addr, pmd_t *pmdp)
> {
> - pmd_t old, pmd = *pmdp;
> + pmd_t old, pmd = pmdp_get(pmdp);
>
> if (pmd_present(pmd)) {
> old = pmdp_invalidate_ad(vma, addr, pmdp);
> @@ -2416,7 +2417,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, unsigned long start,
> return -ENOENT;
>
> categories = p->cur_vma_category |
> - pagemap_thp_category(p, vma, start, *pmd);
> + pagemap_thp_category(p, vma, start, pmdp_get(pmd));
>
> if (!pagemap_scan_is_interesting_page(categories, p))
> goto out_unlock;
> @@ -2946,10 +2947,11 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
> ptl = pmd_trans_huge_lock(pmd, vma);
> if (ptl) {
> struct page *page;
> + pmd_t old_pmd = pmdp_get(pmd);
>
> - page = can_gather_numa_stats_pmd(*pmd, vma, addr);
> + page = can_gather_numa_stats_pmd(old_pmd, vma, addr);
> if (page)
> - gather_stats(page, md, pmd_dirty(*pmd),
> + gather_stats(page, md, pmd_dirty(old_pmd),
> HPAGE_PMD_SIZE/PAGE_SIZE);
> spin_unlock(ptl);
> return 0;
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e25d9ebfdf89..38b5de040d02 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -369,7 +369,9 @@ static inline int is_swap_pmd(pmd_t pmd)
> static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
> struct vm_area_struct *vma)
> {
> - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
> + pmd_t old_pmd = pmdp_get(pmd);
> +
> + if (is_swap_pmd(old_pmd) || pmd_trans_huge(old_pmd) || pmd_devmap(old_pmd))
> return __pmd_trans_huge_lock(pmd, vma);
> else
> return NULL;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 147073601716..258e49323306 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2921,7 +2921,7 @@ static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>
> static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
> {
> - return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
> + return ptlock_ptr(page_ptdesc(pmd_page(pmdp_get(pmd))));
> }
>
> static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 547eeae8c43f..ea283ce958a7 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -367,7 +367,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
> unsigned long address,
> pmd_t *pmdp)
> {
> - pmd_t pmd = *pmdp;
> + pmd_t pmd = pmdp_get(pmdp);
> int r = 1;
> if (!pmd_young(pmd))
> r = 0;
> @@ -598,7 +598,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
> unsigned long address,
> pmd_t *pmdp)
> {
> - pmd_t pmd = *pmdp;
> + pmd_t pmd = pmdp_get(pmdp);
>
> pmd_clear(pmdp);
> page_table_check_pmd_clear(mm, pmd);
> @@ -876,7 +876,7 @@ static inline pte_t pte_sw_mkyoung(pte_t pte)
> static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> unsigned long address, pmd_t *pmdp)
> {
> - pmd_t old_pmd = *pmdp;
> + pmd_t old_pmd = pmdp_get(pmdp);
> set_pmd_at(mm, address, pmdp, pmd_wrprotect(old_pmd));
> }
> #else
> @@ -945,7 +945,7 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
> static inline pmd_t generic_pmdp_establish(struct vm_area_struct *vma,
> unsigned long address, pmd_t *pmdp, pmd_t pmd)
> {
> - pmd_t old_pmd = *pmdp;
> + pmd_t old_pmd = pmdp_get(pmdp);
> set_pmd_at(vma->vm_mm, address, pmdp, pmd);
> return old_pmd;
> }
> @@ -1067,7 +1067,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
>
> #define set_pmd_safe(pmdp, pmd) \
> ({ \
> - WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \
> + pmd_t __old = pmdp_get(pmdp); \
> + WARN_ON_ONCE(pmd_present(__old) && !pmd_same(__old, pmd)); \
> set_pmd(pmdp, pmd); \
> })
>
> @@ -1271,9 +1272,9 @@ static inline int pud_none_or_clear_bad(pud_t *pud)
>
> static inline int pmd_none_or_clear_bad(pmd_t *pmd)
> {
> - if (pmd_none(*pmd))
> + if (pmd_none(pmdp_get(pmd)))
> return 1;
> - if (unlikely(pmd_bad(*pmd))) {
> + if (unlikely(pmd_bad(pmdp_get(pmd)))) {
Turn into a single get?
> pmd_clear_bad(pmd);
> return 1;
> }
> diff --git a/mm/gup.c b/mm/gup.c
> index 54d0dc3831fb..aeeac0a54944 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -699,7 +699,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
> struct follow_page_context *ctx)
> {
> struct mm_struct *mm = vma->vm_mm;
> - pmd_t pmdval = *pmd;
> + pmd_t pmdval = pmdp_get(pmd);
> struct page *page;
> int ret;
>
> @@ -714,7 +714,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
> if ((flags & FOLL_DUMP) && is_huge_zero_pmd(pmdval))
> return ERR_PTR(-EFAULT);
>
> - if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
> + if (pmd_protnone(pmdp_get(pmd)) && !gup_can_follow_protnone(vma, flags))
why not pmdval?
> return NULL;
>
> if (!pmd_write(pmdval) && gup_must_unshare(vma, flags, page))
> @@ -957,7 +957,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> return no_page_table(vma, flags, address);
>
> ptl = pmd_lock(mm, pmd);
> - pmdval = *pmd;
> + pmdval = pmdp_get(pmd);
> if (unlikely(!pmd_present(pmdval))) {
> spin_unlock(ptl);
> return no_page_table(vma, flags, address);
> @@ -1120,7 +1120,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
> if (pud_none(*pud))
> return -EFAULT;
> pmd = pmd_offset(pud, address);
> - if (!pmd_present(*pmd))
> + if (!pmd_present(pmdp_get(pmd)))
> return -EFAULT;
> pte = pte_offset_map(pmd, address);
> if (!pte)
> @@ -2898,7 +2898,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
> if (!folio)
> goto pte_unmap;
>
> - if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
> + if (unlikely(pmd_val(pmd) != pmd_val(pmdp_get(pmdp))) ||
> unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
> gup_put_folio(folio, 1, flags);
> goto pte_unmap;
> @@ -3007,7 +3007,7 @@ static int gup_fast_devmap_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> if (!gup_fast_devmap_leaf(fault_pfn, addr, end, flags, pages, nr))
> return 0;
>
> - if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> + if (unlikely(pmd_val(orig) != pmd_val(pmdp_get(pmdp)))) {
> gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
> return 0;
> }
> @@ -3074,7 +3074,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> if (!folio)
> return 0;
>
> - if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> + if (unlikely(pmd_val(orig) != pmd_val(pmdp_get(pmdp)))) {
> gup_put_folio(folio, refs, flags);
> return 0;
> }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 67c86a5d64a6..bb63de935937 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1065,7 +1065,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
> struct folio *zero_folio)
> {
> pmd_t entry;
> - if (!pmd_none(*pmd))
> + if (!pmd_none(pmdp_get(pmd)))
> return;
> entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
> entry = pmd_mkhuge(entry);
> @@ -1144,17 +1144,17 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
> pgtable_t pgtable)
> {
> struct mm_struct *mm = vma->vm_mm;
> - pmd_t entry;
> + pmd_t entry, old_pmd = pmdp_get(pmd);
> spinlock_t *ptl;
>
> ptl = pmd_lock(mm, pmd);
> - if (!pmd_none(*pmd)) {
> + if (!pmd_none(old_pmd)) {
> if (write) {
> - if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
> - WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
> + if (pmd_pfn(old_pmd) != pfn_t_to_pfn(pfn)) {
> + WARN_ON_ONCE(!is_huge_zero_pmd(old_pmd));
> goto out_unlock;
> }
> - entry = pmd_mkyoung(*pmd);
> + entry = pmd_mkyoung(old_pmd);
> entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
> update_mmu_cache_pmd(vma, addr, pmd);
> @@ -1318,7 +1318,7 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> {
> pmd_t _pmd;
>
> - _pmd = pmd_mkyoung(*pmd);
> + _pmd = pmd_mkyoung(pmdp_get(pmd));
> if (write)
> _pmd = pmd_mkdirty(_pmd);
> if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
> @@ -1329,17 +1329,18 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
> {
> - unsigned long pfn = pmd_pfn(*pmd);
> + pmd_t old_pmd = pmdp_get(pmd);
> + unsigned long pfn = pmd_pfn(old_pmd);
> struct mm_struct *mm = vma->vm_mm;
> struct page *page;
> int ret;
>
> assert_spin_locked(pmd_lockptr(mm, pmd));
>
> - if (flags & FOLL_WRITE && !pmd_write(*pmd))
> + if (flags & FOLL_WRITE && !pmd_write(old_pmd))
> return NULL;
>
> - if (pmd_present(*pmd) && pmd_devmap(*pmd))
> + if (pmd_present(old_pmd) && pmd_devmap(old_pmd))
> /* pass */;
> else
> return NULL;
> @@ -1772,7 +1773,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> if (!ptl)
> goto out_unlocked;
>
> - orig_pmd = *pmd;
> + orig_pmd = pmdp_get(pmd);
> if (is_huge_zero_pmd(orig_pmd))
> goto out;
>
> @@ -1990,7 +1991,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> {
> struct mm_struct *mm = vma->vm_mm;
> spinlock_t *ptl;
> - pmd_t oldpmd, entry;
> + pmd_t oldpmd, entry, old_pmd;
You already have oldpmd. Why do you need to add old_pmd?
> bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
> bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
> bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
> @@ -2005,13 +2006,14 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> if (!ptl)
> return 0;
>
> + old_pmd = pmdp_get(pmd);
> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> - if (is_swap_pmd(*pmd)) {
> - swp_entry_t entry = pmd_to_swp_entry(*pmd);
> + if (is_swap_pmd(old_pmd)) {
> + swp_entry_t entry = pmd_to_swp_entry(old_pmd);
> struct folio *folio = pfn_swap_entry_folio(entry);
> pmd_t newpmd;
>
> - VM_BUG_ON(!is_pmd_migration_entry(*pmd));
> + VM_BUG_ON(!is_pmd_migration_entry(old_pmd));
> if (is_writable_migration_entry(entry)) {
> /*
> * A protection check is difficult so
> @@ -2022,17 +2024,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> else
> entry = make_readable_migration_entry(swp_offset(entry));
> newpmd = swp_entry_to_pmd(entry);
> - if (pmd_swp_soft_dirty(*pmd))
> + if (pmd_swp_soft_dirty(old_pmd))
> newpmd = pmd_swp_mksoft_dirty(newpmd);
> } else {
> - newpmd = *pmd;
> + newpmd = old_pmd;
> }
>
> if (uffd_wp)
> newpmd = pmd_swp_mkuffd_wp(newpmd);
> else if (uffd_wp_resolve)
> newpmd = pmd_swp_clear_uffd_wp(newpmd);
> - if (!pmd_same(*pmd, newpmd))
> + if (!pmd_same(old_pmd, newpmd))
> set_pmd_at(mm, addr, pmd, newpmd);
> goto unlock;
> }
> @@ -2046,13 +2048,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> * data is likely to be read-cached on the local CPU and
> * local/remote hits to the zero page are not interesting.
> */
> - if (is_huge_zero_pmd(*pmd))
> + if (is_huge_zero_pmd(old_pmd))
> goto unlock;
>
> - if (pmd_protnone(*pmd))
> + if (pmd_protnone(old_pmd))
> goto unlock;
>
> - folio = pmd_folio(*pmd);
> + folio = pmd_folio(old_pmd);
> toptier = node_is_toptier(folio_nid(folio));
> /*
> * Skip scanning top tier node if normal numa
> @@ -2266,8 +2268,8 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
> {
> spinlock_t *ptl;
> ptl = pmd_lock(vma->vm_mm, pmd);
> - if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
> - pmd_devmap(*pmd)))
> + if (likely(is_swap_pmd(pmdp_get(pmd)) || pmd_trans_huge(pmdp_get(pmd)) ||
> + pmd_devmap(pmdp_get(pmd))))
Why not a single get here?
> return ptl;
> spin_unlock(ptl);
> return NULL;
> @@ -2404,8 +2406,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
> VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
> VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
> - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
> - && !pmd_devmap(*pmd));
> + VM_BUG_ON(!is_pmd_migration_entry(pmdp_get(pmd)) && !pmd_trans_huge(pmdp_get(pmd))
> + && !pmd_devmap(pmdp_get(pmd)));
>
> count_vm_event(THP_SPLIT_PMD);
>
> @@ -2438,7 +2440,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> return;
> }
>
> - if (is_huge_zero_pmd(*pmd)) {
> + if (is_huge_zero_pmd(pmdp_get(pmd))) {
> /*
> * FIXME: Do we want to invalidate secondary mmu by calling
> * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below
> @@ -2451,11 +2453,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> return __split_huge_zero_page_pmd(vma, haddr, pmd);
> }
>
> - pmd_migration = is_pmd_migration_entry(*pmd);
> + pmd_migration = is_pmd_migration_entry(pmdp_get(pmd));
Why not a single get in this function?
> if (unlikely(pmd_migration)) {
> swp_entry_t entry;
>
> - old_pmd = *pmd;
> + old_pmd = pmdp_get(pmd);
> entry = pmd_to_swp_entry(old_pmd);
> page = pfn_swap_entry_to_page(entry);
> write = is_writable_migration_entry(entry);
> @@ -2620,9 +2622,9 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
> * require a folio to check the PMD against. Otherwise, there
> * is a risk of replacing the wrong folio.
> */
> - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
> - is_pmd_migration_entry(*pmd)) {
> - if (folio && folio != pmd_folio(*pmd))
> + if (pmd_trans_huge(pmdp_get(pmd)) || pmd_devmap(pmdp_get(pmd)) ||
> + is_pmd_migration_entry(pmdp_get(pmd))) {
> + if (folio && folio != pmd_folio(pmdp_get(pmd)))
Why not a single get?
> return;
> __split_huge_pmd_locked(vma, pmd, address, freeze);
> }
> @@ -2719,7 +2721,7 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> {
> struct mm_struct *mm = vma->vm_mm;
> int ref_count, map_count;
> - pmd_t orig_pmd = *pmdp;
> + pmd_t orig_pmd = pmdp_get(pmdp);
>
> if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
> return false;
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 0c3f56b3578e..9deb82654d5b 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -70,7 +70,7 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
> }
>
> spin_lock(&init_mm.page_table_lock);
> - if (likely(pmd_leaf(*pmd))) {
> + if (likely(pmd_leaf(pmdp_get(pmd)))) {
> /*
> * Higher order allocations from buddy allocator must be able to
> * be treated as indepdenent small pages (as they can be freed
> @@ -104,7 +104,7 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
> walk->action = ACTION_CONTINUE;
>
> spin_lock(&init_mm.page_table_lock);
> - head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
> + head = pmd_leaf(pmdp_get(pmd)) ? pmd_page(pmdp_get(pmd)) : NULL;
single get? (there are a bunch more instances of this issue below; I'll stop
highlighting them now). Obviously you need to re-get if you take the PTL in
between, or if you know it's likely to have changed. But there are a number of
cases here where multiple gets are technically unnecessary.
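In this one, for instance, a single read would do (sketch; assumes the entry
cannot change between the two uses while init_mm's page_table_lock is held):

	pmd_t pmdval = pmdp_get(pmd);

	head = pmd_leaf(pmdval) ? pmd_page(pmdval) : NULL;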
> /*
> * Due to HugeTLB alignment requirements and the vmemmap
> * pages being at the start of the hotplugged memory
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index 89895f38f722..4418bcdcb2aa 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -121,7 +121,7 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> continue;
> }
>
> - if (pmd_none(*pmd)) {
> + if (pmd_none(pmdp_get(pmd))) {
> pte_t *p;
>
> if (slab_is_available())
> @@ -300,7 +300,7 @@ static void kasan_free_pte(pte_t *pte_start, pmd_t *pmd)
> return;
> }
>
> - pte_free_kernel(&init_mm, (pte_t *)page_to_virt(pmd_page(*pmd)));
> + pte_free_kernel(&init_mm, (pte_t *)page_to_virt(pmd_page(pmdp_get(pmd))));
> pmd_clear(pmd);
> }
>
> @@ -311,7 +311,7 @@ static void kasan_free_pmd(pmd_t *pmd_start, pud_t *pud)
>
> for (i = 0; i < PTRS_PER_PMD; i++) {
> pmd = pmd_start + i;
> - if (!pmd_none(*pmd))
> + if (!pmd_none(pmdp_get(pmd)))
> return;
> }
>
> @@ -381,10 +381,10 @@ static void kasan_remove_pmd_table(pmd_t *pmd, unsigned long addr,
>
> next = pmd_addr_end(addr, end);
>
> - if (!pmd_present(*pmd))
> + if (!pmd_present(pmdp_get(pmd)))
> continue;
>
> - if (kasan_pte_table(*pmd)) {
> + if (kasan_pte_table(pmdp_get(pmd))) {
> if (IS_ALIGNED(addr, PMD_SIZE) &&
> IS_ALIGNED(next, PMD_SIZE)) {
> pmd_clear(pmd);
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index d6210ca48dda..aec16a7236f7 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -202,9 +202,9 @@ static bool shadow_mapped(unsigned long addr)
> if (pud_leaf(*pud))
> return true;
> pmd = pmd_offset(pud, addr);
> - if (pmd_none(*pmd))
> + if (pmd_none(pmdp_get(pmd)))
> return false;
> - if (pmd_leaf(*pmd))
> + if (pmd_leaf(pmdp_get(pmd)))
> return true;
> pte = pte_offset_kernel(pmd, addr);
> return !pte_none(ptep_get(pte));
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cdd1d8655a76..793da996313f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1192,7 +1192,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> if (pte)
> pte_unmap(pte);
> spin_lock(pmd_ptl);
> - BUG_ON(!pmd_none(*pmd));
> + BUG_ON(!pmd_none(pmdp_get(pmd)));
> /*
> * We can only use set_pmd_at when establishing
> * hugepmds and never for establishing regular pmds that
> @@ -1229,7 +1229,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
>
> spin_lock(pmd_ptl);
> - BUG_ON(!pmd_none(*pmd));
> + BUG_ON(!pmd_none(pmdp_get(pmd)));
> folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> folio_add_lru_vma(folio, vma);
> pgtable_trans_huge_deposit(mm, pmd, pgtable);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 89089d84f8df..382c55d2ec94 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -357,7 +357,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> !can_do_file_pageout(vma);
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (pmd_trans_huge(*pmd)) {
> + if (pmd_trans_huge(pmdp_get(pmd))) {
> pmd_t orig_pmd;
> unsigned long next = pmd_addr_end(addr, end);
>
> @@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> if (!ptl)
> return 0;
>
> - orig_pmd = *pmd;
> + orig_pmd = pmdp_get(pmd);
> if (is_huge_zero_pmd(orig_pmd))
> goto huge_unlock;
>
> @@ -655,7 +655,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> int nr, max_nr;
>
> next = pmd_addr_end(addr, end);
> - if (pmd_trans_huge(*pmd))
> + if (pmd_trans_huge(pmdp_get(pmd)))
> if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
> return 0;
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7066fc84f351..305dbef3cc4d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -422,9 +422,9 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
> if (pud_devmap(*pud))
> return PUD_SHIFT;
> pmd = pmd_offset(pud, address);
> - if (!pmd_present(*pmd))
> + if (!pmd_present(pmdp_get(pmd)))
> return 0;
> - if (pmd_devmap(*pmd))
> + if (pmd_devmap(pmdp_get(pmd)))
> return PMD_SHIFT;
> pte = pte_offset_map(pmd, address);
> if (!pte)
> @@ -775,7 +775,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
> static int check_hwpoisoned_pmd_entry(pmd_t *pmdp, unsigned long addr,
> struct hwpoison_walk *hwp)
> {
> - pmd_t pmd = *pmdp;
> + pmd_t pmd = pmdp_get(pmdp);
> unsigned long pfn;
> unsigned long hwpoison_vaddr;
>
> diff --git a/mm/memory.c b/mm/memory.c
> index ebfc9768f801..5520e1f6a1b9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -189,7 +189,7 @@ void mm_trace_rss_stat(struct mm_struct *mm, int member)
> static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> unsigned long addr)
> {
> - pgtable_t token = pmd_pgtable(*pmd);
> + pgtable_t token = pmd_pgtable(pmdp_get(pmd));
> pmd_clear(pmd);
> pte_free_tlb(tlb, token, addr);
> mm_dec_nr_ptes(tlb->mm);
> @@ -421,7 +421,7 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
> {
> spinlock_t *ptl = pmd_lock(mm, pmd);
>
> - if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
> + if (likely(pmd_none(pmdp_get(pmd)))) { /* Has another populated it ? */
> mm_inc_nr_ptes(mm);
> /*
> * Ensure all pte setup (eg. pte page lock and page clearing) are
> @@ -462,7 +462,7 @@ int __pte_alloc_kernel(pmd_t *pmd)
> return -ENOMEM;
>
> spin_lock(&init_mm.page_table_lock);
> - if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
> + if (likely(pmd_none(pmdp_get(pmd)))) { /* Has another populated it ? */
> smp_wmb(); /* See comment in pmd_install() */
> pmd_populate_kernel(&init_mm, pmd, new);
> new = NULL;
> @@ -1710,7 +1710,8 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
> pmd = pmd_offset(pud, addr);
> do {
> next = pmd_addr_end(addr, end);
> - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
> + if (is_swap_pmd(pmdp_get(pmd)) || pmd_trans_huge(pmdp_get(pmd)) ||
> + pmd_devmap(pmdp_get(pmd))) {
> if (next - addr != HPAGE_PMD_SIZE)
> __split_huge_pmd(vma, pmd, addr, false, NULL);
> else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
> @@ -1720,7 +1721,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
> /* fall through */
> } else if (details && details->single_folio &&
> folio_test_pmd_mappable(details->single_folio) &&
> - next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
> + next - addr == HPAGE_PMD_SIZE && pmd_none(pmdp_get(pmd))) {
> spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
> /*
> * Take and drop THP pmd lock so that we cannot return
> @@ -1729,7 +1730,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
> */
> spin_unlock(ptl);
> }
> - if (pmd_none(*pmd)) {
> + if (pmd_none(pmdp_get(pmd))) {
> addr = next;
> continue;
> }
> @@ -1975,7 +1976,7 @@ static pmd_t *walk_to_pmd(struct mm_struct *mm, unsigned long addr)
> if (!pmd)
> return NULL;
>
> - VM_BUG_ON(pmd_trans_huge(*pmd));
> + VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
> return pmd;
> }
>
> @@ -2577,7 +2578,7 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
> pmd = pmd_alloc(mm, pud, addr);
> if (!pmd)
> return -ENOMEM;
> - VM_BUG_ON(pmd_trans_huge(*pmd));
> + VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
> do {
> next = pmd_addr_end(addr, end);
> err = remap_pte_range(mm, pmd, addr, next,
> @@ -2846,11 +2847,11 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
> }
> do {
> next = pmd_addr_end(addr, end);
> - if (pmd_none(*pmd) && !create)
> + if (pmd_none(pmdp_get(pmd)) && !create)
> continue;
> - if (WARN_ON_ONCE(pmd_leaf(*pmd)))
> + if (WARN_ON_ONCE(pmd_leaf(pmdp_get(pmd))))
> return -EINVAL;
> - if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
> + if (!pmd_none(pmdp_get(pmd)) && WARN_ON_ONCE(pmd_bad(pmdp_get(pmd)))) {
> if (!create)
> continue;
> pmd_clear_bad(pmd);
> @@ -6167,7 +6168,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
> goto out;
>
> pmd = pmd_offset(pud, address);
> - VM_BUG_ON(pmd_trans_huge(*pmd));
> + VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
>
> ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
> if (!ptep)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index b858e22b259d..03f2df44b07f 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -505,11 +505,11 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk)
> struct folio *folio;
> struct queue_pages *qp = walk->private;
>
> - if (unlikely(is_pmd_migration_entry(*pmd))) {
> + if (unlikely(is_pmd_migration_entry(pmdp_get(pmd)))) {
> qp->nr_failed++;
> return;
> }
> - folio = pmd_folio(*pmd);
> + folio = pmd_folio(pmdp_get(pmd));
> if (is_huge_zero_folio(folio)) {
> walk->action = ACTION_CONTINUE;
> return;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 923ea80ba744..a1dd5c8f88dd 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -369,9 +369,9 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
> spinlock_t *ptl;
>
> ptl = pmd_lock(mm, pmd);
> - if (!is_pmd_migration_entry(*pmd))
> + if (!is_pmd_migration_entry(pmdp_get(pmd)))
> goto unlock;
> - migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), ptl);
> + migration_entry_wait_on_locked(pmd_to_swp_entry(pmdp_get(pmd)), ptl);
> return;
> unlock:
> spin_unlock(ptl);
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 6d66dc1c6ffa..3a08cef6cd39 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -67,19 +67,19 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> pte_t *ptep;
>
> again:
> - if (pmd_none(*pmdp))
> + if (pmd_none(pmdp_get(pmdp)))
> return migrate_vma_collect_hole(start, end, -1, walk);
>
> - if (pmd_trans_huge(*pmdp)) {
> + if (pmd_trans_huge(pmdp_get(pmdp))) {
> struct folio *folio;
>
> ptl = pmd_lock(mm, pmdp);
> - if (unlikely(!pmd_trans_huge(*pmdp))) {
> + if (unlikely(!pmd_trans_huge(pmdp_get(pmdp)))) {
> spin_unlock(ptl);
> goto again;
> }
>
> - folio = pmd_folio(*pmdp);
> + folio = pmd_folio(pmdp_get(pmdp));
> if (is_huge_zero_folio(folio)) {
> spin_unlock(ptl);
> split_huge_pmd(vma, pmdp, addr);
> @@ -596,7 +596,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
> pmdp = pmd_alloc(mm, pudp, addr);
> if (!pmdp)
> goto abort;
> - if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
> + if (pmd_trans_huge(pmdp_get(pmdp)) || pmd_devmap(pmdp_get(pmdp)))
> goto abort;
> if (pte_alloc(mm, pmdp))
> goto abort;
> diff --git a/mm/mlock.c b/mm/mlock.c
> index e3e3dc2b2956..c3c479e9d0f8 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -363,11 +363,11 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>
> ptl = pmd_trans_huge_lock(pmd, vma);
> if (ptl) {
> - if (!pmd_present(*pmd))
> + if (!pmd_present(pmdp_get(pmd)))
> goto out;
> - if (is_huge_zero_pmd(*pmd))
> + if (is_huge_zero_pmd(pmdp_get(pmd)))
> goto out;
> - folio = pmd_folio(*pmd);
> + folio = pmd_folio(pmdp_get(pmd));
> if (vma->vm_flags & VM_LOCKED)
> mlock_folio(folio);
> else
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 222ab434da54..121fb448b0db 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -381,7 +381,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
> break;
> }
>
> - if (pmd_none(*pmd))
> + if (pmd_none(pmdp_get(pmd)))
> goto next;
>
> /* invoke the mmu notifier if the pmd is populated */
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e7ae140fc640..d42ac62bd34e 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -63,7 +63,7 @@ static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
> return NULL;
>
> pmd = pmd_offset(pud, addr);
> - if (pmd_none(*pmd))
> + if (pmd_none(pmdp_get(pmd)))
> return NULL;
>
> return pmd;
> @@ -97,7 +97,7 @@ static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> if (!pmd)
> return NULL;
>
> - VM_BUG_ON(pmd_trans_huge(*pmd));
> + VM_BUG_ON(pmd_trans_huge(pmdp_get(pmd)));
>
> return pmd;
> }
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 509c6ef8de40..48a2cf56c80e 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -241,7 +241,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
>
> page_table_check_pmd_flags(pmd);
>
> - __page_table_check_pmd_clear(mm, *pmdp);
> + __page_table_check_pmd_clear(mm, pmdp_get(pmdp));
> if (pmd_user_accessible_page(pmd)) {
> page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
> pmd_write(pmd));
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index ae2f08ce991b..c3019a160e77 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -86,7 +86,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> do {
> again:
> next = pmd_addr_end(addr, end);
> - if (pmd_none(*pmd)) {
> + if (pmd_none(pmdp_get(pmd))) {
> if (ops->pte_hole)
> err = ops->pte_hole(addr, next, depth, walk);
> if (err)
> @@ -112,7 +112,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> * Check this here so we only break down trans_huge
> * pages when we _need_ to
> */
> - if ((!walk->vma && (pmd_leaf(*pmd) || !pmd_present(*pmd))) ||
> + if ((!walk->vma && (pmd_leaf(pmdp_get(pmd)) || !pmd_present(pmdp_get(pmd)))) ||
> walk->action == ACTION_CONTINUE ||
> !(ops->pte_entry))
> continue;
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 20d91af8c033..7ee77c0fd5e3 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -3208,7 +3208,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
> }
>
> pmd = pmd_offset(pud, addr);
> - if (!pmd_present(*pmd)) {
> + if (!pmd_present(pmdp_get(pmd))) {
> pte_t *new;
>
> new = memblock_alloc(PTE_TABLE_SIZE, PTE_TABLE_SIZE);
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index a78a4adf711a..920947bb76cd 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -51,7 +51,7 @@ void pud_clear_bad(pud_t *pud)
> */
> void pmd_clear_bad(pmd_t *pmd)
> {
> - pmd_ERROR(*pmd);
> + pmd_ERROR(pmdp_get(pmd));
> pmd_clear(pmd);
> }
>
> @@ -110,7 +110,7 @@ int pmdp_set_access_flags(struct vm_area_struct *vma,
> unsigned long address, pmd_t *pmdp,
> pmd_t entry, int dirty)
> {
> - int changed = !pmd_same(*pmdp, entry);
> + int changed = !pmd_same(pmdp_get(pmdp), entry);
> VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> if (changed) {
> set_pmd_at(vma->vm_mm, address, pmdp, entry);
> @@ -137,10 +137,10 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
> pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
> pmd_t *pmdp)
> {
> - pmd_t pmd;
> + pmd_t pmd, old_pmd = pmdp_get(pmdp);
> VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> - VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
> - !pmd_devmap(*pmdp));
> + VM_BUG_ON(pmd_present(old_pmd) && !pmd_trans_huge(old_pmd) &&
> + !pmd_devmap(old_pmd));
> pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> return pmd;
> @@ -198,8 +198,10 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
> pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> pmd_t *pmdp)
> {
> - VM_WARN_ON_ONCE(!pmd_present(*pmdp));
> - pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
> + pmd_t old_pmd = pmdp_get(pmdp);
> +
> + VM_WARN_ON_ONCE(!pmd_present(old_pmd));
> + pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(old_pmd));
> flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> return old;
> }
> @@ -209,7 +211,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
> pmd_t *pmdp)
> {
> - VM_WARN_ON_ONCE(!pmd_present(*pmdp));
> + VM_WARN_ON_ONCE(!pmd_present(pmdp_get(pmdp)));
> return pmdp_invalidate(vma, address, pmdp);
> }
> #endif
> @@ -225,7 +227,7 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
> pmd_t pmd;
>
> VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> - VM_BUG_ON(pmd_trans_huge(*pmdp));
> + VM_BUG_ON(pmd_trans_huge(pmdp_get(pmdp)));
> pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
>
> /* collapse entails shooting down ptes not pmd */
> diff --git a/mm/ptdump.c b/mm/ptdump.c
> index 106e1d66e9f9..e17588a32012 100644
> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -99,7 +99,7 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
> unsigned long next, struct mm_walk *walk)
> {
> struct ptdump_state *st = walk->private;
> - pmd_t val = READ_ONCE(*pmd);
> + pmd_t val = pmdp_get(pmd);
>
> #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> if (pmd_page(val) == virt_to_page(lm_alias(kasan_early_shadow_pte)))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2490e727e2dc..32e4920e419d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1034,9 +1034,9 @@ static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
> } else {
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> pmd_t *pmd = pvmw->pmd;
> - pmd_t entry;
> + pmd_t entry, old_pmd = pmdp_get(pmd);
>
> - if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
> + if (!pmd_dirty(old_pmd) && !pmd_write(old_pmd))
> continue;
>
> flush_cache_range(vma, address,
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index edcc7a6b0f6f..c89706e107ce 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -187,7 +187,7 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
> pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> {
> pmd_t *pmd = pmd_offset(pud, addr);
> - if (pmd_none(*pmd)) {
> + if (pmd_none(pmdp_get(pmd))) {
> void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> if (!p)
> return NULL;
> @@ -332,7 +332,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
> return -ENOMEM;
>
> pmd = pmd_offset(pud, addr);
> - if (pmd_none(READ_ONCE(*pmd))) {
> + if (pmd_none(pmdp_get(pmd))) {
> void *p;
>
> p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a0df1e2e155a..1da56cbe5feb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -150,7 +150,7 @@ static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
> if (!IS_ALIGNED(phys_addr, PMD_SIZE))
> return 0;
>
> - if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> + if (pmd_present(pmdp_get(pmd)) && !pmd_free_pte_page(pmd, addr))
> return 0;
>
> return pmd_set_huge(pmd, phys_addr, prot);
> @@ -371,7 +371,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> next = pmd_addr_end(addr, end);
>
> cleared = pmd_clear_huge(pmd);
> - if (cleared || pmd_bad(*pmd))
> + if (cleared || pmd_bad(pmdp_get(pmd)))
> *mask |= PGTBL_PMD_MODIFIED;
>
> if (cleared)
> @@ -743,7 +743,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
> pgd_t *pgd = pgd_offset_k(addr);
> p4d_t *p4d;
> pud_t *pud;
> - pmd_t *pmd;
> + pmd_t *pmd, old_pmd;
> pte_t *ptep, pte;
>
> /*
> @@ -776,11 +776,12 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
> return NULL;
>
> pmd = pmd_offset(pud, addr);
> - if (pmd_none(*pmd))
> + old_pmd = pmdp_get(pmd);
> + if (pmd_none(old_pmd))
> return NULL;
> - if (pmd_leaf(*pmd))
> - return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> - if (WARN_ON_ONCE(pmd_bad(*pmd)))
> + if (pmd_leaf(old_pmd))
> + return pmd_page(old_pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> + if (WARN_ON_ONCE(pmd_bad(old_pmd)))
> return NULL;
>
> ptep = pte_offset_kernel(pmd, addr);
I tried applying the compiler trick that I used when doing the conversion for
the ptes. This series doesn't include the arm64 arch changes, so I can't get it
to compile cleanly, but even with that noise, I can see a couple of spots that
were missed:
mm.h:
#define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd))
huge_mm.h:
#define split_huge_pmd(__vma, __pmd, __address) \
do { \
pmd_handle_t ____pmd = (__pmd); \
if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \
|| pmd_devmap(*____pmd)) \
__split_huge_pmd(__vma, __pmd, __address, \
false, NULL); \
} while (0)
filemap.c::filemap_map_pmd():
if (pmd_trans_huge(*vmf->pmd)) {
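For illustration, the converted forms would presumably read the entry through
the helper instead, matching what the series does elsewhere (a sketch, not a
hunk from this posting):

#define pte_alloc(mm, pmd) (unlikely(pmd_none(pmdp_get(pmd))) && __pte_alloc(mm, pmd))

	if (pmd_trans_huge(pmdp_get(vmf->pmd))) {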
I think there will likely be a bunch more. I suggest adding in the arch changes,
then trying this trick yourself. Steps:
- add "typedef void* pmd_handle_t;" in pgtable-types.h
- s/pmd_t */pmd_handle_t /g (note the space at the end of the replacement)
- make allyesconfig
- make -s -j`nproc` all
This will flag up all attempts to dereference the type. Some (in arch code) will
be valid so you will need to manually cast back to (pmd_t *) to silence them.
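To make the mechanism concrete, here is a minimal sketch of why the trick
works (the function name is illustrative, not from the series): with an opaque
handle, any leftover raw dereference fails to compile, while converted call
sites cast back and go through the helper.

typedef void *pmd_handle_t;	/* opaque stand-in for pmd_t *, step 1 above */

static inline int example(pmd_handle_t pmdp)
{
	/* return pmd_none(*pmdp); would no longer compile: void * dereference */
	return pmd_none(pmdp_get((pmd_t *)pmdp));	/* valid after cast back */
}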
With the ptes, there was some code like this:
foo()
{
pte_t *ptep, pte;
...
}
When you do the find/replace, you end up with:
foo()
{
pte_handle_t ptep, pte;
...
}
Which is wrong and needs to be fixed up to:
foo()
{
pte_handle_t ptep;
pte_t pte;
...
}
So there will likely be some noise like that to fix up. But I was eventually able
to get it to compile for the ptes, which proved I had caught and converted
everything correctly.
Thanks,
Ryan
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
2024-09-17 7:31 ` [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries Anshuman Khandual
2024-09-17 10:05 ` Ryan Roberts
@ 2024-09-18 18:57 ` kernel test robot
2024-09-19 7:21 ` Anshuman Khandual
2024-09-18 19:07 ` kernel test robot
2 siblings, 1 reply; 37+ messages in thread
From: kernel test robot @ 2024-09-18 18:57 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: llvm, oe-kbuild-all, Anshuman Khandual, Andrew Morton,
Linux Memory Management List, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Muchun Song, Andrey Ryabinin, Miaohe Lin,
Naoya Horiguchi, Pasha Tatashin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
Hi Anshuman,
kernel test robot noticed the following build errors:
[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11]
[cannot apply to akpm-mm/mm-everything next-20240918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20240917073117.1531207-5-anshuman.khandual%40arm.com
patch subject: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20240919/202409190205.YJ5gtx3T-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190205.YJ5gtx3T-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409190205.YJ5gtx3T-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from mm/pgtable-generic.c:10:
In file included from include/linux/pagemap.h:11:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
548 | val = __raw_readb(PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
561 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
| ^
In file included from mm/pgtable-generic.c:10:
In file included from include/linux/pagemap.h:11:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
574 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
| ^
In file included from mm/pgtable-generic.c:10:
In file included from include/linux/pagemap.h:11:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
585 | __raw_writeb(value, PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
595 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
605 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
693 | readsb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
701 | readsw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
709 | readsl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
718 | writesb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
727 | writesw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
736 | writesl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
>> mm/pgtable-generic.c:54:2: error: cannot take the address of an rvalue of type 'pgd_t'
54 | pmd_ERROR(pmdp_get(pmd));
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/asm-generic/pgtable-nopmd.h:36:28: note: expanded from macro 'pmd_ERROR'
36 | #define pmd_ERROR(pmd) (pud_ERROR((pmd).pud))
| ^~~~~~~~~~~~~~~~~~~~
include/asm-generic/pgtable-nopud.h:32:28: note: expanded from macro 'pud_ERROR'
32 | #define pud_ERROR(pud) (p4d_ERROR((pud).p4d))
| ^~~~~~~~~~~~~~~~~~~~
include/asm-generic/pgtable-nop4d.h:25:28: note: expanded from macro 'p4d_ERROR'
25 | #define p4d_ERROR(p4d) (pgd_ERROR((p4d).pgd))
| ^~~~~~~~~~~~~~~~~~~~
arch/um/include/asm/pgtable-2level.h:31:67: note: expanded from macro 'pgd_ERROR'
31 | printk("%s:%d: bad pgd %p(%08lx).\n", __FILE__, __LINE__, &(e), \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
32 | pgd_val(e))
| ~~~~~~~~~~~
include/linux/printk.h:465:60: note: expanded from macro 'printk'
465 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
include/linux/printk.h:437:19: note: expanded from macro 'printk_index_wrap'
437 | _p_func(_fmt, ##__VA_ARGS__); \
| ^~~~~~~~~~~
12 warnings and 1 error generated.
vim +/pgd_t +54 mm/pgtable-generic.c
46
47 /*
48 * Note that the pmd variant below can't be stub'ed out just as for p4d/pud
49 * above. pmd folding is special and typically pmd_* macros refer to upper
50 * level even when folded
51 */
52 void pmd_clear_bad(pmd_t *pmd)
53 {
> 54 pmd_ERROR(pmdp_get(pmd));
55 pmd_clear(pmd);
56 }
57
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
2024-09-18 18:57 ` kernel test robot
@ 2024-09-19 7:21 ` Anshuman Khandual
0 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2024-09-19 7:21 UTC (permalink / raw)
To: kernel test robot, linux-mm
Cc: llvm, oe-kbuild-all, Andrew Morton, David Hildenbrand,
Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Muchun Song, Andrey Ryabinin, Miaohe Lin,
Naoya Horiguchi, Pasha Tatashin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
On 9/19/24 00:27, kernel test robot wrote:
> Hi Anshuman,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on char-misc/char-misc-testing]
> [also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11]
> [cannot apply to akpm-mm/mm-everything next-20240918]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331
> base: char-misc/char-misc-testing
> patch link: https://lore.kernel.org/r/20240917073117.1531207-5-anshuman.khandual%40arm.com
> patch subject: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
> config: um-allnoconfig (https://download.01.org/0day-ci/archive/20240919/202409190205.YJ5gtx3T-lkp@intel.com/config)
> compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190205.YJ5gtx3T-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202409190205.YJ5gtx3T-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from mm/pgtable-generic.c:10:
> In file included from include/linux/pagemap.h:11:
> In file included from include/linux/highmem.h:12:
> In file included from include/linux/hardirq.h:11:
> In file included from arch/um/include/asm/hardirq.h:5:
> In file included from include/asm-generic/hardirq.h:17:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:14:
> In file included from arch/um/include/asm/io.h:24:
> include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 548 | val = __raw_readb(PCI_IOBASE + addr);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 561 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
> | ~~~~~~~~~~ ^
> include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
> 37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
> | ^
> In file included from mm/pgtable-generic.c:10:
> In file included from include/linux/pagemap.h:11:
> In file included from include/linux/highmem.h:12:
> In file included from include/linux/hardirq.h:11:
> In file included from arch/um/include/asm/hardirq.h:5:
> In file included from include/asm-generic/hardirq.h:17:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:14:
> In file included from arch/um/include/asm/io.h:24:
> include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 574 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
> | ~~~~~~~~~~ ^
> include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
> 35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
> | ^
> In file included from mm/pgtable-generic.c:10:
> In file included from include/linux/pagemap.h:11:
> In file included from include/linux/highmem.h:12:
> In file included from include/linux/hardirq.h:11:
> In file included from arch/um/include/asm/hardirq.h:5:
> In file included from include/asm-generic/hardirq.h:17:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:14:
> In file included from arch/um/include/asm/io.h:24:
> include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 585 | __raw_writeb(value, PCI_IOBASE + addr);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 595 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 605 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 693 | readsb(PCI_IOBASE + addr, buffer, count);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 701 | readsw(PCI_IOBASE + addr, buffer, count);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 709 | readsl(PCI_IOBASE + addr, buffer, count);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 718 | writesb(PCI_IOBASE + addr, buffer, count);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 727 | writesw(PCI_IOBASE + addr, buffer, count);
> | ~~~~~~~~~~ ^
> include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> 736 | writesl(PCI_IOBASE + addr, buffer, count);
I'm not sure the above warnings are actually caused by this patch; the robot
only flags the pmd_ERROR() error below as new.
> | ~~~~~~~~~~ ^
>>> mm/pgtable-generic.c:54:2: error: cannot take the address of an rvalue of type 'pgd_t'
> 54 | pmd_ERROR(pmdp_get(pmd));
> | ^~~~~~~~~~~~~~~~~~~~~~~~
> include/asm-generic/pgtable-nopmd.h:36:28: note: expanded from macro 'pmd_ERROR'
> 36 | #define pmd_ERROR(pmd) (pud_ERROR((pmd).pud))
> | ^~~~~~~~~~~~~~~~~~~~
> include/asm-generic/pgtable-nopud.h:32:28: note: expanded from macro 'pud_ERROR'
> 32 | #define pud_ERROR(pud) (p4d_ERROR((pud).p4d))
> | ^~~~~~~~~~~~~~~~~~~~
> include/asm-generic/pgtable-nop4d.h:25:28: note: expanded from macro 'p4d_ERROR'
> 25 | #define p4d_ERROR(p4d) (pgd_ERROR((p4d).pgd))
> | ^~~~~~~~~~~~~~~~~~~~
> arch/um/include/asm/pgtable-2level.h:31:67: note: expanded from macro 'pgd_ERROR'
> 31 | printk("%s:%d: bad pgd %p(%08lx).\n", __FILE__, __LINE__, &(e), \
> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
> 32 | pgd_val(e))
> | ~~~~~~~~~~~
> include/linux/printk.h:465:60: note: expanded from macro 'printk'
> 465 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
> include/linux/printk.h:437:19: note: expanded from macro 'printk_index_wrap'
> 437 | _p_func(_fmt, ##__VA_ARGS__); \
> | ^~~~~~~~~~~
> 12 warnings and 1 error generated.
>
>
> vim +/pgd_t +54 mm/pgtable-generic.c
>
> 46
> 47 /*
> 48 * Note that the pmd variant below can't be stub'ed out just as for p4d/pud
> 49 * above. pmd folding is special and typically pmd_* macros refer to upper
> 50 * level even when folded
> 51 */
> 52 void pmd_clear_bad(pmd_t *pmd)
> 53 {
> > 54 pmd_ERROR(pmdp_get(pmd));
> 55 pmd_clear(pmd);
> 56 }
> 57
>
But the above build error can be fixed with the following change.
diff --git a/arch/um/include/asm/pgtable-3level.h b/arch/um/include/asm/pgtable-3level.h
index 8a5032ec231f..f442c1e3156a 100644
--- a/arch/um/include/asm/pgtable-3level.h
+++ b/arch/um/include/asm/pgtable-3level.h
@@ -43,13 +43,13 @@
#define USER_PTRS_PER_PGD ((TASK_SIZE + (PGDIR_SIZE - 1)) / PGDIR_SIZE)
#define pte_ERROR(e) \
- printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), \
+ printk("%s:%d: bad pte (%016lx).\n", __FILE__, __LINE__, \
pte_val(e))
#define pmd_ERROR(e) \
- printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), \
+ printk("%s:%d: bad pmd (%016lx).\n", __FILE__, __LINE__, \
pmd_val(e))
#define pgd_ERROR(e) \
- printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), \
+ printk("%s:%d: bad pgd (%016lx).\n", __FILE__, __LINE__, \
pgd_val(e))
#define pud_none(x) (!(pud_val(x) & ~_PAGE_NEWPAGE))
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
2024-09-17 7:31 ` [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries Anshuman Khandual
2024-09-17 10:05 ` Ryan Roberts
2024-09-18 18:57 ` kernel test robot
@ 2024-09-18 19:07 ` kernel test robot
2024-09-19 7:12 ` Anshuman Khandual
2 siblings, 1 reply; 37+ messages in thread
From: kernel test robot @ 2024-09-18 19:07 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: oe-kbuild-all, Anshuman Khandual, Andrew Morton,
Linux Memory Management List, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Muchun Song, Andrey Ryabinin, Miaohe Lin,
Naoya Horiguchi, Pasha Tatashin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
Hi Anshuman,
kernel test robot noticed the following build errors:
[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11]
[cannot apply to akpm-mm/mm-everything next-20240918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20240917073117.1531207-5-anshuman.khandual%40arm.com
patch subject: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20240919/202409190244.JcrD4CwD-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190244.JcrD4CwD-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409190244.JcrD4CwD-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/asm-generic/bug.h:22,
from arch/openrisc/include/asm/bug.h:5,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:6,
from include/linux/pagemap.h:8,
from mm/pgtable-generic.c:10:
mm/pgtable-generic.c: In function 'pmd_clear_bad':
>> arch/openrisc/include/asm/pgtable.h:369:36: error: lvalue required as unary '&' operand
369 | __FILE__, __LINE__, &(e), pgd_val(e))
| ^
include/linux/printk.h:437:33: note: in definition of macro 'printk_index_wrap'
437 | _p_func(_fmt, ##__VA_ARGS__); \
| ^~~~~~~~~~~
arch/openrisc/include/asm/pgtable.h:368:9: note: in expansion of macro 'printk'
368 | printk(KERN_ERR "%s:%d: bad pgd %p(%08lx).\n", \
| ^~~~~~
include/asm-generic/pgtable-nop4d.h:25:50: note: in expansion of macro 'pgd_ERROR'
25 | #define p4d_ERROR(p4d) (pgd_ERROR((p4d).pgd))
| ^~~~~~~~~
include/asm-generic/pgtable-nopud.h:32:50: note: in expansion of macro 'p4d_ERROR'
32 | #define pud_ERROR(pud) (p4d_ERROR((pud).p4d))
| ^~~~~~~~~
include/asm-generic/pgtable-nopmd.h:36:50: note: in expansion of macro 'pud_ERROR'
36 | #define pmd_ERROR(pmd) (pud_ERROR((pmd).pud))
| ^~~~~~~~~
mm/pgtable-generic.c:54:9: note: in expansion of macro 'pmd_ERROR'
54 | pmd_ERROR(pmdp_get(pmd));
| ^~~~~~~~~
vim +369 arch/openrisc/include/asm/pgtable.h
61e85e367535a7 Jonas Bonn 2011-06-04 363
61e85e367535a7 Jonas Bonn 2011-06-04 364 #define pte_ERROR(e) \
61e85e367535a7 Jonas Bonn 2011-06-04 365 printk(KERN_ERR "%s:%d: bad pte %p(%08lx).\n", \
61e85e367535a7 Jonas Bonn 2011-06-04 366 __FILE__, __LINE__, &(e), pte_val(e))
61e85e367535a7 Jonas Bonn 2011-06-04 367 #define pgd_ERROR(e) \
61e85e367535a7 Jonas Bonn 2011-06-04 368 printk(KERN_ERR "%s:%d: bad pgd %p(%08lx).\n", \
61e85e367535a7 Jonas Bonn 2011-06-04 @369 __FILE__, __LINE__, &(e), pgd_val(e))
61e85e367535a7 Jonas Bonn 2011-06-04 370
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
2024-09-18 19:07 ` kernel test robot
@ 2024-09-19 7:12 ` Anshuman Khandual
0 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2024-09-19 7:12 UTC (permalink / raw)
To: kernel test robot, linux-mm
Cc: oe-kbuild-all, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Muchun Song, Andrey Ryabinin, Miaohe Lin,
Naoya Horiguchi, Pasha Tatashin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
On 9/19/24 00:37, kernel test robot wrote:
> Hi Anshuman,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on char-misc/char-misc-testing]
> [also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11]
> [cannot apply to akpm-mm/mm-everything next-20240918]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331
> base: char-misc/char-misc-testing
> patch link: https://lore.kernel.org/r/20240917073117.1531207-5-anshuman.khandual%40arm.com
> patch subject: [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries
> config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20240919/202409190244.JcrD4CwD-lkp@intel.com/config)
> compiler: or1k-linux-gcc (GCC) 14.1.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190244.JcrD4CwD-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202409190244.JcrD4CwD-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from include/asm-generic/bug.h:22,
> from arch/openrisc/include/asm/bug.h:5,
> from include/linux/bug.h:5,
> from include/linux/mmdebug.h:5,
> from include/linux/mm.h:6,
> from include/linux/pagemap.h:8,
> from mm/pgtable-generic.c:10:
> mm/pgtable-generic.c: In function 'pmd_clear_bad':
>>> arch/openrisc/include/asm/pgtable.h:369:36: error: lvalue required as unary '&' operand
> 369 | __FILE__, __LINE__, &(e), pgd_val(e))
> | ^
> include/linux/printk.h:437:33: note: in definition of macro 'printk_index_wrap'
> 437 | _p_func(_fmt, ##__VA_ARGS__); \
> | ^~~~~~~~~~~
> arch/openrisc/include/asm/pgtable.h:368:9: note: in expansion of macro 'printk'
> 368 | printk(KERN_ERR "%s:%d: bad pgd %p(%08lx).\n", \
> | ^~~~~~
> include/asm-generic/pgtable-nop4d.h:25:50: note: in expansion of macro 'pgd_ERROR'
> 25 | #define p4d_ERROR(p4d) (pgd_ERROR((p4d).pgd))
> | ^~~~~~~~~
> include/asm-generic/pgtable-nopud.h:32:50: note: in expansion of macro 'p4d_ERROR'
> 32 | #define pud_ERROR(pud) (p4d_ERROR((pud).p4d))
> | ^~~~~~~~~
> include/asm-generic/pgtable-nopmd.h:36:50: note: in expansion of macro 'pud_ERROR'
> 36 | #define pmd_ERROR(pmd) (pud_ERROR((pmd).pud))
> | ^~~~~~~~~
> mm/pgtable-generic.c:54:9: note: in expansion of macro 'pmd_ERROR'
> 54 | pmd_ERROR(pmdp_get(pmd));
> | ^~~~~~~~~
>
>
> vim +369 arch/openrisc/include/asm/pgtable.h
>
> 61e85e367535a7 Jonas Bonn 2011-06-04 363
> 61e85e367535a7 Jonas Bonn 2011-06-04 364 #define pte_ERROR(e) \
> 61e85e367535a7 Jonas Bonn 2011-06-04 365 printk(KERN_ERR "%s:%d: bad pte %p(%08lx).\n", \
> 61e85e367535a7 Jonas Bonn 2011-06-04 366 __FILE__, __LINE__, &(e), pte_val(e))
> 61e85e367535a7 Jonas Bonn 2011-06-04 367 #define pgd_ERROR(e) \
> 61e85e367535a7 Jonas Bonn 2011-06-04 368 printk(KERN_ERR "%s:%d: bad pgd %p(%08lx).\n", \
> 61e85e367535a7 Jonas Bonn 2011-06-04 @369 __FILE__, __LINE__, &(e), pgd_val(e))
> 61e85e367535a7 Jonas Bonn 2011-06-04 370
>
This build failure can be fixed by dropping the address output from the
pxd_ERROR() helpers, as is being done for the x86 platform. A similar
fix is also required for the UM architecture.
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 60c6ce7ff2dc..831efb71ab54 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -362,11 +362,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#define pfn_pte(pfn, prot) __pte((((pfn) << PAGE_SHIFT)) | pgprot_val(prot))
#define pte_ERROR(e) \
- printk(KERN_ERR "%s:%d: bad pte %p(%08lx).\n", \
- __FILE__, __LINE__, &(e), pte_val(e))
+ printk(KERN_ERR "%s:%d: bad pte (%08lx).\n", \
+ __FILE__, __LINE__, pte_val(e))
#define pgd_ERROR(e) \
- printk(KERN_ERR "%s:%d: bad pgd %p(%08lx).\n", \
- __FILE__, __LINE__, &(e), pgd_val(e))
+ printk(KERN_ERR "%s:%d: bad pgd (%08lx).\n", \
+ __FILE__, __LINE__, pgd_val(e))
extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; /* defined in head.S */
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH V2 5/7] mm: Use pudp_get() for accessing PUD entries
2024-09-17 7:31 [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries Anshuman Khandual
` (3 preceding siblings ...)
2024-09-17 7:31 ` [PATCH V2 4/7] mm: Use pmdp_get() for accessing PMD entries Anshuman Khandual
@ 2024-09-17 7:31 ` Anshuman Khandual
2024-09-17 7:31 ` [PATCH V2 6/7] mm: Use p4dp_get() for accessing P4D entries Anshuman Khandual
` (2 subsequent siblings)
7 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2024-09-17 7:31 UTC (permalink / raw)
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jérôme Glisse, Muchun Song,
Andrey Ryabinin, Miaohe Lin, Naoya Horiguchi, Pasha Tatashin
Convert PUD accesses to use the pudp_get() helper, which defaults to
READ_ONCE() but also gives the platform an opportunity to override it when
required. The page table entry value is read once into a local variable that
can then be reused in multiple places, which avoids repeated memory loads as
well as possible race conditions.
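For reference, the generic fallback for this helper in include/linux/pgtable.h
looks roughly like the following (included only for context; platforms can
provide a stronger definition of their own):

#ifndef pudp_get
static inline pud_t pudp_get(pud_t *pudp)
{
	return READ_ONCE(*pudp);
}
#endif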
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-perf-users@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
drivers/misc/sgi-gru/grufault.c | 2 +-
fs/userfaultfd.c | 2 +-
include/linux/huge_mm.h | 2 +-
include/linux/mm.h | 2 +-
include/linux/pgtable.h | 13 ++++++++-----
kernel/events/core.c | 2 +-
mm/gup.c | 12 ++++++------
mm/hmm.c | 2 +-
mm/huge_memory.c | 24 +++++++++++++++---------
mm/hugetlb.c | 6 +++---
mm/kasan/init.c | 10 +++++-----
mm/kasan/shadow.c | 4 ++--
mm/mapping_dirty_helpers.c | 2 +-
mm/memory-failure.c | 4 ++--
mm/memory.c | 14 +++++++-------
mm/page_table_check.c | 2 +-
mm/page_vma_mapped.c | 2 +-
mm/pagewalk.c | 6 +++---
mm/percpu.c | 2 +-
mm/pgalloc-track.h | 2 +-
mm/pgtable-generic.c | 6 +++---
mm/ptdump.c | 4 ++--
mm/rmap.c | 2 +-
mm/sparse-vmemmap.c | 2 +-
mm/vmalloc.c | 15 ++++++++-------
mm/vmscan.c | 4 ++--
26 files changed, 79 insertions(+), 69 deletions(-)
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 804f275ece99..95d479d5e40f 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -220,7 +220,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
goto err;
pudp = pud_offset(p4dp, vaddr);
- if (unlikely(pud_none(*pudp)))
+ if (unlikely(pud_none(pudp_get(pudp))))
goto err;
pmdp = pmd_offset(pudp, vaddr);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 27a3e9285fbf..00719a0f688c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -310,7 +310,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
if (!p4d_present(*p4d))
goto out;
pud = pud_offset(p4d, address);
- if (!pud_present(*pud))
+ if (!pud_present(pudp_get(pud)))
goto out;
pmd = pmd_offset(pud, address);
again:
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 38b5de040d02..66a19622d95b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -379,7 +379,7 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
struct vm_area_struct *vma)
{
- if (pud_trans_huge(*pud) || pud_devmap(*pud))
+ if (pud_trans_huge(pudp_get(pud)) || pud_devmap(pudp_get(pud)))
return __pud_trans_huge_lock(pud, vma);
else
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 258e49323306..1bb1599b5779 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2832,7 +2832,7 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, p4d_t *p4d,
static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
{
- return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
+ return (unlikely(pud_none(pudp_get(pud))) && __pmd_alloc(mm, pud, address)) ?
NULL: pmd_offset(pud, address);
}
#endif /* CONFIG_MMU */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index ea283ce958a7..eb993ef0946f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -611,7 +611,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
unsigned long address,
pud_t *pudp)
{
- pud_t pud = *pudp;
+ pud_t pud = pudp_get(pudp);
pud_clear(pudp);
page_table_check_pud_clear(mm, pud);
@@ -893,7 +893,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
static inline void pudp_set_wrprotect(struct mm_struct *mm,
unsigned long address, pud_t *pudp)
{
- pud_t old_pud = *pudp;
+ pud_t old_pud = pudp_get(pudp);
set_pud_at(mm, address, pudp, pud_wrprotect(old_pud));
}
@@ -1074,7 +1074,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
#define set_pud_safe(pudp, pud) \
({ \
- WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \
+ pud_t __old = pudp_get(pudp); \
+ WARN_ON_ONCE(pud_present(__old) && !pud_same(__old, pud)); \
set_pud(pudp, pud); \
})
@@ -1261,9 +1262,11 @@ static inline int p4d_none_or_clear_bad(p4d_t *p4d)
static inline int pud_none_or_clear_bad(pud_t *pud)
{
- if (pud_none(*pud))
+ pud_t old_pud = pudp_get(pud);
+
+ if (pud_none(old_pud))
return 1;
- if (unlikely(pud_bad(*pud))) {
+ if (unlikely(pud_bad(old_pud))) {
pud_clear_bad(pud);
return 1;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8a6c6bbcd658..35e2f2789246 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7619,7 +7619,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr)
return p4d_leaf_size(p4d);
pudp = pud_offset_lockless(p4dp, p4d, addr);
- pud = READ_ONCE(*pudp);
+ pud = pudp_get(pudp);
if (!pud_present(pud))
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index aeeac0a54944..300fc7eb306c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -606,7 +606,7 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
{
struct mm_struct *mm = vma->vm_mm;
struct page *page;
- pud_t pud = *pudp;
+ pud_t pud = pudp_get(pudp);
unsigned long pfn = pud_pfn(pud);
int ret;
@@ -989,7 +989,7 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
pudp = pud_offset(p4dp, address);
- pud = READ_ONCE(*pudp);
+ pud = pudp_get(pudp);
if (!pud_present(pud))
return no_page_table(vma, flags, address);
if (pud_leaf(pud)) {
@@ -1117,7 +1117,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
if (p4d_none(*p4d))
return -EFAULT;
pud = pud_offset(p4d, address);
- if (pud_none(*pud))
+ if (pud_none(pudp_get(pud)))
return -EFAULT;
pmd = pmd_offset(pud, address);
if (!pmd_present(pmdp_get(pmd)))
@@ -3025,7 +3025,7 @@ static int gup_fast_devmap_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
if (!gup_fast_devmap_leaf(fault_pfn, addr, end, flags, pages, nr))
return 0;
- if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+ if (unlikely(pud_val(orig) != pud_val(pudp_get(pudp)))) {
gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
return 0;
}
@@ -3118,7 +3118,7 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
if (!folio)
return 0;
- if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+ if (unlikely(pud_val(orig) != pud_val(pudp_get(pudp)))) {
gup_put_folio(folio, refs, flags);
return 0;
}
@@ -3219,7 +3219,7 @@ static int gup_fast_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr,
pudp = pud_offset_lockless(p4dp, p4d, addr);
do {
- pud_t pud = READ_ONCE(*pudp);
+ pud_t pud = pudp_get(pudp);
next = pud_addr_end(addr, end);
if (unlikely(!pud_present(pud)))
diff --git a/mm/hmm.c b/mm/hmm.c
index 7e0229ae4a5a..c1b093d670b8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -423,7 +423,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
/* Normally we don't want to split the huge page */
walk->action = ACTION_CONTINUE;
- pud = READ_ONCE(*pudp);
+ pud = pudp_get(pudp);
if (!pud_present(pud)) {
spin_unlock(ptl);
return hmm_vma_walk_hole(start, end, -1, walk);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bb63de935937..69e1400a51ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1243,17 +1243,18 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
{
struct mm_struct *mm = vma->vm_mm;
pgprot_t prot = vma->vm_page_prot;
- pud_t entry;
+ pud_t entry, old_pud;
spinlock_t *ptl;
ptl = pud_lock(mm, pud);
- if (!pud_none(*pud)) {
+ old_pud = pudp_get(pud);
+ if (!pud_none(old_pud)) {
if (write) {
- if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) {
- WARN_ON_ONCE(!is_huge_zero_pud(*pud));
+ if (pud_pfn(old_pud) != pfn_t_to_pfn(pfn)) {
+ WARN_ON_ONCE(!is_huge_zero_pud(old_pud));
goto out_unlock;
}
- entry = pud_mkyoung(*pud);
+ entry = pud_mkyoung(old_pud);
entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
if (pudp_set_access_flags(vma, addr, pud, entry, 1))
update_mmu_cache_pud(vma, addr, pud);
@@ -1476,7 +1477,7 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
{
pud_t _pud;
- _pud = pud_mkyoung(*pud);
+ _pud = pud_mkyoung(pudp_get(pud));
if (write)
_pud = pud_mkdirty(_pud);
if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK,
@@ -2284,9 +2285,10 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma)
{
spinlock_t *ptl;
+ pud_t old_pud = pudp_get(pud);
ptl = pud_lock(vma->vm_mm, pud);
- if (likely(pud_trans_huge(*pud) || pud_devmap(*pud)))
+ if (likely(pud_trans_huge(old_pud) || pud_devmap(old_pud)))
return ptl;
spin_unlock(ptl);
return NULL;
@@ -2317,10 +2319,12 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
unsigned long haddr)
{
+ pud_t old_pud = pudp_get(pud);
+
VM_BUG_ON(haddr & ~HPAGE_PUD_MASK);
VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma);
- VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud));
+ VM_BUG_ON(!pud_trans_huge(old_pud) && !pud_devmap(old_pud));
count_vm_event(THP_SPLIT_PUD);
@@ -2332,13 +2336,15 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
{
spinlock_t *ptl;
struct mmu_notifier_range range;
+ pud_t old_pud;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
address & HPAGE_PUD_MASK,
(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptl = pud_lock(vma->vm_mm, pud);
- if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud)))
+ old_pud = pudp_get(pud);
+ if (unlikely(!pud_trans_huge(old_pud) && !pud_devmap(old_pud)))
goto out;
__split_huge_pud_locked(vma, pud, range.start);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aaf508be0a2b..a3820242b01e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7328,7 +7328,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
goto out;
spin_lock(&mm->page_table_lock);
- if (pud_none(*pud)) {
+ if (pud_none(pudp_get(pud))) {
pud_populate(mm, pud,
(pmd_t *)((unsigned long)spte & PAGE_MASK));
mm_inc_nr_pmds(mm);
@@ -7417,7 +7417,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
pte = (pte_t *)pud;
} else {
BUG_ON(sz != PMD_SIZE);
- if (want_pmd_share(vma, addr) && pud_none(*pud))
+ if (want_pmd_share(vma, addr) && pud_none(pudp_get(pud)))
pte = huge_pmd_share(mm, vma, addr, pud);
else
pte = (pte_t *)pmd_alloc(mm, pud, addr);
@@ -7461,7 +7461,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
if (sz == PUD_SIZE)
/* must be pud huge, non-present or none */
return (pte_t *)pud;
- if (!pud_present(*pud))
+ if (!pud_present(pudp_get(pud)))
return NULL;
/* must have a valid entry and size to go further */
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 4418bcdcb2aa..f4cf519443e1 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -162,7 +162,7 @@ static int __ref zero_pud_populate(p4d_t *p4d, unsigned long addr,
continue;
}
- if (pud_none(*pud)) {
+ if (pud_none(pudp_get(pud))) {
pmd_t *p;
if (slab_is_available()) {
@@ -315,7 +315,7 @@ static void kasan_free_pmd(pmd_t *pmd_start, pud_t *pud)
return;
}
- pmd_free(&init_mm, (pmd_t *)page_to_virt(pud_page(*pud)));
+ pmd_free(&init_mm, (pmd_t *)page_to_virt(pud_page(pudp_get(pud))));
pud_clear(pud);
}
@@ -326,7 +326,7 @@ static void kasan_free_pud(pud_t *pud_start, p4d_t *p4d)
for (i = 0; i < PTRS_PER_PUD; i++) {
pud = pud_start + i;
- if (!pud_none(*pud))
+ if (!pud_none(pudp_get(pud)))
return;
}
@@ -407,10 +407,10 @@ static void kasan_remove_pud_table(pud_t *pud, unsigned long addr,
next = pud_addr_end(addr, end);
- if (!pud_present(*pud))
+ if (!pud_present(pudp_get(pud)))
continue;
- if (kasan_pmd_table(*pud)) {
+ if (kasan_pmd_table(pudp_get(pud))) {
if (IS_ALIGNED(addr, PUD_SIZE) &&
IS_ALIGNED(next, PUD_SIZE)) {
pud_clear(pud);
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index aec16a7236f7..dbd8164c75f1 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -197,9 +197,9 @@ static bool shadow_mapped(unsigned long addr)
if (p4d_none(*p4d))
return false;
pud = pud_offset(p4d, addr);
- if (pud_none(*pud))
+ if (pud_none(pudp_get(pud)))
return false;
- if (pud_leaf(*pud))
+ if (pud_leaf(pudp_get(pud)))
return true;
pmd = pmd_offset(pud, addr);
if (pmd_none(pmdp_get(pmd)))
diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c
index 2f8829b3541a..c556cc4e3480 100644
--- a/mm/mapping_dirty_helpers.c
+++ b/mm/mapping_dirty_helpers.c
@@ -149,7 +149,7 @@ static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
- pud_t pudval = READ_ONCE(*pud);
+ pud_t pudval = pudp_get(pud);
/* Do not split a huge pud */
if (pud_trans_huge(pudval) || pud_devmap(pudval)) {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 305dbef3cc4d..fbb63401fb51 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -417,9 +417,9 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
if (!p4d_present(*p4d))
return 0;
pud = pud_offset(p4d, address);
- if (!pud_present(*pud))
+ if (!pud_present(pudp_get(pud)))
return 0;
- if (pud_devmap(*pud))
+ if (pud_devmap(pudp_get(pud)))
return PUD_SHIFT;
pmd = pmd_offset(pud, address);
if (!pmd_present(pmdp_get(pmd)))
diff --git a/mm/memory.c b/mm/memory.c
index 5520e1f6a1b9..801750e4337c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1753,7 +1753,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
pud = pud_offset(p4d, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
+ if (pud_trans_huge(pudp_get(pud)) || pud_devmap(pudp_get(pud))) {
if (next - addr != HPAGE_PUD_SIZE) {
mmap_assert_locked(tlb->mm);
split_huge_pud(vma, pud, addr);
@@ -2836,7 +2836,7 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
unsigned long next;
int err = 0;
- BUG_ON(pud_leaf(*pud));
+ BUG_ON(pud_leaf(pudp_get(pud)));
if (create) {
pmd = pmd_alloc_track(mm, pud, addr, mask);
@@ -2883,11 +2883,11 @@ static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
}
do {
next = pud_addr_end(addr, end);
- if (pud_none(*pud) && !create)
+ if (pud_none(pudp_get(pud)) && !create)
continue;
- if (WARN_ON_ONCE(pud_leaf(*pud)))
+ if (WARN_ON_ONCE(pud_leaf(pudp_get(pud))))
return -EINVAL;
- if (!pud_none(*pud) && WARN_ON_ONCE(pud_bad(*pud))) {
+ if (!pud_none(pudp_get(pud)) && WARN_ON_ONCE(pud_bad(pudp_get(pud)))) {
if (!create)
continue;
pud_clear_bad(pud);
@@ -6099,7 +6099,7 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
return -ENOMEM;
ptl = pud_lock(mm, pud);
- if (!pud_present(*pud)) {
+ if (!pud_present(pudp_get(pud))) {
mm_inc_nr_pmds(mm);
smp_wmb(); /* See comment in pmd_install() */
pud_populate(mm, pud, new);
@@ -6164,7 +6164,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
goto out;
pud = pud_offset(p4d, address);
- if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+ if (pud_none(pudp_get(pud)) || unlikely(pud_bad(pudp_get(pud))))
goto out;
pmd = pmd_offset(pud, address);
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 48a2cf56c80e..2a22d098b0b1 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -254,7 +254,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud)
if (&init_mm == mm)
return;
- __page_table_check_pud_clear(mm, *pudp);
+ __page_table_check_pud_clear(mm, pudp_get(pudp));
if (pud_user_accessible_page(pud)) {
page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
pud_write(pud));
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae5cc42aa208..511266307771 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -222,7 +222,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
continue;
}
pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud)) {
+ if (!pud_present(pudp_get(pud))) {
step_forward(pvmw, PUD_SIZE);
continue;
}
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c3019a160e77..1d32c6da1a0d 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -145,7 +145,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
do {
again:
next = pud_addr_end(addr, end);
- if (pud_none(*pud)) {
+ if (pud_none(pudp_get(pud))) {
if (ops->pte_hole)
err = ops->pte_hole(addr, next, depth, walk);
if (err)
@@ -163,14 +163,14 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
if (walk->action == ACTION_AGAIN)
goto again;
- if ((!walk->vma && (pud_leaf(*pud) || !pud_present(*pud))) ||
+ if ((!walk->vma && (pud_leaf(pudp_get(pud)) || !pud_present(pudp_get(pud)))) ||
walk->action == ACTION_CONTINUE ||
!(ops->pmd_entry || ops->pte_entry))
continue;
if (walk->vma)
split_huge_pud(walk->vma, pud, addr);
- if (pud_none(*pud))
+ if (pud_none(pudp_get(pud)))
goto again;
err = walk_pmd_range(pud, addr, next, walk);
diff --git a/mm/percpu.c b/mm/percpu.c
index 7ee77c0fd5e3..5f32164b04a2 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3200,7 +3200,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
}
pud = pud_offset(p4d, addr);
- if (pud_none(*pud)) {
+ if (pud_none(pudp_get(pud))) {
pmd = memblock_alloc(PMD_TABLE_SIZE, PMD_TABLE_SIZE);
if (!pmd)
goto err_alloc;
diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h
index e9e879de8649..0f6b809431a3 100644
--- a/mm/pgalloc-track.h
+++ b/mm/pgalloc-track.h
@@ -33,7 +33,7 @@ static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
unsigned long address,
pgtbl_mod_mask *mod_mask)
{
- if (unlikely(pud_none(*pud))) {
+ if (unlikely(pud_none(pudp_get(pud)))) {
if (__pmd_alloc(mm, pud, address))
return NULL;
*mod_mask |= PGTBL_PUD_MODIFIED;
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 920947bb76cd..e09e3f920f7a 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -39,7 +39,7 @@ void p4d_clear_bad(p4d_t *p4d)
#ifndef __PAGETABLE_PUD_FOLDED
void pud_clear_bad(pud_t *pud)
{
- pud_ERROR(*pud);
+ pud_ERROR(pudp_get(pud));
pud_clear(pud);
}
#endif
@@ -150,10 +150,10 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
pud_t *pudp)
{
- pud_t pud;
+ pud_t pud, old_pud = pudp_get(pudp);
VM_BUG_ON(address & ~HPAGE_PUD_MASK);
- VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp));
+ VM_BUG_ON(!pud_trans_huge(old_pud) && !pud_devmap(old_pud));
pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp);
flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
return pud;
diff --git a/mm/ptdump.c b/mm/ptdump.c
index e17588a32012..32ae8e829329 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -30,7 +30,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
unsigned long next, struct mm_walk *walk)
{
struct ptdump_state *st = walk->private;
- pgd_t val = READ_ONCE(*pgd);
+ pgd_t val = pgdp_get(pgd);
#if CONFIG_PGTABLE_LEVELS > 4 && \
(defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
@@ -76,7 +76,7 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
unsigned long next, struct mm_walk *walk)
{
struct ptdump_state *st = walk->private;
- pud_t val = READ_ONCE(*pud);
+ pud_t val = pudp_get(pud);
#if CONFIG_PGTABLE_LEVELS > 2 && \
(defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
diff --git a/mm/rmap.c b/mm/rmap.c
index 32e4920e419d..81f1946653e0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -817,7 +817,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
goto out;
pud = pud_offset(p4d, address);
- if (!pud_present(*pud))
+ if (!pud_present(pudp_get(pud)))
goto out;
pmd = pmd_offset(pud, address);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c89706e107ce..d8ea64ec665f 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -203,7 +203,7 @@ void __weak __meminit pmd_init(void *addr)
pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node)
{
pud_t *pud = pud_offset(p4d, addr);
- if (pud_none(*pud)) {
+ if (pud_none(pudp_get(pud))) {
void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1da56cbe5feb..05292d998122 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -200,7 +200,7 @@ static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
if (!IS_ALIGNED(phys_addr, PUD_SIZE))
return 0;
- if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
+ if (pud_present(pudp_get(pud)) && !pud_free_pmd_page(pud, addr))
return 0;
return pud_set_huge(pud, phys_addr, prot);
@@ -396,7 +396,7 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
next = pud_addr_end(addr, end);
cleared = pud_clear_huge(pud);
- if (cleared || pud_bad(*pud))
+ if (cleared || pud_bad(pudp_get(pud)))
*mask |= PGTBL_PUD_MODIFIED;
if (cleared)
@@ -742,7 +742,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
struct page *page = NULL;
pgd_t *pgd = pgd_offset_k(addr);
p4d_t *p4d;
- pud_t *pud;
+ pud_t *pud, old_pud;
pmd_t *pmd, old_pmd;
pte_t *ptep, pte;
@@ -768,11 +768,12 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
return NULL;
pud = pud_offset(p4d, addr);
- if (pud_none(*pud))
+ old_pud = pudp_get(pud);
+ if (pud_none(old_pud))
return NULL;
- if (pud_leaf(*pud))
- return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- if (WARN_ON_ONCE(pud_bad(*pud)))
+ if (pud_leaf(old_pud))
+ return pud_page(old_pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+ if (WARN_ON_ONCE(pud_bad(old_pud)))
return NULL;
pmd = pmd_offset(pud, addr);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd489c1af228..04b03e6c3095 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3421,7 +3421,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
DEFINE_MAX_SEQ(walk->lruvec);
int old_gen, new_gen = lru_gen_from_seq(max_seq);
- VM_WARN_ON_ONCE(pud_leaf(*pud));
+ VM_WARN_ON_ONCE(pud_leaf(pudp_get(pud)));
/* try to batch at most 1+MIN_LRU_BATCH+1 entries */
if (*first == -1) {
@@ -3501,7 +3501,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
struct lru_gen_mm_walk *walk = args->private;
struct lru_gen_mm_state *mm_state = get_mm_state(walk->lruvec);
- VM_WARN_ON_ONCE(pud_leaf(*pud));
+ VM_WARN_ON_ONCE(pud_leaf(pudp_get(pud)));
/*
* Finish an entire PMD in two passes: the first only reaches to PTE
--
2.25.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH V2 6/7] mm: Use p4dp_get() for accessing P4D entries
2024-09-17 7:31 [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries Anshuman Khandual
` (4 preceding siblings ...)
2024-09-17 7:31 ` [PATCH V2 5/7] mm: Use pudp_get() for accessing PUD entries Anshuman Khandual
@ 2024-09-17 7:31 ` Anshuman Khandual
2024-09-17 7:31 ` [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries Anshuman Khandual
2024-09-25 10:05 ` [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries Christophe Leroy
7 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2024-09-17 7:31 UTC (permalink / raw)
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
Convert P4D accesses to use the p4dp_get() helper, which defaults to
READ_ONCE() but also gives the platform an opportunity to override it when
required. The page table entry value is read once into a local variable that
can then be reused in multiple places, which avoids repeated memory loads as
well as possible race conditions.
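As a short illustration of the single-read pattern (this mirrors the
p4d_none_or_clear_bad() hunk below rather than adding anything new), a
converted call site reads the entry once and reuses the local value:

	p4d_t old_p4d = p4dp_get(p4d);	/* one READ_ONCE() of the entry */

	if (p4d_none(old_p4d))
		return 1;
	if (unlikely(p4d_bad(old_p4d))) {
		p4d_clear_bad(p4d);
		return 1;
	}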
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
cc: Christoph Lameter <cl@linux.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: kasan-dev@googlegroups.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
drivers/misc/sgi-gru/grufault.c | 2 +-
fs/userfaultfd.c | 2 +-
include/linux/pgtable.h | 9 ++++++---
kernel/events/core.c | 2 +-
mm/gup.c | 6 +++---
mm/hugetlb.c | 2 +-
mm/kasan/init.c | 10 +++++-----
mm/kasan/shadow.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory.c | 16 +++++++++-------
mm/page_vma_mapped.c | 2 +-
mm/percpu.c | 2 +-
mm/pgalloc-track.h | 2 +-
mm/pgtable-generic.c | 2 +-
mm/ptdump.c | 2 +-
mm/rmap.c | 2 +-
mm/sparse-vmemmap.c | 2 +-
mm/vmalloc.c | 15 ++++++++-------
mm/vmscan.c | 2 +-
19 files changed, 45 insertions(+), 39 deletions(-)
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 95d479d5e40f..fcaceac60659 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -216,7 +216,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
goto err;
p4dp = p4d_offset(pgdp, vaddr);
- if (unlikely(p4d_none(*p4dp)))
+ if (unlikely(p4d_none(p4dp_get(p4dp))))
goto err;
pudp = pud_offset(p4dp, vaddr);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 00719a0f688c..4044e15cdfd9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -307,7 +307,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
if (!pgd_present(*pgd))
goto out;
p4d = p4d_offset(pgd, address);
- if (!p4d_present(*p4d))
+ if (!p4d_present(p4dp_get(p4d)))
goto out;
pud = pud_offset(p4d, address);
if (!pud_present(pudp_get(pud)))
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index eb993ef0946f..689cd5a32157 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1081,7 +1081,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
#define set_p4d_safe(p4dp, p4d) \
({ \
- WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \
+ p4d_t __old = p4dp_get(p4dp); \
+ WARN_ON_ONCE(p4d_present(__old) && !p4d_same(__old, p4d)); \
set_p4d(p4dp, p4d); \
})
@@ -1251,9 +1252,11 @@ static inline int pgd_none_or_clear_bad(pgd_t *pgd)
static inline int p4d_none_or_clear_bad(p4d_t *p4d)
{
- if (p4d_none(*p4d))
+ p4d_t old_p4d = p4dp_get(p4d);
+
+ if (p4d_none(old_p4d))
return 1;
- if (unlikely(p4d_bad(*p4d))) {
+ if (unlikely(p4d_bad(old_p4d))) {
p4d_clear_bad(p4d);
return 1;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 35e2f2789246..4e56a276ed25 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7611,7 +7611,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr)
return pgd_leaf_size(pgd);
p4dp = p4d_offset_lockless(pgdp, pgd, addr);
- p4d = READ_ONCE(*p4dp);
+ p4d = p4dp_get(p4dp);
if (!p4d_present(p4d))
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index 300fc7eb306c..3a97d0263052 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1014,7 +1014,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
p4d_t *p4dp, p4d;
p4dp = p4d_offset(pgdp, address);
- p4d = READ_ONCE(*p4dp);
+ p4d = p4dp_get(p4dp);
BUILD_BUG_ON(p4d_leaf(p4d));
if (!p4d_present(p4d) || p4d_bad(p4d))
@@ -1114,7 +1114,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
if (pgd_none(*pgd))
return -EFAULT;
p4d = p4d_offset(pgd, address);
- if (p4d_none(*p4d))
+ if (p4d_none(p4dp_get(p4d)))
return -EFAULT;
pud = pud_offset(p4d, address);
if (pud_none(pudp_get(pud)))
@@ -3245,7 +3245,7 @@ static int gup_fast_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr,
p4dp = p4d_offset_lockless(pgdp, pgd, addr);
do {
- p4d_t p4d = READ_ONCE(*p4dp);
+ p4d_t p4d = p4dp_get(p4dp);
next = p4d_addr_end(addr, end);
if (!p4d_present(p4d))
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a3820242b01e..4fdb91c8cc2b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7454,7 +7454,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
if (!pgd_present(*pgd))
return NULL;
p4d = p4d_offset(pgd, addr);
- if (!p4d_present(*p4d))
+ if (!p4d_present(p4dp_get(p4d)))
return NULL;
pud = pud_offset(p4d, addr);
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index f4cf519443e1..02af738fee5e 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -208,7 +208,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
continue;
}
- if (p4d_none(*p4d)) {
+ if (p4d_none(p4dp_get(p4d))) {
pud_t *p;
if (slab_is_available()) {
@@ -330,7 +330,7 @@ static void kasan_free_pud(pud_t *pud_start, p4d_t *p4d)
return;
}
- pud_free(&init_mm, (pud_t *)page_to_virt(p4d_page(*p4d)));
+ pud_free(&init_mm, (pud_t *)page_to_virt(p4d_page(p4dp_get(p4d))));
p4d_clear(p4d);
}
@@ -341,7 +341,7 @@ static void kasan_free_p4d(p4d_t *p4d_start, pgd_t *pgd)
for (i = 0; i < PTRS_PER_P4D; i++) {
p4d = p4d_start + i;
- if (!p4d_none(*p4d))
+ if (!p4d_none(p4dp_get(p4d)))
return;
}
@@ -434,10 +434,10 @@ static void kasan_remove_p4d_table(p4d_t *p4d, unsigned long addr,
next = p4d_addr_end(addr, end);
- if (!p4d_present(*p4d))
+ if (!p4d_present(p4dp_get(p4d)))
continue;
- if (kasan_pud_table(*p4d)) {
+ if (kasan_pud_table(p4dp_get(p4d))) {
if (IS_ALIGNED(addr, P4D_SIZE) &&
IS_ALIGNED(next, P4D_SIZE)) {
p4d_clear(p4d);
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index dbd8164c75f1..52150cc5ae5f 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -194,7 +194,7 @@ static bool shadow_mapped(unsigned long addr)
if (pgd_none(*pgd))
return false;
p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d))
+ if (p4d_none(p4dp_get(p4d)))
return false;
pud = pud_offset(p4d, addr);
if (pud_none(pudp_get(pud)))
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fbb63401fb51..3d900cc039b3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -414,7 +414,7 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
if (!pgd_present(*pgd))
return 0;
p4d = p4d_offset(pgd, address);
- if (!p4d_present(*p4d))
+ if (!p4d_present(p4dp_get(p4d)))
return 0;
pud = pud_offset(p4d, address);
if (!pud_present(pudp_get(pud)))
diff --git a/mm/memory.c b/mm/memory.c
index 801750e4337c..5056f39f2c3b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2906,7 +2906,7 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
pte_fn_t fn, void *data, bool create,
pgtbl_mod_mask *mask)
{
- p4d_t *p4d;
+ p4d_t *p4d, old_p4d;
unsigned long next;
int err = 0;
@@ -2919,11 +2919,12 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
}
do {
next = p4d_addr_end(addr, end);
- if (p4d_none(*p4d) && !create)
+ old_p4d = p4dp_get(p4d);
+ if (p4d_none(old_p4d) && !create)
continue;
- if (WARN_ON_ONCE(p4d_leaf(*p4d)))
+ if (WARN_ON_ONCE(p4d_leaf(old_p4d)))
return -EINVAL;
- if (!p4d_none(*p4d) && WARN_ON_ONCE(p4d_bad(*p4d))) {
+ if (!p4d_none(old_p4d) && WARN_ON_ONCE(p4d_bad(old_p4d))) {
if (!create)
continue;
p4d_clear_bad(p4d);
@@ -6075,7 +6076,7 @@ int __pud_alloc(struct mm_struct *mm, p4d_t *p4d, unsigned long address)
return -ENOMEM;
spin_lock(&mm->page_table_lock);
- if (!p4d_present(*p4d)) {
+ if (!p4d_present(p4dp_get(p4d))) {
mm_inc_nr_puds(mm);
smp_wmb(); /* See comment in pmd_install() */
p4d_populate(mm, p4d, new);
@@ -6143,7 +6144,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
{
struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd;
- p4d_t *p4d;
+ p4d_t *p4d, old_p4d;
pud_t *pud;
pmd_t *pmd;
pte_t *ptep;
@@ -6160,7 +6161,8 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
goto out;
p4d = p4d_offset(pgd, address);
- if (p4d_none(*p4d) || unlikely(p4d_bad(*p4d)))
+ old_p4d = p4dp_get(p4d);
+ if (p4d_none(old_p4d) || unlikely(p4d_bad(old_p4d)))
goto out;
pud = pud_offset(p4d, address);
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 511266307771..a33f92db2666 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -217,7 +217,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
continue;
}
p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d)) {
+ if (!p4d_present(p4dp_get(p4d))) {
step_forward(pvmw, P4D_SIZE);
continue;
}
diff --git a/mm/percpu.c b/mm/percpu.c
index 5f32164b04a2..58660e8eb892 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3192,7 +3192,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
}
p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d)) {
+ if (p4d_none(p4dp_get(p4d))) {
pud = memblock_alloc(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
if (!pud)
goto err_alloc;
diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h
index 0f6b809431a3..3db8ccbcb141 100644
--- a/mm/pgalloc-track.h
+++ b/mm/pgalloc-track.h
@@ -20,7 +20,7 @@ static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
unsigned long address,
pgtbl_mod_mask *mod_mask)
{
- if (unlikely(p4d_none(*p4d))) {
+ if (unlikely(p4d_none(p4dp_get(p4d)))) {
if (__pud_alloc(mm, p4d, address))
return NULL;
*mod_mask |= PGTBL_P4D_MODIFIED;
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index e09e3f920f7a..f5ab52beb536 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -31,7 +31,7 @@ void pgd_clear_bad(pgd_t *pgd)
#ifndef __PAGETABLE_P4D_FOLDED
void p4d_clear_bad(p4d_t *p4d)
{
- p4d_ERROR(*p4d);
+ p4d_ERROR(p4dp_get(p4d));
p4d_clear(p4d);
}
#endif
diff --git a/mm/ptdump.c b/mm/ptdump.c
index 32ae8e829329..2c40224b8ad0 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -53,7 +53,7 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
unsigned long next, struct mm_walk *walk)
{
struct ptdump_state *st = walk->private;
- p4d_t val = READ_ONCE(*p4d);
+ p4d_t val = p4dp_get(p4d);
#if CONFIG_PGTABLE_LEVELS > 3 && \
(defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
diff --git a/mm/rmap.c b/mm/rmap.c
index 81f1946653e0..a0ff325467eb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -813,7 +813,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
goto out;
p4d = p4d_offset(pgd, address);
- if (!p4d_present(*p4d))
+ if (!p4d_present(p4dp_get(p4d)))
goto out;
pud = pud_offset(p4d, address);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d8ea64ec665f..2bd1c95f107a 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -220,7 +220,7 @@ void __weak __meminit pud_init(void *addr)
p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
{
p4d_t *p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d)) {
+ if (p4d_none(p4dp_get(p4d))) {
void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 05292d998122..f27ecac7bd6e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -251,7 +251,7 @@ static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end,
if (!IS_ALIGNED(phys_addr, P4D_SIZE))
return 0;
- if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr))
+ if (p4d_present(p4dp_get(p4d)) && !p4d_free_pud_page(p4d, addr))
return 0;
return p4d_set_huge(p4d, phys_addr, prot);
@@ -418,7 +418,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
next = p4d_addr_end(addr, end);
p4d_clear_huge(p4d);
- if (p4d_bad(*p4d))
+ if (p4d_bad(p4dp_get(p4d)))
*mask |= PGTBL_P4D_MODIFIED;
if (p4d_none_or_clear_bad(p4d))
@@ -741,7 +741,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
pgd_t *pgd = pgd_offset_k(addr);
- p4d_t *p4d;
+ p4d_t *p4d, old_p4d;
pud_t *pud, old_pud;
pmd_t *pmd, old_pmd;
pte_t *ptep, pte;
@@ -760,11 +760,12 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
return NULL;
p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d))
+ old_p4d = p4dp_get(p4d);
+ if (p4d_none(old_p4d))
return NULL;
- if (p4d_leaf(*p4d))
- return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
- if (WARN_ON_ONCE(p4d_bad(*p4d)))
+ if (p4d_leaf(old_p4d))
+ return p4d_page(old_p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+ if (WARN_ON_ONCE(p4d_bad(old_p4d)))
return NULL;
pud = pud_offset(p4d, addr);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 04b03e6c3095..b16925b5f072 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3579,7 +3579,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long start, unsigned long end,
unsigned long next;
struct lru_gen_mm_walk *walk = args->private;
- VM_WARN_ON_ONCE(p4d_leaf(*p4d));
+ VM_WARN_ON_ONCE(p4d_leaf(p4dp_get(p4d)));
pud = pud_offset(p4d, start & P4D_MASK);
restart:
--
2.25.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-17 7:31 [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries Anshuman Khandual
` (5 preceding siblings ...)
2024-09-17 7:31 ` [PATCH V2 6/7] mm: Use p4dp_get() for accessing P4D entries Anshuman Khandual
@ 2024-09-17 7:31 ` Anshuman Khandual
2024-09-18 20:30 ` kernel test robot
2024-09-25 10:05 ` [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries Christophe Leroy
7 siblings, 1 reply; 37+ messages in thread
From: Anshuman Khandual @ 2024-09-17 7:31 UTC (permalink / raw)
To: linux-mm
Cc: Anshuman Khandual, Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
Convert PGD accesses to go via the pgdp_get() helper, which defaults to
READ_ONCE() but also gives the platform an opportunity to override it when
required. The page table entry value read is stored in a local variable that
can then be reused, avoiding both repeated memory loads and possible race
conditions.
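As with the lower levels, the generic helper is guarded so that a platform
can supply its own definition from its asm/pgtable.h, roughly:

#ifndef pgdp_get
static inline pgd_t pgdp_get(pgd_t *pgdp)
{
	return READ_ONCE(*pgdp);
}
#endif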
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-perf-users@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
drivers/misc/sgi-gru/grufault.c | 2 +-
fs/userfaultfd.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/pgtable.h | 9 ++++++---
kernel/events/core.c | 2 +-
mm/gup.c | 11 ++++++-----
mm/hugetlb.c | 2 +-
mm/kasan/init.c | 8 ++++----
mm/kasan/shadow.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory.c | 16 +++++++++-------
mm/page_vma_mapped.c | 2 +-
mm/percpu.c | 2 +-
mm/pgalloc-track.h | 2 +-
mm/pgtable-generic.c | 2 +-
mm/rmap.c | 2 +-
mm/sparse-vmemmap.c | 2 +-
mm/vmalloc.c | 13 +++++++------
18 files changed, 45 insertions(+), 38 deletions(-)
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index fcaceac60659..6aeccbd440e7 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -212,7 +212,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
pte_t pte;
pgdp = pgd_offset(vma->vm_mm, vaddr);
- if (unlikely(pgd_none(*pgdp)))
+ if (unlikely(pgd_none(pgdp_get(pgdp))))
goto err;
p4dp = p4d_offset(pgdp, vaddr);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 4044e15cdfd9..6d33c7a9eb01 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -304,7 +304,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
assert_fault_locked(vmf);
pgd = pgd_offset(mm, address);
- if (!pgd_present(*pgd))
+ if (!pgd_present(pgdp_get(pgd)))
goto out;
p4d = p4d_offset(pgd, address);
if (!p4d_present(p4dp_get(p4d)))
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1bb1599b5779..1978a4b1fcf5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2819,7 +2819,7 @@ int __pte_alloc_kernel(pmd_t *pmd);
static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
unsigned long address)
{
- return (unlikely(pgd_none(*pgd)) && __p4d_alloc(mm, pgd, address)) ?
+ return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
NULL : p4d_offset(pgd, address);
}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 689cd5a32157..6d12ae7e3982 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1088,7 +1088,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
#define set_pgd_safe(pgdp, pgd) \
({ \
- WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \
+ pgd_t __old = pgdp_get(pgdp); \
+ WARN_ON_ONCE(pgd_present(__old) && !pgd_same(__old, pgd)); \
set_pgd(pgdp, pgd); \
})
@@ -1241,9 +1242,11 @@ void pmd_clear_bad(pmd_t *);
static inline int pgd_none_or_clear_bad(pgd_t *pgd)
{
- if (pgd_none(*pgd))
+ pgd_t old_pgd = pgdp_get(pgd);
+
+ if (pgd_none(old_pgd))
return 1;
- if (unlikely(pgd_bad(*pgd))) {
+ if (unlikely(pgd_bad(old_pgd))) {
pgd_clear_bad(pgd);
return 1;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4e56a276ed25..1e3142211cce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7603,7 +7603,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr)
pte_t *ptep, pte;
pgdp = pgd_offset(mm, addr);
- pgd = READ_ONCE(*pgdp);
+ pgd = pgdp_get(pgdp);
if (pgd_none(pgd))
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index 3a97d0263052..3aff3555ba19 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1051,7 +1051,7 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
struct follow_page_context *ctx)
{
- pgd_t *pgd;
+ pgd_t *pgd, old_pgd;
struct mm_struct *mm = vma->vm_mm;
struct page *page;
@@ -1060,7 +1060,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
ctx->page_mask = 0;
pgd = pgd_offset(mm, address);
- if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+ old_pgd = pgdp_get(pgd);
+ if (pgd_none(old_pgd) || unlikely(pgd_bad(old_pgd)))
page = no_page_table(vma, flags, address);
else
page = follow_p4d_mask(vma, address, pgd, flags, ctx);
@@ -1111,7 +1112,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
pgd = pgd_offset_k(address);
else
pgd = pgd_offset_gate(mm, address);
- if (pgd_none(*pgd))
+ if (pgd_none(pgdp_get(pgd)))
return -EFAULT;
p4d = p4d_offset(pgd, address);
if (p4d_none(p4dp_get(p4d)))
@@ -3158,7 +3159,7 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr,
if (!folio)
return 0;
- if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
+ if (unlikely(pgd_val(orig) != pgd_val(pgdp_get(pgdp)))) {
gup_put_folio(folio, refs, flags);
return 0;
}
@@ -3267,7 +3268,7 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end,
pgdp = pgd_offset(current->mm, addr);
do {
- pgd_t pgd = READ_ONCE(*pgdp);
+ pgd_t pgd = pgdp_get(pgdp);
next = pgd_addr_end(addr, end);
if (pgd_none(pgd))
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4fdb91c8cc2b..294d74b03d83 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7451,7 +7451,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
pmd_t *pmd;
pgd = pgd_offset(mm, addr);
- if (!pgd_present(*pgd))
+ if (!pgd_present(pgdp_get(pgd)))
return NULL;
p4d = p4d_offset(pgd, addr);
if (!p4d_present(p4dp_get(p4d)))
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 02af738fee5e..c2b307716551 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -271,7 +271,7 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
continue;
}
- if (pgd_none(*pgd)) {
+ if (pgd_none(pgdp_get(pgd))) {
p4d_t *p;
if (slab_is_available()) {
@@ -345,7 +345,7 @@ static void kasan_free_p4d(p4d_t *p4d_start, pgd_t *pgd)
return;
}
- p4d_free(&init_mm, (p4d_t *)page_to_virt(pgd_page(*pgd)));
+ p4d_free(&init_mm, (p4d_t *)page_to_virt(pgd_page(pgdp_get(pgd))));
pgd_clear(pgd);
}
@@ -468,10 +468,10 @@ void kasan_remove_zero_shadow(void *start, unsigned long size)
next = pgd_addr_end(addr, end);
pgd = pgd_offset_k(addr);
- if (!pgd_present(*pgd))
+ if (!pgd_present(pgdp_get(pgd)))
continue;
- if (kasan_p4d_table(*pgd)) {
+ if (kasan_p4d_table(pgdp_get(pgd))) {
if (IS_ALIGNED(addr, PGDIR_SIZE) &&
IS_ALIGNED(next, PGDIR_SIZE)) {
pgd_clear(pgd);
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 52150cc5ae5f..7f3c46237816 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -191,7 +191,7 @@ static bool shadow_mapped(unsigned long addr)
pmd_t *pmd;
pte_t *pte;
- if (pgd_none(*pgd))
+ if (pgd_none(pgdp_get(pgd)))
return false;
p4d = p4d_offset(pgd, addr);
if (p4d_none(p4dp_get(p4d)))
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3d900cc039b3..c9397eab52bd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -411,7 +411,7 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
VM_BUG_ON_VMA(address == -EFAULT, vma);
pgd = pgd_offset(vma->vm_mm, address);
- if (!pgd_present(*pgd))
+ if (!pgd_present(pgdp_get(pgd)))
return 0;
p4d = p4d_offset(pgd, address);
if (!p4d_present(p4dp_get(p4d)))
diff --git a/mm/memory.c b/mm/memory.c
index 5056f39f2c3b..b4845a84ceb5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2942,7 +2942,7 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
unsigned long size, pte_fn_t fn,
void *data, bool create)
{
- pgd_t *pgd;
+ pgd_t *pgd, old_pgd;
unsigned long start = addr, next;
unsigned long end = addr + size;
pgtbl_mod_mask mask = 0;
@@ -2954,11 +2954,12 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_none(*pgd) && !create)
+ old_pgd = pgdp_get(pgd);
+ if (pgd_none(old_pgd) && !create)
continue;
- if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+ if (WARN_ON_ONCE(pgd_leaf(old_pgd)))
return -EINVAL;
- if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
+ if (!pgd_none(old_pgd) && WARN_ON_ONCE(pgd_bad(old_pgd))) {
if (!create)
continue;
pgd_clear_bad(pgd);
@@ -6053,7 +6054,7 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
return -ENOMEM;
spin_lock(&mm->page_table_lock);
- if (pgd_present(*pgd)) { /* Another has populated it */
+ if (pgd_present(pgdp_get(pgd))) { /* Another has populated it */
p4d_free(mm, new);
} else {
smp_wmb(); /* See comment in pmd_install() */
@@ -6143,7 +6144,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
pte_t **ptepp, spinlock_t **ptlp)
{
struct mm_struct *mm = vma->vm_mm;
- pgd_t *pgd;
+ pgd_t *pgd, old_pgd;
p4d_t *p4d, old_p4d;
pud_t *pud;
pmd_t *pmd;
@@ -6157,7 +6158,8 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
goto out;
pgd = pgd_offset(mm, address);
- if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+ old_pgd = pgdp_get(pgd);
+ if (pgd_none(old_pgd) || unlikely(pgd_bad(old_pgd)))
goto out;
p4d = p4d_offset(pgd, address);
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index a33f92db2666..fb8b610f7378 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -212,7 +212,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
restart:
do {
pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd)) {
+ if (!pgd_present(pgdp_get(pgd))) {
step_forward(pvmw, PGDIR_SIZE);
continue;
}
diff --git a/mm/percpu.c b/mm/percpu.c
index 58660e8eb892..70e68ab002e9 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3184,7 +3184,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
pud_t *pud;
pmd_t *pmd;
- if (pgd_none(*pgd)) {
+ if (pgd_none(pgdp_get(pgd))) {
p4d = memblock_alloc(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
if (!p4d)
goto err_alloc;
diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h
index 3db8ccbcb141..644f632c7cba 100644
--- a/mm/pgalloc-track.h
+++ b/mm/pgalloc-track.h
@@ -7,7 +7,7 @@ static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
unsigned long address,
pgtbl_mod_mask *mod_mask)
{
- if (unlikely(pgd_none(*pgd))) {
+ if (unlikely(pgd_none(pgdp_get(pgd)))) {
if (__p4d_alloc(mm, pgd, address))
return NULL;
*mod_mask |= PGTBL_PGD_MODIFIED;
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index f5ab52beb536..16c1ed5b3d0b 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -24,7 +24,7 @@
void pgd_clear_bad(pgd_t *pgd)
{
- pgd_ERROR(*pgd);
+ pgd_ERROR(pgdp_get(pgd));
pgd_clear(pgd);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index a0ff325467eb..5f4c52f34192 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -809,7 +809,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
pmd_t *pmd = NULL;
pgd = pgd_offset(mm, address);
- if (!pgd_present(*pgd))
+ if (!pgd_present(pgdp_get(pgd)))
goto out;
p4d = p4d_offset(pgd, address);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 2bd1c95f107a..ffc78329a130 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -233,7 +233,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
{
pgd_t *pgd = pgd_offset_k(addr);
- if (pgd_none(*pgd)) {
+ if (pgd_none(pgdp_get(pgd))) {
void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f27ecac7bd6e..a40323a8c6ab 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -450,7 +450,7 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end)
pgd = pgd_offset_k(addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_bad(*pgd))
+ if (pgd_bad(pgdp_get(pgd)))
mask |= PGTBL_PGD_MODIFIED;
if (pgd_none_or_clear_bad(pgd))
continue;
@@ -582,7 +582,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
pgd = pgd_offset_k(addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_bad(*pgd))
+ if (pgd_bad(pgdp_get(pgd)))
mask |= PGTBL_PGD_MODIFIED;
err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
if (err)
@@ -740,7 +740,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
{
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
- pgd_t *pgd = pgd_offset_k(addr);
+ pgd_t *pgd = pgd_offset_k(addr), old_pgd;
p4d_t *p4d, old_p4d;
pud_t *pud, old_pud;
pmd_t *pmd, old_pmd;
@@ -752,11 +752,12 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
*/
VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
- if (pgd_none(*pgd))
+ old_pgd = pgdp_get(pgd);
+ if (pgd_none(old_pgd))
return NULL;
- if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+ if (WARN_ON_ONCE(pgd_leaf(old_pgd)))
return NULL; /* XXX: no allowance for huge pgd */
- if (WARN_ON_ONCE(pgd_bad(*pgd)))
+ if (WARN_ON_ONCE(pgd_bad(old_pgd)))
return NULL;
p4d = p4d_offset(pgd, addr);
--
2.25.1
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-17 7:31 ` [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries Anshuman Khandual
@ 2024-09-18 20:30 ` kernel test robot
2024-09-19 7:55 ` Anshuman Khandual
0 siblings, 1 reply; 37+ messages in thread
From: kernel test robot @ 2024-09-18 20:30 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: llvm, oe-kbuild-all, Anshuman Khandual, Andrew Morton,
Linux Memory Management List, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
Hi Anshuman,
kernel test robot noticed the following build errors:
[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11]
[cannot apply to akpm-mm/mm-everything next-20240918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20240917073117.1531207-8-anshuman.khandual%40arm.com
patch subject: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
config: arm-footbridge_defconfig (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 8663a75fa2f31299ab8d1d90288d9df92aadee88)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409190310.ViHBRe12-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:30:
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/pgtable.h:1243:48: note: 'pgd' declared here
1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
| ^
>> include/linux/pgtable.h:1245:8: error: array initializer must be an initializer list or wide string literal
1245 | pgd_t old_pgd = pgdp_get(pgd);
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
98 | return (set->sig[3] | set->sig[2] |
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
98 | return (set->sig[3] | set->sig[2] |
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
114 | return (set1->sig[3] == set2->sig[3]) &&
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:114:27: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
114 | return (set1->sig[3] == set2->sig[3]) &&
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:115:5: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
115 | (set1->sig[2] == set2->sig[2]) &&
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:115:21: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
115 | (set1->sig[2] == set2->sig[2]) &&
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:157:1: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
157 | _SIG_SET_BINOP(sigorsets, _sig_or)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:138:8: note: expanded from macro '_SIG_SET_BINOP'
138 | a3 = a->sig[3]; a2 = a->sig[2]; \
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
--
163 | _SIG_SET_BINOP(sigandnsets, _sig_andn)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:140:3: note: expanded from macro '_SIG_SET_BINOP'
140 | r->sig[3] = op(a3, b3); \
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:163:1: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
163 | _SIG_SET_BINOP(sigandnsets, _sig_andn)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:141:3: note: expanded from macro '_SIG_SET_BINOP'
141 | r->sig[2] = op(a2, b2); \
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:187:1: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
187 | _SIG_SET_OP(signotset, _sig_not)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:174:27: note: expanded from macro '_SIG_SET_OP'
174 | case 4: set->sig[3] = op(set->sig[3]); \
| ^ ~
include/linux/signal.h:186:24: note: expanded from macro '_sig_not'
186 | #define _sig_not(x) (~(x))
| ^
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:187:1: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
187 | _SIG_SET_OP(signotset, _sig_not)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:174:10: note: expanded from macro '_SIG_SET_OP'
174 | case 4: set->sig[3] = op(set->sig[3]); \
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:187:1: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
187 | _SIG_SET_OP(signotset, _sig_not)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:175:20: note: expanded from macro '_SIG_SET_OP'
175 | set->sig[2] = op(set->sig[2]); \
| ^ ~
include/linux/signal.h:186:24: note: expanded from macro '_sig_not'
186 | #define _sig_not(x) (~(x))
| ^
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:1131:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:187:1: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
187 | _SIG_SET_OP(signotset, _sig_not)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/signal.h:175:3: note: expanded from macro '_SIG_SET_OP'
175 | set->sig[2] = op(set->sig[2]); \
| ^ ~
arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here
17 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from arch/arm/kernel/asm-offsets.c:12:
In file included from include/linux/mm.h:2232:
include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
517 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
In file included from arch/arm/kernel/asm-offsets.c:12:
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ^
arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/linux/mm.h:2819:61: note: 'pgd' declared here
2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
| ^
>> include/linux/mm.h:2822:28: error: passing 'const volatile pmdval_t *' (aka 'const volatile unsigned int *') to parameter of type 'pmdval_t *' (aka 'unsigned int *') discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ?
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
arch/arm/include/asm/pgtable.h:154:25: note: expanded from macro 'pgdp_get'
154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
| ^
include/asm-generic/rwonce.h:47:28: note: expanded from macro 'READ_ONCE'
47 | #define READ_ONCE(x) \
| ^
include/linux/compiler.h:77:42: note: expanded from macro 'unlikely'
77 | # define unlikely(x) __builtin_expect(!!(x), 0)
| ^
include/asm-generic/pgtable-nop4d.h:21:34: note: passing argument to parameter 'pgd' here
21 | static inline int pgd_none(pgd_t pgd) { return 0; }
| ^
29 warnings and 18 errors generated.
make[3]: *** [scripts/Makefile.build:117: arch/arm/kernel/asm-offsets.s] Error 1
make[3]: Target 'prepare' not remade because of errors.
make[2]: *** [Makefile:1194: prepare0] Error 2
make[2]: Target 'prepare' not remade because of errors.
make[1]: *** [Makefile:224: __sub-make] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:224: __sub-make] Error 2
make: Target 'prepare' not remade because of errors.
vim +1245 include/linux/pgtable.h
1242
1243 static inline int pgd_none_or_clear_bad(pgd_t *pgd)
1244 {
> 1245 pgd_t old_pgd = pgdp_get(pgd);
1246
1247 if (pgd_none(old_pgd))
1248 return 1;
1249 if (unlikely(pgd_bad(old_pgd))) {
1250 pgd_clear_bad(pgd);
1251 return 1;
1252 }
1253 return 0;
1254 }
1255
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-18 20:30 ` kernel test robot
@ 2024-09-19 7:55 ` Anshuman Khandual
2024-09-19 9:11 ` Russell King (Oracle)
0 siblings, 1 reply; 37+ messages in thread
From: Anshuman Khandual @ 2024-09-19 7:55 UTC (permalink / raw)
To: kernel test robot, linux-mm, Russell King (Oracle)
Cc: llvm, oe-kbuild-all, Andrew Morton, David Hildenbrand,
Ryan Roberts, Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
On 9/19/24 02:00, kernel test robot wrote:
> Hi Anshuman,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on char-misc/char-misc-testing]
> [also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11]
> [cannot apply to akpm-mm/mm-everything next-20240918]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331
> base: char-misc/char-misc-testing
> patch link: https://lore.kernel.org/r/20240917073117.1531207-8-anshuman.khandual%40arm.com
> patch subject: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
> config: arm-footbridge_defconfig (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/config)
> compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 8663a75fa2f31299ab8d1d90288d9df92aadee88)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202409190310.ViHBRe12-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from arch/arm/kernel/asm-offsets.c:12:
> In file included from include/linux/mm.h:30:
>>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'?
> 1245 | pgd_t old_pgd = pgdp_get(pgd);
> | ^
> arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get'
> 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp)
> | ^
> include/linux/pgtable.h:1243:48: note: 'pgd' declared here
> 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd)
> | ^
The arm (32) platform currently overrides the pgdp_get() helper, but defines
it exactly like the generic version, albeit with a typo that can be fixed
with something like this.
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index be91e376df79..aedb32d49c2a 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -151,7 +151,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
-#define pgdp_get(pgpd) READ_ONCE(*pgdp)
+#define pgdp_get(pgdp) READ_ONCE(*pgdp)
#define pud_page(pud) pmd_page(__pmd(pud_val(pud)))
#define pud_write(pud) pmd_write(__pmd(pud_val(pud)))
Regardless, there is another problem here. On the arm platform there are
multiple pgd_t definitions available depending on various configs, and some
are arrays instead of a single data element, although the platform's
pgdp_get() helper remains the same for all of them.
arch/arm/include/asm/page-nommu.h:typedef unsigned long pgd_t[2];
arch/arm/include/asm/pgtable-2level-types.h:typedef struct { pmdval_t pgd[2]; } pgd_t;
arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2];
arch/arm/include/asm/pgtable-3level-types.h:typedef struct { pgdval_t pgd; } pgd_t;
arch/arm/include/asm/pgtable-3level-types.h:typedef pgdval_t pgd_t;
I guess it might need different pgdp_get() variants depending on the
applicable pgd_t definition. Will continue looking into this further, but
meanwhile copied Russel King in case he might be able to give some direction.
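For the struct-wrapped two-entry case, a dedicated variant would presumably
need to copy both halves explicitly rather than READ_ONCE() the whole array.
Something like this completely untested sketch, perhaps:

#define pgdp_get pgdp_get
static inline pgd_t pgdp_get(pgd_t *pgdp)
{
	pgd_t pgd;

	/* each pmdval_t half is loaded single-copy atomically */
	pgd.pgd[0] = READ_ONCE(pgdp->pgd[0]);
	pgd.pgd[1] = READ_ONCE(pgdp->pgd[1]);
	return pgd;
}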
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-19 7:55 ` Anshuman Khandual
@ 2024-09-19 9:11 ` Russell King (Oracle)
2024-09-19 15:48 ` Ryan Roberts
0 siblings, 1 reply; 37+ messages in thread
From: Russell King (Oracle) @ 2024-09-19 9:11 UTC (permalink / raw)
To: Anshuman Khandual
Cc: kernel test robot, linux-mm, llvm, oe-kbuild-all, Andrew Morton,
David Hildenbrand, Ryan Roberts, Mike Rapoport (IBM),
Arnd Bergmann, x86, linux-m68k, linux-fsdevel, kasan-dev,
linux-kernel, linux-perf-users, Dimitri Sivanich, Alexander Viro,
Muchun Song, Andrey Ryabinin, Miaohe Lin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
On Thu, Sep 19, 2024 at 01:25:08PM +0530, Anshuman Khandual wrote:
> The arm (32) platform currently overrides the pgdp_get() helper, but defines
> it exactly like the generic version, albeit with a typo that can be fixed
> with something like this.
pgdp_get() was added to arm in eba2591d99d1 ("mm: Introduce
pudp/p4dp/pgdp_get() functions") with the typo you've spotted. It seems
it was added with no users, otherwise the error would have been spotted
earlier. I'm not a fan of adding dead code to the kernel for this
reason.
> Regardless, there is another problem here. On the arm platform there are
> multiple pgd_t definitions available depending on various configs, and some
> are arrays instead of a single data element, although the platform's
> pgdp_get() helper remains the same for all of them.
>
> arch/arm/include/asm/page-nommu.h:typedef unsigned long pgd_t[2];
> arch/arm/include/asm/pgtable-2level-types.h:typedef struct { pmdval_t pgd[2]; } pgd_t;
> arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2];
> arch/arm/include/asm/pgtable-3level-types.h:typedef struct { pgdval_t pgd; } pgd_t;
> arch/arm/include/asm/pgtable-3level-types.h:typedef pgdval_t pgd_t;
>
> I guess it might need different pgdp_get() variants depending on the
> applicable pgd_t definition. Will continue looking into this further, but
> meanwhile copied Russel King in case he might be able to give some direction.
That's Russel*L*, thanks.
32-bit arm uses, in some circumstances, an array because each level 1
page table entry is actually two descriptors. It needs to be this way
because each level 2 table pointed to by each level 1 entry has 256
entries, meaning it only occupies 1024 bytes in a 4096 byte page.
In order to cut down on the wastage, treat the level 1 page table as
groups of two entries, which point to two consecutive 1024 byte tables
in the level 2 page.
The level 2 entry isn't suitable for the kernel's use cases (there are
no bits to represent accessed/dirty and other important stuff that the
Linux MM wants) so we maintain the hardware page tables and a separate
set that Linux uses in the same page. Again, the software tables are
consecutive, so from Linux's perspective, the level 2 page tables
have 512 entries in them and occupy one full page.
This is documented in arch/arm/include/asm/pgtable-2level.h
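Simplified from the comment in that header, each 4K level 2 page looks
roughly like:

	+------------+ +0
	| Linux pt 0 |
	+------------+ +1024
	| Linux pt 1 |
	+------------+ +2048
	|  h/w pt 0  |
	+------------+ +3072
	|  h/w pt 1  |
	+------------+ +4096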
However, what this means is that from the software perspective, the
level 1 page table descriptors are an array of two entries, both of
which need to be setup when creating a level 2 page table, but only
the first one should ever be dereferenced when walking the tables,
otherwise the code that walks the second level of page table entries
will walk off the end of the software table into the actual hardware
descriptors.
I've no idea what the idea is behind introducing pgdp_get() and what its
semantics are, so I can't comment further.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-19 9:11 ` Russell King (Oracle)
@ 2024-09-19 15:48 ` Ryan Roberts
2024-09-19 17:06 ` Russell King (Oracle)
0 siblings, 1 reply; 37+ messages in thread
From: Ryan Roberts @ 2024-09-19 15:48 UTC (permalink / raw)
To: Russell King (Oracle), Anshuman Khandual
Cc: kernel test robot, linux-mm, llvm, oe-kbuild-all, Andrew Morton,
David Hildenbrand, Mike Rapoport (IBM), Arnd Bergmann, x86,
linux-m68k, linux-fsdevel, kasan-dev, linux-kernel,
linux-perf-users, Dimitri Sivanich, Alexander Viro, Muchun Song,
Andrey Ryabinin, Miaohe Lin, Dennis Zhou, Tejun Heo,
Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
On 19/09/2024 10:11, Russell King (Oracle) wrote:
> On Thu, Sep 19, 2024 at 01:25:08PM +0530, Anshuman Khandual wrote:
>> The arm (32) platform currently overrides the pgdp_get() helper, but
>> defines it exactly like the generic version, albeit with a typo that can be
>> fixed with something like this.
>
> pgdp_get() was added to arm in eba2591d99d1 ("mm: Introduce
> pudp/p4dp/pgdp_get() functions") with the typo you've spotted. It seems
> it was added with no users, otherwise the error would have been spotted
> earlier. I'm not a fan of adding dead code to the kernel for this
> reason.
>
>> Regardless, there is another problem here. On the arm platform there are
>> multiple pgd_t definitions available depending on various configs, and some
>> are arrays instead of a single data element, although the platform's
>> pgdp_get() helper remains the same for all of them.
>>
>> arch/arm/include/asm/page-nommu.h:typedef unsigned long pgd_t[2];
>> arch/arm/include/asm/pgtable-2level-types.h:typedef struct { pmdval_t pgd[2]; } pgd_t;
>> arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2];
>> arch/arm/include/asm/pgtable-3level-types.h:typedef struct { pgdval_t pgd; } pgd_t;
>> arch/arm/include/asm/pgtable-3level-types.h:typedef pgdval_t pgd_t;
>>
>> I guess it might need different pgdp_get() variants depending on the
>> applicable pgd_t definition. Will continue looking into this further, but
>> meanwhile copied Russel King in case he might be able to give some
>> direction.
>
> That's Russel*L*, thanks.
>
> 32-bit arm uses, in some circumstances, an array because each level 1
> page table entry is actually two descriptors. It needs to be this way
> because each level 2 table pointed to by each level 1 entry has 256
> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
>
> In order to cut down on the wastage, treat the level 1 page table as
> groups of two entries, which point to two consecutive 1024 byte tables
> in the level 2 page.
>
> The level 2 entry isn't suitable for the kernel's use cases (there are
> no bits to represent accessed/dirty and other important stuff that the
> Linux MM wants) so we maintain the hardware page tables and a separate
> set that Linux uses in the same page. Again, the software tables are
> consecutive, so from Linux's perspective, the level 2 page tables
> have 512 entries in them and occupy one full page.
>
> This is documented in arch/arm/include/asm/pgtable-2level.h
>
> However, what this means is that from the software perspective, the
> level 1 page table descriptors are an array of two entries, both of
> which need to be setup when creating a level 2 page table, but only
> the first one should ever be dereferenced when walking the tables,
> otherwise the code that walks the second level of page table entries
> will walk off the end of the software table into the actual hardware
> descriptors.
>
> I've no idea what the idea is behind introducing pgd_get() and what
> its semantics are, so I can't comment further.
The helper is intended to read the value of the entry pointed to by the passed
in pointer. And it should be read in a "single copy atomic" manner, meaning no
tearing. Further, the PTL is expected to be held when calling the getter. If the
HW can write to the entry such that it's racing with the lock holder (i.e. HW
update of access/dirty) then READ_ONCE() should be suitable for most
architectures. If there is no possibility of racing (because HW doesn't write to
the entry), then a simple dereference would be sufficient, I think (which is
what the core code was already doing in most cases).
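(For reference, the generic getters added by commit eba2591d99d1 have
roughly this shape - quoting from memory, so treat it as a sketch:

/* include/linux/pgtable.h */
#ifndef pgdp_get
static inline pgd_t pgdp_get(pgd_t *pgdp)
{
	return READ_ONCE(*pgdp);
}
#endif

with pudp_get()/p4dp_get() following the same pattern, and an arch able
to override each one by providing its own definition of the same name.)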
There is an additional benefit: the architecture can hook this function if it
has exotic use cases (see the contpte feature on arm64 as an example, which
hooks ptep_get()).
It sounds to me like the arm (32) implementation of pgdp_get() could just
continue to do a direct dereference and this should be safe? I don't think it
supports HW update of access/dirty?
Thanks,
Ryan
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-19 15:48 ` Ryan Roberts
@ 2024-09-19 17:06 ` Russell King (Oracle)
2024-09-19 17:49 ` Ryan Roberts
0 siblings, 1 reply; 37+ messages in thread
From: Russell King (Oracle) @ 2024-09-19 17:06 UTC (permalink / raw)
To: Ryan Roberts
Cc: Anshuman Khandual, kernel test robot, linux-mm, llvm,
oe-kbuild-all, Andrew Morton, David Hildenbrand,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
> > 32-bit arm uses, in some circumstances, an array because each level 1
> > page table entry is actually two descriptors. It needs to be this way
> > because each level 2 table pointed to by each level 1 entry has 256
> > entries, meaning it only occupies 1024 bytes in a 4096 byte page.
> >
> > In order to cut down on the wastage, treat the level 1 page table as
> > groups of two entries, which point to two consecutive 1024 byte tables
> > in the level 2 page.
> >
> > The level 2 entry isn't suitable for the kernel's use cases (there are
> > no bits to represent accessed/dirty and other important stuff that the
> > Linux MM wants) so we maintain the hardware page tables and a separate
> > set that Linux uses in the same page. Again, the software tables are
> > consecutive, so from Linux's perspective, the level 2 page tables
> > have 512 entries in them and occupy one full page.
> >
> > This is documented in arch/arm/include/asm/pgtable-2level.h
> >
> > However, what this means is that from the software perspective, the
> > level 1 page table descriptors are an array of two entries, both of
> > which need to be setup when creating a level 2 page table, but only
> > the first one should ever be dereferenced when walking the tables,
> > otherwise the code that walks the second level of page table entries
> > will walk off the end of the software table into the actual hardware
> > descriptors.
> >
> > I've no idea what the idea is behind introducing pgd_get() and what
> > its semantics are, so I can't comment further.
>
> The helper is intended to read the value of the entry pointed to by the passed
> in pointer. And it should be read in a "single copy atomic" manner, meaning no
> tearing. Further, the PTL is expected to be held when calling the getter. If the
> HW can write to the entry such that it's racing with the lock holder (i.e. HW
> update of access/dirty) then READ_ONCE() should be suitable for most
> architectures. If there is no possibility of racing (because HW doesn't write to
> the entry), then a simple dereference would be sufficient, I think (which is
> what the core code was already doing in most cases).
The core code should be making no access to the PGD entries on 32-bit
ARM since the PGD level does not exist. Writes are done at PMD level
in arch code. Reads are done by core code at PMD level.
It feels to me like pgd_get() just doesn't fit the model that 32-bit
ARM was designed to use decades ago, so I want full details about what
pgd_get() is going to be used for and how it is going to be used,
because I feel completely in the dark over this new development. I fear
that someone hasn't understood the Linux page table model if they're
wanting to access stuff at levels that effectively "aren't implemented"
in the architecture specific kernel model of the page tables.
Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the
virtual address. As far as the kernel is concerned, each entry is
64-bit, and the generic kernel code has no business accessing that
through the pgd pointer.
The pgd pointer is passed through the PUD and PMD levels, where it is
typecast down through the kernel layers to a pmd_t pointer, where it
becomes a 32-bit quantity. This results in only the _first_ level 1
pointer being dereferenced by kernel code to a 32-bit pmd_t quantity.
pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which
points at the software level 2 page tables, not the hardware page
tables.)
So, as I'm now being told that the kernel wants to dereference the
pgd level despite the model I describe above, alarm bells are ringing.
I want full information please.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-19 17:06 ` Russell King (Oracle)
@ 2024-09-19 17:49 ` Ryan Roberts
2024-09-19 20:25 ` Russell King (Oracle)
0 siblings, 1 reply; 37+ messages in thread
From: Ryan Roberts @ 2024-09-19 17:49 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Anshuman Khandual, kernel test robot, linux-mm, llvm,
oe-kbuild-all, Andrew Morton, David Hildenbrand,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
On 19/09/2024 18:06, Russell King (Oracle) wrote:
> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
>>> 32-bit arm uses, in some circumstances, an array because each level 1
>>> page table entry is actually two descriptors. It needs to be this way
>>> because each level 2 table pointed to by each level 1 entry has 256
>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
>>>
>>> In order to cut down on the wastage, treat the level 1 page table as
>>> groups of two entries, which point to two consecutive 1024 byte tables
>>> in the level 2 page.
>>>
>>> The level 2 entry isn't suitable for the kernel's use cases (there are
>>> no bits to represent accessed/dirty and other important stuff that the
>>> Linux MM wants) so we maintain the hardware page tables and a separate
>>> set that Linux uses in the same page. Again, the software tables are
>>> consecutive, so from Linux's perspective, the level 2 page tables
>>> have 512 entries in them and occupy one full page.
>>>
>>> This is documented in arch/arm/include/asm/pgtable-2level.h
>>>
>>> However, what this means is that from the software perspective, the
>>> level 1 page table descriptors are an array of two entries, both of
>>> which need to be setup when creating a level 2 page table, but only
>>> the first one should ever be dereferenced when walking the tables,
>>> otherwise the code that walks the second level of page table entries
>>> will walk off the end of the software table into the actual hardware
>>> descriptors.
>>>
>>> I've no idea what the idea is behind introducing pgd_get() and what
>>> its semantics are, so I can't comment further.
>>
>> The helper is intended to read the value of the entry pointed to by the passed
>> in pointer. And it should be read in a "single copy atomic" manner, meaning no
>> tearing. Further, the PTL is expected to be held when calling the getter. If the
>> HW can write to the entry such that it's racing with the lock holder (i.e. HW
>> update of access/dirty) then READ_ONCE() should be suitable for most
>> architectures. If there is no possibility of racing (because HW doesn't write to
>> the entry), then a simple dereference would be sufficient, I think (which is
>> what the core code was already doing in most cases).
>
> The core code should be making no access to the PGD entries on 32-bit
> ARM since the PGD level does not exist. Writes are done at PMD level
> in arch code. Reads are done by core code at PMD level.
>
> It feels to me like pgd_get() just doesn't fit the model that 32-bit
> ARM was designed to use decades ago, so I want full details about what
> pgd_get() is going to be used for and how it is going to be used,
> because I feel completely in the dark over this new development. I fear
> that someone hasn't understood the Linux page table model if they're
> wanting to access stuff at levels that effectively "aren't implemented"
> in the architecture specific kernel model of the page tables.
This change isn't as big and scary as I think you fear. The core-mm today
dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
follow_pfnmap_start(), gup_fast_pgd_leaf(), and many other sites. These changes
aim to abstract those dereferences into an inline function that the architecture
can override and implement if it so wishes.
The core-mm implements default versions of these helper functions which do
READ_ONCE(), but does not currently use them consistently.
From Anshuman's comments earlier in this thread, it looked to me like the arm
pgd_t type is too big to read with READ_ONCE() - it can't be atomically read on
that arch. So my proposal was to implement the override for arm to do exactly
what the core-mm used to do, which is a pointer dereference. So that would
result in the exact same behaviour for the arm arch.
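(Concretely, something like:

#define pgdp_get(pgdp) (*pgdp)

in the arm headers - a sketch, untested.)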
>
> Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the
> virtual address. As far as the kernel is concerned, each entry is
> 64-bit, and the generic kernel code has no business accessing that
> through the pgd pointer.
>
> The pgd pointer is passed through the PUD and PMD levels, where it is
> typecast down through the kernel layers to a pmd_t pointer, where it
> becomes a 32-bit quantity. This results in only the _first_ level 1
> pointer being dereferenced by kernel code to a 32-bit pmd_t quantity.
> pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which
> points at the software level 2 page tables, not the hardware page
> tables.)
As an aside, my understanding of Linux's pgtable model differs from what you
describe. As I understand it, Linux's logical page table model has 5 levels
(pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the
middle levels can be folded away (p4d first, then pud, then pmd). But the
core-mm still logically walks all 5 levels. So if the HW supports 2 levels,
those levels are (pgd, pte). But you are suggesting that arm exposes pmd and
pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch,
but that is being folded and accessed through the pgd helpers in core code, I
believe?)
>
> So, as I'm now being told that the kernel wants to dereference the
> pgd level despite the model I describe above, alarm bells are ringing.
> I want full information please.
>
This is not new; the kernel already dereferences the pgd pointers.
Thanks,
Ryan
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-19 17:49 ` Ryan Roberts
@ 2024-09-19 20:25 ` Russell King (Oracle)
2024-09-20 6:57 ` Ryan Roberts
0 siblings, 1 reply; 37+ messages in thread
From: Russell King (Oracle) @ 2024-09-19 20:25 UTC (permalink / raw)
To: Ryan Roberts
Cc: Anshuman Khandual, kernel test robot, linux-mm, llvm,
oe-kbuild-all, Andrew Morton, David Hildenbrand,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote:
> On 19/09/2024 18:06, Russell King (Oracle) wrote:
> > On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
> >>> 32-bit arm uses, in some circumstances, an array because each level 1
> >>> page table entry is actually two descriptors. It needs to be this way
> >>> because each level 2 table pointed to by each level 1 entry has 256
> >>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
> >>>
> >>> In order to cut down on the wastage, treat the level 1 page table as
> >>> groups of two entries, which point to two consecutive 1024 byte tables
> >>> in the level 2 page.
> >>>
> >>> The level 2 entry isn't suitable for the kernel's use cases (there are
> >>> no bits to represent accessed/dirty and other important stuff that the
> >>> Linux MM wants) so we maintain the hardware page tables and a separate
> >>> set that Linux uses in the same page. Again, the software tables are
> >>> consecutive, so from Linux's perspective, the level 2 page tables
> >>> have 512 entries in them and occupy one full page.
> >>>
> >>> This is documented in arch/arm/include/asm/pgtable-2level.h
> >>>
> >>> However, what this means is that from the software perspective, the
> >>> level 1 page table descriptors are an array of two entries, both of
> >>> which need to be setup when creating a level 2 page table, but only
> >>> the first one should ever be dereferenced when walking the tables,
> >>> otherwise the code that walks the second level of page table entries
> >>> will walk off the end of the software table into the actual hardware
> >>> descriptors.
> >>>
> >>> I've no idea what the idea is behind introducing pgd_get() and what
> >>> its semantics are, so I can't comment further.
> >>
> >> The helper is intended to read the value of the entry pointed to by the passed
> >> in pointer. And it should be read in a "single copy atomic" manner, meaning no
> >> tearing. Further, the PTL is expected to be held when calling the getter. If the
> >> HW can write to the entry such that it's racing with the lock holder (i.e. HW
> >> update of access/dirty) then READ_ONCE() should be suitable for most
> >> architectures. If there is no possibility of racing (because HW doesn't write to
> >> the entry), then a simple dereference would be sufficient, I think (which is
> >> what the core code was already doing in most cases).
> >
> > The core code should be making no access to the PGD entries on 32-bit
> > ARM since the PGD level does not exist. Writes are done at PMD level
> > in arch code. Reads are done by core code at PMD level.
> >
> > It feels to me like pgd_get() just doesn't fit the model that 32-bit
> > ARM was designed to use decades ago, so I want full details about what
> > pgd_get() is going to be used for and how it is going to be used,
> > because I feel completely in the dark over this new development. I fear
> > that someone hasn't understood the Linux page table model if they're
> > wanting to access stuff at levels that effectively "aren't implemented"
> > in the architecture specific kernel model of the page tables.
>
> This change isn't as big and scary as I think you fear.
The situation is as I state above. Core code must _not_ dereference pgd
pointers on 32-bit ARM.
> The core-mm today
> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
> follow_pfnmap_start(),
Doesn't seem to exist at least not in 6.11.
> gup_fast_pgd_leaf(), and many other sites.
Only built when CONFIG_HAVE_GUP_FAST is set, which 32-bit ARM doesn't
set because it's meaningless there, except when LPAE is in use (which is
basically the situation I'm discussing.)
> These changes
> aim to abstract those dereferences into an inline function that the architecture
> can override and implement if it so wishes.
>
> The core-mm implements default versions of these helper functions which do
> READ_ONCE(), but does not currently use them consistently.
>
> From Anshuman's comments earlier in this thread, it looked to me like the arm
> pgd_t type is too big to read with READ_ONCE() - it can't be atomically read on
> that arch. So my proposal was to implement the override for arm to do exactly
> what the core-mm used to do, which is a pointer dereference. So that would
> result in the exact same behaviour for the arm arch.
Let me say this again: core code must NOT dereference pgds on 32-bit
non-LPAE ARM. They are meaningless to core code. A pgd_t does not
reference a single entry in hardware. It references two entries.
> > Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the
> > virtual address. As far as the kernel is concerned, each entry is
> > 64-bit, and the generic kernel code has no business accessing that
> > through the pgd pointer.
> >
> > The pgd pointer is passed through the PUD and PMD levels, where it is
> > typecast down through the kernel layers to a pmd_t pointer, where it
> > becomes a 32-bit quantity. This results in only the _first_ level 1
> > pointer being dereferenced by kernel code to a 32-bit pmd_t quantity.
> > pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which
> > points at the software level 2 page tables, not the hardware page
> > tables.)
>
> As an aside, my understanding of Linux's pgtable model differs from what you
> describe. As I understand it, Linux's logical page table model has 5 levels
> (pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the
> middle levels can be folded away (p4d first, then pud, then pmd). But the
> core-mm still logically walks all 5 levels. So if the HW supports 2 levels,
> those levels are (pgd, pte). But you are suggesting that arm exposes pmd and
> pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch,
> but that is being folded and accessed through the pgd helpers in core code, I
> believe?)
What ARM does dates from before the Linux MM invented the current
"folding" method when we had three page table levels - pgd, pmd
and pte. The current folding techniques were invented well after
32-bit ARM was implemented, which was using the original idea of
how to fold the page tables.
The new folding came up with a totally different way of doing it,
and I looked into converting 32-bit ARM over to it, but it wasn't
possible to do so with the need for two level-1 entries to be
managed for each level-2 page table.
> > So, as I'm now being told that the kernel wants to dereference the
> > pgd level despite the model I describe above, alarm bells are ringing.
> > I want full information please.
> >
>
> This is not new; the kernel already dereferences the pgd pointers.
Consider that 32-bit ARM has been this way for decades (Linux was ported
to 32-bit ARM by me back in the 1990s - so it's about 30 years old.)
Compare that to what you're stating is "not new"... I beg to differ with
your opinion on what is new and what isn't. It's all about the relative
time.
This is how the page tables are walked:
static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
{
return (pgd + pgd_index(address));
}
#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
This returns a pointer to the pgd. This is then used with p4d_offset()
when walking the next level, and this is defined on 32-bit ARM from
include/asm-generic/pgtable-nop4d.h:
static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
{
return (p4d_t *)pgd;
}
Then from include/asm-generic/pgtable-nopud.h:
static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
{
return (pud_t *)p4d;
}
Then from arch/arm/include/asm/pgtable-2level.h:
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
{
return (pmd_t *)pud;
}
All of the above casts result in the pgd_t pointer being cast down
to a pmd_t pointer.
Now, looking at stuff in mm/memory.c such as unmap_page_range().
pgd = pgd_offset(vma->vm_mm, addr);
This gets the pgd pointer into the level 1 page tables associated
with addr, and passes it down to zap_p4d_range().
That passes it to p4d_offset() without dereferencing it, which on
32-bit ARM, merely casts the pgd_t pointer to a p4d_t pointer. Since
a p4d_t is defined to be a struct of a pgd_t, this also points at an
array of two 32-bit quantities. This pointer is passed down to
zap_pud_range().
zap_pud_range() passes this pointer to pud_offset(), again without
dereferencing it, and we end up with a pud_t pointer. Since pud_t is
defined to be a struct of p4d_t, this also points to an array of two
32-bit quantities.
We then have:
if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
There is an implicit memory copy/access between the memory pointed to
by pud and its destination (which might be a register). However,
these are optimised away because 32-bit ARM sets neither
HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD nor ARCH_HAS_PTE_DEVMAP (and
neither inline function makes use of its argument.)
NOTE: If making these use READ_ONCE results in an access that cannot
be optimised away, that is a bug that needs to be addressed.
zap_pud_range() then passes the pud pointer to zap_pmd_range().
zap_pmd_range() passes this pointer to pmd_offset() with no further
dereferences, and this gets cast to a pmd_t pointer, which is a
pointer to the first 32-bit quantity pointed to by the pgd_t pointer.
All the dereferences from this point on are 32-bit which can be done
as single-copy atomic accesses. This will be the first real access
to the level-1 page tables in this code path as the code stands today,
and from this point on, accesses to the page tables are as the
architecture intends them to be.
Now, realise that for all of the accesses above that have all been
optimised away, none of that code even existed when 32-bit ARM was
using this method. The addition of these features not interfering
with the way 32-bit non-LPAE ARM works relies on all of those
accesses being optimised away, and they need to continue to be so
going forward.
Maybe that means that this new (and I mean new in relative terms
compared to the age of the 32-bit ARM code) pgdp_get() accessor
needs to be a non-dereferencing operation, so something like:
#define pgdp_get(pgdp) ((pgd_t){ })
in arch/arm/include/asm/pgtable-2level.h (note the corrected
spelling of pgdp), and the existing pgdp_get() moved to
arch/arm/include/asm/pgtable-3level.h. This isn't tested.
However, let me say this again... without knowing exactly how
and where pgdp_get() is intended to be used, I'm clutching at
straws here. Even looking at Linus' tree, there's very little in
evidence there to suggest how pgdp_get() is intended to be used.
For example, there's no references to it in mm/.
Please realise that I have _no_ _clue_ what "[PATCH V2 7/7] mm: Use
pgdp_get() for accessing PGD entries" is proposing. I wasn't on its
Cc list. I haven't seen the patch. The first I knew anything about
this was with the email that Anshuman Khandual sent in response to
the kernel build bot's build error.
I'm afraid that the kernel build bot's build error means that this
patch:
commit eba2591d99d1f14a04c8a8a845ab0795b93f5646
Author: Alexandre Ghiti <alexghiti@rivosinc.com>
Date: Wed Dec 13 21:29:59 2023 +0100
mm: Introduce pudp/p4dp/pgdp_get() functions
is actually broken. I'm sorry that I didn't review that, but from how
the series looked when it landed in my mailbox, it seemed to be
specific to RISC-V and of no interest to me, so I didn't bother
reading it (I get _lots_ of email, I can't read everything.) This
is how it looks in my mailbox (and note that they're marked
as new to this day):
3218 N T Dec 13 Alexandre Ghiti ( 0) [PATCH v2 0/4] riscv: Use READ_ONCE()/WRI
3219 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 1/4] riscv: Use WRITE_ONCE()
3220 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 2/4] mm: Introduce pudp/p4dp
3221 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 3/4] riscv: mm: Only compile
3222 N T Dec 13 Alexandre Ghiti ( 0) └─>[PATCH v2 4/4] riscv: Use accessors to
3223 N C Dec 14 Anup Patel ( 0) └─>
Sorry, but I'm not even going to look at something like that when it
looks like it's for RISC-V and nothing else.
One final point... because I'm sure someone's going to say "but you
were in the To: header". I've long since given up using "am I in the
Cc/To header" to carry any useful or meaningful information to
indicate whether it's something I should read. I'm afraid that the
kernel community has long since taught me that it is of no value
whatsoever, so I merely go by "does this look of any interest". If not,
I don't bother even _opening_ the email.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-19 20:25 ` Russell King (Oracle)
@ 2024-09-20 6:57 ` Ryan Roberts
2024-09-20 9:47 ` Russell King (Oracle)
0 siblings, 1 reply; 37+ messages in thread
From: Ryan Roberts @ 2024-09-20 6:57 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Anshuman Khandual, kernel test robot, linux-mm, llvm,
oe-kbuild-all, Andrew Morton, David Hildenbrand,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
On 19/09/2024 21:25, Russell King (Oracle) wrote:
> On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote:
>> On 19/09/2024 18:06, Russell King (Oracle) wrote:
>>> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
>>>>> 32-bit arm uses, in some circumstances, an array because each level 1
>>>>> page table entry is actually two descriptors. It needs to be this way
>>>>> because each level 2 table pointed to by each level 1 entry has 256
>>>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
>>>>>
>>>>> In order to cut down on the wastage, treat the level 1 page table as
>>>>> groups of two entries, which point to two consecutive 1024 byte tables
>>>>> in the level 2 page.
>>>>>
>>>>> The level 2 entry isn't suitable for the kernel's use cases (there are
>>>>> no bits to represent accessed/dirty and other important stuff that the
>>>>> Linux MM wants) so we maintain the hardware page tables and a separate
>>>>> set that Linux uses in the same page. Again, the software tables are
>>>>> consecutive, so from Linux's perspective, the level 2 page tables
>>>>> have 512 entries in them and occupy one full page.
>>>>>
>>>>> This is documented in arch/arm/include/asm/pgtable-2level.h
>>>>>
>>>>> However, what this means is that from the software perspective, the
>>>>> level 1 page table descriptors are an array of two entries, both of
>>>>> which need to be setup when creating a level 2 page table, but only
>>>>> the first one should ever be dereferenced when walking the tables,
>>>>> otherwise the code that walks the second level of page table entries
>>>>> will walk off the end of the software table into the actual hardware
>>>>> descriptors.
>>>>>
>>>>> I've no idea what the idea is behind introducing pgd_get() and what
>>>>> its semantics are, so I can't comment further.
>>>>
>>>> The helper is intended to read the value of the entry pointed to by the passed
>>>> in pointer. And it should be read in a "single copy atomic" manner, meaning no
>>>> tearing. Further, the PTL is expected to be held when calling the getter. If the
>>>> HW can write to the entry such that it's racing with the lock holder (i.e. HW
>>>> update of access/dirty) then READ_ONCE() should be suitable for most
>>>> architectures. If there is no possibility of racing (because HW doesn't write to
>>>> the entry), then a simple dereference would be sufficient, I think (which is
>>>> what the core code was already doing in most cases).
>>>
>>> The core code should be making no access to the PGD entries on 32-bit
>>> ARM since the PGD level does not exist. Writes are done at PMD level
>>> in arch code. Reads are done by core code at PMD level.
>>>
>>> It feels to me like pgd_get() just doesn't fit the model that 32-bit
>>> ARM was designed to use decades ago, so I want full details about what
>>> pgd_get() is going to be used for and how it is going to be used,
>>> because I feel completely in the dark over this new development. I fear
>>> that someone hasn't understood the Linux page table model if they're
>>> wanting to access stuff at levels that effectively "aren't implemented"
>>> in the architecture specific kernel model of the page tables.
>>
>> This change isn't as big and scary as I think you fear.
>
> The situation is as I state above. Core code must _not_ dereference pgd
> pointers on 32-bit ARM.
Let's just rewind a bit. This thread exists because the kernel test robot failed
to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture
after Anshuman changed the direct pgd dereference to pgdp_get(). The reason
compilation failed is because arm defines its own pgdp_get() override, but it is
broken (there is a typo).
Code before Anshuman's change:
static inline int pgd_none_or_clear_bad(pgd_t *pgd)
{
if (pgd_none(*pgd))
return 1;
if (unlikely(pgd_bad(*pgd))) {
pgd_clear_bad(pgd);
return 1;
}
return 0;
}
Code after Anshuman's change:
static inline int pgd_none_or_clear_bad(pgd_t *pgd)
{
pgd_t old_pgd = pgdp_get(pgd);
if (pgd_none(old_pgd))
return 1;
if (unlikely(pgd_bad(old_pgd))) {
pgd_clear_bad(pgd);
return 1;
}
return 0;
}
So the kernel _is_ already dereferencing pgd pointers for the arm arch, and has
been since the beginning of (git) time. Note that pgd_none_or_clear_bad() is
called from core code and from arm arch code.
As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in
various circumstances. And other changes in this series are also replacing those
direct dereferences with calls to similar helpers. The fact that these are all
folded (by a custom arm implementation if I've understood the below correctly)
just means that each dereference is returning what you would call the pmd from
the HW perspective, I think?
>
>> The core-mm today
>> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
>> follow_pfnmap_start(),
>
> Doesn't seem to exist at least not in 6.11.
Apologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in
v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above.
>
>> gup_fast_pgd_leaf(), and many other sites.
>
> Only built when CONFIG_HAVE_GUP_FAST is set, which 32-bit ARM doesn't
> set because it's meaningless there, except when LPAE is in use (which is
> basically the situation I'm discussing.)
>
>> These changes
>> aim to abstract those dereferences into an inline function that the architecture
>> can override and implement if it so wishes.
>>
>> The core-mm implements default versions of these helper functions which do
>> READ_ONCE(), but does not currently use them consistently.
>>
>> From Anshuman's comments earlier in this thread, it looked to me like the arm
>> pgd_t type is too big to read with READ_ONCE() - it can't be atomically read on
>> that arch. So my proposal was to implement the override for arm to do exactly
>> what the core-mm used to do, which is a pointer dereference. So that would
>> result in the exact same behaviour for the arm arch.
>
> Let me say this again: core code must NOT dereference pgds on 32-bit
> non-LPAE ARM. They are meaningless to core code. A pgd_t does not
> reference a single entry in hardware. It references two entries.
OK, so there are 3 options: either I have misunderstood what the core code is
doing (because as per above, I'm asserting that core code _is_ dereferencing pgd
pointers), or the core code is dereferencing and that is buggy, or the core code
is dereferencing and it's working as designed. I believe it's the latter, but am
willing to be proved wrong.
>
>>> Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the
>>> virtual address. As far as the kernel is concerned, each entry is
>>> 64-bit, and the generic kernel code has no business accessing that
>>> through the pgd pointer.
>>>
>>> The pgd pointer is passed through the PUD and PMD levels, where it is
>>> typecast down through the kernel layers to a pmd_t pointer, where it
>>> becomes a 32-bit quantity. This results in only the _first_ level 1
>>> pointer being dereferenced by kernel code to a 32-bit pmd_t quantity.
>>> pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which
>>> points at the software level 2 page tables, not the hardware page
>>> tables.)
>>
>> As an aside, my understanding of Linux's pgtable model differs from what you
>> describe. As I understand it, Linux's logical page table model has 5 levels
>> (pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the
>> middle levels can be folded away (p4d first, then pud, then pmd). But the
>> core-mm still logically walks all 5 levels. So if the HW supports 2 levels,
>> those levels are (pgd, pte). But you are suggesting that arm exposes pmd and
>> pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch,
>> but that is being folded and accessed through the pgd helpers in core code, I
>> believe?)
>
> What ARM does dates from before the Linux MM invented the current
> "folding" method when we had three page table levels - pgd, pmd
> and pte. The current folding techniques were invented well after
> 32-bit ARM was implemented, which was using the original idea of
> how to fold the page tables.
>
> The new folding came up with a totally different way of doing it,
> and I looked into converting 32-bit ARM over to it, but it wasn't
> possible to do so with the need for two level-1 entries to be
> managed for each level-2 page table.
>
>>> So, as I'm now being told that the kernel wants to dereference the
>>> pgd level despite the model I describe above, alarm bells are ringing.
>>> I want full information please.
>>>
>>
>> This is not new; the kernel already dereferences the pgd pointers.
>
> Consider that 32-bit ARM has been this way for decades (Linux was ported
> to 32-bit ARM by me back in the 1990s - so it's about 30 years old.)
> Compare that to what you're stating is "not new"... I beg to differ with
> your opinion on what is new and what isn't. It's all about the relative
> time.
By "not new" I meant that it's not introduced by this series. The kernel's
dereferencing of pgd pointers was present before this series came along.
>
> This is how the page tables are walked:
>
> static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
> {
> return (pgd + pgd_index(address));
> }
>
> #define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
>
> This returns a pointer to the pgd. This is then used with p4d_offset()
> when walking the next level, and this is defined on 32-bit ARM from
> include/asm-generic/pgtable-nop4d.h:
>
> static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
> {
> return (p4d_t *)pgd;
> }
>
> Then from include/asm-generic/pgtable-nopud.h:
>
> static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
> {
> return (pud_t *)p4d;
> }
>
> Then from arch/arm/include/asm/pgtable-2level.h:
>
> static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> {
> return (pmd_t *)pud;
> }
>
> All of the above casts result in the pgd_t pointer being cast down
> to a pmd_t pointer.
>
> Now, looking at stuff in mm/memory.c such as unmap_page_range().
>
> pgd = pgd_offset(vma->vm_mm, addr);
>
> This gets the pgd pointer into the level 1 page tables associated
> with addr, and passes it down to zap_p4d_range().
>
> That passes it to p4d_offset() without dereferencing it, which on
> 32-bit ARM, merely casts the pgd_t pointer to a p4d_t pointer. Since
> a p4d_t is defined to be a struct of a pgd_t, this also points at an
> array of two 32-bit quantities. This pointer is passed down to
> zap_pud_range().
>
> zap_pud_range() passes this pointer to pud_offset(), again without
> dereferencing it, and we end up with a pud_t pointer. Since pud_t is
> defined to be a struct of p4d_t, this also points to an array of two
> 32-bit quantities.
>
> We then have:
>
> if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
>
> There is an implicit memory copy/access between the memory pointed to
> by pud and its destination (which might be a register). However,
> these are optimised away because 32-bit ARM sets neither
> HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD nor ARCH_HAS_PTE_DEVMAP (and
> neither inline function makes use of its argument.)
>
> NOTE: If making these use READ_ONCE results in an access that cannot
> be optimised away, that is a bug that needs to be addressed.
>
> zap_pud_range() then passes the pud pointer to zap_pmd_range().
>
> zap_pmd_range() passes this pointer to pmd_offset() with no further
> dereferences, and this gets cast to a pmd_t pointer, which is a
> pointer to the first 32-bit quantity pointed to by the pgd_t pointer.
>
> All the dereferences from this point on are 32-bit which can be done
> as single-copy atomic accesses. This will be the first real access
> to the level-1 page tables in this code path as the code stands today,
> and from this point on, accesses to the page tables are as the
> architecture intends them to be.
>
>
> Now, realise that for all of the accesses above that have all been
> optimised away, none of that code even existed when 32-bit ARM was
> using this method. The addition of these features not interfering
> with the way 32-bit non-LPAE ARM works relies on all of those
> accesses being optimised away, and they need to continue to be so
> going forward.
>
>
> Maybe that means that this new (and I mean new in relative terms
> compared to the age of the 32-bit ARM code) pgdp_get() accessor
> needs to be a non-dereferencing operation, so something like:
>
> #define pgdp_get(pgdp) ((pgd_t){ })
I'm afraid I haven't digested all these arm-specific details. But if I'm right
that the core kernel does and is correct to dereference pgd pointers for these
non-LPAE arm builds, then I think you at least need arm's implementation to be:
#define pgdp_get(pgdp) (*pgdp)
Thanks,
Ryan
>
> in arch/arm/include/asm/pgtable-2level.h (note the corrected
> spelling of pgdp), and the existing pgdp_get() moved to
> arch/arm/include/asm/pgtable-3level.h. This isn't tested.
>
> However, let me say this again... without knowing exactly how
> and where pgdp_get() is intended to be used, I'm clutching at
> straws here. Even looking at Linus' tree, there's very little in
> evidence there to suggest how pgdp_get() is intended to be used.
> For example, there's no references to it in mm/.
>
>
> Please realise that I have _no_ _clue_ what "[PATCH V2 7/7] mm: Use
> pgdp_get() for accessing PGD entries" is proposing. I wasn't on its
> Cc list. I haven't seen the patch. The first I knew anything about
> this was with the email that Anshuman Khandual sent in response to
> the kernel build bot's build error.
Here is the full series for context:
https://lore.kernel.org/linux-mm/20240917073117.1531207-1-anshuman.khandual@arm.com/
>
> I'm afraid that the kernel build bot's build error means that this
> patch:
>
> commit eba2591d99d1f14a04c8a8a845ab0795b93f5646
> Author: Alexandre Ghiti <alexghiti@rivosinc.com>
> Date: Wed Dec 13 21:29:59 2023 +0100
>
> mm: Introduce pudp/p4dp/pgdp_get() functions
>
> is actually broken. I'm sorry that I didn't review that, but from how
> the series looked when it landed in my mailbox, it seemed to be
> specific to RISC-V and of no interest to me, so I didn't bother
> reading it (I get _lots_ of email, I can't read everything.) This
> is how it looks in my mailbox (and note that they're marked
> as new to this day):
>
> 3218 N T Dec 13 Alexandre Ghiti ( 0) [PATCH v2 0/4] riscv: Use READ_ONCE()/WRI
> 3219 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 1/4] riscv: Use WRITE_ONCE()
> 3220 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 2/4] mm: Introduce pudp/p4dp
> 3221 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 3/4] riscv: mm: Only compile
> 3222 N T Dec 13 Alexandre Ghiti ( 0) └─>[PATCH v2 4/4] riscv: Use accessors to
> 3223 N C Dec 14 Anup Patel ( 0) └─>
>
> Sorry, but I'm not even going to look at something like that when it
> looks like it's for RISC-V and nothing else.
>
> One final point... because I'm sure someone's going to say "but you
> were in the To: header". I've long since given up using "am I in the
> Cc/To header" to carry any useful or meaningful information to
> indicate whether it's something I should read. I'm afraid that the
> kernel community has long since taught me that it is of no value
> whatsoever, so I merely go by "does this look of any interest". If not,
> I don't bother even _opening_ the email.
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-20 6:57 ` Ryan Roberts
@ 2024-09-20 9:47 ` Russell King (Oracle)
2024-09-23 15:21 ` Ryan Roberts
0 siblings, 1 reply; 37+ messages in thread
From: Russell King (Oracle) @ 2024-09-20 9:47 UTC (permalink / raw)
To: Ryan Roberts
Cc: Anshuman Khandual, kernel test robot, linux-mm, llvm,
oe-kbuild-all, Andrew Morton, David Hildenbrand,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
On Fri, Sep 20, 2024 at 08:57:23AM +0200, Ryan Roberts wrote:
> On 19/09/2024 21:25, Russell King (Oracle) wrote:
> > On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote:
> >> On 19/09/2024 18:06, Russell King (Oracle) wrote:
> >>> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
> >>>>> 32-bit arm uses, in some circumstances, an array because each level 1
> >>>>> page table entry is actually two descriptors. It needs to be this way
> >>>>> because each level 2 table pointed to by each level 1 entry has 256
> >>>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
> >>>>>
> >>>>> In order to cut down on the wastage, treat the level 1 page table as
> >>>>> groups of two entries, which point to two consecutive 1024 byte tables
> >>>>> in the level 2 page.
> >>>>>
> >>>>> The level 2 entry isn't suitable for the kernel's use cases (there are
> >>>>> no bits to represent accessed/dirty and other important stuff that the
> >>>>> Linux MM wants) so we maintain the hardware page tables and a separate
> >>>>> set that Linux uses in the same page. Again, the software tables are
> >>>>> consecutive, so from Linux's perspective, the level 2 page tables
> >>>>> have 512 entries in them and occupy one full page.
> >>>>>
> >>>>> This is documented in arch/arm/include/asm/pgtable-2level.h
> >>>>>
> >>>>> However, what this means is that from the software perspective, the
> >>>>> level 1 page table descriptors are an array of two entries, both of
> >>>>> which need to be setup when creating a level 2 page table, but only
> >>>>> the first one should ever be dereferenced when walking the tables,
> >>>>> otherwise the code that walks the second level of page table entries
> >>>>> will walk off the end of the software table into the actual hardware
> >>>>> descriptors.
> >>>>>
> >>>>> I've no idea what the idea is behind introducing pgd_get() and what
> >>>>> its semantics are, so I can't comment further.
> >>>>
> >>>> The helper is intended to read the value of the entry pointed to by the passed
> >>>> in pointer. And it should be read in a "single copy atomic" manner, meaning no
> >>>> tearing. Further, the PTL is expected to be held when calling the getter. If the
> >>>> HW can write to the entry such that it's racing with the lock holder (i.e. HW
> >>>> update of access/dirty) then READ_ONCE() should be suitable for most
> >>>> architectures. If there is no possibility of racing (because HW doesn't write to
> >>>> the entry), then a simple dereference would be sufficient, I think (which is
> >>>> what the core code was already doing in most cases).
> >>>
> >>> The core code should be making no access to the PGD entries on 32-bit
> >>> ARM since the PGD level does not exist. Writes are done at PMD level
> >>> in arch code. Reads are done by core code at PMD level.
> >>>
> >>> It feels to me like pgd_get() just doesn't fit the model that 32-bit
> >>> ARM was designed to use decades ago, so I want full details about what
> >>> pgd_get() is going to be used for and how it is going to be used,
> >>> because I feel completely in the dark over this new development. I fear
> >>> that someone hasn't understood the Linux page table model if they're
> >>> wanting to access stuff at levels that effectively "aren't implemented"
> >>> in the architecture specific kernel model of the page tables.
> >>
> >> This change isn't as big and scary as I think you fear.
> >
> > The situation is as I state above. Core code must _not_ dereference pgd
> > pointers on 32-bit ARM.
>
> Let's just rewind a bit. This thread exists because the kernel test robot failed
> to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture
> after Anshuman changed the direct pgd dereference to pgdp_get(). The reason
> compilation failed is because arm defines its own pgdp_get() override, but it is
> broken (there is a typo).
Let's not rewind, because had you fully read and digested my reply, you
would have seen why this isn't a problem... but let me spell it out.
>
> Code before Anshuman's change:
>
> static inline int pgd_none_or_clear_bad(pgd_t *pgd)
> {
> if (pgd_none(*pgd))
> return 1;
> if (unlikely(pgd_bad(*pgd))) {
> pgd_clear_bad(pgd);
> return 1;
> }
> return 0;
> }
This isn't a problem as the code stands. While there is a dereference
in C, that dereference is a simple struct copy, something that we use
everywhere in the kernel. However, that is as far as it goes, because
neither pgd_none() nor pgd_bad() makes use of its argument, and thus
the compiler will optimise it away, resulting in no actual access to
the page tables - _as_ _intended_.
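(The folded definitions that allow this come from
include/asm-generic/pgtable-nop4d.h, and - quoting from memory - look
like:

static inline int pgd_none(pgd_t pgd)	{ return 0; }
static inline int pgd_bad(pgd_t pgd)	{ return 0; }

Since neither function touches its argument, the *pgd load feeding them
is dead and the compiler deletes it.)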
If these are going to be converted to pgd_get(), then we need pgd_get()
to _also_ be optimised away, and if e.g. this is the only place that
pgd_get() is going to be used, the suggestion I made in my previous
email is entirely reasonable, since we know that the result of pgd_get()
will not actually be used.
> As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in
> various circumstances.
I already covered these in my previous reply.
> And other changes in this series are also replacing those
> direct dereferences with calls to similar helpers. The fact that these are all
> folded (by a custom arm implementation if I've understood the below correctly)
> just means that each dereference is returning what you would call the pmd from
> the HW perspective, I think?
It'll "return" the first of each pair of level-1 page table entries,
which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd
need to be optimised away, so throwing lots of READ_ONCE() around
this code without considering this is certainly the wrong approach.
> >> The core-mm today
> >> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
> >> follow_pfnmap_start(),
> >
> > Doesn't seem to exist at least not in 6.11.
>
> Apologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in
> v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above.
Looking at follow_pte(), it's not a problem.
I think we wouldn't be having this conversation before:
commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0
Author: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue Nov 22 17:30:28 2011 +0000
ARM: pgtable: switch to use pgtable-nopud.h
where:
-#define pgd_none(pgd) (0)
-#define pgd_bad(pgd) (0)
existed before this commit - and thus the dereference in things like:
pgd_none(*pgd)
wouldn't even be visible to beyond the preprocessor step.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
2024-09-20 9:47 ` Russell King (Oracle)
@ 2024-09-23 15:21 ` Ryan Roberts
0 siblings, 0 replies; 37+ messages in thread
From: Ryan Roberts @ 2024-09-23 15:21 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Anshuman Khandual, kernel test robot, linux-mm, llvm,
oe-kbuild-all, Andrew Morton, David Hildenbrand,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users,
Dimitri Sivanich, Alexander Viro, Muchun Song, Andrey Ryabinin,
Miaohe Lin, Dennis Zhou, Tejun Heo, Christoph Lameter,
Uladzislau Rezki, Christoph Hellwig
>> Let's just rewind a bit. This thread exists because the kernel test robot failed
>> to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture
>> after Anshuman changed the direct pgd dereference to pgdp_get(). The reason
>> compilation failed is because arm defines its own pgdp_get() override, but it is
>> broken (there is a typo).
>
> Let's not rewind, because had you fully read and digested my reply, you
> would have seen why this isn't a problem... but let me spell it out.
>
>>
>> Code before Anshuman's change:
>>
>> static inline int pgd_none_or_clear_bad(pgd_t *pgd)
>> {
>> if (pgd_none(*pgd))
>> return 1;
>> if (unlikely(pgd_bad(*pgd))) {
>> pgd_clear_bad(pgd);
>> return 1;
>> }
>> return 0;
>> }
>
> This isn't a problem as the code stands. While there is a dereference
> in C, that dereference is a simple struct copy, something that we use
> everywhere in the kernel. However, that is as far as it goes, because
> neither pgd_none() nor pgd_bad() makes use of its argument, and thus
> the compiler will optimise it away, resulting in no actual access to
> the page tables - _as_ _intended_.
Right. Are you saying you depend upon those loads being optimized away for
correctness or performance reasons?
>
> If these are going to be converted to pgd_get(), then we need pgd_get()
> to _also_ be optimised away,
OK, agreed.
So perhaps the best approach is to modify the existing default pxdp_get()
implementations to just do a C dereference. That will ensure that there are no
unintended consequences, unlike moving to READ_ONCE() by default. Then riscv
(which I think is the only arch to actually use pxdp_get() currently?) will need
its own pxdp_get() overrides, which use READ_ONCE(). arm64 would also define its
own overrides in terms of READ_ONCE() to ensure single copy atomicity in the
presence of HW updates.
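In other words, something like this (a sketch, untested; pmdp_get()
shown, with the other levels following the same pattern):

/* include/linux/pgtable.h: proposed default - a plain dereference */
#ifndef pmdp_get
static inline pmd_t pmdp_get(pmd_t *pmdp)
{
	return *pmdp;	/* loads of folded levels can still be optimised away */
}
#endif

/* riscv / arm64 override: keep the read single-copy atomic */
#define pmdp_get pmdp_get
static inline pmd_t pmdp_get(pmd_t *pmdp)
{
	return READ_ONCE(*pmdp);
}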
How does that sound to you?
> and if e.g. this is the only place that
> pgd_get() is going to be used, the suggestion I made in my previous
> email is entirely reasonable, since we know that the result of pgd_get()
> will not actually be used.
I guess you could do that as an arm-specific override, but I don't think it adds
anything over using my proposed reworked default? Your call.
>
>> As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in
>> various circumstances.
>
> I already covered these in my previous reply.
>
>> And other changes in this series are also replacing those
>> direct dereferences with calls to similar helpers. The fact that these are all
>> folded (by a custom arm implementation if I've understood the below correctly)
>> just means that each dereference is returning what you would call the pmd from
>> the HW perspective, I think?
>
> It'll "return" the first of each pair of level-1 page table entries,
> which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd
> need to be optimised away, so throwing lots of READ_ONCE() around
> this code without considering this is certainly the wrong approach.
Yep, got it.
>
>>>> The core-mm today
>>>> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
>>>> follow_pfnmap_start(),
>>>
>>> Doesn't seem to exist at least not in 6.11.
>>
>> Apologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in
>> v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above.
>
> Looking at follow_pte(), it's not a problem.
>
> I think we wouldn't be having this conversation before:
>
> commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0
> Author: Russell King <rmk+kernel@arm.linux.org.uk>
> Date: Tue Nov 22 17:30:28 2011 +0000
>
> ARM: pgtable: switch to use pgtable-nopud.h
>
> where:
> -#define pgd_none(pgd) (0)
> -#define pgd_bad(pgd) (0)
>
> existed before this commit - and thus the dereference in things like:
>
> pgd_none(*pgd)
>
> wouldn't even be visible to beyond the preprocessor step.
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries
2024-09-17 7:31 [PATCH V2 0/7] mm: Use pxdp_get() for accessing page table entries Anshuman Khandual
` (6 preceding siblings ...)
2024-09-17 7:31 ` [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries Anshuman Khandual
@ 2024-09-25 10:05 ` Christophe Leroy
7 siblings, 0 replies; 37+ messages in thread
From: Christophe Leroy @ 2024-09-25 10:05 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: Andrew Morton, David Hildenbrand, Ryan Roberts,
Mike Rapoport (IBM), Arnd Bergmann, x86, linux-m68k,
linux-fsdevel, kasan-dev, linux-kernel, linux-perf-users
On 17/09/2024 09:31, Anshuman Khandual wrote:
> This series converts all direct dereferences of generic page table entries
> into pxdp_get() based helpers, extending the changes brought in via commit
> c33c794828f2 ("mm: ptep_get() conversion"). First it does some platform
> specific changes for the m68k and x86 architectures.
>
> This series has been build tested on multiple architectures such as x86,
> arm64, powerpc, powerpc64le, riscv, and m68k.
Seems like this series implies sub-optimal code with unnecessary reads.
Let's take a simple example: the function mm_find_pmd() in mm/rmap.c
On a PPC32 platform (2 level pagetables):
Before the patch:
00001b54 <mm_find_pmd>:
1b54: 80 63 00 18 lwz r3,24(r3)
1b58: 54 84 65 3a rlwinm r4,r4,12,20,29
1b5c: 7c 63 22 14 add r3,r3,r4
1b60: 4e 80 00 20 blr
Here, the function reads mm->pgd, then calculates and returns a pointer
to the PMD entry corresponding to the address.
After the patch:
00001b54 <mm_find_pmd>:
1b54: 81 23 00 18 lwz r9,24(r3)
1b58: 54 84 65 3a rlwinm r4,r4,12,20,29
1b5c: 7d 49 20 2e lwzx r10,r9,r4 <= useless read
1b60: 7c 69 22 14 add r3,r9,r4
1b64: 7d 49 20 2e lwzx r10,r9,r4 <= useless read
1b68: 7d 29 20 2e lwzx r9,r9,r4 <= useless read
1b6c: 4e 80 00 20 blr
Here, the function also reads mm->pgd and still calculates and returns a
pointer to the PMD entry corresponding to the address. But in addition
to that it reads that entry three times while doing nothing at all with
the values read.
On PPC32, PMD/PUD/P4D are single-entry tables folded into the
corresponding PGD entry; it is therefore pointless to read the
intermediate entries.
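In C terms, the difference is roughly this (a sketch): on a folded
configuration the intermediate helpers ignore their argument, e.g. from
the asm-generic folding headers (quoting from memory):

static inline int p4d_none(p4d_t p4d)	{ return 0; }

so a plain dereference like p4d_none(*p4d) is a dead load the compiler
can delete, whereas p4d_none(p4dp_get(p4d)) with a READ_ONCE() based
getter is a volatile access that must be emitted even though the value
is discarded - hence the useless lwzx reads above.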
Christophe
^ permalink raw reply [flat|nested] 37+ messages in thread