From: Harry Yoo <harry.yoo@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Dennis Zhou <dennis@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
x86@kernel.org, Borislav Petkov <bp@alien8.de>,
Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Christoph Lameter <cl@gentwo.org>,
David Hildenbrand <david@redhat.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
kasan-dev@googlegroups.com, Mike Rapoport <rppt@kernel.org>,
Ard Biesheuvel <ardb@kernel.org>,
linux-kernel@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>,
Alexander Potapenko <glider@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Suren Baghdasaryan <surenb@google.com>,
Thomas Huth <thuth@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
Michal Hocko <mhocko@suse.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-mm@kvack.org, "Kirill A. Shutemov" <kas@kernel.org>,
Oscar Salvador <osalvador@suse.de>,
Jane Chu <jane.chu@oracle.com>,
Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
Joerg Roedel <joro@8bytes.org>,
Alistair Popple <apopple@nvidia.com>,
Joao Martins <joao.m.martins@oracle.com>,
linux-arch@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH V4 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
Date: Tue, 12 Aug 2025 17:59:02 +0900 [thread overview]
Message-ID: <aJsCVtgfIVxT6Z93@hyeyoo> (raw)
In-Reply-To: <9b57f325-2dc7-48a4-b2f0-d7daa2192925@lucifer.local>
On Mon, Aug 11, 2025 at 12:46:32PM +0100, Lorenzo Stoakes wrote:
> On Mon, Aug 11, 2025 at 02:34:20PM +0900, Harry Yoo wrote:
> > Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure
> > page tables are properly synchronized when calling p*d_populate_kernel().
> > It is inteneded to synchronize page tables via pgd_pouplate_kernel() when
> > 5-level paging is in use and via p4d_pouplate_kernel() when 4-level paging
> > is used.
> >
>
> I think it's worth mentioning here that pgd_populate() is a no-op in 4-level
> systems, so the sychronisation must occur at the P4D level, just to make this
> clear.
Yeah, that's indeed confusing and agree that it's worth mentioning.
Will do. The new one:
Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to
ensure page tables are properly synchronized when calling
p*d_populate_kernel().
For 5-level paging, synchronization is performed via pgd_populate_kernel().
In 4-level paging, pgd_populate() is a no-op, so synchronization is instead
performed at the P4D level via p4d_populate_kernel().
> > This fixes intermittent boot failures on systems using 4-level paging
> > and a large amount of persistent memory:
> >
> > BUG: unable to handle page fault for address: ffffe70000000034
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 0 P4D 0
> > Oops: 0002 [#1] SMP NOPTI
> > RIP: 0010:__init_single_page+0x9/0x6d
> > Call Trace:
> > <TASK>
> > __init_zone_device_page+0x17/0x5d
> > memmap_init_zone_device+0x154/0x1bb
> > pagemap_range+0x2e0/0x40f
> > memremap_pages+0x10b/0x2f0
> > devm_memremap_pages+0x1e/0x60
> > dev_dax_probe+0xce/0x2ec [device_dax]
> > dax_bus_probe+0x6d/0xc9
> > [... snip ...]
> > </TASK>
> >
> > It also fixes a crash in vmemmap_set_pmd() caused by accessing vmemmap
> > before sync_global_pgds() [1]:
> >
> > BUG: unable to handle page fault for address: ffffeb3ff1200000
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 0 P4D 0
> > Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> > Tainted: [W]=WARN
> > RIP: 0010:vmemmap_set_pmd+0xff/0x230
> > <TASK>
> > vmemmap_populate_hugepages+0x176/0x180
> > vmemmap_populate+0x34/0x80
> > __populate_section_memmap+0x41/0x90
> > sparse_add_section+0x121/0x3e0
> > __add_pages+0xba/0x150
> > add_pages+0x1d/0x70
> > memremap_pages+0x3dc/0x810
> > devm_memremap_pages+0x1c/0x60
> > xe_devm_add+0x8b/0x100 [xe]
> > xe_tile_init_noalloc+0x6a/0x70 [xe]
> > xe_device_probe+0x48c/0x740 [xe]
> > [... snip ...]
> >
> > Cc: <stable@vger.kernel.org>
> > Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
> > Closes: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@intel.com [1]
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>
> Other than nitty comments, this looks good to me, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Thanks!
> > ---
> > arch/x86/include/asm/pgtable_64_types.h | 3 +++
> > arch/x86/mm/init_64.c | 5 +++++
> > 2 files changed, 8 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> > index 4604f924d8b8..7eb61ef6a185 100644
> > --- a/arch/x86/include/asm/pgtable_64_types.h
> > +++ b/arch/x86/include/asm/pgtable_64_types.h
> > @@ -36,6 +36,9 @@ static inline bool pgtable_l5_enabled(void)
> > #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
> > #endif /* USE_EARLY_PGTABLE_L5 */
> >
> > +#define ARCH_PAGE_TABLE_SYNC_MASK \
> > + (pgtable_l5_enabled() ? PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
> > +
> > extern unsigned int pgdir_shift;
> > extern unsigned int ptrs_per_p4d;
> >
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index 76e33bd7c556..a78b498c0dc3 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -223,6 +223,11 @@ static void sync_global_pgds(unsigned long start, unsigned long end)
> > sync_global_pgds_l4(start, end);
> > }
> >
>
> Worth a comment to say 'if 4-level, then we synchronise at P4D level by
> convention, however the same sync_global_pgds() applies'?
Maybe:
/*
* Make kernel mappings visible in all page tables in the system.
* This is necessary except when the init task populates kernel mappings
* during the boot process. In that case, all processes originating from
* the init task copies the kernel mappings, so there is no issue.
* Otherwise, missing synchronization could lead to kernel crashes due
* to missing page table entries for certain kernel mappings.
*
* Synchronization is performed at the top level, which is the PGD in
* 5-level paging systems. But in 4-level paging systems, however,
* pgd_populate() is a no-op, so synchronization is done at P4D level instead.
* sync_global_pgds() handles this difference between paging levels.
*/
--
Cheers,
Harry / Hyeonggon
> > +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> > +{
> > + sync_global_pgds(start, end);
> > +}
> > +
> > /*
> > * NOTE: This function is marked __ref because it calls __init function
> > * (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
> > --
> > 2.43.0
> >
next prev parent reply other threads:[~2025-08-12 9:00 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-11 5:34 [PATCH V4 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss Harry Yoo
2025-08-11 5:34 ` [PATCH V4 mm-hotfixes 1/3] mm: move page table sync declarations to linux/pgtable.h Harry Yoo
2025-08-11 8:05 ` Mike Rapoport
2025-08-11 8:36 ` Harry Yoo
2025-08-11 8:52 ` Mike Rapoport
2025-08-11 9:19 ` Uladzislau Rezki
2025-08-11 11:21 ` Lorenzo Stoakes
2025-08-11 5:34 ` [PATCH V4 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel() Harry Yoo
2025-08-11 8:10 ` Mike Rapoport
2025-08-11 9:10 ` Lorenzo Stoakes
2025-08-11 10:36 ` Harry Yoo
2025-08-11 11:18 ` Lorenzo Stoakes
2025-08-11 11:38 ` Lorenzo Stoakes
2025-08-11 12:12 ` Harry Yoo
2025-08-11 12:18 ` Lorenzo Stoakes
2025-08-12 9:53 ` Harry Yoo
2025-08-12 16:08 ` Lorenzo Stoakes
2025-08-25 11:27 ` Christophe Leroy
2025-08-25 16:02 ` Harry Yoo
2025-08-11 5:34 ` [PATCH V4 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() Harry Yoo
2025-08-11 8:13 ` Mike Rapoport
2025-08-11 11:46 ` Lorenzo Stoakes
2025-08-12 8:59 ` Harry Yoo [this message]
2025-08-12 16:36 ` Lorenzo Stoakes
2025-08-11 6:46 ` [PATCH V4 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss Kiryl Shutsemau
2025-08-11 8:09 ` Harry Yoo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aJsCVtgfIVxT6Z93@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@gmail.com \
--cc=aneesh.kumar@linux.ibm.com \
--cc=apopple@nvidia.com \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=cl@gentwo.org \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=dennis@kernel.org \
--cc=dvyukov@google.com \
--cc=glider@google.com \
--cc=gwan-gyeong.mun@intel.com \
--cc=hpa@zytor.com \
--cc=jane.chu@oracle.com \
--cc=jhubbard@nvidia.com \
--cc=joao.m.martins@oracle.com \
--cc=joro@8bytes.org \
--cc=kas@kernel.org \
--cc=kasan-dev@googlegroups.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=luto@kernel.org \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=osalvador@suse.de \
--cc=peterz@infradead.org \
--cc=rppt@kernel.org \
--cc=ryabinin.a.a@gmail.com \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=tglx@linutronix.de \
--cc=thuth@redhat.com \
--cc=tj@kernel.org \
--cc=urezki@gmail.com \
--cc=vbabka@suse.cz \
--cc=vincenzo.frascino@arm.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.