public inbox for linux-kernel@vger.kernel.org
* [PATCHv2 0/4] x86: Reduce code duplication on page table initialization
@ 2024-08-14 12:46 Kirill A. Shutemov
  2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
	Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov

Use kernel_ident_mapping_init() to initialize kernel page tables where
possible, replacing manual initialization and reducing code duplication.

v2:
 - A separate patch to change what PA is mapped at relocate_kernel() VA.
 - Improve commit messages;
 - Add Reviewed-by from Kai;

Kirill A. Shutemov (4):
  x86/mm/ident_map: Fix virtual address wrap to zero
  x86/acpi: Replace manual page table initialization with
    kernel_ident_mapping_init()
  x86/64/kexec: Map original relocate_kernel() in
    init_transition_pgtable()
  x86/64/kexec: Rewrite init_transition_pgtable() with
    kernel_ident_mapping_init()

 arch/x86/include/asm/kexec.h       |  5 +-
 arch/x86/kernel/acpi/madt_wakeup.c | 73 +++++-------------------
 arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
 arch/x86/mm/ident_map.c            | 14 +----
 4 files changed, 50 insertions(+), 131 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
  2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
  2024-08-14 14:22   ` Tom Lendacky
  2024-08-14 19:25   ` Thomas Gleixner
  2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
	Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov,
	Kai Huang

The calculation of the 'next' virtual address doesn't protect against
wrapping to zero. This can result in page table corruption and a hang.
The problematic case is possible if the user sets a high
x86_mapping_info::offset.

Replace manual 'next' calculation with p?d_addr_end() which handles
wrapping correctly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/mm/ident_map.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 437e96fb4977..5872f3ee863c 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -101,9 +101,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		pmd_t *pmd;
 		bool use_gbpage;
 
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
+		next = pud_addr_end(addr, end);
 
 		/* if this is already a gbpage, this portion is already mapped */
 		if (pud_leaf(*pud))
@@ -154,10 +152,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		p4d_t *p4d = p4d_page + p4d_index(addr);
 		pud_t *pud;
 
-		next = (addr & P4D_MASK) + P4D_SIZE;
-		if (next > end)
-			next = end;
-
+		next = p4d_addr_end(addr, end);
 		if (p4d_present(*p4d)) {
 			pud = pud_offset(p4d, 0);
 			result = ident_pud_init(info, pud, addr, next);
@@ -199,10 +194,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		pgd_t *pgd = pgd_page + pgd_index(addr);
 		p4d_t *p4d;
 
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
+		next = pgd_addr_end(addr, end);
 		if (pgd_present(*pgd)) {
 			p4d = p4d_offset(pgd, 0);
 			result = ident_p4d_init(info, p4d, addr, next);
-- 
2.43.0



* [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init()
  2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
  2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
  2024-08-14 15:55   ` Tom Lendacky
  2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
  2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
  3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
	Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov,
	Kai Huang

The function init_transition_pgtable() maps the page with
asm_acpi_mp_play_dead() into an identity mapping.

Replace manual page table initialization with kernel_ident_mapping_init()
to avoid code duplication. Use x86_mapping_info::offset to get the page
mapped at the correct location.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/acpi/madt_wakeup.c | 73 ++++++------------------------
 1 file changed, 15 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index d5ef6215583b..78960b338be9 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -70,58 +70,6 @@ static void __init free_pgt_page(void *pgt, void *dummy)
 	return memblock_free(pgt, PAGE_SIZE);
 }
 
-/*
- * Make sure asm_acpi_mp_play_dead() is present in the identity mapping at
- * the same place as in the kernel page tables. asm_acpi_mp_play_dead() switches
- * to the identity mapping and the function has be present at the same spot in
- * the virtual address space before and after switching page tables.
- */
-static int __init init_transition_pgtable(pgd_t *pgd)
-{
-	pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
-	unsigned long vaddr, paddr;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-
-	vaddr = (unsigned long)asm_acpi_mp_play_dead;
-	pgd += pgd_index(vaddr);
-	if (!pgd_present(*pgd)) {
-		p4d = (p4d_t *)alloc_pgt_page(NULL);
-		if (!p4d)
-			return -ENOMEM;
-		set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
-	}
-	p4d = p4d_offset(pgd, vaddr);
-	if (!p4d_present(*p4d)) {
-		pud = (pud_t *)alloc_pgt_page(NULL);
-		if (!pud)
-			return -ENOMEM;
-		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
-	}
-	pud = pud_offset(p4d, vaddr);
-	if (!pud_present(*pud)) {
-		pmd = (pmd_t *)alloc_pgt_page(NULL);
-		if (!pmd)
-			return -ENOMEM;
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-	pmd = pmd_offset(pud, vaddr);
-	if (!pmd_present(*pmd)) {
-		pte = (pte_t *)alloc_pgt_page(NULL);
-		if (!pte)
-			return -ENOMEM;
-		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
-	}
-	pte = pte_offset_kernel(pmd, vaddr);
-
-	paddr = __pa(vaddr);
-	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
-
-	return 0;
-}
-
 static int __init acpi_mp_setup_reset(u64 reset_vector)
 {
 	struct x86_mapping_info info = {
@@ -130,6 +78,7 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
 		.page_flag      = __PAGE_KERNEL_LARGE_EXEC,
 		.kernpg_flag    = _KERNPG_TABLE_NOENC,
 	};
+	unsigned long mstart, mend;
 	pgd_t *pgd;
 
 	pgd = alloc_pgt_page(NULL);
@@ -137,8 +86,6 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
 		return -ENOMEM;
 
 	for (int i = 0; i < nr_pfn_mapped; i++) {
-		unsigned long mstart, mend;
-
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
 		mend   = pfn_mapped[i].end << PAGE_SHIFT;
 		if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
@@ -147,14 +94,24 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
 		}
 	}
 
-	if (kernel_ident_mapping_init(&info, pgd,
-				      PAGE_ALIGN_DOWN(reset_vector),
-				      PAGE_ALIGN(reset_vector + 1))) {
+	mstart = PAGE_ALIGN_DOWN(reset_vector);
+	mend = mstart + PAGE_SIZE;
+	if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
 		kernel_ident_mapping_free(&info, pgd);
 		return -ENOMEM;
 	}
 
-	if (init_transition_pgtable(pgd)) {
+	/*
+	 * Make sure asm_acpi_mp_play_dead() is present in the identity mapping
+	 * at the same place as in the kernel page tables.
+	 * asm_acpi_mp_play_dead() switches to the identity mapping and the
+	 * function has be present at the same spot in the virtual address space
+	 * before and after switching page tables.
+	 */
+	info.offset = __START_KERNEL_map - phys_base;
+	mstart = PAGE_ALIGN_DOWN(__pa(asm_acpi_mp_play_dead));
+	mend = mstart + PAGE_SIZE;
+	if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
 		kernel_ident_mapping_free(&info, pgd);
 		return -ENOMEM;
 	}
-- 
2.43.0



* [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable()
  2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
  2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
  2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
  2024-08-15  6:15   ` Baoquan He
  2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
  3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
	Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov

The init_transition_pgtable() function sets up transitional page tables.
It ensures that the relocate_kernel() function is present in the
identity mapping at the same location as in the kernel page tables.
relocate_kernel() switches to the identity mapping, and the function
must be present at the same location in the virtual address space before
and after switching page tables.

init_transition_pgtable() maps a copy of relocate_kernel() in
image->control_code_page at the relocate_kernel() virtual address, but
the original physical address of relocate_kernel() would also work.

It is safe to use the original relocate_kernel() physical address: it
cannot be overwritten until swap_pages() is called, and the
relocate_kernel() virtual address will not be used until then.

Map the original relocate_kernel() at the relocate_kernel() virtual
address in the identity mapping. This is preparation for replacing the
init_transition_pgtable() implementation with a call to
kernel_ident_mapping_init().

Note that while relocate_kernel() switches to the identity mapping, it
does not flush global TLB entries (CR4.PGE is not cleared). This means
that, even before this change, the kernel would in most cases keep
running relocate_kernel() from the original physical address.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/machine_kexec_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 9c9ac606893e..645690e81c2d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 	pte_t *pte;
 
 	vaddr = (unsigned long)relocate_kernel;
-	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+	paddr = __pa(relocate_kernel);
 	pgd += pgd_index(vaddr);
 	if (!pgd_present(*pgd)) {
 		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
-- 
2.43.0



* [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
  2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
  2024-08-14 16:02   ` Tom Lendacky
  3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
	Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov

init_transition_pgtable() sets up transitional page tables. Rewrite it
using kernel_ident_mapping_init() to avoid code duplication.

Change struct kimage_arch to track allocated page tables as a list, not
linking them to specific page table levels.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/kexec.h       |  5 +-
 arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
 2 files changed, 32 insertions(+), 62 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index ae5482a2f0ca..7f9287f371e6 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -145,10 +145,7 @@ struct kimage_arch {
 };
 #else
 struct kimage_arch {
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
+	struct list_head pages;
 };
 #endif /* CONFIG_X86_32 */
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 645690e81c2d..fb350372835c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -134,71 +134,42 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
 	return 0;
 }
 
+static void *alloc_transition_pgt_page(void *data)
+{
+	struct kimage *image = (struct kimage *)data;
+	unsigned long virt;
+
+	virt = get_zeroed_page(GFP_KERNEL);
+	if (!virt)
+		return NULL;
+
+	list_add(&virt_to_page(virt)->lru, &image->arch.pages);
+	return (void *)virt;
+}
+
 static void free_transition_pgtable(struct kimage *image)
 {
-	free_page((unsigned long)image->arch.p4d);
-	image->arch.p4d = NULL;
-	free_page((unsigned long)image->arch.pud);
-	image->arch.pud = NULL;
-	free_page((unsigned long)image->arch.pmd);
-	image->arch.pmd = NULL;
-	free_page((unsigned long)image->arch.pte);
-	image->arch.pte = NULL;
+	struct page *page, *tmp;
+
+	list_for_each_entry_safe(page, tmp, &image->arch.pages, lru) {
+		list_del(&page->lru);
+		free_page((unsigned long)page_address(page));
+	}
 }
 
 static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 {
-	pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
-	unsigned long vaddr, paddr;
-	int result = -ENOMEM;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
+	struct x86_mapping_info info = {
+		.alloc_pgt_page	= alloc_transition_pgt_page,
+		.context	= image,
+		.page_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.kernpg_flag	= _KERNPG_TABLE_NOENC,
+		.offset = __START_KERNEL_map - phys_base,
+	};
+	unsigned long mstart = PAGE_ALIGN_DOWN(__pa(relocate_kernel));
+	unsigned long mend = mstart + PAGE_SIZE;
 
-	vaddr = (unsigned long)relocate_kernel;
-	paddr = __pa(relocate_kernel);
-	pgd += pgd_index(vaddr);
-	if (!pgd_present(*pgd)) {
-		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
-		if (!p4d)
-			goto err;
-		image->arch.p4d = p4d;
-		set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
-	}
-	p4d = p4d_offset(pgd, vaddr);
-	if (!p4d_present(*p4d)) {
-		pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
-		if (!pud)
-			goto err;
-		image->arch.pud = pud;
-		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
-	}
-	pud = pud_offset(p4d, vaddr);
-	if (!pud_present(*pud)) {
-		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
-		if (!pmd)
-			goto err;
-		image->arch.pmd = pmd;
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-	pmd = pmd_offset(pud, vaddr);
-	if (!pmd_present(*pmd)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
-		if (!pte)
-			goto err;
-		image->arch.pte = pte;
-		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
-	}
-	pte = pte_offset_kernel(pmd, vaddr);
-
-	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
-		prot = PAGE_KERNEL_EXEC;
-
-	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
-	return 0;
-err:
-	return result;
+	return kernel_ident_mapping_init(&info, pgd, mstart, mend);
 }
 
 static void *alloc_pgt_page(void *data)
@@ -299,6 +270,8 @@ int machine_kexec_prepare(struct kimage *image)
 	unsigned long start_pgtable;
 	int result;
 
+	INIT_LIST_HEAD(&image->arch.pages);
+
 	/* Calculate the offsets */
 	start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
 
-- 
2.43.0



* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
  2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
@ 2024-08-14 14:22   ` Tom Lendacky
  2024-08-14 19:25   ` Thomas Gleixner
  1 sibling, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2024-08-14 14:22 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
	Andy Lutomirski, Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kai Huang

On 8/14/24 07:46, Kirill A. Shutemov wrote:
> The calculation of the 'next' virtual address doesn't protect against
> wrapping to zero. This can result in page table corruption and a hang.
> The problematic case is possible if the user sets a high
> x86_mapping_info::offset.
> 
> Replace manual 'next' calculation with p?d_addr_end() which handles
> wrapping correctly.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>  arch/x86/mm/ident_map.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 437e96fb4977..5872f3ee863c 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -101,9 +101,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  		pmd_t *pmd;
>  		bool use_gbpage;
>  
> -		next = (addr & PUD_MASK) + PUD_SIZE;
> -		if (next > end)
> -			next = end;
> +		next = pud_addr_end(addr, end);
>  
>  		/* if this is already a gbpage, this portion is already mapped */
>  		if (pud_leaf(*pud))
> @@ -154,10 +152,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>  		p4d_t *p4d = p4d_page + p4d_index(addr);
>  		pud_t *pud;
>  
> -		next = (addr & P4D_MASK) + P4D_SIZE;
> -		if (next > end)
> -			next = end;
> -
> +		next = p4d_addr_end(addr, end);
>  		if (p4d_present(*p4d)) {
>  			pud = pud_offset(p4d, 0);
>  			result = ident_pud_init(info, pud, addr, next);
> @@ -199,10 +194,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>  		pgd_t *pgd = pgd_page + pgd_index(addr);
>  		p4d_t *p4d;
>  
> -		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
> -		if (next > end)
> -			next = end;
> -
> +		next = pgd_addr_end(addr, end);
>  		if (pgd_present(*pgd)) {
>  			p4d = p4d_offset(pgd, 0);
>  			result = ident_p4d_init(info, p4d, addr, next);


* Re: [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init()
  2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-08-14 15:55   ` Tom Lendacky
  0 siblings, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2024-08-14 15:55 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
	Andy Lutomirski, Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kai Huang

On 8/14/24 07:46, Kirill A. Shutemov wrote:
> The function init_transition_pgtable() maps the page with
> asm_acpi_mp_play_dead() into an identity mapping.
> 
> Replace manual page table initialization with kernel_ident_mapping_init()
> to avoid code duplication. Use x86_mapping_info::offset to get the page
> mapped at the correct location.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>

A bit confusing on the last change. You have to understand how
kernel_ident_mapping_init() works with regard to supplying an offset
value, and how to set that offset up properly to get the kernel VA into
the identity map. Otherwise:

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>  arch/x86/kernel/acpi/madt_wakeup.c | 73 ++++++------------------------
>  1 file changed, 15 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
> index d5ef6215583b..78960b338be9 100644
> --- a/arch/x86/kernel/acpi/madt_wakeup.c
> +++ b/arch/x86/kernel/acpi/madt_wakeup.c
> @@ -70,58 +70,6 @@ static void __init free_pgt_page(void *pgt, void *dummy)
>  	return memblock_free(pgt, PAGE_SIZE);
>  }
>  
> -/*
> - * Make sure asm_acpi_mp_play_dead() is present in the identity mapping at
> - * the same place as in the kernel page tables. asm_acpi_mp_play_dead() switches
> - * to the identity mapping and the function has be present at the same spot in
> - * the virtual address space before and after switching page tables.
> - */
> -static int __init init_transition_pgtable(pgd_t *pgd)
> -{
> -	pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
> -	unsigned long vaddr, paddr;
> -	p4d_t *p4d;
> -	pud_t *pud;
> -	pmd_t *pmd;
> -	pte_t *pte;
> -
> -	vaddr = (unsigned long)asm_acpi_mp_play_dead;
> -	pgd += pgd_index(vaddr);
> -	if (!pgd_present(*pgd)) {
> -		p4d = (p4d_t *)alloc_pgt_page(NULL);
> -		if (!p4d)
> -			return -ENOMEM;
> -		set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
> -	}
> -	p4d = p4d_offset(pgd, vaddr);
> -	if (!p4d_present(*p4d)) {
> -		pud = (pud_t *)alloc_pgt_page(NULL);
> -		if (!pud)
> -			return -ENOMEM;
> -		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
> -	}
> -	pud = pud_offset(p4d, vaddr);
> -	if (!pud_present(*pud)) {
> -		pmd = (pmd_t *)alloc_pgt_page(NULL);
> -		if (!pmd)
> -			return -ENOMEM;
> -		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> -	}
> -	pmd = pmd_offset(pud, vaddr);
> -	if (!pmd_present(*pmd)) {
> -		pte = (pte_t *)alloc_pgt_page(NULL);
> -		if (!pte)
> -			return -ENOMEM;
> -		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
> -	}
> -	pte = pte_offset_kernel(pmd, vaddr);
> -
> -	paddr = __pa(vaddr);
> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
> -
> -	return 0;
> -}
> -
>  static int __init acpi_mp_setup_reset(u64 reset_vector)
>  {
>  	struct x86_mapping_info info = {
> @@ -130,6 +78,7 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
>  		.page_flag      = __PAGE_KERNEL_LARGE_EXEC,
>  		.kernpg_flag    = _KERNPG_TABLE_NOENC,
>  	};
> +	unsigned long mstart, mend;
>  	pgd_t *pgd;
>  
>  	pgd = alloc_pgt_page(NULL);
> @@ -137,8 +86,6 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
>  		return -ENOMEM;
>  
>  	for (int i = 0; i < nr_pfn_mapped; i++) {
> -		unsigned long mstart, mend;
> -
>  		mstart = pfn_mapped[i].start << PAGE_SHIFT;
>  		mend   = pfn_mapped[i].end << PAGE_SHIFT;
>  		if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
> @@ -147,14 +94,24 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
>  		}
>  	}
>  
> -	if (kernel_ident_mapping_init(&info, pgd,
> -				      PAGE_ALIGN_DOWN(reset_vector),
> -				      PAGE_ALIGN(reset_vector + 1))) {
> +	mstart = PAGE_ALIGN_DOWN(reset_vector);
> +	mend = mstart + PAGE_SIZE;
> +	if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
>  		kernel_ident_mapping_free(&info, pgd);
>  		return -ENOMEM;
>  	}
>  
> -	if (init_transition_pgtable(pgd)) {
> +	/*
> +	 * Make sure asm_acpi_mp_play_dead() is present in the identity mapping
> +	 * at the same place as in the kernel page tables.
> +	 * asm_acpi_mp_play_dead() switches to the identity mapping and the
> +	 * function has be present at the same spot in the virtual address space
> +	 * before and after switching page tables.
> +	 */
> +	info.offset = __START_KERNEL_map - phys_base;
> +	mstart = PAGE_ALIGN_DOWN(__pa(asm_acpi_mp_play_dead));
> +	mend = mstart + PAGE_SIZE;
> +	if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
>  		kernel_ident_mapping_free(&info, pgd);
>  		return -ENOMEM;
>  	}


* Re: [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
  2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-08-14 16:02   ` Tom Lendacky
  0 siblings, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2024-08-14 16:02 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
	Andy Lutomirski, Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi

On 8/14/24 07:46, Kirill A. Shutemov wrote:
> init_transition_pgtable() sets up transitional page tables. Rewrite it
> using kernel_ident_mapping_init() to avoid code duplication.
> 
> Change struct kimage_arch to track allocated page tables as a list, not
> linking them to specific page table levels.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>  arch/x86/include/asm/kexec.h       |  5 +-
>  arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
>  2 files changed, 32 insertions(+), 62 deletions(-)
> 


* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
  2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
  2024-08-14 14:22   ` Tom Lendacky
@ 2024-08-14 19:25   ` Thomas Gleixner
  2024-08-15  9:15     ` Kirill A. Shutemov
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2024-08-14 19:25 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
	Peter Zijlstra, Baoquan He
  Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov,
	Kai Huang

On Wed, Aug 14 2024 at 15:46, Kirill A. Shutemov wrote:
> The calculation of the 'next' virtual address doesn't protect against
> wrapping to zero. This can result in page table corruption and a hang.
> The problematic case is possible if the user sets a high
> x86_mapping_info::offset.

So this should have a Fixes tag, right?

> Replace manual 'next' calculation with p?d_addr_end() which handles
> wrapping correctly.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>


* Re: [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable()
  2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
@ 2024-08-15  6:15   ` Baoquan He
  2024-08-15  9:18     ` Kirill A. Shutemov
  0 siblings, 1 reply; 13+ messages in thread
From: Baoquan He @ 2024-08-15  6:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
	Andy Lutomirski, Peter Zijlstra, Ard Biesheuvel, Tom Lendacky,
	Andrew Morton, Thomas Zimmermann, Sean Christopherson,
	linux-kernel, linux-acpi, kexec

Cc Eric and kexec mailing list.

On 08/14/24 at 03:46pm, Kirill A. Shutemov wrote:
> The init_transition_pgtable() function sets up transitional page tables.
> It ensures that the relocate_kernel() function is present in the
> identity mapping at the same location as in the kernel page tables.
> relocate_kernel() switches to the identity mapping, and the function
> must be present at the same location in the virtual address space before
> and after switching page tables.
> 
> init_transition_pgtable() maps a copy of relocate_kernel() in
> image->control_code_page at the relocate_kernel() virtual address, but
> the original physical address of relocate_kernel() would also work.
> 
> > It is safe to use the original relocate_kernel() physical address: it
> > cannot be overwritten until swap_pages() is called, and the
> > relocate_kernel() virtual address will not be used until then.

I haven't read this code for a long time; I am wondering if we still
need to copy relocate_kernel() to image->control_code_page + PAGE_SIZE
as you said.

> 
> Map the original relocate_kernel() at the relocate_kernel() virtual
> address in the identity mapping. It is preparation to replace the
> init_transition_pgtable() implementation with a call to
> kernel_ident_mapping_init().
> 
> Note that while relocate_kernel() switches to the identity mapping, it
> does not flush global TLB entries (CR4.PGE is not cleared). This means
> that in most cases, the kernel still runs relocate_kernel() from the
> original physical address before the change.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 9c9ac606893e..645690e81c2d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  	pte_t *pte;
>  
>  	vaddr = (unsigned long)relocate_kernel;
> -	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> +	paddr = __pa(relocate_kernel);
>  	pgd += pgd_index(vaddr);
>  	if (!pgd_present(*pgd)) {
>  		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
> -- 
> 2.43.0
> 



* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
  2024-08-14 19:25   ` Thomas Gleixner
@ 2024-08-15  9:15     ` Kirill A. Shutemov
  2024-08-15 10:57       ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-15  9:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Rafael J. Wysocki, Andy Lutomirski, Peter Zijlstra, Baoquan He,
	Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kai Huang

On Wed, Aug 14, 2024 at 09:25:35PM +0200, Thomas Gleixner wrote:
> On Wed, Aug 14 2024 at 15:46, Kirill A. Shutemov wrote:
> > The calculation of the 'next' virtual address doesn't protect against
> > wrapping to zero. It can result in page table corruption and a hang. The
> > problematic case is possible if the user sets a high x86_mapping_info::offset.
> 
> So this should have a Fixes tag, right?

Well, I guess we can add

Fixes: e4630fdd4763 ("x86/power/64: Always create temporary identity mapping correctly")

but the bug is not triggerable with current upstream code.

It only wraps to zero if you touch the top PGD entry. There are no such
users upstream. Only hibernate_64.c uses x86_mapping_info::offset, and it
works on the direct mapping range, which is not the top PGD entry.
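For illustration, here is an editor's standalone userspace sketch of the wrap
(not the kernel code: the PGDIR_* constants assume 4-level paging, and the
function names next_buggy()/next_fixed() are made up for this example):

```c
#include <stdint.h>

#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (1ULL << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))

/* Old behaviour: for a vaddr in the top PGD entry, 'next' overflows
 * to zero, the 'next > end' clamp never triggers, and the mapping
 * loop would restart from address 0. */
uint64_t next_buggy(uint64_t vaddr, uint64_t end)
{
	uint64_t next = (vaddr & PGDIR_MASK) + PGDIR_SIZE;

	if (next > end)
		next = end;
	return next;
}

/* Fixed behaviour: a wrapped 'next' ends up below 'vaddr', so clamp
 * that case to 'end' as well. */
uint64_t next_fixed(uint64_t vaddr, uint64_t end)
{
	uint64_t next = (vaddr & PGDIR_MASK) + PGDIR_SIZE;

	if (next < vaddr || next > end)
		next = end;
	return next;
}
```

With a vaddr inside the top PGD entry, next_buggy() returns 0 and the walk
never terminates; next_fixed() clamps it to 'end' instead.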

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable()
  2024-08-15  6:15   ` Baoquan He
@ 2024-08-15  9:18     ` Kirill A. Shutemov
  0 siblings, 0 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-15  9:18 UTC (permalink / raw)
  To: Baoquan He
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
	Andy Lutomirski, Peter Zijlstra, Ard Biesheuvel, Tom Lendacky,
	Andrew Morton, Thomas Zimmermann, Sean Christopherson,
	linux-kernel, linux-acpi, kexec

On Thu, Aug 15, 2024 at 02:15:40PM +0800, Baoquan He wrote:
> Cc Eric and kexec mailing list.
> 
> On 08/14/24 at 03:46pm, Kirill A. Shutemov wrote:
> > The init_transition_pgtable() function sets up transitional page tables.
> > It ensures that the relocate_kernel() function is present in the
> > identity mapping at the same location as in the kernel page tables.
> > relocate_kernel() switches to the identity mapping, and the function
> > must be present at the same location in the virtual address space before
> > and after switching page tables.
> > 
> > init_transition_pgtable() maps a copy of relocate_kernel() in
> > image->control_code_page at the relocate_kernel() virtual address, but
> > the original physical address of relocate_kernel() would also work.
> > 
> > It is safe to use original relocate_kernel() physical address cannot be
> > overwritten until swap_pages() is called, and the relocate_kernel()
> > virtual address will not be used by then.
> 
> I haven't read this code for a long time; wondering if we still need
> to copy relocate_kernel() to image->control_code_page + PAGE_SIZE as you
> said.

I think we can get away with copying only the code starting at
identity_mapped(). But given that KEXEC_CONTROL_CODE_MAX_SIZE is 2K, I
don't see a reason to change anything here.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
  2024-08-15  9:15     ` Kirill A. Shutemov
@ 2024-08-15 10:57       ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2024-08-15 10:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Rafael J. Wysocki, Andy Lutomirski, Peter Zijlstra, Baoquan He,
	Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
	Sean Christopherson, linux-kernel, linux-acpi, Kai Huang

On Thu, Aug 15 2024 at 12:15, Kirill A. Shutemov wrote:
> On Wed, Aug 14, 2024 at 09:25:35PM +0200, Thomas Gleixner wrote:
>> On Wed, Aug 14 2024 at 15:46, Kirill A. Shutemov wrote:
>> > The calculation of the 'next' virtual address doesn't protect against
>> > wrapping to zero. It can result in page table corruption and a hang. The
>> > problematic case is possible if the user sets a high x86_mapping_info::offset.
>> 
>> So this should have a Fixes tag, right?
>
> Well, I guess we can add
>
> Fixes: e4630fdd4763 ("x86/power/64: Always create temporary identity mapping correctly")
>
> but the bug is not triggerable with current upstream code.
>
> It only wraps to zero if you touch the top PGD entry. There are no such
> users upstream. Only hibernate_64.c uses x86_mapping_info::offset, and it
> works on the direct mapping range, which is not the top PGD entry.

Fair enough, but please mention that in the change log.


end of thread, other threads:[~2024-08-15 10:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
2024-08-14 14:22   ` Tom Lendacky
2024-08-14 19:25   ` Thomas Gleixner
2024-08-15  9:15     ` Kirill A. Shutemov
2024-08-15 10:57       ` Thomas Gleixner
2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
2024-08-14 15:55   ` Tom Lendacky
2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
2024-08-15  6:15   ` Baoquan He
2024-08-15  9:18     ` Kirill A. Shutemov
2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
2024-08-14 16:02   ` Tom Lendacky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox