* [PATCH 0/3] x86: Reduce code duplication on page table initialization
@ 2024-07-01 12:43 Kirill A. Shutemov
2024-07-01 12:43 ` [PATCH 1/3] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2024-07-01 12:43 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov
Use kernel_ident_mapping_init() to initialize kernel page tables where
possible, replacing manual initialization and reducing code duplication.
Kirill A. Shutemov (3):
x86/mm/ident_map: Fix virtual address wrap to zero
x86/acpi: Replace manual page table initialization with
kernel_ident_mapping_init()
x86/64/kexec: Rewrite init_transition_pgtable() with
kernel_ident_mapping_init()
arch/x86/include/asm/kexec.h | 5 +-
arch/x86/kernel/acpi/madt_wakeup.c | 73 +++++-------------------
arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
arch/x86/mm/ident_map.c | 15 +----
4 files changed, 50 insertions(+), 132 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] x86/mm/ident_map: Fix virtual address wrap to zero
2024-07-01 12:43 [PATCH 0/3] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
@ 2024-07-01 12:43 ` Kirill A. Shutemov
2024-07-03 10:11 ` Huang, Kai
2024-07-01 12:43 ` [PATCH 2/3] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
2024-07-01 12:43 ` [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() " Kirill A. Shutemov
2 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2024-07-01 12:43 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov
Calculation of 'next' virtual address doesn't protect against wrapping
to zero. It can result in page table corruption and hang. The
problematic case is possible if user sets high x86_mapping_info::offset.
Replace manual 'next' calculation with p?d_addr_and() which handles
wrapping correctly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/mm/ident_map.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index c45127265f2f..7422146b0dc9 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -100,10 +100,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
- next = (addr & PUD_MASK) + PUD_SIZE;
- if (next > end)
- next = end;
-
+ next = pud_addr_end(addr, end);
if (info->direct_gbpages) {
pud_t pudval;
@@ -141,10 +138,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
p4d_t *p4d = p4d_page + p4d_index(addr);
pud_t *pud;
- next = (addr & P4D_MASK) + P4D_SIZE;
- if (next > end)
- next = end;
-
+ next = p4d_addr_end(addr, end);
if (p4d_present(*p4d)) {
pud = pud_offset(p4d, 0);
result = ident_pud_init(info, pud, addr, next);
@@ -186,10 +180,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
pgd_t *pgd = pgd_page + pgd_index(addr);
p4d_t *p4d;
- next = (addr & PGDIR_MASK) + PGDIR_SIZE;
- if (next > end)
- next = end;
-
+ next = pgd_addr_end(addr, end);
if (pgd_present(*pgd)) {
p4d = p4d_offset(pgd, 0);
result = ident_p4d_init(info, p4d, addr, next);
--
2.43.0
* [PATCH 2/3] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init()
2024-07-01 12:43 [PATCH 0/3] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
2024-07-01 12:43 ` [PATCH 1/3] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
@ 2024-07-01 12:43 ` Kirill A. Shutemov
2024-07-03 10:23 ` Huang, Kai
2024-07-01 12:43 ` [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() " Kirill A. Shutemov
2 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2024-07-01 12:43 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov
The function init_transition_pgtable() maps the page with
asm_acpi_mp_play_dead() into an identity mapping.
Replace manual page table initialization with kernel_ident_mapping_init()
to avoid code duplicatiion. Use x86_mapping_info::offset to get the page
is mapped at the correct location.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/kernel/acpi/madt_wakeup.c | 73 ++++++------------------------
1 file changed, 15 insertions(+), 58 deletions(-)
diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index 6cfe762be28b..acbc9305c8e3 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -70,58 +70,6 @@ static void __init free_pgt_page(void *pgt, void *dummy)
return memblock_free(pgt, PAGE_SIZE);
}
-/*
- * Make sure asm_acpi_mp_play_dead() is present in the identity mapping at
- * the same place as in the kernel page tables. asm_acpi_mp_play_dead() switches
- * to the identity mapping and the function has be present at the same spot in
- * the virtual address space before and after switching page tables.
- */
-static int __init init_transition_pgtable(pgd_t *pgd)
-{
- pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
- unsigned long vaddr, paddr;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
-
- vaddr = (unsigned long)asm_acpi_mp_play_dead;
- pgd += pgd_index(vaddr);
- if (!pgd_present(*pgd)) {
- p4d = (p4d_t *)alloc_pgt_page(NULL);
- if (!p4d)
- return -ENOMEM;
- set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
- }
- p4d = p4d_offset(pgd, vaddr);
- if (!p4d_present(*p4d)) {
- pud = (pud_t *)alloc_pgt_page(NULL);
- if (!pud)
- return -ENOMEM;
- set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
- }
- pud = pud_offset(p4d, vaddr);
- if (!pud_present(*pud)) {
- pmd = (pmd_t *)alloc_pgt_page(NULL);
- if (!pmd)
- return -ENOMEM;
- set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
- }
- pmd = pmd_offset(pud, vaddr);
- if (!pmd_present(*pmd)) {
- pte = (pte_t *)alloc_pgt_page(NULL);
- if (!pte)
- return -ENOMEM;
- set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
- }
- pte = pte_offset_kernel(pmd, vaddr);
-
- paddr = __pa(vaddr);
- set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
-
- return 0;
-}
-
static int __init acpi_mp_setup_reset(u64 reset_vector)
{
struct x86_mapping_info info = {
@@ -130,6 +78,7 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
.page_flag = __PAGE_KERNEL_LARGE_EXEC,
.kernpg_flag = _KERNPG_TABLE_NOENC,
};
+ unsigned long mstart, mend;
pgd_t *pgd;
pgd = alloc_pgt_page(NULL);
@@ -137,8 +86,6 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
return -ENOMEM;
for (int i = 0; i < nr_pfn_mapped; i++) {
- unsigned long mstart, mend;
-
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
@@ -147,14 +94,24 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
}
}
- if (kernel_ident_mapping_init(&info, pgd,
- PAGE_ALIGN_DOWN(reset_vector),
- PAGE_ALIGN(reset_vector + 1))) {
+ mstart = PAGE_ALIGN_DOWN(reset_vector);
+ mend = mstart + PAGE_SIZE;
+ if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
kernel_ident_mapping_free(&info, pgd);
return -ENOMEM;
}
- if (init_transition_pgtable(pgd)) {
+ /*
+ * Make sure asm_acpi_mp_play_dead() is present in the identity mapping
+ * at the same place as in the kernel page tables.
+ * asm_acpi_mp_play_dead() switches to the identity mapping and the
+ * function has be present at the same spot in the virtual address space
+ * before and after switching page tables.
+ */
+ info.offset = __START_KERNEL_map - phys_base;
+ mstart = PAGE_ALIGN_DOWN(__pa(asm_acpi_mp_play_dead));
+ mend = mstart + PAGE_SIZE;
+ if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
kernel_ident_mapping_free(&info, pgd);
return -ENOMEM;
}
--
2.43.0
* [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
2024-07-01 12:43 [PATCH 0/3] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
2024-07-01 12:43 ` [PATCH 1/3] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
2024-07-01 12:43 ` [PATCH 2/3] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-07-01 12:43 ` Kirill A. Shutemov
2024-07-03 11:06 ` Huang, Kai
2 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2024-07-01 12:43 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov
init_transition_pgtable() setups transitional page tables. Rewrite it
using kernel_ident_mapping_init() to avoid code duplication.
struct kimage_arch changed to track allocated page tables as a list, not
linking them to specific page table levels.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/include/asm/kexec.h | 5 +-
arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
2 files changed, 32 insertions(+), 62 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index ae5482a2f0ca..7f9287f371e6 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -145,10 +145,7 @@ struct kimage_arch {
};
#else
struct kimage_arch {
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
+ struct list_head pages;
};
#endif /* CONFIG_X86_32 */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index cc0f7f70b17b..951b17d217ab 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -107,71 +107,42 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
return 0;
}
+static void *alloc_transition_pgt_page(void *data)
+{
+ struct kimage *image = (struct kimage *)data;
+ unsigned long virt;
+
+ virt = get_zeroed_page(GFP_KERNEL);
+ if (!virt)
+ return NULL;
+
+ list_add(&virt_to_page(virt)->lru, &image->arch.pages);
+ return (void *)virt;
+}
+
static void free_transition_pgtable(struct kimage *image)
{
- free_page((unsigned long)image->arch.p4d);
- image->arch.p4d = NULL;
- free_page((unsigned long)image->arch.pud);
- image->arch.pud = NULL;
- free_page((unsigned long)image->arch.pmd);
- image->arch.pmd = NULL;
- free_page((unsigned long)image->arch.pte);
- image->arch.pte = NULL;
+ struct page *page, *tmp;
+
+ list_for_each_entry_safe(page, tmp, &image->arch.pages, lru) {
+ list_del(&page->lru);
+ free_page((unsigned long)page_address(page));
+ }
}
static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
{
- pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
- unsigned long vaddr, paddr;
- int result = -ENOMEM;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
+ struct x86_mapping_info info = {
+ .alloc_pgt_page = alloc_transition_pgt_page,
+ .context = image,
+ .page_flag = __PAGE_KERNEL_LARGE_EXEC,
+ .kernpg_flag = _KERNPG_TABLE_NOENC,
+ .offset = __START_KERNEL_map - phys_base,
+ };
+ unsigned long mstart = PAGE_ALIGN_DOWN(__pa(relocate_kernel));
+ unsigned long mend = mstart + PAGE_SIZE;
- vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
- pgd += pgd_index(vaddr);
- if (!pgd_present(*pgd)) {
- p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
- if (!p4d)
- goto err;
- image->arch.p4d = p4d;
- set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
- }
- p4d = p4d_offset(pgd, vaddr);
- if (!p4d_present(*p4d)) {
- pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
- if (!pud)
- goto err;
- image->arch.pud = pud;
- set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
- }
- pud = pud_offset(p4d, vaddr);
- if (!pud_present(*pud)) {
- pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
- if (!pmd)
- goto err;
- image->arch.pmd = pmd;
- set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
- }
- pmd = pmd_offset(pud, vaddr);
- if (!pmd_present(*pmd)) {
- pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
- if (!pte)
- goto err;
- image->arch.pte = pte;
- set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
- }
- pte = pte_offset_kernel(pmd, vaddr);
-
- if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
- prot = PAGE_KERNEL_EXEC;
-
- set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
- return 0;
-err:
- return result;
+ return kernel_ident_mapping_init(&info, pgd, mstart, mend);
}
static void *alloc_pgt_page(void *data)
@@ -272,6 +243,8 @@ int machine_kexec_prepare(struct kimage *image)
unsigned long start_pgtable;
int result;
+ INIT_LIST_HEAD(&image->arch.pages);
+
/* Calculate the offsets */
start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
--
2.43.0
* Re: [PATCH 1/3] x86/mm/ident_map: Fix virtual address wrap to zero
2024-07-01 12:43 ` [PATCH 1/3] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
@ 2024-07-03 10:11 ` Huang, Kai
0 siblings, 0 replies; 9+ messages in thread
From: Huang, Kai @ 2024-07-03 10:11 UTC (permalink / raw)
To: luto@kernel.org, rafael@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
mingo@redhat.com, kirill.shutemov@linux.intel.com,
tglx@linutronix.de, bhe@redhat.com, x86@kernel.org
Cc: thomas.lendacky@amd.com, linux-acpi@vger.kernel.org,
linux-kernel@vger.kernel.org, ardb@kernel.org, seanjc@google.com,
akpm@linux-foundation.org, tzimmermann@suse.de
On Mon, 2024-07-01 at 15:43 +0300, Kirill A. Shutemov wrote:
> Calculation of 'next' virtual address doesn't protect against wrapping
> to zero. It can result in page table corruption and hang. The
> problematic case is possible if user sets high x86_mapping_info::offset.
>
> Replace manual 'next' calculation with p?d_addr_and() which handles
p?d_addr_and() -> p?d_addr_end().
> wrapping correctly.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH 2/3] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init()
2024-07-01 12:43 ` [PATCH 2/3] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-07-03 10:23 ` Huang, Kai
0 siblings, 0 replies; 9+ messages in thread
From: Huang, Kai @ 2024-07-03 10:23 UTC (permalink / raw)
To: luto@kernel.org, rafael@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
mingo@redhat.com, kirill.shutemov@linux.intel.com,
tglx@linutronix.de, bhe@redhat.com, x86@kernel.org
Cc: thomas.lendacky@amd.com, linux-acpi@vger.kernel.org,
linux-kernel@vger.kernel.org, ardb@kernel.org, seanjc@google.com,
akpm@linux-foundation.org, tzimmermann@suse.de
On Mon, 2024-07-01 at 15:43 +0300, Kirill A. Shutemov wrote:
> The function init_transition_pgtable() maps the page with
> asm_acpi_mp_play_dead() into an identity mapping.
>
> Replace manual page table initialization with kernel_ident_mapping_init()
> to avoid code duplicatiion.
>
duplicatiion -> duplication.
> Use x86_mapping_info::offset to get the page
> is mapped at the correct location.
I don't think the "is" is needed. Just "... get the page mapped at ...".
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
2024-07-01 12:43 ` [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() " Kirill A. Shutemov
@ 2024-07-03 11:06 ` Huang, Kai
2024-07-04 13:44 ` kirill.shutemov
0 siblings, 1 reply; 9+ messages in thread
From: Huang, Kai @ 2024-07-03 11:06 UTC (permalink / raw)
To: luto@kernel.org, rafael@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
mingo@redhat.com, kirill.shutemov@linux.intel.com,
tglx@linutronix.de, bhe@redhat.com, x86@kernel.org
Cc: thomas.lendacky@amd.com, linux-acpi@vger.kernel.org,
linux-kernel@vger.kernel.org, ardb@kernel.org, seanjc@google.com,
akpm@linux-foundation.org, tzimmermann@suse.de
On Mon, 2024-07-01 at 15:43 +0300, Kirill A. Shutemov wrote:
> init_transition_pgtable() setups transitional page tables. Rewrite it
> using kernel_ident_mapping_init() to avoid code duplication.
setups -> sets up
>
> struct kimage_arch changed to track allocated page tables as a list, not
> linking them to specific page table levels.
This doesn't look like imperative mode. Perhaps change to:
Change struct kimage_arch to track ...
[...]
> static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> {
> - pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
> - unsigned long vaddr, paddr;
> - int result = -ENOMEM;
> - p4d_t *p4d;
> - pud_t *pud;
> - pmd_t *pmd;
> - pte_t *pte;
> + struct x86_mapping_info info = {
> + .alloc_pgt_page = alloc_transition_pgt_page,
> + .context = image,
> + .page_flag = __PAGE_KERNEL_LARGE_EXEC,
> + .kernpg_flag = _KERNPG_TABLE_NOENC,
> + .offset = __START_KERNEL_map - phys_base,
> + };
> + unsigned long mstart = PAGE_ALIGN_DOWN(__pa(relocate_kernel));
> + unsigned long mend = mstart + PAGE_SIZE;
>
> - vaddr = (unsigned long)relocate_kernel;
> - paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
Perhaps I am missing something, but this seems a functional change to me.
IIUC the page after image->control_code_page is allocated when loading the
kexec kernel image. It is a different page from the page where the
relocate_kernel code resides in.
The old code maps relocate_kernel kernel VA to the page after the
control_code_page. Later in machine_kexec(), the relocate_kernel code is
copied to that page so the mapping can work for that:
control_page = page_address(image->control_code_page) + PAGE_SIZE;
__memcpy(control_page, relocate_kernel,
KEXEC_CONTROL_CODE_MAX_SIZE);
The new code in this patch, however, seems just maps the relocate_kernel VA
to the PA of the relocate_kernel, which should be different from the old
mapping.
* Re: [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
2024-07-03 11:06 ` Huang, Kai
@ 2024-07-04 13:44 ` kirill.shutemov
2024-07-05 10:35 ` Huang, Kai
0 siblings, 1 reply; 9+ messages in thread
From: kirill.shutemov @ 2024-07-04 13:44 UTC (permalink / raw)
To: Huang, Kai
Cc: luto@kernel.org, rafael@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
mingo@redhat.com, tglx@linutronix.de, bhe@redhat.com,
x86@kernel.org, thomas.lendacky@amd.com,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
ardb@kernel.org, seanjc@google.com, akpm@linux-foundation.org,
tzimmermann@suse.de
On Wed, Jul 03, 2024 at 11:06:21AM +0000, Huang, Kai wrote:
> > static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> > {
> > - pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
> > - unsigned long vaddr, paddr;
> > - int result = -ENOMEM;
> > - p4d_t *p4d;
> > - pud_t *pud;
> > - pmd_t *pmd;
> > - pte_t *pte;
> > + struct x86_mapping_info info = {
> > + .alloc_pgt_page = alloc_transition_pgt_page,
> > + .context = image,
> > + .page_flag = __PAGE_KERNEL_LARGE_EXEC,
> > + .kernpg_flag = _KERNPG_TABLE_NOENC,
> > + .offset = __START_KERNEL_map - phys_base,
> > + };
> > + unsigned long mstart = PAGE_ALIGN_DOWN(__pa(relocate_kernel));
> > + unsigned long mend = mstart + PAGE_SIZE;
> >
> > - vaddr = (unsigned long)relocate_kernel;
> > - paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
>
> Perhaps I am missing something, but this seems a functional change to me.
>
> IIUC the page after image->control_code_page is allocated when loading the
> kexec kernel image. It is a different page from the page where the
> relocate_kernel code resides in.
>
> The old code maps relocate_kernel kernel VA to the page after the
> control_code_page. Later in machine_kexec(), the relocate_kernel code is
> copied to that page so the mapping can work for that:
>
> control_page = page_address(image->control_code_page) + PAGE_SIZE;
> __memcpy(control_page, relocate_kernel,
> KEXEC_CONTROL_CODE_MAX_SIZE);
>
> The new code in this patch, however, seems just maps the relocate_kernel VA
> to the PA of the relocate_kernel, which should be different from the old
> mapping.
Yes, original code maps at relocate_kernel() VA the page with copy of the
relocate_kernel() in control_code_page. But it is safe to map original
relocate_kernel() page there as well as it is not going to be overwritten
until swap_pages(). We are not going to use original relocate_kernel()
page after RET at the end of relocate_kernel().
Does it make any sense?
I will try to explain it in the commit message in the next version.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH 3/3] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
2024-07-04 13:44 ` kirill.shutemov
@ 2024-07-05 10:35 ` Huang, Kai
0 siblings, 0 replies; 9+ messages in thread
From: Huang, Kai @ 2024-07-05 10:35 UTC (permalink / raw)
To: kirill.shutemov@linux.intel.com
Cc: ardb@kernel.org, luto@kernel.org, dave.hansen@linux.intel.com,
thomas.lendacky@amd.com, tzimmermann@suse.de,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
seanjc@google.com, mingo@redhat.com, bhe@redhat.com,
tglx@linutronix.de, hpa@zytor.com, peterz@infradead.org,
bp@alien8.de, rafael@kernel.org, linux-acpi@vger.kernel.org,
x86@kernel.org
On Thu, 2024-07-04 at 16:44 +0300, kirill.shutemov@linux.intel.com wrote:
> On Wed, Jul 03, 2024 at 11:06:21AM +0000, Huang, Kai wrote:
> > > static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> > > {
> > > - pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
> > > - unsigned long vaddr, paddr;
> > > - int result = -ENOMEM;
> > > - p4d_t *p4d;
> > > - pud_t *pud;
> > > - pmd_t *pmd;
> > > - pte_t *pte;
> > > + struct x86_mapping_info info = {
> > > + .alloc_pgt_page = alloc_transition_pgt_page,
> > > + .context = image,
> > > + .page_flag = __PAGE_KERNEL_LARGE_EXEC,
> > > + .kernpg_flag = _KERNPG_TABLE_NOENC,
> > > + .offset = __START_KERNEL_map - phys_base,
> > > + };
> > > + unsigned long mstart = PAGE_ALIGN_DOWN(__pa(relocate_kernel));
> > > + unsigned long mend = mstart + PAGE_SIZE;
> > >
> > > - vaddr = (unsigned long)relocate_kernel;
> > > - paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> >
> > Perhaps I am missing something, but this seems a functional change to me.
> >
> > IIUC the page after image->control_code_page is allocated when loading the
> > kexec kernel image. It is a different page from the page where the
> > relocate_kernel code resides in.
> >
> > The old code maps relocate_kernel kernel VA to the page after the
> > control_code_page. Later in machine_kexec(), the relocate_kernel code is
> > copied to that page so the mapping can work for that:
> >
> > control_page = page_address(image->control_code_page) + PAGE_SIZE;
> > __memcpy(control_page, relocate_kernel,
> > KEXEC_CONTROL_CODE_MAX_SIZE);
> >
> > The new code in this patch, however, seems just maps the relocate_kernel VA
> > to the PA of the relocate_kernel, which should be different from the old
> > mapping.
>
> Yes, original code maps at relocate_kernel() VA the page with copy of the
> relocate_kernel() in control_code_page. But it is safe to map original
> relocate_kernel() page there as well as it is not going to be overwritten
> until swap_pages(). We are not going to use original relocate_kernel()
> page after RET at the end of relocate_kernel().
I am not super familiar with this, but this doesn't seem 100% safe to me.
E.g., did you consider the kexec jump case?
The second half of the control page is also used to store registers in kexec
jump. If the relocate_kernel VA isn't mapped to the control page, then IIUC
after jumping back to the old kernel we won't be able to read those
registers back?
>
> Does it make any sense?
>
> I will try to explain it in the commit message in the next version.
>
I think even if it is safe to change the mapping to the relocate_kernel()
page, that should be done in a separate patch. This patch should focus solely
on removing the duplicated page table setup code.