* [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
2024-08-14 14:22 ` Tom Lendacky
2024-08-14 19:25 ` Thomas Gleixner
2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
` (2 subsequent siblings)
3 siblings, 2 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov,
Kai Huang
Calculation of the 'next' virtual address does not protect against
wrapping to zero. This can result in page table corruption and a hang.
The problematic case is possible if the user sets a high
x86_mapping_info::offset.
Replace the manual 'next' calculation with p?d_addr_end(), which handles
wrapping correctly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
---
arch/x86/mm/ident_map.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 437e96fb4977..5872f3ee863c 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -101,9 +101,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
pmd_t *pmd;
bool use_gbpage;
- next = (addr & PUD_MASK) + PUD_SIZE;
- if (next > end)
- next = end;
+ next = pud_addr_end(addr, end);
/* if this is already a gbpage, this portion is already mapped */
if (pud_leaf(*pud))
@@ -154,10 +152,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
p4d_t *p4d = p4d_page + p4d_index(addr);
pud_t *pud;
- next = (addr & P4D_MASK) + P4D_SIZE;
- if (next > end)
- next = end;
-
+ next = p4d_addr_end(addr, end);
if (p4d_present(*p4d)) {
pud = pud_offset(p4d, 0);
result = ident_pud_init(info, pud, addr, next);
@@ -199,10 +194,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
pgd_t *pgd = pgd_page + pgd_index(addr);
p4d_t *p4d;
- next = (addr & PGDIR_MASK) + PGDIR_SIZE;
- if (next > end)
- next = end;
-
+ next = pgd_addr_end(addr, end);
if (pgd_present(*pgd)) {
p4d = p4d_offset(pgd, 0);
result = ident_p4d_init(info, p4d, addr, next);
--
2.43.0
* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
@ 2024-08-14 14:22 ` Tom Lendacky
2024-08-14 19:25 ` Thomas Gleixner
1 sibling, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2024-08-14 14:22 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
Andy Lutomirski, Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kai Huang
On 8/14/24 07:46, Kirill A. Shutemov wrote:
> Calculation of 'next' virtual address doesn't protect against wrapping
> to zero. It can result in page table corruption and hang. The
> problematic case is possible if user sets high x86_mapping_info::offset.
>
> Replace manual 'next' calculation with p?d_addr_end() which handles
> wrapping correctly.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/mm/ident_map.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 437e96fb4977..5872f3ee863c 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -101,9 +101,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> pmd_t *pmd;
> bool use_gbpage;
>
> - next = (addr & PUD_MASK) + PUD_SIZE;
> - if (next > end)
> - next = end;
> + next = pud_addr_end(addr, end);
>
> /* if this is already a gbpage, this portion is already mapped */
> if (pud_leaf(*pud))
> @@ -154,10 +152,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
> p4d_t *p4d = p4d_page + p4d_index(addr);
> pud_t *pud;
>
> - next = (addr & P4D_MASK) + P4D_SIZE;
> - if (next > end)
> - next = end;
> -
> + next = p4d_addr_end(addr, end);
> if (p4d_present(*p4d)) {
> pud = pud_offset(p4d, 0);
> result = ident_pud_init(info, pud, addr, next);
> @@ -199,10 +194,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> pgd_t *pgd = pgd_page + pgd_index(addr);
> p4d_t *p4d;
>
> - next = (addr & PGDIR_MASK) + PGDIR_SIZE;
> - if (next > end)
> - next = end;
> -
> + next = pgd_addr_end(addr, end);
> if (pgd_present(*pgd)) {
> p4d = p4d_offset(pgd, 0);
> result = ident_p4d_init(info, p4d, addr, next);
* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
2024-08-14 14:22 ` Tom Lendacky
@ 2024-08-14 19:25 ` Thomas Gleixner
2024-08-15 9:15 ` Kirill A. Shutemov
1 sibling, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2024-08-14 19:25 UTC
To: Kirill A. Shutemov, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86, H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov,
Kai Huang
On Wed, Aug 14 2024 at 15:46, Kirill A. Shutemov wrote:
> Calculation of 'next' virtual address doesn't protect against wrapping
> to zero. It can result in page table corruption and hang. The
> problematic case is possible if user sets high x86_mapping_info::offset.
So this should have a Fixes tag, right?
> Replace manual 'next' calculation with p?d_addr_end() which handles
> wrapping correctly.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
2024-08-14 19:25 ` Thomas Gleixner
@ 2024-08-15 9:15 ` Kirill A. Shutemov
2024-08-15 10:57 ` Thomas Gleixner
0 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-15 9:15 UTC
To: Thomas Gleixner
Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Rafael J. Wysocki, Andy Lutomirski, Peter Zijlstra, Baoquan He,
Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kai Huang
On Wed, Aug 14, 2024 at 09:25:35PM +0200, Thomas Gleixner wrote:
> On Wed, Aug 14 2024 at 15:46, Kirill A. Shutemov wrote:
> > Calculation of 'next' virtual address doesn't protect against wrapping
> > to zero. It can result in page table corruption and hang. The
> > problematic case is possible if user sets high x86_mapping_info::offset.
>
> So this should have a Fixes tag, right?
Well, I guess we can add
Fixes: e4630fdd4763 ("x86/power/64: Always create temporary identity mapping correctly")
but the bug is not triggerable with the current upstream code.
It only wraps to zero if you touch the top PGD entry. There are no such
users in upstream. Only hibernate_64.c uses x86_mapping_info::offset, and
it works on the direct mapping range, which is not the top PGD entry.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero
2024-08-15 9:15 ` Kirill A. Shutemov
@ 2024-08-15 10:57 ` Thomas Gleixner
0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2024-08-15 10:57 UTC
To: Kirill A. Shutemov
Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Rafael J. Wysocki, Andy Lutomirski, Peter Zijlstra, Baoquan He,
Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kai Huang
On Thu, Aug 15 2024 at 12:15, Kirill A. Shutemov wrote:
> On Wed, Aug 14, 2024 at 09:25:35PM +0200, Thomas Gleixner wrote:
>> On Wed, Aug 14 2024 at 15:46, Kirill A. Shutemov wrote:
>> > Calculation of 'next' virtual address doesn't protect against wrapping
>> > to zero. It can result in page table corruption and hang. The
>> > problematic case is possible if user sets high x86_mapping_info::offset.
>>
>> So this should have a Fixes tag, right?
>
> Well, I guess we can add
>
> Fixes: e4630fdd4763 ("x86/power/64: Always create temporary identity mapping correctly")
>
> but the bug is not triggerable with the current upstream code.
>
> It only wraps to zero if you touch the top PGD entry. There are no such
> users in upstream. Only hibernate_64.c uses x86_mapping_info::offset, and
> it works on the direct mapping range, which is not the top PGD entry.
Fair enough, but please mention that in the change log.
* [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init()
2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
2024-08-14 15:55 ` Tom Lendacky
2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov,
Kai Huang
The function init_transition_pgtable() maps the page with
asm_acpi_mp_play_dead() into the identity mapping.
Replace the manual page table initialization with kernel_ident_mapping_init()
to avoid code duplication. Use x86_mapping_info::offset to get the page
mapped at the correct location.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
---
arch/x86/kernel/acpi/madt_wakeup.c | 73 ++++++------------------------
1 file changed, 15 insertions(+), 58 deletions(-)
diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index d5ef6215583b..78960b338be9 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -70,58 +70,6 @@ static void __init free_pgt_page(void *pgt, void *dummy)
return memblock_free(pgt, PAGE_SIZE);
}
-/*
- * Make sure asm_acpi_mp_play_dead() is present in the identity mapping at
- * the same place as in the kernel page tables. asm_acpi_mp_play_dead() switches
- * to the identity mapping and the function has be present at the same spot in
- * the virtual address space before and after switching page tables.
- */
-static int __init init_transition_pgtable(pgd_t *pgd)
-{
- pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
- unsigned long vaddr, paddr;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
-
- vaddr = (unsigned long)asm_acpi_mp_play_dead;
- pgd += pgd_index(vaddr);
- if (!pgd_present(*pgd)) {
- p4d = (p4d_t *)alloc_pgt_page(NULL);
- if (!p4d)
- return -ENOMEM;
- set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
- }
- p4d = p4d_offset(pgd, vaddr);
- if (!p4d_present(*p4d)) {
- pud = (pud_t *)alloc_pgt_page(NULL);
- if (!pud)
- return -ENOMEM;
- set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
- }
- pud = pud_offset(p4d, vaddr);
- if (!pud_present(*pud)) {
- pmd = (pmd_t *)alloc_pgt_page(NULL);
- if (!pmd)
- return -ENOMEM;
- set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
- }
- pmd = pmd_offset(pud, vaddr);
- if (!pmd_present(*pmd)) {
- pte = (pte_t *)alloc_pgt_page(NULL);
- if (!pte)
- return -ENOMEM;
- set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
- }
- pte = pte_offset_kernel(pmd, vaddr);
-
- paddr = __pa(vaddr);
- set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
-
- return 0;
-}
-
static int __init acpi_mp_setup_reset(u64 reset_vector)
{
struct x86_mapping_info info = {
@@ -130,6 +78,7 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
.page_flag = __PAGE_KERNEL_LARGE_EXEC,
.kernpg_flag = _KERNPG_TABLE_NOENC,
};
+ unsigned long mstart, mend;
pgd_t *pgd;
pgd = alloc_pgt_page(NULL);
@@ -137,8 +86,6 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
return -ENOMEM;
for (int i = 0; i < nr_pfn_mapped; i++) {
- unsigned long mstart, mend;
-
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
@@ -147,14 +94,24 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
}
}
- if (kernel_ident_mapping_init(&info, pgd,
- PAGE_ALIGN_DOWN(reset_vector),
- PAGE_ALIGN(reset_vector + 1))) {
+ mstart = PAGE_ALIGN_DOWN(reset_vector);
+ mend = mstart + PAGE_SIZE;
+ if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
kernel_ident_mapping_free(&info, pgd);
return -ENOMEM;
}
- if (init_transition_pgtable(pgd)) {
+ /*
+ * Make sure asm_acpi_mp_play_dead() is present in the identity mapping
+ * at the same place as in the kernel page tables.
+ * asm_acpi_mp_play_dead() switches to the identity mapping and the
+ * function has be present at the same spot in the virtual address space
+ * before and after switching page tables.
+ */
+ info.offset = __START_KERNEL_map - phys_base;
+ mstart = PAGE_ALIGN_DOWN(__pa(asm_acpi_mp_play_dead));
+ mend = mstart + PAGE_SIZE;
+ if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
kernel_ident_mapping_free(&info, pgd);
return -ENOMEM;
}
--
2.43.0
* Re: [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init()
2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-08-14 15:55 ` Tom Lendacky
0 siblings, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2024-08-14 15:55 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
Andy Lutomirski, Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kai Huang
On 8/14/24 07:46, Kirill A. Shutemov wrote:
> The function init_transition_pgtable() maps the page with
> asm_acpi_mp_play_dead() into an identity mapping.
>
> Replace manual page table initialization with kernel_ident_mapping_init()
> to avoid code duplication. Use x86_mapping_info::offset to get the page
> mapped at the correct location.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
The last change is a bit confusing. You have to understand how
kernel_ident_mapping_init() works with regard to supplying an offset
value, and how to set that offset value up properly to get the kernel VA
address into the identity map. Otherwise:
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/kernel/acpi/madt_wakeup.c | 73 ++++++------------------------
> 1 file changed, 15 insertions(+), 58 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
> index d5ef6215583b..78960b338be9 100644
> --- a/arch/x86/kernel/acpi/madt_wakeup.c
> +++ b/arch/x86/kernel/acpi/madt_wakeup.c
> @@ -70,58 +70,6 @@ static void __init free_pgt_page(void *pgt, void *dummy)
> return memblock_free(pgt, PAGE_SIZE);
> }
>
> -/*
> - * Make sure asm_acpi_mp_play_dead() is present in the identity mapping at
> - * the same place as in the kernel page tables. asm_acpi_mp_play_dead() switches
> - * to the identity mapping and the function has be present at the same spot in
> - * the virtual address space before and after switching page tables.
> - */
> -static int __init init_transition_pgtable(pgd_t *pgd)
> -{
> - pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
> - unsigned long vaddr, paddr;
> - p4d_t *p4d;
> - pud_t *pud;
> - pmd_t *pmd;
> - pte_t *pte;
> -
> - vaddr = (unsigned long)asm_acpi_mp_play_dead;
> - pgd += pgd_index(vaddr);
> - if (!pgd_present(*pgd)) {
> - p4d = (p4d_t *)alloc_pgt_page(NULL);
> - if (!p4d)
> - return -ENOMEM;
> - set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
> - }
> - p4d = p4d_offset(pgd, vaddr);
> - if (!p4d_present(*p4d)) {
> - pud = (pud_t *)alloc_pgt_page(NULL);
> - if (!pud)
> - return -ENOMEM;
> - set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
> - }
> - pud = pud_offset(p4d, vaddr);
> - if (!pud_present(*pud)) {
> - pmd = (pmd_t *)alloc_pgt_page(NULL);
> - if (!pmd)
> - return -ENOMEM;
> - set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> - }
> - pmd = pmd_offset(pud, vaddr);
> - if (!pmd_present(*pmd)) {
> - pte = (pte_t *)alloc_pgt_page(NULL);
> - if (!pte)
> - return -ENOMEM;
> - set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
> - }
> - pte = pte_offset_kernel(pmd, vaddr);
> -
> - paddr = __pa(vaddr);
> - set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
> -
> - return 0;
> -}
> -
> static int __init acpi_mp_setup_reset(u64 reset_vector)
> {
> struct x86_mapping_info info = {
> @@ -130,6 +78,7 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
> .page_flag = __PAGE_KERNEL_LARGE_EXEC,
> .kernpg_flag = _KERNPG_TABLE_NOENC,
> };
> + unsigned long mstart, mend;
> pgd_t *pgd;
>
> pgd = alloc_pgt_page(NULL);
> @@ -137,8 +86,6 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
> return -ENOMEM;
>
> for (int i = 0; i < nr_pfn_mapped; i++) {
> - unsigned long mstart, mend;
> -
> mstart = pfn_mapped[i].start << PAGE_SHIFT;
> mend = pfn_mapped[i].end << PAGE_SHIFT;
> if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
> @@ -147,14 +94,24 @@ static int __init acpi_mp_setup_reset(u64 reset_vector)
> }
> }
>
> - if (kernel_ident_mapping_init(&info, pgd,
> - PAGE_ALIGN_DOWN(reset_vector),
> - PAGE_ALIGN(reset_vector + 1))) {
> + mstart = PAGE_ALIGN_DOWN(reset_vector);
> + mend = mstart + PAGE_SIZE;
> + if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
> kernel_ident_mapping_free(&info, pgd);
> return -ENOMEM;
> }
>
> - if (init_transition_pgtable(pgd)) {
> + /*
> + * Make sure asm_acpi_mp_play_dead() is present in the identity mapping
> + * at the same place as in the kernel page tables.
> + * asm_acpi_mp_play_dead() switches to the identity mapping and the
> + * function has be present at the same spot in the virtual address space
> + * before and after switching page tables.
> + */
> + info.offset = __START_KERNEL_map - phys_base;
> + mstart = PAGE_ALIGN_DOWN(__pa(asm_acpi_mp_play_dead));
> + mend = mstart + PAGE_SIZE;
> + if (kernel_ident_mapping_init(&info, pgd, mstart, mend)) {
> kernel_ident_mapping_free(&info, pgd);
> return -ENOMEM;
> }
* [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable()
2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
2024-08-14 12:46 ` [PATCHv2 1/4] x86/mm/ident_map: Fix virtual address wrap to zero Kirill A. Shutemov
2024-08-14 12:46 ` [PATCHv2 2/4] x86/acpi: Replace manual page table initialization with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
2024-08-15 6:15 ` Baoquan He
2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov
The init_transition_pgtable() function sets up transitional page tables.
It ensures that the relocate_kernel() function is present in the
identity mapping at the same location as in the kernel page tables.
relocate_kernel() switches to the identity mapping, and the function
must be present at the same location in the virtual address space before
and after switching page tables.
init_transition_pgtable() maps a copy of relocate_kernel() in
image->control_code_page at the relocate_kernel() virtual address, but
the original physical address of relocate_kernel() would also work.
It is safe to use the original relocate_kernel() physical address: it
cannot be overwritten until swap_pages() is called, and the
relocate_kernel() virtual address will not be used by then.
Map the original relocate_kernel() at the relocate_kernel() virtual
address in the identity mapping. This is preparation for replacing the
init_transition_pgtable() implementation with a call to
kernel_ident_mapping_init().
Note that while relocate_kernel() switches to the identity mapping, it
does not flush global TLB entries (CR4.PGE is not cleared). This means
that in most cases, the kernel still runs relocate_kernel() from the
original physical address before the change.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/kernel/machine_kexec_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 9c9ac606893e..645690e81c2d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte_t *pte;
vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+ paddr = __pa(relocate_kernel);
pgd += pgd_index(vaddr);
if (!pgd_present(*pgd)) {
p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
--
2.43.0
* Re: [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable()
2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
@ 2024-08-15 6:15 ` Baoquan He
2024-08-15 9:18 ` Kirill A. Shutemov
0 siblings, 1 reply; 13+ messages in thread
From: Baoquan He @ 2024-08-15 6:15 UTC
To: Kirill A. Shutemov
Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
Andy Lutomirski, Peter Zijlstra, Ard Biesheuvel, Tom Lendacky,
Andrew Morton, Thomas Zimmermann, Sean Christopherson,
linux-kernel, linux-acpi, kexec
Cc Eric and kexec mailing list.
On 08/14/24 at 03:46pm, Kirill A. Shutemov wrote:
> The init_transition_pgtable() function sets up transitional page tables.
> It ensures that the relocate_kernel() function is present in the
> identity mapping at the same location as in the kernel page tables.
> relocate_kernel() switches to the identity mapping, and the function
> must be present at the same location in the virtual address space before
> and after switching page tables.
>
> init_transition_pgtable() maps a copy of relocate_kernel() in
> image->control_code_page at the relocate_kernel() virtual address, but
> the original physical address of relocate_kernel() would also work.
>
> It is safe to use the original relocate_kernel() physical address: it
> cannot be overwritten until swap_pages() is called, and the
> relocate_kernel() virtual address will not be used by then.
I haven't read this code for a long time; I am wondering if we still need
to copy relocate_kernel() to image->control_code_page + PAGE_SIZE as you
said.
>
> Map the original relocate_kernel() at the relocate_kernel() virtual
> address in the identity mapping. It is preparation to replace the
> init_transition_pgtable() implementation with a call to
> kernel_ident_mapping_init().
>
> Note that while relocate_kernel() switches to the identity mapping, it
> does not flush global TLB entries (CR4.PGE is not cleared). This means
> that in most cases, the kernel still runs relocate_kernel() from the
> original physical address before the change.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> arch/x86/kernel/machine_kexec_64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 9c9ac606893e..645690e81c2d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> pte_t *pte;
>
> vaddr = (unsigned long)relocate_kernel;
> - paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> + paddr = __pa(relocate_kernel);
> pgd += pgd_index(vaddr);
> if (!pgd_present(*pgd)) {
> p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
> --
> 2.43.0
>
2024-08-15 6:15 ` Baoquan He
@ 2024-08-15 9:18 ` Kirill A. Shutemov
0 siblings, 0 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-15 9:18 UTC
To: Baoquan He
Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
Andy Lutomirski, Peter Zijlstra, Ard Biesheuvel, Tom Lendacky,
Andrew Morton, Thomas Zimmermann, Sean Christopherson,
linux-kernel, linux-acpi, kexec
On Thu, Aug 15, 2024 at 02:15:40PM +0800, Baoquan He wrote:
> Cc Eric and kexec mailing list.
>
> On 08/14/24 at 03:46pm, Kirill A. Shutemov wrote:
> > The init_transition_pgtable() function sets up transitional page tables.
> > It ensures that the relocate_kernel() function is present in the
> > identity mapping at the same location as in the kernel page tables.
> > relocate_kernel() switches to the identity mapping, and the function
> > must be present at the same location in the virtual address space before
> > and after switching page tables.
> >
> > init_transition_pgtable() maps a copy of relocate_kernel() in
> > image->control_code_page at the relocate_kernel() virtual address, but
> > the original physical address of relocate_kernel() would also work.
> >
> > It is safe to use the original relocate_kernel() physical address: it
> > cannot be overwritten until swap_pages() is called, and the
> > relocate_kernel() virtual address will not be used by then.
>
> I haven't read this code for a long time; I am wondering if we still
> need to copy relocate_kernel() to image->control_code_page + PAGE_SIZE
> as you said.
I think we can get away with only copying starting with identity_mapped().
But given that KEXEC_CONTROL_CODE_MAX_SIZE is 2K, I don't see a reason to
change anything here.
--
Kiryl Shutsemau / Kirill A. Shutemov
* [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
2024-08-14 12:46 [PATCHv2 0/4] x86: Reduce code duplication on page table initialization Kirill A. Shutemov
` (2 preceding siblings ...)
2024-08-14 12:46 ` [PATCHv2 3/4] x86/64/kexec: Map original relocate_kernel() in init_transition_pgtable() Kirill A. Shutemov
@ 2024-08-14 12:46 ` Kirill A. Shutemov
2024-08-14 16:02 ` Tom Lendacky
3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2024-08-14 12:46 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Rafael J. Wysocki, Andy Lutomirski,
Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Tom Lendacky, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi, Kirill A. Shutemov
init_transition_pgtable() sets up transitional page tables. Rewrite it
using kernel_ident_mapping_init() to avoid code duplication.
Change struct kimage_arch to track allocated page tables as a list,
rather than linking them to specific page table levels.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/include/asm/kexec.h | 5 +-
arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
2 files changed, 32 insertions(+), 62 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index ae5482a2f0ca..7f9287f371e6 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -145,10 +145,7 @@ struct kimage_arch {
};
#else
struct kimage_arch {
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
+ struct list_head pages;
};
#endif /* CONFIG_X86_32 */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 645690e81c2d..fb350372835c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -134,71 +134,42 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
return 0;
}
+static void *alloc_transition_pgt_page(void *data)
+{
+ struct kimage *image = (struct kimage *)data;
+ unsigned long virt;
+
+ virt = get_zeroed_page(GFP_KERNEL);
+ if (!virt)
+ return NULL;
+
+ list_add(&virt_to_page(virt)->lru, &image->arch.pages);
+ return (void *)virt;
+}
+
static void free_transition_pgtable(struct kimage *image)
{
- free_page((unsigned long)image->arch.p4d);
- image->arch.p4d = NULL;
- free_page((unsigned long)image->arch.pud);
- image->arch.pud = NULL;
- free_page((unsigned long)image->arch.pmd);
- image->arch.pmd = NULL;
- free_page((unsigned long)image->arch.pte);
- image->arch.pte = NULL;
+ struct page *page, *tmp;
+
+ list_for_each_entry_safe(page, tmp, &image->arch.pages, lru) {
+ list_del(&page->lru);
+ free_page((unsigned long)page_address(page));
+ }
}
static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
{
- pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
- unsigned long vaddr, paddr;
- int result = -ENOMEM;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
+ struct x86_mapping_info info = {
+ .alloc_pgt_page = alloc_transition_pgt_page,
+ .context = image,
+ .page_flag = __PAGE_KERNEL_LARGE_EXEC,
+ .kernpg_flag = _KERNPG_TABLE_NOENC,
+ .offset = __START_KERNEL_map - phys_base,
+ };
+ unsigned long mstart = PAGE_ALIGN_DOWN(__pa(relocate_kernel));
+ unsigned long mend = mstart + PAGE_SIZE;
- vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(relocate_kernel);
- pgd += pgd_index(vaddr);
- if (!pgd_present(*pgd)) {
- p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
- if (!p4d)
- goto err;
- image->arch.p4d = p4d;
- set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
- }
- p4d = p4d_offset(pgd, vaddr);
- if (!p4d_present(*p4d)) {
- pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
- if (!pud)
- goto err;
- image->arch.pud = pud;
- set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
- }
- pud = pud_offset(p4d, vaddr);
- if (!pud_present(*pud)) {
- pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
- if (!pmd)
- goto err;
- image->arch.pmd = pmd;
- set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
- }
- pmd = pmd_offset(pud, vaddr);
- if (!pmd_present(*pmd)) {
- pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
- if (!pte)
- goto err;
- image->arch.pte = pte;
- set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
- }
- pte = pte_offset_kernel(pmd, vaddr);
-
- if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
- prot = PAGE_KERNEL_EXEC;
-
- set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
- return 0;
-err:
- return result;
+ return kernel_ident_mapping_init(&info, pgd, mstart, mend);
}
static void *alloc_pgt_page(void *data)
@@ -299,6 +270,8 @@ int machine_kexec_prepare(struct kimage *image)
unsigned long start_pgtable;
int result;
+ INIT_LIST_HEAD(&image->arch.pages);
+
/* Calculate the offsets */
start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
--
2.43.0
* Re: [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init()
2024-08-14 12:46 ` [PATCHv2 4/4] x86/64/kexec: Rewrite init_transition_pgtable() with kernel_ident_mapping_init() Kirill A. Shutemov
@ 2024-08-14 16:02 ` Tom Lendacky
0 siblings, 0 replies; 13+ messages in thread
From: Tom Lendacky @ 2024-08-14 16:02 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Rafael J. Wysocki,
Andy Lutomirski, Peter Zijlstra, Baoquan He
Cc: Ard Biesheuvel, Andrew Morton, Thomas Zimmermann,
Sean Christopherson, linux-kernel, linux-acpi
On 8/14/24 07:46, Kirill A. Shutemov wrote:
> init_transition_pgtable() sets up transitional page tables. Rewrite it
> using kernel_ident_mapping_init() to avoid code duplication.
>
> Change struct kimage_arch to track allocated page tables as a list, not
> linking them to specific page table levels.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/include/asm/kexec.h | 5 +-
> arch/x86/kernel/machine_kexec_64.c | 89 +++++++++++-------------------
> 2 files changed, 32 insertions(+), 62 deletions(-)
>