From: Nathan Chancellor <nathan@kernel.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: kexec@lists.infradead.org, Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
David Woodhouse <dwmw@amazon.co.uk>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Kai Huang <kai.huang@intel.com>,
Nikolay Borisov <nik.borisov@suse.com>,
linux-kernel@vger.kernel.org, Simon Horman <horms@kernel.org>,
Dave Young <dyoung@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
jpoimboe@kernel.org, bsz@amazon.de
Subject: Re: [PATCH v5 07/20] x86/kexec: Invoke copy of relocate_kernel() instead of the original
Date: Sat, 14 Dec 2024 16:08:18 -0700
Message-ID: <20241214230818.GA677337@ax162>
In-Reply-To: <20241205153343.3275139-8-dwmw2@infradead.org>
Hi David,
On Thu, Dec 05, 2024 at 03:05:13PM +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> This currently calls set_memory_x() from machine_kexec_prepare() just
> like the 32-bit version does. That's actually a bit earlier than I'd
> like, as it leaves the page RWX all the time the image is even *loaded*.
>
> Subsequent commits will eliminate all the writes to the page between the
> point it's marked executable in machine_kexec_prepare() and the time that
> relocate_kernel() is running and has switched to the identmap %cr3, so
> that it can be ROX. But that can't happen until it's moved to the .data
> section of the kernel, and *that* can't happen until we start executing
> the copy instead of executing it in place in the kernel .text. So break
> the circular dependency in those commits by letting it be RWX for now.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> arch/x86/kernel/machine_kexec_64.c | 30 ++++++++++++++++++++++------
> arch/x86/kernel/relocate_kernel_64.S | 5 ++++-
> 2 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 3a4cbac1a0c6..9567347f7a9b 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
...
> void machine_kexec(struct kimage *image)
> {
> + unsigned long (*relocate_kernel_ptr)(unsigned long indirection_page,
> + unsigned long page_list,
> + unsigned long start_address,
> + unsigned int preserve_context,
> + unsigned int host_mem_enc_active);
> unsigned long page_list[PAGES_NR];
> unsigned int host_mem_enc_active;
> int save_ftrace_enabled;
> @@ -371,6 +387,8 @@ void machine_kexec(struct kimage *image)
> page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
> << PAGE_SHIFT);
>
> + relocate_kernel_ptr = control_page;
> +
Because of this change...
> /*
> * The segment registers are funny things, they have both a
> * visible and an invisible part. Whenever the visible part is
> @@ -390,11 +408,11 @@ void machine_kexec(struct kimage *image)
> native_gdt_invalidate();
>
> /* now call it */
> - image->start = relocate_kernel((unsigned long)image->head,
> - (unsigned long)page_list,
> - image->start,
> - image->preserve_context,
> - host_mem_enc_active);
> + image->start = relocate_kernel_ptr((unsigned long)image->head,
> + (unsigned long)page_list,
> + image->start,
> + image->preserve_context,
> + host_mem_enc_active);
kexec-ing on a CONFIG_CFI_CLANG kernel crashes and burns:
RAX=0000000000000018 RBX=ff408cf982596000 RCX=0000000000000000 RDX=000000047fffb280
RSI=ff6c99a785943d20 RDI=000000011e145002 RBP=ff6c99a785943d70 RSP=ff6c99a785943d10
R8 =0000000000000000 R9 =0000000000000000 R10=00000000ab150dc6 R11=ff408cf99e139000
R12=0000000028121969 R13=00000000fee1dead R14=0000000000000000 R15=0000000000000001
RIP=ffffffff84c9c510 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffffe441f360000 00004087 00008b00 DPL=0 TSS64-busy
GDT= 0000000000000000 00000000
IDT= 0000000000000000 00000000
CR0=80050033 CR2=00007ffde71dbfc0 CR3=00000001140b0002 CR4=00771ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=83 e1 01 48 8d 74 24 10 41 ba c6 0d 15 ab 45 03 53 f1 74 02 <0f> 0b 41 ff d3 0f 1f 00 48 89 43 18 f6 83 78 02 00 00 02 74 05 e8 16 06 da 00 44 89 3d 17
a0c: 83 e1 01 andl $0x1, %ecx
a0f: 48 8d 74 24 10 leaq 0x10(%rsp), %rsi
; image->start = relocate_kernel_ptr((unsigned long)image->head,
a14: 41 ba 67 a6 7c e6 movl $0xe67ca667, %r10d # imm = 0xE67CA667
a1a: 45 03 53 f1 addl -0xf(%r11), %r10d
a1e: 74 02 je 0xa22 <machine_kexec+0x1c2>
a20: 0f 0b ud2
a22: 2e e8 00 00 00 00 callq 0xa28 <machine_kexec+0x1c8>
a28: 48 89 43 18 movq %rax, 0x18(%rbx)
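One way to read the dump above is to compare the movl immediate in the Code= bytes with the value left in R10 at the trap. The following is a minimal sketch, not kernel code: the helper names are hypothetical, the constants are copied from the Code=, R10 and RFL fields of the dump, and it assumes the register state shown is the state at the trapping ud2. Note that the immediate in the Code= bytes (c6 0d 15 ab) differs from the one in the static disassembly; the sketch uses the runtime bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode the little-endian imm32 of "41 ba imm32" (movl $imm32, %r10d). */
uint32_t movl_r10d_imm(const uint8_t insn[6])
{
	assert(insn[0] == 0x41 && insn[1] == 0xba);
	return (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
	       (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
}

/* addl computes r10d = imm + word (mod 2^32), so word = r10d - imm. */
uint32_t word_added(uint32_t imm, uint32_t r10d_after)
{
	return r10d_after - imm;
}

/* ZF is bit 6 of RFLAGS; "je" is taken only when it is set. */
bool zf_set(uint64_t rflags)
{
	return (rflags >> 6) & 1;
}

/* Values copied from the dump above. */
const uint8_t dump_movl[6] = { 0x41, 0xba, 0xc6, 0x0d, 0x15, 0xab };
const uint32_t dump_r10d = 0xab150dc6u;	/* low 32 bits of R10 */
const uint64_t dump_rfl = 0x00010086u;	/* RFL */
```

With these values, word_added(movl_r10d_imm(dump_movl), dump_r10d) is 0 and zf_set(dump_rfl) is false. That is consistent with the addl having read a zero word at -0xf(%r11), leaving a non-zero sum, so the je fell through to the ud2.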
I guess this is somewhat unavoidable because control_page is just a
'void *'. Perhaps machine_kexec() should just be marked as __nocfi? This
diff resolves the issue for me.
Cheers,
Nathan
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 9567347f7a9b..e77110c4bb91 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -334,7 +334,7 @@ void machine_kexec_cleanup(struct kimage *image)
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
*/
-void machine_kexec(struct kimage *image)
+void __nocfi machine_kexec(struct kimage *image)
{
unsigned long (*relocate_kernel_ptr)(unsigned long indirection_page,
unsigned long page_list,
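To illustrate why the indirect call trips the check, here is a simplified userspace model of the caller-side test seen in the disassembly (movl of a negated hash, addl of the word at -0xf from the target, je on zero). It is only a sketch under stated assumptions: all function names and the 15-byte offset constant are hypothetical stand-ins taken from the -0xf(%r11) operand, the hash value used with it is arbitrary, and the copy is modelled as carrying only the bytes from the entry point onwards, with nothing meaningful in front of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HASH_OFFSET 15u	/* mirrors -0xf(%r11) in the disassembly */

/* Model of the check: (-expected) + stored == 0 means "types match". */
bool kcfi_call_allowed(const uint8_t *target, uint32_t expected_hash)
{
	uint32_t stored;

	memcpy(&stored, target - HASH_OFFSET, sizeof(stored));
	return (uint32_t)(0u - expected_hash) + stored == 0;
}

/* Place a hash where the check looks for it, as a compiler-emitted
 * preamble in front of a normal C function would. */
void emit_preamble(uint8_t *entry, uint32_t hash)
{
	memcpy(entry - HASH_OFFSET, &hash, sizeof(hash));
}

/* Copy only the bytes from the entry point on (assumption: the copy
 * has no preamble of its own in front of it). */
void copy_body(uint8_t *dst_entry, const uint8_t *src_entry, size_t len)
{
	memcpy(dst_entry, src_entry, len);
}
```

In this model, a target with a matching preamble passes the check, while the same body copied into a zeroed buffer fails it, because the word in front of the copy is not the expected hash. Marking machine_kexec() as __nocfi, as the diff above does, sidesteps the problem by asking the compiler not to emit the check for indirect calls made from that function.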