From: Nathan Chancellor <nathan@kernel.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: "Ning, Hongyu" <hongyu.ning@linux.intel.com>,
kexec@lists.infradead.org, Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Kai Huang <kai.huang@intel.com>,
Nikolay Borisov <nik.borisov@suse.com>,
linux-kernel@vger.kernel.org, Simon Horman <horms@kernel.org>,
Dave Young <dyoung@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
jpoimboe@kernel.org, bsz@amazon.de
Subject: Re: [PATCH v5 13/20] x86/kexec: Mark relocate_kernel page as ROX instead of RWX
Date: Thu, 12 Dec 2024 10:42:43 -0700
Message-ID: <20241212174243.GA2149156@ax162>
In-Reply-To: <38aaf87162d10c79b3d3ecae38df99e89ad16fce.camel@infradead.org>

On Thu, Dec 12, 2024 at 05:00:00PM +0000, David Woodhouse wrote:
> >
> > Here is the output that I see with that patch applied when rebooting via
> > 'systemctl kexec':
> >
> > RAX=000000010c070001 RBX=0000000000000000 RCX=0000000000000000 RDX=000000047fffb1a0
> > RSI=000000011c444000 RDI=000000011c450002 RBP=ff1cd0424d6e8c00 RSP=ff4178d5c5aebc60
> > R8 =0000000000000000 R9 =000000011c446000 R10=ffffffff909f3e00 R11=0000000000000003
> > R12=0000000000000000 R13=0000000000000001 R14=00000000fee1dead R15=0000000000000000
> > RIP=ff1cd0425c44401c RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> > SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > LDT=0000 0000000000000000 ffffffff 00c00000
> > TR =0040 fffffe043fedb000 00004087 00008b00 DPL=0 TSS64-busy
> > GDT= 0000000000000000 00000000
> > IDT= 0000000000000000 00000000
> > CR0=80050033 CR2=ff1cd0425c4441de CR3=000000011c446000 CR4=00771ef0
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > DR6=00000000fffe0ff0 DR7=0000000000000400
> > EFER=0000000000000d01
> > Code=41 57 9c 6a 00 9d 0f 20 d8 4c 8b 0d ee 01 00 00 41 0f 22 d9 <48> 89 25 bb 01 00 00 48 89 05 c4 01 00 00 0f 20 c0 48 89 05 b2 01 00 00 41 0f 20 e5 4c 89
>
> ...
>
> > I have attached the output of 'objdump -S'. Please let me know if you
> > would like any other information or testing.
>
> /* Switch to the identity mapped page tables */
> movq %cr3, %rax
> e: 0f 20 d8 mov %cr3,%rax
> movq kexec_pa_table_page(%rip), %r9
> 11: 4c 8b 0d 00 00 00 00 mov 0x0(%rip),%r9 # 18 <relocate_kernel+0x18>
> movq %r9, %cr3
> 18: 41 0f 22 d9 mov %r9,%cr3
>
> /* Save %rsp and CRs. */
> movq %rsp, saved_rsp(%rip)
> 1c: 48 89 25 00 00 00 00 mov %rsp,0x0(%rip) # 23 <relocate_kernel+0x23>
> movq %rax, saved_cr3(%rip)
>
>
> So it's faulting when it tries to write to saved_rsp, which is the
> first instruction after it loads the new page tables into %cr3.
>
> Before loading %cr3, the CPU is running on the original kernel page
> tables. It's running from the *virtual* address of the control page,
> which in your case is ff1cd0425c444000. In those old page tables, this
> control page is read-only.
>
> But in the newly-loaded page tables, this control page should be read-
> write.
>
> What CPU is this? Shouldn't loading %cr3 have caused the existing TLB
> entry to be flushed?

The machine I have mainly been reproducing and testing this issue on is
an Intel Xeon Gold 6314U, but I initially noticed this problem on a few
consumer AMD and Intel chips that I have locally.

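As a cross-check of the analysis quoted above, the fault address in the
dump can be recomputed by hand. The faulting instruction at RIP is
'48 89 25 bb 01 00 00', a 7-byte RIP-relative store with displacement
0x1bb, and RIP-relative operands are resolved against the address of the
next instruction. A minimal sketch of that arithmetic follows; the helper
name rip_relative_target() is made up for illustration and is not
something from the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an x86-64 RIP-relative operand: the signed 32-bit
 * displacement is added to the address of the *next* instruction, i.e.
 * the instruction's own address plus its length.
 * (Hypothetical helper, for illustration only.)
 */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
			     int32_t disp)
{
	/* x86 instructions are between 1 and 15 bytes long. */
	assert(insn_len >= 1 && insn_len <= 15);

	/* Sign-extend the displacement before adding it. */
	return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

Calling rip_relative_target(0xff1cd0425c44401c, 7, 0x1bb) yields
0xff1cd0425c4441de, which matches CR2 in the dump. That is consistent
with the write to saved_rsp through the kernel virtual mapping being the
access that faults.
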
> Can you tell me what happens if we don't write through *that* virtual
> mapping of the page, but instead do it a few instructions later through
> the identity mapping of the same page (which surely *won't* have an
> older, non-writeable TLB entry already...)?

Applying that diff on top of 5a82223e0743 makes the issue go away for
me.

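In case it is useful for the TLB question, the control register values
in the dump can be decoded as well. The sketch below tests a few
architectural bits; the bit positions are the ones given in the x86
manuals, while the macro names are defined locally here and cr_bit_set()
is a hypothetical helper. Whether any of these bits actually explains
the stale entry is an assumption, not something this test shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions, defined locally for this illustration. */
#define CR0_WP_BIT	16	/* supervisor writes honour read-only pages */
#define CR4_PGE_BIT	7	/* global pages enabled */
#define CR4_LA57_BIT	12	/* 5-level paging */
#define CR4_PCIDE_BIT	17	/* process-context identifiers enabled */

/* Return true if the given bit is set in a control register value. */
bool cr_bit_set(uint64_t reg, unsigned bit)
{
	assert(bit < 64);
	return (reg >> bit) & 1;
}
```

With CR0=80050033 and CR4=00771ef0 from the dump, WP, PGE, LA57 and
PCIDE all come out as set. With CR4.PGE set, a plain write to %cr3 does
not invalidate TLB entries for pages marked global, so if the kernel
mapping of the control page is global, a stale read-only translation
could survive the page table switch.
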
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 8bc86a1e056a..2763c509e513 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -70,6 +70,18 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
> movq kexec_pa_table_page(%rip), %r9
> movq %r9, %cr3
>
> + /* setup a new stack at the end of the physical control page */
> + lea PAGE_SIZE(%rsi), %rsp
> +
> + /* jump to identity mapped page */
> + addq $(identity_mapped - relocate_kernel), %rsi
> + ANNOTATE_RETPOLINE_SAFE
> + jmp *%rsi
> + int3
> +SYM_CODE_END(relocate_kernel)
> +
> +SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> + UNWIND_HINT_END_OF_STACK
> /* Save %rsp and CRs. */
> movq %rsp, saved_rsp(%rip)
> movq %rax, saved_cr3(%rip)
> @@ -85,19 +97,6 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
> /* Save the preserve_context to %r11 as swap_pages clobbers %rcx. */
> movq %rcx, %r11
>
> - /* setup a new stack at the end of the physical control page */
> - lea PAGE_SIZE(%rsi), %rsp
> -
> - /* jump to identity mapped page */
> - addq $(identity_mapped - relocate_kernel), %rsi
> - pushq %rsi
> - ANNOTATE_UNRET_SAFE
> - ret
> - int3
> -SYM_CODE_END(relocate_kernel)
> -
> -SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> - UNWIND_HINT_END_OF_STACK
> /*
> * %rdi indirection page
> * %rdx start address
>