From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>,
Juergen Gross <jgross@suse.com>, Sasha Levin <sashal@kernel.org>,
xen-devel@lists.xenproject.org
Subject: [PATCH AUTOSEL 6.19-5.10] x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is set
Date: Wed, 11 Feb 2026 07:30:28 -0500 [thread overview]
Message-ID: <20260211123112.1330287-18-sashal@kernel.org> (raw)
In-Reply-To: <20260211123112.1330287-1-sashal@kernel.org>
From: Hou Wenlong <houwenlong.hwl@antgroup.com>
[ Upstream commit db9aded979b491a24871e1621cd4e8822dbca859 ]
The PVH entry is available for 32-bit KVM guests, and 32-bit KVM guests
do not depend on CONFIG_X86_PAE. However, mk_early_pgtbl_32() builds
different pagetables depending on whether CONFIG_X86_PAE is set.
Therefore, enabling PAE mode for 32-bit KVM guests without
CONFIG_X86_PAE being set would result in a boot failure during CR3
loading.
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <d09ce9a134eb9cbc16928a5b316969f8ba606b81.1768017442.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed for a complete analysis.
---
## Detailed Analysis
### 1. Commit Message Analysis
The subject clearly states the problem: PAE mode is being
unconditionally enabled for all PVH guests, but for 32-bit guests it
should only be enabled when `CONFIG_X86_PAE` is set. The commit message
explicitly describes a **boot failure** scenario: loading CR3 fails when
the page table format doesn't match the CPU's paging mode expectation.
The commit is:
- Reviewed by Juergen Gross (Xen subsystem maintainer)
- Signed off by Juergen Gross (subsystem maintainer sign-off)
### 2. Code Change Analysis
**The Bug Mechanism:**
The PVH entry path in `arch/x86/platform/pvh/head.S` is the boot entry
point for PVH (Para-Virtualized Hardware) guests, used by Xen and KVM.
The flow is:
1. **Line 94-97** (before fix): PAE mode is unconditionally enabled in
CR4:
```94:97:arch/x86/platform/pvh/head.S
/* Enable PAE mode. */
mov %cr4, %eax
orl $X86_CR4_PAE, %eax
mov %eax, %cr4
```
2. For **64-bit** guests (`CONFIG_X86_64`), this is correct — PAE is
always needed as a prerequisite for long mode (line 99-104).
3. For **32-bit** guests (the `#else` path starting at line 196), the
code calls `mk_early_pgtbl_32()` to build early page tables:
```196:205:arch/x86/platform/pvh/head.S
#else /* CONFIG_X86_64 */
call mk_early_pgtbl_32
mov $_pa(initial_page_table), %eax
mov %eax, %cr3
mov %cr0, %eax
or $(X86_CR0_PG | X86_CR0_PE), %eax
mov %eax, %cr0
```
4. `mk_early_pgtbl_32()` in `arch/x86/kernel/head32.c` builds
**fundamentally different** page table structures depending on
`CONFIG_X86_PAE`:
```95:103:arch/x86/kernel/head32.c
#ifdef CONFIG_X86_PAE
typedef pmd_t pl2_t;
#define pl2_base initial_pg_pmd
#define SET_PL2(val) { .pmd = (val), }
#else
typedef pgd_t pl2_t;
#define pl2_base initial_page_table
#define SET_PL2(val) { .pgd = (val), }
#endif
```
- With `CONFIG_X86_PAE`: Builds **3-level PAE page tables**
(PGDIR_SHIFT=30, uses PMDs + PDPTEs)
- Without `CONFIG_X86_PAE`: Builds **2-level non-PAE page tables**
(PGDIR_SHIFT=22, uses PGDs directly)
**The crash**: When PAE is enabled in CR4 but non-PAE page tables are
loaded into CR3, the CPU interprets the 2-level page directory as a PAE
PDPT (Page Directory Pointer Table). When paging is activated
(CR0.PG=1), the processor tries to load the PDPTE entries from the
address in CR3. The non-PAE page directory entries are completely
incompatible with PAE PDPTE format, causing a **#GP fault or triple
fault**, resulting in an immediate boot failure.
**The Fix:** Simply wrapping the PAE enablement with proper `#ifdef`
guards:
```asm
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
/* Enable PAE mode. */
mov %cr4, %eax
orl $X86_CR4_PAE, %eax
mov %eax, %cr4
#endif
```
This ensures PAE is only enabled when:
- `CONFIG_X86_64` is set (64-bit always needs PAE for long mode), or
- `CONFIG_X86_PAE` is set (32-bit with PAE — page tables match)
Note that the 32-bit path at lines 212-220 already has code to disable
PAE before jumping to `startup_32`, which confirms the original author
was aware that PAE and non-PAE modes exist, but the initial enablement
was not properly guarded.
### 3. Classification
This is a **boot failure fix**. It's not a feature, cleanup, or
optimization. It fixes a configuration where a 32-bit PVH guest without
`CONFIG_X86_PAE` completely fails to boot.
### 4. Scope and Risk Assessment
- **Lines changed**: 2 lines added (`#if defined(...)` and `#endif`), 0
lines removed
- **Files touched**: 1 (`arch/x86/platform/pvh/head.S`)
- **Complexity**: Minimal — conditional compilation guard
- **Risk**: Extremely low
- For `CONFIG_X86_64`: No change (the `#if` is always true)
- For `CONFIG_X86_32` with `CONFIG_X86_PAE`: No change (the `#if` is
true)
- For `CONFIG_X86_32` without `CONFIG_X86_PAE`: PAE is no longer
enabled, matching the page table format — this is the bug fix
- **Regression potential**: Near zero. The only behavioral change is for
the broken configuration that currently crashes.
### 5. User Impact
- **Who is affected**: Anyone running a 32-bit kernel without PAE as a
PVH/KVM guest. This is a legitimate configuration since `config PVH`
has no dependency on `CONFIG_X86_PAE` or `CONFIG_X86_64`.
- **Severity**: Complete boot failure — the system cannot boot at all
- **Workaround**: Users must enable `CONFIG_X86_PAE` for 32-bit PVH
guests, which may not be obvious and adds unnecessary configuration
constraints
### 6. Stability Indicators
- **Reviewed-by**: Juergen Gross (Xen subsystem maintainer) — very
strong indicator
- **Signed-off-by**: Juergen Gross (accepted through the Xen tree)
- The fix is trivially correct by inspection — when `CONFIG_X86_PAE` is
not set, enabling PAE in CR4 is wrong because the page tables aren't
in PAE format
### 7. Dependency Check
- **No dependencies** on other patches — this is a completely self-
contained fix
- The affected code (`pvh_start_xen` with the unconditional PAE
enablement) exists in **all current stable trees** (confirmed present
from v5.4 through v6.12)
- The fix context is slightly different in older stable trees (e.g.,
`rep\n movsl` vs `rep movsl`, PIC vs absolute addressing), so
minor backport adjustments may be needed for older trees, but the fix
itself (adding `#if`/`#endif` around the PAE lines) is trivial to
adapt
### 8. Conclusion
This is a textbook stable kernel fix:
- **Fixes a real, severe bug**: Complete boot failure for a valid kernel
configuration
- **Obviously correct**: The fix is a 2-line conditional compilation
guard that makes PAE enablement match page table format
- **Small and contained**: 2 lines added to 1 file
- **No new features**: Just correctness fix
- **No risk of regression**: Only changes behavior for the currently-
broken configuration
- **Present in all stable trees**: The bug has existed since the
original PVH commit (v4.11, 2017)
- **Reviewed by subsystem maintainer**: Juergen Gross
**YES**
arch/x86/platform/pvh/head.S | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index 344030c1a81d4..53ee2d53fcf8e 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -91,10 +91,12 @@ SYM_CODE_START(pvh_start_xen)
leal rva(early_stack_end)(%ebp), %esp
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
/* Enable PAE mode. */
mov %cr4, %eax
orl $X86_CR4_PAE, %eax
mov %eax, %cr4
+#endif
#ifdef CONFIG_X86_64
/* Enable Long mode. */
--
2.51.0
next prev parent reply other threads:[~2026-02-11 12:31 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-11 12:30 [PATCH AUTOSEL 6.19-5.10] s390/perf: Disable register readout on sampling events Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] arm64: Add support for TSV110 Spectre-BHB mitigation Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] xenbus: Use .freeze/.thaw to handle xenbus devices Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] s390/purgatory: Add -Wno-default-const-init-unsafe to KBUILD_CFLAGS Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] s390/boot: " Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.1] perf/arm-cmn: Support CMN-600AE Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] ntfs: ->d_compare() must not block Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] ACPI: x86: s2idle: Invoke Microsoft _DSM Function 9 (Turn On Display) Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] block: decouple secure erase size limit from discard size limit Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] sparc: don't reference obsolete termio struct for TC* constants Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] EFI/CPER: don't go past the ARM processor CPER record buffer Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19] ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.6] cpufreq: dt-platdev: Block the driver from probing on more QC platforms Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] EFI/CPER: don't dump the entire memory region Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] ACPI: battery: fix incorrect charging status when current is zero Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] rust: cpufreq: always inline functions using build_assert with arguments Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] blk-mq-sched: unify elevators checking for async requests Sasha Levin
2026-02-11 12:30 ` Sasha Levin [this message]
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] APEI/GHES: ARM processor Error: don't go past allocated memory Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] md raid: fix hang when stopping arrays with metadata through dm-raid Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] tools/power cpupower: Reset errno before strtoull() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] sparc: Synchronize user stack on fork and clone Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] rnbd-srv: Zero the rsp buffer before using it Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] alpha: fix user-space corruption during memory compaction Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] ACPICA: Abort AML bytecode execution when executing AML_FATAL_OP Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19] arm64: mte: Set TCMA1 whenever MTE is present in the kernel Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] tools/cpupower: Fix inverted APERF capability check Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.15] ACPI: processor: Fix NULL-pointer dereference in acpi_processor_errata_piix4() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] ACPI: resource: Add JWIPC JVC9100 to irq1_level_low_skip_override[] Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.6] perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.6] md-cluster: fix NULL pointer dereference in process_metadata_update Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] APEI/GHES: ensure that won't go past CPER allocated record Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] powercap: intel_rapl: Add PL4 support for Ice Lake Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] io_uring/timeout: annotate data race in io_flush_timeouts() Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260211123112.1330287-18-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=houwenlong.hwl@antgroup.com \
--cc=jgross@suse.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox