From: Ingo Molnar <mingo@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de,
bp@alien8.de, joro@8bytes.org, luto@kernel.org,
peterz@infradead.org, kirill.shutemov@linux.intel.com,
rick.p.edgecombe@intel.com, jgross@suse.com
Subject: Re: [RFC][PATCH 0/8] x86/mm: Simplify PAE page table handling
Date: Mon, 24 Feb 2025 19:55:01 +0100
Message-ID: <Z7zAhSAzpU_MCGnO@gmail.com>
In-Reply-To: <20250123172428.D6D8C8D9@davehans-spike.ostc.intel.com>

* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> tl;dr: 32-bit PAE page table handling is a bit different when PTI
> is on and off. Making the handling uniform removes a good amount
> of code at the cost of not sharing kernel PMDs. The downside of
> this simplification is bloating non-PTI PAE kernels by ~2 pages
> per process.
>
> Anyone who cares about security on 32-bit is running with PTI and
> PAE because PAE has the No-eXecute page table bit. They are already
> paying the 2-page penalty. Anyone who cares more about memory
> footprint than security is probably already running a !PAE kernel
> and will not be affected by this.
>
> --
>
> There are two 32-bit x86 hardware page table formats. A 2-level one
> with 32-bit pte_t's and a 3-level one with 64-bit pte_t's called PAE.
> But the PAE one is wonky. It effectively loses a bit of addressing
> radix per level since its PTEs are twice as large. It makes up for
> that by adding the third level, but with only 4 entries in the level.
>
> This leads to all kinds of fun because this level only needs 32 bytes
> instead of a whole page. Also, since it has only 4 entries in the top
> level, the hardware just always caches the entire thing aggressively.
> Modifying a PAE pgd_t ends up needing different rules than the other
> x86 paging modes and probably every other architecture too.
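
To make the two formats described above concrete, here is a minimal
sketch of how a 32-bit virtual address is split in each one. It is an
illustration only, not the kernel's code: the macro, struct and
function names are hypothetical, and it assumes 4 KiB pages with no
large-page mappings. The static assertions restate the sizes from the
text: 4 top-level PAE entries of 8 bytes each come to 32 bytes, while
every other table fills a whole page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not kernel macros. */
#define PAE_TOP_ENTRIES    4u     /* top level of the 3-level PAE format */
#define PAE_TABLE_ENTRIES  512u   /* 9 index bits, 8-byte entries */
#define LEGACY_ENTRIES     1024u  /* 10 index bits, 4-byte entries */

/* The PAE top level needs only 32 bytes; the other tables fill a page. */
static_assert(PAE_TOP_ENTRIES * sizeof(uint64_t) == 32, "PAE top level");
static_assert(PAE_TABLE_ENTRIES * sizeof(uint64_t) == 4096, "PAE table");
static_assert(LEGACY_ENTRIES * sizeof(uint32_t) == 4096, "2-level table");

/* Per-level table indexes plus the byte offset within the page. */
struct va_split {
	unsigned int top;    /* top-level index */
	unsigned int mid;    /* middle-level index (PAE only, else 0) */
	unsigned int pte;    /* last-level index */
	unsigned int offset; /* offset within the 4 KiB page */
};

/* 2-level (!PAE) format: 10 + 10 + 12 bits. */
static inline struct va_split split_2level(uint32_t va)
{
	struct va_split s = {
		.top    = va >> 22,
		.mid    = 0,
		.pte    = (va >> 12) & 0x3ffu,
		.offset = va & 0xfffu,
	};
	return s;
}

/* 3-level PAE format: 2 + 9 + 9 + 12 bits. */
static inline struct va_split split_pae(uint32_t va)
{
	struct va_split s = {
		.top    = va >> 30,
		.mid    = (va >> 21) & 0x1ffu,
		.pte    = (va >> 12) & 0x1ffu,
		.offset = va & 0xfffu,
	};
	return s;
}
```

Each PAE level indexes with 9 bits instead of 10 because its entries
are twice as large, and the 2 bits left over form the 4-entry top
level.
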
>
> PAE support got even weirder when Xen came along. Xen wants to trap
> into the hypervisor on page table writes and so it protects the guest
> page tables with paging protections. It can't protect a 32-byte
> object with paging protections so it bloats the 32-byte object out
> to a page. Xen also didn't support sharing kernel PMD pages. This
> is mostly moot now because Xen support for running as a 32-bit
> guest was ripped out, but there are still remnants around.
>
> PAE also interacts with PTI in fun and exciting ways. Since pgd
> updates are so fraught, the PTI PAE implementation just chose to
> avoid pgd updates by preallocating all the PMDs up front since
> there are only 4 instead of 512 or 1024 in the other x86 paging
> modes.
>
> Make PAE less weird:
> * Always allocate a page for PAE PGDs. This brings them in line
> with the other 2 paging modes. It was done for Xen and for
> PTI already and nobody screamed, so just do it everywhere.
> * Never share kernel PMD pages. This brings PAE in line with
> 32-bit !PAE and 64-bit.
> * Always preallocate all PAE PMD pages. This basically makes
> all PAE kernels behave like PTI ones. It might waste a page
> of memory, but all 4 pages probably get allocated in the common
> case anyway.
>
> --
>
>  include/asm/pgtable-2level_types.h |   2
>  include/asm/pgtable-3level_types.h |   4 -
>  include/asm/pgtable_64_types.h     |   2
>  mm/pat/set_memory.c                |   2
>  mm/pgtable.c                       | 104 +++++--------------------------------
>  5 files changed, 18 insertions(+), 96 deletions(-)

The diffstat alone is pretty nice, so I'd suggest we pursue this series
even if continued work on 32-bit kernel features is being questioned.
As long as the code exists and isn't explicitly marked as obsolete, such
changes are legit.

Thanks,

	Ingo