From: Mike Rapoport <rppt@kernel.org>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Ingo Molnar <mingo@redhat.com>, "H . Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@kernel.org>,
linux-efi@vger.kernel.org, linux-mm@kvack.org,
stable@vger.kernel.org
Subject: Re: [PATCH] x86/efi: defer freeing of boot services memory
Date: Mon, 23 Feb 2026 13:40:53 +0200
Message-ID: <aZw8xSI-TM-Gz84t@kernel.org>
In-Reply-To: <e2ad0845-2f87-418a-9f87-5ce619e004ef@app.fastmail.com>

On Mon, Feb 23, 2026 at 12:17:22PM +0100, Ard Biesheuvel wrote:
>
> On Mon, 23 Feb 2026, at 11:55, Mike Rapoport wrote:
> > Hi Ard,
> >
> > On Mon, Feb 23, 2026 at 09:08:29AM +0100, Ard Biesheuvel wrote:
> >> Hi Mike,
> >>
> >> On Mon, 23 Feb 2026, at 08:52, Mike Rapoport wrote:
> >> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >> >
> >> > efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
> >> > and EFI_BOOT_SERVICES_DATA using memblock_free_late().
> >> >
> >> > There are two issues with that: memblock_free_late() should be used for
> >> > memory allocated with memblock_alloc() while the memory reserved with
> >> > memblock_reserve() should be freed with free_reserved_area().
> >> >
> >> > More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
> >> > efi_free_boot_services() is called before deferred initialization of the
> >> > memory map is complete.
> >> >
> >> > Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
> >> > RAM on EC2 t3a.nano instances which only have 512MB of RAM.
> >> >
> >> > If the freed memory resides in areas whose memory map is still
> >> > uninitialized, it won't actually be freed because
> >> > memblock_free_late() calls memblock_free_pages() and the latter skips
> >> > uninitialized pages.
> >> >
> >> > Using free_reserved_area() at this point is also problematic because
> >> > __free_page() accesses the buddy of the freed page and that again might
> >> > end up in an uninitialized part of the memory map.
> >> >
> >> > Delaying the entire efi_free_boot_services() could be problematic
> >> > because in addition to freeing boot services memory it updates
> >> > efi.memmap without any synchronization and that's undesirable late in
> >> > boot when there is concurrency.
> >> >
> >> > A more robust approach is to only defer freeing of the EFI boot services
> >> > memory.
> >> >
> >> > Make efi_free_boot_services() collect ranges that should be freed into
> >> > an array and add an initcall efi_free_boot_services_memory() that walks
> >> > that array and actually frees the memory using free_reserved_area().
> >> >
> >>
> >> Instead of creating another table, could we just traverse the EFI memory
> >> map again in the arch_initcall(), and free all boot services code/data
> >> above 1M with EFI_MEMORY_RUNTIME cleared ?
> >
> > Currently efi_free_boot_services() unmaps all boot services code/data with
> > EFI_MEMORY_RUNTIME cleared and removes them from the efi.memmap.
>
> Ah yes, I failed to spot that those entries are long gone by initcall
> time. Other architectures don't touch the EFI memory map at all, but x86
> mangles it beyond recognition :-)

Heh, EFI on x86 does a lot of, hmm, interesting things with memory, like
memremapping kmalloc'ed memory, and it really begs for cleanups :)

> > I wasn't sure it's Ok to only unmap them, but leave in efi.memmap, that's
> > why I didn't use the existing EFI memory map.
> >
> > Now thinking about it, if the unmapping can happen later, maybe we'll just
> > move the entire efi_free_boot_services() to an initcall?
> >
>
> As long as it is pre-SMP, as that code also contains a quirk to allocate
> the real mode trampoline if all memory below 1 MB is used for boot
> services.

An initcall runs long after SMP. Is the real mode trampoline allocation the
only thing that should happen pre-SMP?

> But actually, that should be a separate quirk to begin with, rather than
> being integrated into an unrelated function that happens to iterate over
> the boot services regions. The only problem, I guess, is that
> memblock_reserve()'ing that sub-1MB region in the old location in the
> ordinary way would cause it to be freed again in the initcall?

Right now we don't free anything below 1M anyway, and I don't see why that
should change.

> But yes, in general I think it is fine to unmap those regions from the
> EFI page tables during an initcall.

Thanks for confirming. I'll look into extracting the allocation of the real
mode trampoline to a separate quirk and then making the entire
efi_free_boot_services() an initcall.
--
Sincerely yours,
Mike.