From: Pratyush Yadav <pratyush@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Jonathan Corbet <corbet@lwn.net>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
Alexander Graf <graf@amazon.com>,
David Matlack <dmatlack@google.com>,
David Rientjes <rientjes@google.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Samiullah Khawaja <skhawaja@google.com>,
Vipin Sharma <vipinsh@google.com>,
Zhu Yanjun <yanjun.zhu@linux.dev>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 06/10] liveupdate: hugetlb subsystem FLB state preservation
Date: Mon, 29 Dec 2025 22:21:29 +0100
Message-ID: <86qzsd7zmu.fsf@kernel.org>
In-Reply-To: <CA+CK2bAVuHG1cVPQz8Wafe8o2TtitrqJjqfHOT7Xun=zWMoo2Q@mail.gmail.com> (Pasha Tatashin's message of "Tue, 23 Dec 2025 13:15:31 -0500")
On Tue, Dec 23 2025, Pasha Tatashin wrote:
> On Sat, Dec 6, 2025 at 6:03 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>>
>> HugeTLB manages its own pages. It allocates them on boot and uses those
>> to fulfill hugepage requests.
>>
>> To support live update for a hugetlb-backed memfd, it is necessary to
>> track how many pages of each hstate are coming from live update. This is
>> needed to ensure the boot time allocations don't over-allocate huge
>> pages, causing the rest of the system unexpected memory pressure.
>>
>> For example, say the system has 100G memory and it uses 90 1G huge
>> pages, with 10G put aside for other processes. Now say 5 of those pages
>> are preserved via KHO for live updating a huge memfd.
>>
>> But during boot, the system will still see that it needs 90 huge pages,
>> so it will attempt to allocate those. When the file is later retrieved,
>> those 5 pages also get added to the huge page pool, resulting in 95
>> total huge pages. This exceeds the original expectation of 90 pages, and
>> ends up wasting memory.
>>
>> LUO has file-lifecycle-bound (FLB) data to keep track of global state of
>> a subsystem. Use it to track how many huge pages are used up for each
>> hstate. When a file is preserved, it will increment the counter, and
>> when it is unpreserved, it will decrement it. During boot time
>> allocations, this data can be used to calculate how many hugepages
>> actually need to be allocated.
>>
>> Design note: another way of doing this would be to preserve the entire
>> set of hugepages using the FLB, skip boot time allocation, and restore
>> them all on FLB retrieve. The main problem with that approach is that it
>> would need to freeze all hstates after serializing them. This will need
>> a lot more invasive changes in hugetlb since there are many ways folios
>> can be added to or removed from a hstate. Doing it this way is simpler
>> and less invasive.
>>
>> Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
>> ---
>> Documentation/mm/memfd_preservation.rst | 9 ++
>> MAINTAINERS | 1 +
>> include/linux/kho/abi/hugetlb.h | 66 +++++++++
>> kernel/liveupdate/Kconfig | 12 ++
>> mm/Makefile | 1 +
>> mm/hugetlb.c | 1 +
>> mm/hugetlb_internal.h | 15 ++
>> mm/hugetlb_luo.c | 179 ++++++++++++++++++++++++
>> 8 files changed, 284 insertions(+)
>> create mode 100644 include/linux/kho/abi/hugetlb.h
>> create mode 100644 mm/hugetlb_luo.c
>>
[...]
>> +static int hugetlb_flb_retrieve(struct liveupdate_flb_op_args *args)
>> +{
>> + /*
>> + * The FLB is only needed for boot-time calculation of how many
>> + * hugepages are needed. This is done by early boot handlers already.
>> + * Free the serialized state now.
>> + */
>
> It should be done in this function.
The calculations can't be done in retrieve. Retrieve happens only once
and for the whole FLB. They will need to come from
hugetlb_hstate_alloc_pages().

Maybe you mean getting rid of liveupdate_flb_incoming_early()? Yeah,
that I can do. It will make this function a no-op once we move the
kho_restore_free() to finish().
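
As a rough illustration of the per-hstate accounting described in the
commit message, here is a standalone userspace sketch. It is not code
from this series: struct hstate_preserved, preserved_add(),
preserved_sub() and pages_to_allocate() are hypothetical names, and the
real ABI in include/linux/kho/abi/hugetlb.h may look quite different.
It only shows the arithmetic: a counter that goes up on preserve and
down on unpreserve, and a boot-time subtraction that never goes below
zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-hstate record (an assumption for illustration, not
 * the serialized format used by the patch).
 */
struct hstate_preserved {
	unsigned int order;		/* e.g. 18 for 1G pages with 4K base pages */
	unsigned long nr_preserved;	/* huge pages currently preserved */
};

/* A file backed by this hstate is preserved: count its huge pages. */
static void preserved_add(struct hstate_preserved *hp, unsigned long nr)
{
	hp->nr_preserved += nr;
}

/* The file is unpreserved again: drop its huge pages from the count. */
static void preserved_sub(struct hstate_preserved *hp, unsigned long nr)
{
	/* Unpreserving more than was preserved would be an accounting bug. */
	assert(nr <= hp->nr_preserved);
	hp->nr_preserved -= nr;
}

/*
 * Boot-time calculation: pages that arrive via live update do not need
 * to be allocated again, so subtract them from the requested pool size.
 * Clamp at zero in case more pages were preserved than requested.
 */
static unsigned long pages_to_allocate(unsigned long requested,
				       const struct hstate_preserved *hp)
{
	if (hp->nr_preserved >= requested)
		return 0;
	return requested - hp->nr_preserved;
}
```

With the numbers from the commit message (90 requested 1G pages, 5 of
them preserved), pages_to_allocate() yields 85, so the pool ends up at
90 pages once the preserved file is retrieved instead of 95.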
>
>> + kho_restore_free(phys_to_virt(args->data));
>
> This should be moved to finish() after blackout.
Sure.
>
>> +
>> + /*
>> + * HACK: But since LUO FLB still needs an obj, use ZERO_SIZE_PTR to
>> + * satisfy it.
>> + */
>> + args->obj = ZERO_SIZE_PTR;
>
> Hopefully this is not needed any more with the updated FLB, please check :-)
Yep. IIRC when I sent this series the older version of FLB was in
mm-nonmm-unstable.
>
>> + return 0;
>> +}
>> +
[...]
--
Regards,
Pratyush Yadav