linux-doc.vger.kernel.org archive mirror
From: Pratyush Yadav <pratyush@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
	 Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@suse.com>, Jonathan Corbet <corbet@lwn.net>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	 x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	 Muchun Song <muchun.song@linux.dev>,
	 Oscar Salvador <osalvador@suse.de>,
	 Alexander Graf <graf@amazon.com>,
	 David Matlack <dmatlack@google.com>,
	 David Rientjes <rientjes@google.com>,
	 Jason Gunthorpe <jgg@nvidia.com>,
	 Samiullah Khawaja <skhawaja@google.com>,
	Vipin Sharma <vipinsh@google.com>,
	 Zhu Yanjun <yanjun.zhu@linux.dev>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	linux-doc@vger.kernel.org,  kexec@lists.infradead.org
Subject: Re: [RFC PATCH 06/10] liveupdate: hugetlb subsystem FLB state preservation
Date: Mon, 29 Dec 2025 22:21:29 +0100
Message-ID: <86qzsd7zmu.fsf@kernel.org>
In-Reply-To: <CA+CK2bAVuHG1cVPQz8Wafe8o2TtitrqJjqfHOT7Xun=zWMoo2Q@mail.gmail.com> (Pasha Tatashin's message of "Tue, 23 Dec 2025 13:15:31 -0500")

On Tue, Dec 23 2025, Pasha Tatashin wrote:

> On Sat, Dec 6, 2025 at 6:03 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>>
>> HugeTLB manages its own pages. It allocates them on boot and uses those
>> to fulfill hugepage requests.
>>
>> To support live update for a hugetlb-backed memfd, it is necessary to
>> track how many pages of each hstate are coming from live update. This is
>> needed to ensure the boot time allocations don't over-allocate huge
>> pages, causing the rest of the system unexpected memory pressure.
>>
>> For example, say the system has 100G memory and it uses 90 1G huge
>> pages, with 10G put aside for other processes. Now say 5 of those pages
>> are preserved via KHO for live updating a huge memfd.
>>
>> But during boot, the system will still see that it needs 90 huge pages,
>> so it will attempt to allocate those. When the file is later retrieved,
>> those 5 pages also get added to the huge page pool, resulting in 95
>> total huge pages. This exceeds the original expectation of 90 pages, and
>> ends up wasting memory.
>>
>> LUO has file-lifecycle-bound (FLB) data to keep track of global state of
>> a subsystem. Use it to track how many huge pages are used up for each
>> hstate. When a file is preserved, it will increment the counter, and
>> when it is unpreserved, it will decrement it. During boot time
>> allocations, this data can be used to calculate how many hugepages
>> actually need to be allocated.
>>
>> Design note: another way of doing this would be to preserve the entire
>> set of hugepages using the FLB, skip boot time allocation, and restore
>> them all on FLB retrieve. The main problem with that approach is that it
>> would need to freeze all hstates after serializing them. This will need
>> a lot more invasive changes in hugetlb since there are many ways folios
>> can be added to or removed from a hstate. Doing it this way is simpler
>> and less invasive.
>>
>> Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
>> ---
>>  Documentation/mm/memfd_preservation.rst |   9 ++
>>  MAINTAINERS                             |   1 +
>>  include/linux/kho/abi/hugetlb.h         |  66 +++++++++
>>  kernel/liveupdate/Kconfig               |  12 ++
>>  mm/Makefile                             |   1 +
>>  mm/hugetlb.c                            |   1 +
>>  mm/hugetlb_internal.h                   |  15 ++
>>  mm/hugetlb_luo.c                        | 179 ++++++++++++++++++++++++
>>  8 files changed, 284 insertions(+)
>>  create mode 100644 include/linux/kho/abi/hugetlb.h
>>  create mode 100644 mm/hugetlb_luo.c
>>
[...]
>> +static int hugetlb_flb_retrieve(struct liveupdate_flb_op_args *args)
>> +{
>> +       /*
>> +        * The FLB is only needed for boot-time calculation of how many
>> +        * hugepages are needed. This is done by early boot handlers already.
>> +        * Free the serialized state now.
>> +        */
>
> It should be done in this function.

The calculations can't be done in retrieve. Retrieve happens only once,
and for the whole FLB, so the calculations will need to be driven from
hugetlb_hstate_alloc_pages().
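
To illustrate the accounting described in the commit message, here is a
minimal userspace sketch of a per-hstate counter and the boot-time
adjustment. The struct, the function names and NR_HSTATES are
hypothetical and do not come from the patch; in the kernel the
adjustment would be applied from hugetlb_hstate_alloc_pages() rather
than from a standalone helper like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of hstates, for illustration only. */
#define NR_HSTATES 2

/* Hypothetical stand-in for the per-hstate counts kept in the FLB. */
struct hugetlb_preserved_counts {
	unsigned long nr_preserved[NR_HSTATES];
};

/* A file using nr_pages huge pages of hstate idx is preserved. */
static void count_preserve(struct hugetlb_preserved_counts *c,
			   size_t idx, unsigned long nr_pages)
{
	assert(idx < NR_HSTATES);
	c->nr_preserved[idx] += nr_pages;
}

/* The same file is unpreserved again; the counter must not underflow. */
static void count_unpreserve(struct hugetlb_preserved_counts *c,
			     size_t idx, unsigned long nr_pages)
{
	assert(idx < NR_HSTATES);
	assert(c->nr_preserved[idx] >= nr_pages);
	c->nr_preserved[idx] -= nr_pages;
}

/*
 * Number of pages the boot-time allocation should still allocate for
 * one hstate. Preserved pages are added to the pool later, on file
 * retrieve, so they are subtracted here. Clamp at zero in case more
 * pages were preserved than are now requested.
 */
static unsigned long boot_alloc_target(unsigned long requested,
				       unsigned long preserved)
{
	return preserved >= requested ? 0 : requested - preserved;
}
```

With the numbers from the commit message (90 pages requested, 5
preserved), boot_alloc_target() gives 85, so the pool ends up at 90
once the preserved pages are retrieved, instead of 95.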

Maybe you mean getting rid of liveupdate_flb_incoming_early()? Yeah,
that I can do. It will make this function a no-op once we move the
kho_restore_free() to finish().

>
>> +       kho_restore_free(phys_to_virt(args->data));
>
> This should be moved to finish() after blackout.

Sure.

>
>> +
>> +       /*
>> +        * HACK: But since LUO FLB still needs an obj, use ZERO_SIZE_PTR to
>> +        * satisfy it.
>> +        */
>> +       args->obj = ZERO_SIZE_PTR;
>
> Hopefully this is not needed any more with the updated FLB, please check :-)

Yep. IIRC when I sent this series the older version of FLB was in
mm-nonmm-unstable.

>
>> +       return 0;
>> +}
>> +
[...]

-- 
Regards,
Pratyush Yadav

