From: Mike Rapoport <rppt@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com,
dmatlack@google.com, rientjes@google.com, corbet@lwn.net,
rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com,
kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com,
masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org,
yoann.congal@smile.fr, mmaurer@google.com,
roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk,
mark.rutland@arm.com, jannh@google.com,
vincent.guittot@linaro.org, hannes@cmpxchg.org,
dan.j.williams@intel.com, david@redhat.com,
joel.granados@kernel.org, rostedt@goodmis.org,
anna.schumaker@oracle.com, song@kernel.org, linux@weissschuh.net,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, gregkh@linuxfoundation.org,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
rafael@kernel.org, dakr@kernel.org,
bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
myungjoo.ham@samsung.com, yesanishhere@gmail.com,
Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
aleksander.lobakin@intel.com, ira.weiny@intel.com,
andriy.shevchenko@linux.intel.com, leon@kernel.org,
lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org,
djeffery@redhat.com, stuart.w.hayes@gmail.com, ptyadav@amazon.de,
lennart@poettering.net, brauner@kernel.org,
linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
saeedm@nvidia.com, ajayachandra@nvidia.com, jgg@nvidia.com,
parav@nvidia.com, leonro@nvidia.com, witu@nvidia.com,
hughd@google.com, skhawaja@google.com, chrisl@kernel.org
Subject: Re: [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
Date: Mon, 17 Nov 2025 13:03:59 +0200
Message-ID: <aRsBHy5aQ_Ypyy9r@kernel.org>
In-Reply-To: <20251115233409.768044-16-pasha.tatashin@soleen.com>
On Sat, Nov 15, 2025 at 06:34:01PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> The ability to preserve a memfd allows userspace to use KHO and LUO to
> transfer its memory contents to the next kernel. This is useful in many
> ways. For one, it can be used with IOMMUFD as the backing store for
> IOMMU page tables. Preserving IOMMUFD is essential for performing a
> hypervisor live update with passthrough devices. memfd support provides
> the first building block for making that possible.
>
> For another, for applications with a large amount of memory that takes
> time to reconstruct, reboots to consume kernel upgrades can be very
> expensive. memfd with LUO gives those applications reboot-persistent
> memory that they can use to quickly save and reconstruct that state.
>
> While memfd is backed by either hugetlbfs or shmem, currently only
> support on shmem is added. To be more precise, support for anonymous
> shmem files is added.
>
> The handover to the next kernel is not transparent. All the properties
> of the file are not preserved; only its memory contents, position, and
> size. The recreated file gets the UID and GID of the task doing the
> restore, and the task's cgroup gets charged with the memory.
>
> Once preserved, the file cannot grow or shrink, and all its pages are
> pinned to avoid migrations and swapping. The file can still be read from
> or written to.
>
> Use vmalloc to get the buffer to hold the folios, and preserve
> it using kho_preserve_vmalloc(). This doesn't have the size limit.
>
> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
The order of the Signed-off-by tags seems wrong; Pasha's should be the last one.
> ---
...
> +/**
> + * DOC: memfd Live Update ABI
> + *
> + * This header defines the ABI for preserving the state of a memfd across a
> + * kexec reboot using the LUO.
> + *
> + * The state is serialized into a Flattened Device Tree which is then handed
> + * over to the next kernel via the KHO mechanism. The FDT is passed as the
> + * opaque `data` handle in the file handler callbacks.
> + *
> + * This interface is a contract. Any modification to the FDT structure,
> + * node properties, compatible string, or the layout of the serialization
> + * structures defined here constitutes a breaking change. Such changes require
> + * incrementing the version number in the MEMFD_LUO_FH_COMPATIBLE string.
The same comment about contract as for the generic LUO documentation
applies here (https://lore.kernel.org/all/aRnG8wDSSAtkEI_z@kernel.org/)
> + *
> + * FDT Structure Overview:
> + * The memfd state is contained within a single FDT with the following layout:
...
> +static struct memfd_luo_folio_ser *memfd_luo_preserve_folios(struct file *file, void *fdt,
> + u64 *nr_foliosp)
> +{
If we are already returning nr_folios by reference, we might do it for
memfd_luo_folio_ser as well and make the function return int.
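To illustrate the shape, here is a minimal userspace sketch. The names
preserve_folios() and struct folio_ser are hypothetical stand-ins, not the
patch's code, and calloc() replaces the real allocation and pinning; only the
calling convention is the point.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct memfd_luo_folio_ser. */
struct folio_ser {
	uint64_t foliodesc;
};

/*
 * Return 0 or a negative errno and hand both results back through
 * out-parameters, instead of encoding the error in the returned pointer.
 */
static int preserve_folios(uint64_t nr, struct folio_ser **serp,
			   uint64_t *nr_foliosp)
{
	struct folio_ser *ser;

	*serp = NULL;
	*nr_foliosp = 0;

	/* Nothing to preserve: success with an empty result. */
	if (!nr)
		return 0;

	ser = calloc(nr, sizeof(*ser));
	if (!ser)
		return -ENOMEM;

	*serp = ser;
	*nr_foliosp = nr;
	return 0;
}
```

With this convention the caller checks a plain int and never needs
IS_ERR()/PTR_ERR() on the array pointer.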
> + struct inode *inode = file_inode(file);
> + struct memfd_luo_folio_ser *pfolios;
> + struct kho_vmalloc *kho_vmalloc;
> + unsigned int max_folios;
> + long i, size, nr_pinned;
> + struct folio **folios;
pfolios and folios read as if the former is a pointer to the latter.
I'd s/pfolios/folios_ser/
> + int err = -EINVAL;
> + pgoff_t offset;
> + u64 nr_folios;
...
> + kvfree(folios);
> + *nr_foliosp = nr_folios;
> + return pfolios;
> +
> +err_unpreserve:
> + i--;
> + for (; i >= 0; i--)
Maybe fold this into a single line:
	for (--i; i >= 0; --i)
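A small standalone sketch of that unwind pattern, assuming a failure at step
`failed` after steps 0..failed-1 succeeded. The helper unwind_order() is
hypothetical and only records which indices the loop visits; it is not part
of the patch.

```c
/*
 * Hypothetical helper: record the indices an unwind loop visits after a
 * failure at step `failed`. Steps 0..failed-1 succeeded and are undone in
 * reverse order. Returns the number of steps undone.
 */
static long unwind_order(long failed, long *visited)
{
	long i = failed;
	long n = 0;

	/* Same traversal as "i--; for (; i >= 0; i--)". */
	for (--i; i >= 0; --i)
		visited[n++] = i;

	return n;
}
```

The index must stay a signed type here; with an unsigned counter the
`i >= 0` test would never become false.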
> + kho_unpreserve_folio(folios[i]);
> + vfree(pfolios);
> +err_unpin:
> + unpin_folios(folios, nr_folios);
> +err_free_folios:
> + kvfree(folios);
> + return ERR_PTR(err);
> +}
> +
> +static void memfd_luo_unpreserve_folios(void *fdt, struct memfd_luo_folio_ser *pfolios,
> + u64 nr_folios)
> +{
> + struct kho_vmalloc *kho_vmalloc;
> + long i;
> +
> + if (!nr_folios)
> + return;
> +
> + kho_vmalloc = (struct kho_vmalloc *)fdt_getprop(fdt, 0, MEMFD_FDT_FOLIOS, NULL);
> + /* The FDT was created by this kernel so expect it to be sane. */
> + WARN_ON_ONCE(!kho_vmalloc);
The FDT won't have the FOLIOS property if the size was zero, will it?
I think adding the kho_vmalloc handle to struct memfd_luo_private and
passing that around would make things simpler.
> + kho_unpreserve_vmalloc(kho_vmalloc);
> +
> + for (i = 0; i < nr_folios; i++) {
> + const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
> + struct folio *folio;
> +
> + if (!pfolio->foliodesc)
> + continue;
How can this happen? Can pfolios be a sparse array?
> + folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
> +
> + kho_unpreserve_folio(folio);
> + unpin_folio(folio);
> + }
> +
> + vfree(pfolios);
> +}
...
> +static void memfd_luo_finish(struct liveupdate_file_op_args *args)
> +{
> + const struct memfd_luo_folio_ser *pfolios;
> + struct folio *fdt_folio;
> + const void *fdt;
> + u64 nr_folios;
> +
> + if (args->retrieved)
> + return;
> +
> + fdt_folio = memfd_luo_get_fdt(args->serialized_data);
> + if (!fdt_folio) {
> + pr_err("failed to restore memfd FDT\n");
> + return;
> + }
> +
> + fdt = folio_address(fdt_folio);
> +
> + pfolios = memfd_luo_fdt_folios(fdt, &nr_folios);
> + if (!pfolios)
> + goto out;
> +
> + memfd_luo_discard_folios(pfolios, nr_folios);
Doesn't this free the actual folios that were supposed to be preserved?
> + vfree(pfolios);
> +
> +out:
> + folio_put(fdt_folio);
> +}
...
> +static int memfd_luo_retrieve(struct liveupdate_file_op_args *args)
> +{
> + struct folio *fdt_folio;
> + const u64 *pos, *size;
> + struct file *file;
> + int len, ret = 0;
> + const void *fdt;
> +
> + fdt_folio = memfd_luo_get_fdt(args->serialized_data);
Why do we need to call kho_restore_folio() twice, here and in
memfd_luo_finish()?
> + if (!fdt_folio)
> + return -ENOENT;
> +
> + fdt = page_to_virt(folio_page(fdt_folio, 0));
Use folio_address() here, as memfd_luo_finish() does.
--
Sincerely yours,
Mike.