All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com,
	changyuanl@google.com, rppt@kernel.org, dmatlack@google.com,
	rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org,
	ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
	ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org,
	akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr,
	mmaurer@google.com, roman.gushchin@linux.dev,
	chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
	jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org,
	dan.j.williams@intel.com, david@redhat.com,
	joel.granados@kernel.org, rostedt@goodmis.org,
	anna.schumaker@oracle.com, song@kernel.org,
	zhangguopeng@kylinos.cn, linux@weissschuh.net,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, gregkh@linuxfoundation.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	rafael@kernel.org, dakr@kernel.org,
	bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
	myungjoo.ham@samsung.com, yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com, ira.weiny@intel.com,
	andriy.shevchenko@linux.intel.com, leon@kernel.org,
	lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org,
	djeffery@redhat.com, stuart.w.hayes@gmail.com, ptyadav@amazon.de,
	lennart@poettering.net, brauner@kernel.org,
	linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	saeedm@nvidia.com, ajayachandra@nvidia.com, parav@nvidia.com,
	leonro@nvidia.com, witu@nvidia.com
Subject: Re: [PATCH v3 29/30] luo: allow preserving memfd
Date: Tue, 26 Aug 2025 13:20:19 -0300	[thread overview]
Message-ID: <20250826162019.GD2130239@nvidia.com> (raw)
In-Reply-To: <20250807014442.3829950-30-pasha.tatashin@soleen.com>

On Thu, Aug 07, 2025 at 01:44:35AM +0000, Pasha Tatashin wrote:

> +	/*
> +	 * Most of the space should be taken by preserved folios. So take its
> +	 * size, plus a page for other properties.
> +	 */
> +	fdt = memfd_luo_create_fdt(PAGE_ALIGN(preserved_size) + PAGE_SIZE);
> +	if (!fdt) {
> +		err = -ENOMEM;
> +		goto err_unpin;
> +	}

This doesn't seem to have any versioning scheme, it really should..

> +	err = fdt_property_placeholder(fdt, "folios", preserved_size,
> +				       (void **)&preserved_folios);
> +	if (err) {
> +		pr_err("Failed to reserve folios property in FDT: %s\n",
> +		       fdt_strerror(err));
> +		err = -ENOMEM;
> +		goto err_free_fdt;
> +	}

Yuk.

This really wants some luo helper

'luo alloc array'
'luo restore array'
'luo free array'

Which would get a linearized list of pages in the vmap to hold the
array and then allocate some structure to record the page list and
return back the u64 of the phys_addr of the top of the structure to
store in whatever.

Getting fdt to allocate the array inside the fds is just not going to
work for anything of size.

> +	for (; i < nr_pfolios; i++) {
> +		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
> +		phys_addr_t phys;
> +		u64 index;
> +		int flags;
> +
> +		if (!pfolio->foliodesc)
> +			continue;
> +
> +		phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
> +		folio = kho_restore_folio(phys);
> +		if (!folio) {
> +			pr_err("Unable to restore folio at physical address: %llx\n",
> +			       phys);
> +			goto put_file;
> +		}
> +		index = pfolio->index;
> +		flags = PRESERVED_FOLIO_FLAGS(pfolio->foliodesc);
> +
> +		/* Set up the folio for insertion. */
> +		/*
> +		 * TODO: Should find a way to unify this and
> +		 * shmem_alloc_and_add_folio().
> +		 */
> +		__folio_set_locked(folio);
> +		__folio_set_swapbacked(folio);
> 
> +		ret = mem_cgroup_charge(folio, NULL, mapping_gfp_mask(mapping));
> +		if (ret) {
> +			pr_err("shmem: failed to charge folio index %d: %d\n",
> +			       i, ret);
> +			goto unlock_folio;
> +		}

[..]

> +		folio_add_lru(folio);
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}

Probably some consolidation will be needed to make this less
duplicated..

But overall I think just using the memfd_luo_preserved_folio as the
serialization is entirely file, I don't think this needs anything more
complicated.

What it does need is an alternative to the FDT with versioning.

Which seems to me to be entirely fine as:

 struct memfd_luo_v0 {
    __aligned_u64 size;
    __aligned_u64 pos;
    __aligned_u64 folios;
 };

 struct memfd_luo_v0 memfd_luo_v0 = {.size = size, pos = file->f_pos, folios = folios};
 luo_store_object(&memfd_luo_v0, sizeof(memfd_luo_v0), <.. identifier for this fd..>, /*version=*/0);

Which also shows the actual data needing to be serialized comes from
more than one struct and has to be marshaled in code, somehow, to a
single struct.

Then I imagine a fairly simple forwards/backwards story. If something
new is needed that is non-optional, lets say you compress the folios
list to optimize holes:

 struct memfd_luo_v1 {
    __aligned_u64 size;
    __aligned_u64 pos;
    __aligned_u64 folios_list_with_holes;
 };

Obviously a v0 kernel cannot parse this, but in this case a v1 aware
kernel could optionally duplicate and write out the v0 format as well:

 luo_store_object(&memfd_luo_v0, sizeof(memfd_luo_v0), <.. identifier for this fd..>, /*version=*/0);
 luo_store_object(&memfd_luo_v1, sizeof(memfd_luo_v1), <.. identifier for this fd..>, /*version=*/1);

Then the rule is fairly simple, when the sucessor kernel goes to
deserialize it asks luo for the versions it supports:

 if (luo_restore_object(&memfd_luo_v1, sizeof(memfd_luo_v1), <.. identifier for this fd..>, /*version=*/1))
    restore_v1(&memfd_luo_v1)
 else if (luo_restore_object(&memfd_luo_v0, sizeof(memfd_luo_v0), <.. identifier for this fd..>, /*version=*/0))
    restore_v0(&memfd_luo_v0)
 else
    luo_failure("Do not understand this");

luo core just manages this list of versioned data per serialized
object. There is only one version per object.

Jason

  parent reply	other threads:[~2025-08-26 16:20 UTC|newest]

Thread overview: 147+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-07  1:44 [PATCH v3 00/30] Live Update Orchestrator Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 01/30] kho: init new_physxa->phys_bits to fix lockdep Pasha Tatashin
2025-08-08 11:42   ` Pratyush Yadav
2025-08-08 11:52     ` Pratyush Yadav
2025-08-08 14:00       ` Pasha Tatashin
2025-08-08 19:06         ` Andrew Morton
2025-08-08 19:51           ` Pasha Tatashin
2025-08-08 20:19             ` Pasha Tatashin
2025-08-14 13:11   ` Jason Gunthorpe
2025-08-14 14:57     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 02/30] kho: mm: Don't allow deferred struct page with KHO Pasha Tatashin
2025-08-08 11:47   ` Pratyush Yadav
2025-08-08 14:01     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 03/30] kho: warn if KHO is disabled due to an error Pasha Tatashin
2025-08-08 11:48   ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 04/30] kho: allow to drive kho from within kernel Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 05/30] kho: make debugfs interface optional Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 06/30] kho: drop notifiers Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 07/30] kho: add interfaces to unpreserve folios and physical memory ranges Pasha Tatashin
2025-08-14 13:22   ` Jason Gunthorpe
2025-08-14 15:05     ` Pasha Tatashin
2025-08-14 17:01       ` Jason Gunthorpe
2025-08-15  9:12     ` Mike Rapoport
2025-08-18 13:55       ` Jason Gunthorpe
2025-09-21 22:20         ` Pasha Tatashin
2025-09-25  5:26           ` Mike Rapoport
2025-08-07  1:44 ` [PATCH v3 08/30] kho: don't unpreserve memory during abort Pasha Tatashin
2025-08-14 13:30   ` Jason Gunthorpe
2025-09-22 14:57     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 09/30] liveupdate: kho: move to kernel/liveupdate Pasha Tatashin
2025-08-30  8:35   ` Mike Rapoport
2025-09-22 14:54     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 10/30] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator Pasha Tatashin
2025-08-14 13:31   ` Jason Gunthorpe
2025-09-22 15:00     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 11/30] liveupdate: luo_core: integrate with KHO Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 12/30] liveupdate: luo_subsystems: add subsystem registration Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 13/30] liveupdate: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 14/30] liveupdate: luo_files: add infrastructure for FDs Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 15/30] liveupdate: luo_files: implement file systems callbacks Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 16/30] liveupdate: luo_ioctl: add userpsace interface Pasha Tatashin
2025-08-14 13:49   ` Jason Gunthorpe
2025-09-22 21:09     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 17/30] liveupdate: luo_files: luo_ioctl: Unregister all FDs on device close Pasha Tatashin
2025-08-27 15:34   ` Pratyush Yadav
2025-09-22 21:23     ` Pasha Tatashin
2025-09-23 13:13       ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 18/30] liveupdate: luo_files: luo_ioctl: Add ioctls for per-file state management Pasha Tatashin
2025-08-14 14:02   ` Jason Gunthorpe
2025-09-22 23:17     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 19/30] liveupdate: luo_sysfs: add sysfs state monitoring Pasha Tatashin
2025-08-26 16:03   ` Jason Gunthorpe
2025-08-26 18:58     ` Pasha Tatashin
2025-10-09  1:07   ` yanjun.zhu
2025-10-09  5:20     ` Greg KH
2025-10-09 10:58     ` Pratyush Yadav
2025-10-09 12:01       ` Pasha Tatashin
2025-10-09 14:50         ` Jason Gunthorpe
2025-10-09 15:34         ` Zhu Yanjun
2025-10-09 17:04           ` Pasha Tatashin
2025-10-09 17:30             ` Yanjun.Zhu
2025-10-09 17:56             ` Yanjun.Zhu
2025-10-09 23:12               ` Pratyush Yadav
2025-10-10  6:39                 ` Greg KH
2025-08-07  1:44 ` [PATCH v3 20/30] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 21/30] kho: move kho debugfs directory to liveupdate Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 22/30] liveupdate: add selftests for subsystems un/registration Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 23/30] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 24/30] docs: add luo documentation Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 25/30] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 26/30] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
2025-08-11 23:11   ` Vipin Sharma
2025-08-13 12:42     ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 27/30] mm: shmem: allow freezing inode mapping Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 28/30] mm: shmem: export some functions to internal.h Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 29/30] luo: allow preserving memfd Pasha Tatashin
2025-08-08 20:22   ` Pasha Tatashin
2025-08-13 12:44     ` Pratyush Yadav
2025-08-13  6:34   ` Vipin Sharma
2025-08-13  7:09     ` Greg KH
2025-08-13 12:02       ` Pratyush Yadav
2025-08-13 12:14         ` Greg KH
2025-08-13 12:41           ` Jason Gunthorpe
2025-08-13 13:00             ` Greg KH
2025-08-13 13:37               ` Pratyush Yadav
2025-08-13 13:41                 ` Pasha Tatashin
2025-08-13 13:53                   ` Greg KH
2025-08-13 13:53                 ` Greg KH
2025-08-13 20:03               ` Jason Gunthorpe
2025-08-13 13:31             ` Pratyush Yadav
2025-08-13 12:29     ` Pratyush Yadav
2025-08-13 13:49       ` Pasha Tatashin
2025-08-13 13:55         ` Pratyush Yadav
2025-08-26 16:20   ` Jason Gunthorpe [this message]
2025-08-27 15:03     ` Pratyush Yadav
2025-08-28 12:43       ` Jason Gunthorpe
2025-08-28 23:00         ` Chris Li
2025-09-01 17:10         ` Pratyush Yadav
2025-09-02 13:48           ` Jason Gunthorpe
2025-09-03 14:10             ` Pratyush Yadav
2025-09-03 15:01               ` Jason Gunthorpe
2025-09-04 12:57                 ` Pratyush Yadav
2025-09-04 14:42                   ` Jason Gunthorpe
2025-09-09 14:53                     ` Pratyush Yadav
2025-09-09 15:40                       ` Pasha Tatashin
2025-09-09 15:54                         ` Jason Gunthorpe
2025-09-09 16:30                           ` Pasha Tatashin
2025-09-09 16:57                             ` Jason Gunthorpe
2025-09-09 17:27                               ` Pasha Tatashin
2025-09-09 15:56                         ` Pratyush Yadav
2025-09-09 16:25                           ` Pasha Tatashin
2025-08-28  7:14     ` Mike Rapoport
2025-08-29 18:47       ` Chris Li
2025-08-29 19:18     ` Chris Li
2025-09-02 13:41       ` Jason Gunthorpe
2025-09-03 12:01         ` Chris Li
2025-09-04 17:34           ` Jason Gunthorpe
2025-09-09 14:48             ` Pratyush Yadav
2025-09-01 16:23     ` Mike Rapoport
2025-09-01 16:54       ` Pasha Tatashin
2025-09-01 17:21         ` Pratyush Yadav
2025-09-01 19:02           ` Pasha Tatashin
2025-09-02 11:38             ` Jason Gunthorpe
2025-09-03 15:59               ` Pasha Tatashin
2025-09-03 16:40                 ` Jason Gunthorpe
2025-09-03 19:29                 ` Mike Rapoport
2025-09-02 11:58         ` Mike Rapoport
2025-09-01 17:01       ` Pratyush Yadav
2025-09-02 11:44         ` Mike Rapoport
2025-09-03 14:17           ` Pratyush Yadav
2025-09-03 19:39             ` Mike Rapoport
2025-09-04 12:39               ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 30/30] docs: add documentation for memfd preservation via LUO Pasha Tatashin
2025-08-08 12:07 ` [PATCH v3 00/30] Live Update Orchestrator David Hildenbrand
2025-08-08 12:24   ` Pratyush Yadav
2025-08-08 13:53     ` Pasha Tatashin
2025-08-08 13:52   ` Pasha Tatashin
2025-08-26 13:16 ` Pratyush Yadav
2025-08-26 13:54   ` Pasha Tatashin
2025-08-26 14:24     ` Jason Gunthorpe
2025-08-26 15:02       ` Pasha Tatashin
2025-08-26 15:13         ` Jason Gunthorpe
2025-08-26 16:10           ` Pasha Tatashin
2025-08-26 16:22             ` Jason Gunthorpe
2025-08-26 17:03               ` Pasha Tatashin
2025-08-26 17:08                 ` Jason Gunthorpe
2025-08-27 14:01                 ` Pratyush Yadav

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250826162019.GD2130239@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ajayachandra@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aliceryhl@google.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anna.schumaker@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=changyuanl@google.com \
    --cc=chenridong@huawei.com \
    --cc=corbet@lwn.net \
    --cc=cw00.choi@samsung.com \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=djeffery@redhat.com \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jasonmiu@google.com \
    --cc=joel.granados@kernel.org \
    --cc=kanie@linux.alibaba.com \
    --cc=lennart@poettering.net \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lukas@wunner.de \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mmaurer@google.com \
    --cc=myungjoo.ham@samsung.com \
    --cc=ojeda@kernel.org \
    --cc=parav@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=pratyush@kernel.org \
    --cc=ptyadav@amazon.de \
    --cc=quic_zijuhu@quicinc.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=song@kernel.org \
    --cc=stuart.w.hayes@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wagi@kernel.org \
    --cc=witu@nvidia.com \
    --cc=x86@kernel.org \
    --cc=yesanishhere@gmail.com \
    --cc=yoann.congal@smile.fr \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.