linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Pratyush Yadav <pratyush@kernel.org>
To: Vipin Sharma <vipinsh@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
	 pratyush@kernel.org, jasonmiu@google.com,  graf@amazon.com,
	 changyuanl@google.com, rppt@kernel.org,  dmatlack@google.com,
	 rientjes@google.com, corbet@lwn.net,  rdunlap@infradead.org,
	 ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
	 ojeda@kernel.org,  aliceryhl@google.com, masahiroy@kernel.org,
	 akpm@linux-foundation.org,  tj@kernel.org,
	yoann.congal@smile.fr,  mmaurer@google.com,
	 roman.gushchin@linux.dev, chenridong@huawei.com,
	 axboe@kernel.dk,  mark.rutland@arm.com, jannh@google.com,
	 vincent.guittot@linaro.org,  hannes@cmpxchg.org,
	dan.j.williams@intel.com,  david@redhat.com,
	 joel.granados@kernel.org, rostedt@goodmis.org,
	 anna.schumaker@oracle.com,  song@kernel.org,
	zhangguopeng@kylinos.cn,  linux@weissschuh.net,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org,  gregkh@linuxfoundation.org,
	 tglx@linutronix.de, mingo@redhat.com,  bp@alien8.de,
	 dave.hansen@linux.intel.com, x86@kernel.org,  hpa@zytor.com,
	 rafael@kernel.org,  dakr@kernel.org,
	bartosz.golaszewski@linaro.org,  cw00.choi@samsung.com,
	myungjoo.ham@samsung.com,  yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com,  quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com,  ira.weiny@intel.com,
	andriy.shevchenko@linux.intel.com,  leon@kernel.org,
	 lukas@wunner.de, bhelgaas@google.com,  wagi@kernel.org,
	 djeffery@redhat.com, stuart.w.hayes@gmail.com,
	 lennart@poettering.net,  brauner@kernel.org,
	linux-api@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	saeedm@nvidia.com,  ajayachandra@nvidia.com,  jgg@nvidia.com,
	parav@nvidia.com,  leonro@nvidia.com,  witu@nvidia.com
Subject: Re: [PATCH v3 29/30] luo: allow preserving memfd
Date: Wed, 13 Aug 2025 14:29:38 +0200	[thread overview]
Message-ID: <mafs0wm77wgjx.fsf@kernel.org> (raw)
In-Reply-To: <20250813063407.GA3182745.vipinsh@google.com>

Hi Vipin,

Thanks for the review.

On Tue, Aug 12 2025, Vipin Sharma wrote:

> On 2025-08-07 01:44:35, Pasha Tatashin wrote:
>> From: Pratyush Yadav <ptyadav@amazon.de>
>> +static void memfd_luo_unpreserve_folios(const struct memfd_luo_preserved_folio *pfolios,
>> +					unsigned int nr_folios)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < nr_folios; i++) {
>> +		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
>> +		struct folio *folio;
>> +
>> +		if (!pfolio->foliodesc)
>> +			continue;
>> +
>> +		folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
>> +
>> +		kho_unpreserve_folio(folio);
>
> This one is missing WARN_ON_ONCE() similar to the one in
> memfd_luo_preserve_folios().

Right, will add.

>
>> +		unpin_folio(folio);

Looking at this code caught my eye. This can also be called from LUO's
finish callback if no one claimed the memfd after live update. In that
case, unpin_folio() is going to underflow the pincount or refcount on
the folio since after the kexec, the folio is no longer pinned. We
should only be doing folio_put().

I think this function should take a argument to specify which of these
cases it is dealing with.

>> +	}
>> +}
>> +
>> +static void *memfd_luo_create_fdt(unsigned long size)
>> +{
>> +	unsigned int order = get_order(size);
>> +	struct folio *fdt_folio;
>> +	int err = 0;
>> +	void *fdt;
>> +
>> +	if (order > MAX_PAGE_ORDER)
>> +		return NULL;
>> +
>> +	fdt_folio = folio_alloc(GFP_KERNEL, order);
>
> __GFP_ZERO should also be used here. Otherwise this can lead to
> unintentional passing of old kernel memory.

fdt_create() zeroes out the buffer so this should not be a problem.

>
>> +static int memfd_luo_prepare(struct liveupdate_file_handler *handler,
>> +			     struct file *file, u64 *data)
>> +{
>> +	struct memfd_luo_preserved_folio *preserved_folios;
>> +	struct inode *inode = file_inode(file);
>> +	unsigned int max_folios, nr_folios = 0;
>> +	int err = 0, preserved_size;
>> +	struct folio **folios;
>> +	long size, nr_pinned;
>> +	pgoff_t offset;
>> +	void *fdt;
>> +	u64 pos;
>> +
>> +	if (WARN_ON_ONCE(!shmem_file(file)))
>> +		return -EINVAL;
>
> This one is only check for shmem_file, whereas in
> memfd_luo_can_preserve() there is check for inode->i_nlink also. Is that
> not needed here?

Actually, this should never happen since the LUO can_preserve() callback
should make sure of this. I think it would be perfectly fine to just
drop this check. I only added it because I was being extra careful.

>
>> +
>> +	inode_lock(inode);
>> +	shmem_i_mapping_freeze(inode, true);
>> +
>> +	size = i_size_read(inode);
>> +	if ((PAGE_ALIGN(size) / PAGE_SIZE) > UINT_MAX) {
>> +		err = -E2BIG;
>> +		goto err_unlock;
>> +	}
>> +
>> +	/*
>> +	 * Guess the number of folios based on inode size. Real number might end
>> +	 * up being smaller if there are higher order folios.
>> +	 */
>> +	max_folios = PAGE_ALIGN(size) / PAGE_SIZE;
>> +	folios = kvmalloc_array(max_folios, sizeof(*folios), GFP_KERNEL);
>
> __GFP_ZERO?

Why? This is only used in this function and gets freed on return. And
the function only looks at the elements that get initialized by
memfd_pin_folios().

>
>> +static int memfd_luo_freeze(struct liveupdate_file_handler *handler,
>> +			    struct file *file, u64 *data)
>> +{
>> +	u64 pos = file->f_pos;
>> +	void *fdt;
>> +	int err;
>> +
>> +	if (WARN_ON_ONCE(!*data))
>> +		return -EINVAL;
>> +
>> +	fdt = phys_to_virt(*data);
>> +
>> +	/*
>> +	 * The pos or size might have changed since prepare. Everything else
>> +	 * stays the same.
>> +	 */
>> +	err = fdt_setprop(fdt, 0, "pos", &pos, sizeof(pos));
>> +	if (err)
>> +		return err;
>
> Comment is talking about pos and size but code is only updating pos. 

Right. Comment is out of date. size can no longer change since prepare.
So will update the comment.

>
>> +static int memfd_luo_retrieve(struct liveupdate_file_handler *handler, u64 data,
>> +			      struct file **file_p)
>> +{
>> +	const struct memfd_luo_preserved_folio *pfolios;
>> +	int nr_pfolios, len, ret = 0, i = 0;
>> +	struct address_space *mapping;
>> +	struct folio *folio, *fdt_folio;
>> +	const u64 *pos, *size;
>> +	struct inode *inode;
>> +	struct file *file;
>> +	const void *fdt;
>> +
>> +	fdt_folio = memfd_luo_get_fdt(data);
>> +	if (!fdt_folio)
>> +		return -ENOENT;
>> +
>> +	fdt = page_to_virt(folio_page(fdt_folio, 0));
>> +
>> +	pfolios = fdt_getprop(fdt, 0, "folios", &len);
>> +	if (!pfolios || len % sizeof(*pfolios)) {
>> +		pr_err("invalid 'folios' property\n");
>
> Print should clearly state that error is because fields is not found or
> len is not multiple of sizeof(*pfolios).

Eh, there is already too much boilerplate one has to write (and read)
for parsing the FDT. Is there really a need for an extra 3-4 lines of
code for _each_ property that is parsed?

Long term, I think we shouldn't be doing this manually anyway. I think
the maintainable path forward is to define a schema for the serialized
data and have a parser that takes in the schema and gives out a parsed
struct, doing all sorts of checks in the process.

-- 
Regards,
Pratyush Yadav


  parent reply	other threads:[~2025-08-13 12:29 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-07  1:44 [PATCH v3 00/30] Live Update Orchestrator Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 01/30] kho: init new_physxa->phys_bits to fix lockdep Pasha Tatashin
2025-08-08 11:42   ` Pratyush Yadav
2025-08-08 11:52     ` Pratyush Yadav
2025-08-08 14:00       ` Pasha Tatashin
2025-08-08 19:06         ` Andrew Morton
2025-08-08 19:51           ` Pasha Tatashin
2025-08-08 20:19             ` Pasha Tatashin
2025-08-14 13:11   ` Jason Gunthorpe
2025-08-14 14:57     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 02/30] kho: mm: Don't allow deferred struct page with KHO Pasha Tatashin
2025-08-08 11:47   ` Pratyush Yadav
2025-08-08 14:01     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 03/30] kho: warn if KHO is disabled due to an error Pasha Tatashin
2025-08-08 11:48   ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 04/30] kho: allow to drive kho from within kernel Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 05/30] kho: make debugfs interface optional Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 06/30] kho: drop notifiers Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 07/30] kho: add interfaces to unpreserve folios and physical memory ranges Pasha Tatashin
2025-08-14 13:22   ` Jason Gunthorpe
2025-08-14 15:05     ` Pasha Tatashin
2025-08-14 17:01       ` Jason Gunthorpe
2025-08-15  9:12     ` Mike Rapoport
2025-08-18 13:55       ` Jason Gunthorpe
2025-08-07  1:44 ` [PATCH v3 08/30] kho: don't unpreserve memory during abort Pasha Tatashin
2025-08-14 13:30   ` Jason Gunthorpe
2025-08-07  1:44 ` [PATCH v3 09/30] liveupdate: kho: move to kernel/liveupdate Pasha Tatashin
2025-08-30  8:35   ` Mike Rapoport
2025-08-07  1:44 ` [PATCH v3 10/30] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator Pasha Tatashin
2025-08-14 13:31   ` Jason Gunthorpe
2025-08-07  1:44 ` [PATCH v3 11/30] liveupdate: luo_core: integrate with KHO Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 12/30] liveupdate: luo_subsystems: add subsystem registration Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 13/30] liveupdate: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 14/30] liveupdate: luo_files: add infrastructure for FDs Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 15/30] liveupdate: luo_files: implement file systems callbacks Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 16/30] liveupdate: luo_ioctl: add userpsace interface Pasha Tatashin
2025-08-14 13:49   ` Jason Gunthorpe
2025-08-07  1:44 ` [PATCH v3 17/30] liveupdate: luo_files: luo_ioctl: Unregister all FDs on device close Pasha Tatashin
2025-08-27 15:34   ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 18/30] liveupdate: luo_files: luo_ioctl: Add ioctls for per-file state management Pasha Tatashin
2025-08-14 14:02   ` Jason Gunthorpe
2025-08-07  1:44 ` [PATCH v3 19/30] liveupdate: luo_sysfs: add sysfs state monitoring Pasha Tatashin
2025-08-26 16:03   ` Jason Gunthorpe
2025-08-26 18:58     ` Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 20/30] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 21/30] kho: move kho debugfs directory to liveupdate Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 22/30] liveupdate: add selftests for subsystems un/registration Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 23/30] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 24/30] docs: add luo documentation Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 25/30] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 26/30] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
2025-08-11 23:11   ` Vipin Sharma
2025-08-13 12:42     ` Pratyush Yadav
2025-08-07  1:44 ` [PATCH v3 27/30] mm: shmem: allow freezing inode mapping Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 28/30] mm: shmem: export some functions to internal.h Pasha Tatashin
2025-08-07  1:44 ` [PATCH v3 29/30] luo: allow preserving memfd Pasha Tatashin
2025-08-08 20:22   ` Pasha Tatashin
2025-08-13 12:44     ` Pratyush Yadav
2025-08-13  6:34   ` Vipin Sharma
2025-08-13  7:09     ` Greg KH
2025-08-13 12:02       ` Pratyush Yadav
2025-08-13 12:14         ` Greg KH
2025-08-13 12:41           ` Jason Gunthorpe
2025-08-13 13:00             ` Greg KH
2025-08-13 13:37               ` Pratyush Yadav
2025-08-13 13:41                 ` Pasha Tatashin
2025-08-13 13:53                   ` Greg KH
2025-08-13 13:53                 ` Greg KH
2025-08-13 20:03               ` Jason Gunthorpe
2025-08-13 13:31             ` Pratyush Yadav
2025-08-13 12:29     ` Pratyush Yadav [this message]
2025-08-13 13:49       ` Pasha Tatashin
2025-08-13 13:55         ` Pratyush Yadav
2025-08-26 16:20   ` Jason Gunthorpe
2025-08-27 15:03     ` Pratyush Yadav
2025-08-28 12:43       ` Jason Gunthorpe
2025-08-28 23:00         ` Chris Li
2025-09-01 17:10         ` Pratyush Yadav
2025-09-02 13:48           ` Jason Gunthorpe
2025-09-03 14:10             ` Pratyush Yadav
2025-09-03 15:01               ` Jason Gunthorpe
2025-08-28  7:14     ` Mike Rapoport
2025-08-29 18:47       ` Chris Li
2025-08-29 19:18     ` Chris Li
2025-09-02 13:41       ` Jason Gunthorpe
2025-09-03 12:01         ` Chris Li
2025-09-01 16:23     ` Mike Rapoport
2025-09-01 16:54       ` Pasha Tatashin
2025-09-01 17:21         ` Pratyush Yadav
2025-09-01 19:02           ` Pasha Tatashin
2025-09-02 11:38             ` Jason Gunthorpe
2025-09-03 15:59               ` Pasha Tatashin
2025-09-03 16:40                 ` Jason Gunthorpe
2025-09-03 19:29                 ` Mike Rapoport
2025-09-02 11:58         ` Mike Rapoport
2025-09-01 17:01       ` Pratyush Yadav
2025-09-02 11:44         ` Mike Rapoport
2025-09-03 14:17           ` Pratyush Yadav
2025-09-03 19:39             ` Mike Rapoport
2025-08-07  1:44 ` [PATCH v3 30/30] docs: add documentation for memfd preservation via LUO Pasha Tatashin
2025-08-08 12:07 ` [PATCH v3 00/30] Live Update Orchestrator David Hildenbrand
2025-08-08 12:24   ` Pratyush Yadav
2025-08-08 13:53     ` Pasha Tatashin
2025-08-08 13:52   ` Pasha Tatashin
2025-08-26 13:16 ` Pratyush Yadav
2025-08-26 13:54   ` Pasha Tatashin
2025-08-26 14:24     ` Jason Gunthorpe
2025-08-26 15:02       ` Pasha Tatashin
2025-08-26 15:13         ` Jason Gunthorpe
2025-08-26 16:10           ` Pasha Tatashin
2025-08-26 16:22             ` Jason Gunthorpe
2025-08-26 17:03               ` Pasha Tatashin
2025-08-26 17:08                 ` Jason Gunthorpe
2025-08-27 14:01                 ` Pratyush Yadav

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mafs0wm77wgjx.fsf@kernel.org \
    --to=pratyush@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ajayachandra@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aliceryhl@google.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anna.schumaker@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=changyuanl@google.com \
    --cc=chenridong@huawei.com \
    --cc=corbet@lwn.net \
    --cc=cw00.choi@samsung.com \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=djeffery@redhat.com \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jasonmiu@google.com \
    --cc=jgg@nvidia.com \
    --cc=joel.granados@kernel.org \
    --cc=kanie@linux.alibaba.com \
    --cc=lennart@poettering.net \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lukas@wunner.de \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mmaurer@google.com \
    --cc=myungjoo.ham@samsung.com \
    --cc=ojeda@kernel.org \
    --cc=parav@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=quic_zijuhu@quicinc.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=song@kernel.org \
    --cc=stuart.w.hayes@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vipinsh@google.com \
    --cc=wagi@kernel.org \
    --cc=witu@nvidia.com \
    --cc=x86@kernel.org \
    --cc=yesanishhere@gmail.com \
    --cc=yoann.congal@smile.fr \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).