From: David Matlack <dmatlack@google.com>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	jasonmiu@google.com,  graf@amazon.com, changyuanl@google.com,
	rppt@kernel.org, rientjes@google.com,  corbet@lwn.net,
	rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com,
	 kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com,
	 masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org,
	 yoann.congal@smile.fr, mmaurer@google.com,
	roman.gushchin@linux.dev,  chenridong@huawei.com,
	axboe@kernel.dk, mark.rutland@arm.com,  jannh@google.com,
	vincent.guittot@linaro.org, hannes@cmpxchg.org,
	 dan.j.williams@intel.com, david@redhat.com,
	joel.granados@kernel.org,  rostedt@goodmis.org,
	anna.schumaker@oracle.com, song@kernel.org,
	 zhangguopeng@kylinos.cn, linux@weissschuh.net,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org, gregkh@linuxfoundation.org,
	 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	rafael@kernel.org,  dakr@kernel.org,
	bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
	 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com,  quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com, ira.weiny@intel.com,
	 andriy.shevchenko@linux.intel.com, leon@kernel.org,
	lukas@wunner.de,  bhelgaas@google.com, wagi@kernel.org,
	djeffery@redhat.com,  stuart.w.hayes@gmail.com
Subject: Re: [RFC v2 10/16] luo: luo_ioctl: add ioctl interface
Date: Thu, 17 Jul 2025 09:17:17 -0700
Message-ID: <CALzav=cUQGF_DnmyDOORssoThmfQwnPgUxQiLmXyAKY1-hyT4g@mail.gmail.com>
In-Reply-To: <mafs04iveu8gs.fsf@kernel.org>

On Mon, Jul 14, 2025 at 7:56 AM Pratyush Yadav <pratyush@kernel.org> wrote:
> On Thu, Jun 26 2025, David Matlack wrote:
> > On Thu, Jun 26, 2025 at 8:42 AM Pratyush Yadav <pratyush@kernel.org> wrote:
> >> On Wed, Jun 25 2025, David Matlack wrote:
> >> > On Wed, Jun 25, 2025 at 2:36 AM Christian Brauner <brauner@kernel.org> wrote:
> >> >> >
> >> >> > While I agree that a filesystem offers superior introspection and
> >> >> > integration with standard tools, building this complex, stateful
> >> >> > orchestration logic on top of VFS seemed to be forcing a square peg
> >> >> > into a round hole. The ioctl interface, while more opaque, provides a
> >> >> > direct and explicit way to command the state machine and manage these
> >> >> > complex lifecycle and dependency rules.
> >> >>
> >> >> I'm not going to argue that you have to switch to this kexecfs idea
> >> >> but...
> >> >>
> >> >> You're using a character device that's tied to devtmpfs. In other words,
> >> >> you're already using a filesystem interface. Literally the whole code
> >> >> here is built on top of filesystem APIs. So this argument is just very
> >> >> wrong imho. If you can build it on top of a character device using VFS
> >> >> interfaces you can do it as a minimal filesystem.
> >> >>
> >> >> You're free to define the filesystem interface any way you like it. We
> >> >> have a ton of examples there. All your ioctls would just be tied to the
> >> >> filesystem instance instead of the /dev/somethingsomething character
> >> >> device. The state machine could just be implemented the same way.
> >> >>
> >> >> One of my points is that with an fs interface you can have easy state
> >> >> serialization on a per-service level. IOW, you have a bunch of virtual
> >> >> machines running as services or some networking services or whatever.
> >> >> You could just bind-mount an instance of kexecfs into the service and
> >> >> the service can persist state into the instance and easily recover it
> >> >> after kexec.
> >> >
> >> > This approach sounds worth exploring more. It would avoid the need for
> >> > a centralized daemon to mediate the preservation and restoration of
> >> > all file descriptors.
> >>
> >> One of the jobs of the centralized daemon is to decide the _policy_ of
> >> who gets to preserve things and more importantly, make sure the right
> >> party unpreserves the right FDs after a kexec. I don't see how this
> >> interface fixes this problem. You would still need a way to identify
> >> which kexecfs instance belongs to who and enforce that. The kernel
> >> probably shouldn't be the one doing this kind of policy so you still
> >> need some userspace component to make those decisions.
> >
> > The main benefit I see of kexecfs is that it avoids needing to send
> > all FDs over UDS to/from liveupdated and therefore the need for
> > dynamic cross-process communication (e.g. RPCs).
> >
> > Instead, something just needs to set up a kexecfs for each VM when it
> > is created, and give the same kexecfs back to each VM after kexec.
> > Then VMs are free to save/restore any FDs in that kexecfs without
> > cross-process communication or transferring file descriptors.
>
> Isn't giving back the right kexecfs instance to the right VMM the main
> problem? After a kexec, you need a way to make that policy decision. You
> would need a userspace agent to do that.
>
> I think what you are suggesting does make a lot of sense -- the agent
> should be handing out sessions instead of FDs, which would make FD
> save/restore simpler for applications. But that can be done using the
> ioctl interface as well. Each time you open() the /dev/liveupdate, you
> get a new session. Instead of file FDs like memfd or iommufd, we can
> have the agent hand out these session FDs and anything that was saved
> using this session would be ready for restoring.
>
> My main point is that this can be done with the current interface as
> well as kexecfs. I think there is very much a reason for considering
> kexecfs (like not being dependent on devtmpfs), but I don't think this
> is necessarily the main one.

The main problem I'd like solved is requiring all FDs to be preserved and
restored in the context of a central daemon, since I think this will
inevitably cause problems for KVM. I agree with you that this problem
can also be solved in other ways, such as session FDs (good idea!).
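
To make the cost concrete, here is a minimal sketch of the kind of FD
hand-off that a central daemon implies: passing a descriptor across a
Unix domain socket with SCM_RIGHTS. It is illustrative only. The names
send_fd(), recv_fd() and demo() are made up for this sketch, the
one-byte payload is an arbitrary choice, and none of this is LUO or
liveupdated code. It assumes Linux and Python 3.9 or later for
socket.send_fds() and socket.recv_fds().

```python
import os
import socket


def send_fd(sock, fd):
    # SCM_RIGHTS ancillary data must accompany at least one byte of payload.
    socket.send_fds(sock, [b"F"], [fd])


def recv_fd(sock):
    # Receive up to one byte of payload and at most one descriptor.
    data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if data != b"F" or len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    return fds[0]


def demo():
    # Both ends live in one process here; in practice one end would be
    # the VMM and the other the central daemon.
    vmm_end, daemon_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(vmm_end, write_end)
        received = recv_fd(daemon_end)
        # The received descriptor refers to the same pipe as write_end.
        os.write(received, b"hello")
        os.close(received)
        os.close(write_end)
        return os.read(read_end, 5)
    finally:
        os.close(read_end)
        vmm_end.close()
        daemon_end.close()
```

Every preserved FD would need a round trip like this before kexec and
another one after it, which is the dynamic cross-process communication
I would like to avoid.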

>
> >
> > Policy can be enforced by controlling access to kexecfs mounts. This
> > naturally fits into the standard architecture of running untrusted VMs
> > (e.g. using chroots and containers to enforce security and isolation).
>
> How? After a kexec, how do you tell which process can get which kexecfs
> mount/instance? If any of them can get any, then we lose all sorts of
> policy enforcement.

I was imagining that whatever process/daemon creates the kexecfs
instances before kexec is also responsible for reassociating them with
the right processes after kexec.

If you are asking how that association would be done mechanically, I
was imagining it would be through a combination of filesystem
permissions, mounts, and chroots. For example, the kexecfs instance
for VM A would be mounted in VM A's chroot. VM A would then only have
access to its own kexecfs instance.
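
As a rough sketch of the bookkeeping this implies (not a proposal for an
actual interface): the creating process records which instance belongs
to which VM before kexec, and afterwards uses that record to decide
where each instance gets mounted. Everything here is hypothetical: the
JSON state file, the instance names, the "kexecfs" mount point name,
and the helper functions. The actual mount and chroot calls are left
out, since kexecfs does not exist.

```python
import json
import os
import tempfile

MOUNT_NAME = "kexecfs"  # hypothetical mount point name inside each chroot


def save_assignments(path, assignments):
    # Before kexec: persist the VM -> instance mapping somewhere that
    # survives the reboot (a plain file in this sketch).
    with open(path, "w") as f:
        json.dump(assignments, f)


def plan_mounts(path, chroots):
    # After kexec: reload the mapping and compute, per VM, which instance
    # should be mounted at which path inside that VM's chroot.
    with open(path) as f:
        assignments = json.load(f)
    plan = {}
    for vm, instance in sorted(assignments.items()):
        if vm not in chroots:
            raise KeyError(f"no chroot known for {vm}")
        plan[vm] = (instance, os.path.join(chroots[vm], MOUNT_NAME))
    return plan


def demo():
    with tempfile.TemporaryDirectory() as d:
        state = os.path.join(d, "assignments.json")
        save_assignments(state, {"vm-a": "instance-0", "vm-b": "instance-1"})
        return plan_mounts(state, {"vm-a": "/srv/vm-a", "vm-b": "/srv/vm-b"})
```

Each VM only ever sees the mount point under its own chroot, so the
policy decision reduces to getting this mapping right.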

> >> > I'm not sure that we can get rid of the machine-wide state machine
> >> > though, as there is some kernel state that will necessarily cross
> >> > these kexecfs domains (e.g. IOMMU driver state). So we still might
> >> > need /dev/liveupdate for that.
> >>
> >> Generally speaking, I think both VFS-based and IOCTL-based interfaces
> >> are more or less equally expressive/powerful. Most of the ioctl
> >> operations can be translated to a VFS operation and vice versa.
> >>
> >> For example, the fsopen() call is similar to open("/dev/liveupdate") --
> >> both would create a live update session which auto closes when the FD is
> >> closed or FS unmounted. Similarly, each ioctl can be replaced with a
> >> file in the FS. For example, LIVEUPDATE_IOCTL_FD_PRESERVE can be
> >> replaced with a fd_preserve file where you write() the FD number.
> >> LIVEUPDATE_IOCTL_GET_STATE or LIVEUPDATE_IOCTL_PREPARE, etc. can be
> >> replaced by a "state" file where you can read() or write() the state.
> >>
> >> I think the main benefit of the VFS-based interface is ease of use.
> >> There already exist a bunch of utilities and libraries that we can use to
> >> interact with files. When we have ioctls, we would need to write
> >> everything ourselves. For example, instead of
> >> LIVEUPDATE_IOCTL_GET_STATE, you can do "cat state", which is a bit
> >> easier to do.
> >>
> >> As for downsides, I think we might end up with a bit more boilerplate
> >> code, but beyond that I am not sure.
> >
> > I agree we can more or less get to the same end state with either
> > approach. And also, I don't think we have to do one or the other. I
> > think kexecfs is something that we can build on top of this series.
> > For example, kexecfs would be a new kernel subsystem that registers
> > with LUO.
>
> Yeah, fair point. Though I'd rather we agree on one and go with that.
> Having two interfaces for the same thing isn't the best.

Agreed, it would be better to have a single way to preserve FDs rather
than two (LUO ioctl and kexecfs).

