linux-mm.kvack.org archive mirror
* [RFC v2 00/16] Live Update Orchestrator
@ 2025-05-15 18:23 Pasha Tatashin
  2025-05-15 18:23 ` [RFC v2 01/16] kho: make debugfs interface optional Pasha Tatashin
                   ` (17 more replies)
  0 siblings, 18 replies; 104+ messages in thread
From: Pasha Tatashin @ 2025-05-15 18:23 UTC (permalink / raw)
  To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
	dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
	aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
	roman.gushchin, chenridong, axboe, mark.rutland, jannh,
	vincent.guittot, hannes, dan.j.williams, david, joel.granados,
	rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
	linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
	hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
	yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
	ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
	djeffery, stuart.w.hayes, ptyadav

This v2 series introduces the Live Update Orchestrator (LUO), a kernel
subsystem designed to facilitate live kernel updates with minimal
downtime, particularly in cloud deployments that aim to update the
kernel without fully disrupting running virtual machines.

This series builds upon the Kexec HandOver (KHO) framework [1] by adding
programmatic control over KHO's lifecycle and by using KHO to persist
LUO's own metadata across the kexec boundary. The git branch for this
series can be found at:
https://github.com/googleprodkernel/linux-liveupdate/tree/luo/rfc-v2

Changelog from v1:
- Control Interface: Shifted from sysfs-based control
  (/sys/kernel/liveupdate/{prepare,finish}) to an ioctl interface
  (/dev/liveupdate). Sysfs is now primarily for monitoring the state.
- Event/State Renaming: LIVEUPDATE_REBOOT event/phase is now
  LIVEUPDATE_FREEZE.
- FD Preservation: A new component for preserving file descriptors.
- Subsystem Registration: A formal mechanism for kernel subsystems
  to participate.
- Device Layer: Removed device list handling from this series; it will
  be added separately.
- Selftests: Kernel-side selftest hooks and userspace selftests are
  now included.
KHO Enhancements:
- KHO debugfs became optional, and kernel APIs for finalize/abort
  were added (driven by LUO's needs).
- KHO unpreserve functions were also added.

What is Live Update?
Live Update is a specialized reboot process where selected kernel
resources (memory, file descriptors, and eventually devices) are kept
operational or their state preserved across a kernel transition (e.g.,
via kexec). For certain resources, DMA and interrupt activity might
continue with minimal interruption during the kernel reboot.

LUO v2 Overview:
LUO v2 provides a framework for coordinating live updates. It features:
State Machine: Manages the live update process through states:
NORMAL, PREPARED, FROZEN, UPDATED.

KHO Integration:

LUO programmatically drives KHO's finalization and abort sequences.
KHO's debugfs interface is now optional, configured via
CONFIG_KEXEC_HANDOVER_DEBUG.

LUO preserves its own metadata via KHO's kho_add_subtree() and
kho_preserve_phys() mechanisms.

Subsystem Participation: A callback API, liveupdate_register_subsystem(),
allows kernel subsystems (e.g., KVM, IOMMU, VFIO, PCI) to register
handlers for LUO events (PREPARE, FREEZE, FINISH, CANCEL) and persist a
u64 payload via the LUO FDT.
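
The callback model described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: the handler
signature, struct luo_subsys, mock_register_subsystem() and mock_notify()
are hypothetical names invented for this sketch and are not the API
defined in include/linux/liveupdate.h. The sketch only shows the shape of
the idea: a subsystem registers one handler, receives events, and owns a
single u64 payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event names follow the cover letter; the identifiers are illustrative. */
enum luo_event { EV_PREPARE, EV_FREEZE, EV_FINISH, EV_CANCEL };

/* Hypothetical handler type: receives the event and a pointer to the
 * single u64 payload that the orchestrator would persist. */
typedef int (*luo_handler_t)(enum luo_event ev, uint64_t *payload);

struct luo_subsys {
	const char *name;
	luo_handler_t handler;
	uint64_t payload;
};

#define MAX_SUBSYS 8
static struct luo_subsys *registry[MAX_SUBSYS];
static size_t nr_subsys;

/* Mock registration: remember the subsystem so events can reach it. */
static int mock_register_subsystem(struct luo_subsys *s)
{
	if (!s || !s->handler || nr_subsys == MAX_SUBSYS)
		return -1;
	registry[nr_subsys++] = s;
	return 0;
}

/* Deliver one event to every registered subsystem, stopping at the
 * first handler that reports failure. */
static int mock_notify(enum luo_event ev)
{
	for (size_t i = 0; i < nr_subsys; i++) {
		assert(registry[i] != NULL);
		if (registry[i]->handler(ev, &registry[i]->payload) != 0)
			return -1;
	}
	return 0;
}

/* Example handler: record a made-up value on PREPARE, clear it on CANCEL. */
static int demo_handler(enum luo_event ev, uint64_t *payload)
{
	if (ev == EV_PREPARE)
		*payload = 0x1000;
	else if (ev == EV_CANCEL)
		*payload = 0;
	return 0;
}
```

In this sketch a subsystem would fill in a struct luo_subsys with
demo_handler, register it once, and then see its payload change as
mock_notify() delivers EV_PREPARE or EV_CANCEL.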

File Descriptor Preservation: Infrastructure
(liveupdate_register_filesystem(), luo_register_file(),
luo_retrieve_file()) that allows specific types of file descriptors
(e.g., memfd, vfio) to be preserved and restored.

Handlers for specific file types can be registered to manage their
preservation and restoration, storing a u64 payload in the LUO FDT.

Example WIP for memfd preservation can be found here [2].

User-space Interface:

ioctl (/dev/liveupdate): The primary control interface for
triggering LUO state transitions (prepare, freeze, finish, cancel)
and managing the preservation/restoration of file descriptors.
Access requires CAP_SYS_ADMIN.

sysfs (/sys/kernel/liveupdate/state): A read-only interface for
monitoring the current LUO state. This allows userspace services to
track progress and coordinate actions.
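
As a rough illustration of how a userspace service might poll this file,
the helper below reads a single line from a given path and strips the
trailing newline. The helper name read_state() is an assumption of this
sketch, and the exact strings the kernel reports are not reproduced here;
see Documentation/ABI/testing/sysfs-kernel-liveupdate in this series for
the authoritative description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of 'path' into 'buf' (at most len - 1 characters)
 * and strip the trailing newline. Returns 0 on success, -1 on failure. */
static int read_state(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A monitoring service could call read_state("/sys/kernel/liveupdate/state",
buf, sizeof(buf)) and treat a -1 return as "LUO not available on this
kernel".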

Selftests: Includes kernel-side hooks and userspace selftests to
verify core LUO functionality, particularly subsystem registration and
basic state transitions.

LUO State Machine and Events:

NORMAL:   Default operational state.
PREPARED: Initial preparation complete after LIVEUPDATE_PREPARE
          event. Subsystems have saved initial state.
FROZEN:   Final "blackout window" state after LIVEUPDATE_FREEZE
          event, just before kexec. Workloads must be suspended.
UPDATED:  Next kernel has booted via live update. Awaiting restoration
          and LIVEUPDATE_FINISH.

Events:
LIVEUPDATE_PREPARE: Prepare for reboot, serialize state.
LIVEUPDATE_FREEZE:  Final opportunity to save state before kexec.
LIVEUPDATE_FINISH:  Post-reboot cleanup in the next kernel.
LIVEUPDATE_CANCEL:  Abort prepare or freeze, revert changes.
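
The states and events listed above can be read as a small transition
table. The following is a minimal userspace sketch, not the code in
luo_core.c: the enum identifiers and luo_transition() are invented for
illustration, and two details are assumptions rather than statements
from this cover letter, namely that FINISH returns the machine to NORMAL
and that CANCEL from either PREPARED or FROZEN also returns to NORMAL.
The FROZEN to UPDATED step is not an event here because it happens
across kexec, in the next kernel.

```c
#include <assert.h>
#include <stddef.h>

enum luo_state { LUO_NORMAL, LUO_PREPARED, LUO_FROZEN, LUO_UPDATED };
enum luo_event { EV_PREPARE, EV_FREEZE, EV_FINISH, EV_CANCEL };

/* Apply 'ev' to '*st'. Returns 0 and updates *st on a legal
 * transition, or -1 and leaves *st unchanged otherwise. */
static int luo_transition(enum luo_state *st, enum luo_event ev)
{
	assert(st != NULL);

	switch (ev) {
	case EV_PREPARE:
		if (*st != LUO_NORMAL)
			return -1;
		*st = LUO_PREPARED;
		return 0;
	case EV_FREEZE:
		if (*st != LUO_PREPARED)
			return -1;
		*st = LUO_FROZEN;
		return 0;
	case EV_FINISH:
		/* Assumption: finishing in the new kernel ends in NORMAL. */
		if (*st != LUO_UPDATED)
			return -1;
		*st = LUO_NORMAL;
		return 0;
	case EV_CANCEL:
		/* Assumption: cancel reverts PREPARED or FROZEN to NORMAL. */
		if (*st != LUO_PREPARED && *st != LUO_FROZEN)
			return -1;
		*st = LUO_NORMAL;
		return 0;
	}
	return -1;
}
```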

[1] https://lore.kernel.org/all/20250509074635.3187114-1-changyuanl@google.com
    https://github.com/googleprodkernel/linux-liveupdate/tree/luo/kho-v8
[2] https://github.com/googleprodkernel/linux-liveupdate/tree/luo/memfd-v0.1

RFC v1: https://lore.kernel.org/all/20250320024011.2995837-1-pasha.tatashin@soleen.com

Changyuan Lyu (1):
  kho: add kho_unpreserve_folio/phys

Pasha Tatashin (15):
  kho: make debugfs interface optional
  kho: allow to drive kho from within kernel
  luo: luo_core: Live Update Orchestrator
  luo: luo_core: integrate with KHO
  luo: luo_subsystems: add subsystem registration
  luo: luo_subsystems: implement subsystem callbacks
  luo: luo_files: add infrastructure for FDs
  luo: luo_files: implement file systems callbacks
  luo: luo_ioctl: add ioctl interface
  luo: luo_sysfs: add sysfs state monitoring
  reboot: call liveupdate_reboot() before kexec
  luo: add selftests for subsystems un/registration
  selftests/liveupdate: add subsystem/state tests
  docs: add luo documentation
  MAINTAINERS: add liveupdate entry

 .../ABI/testing/sysfs-kernel-liveupdate       |  51 ++
 Documentation/admin-guide/index.rst           |   1 +
 Documentation/admin-guide/liveupdate.rst      |  62 ++
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |  14 +-
 drivers/misc/Kconfig                          |   1 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/liveupdate/Kconfig               |  60 ++
 drivers/misc/liveupdate/Makefile              |   7 +
 drivers/misc/liveupdate/luo_core.c            | 547 +++++++++++++++
 drivers/misc/liveupdate/luo_files.c           | 664 ++++++++++++++++++
 drivers/misc/liveupdate/luo_internal.h        |  59 ++
 drivers/misc/liveupdate/luo_ioctl.c           | 203 ++++++
 drivers/misc/liveupdate/luo_selftests.c       | 283 ++++++++
 drivers/misc/liveupdate/luo_selftests.h       |  23 +
 drivers/misc/liveupdate/luo_subsystems.c      | 413 +++++++++++
 drivers/misc/liveupdate/luo_sysfs.c           |  92 +++
 include/linux/kexec_handover.h                |  27 +
 include/linux/liveupdate.h                    | 214 ++++++
 include/uapi/linux/liveupdate.h               | 324 +++++++++
 kernel/Kconfig.kexec                          |  10 +
 kernel/Makefile                               |   1 +
 kernel/kexec_handover.c                       | 343 +++------
 kernel/kexec_handover_debug.c                 | 237 +++++++
 kernel/kexec_handover_internal.h              |  74 ++
 kernel/reboot.c                               |   4 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/liveupdate/.gitignore |   1 +
 tools/testing/selftests/liveupdate/Makefile   |   7 +
 tools/testing/selftests/liveupdate/config     |   6 +
 .../testing/selftests/liveupdate/liveupdate.c | 440 ++++++++++++
 31 files changed, 3933 insertions(+), 238 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-liveupdate
 create mode 100644 Documentation/admin-guide/liveupdate.rst
 create mode 100644 drivers/misc/liveupdate/Kconfig
 create mode 100644 drivers/misc/liveupdate/Makefile
 create mode 100644 drivers/misc/liveupdate/luo_core.c
 create mode 100644 drivers/misc/liveupdate/luo_files.c
 create mode 100644 drivers/misc/liveupdate/luo_internal.h
 create mode 100644 drivers/misc/liveupdate/luo_ioctl.c
 create mode 100644 drivers/misc/liveupdate/luo_selftests.c
 create mode 100644 drivers/misc/liveupdate/luo_selftests.h
 create mode 100644 drivers/misc/liveupdate/luo_subsystems.c
 create mode 100644 drivers/misc/liveupdate/luo_sysfs.c
 create mode 100644 include/linux/liveupdate.h
 create mode 100644 include/uapi/linux/liveupdate.h
 create mode 100644 kernel/kexec_handover_debug.c
 create mode 100644 kernel/kexec_handover_internal.h
 create mode 100644 tools/testing/selftests/liveupdate/.gitignore
 create mode 100644 tools/testing/selftests/liveupdate/Makefile
 create mode 100644 tools/testing/selftests/liveupdate/config
 create mode 100644 tools/testing/selftests/liveupdate/liveupdate.c

-- 
2.49.0.1101.gccaa498523-goog



* Re: [RFC v2 08/16] luo: luo_files: add infrastructure for FDs
@ 2025-06-06 22:28 Anish Moorthy
  2025-06-08  0:07 ` Pasha Tatashin
  0 siblings, 1 reply; 104+ messages in thread
From: Anish Moorthy @ 2025-06-06 22:28 UTC (permalink / raw)
  To: pasha.tatashin
  Cc: Jonathan.Cameron, akpm, aleksander.lobakin, aliceryhl,
	andriy.shevchenko, anna.schumaker, axboe, bartosz.golaszewski,
	bhelgaas, bp, changyuanl, chenridong, corbet, cw00.choi, dakr,
	dan.j.williams, dave.hansen, david, djeffery, dmatlack, graf,
	gregkh, hannes, hpa, ilpo.jarvinen, ira.weiny, jannh, jasonmiu,
	joel.granados, kanie, leon, linux-doc, linux-kernel, linux-mm,
	linux, lukas, mark.rutland, masahiroy, mingo, mmaurer,
	myungjoo.ham, ojeda, pratyush, ptyadav, quic_zijuhu, rafael,
	rdunlap, rientjes, roman.gushchin, rostedt, rppt, song,
	stuart.w.hayes, tglx, tj, vincent.guittot, wagi, x86,
	yesanishhere, yoann.congal, zhangguopeng


> + token = luo_next_file_token;
> + luo_next_file_token++;

This seems like it should be an atomic fetch_add: I'm only seeing read
locks up till this point

(Sorry if this is too nitpicky. Also for any formatting issues, I'm on
mobile atm)
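
For readers following the thread, the suggestion above can be sketched in
userspace C11. This is not the patch's code: next_file_token and
alloc_file_token() are hypothetical stand-ins for the quoted counter, and
kernel code would use the kernel's own atomic types (for example
atomic64_t with atomic64_fetch_add()) rather than <stdatomic.h>. The
point is only that the read and the increment become one indivisible
step, so two callers holding just a read lock cannot receive the same
token.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the counter in the quoted hunk. */
static _Atomic uint64_t next_file_token;

/* Hand out a unique token: atomic_fetch_add() returns the old value
 * and increments the counter as a single atomic operation. */
static uint64_t alloc_file_token(void)
{
	uint64_t tok = atomic_fetch_add(&next_file_token, 1);

	/* A wrapped counter would start handing out duplicate tokens. */
	assert(tok != UINT64_MAX);
	return tok;
}
```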



end of thread, other threads:[~2025-07-23 14:51 UTC | newest]

Thread overview: 104+ messages
2025-05-15 18:23 [RFC v2 00/16] Live Update Orchestrator Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 01/16] kho: make debugfs interface optional Pasha Tatashin
2025-06-04 16:03   ` Pratyush Yadav
2025-06-06 16:12     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 02/16] kho: allow to drive kho from within kernel Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 03/16] kho: add kho_unpreserve_folio/phys Pasha Tatashin
2025-06-04 15:00   ` Pratyush Yadav
2025-06-06 16:22     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 04/16] luo: luo_core: Live Update Orchestrator Pasha Tatashin
2025-05-26  6:31   ` Mike Rapoport
2025-05-30  5:00     ` Pasha Tatashin
2025-06-04 15:17   ` Pratyush Yadav
2025-06-07 17:11     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 05/16] luo: luo_core: integrate with KHO Pasha Tatashin
2025-05-26  7:18   ` Mike Rapoport
2025-06-07 17:50     ` Pasha Tatashin
2025-06-09  2:14       ` Pasha Tatashin
2025-06-04 16:00   ` Pratyush Yadav
2025-06-07 23:30     ` Pasha Tatashin
2025-06-13 14:58       ` Pratyush Yadav
2025-06-17 15:23         ` Jason Gunthorpe
2025-06-17 19:32           ` Pasha Tatashin
2025-06-18 13:11             ` Pratyush Yadav
2025-06-18 14:48               ` Pasha Tatashin
2025-06-18 16:40                 ` Mike Rapoport
2025-06-18 17:00                   ` Pasha Tatashin
2025-06-18 17:43                     ` Pasha Tatashin
2025-06-19 12:00                       ` Mike Rapoport
2025-06-19 14:22                         ` Pasha Tatashin
2025-06-20 15:28                           ` Pratyush Yadav
2025-06-20 16:03                             ` Pasha Tatashin
2025-06-24 16:12                               ` Pratyush Yadav
2025-06-24 16:55                                 ` Pasha Tatashin
2025-06-24 18:31                                 ` Jason Gunthorpe
2025-06-23  7:32                       ` Mike Rapoport
2025-06-23 11:29                         ` Pasha Tatashin
2025-06-25 13:46                           ` Mike Rapoport
2025-05-15 18:23 ` [RFC v2 06/16] luo: luo_subsystems: add subsystem registration Pasha Tatashin
2025-05-26  7:31   ` Mike Rapoport
2025-06-07 23:42     ` Pasha Tatashin
2025-05-28 19:12   ` David Matlack
2025-06-07 23:58     ` Pasha Tatashin
2025-06-04 16:30   ` Pratyush Yadav
2025-06-08  0:04     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 07/16] luo: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 08/16] luo: luo_files: add infrastructure for FDs Pasha Tatashin
2025-05-15 23:15   ` James Houghton
2025-05-23 18:09     ` Pasha Tatashin
2025-05-26  7:55   ` Mike Rapoport
2025-06-05 11:56     ` Pratyush Yadav
2025-06-08 13:13     ` Pasha Tatashin
2025-06-05 15:56   ` Pratyush Yadav
2025-06-08 13:37     ` Pasha Tatashin
2025-06-13 15:27       ` Pratyush Yadav
2025-06-15 18:02         ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 09/16] luo: luo_files: implement file systems callbacks Pasha Tatashin
2025-06-05 16:03   ` Pratyush Yadav
2025-06-08 13:49     ` Pasha Tatashin
2025-06-13 15:18       ` Pratyush Yadav
2025-06-13 20:26         ` Pasha Tatashin
2025-06-16 10:43           ` Pratyush Yadav
2025-06-16 14:57             ` Pasha Tatashin
2025-06-18 13:16               ` Pratyush Yadav
2025-05-15 18:23 ` [RFC v2 10/16] luo: luo_ioctl: add ioctl interface Pasha Tatashin
2025-05-26  8:42   ` Mike Rapoport
2025-06-08 15:08     ` Pasha Tatashin
2025-05-28 20:29   ` David Matlack
2025-06-08 16:32     ` Pasha Tatashin
2025-06-05 16:15   ` Pratyush Yadav
2025-06-08 16:35     ` Pasha Tatashin
2025-06-24  9:50   ` Christian Brauner
2025-06-24 14:27     ` Pasha Tatashin
2025-06-25  9:36       ` Christian Brauner
2025-06-25 16:12         ` David Matlack
2025-06-26 15:42           ` Pratyush Yadav
2025-06-26 16:24             ` David Matlack
2025-07-14 14:56               ` Pratyush Yadav
2025-07-17 16:17                 ` David Matlack
2025-07-23 14:51                   ` Pratyush Yadav
2025-07-06 14:33             ` Mike Rapoport
2025-07-07 12:56               ` Jason Gunthorpe
2025-06-25 16:58         ` pasha.tatashin
2025-07-06 14:24     ` Mike Rapoport
2025-07-09 21:27       ` Pratyush Yadav
2025-07-10  7:26         ` Mike Rapoport
2025-07-14 14:34           ` Jason Gunthorpe
2025-07-16  9:43             ` Greg KH
2025-05-15 18:23 ` [RFC v2 11/16] luo: luo_sysfs: add sysfs state monitoring Pasha Tatashin
2025-06-05 16:20   ` Pratyush Yadav
2025-06-08 16:36     ` Pasha Tatashin
2025-06-13 15:13       ` Pratyush Yadav
2025-05-15 18:23 ` [RFC v2 12/16] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 13/16] luo: add selftests for subsystems un/registration Pasha Tatashin
2025-05-26  8:52   ` Mike Rapoport
2025-06-08 16:47     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 14/16] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 15/16] docs: add luo documentation Pasha Tatashin
2025-05-26  9:00   ` Mike Rapoport
2025-05-15 18:23 ` [RFC v2 16/16] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-05-20  7:25 ` [RFC v2 00/16] Live Update Orchestrator Mike Rapoport
2025-05-23 18:07   ` Pasha Tatashin
2025-05-26  6:32 ` Mike Rapoport
  -- strict thread matches above, loose matches on Subject: below --
2025-06-06 22:28 [RFC v2 08/16] luo: luo_files: add infrastructure for FDs Anish Moorthy
2025-06-08  0:07 ` Pasha Tatashin
