Linux userland API discussions

* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
From: David Matlack @ 2025-11-18  0:06 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-19-pasha.tatashin@soleen.com>

On 2025-11-15 06:34 PM, Pasha Tatashin wrote:

> +/* Stage 1: Executed before the kexec reboot. */
> +static void run_stage_1(int luo_fd)
> +{
> +	int session_fd;
> +
> +	ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
> +
> +	ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
> +	create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
> +
> +	ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
> +		       TEST_SESSION_NAME);
> +	session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
> +	if (session_fd < 0)
> +		fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
> +
> +	if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
> +				      TEST_MEMFD_DATA) < 0) {
> +		fail_exit("create_and_preserve_memfd for token %#x",
> +			  TEST_MEMFD_TOKEN);
> +	}
> +
> +	ksft_print_msg("[STAGE 1] Executing kexec...\n");
> +	if (system(KEXEC_SCRIPT) != 0)
> +		fail_exit("kexec script failed");
> +	exit(EXIT_FAILURE);

Can we separate the kexec from the test and allow the user/automation to
trigger it however is appropriate for their system? The current
do_kexec.sh script does not do any sort of graceful shutdown, and I bet
everyone will have different ways of initiating kexec on their systems.

For example, something like this (but sleeping in the child instead of
busy waiting):

diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
index 67ab6ebf9eec..513693bfb77b 100644
--- a/tools/testing/selftests/liveupdate/luo_kexec_simple.c
+++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
@@ -24,6 +24,7 @@
 static void run_stage_1(int luo_fd)
 {
 	int session_fd;
+	int ret;
 
 	ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
 
@@ -42,10 +43,17 @@ static void run_stage_1(int luo_fd)
 			  TEST_MEMFD_TOKEN);
 	}
 
-	ksft_print_msg("[STAGE 1] Executing kexec...\n");
-	if (system(KEXEC_SCRIPT) != 0)
-		fail_exit("kexec script failed");
-	exit(EXIT_FAILURE);
+	ksft_print_msg("[STAGE 1] Forking child process to hold session open\n");
+	ret = fork();
+	if (ret < 0)
+		fail_exit("fork() failed");
+	if (!ret)
+		for (;;) {}
+
+	ksft_print_msg("[STAGE 1] Child Process: %d\n", ret);
+	ksft_print_msg("[STAGE 1] Complete!\n");
+	ksft_print_msg("[STAGE 1] Execute kexec to continue\n");
+	exit(0);
 }
 
 /* Stage 2: Executed after the kexec reboot. */

> +int main(int argc, char *argv[])
> +{
> +	int luo_fd;
> +	int state_session_fd;
> +
> +	luo_fd = luo_open_device();
> +	if (luo_fd < 0)
> +		ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
> +			       LUO_DEVICE);
> +
> +	/*
> +	 * Determine the stage by attempting to retrieve the state session.
> +	 * If it doesn't exist (ENOENT), we are in Stage 1 (pre-kexec).
> +	 */
> +	state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);

I don't think the test should try to infer the stage from the state of
the system. If a user runs this test, then does the kexec, then runs
this test again and the session can't be retrieved, that should be a
test failure (not just run stage 1 again).

I think it'd be better to require the user to pass in what stage of the
test should be run when invoking the test. e.g.

 $ ./luo_kexec_simple stage_2
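
A minimal sketch of such argument handling, assuming the stage names
"stage_1" and "stage_2" from the example invocation above; parse_stage()
is a hypothetical helper, not something from the posted series:

```c
#include <string.h>

/*
 * Hypothetical helper: map the command-line word to a stage number.
 * Returns 1 or 2 for a recognized stage, or -1 for a missing or unknown
 * argument so the caller can fail instead of guessing.
 */
static int parse_stage(const char *arg)
{
	if (!arg)
		return -1;
	if (!strcmp(arg, "stage_1"))
		return 1;
	if (!strcmp(arg, "stage_2"))
		return 2;
	return -1;
}
```

main() would pass argv[1] when argc is 2 and NULL otherwise. With an
explicit stage, a failure to retrieve the state session in stage 2 can
be reported as a test failure rather than silently rerunning stage 1.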


* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
From: Pasha Tatashin @ 2025-11-18  1:01 UTC (permalink / raw)
  To: David Matlack
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CALzav=ekHM8a3yYHHUJNgtYVwLYf1hFhEmrXJjHUXRt=xrSy4A@mail.gmail.com>

> > TEST_PROGS_O := $(patsubst %, %.o, $(TEST_PROGS))
> >
> > TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O))
> > TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_PROGS_O))
> > -include $(TEST_DEP_FILES)
> >
> > $(LIBLIVEUPDATE_O): $(OUTPUT)/%.o: %.c
> >         $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> >
> > $(TEST_PROGS): %: %.o $(LIBLIVEUPDATE_O)
> >         $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
> >
> > EXTRA_CLEAN += $(LIBLIVEUPDATE_O)
> > EXTRA_CLEAN += $(TEST_PROGS_O)
> > EXTRA_CLEAN += $(TEST_DEP_FILES)

Took your suggestion, thank you!


* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
From: Pasha Tatashin @ 2025-11-18  1:08 UTC (permalink / raw)
  To: David Matlack
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRu4hBPz2g-cealt@google.com>

On Mon, Nov 17, 2025 at 7:06 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2025-11-15 06:34 PM, Pasha Tatashin wrote:
>
> > +/* Stage 1: Executed before the kexec reboot. */
> > +static void run_stage_1(int luo_fd)
> > +{
> > +     int session_fd;
> > +
> > +     ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
> > +
> > +     ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
> > +     create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
> > +
> > +     ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
> > +                    TEST_SESSION_NAME);
> > +     session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
> > +     if (session_fd < 0)
> > +             fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
> > +
> > +     if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
> > +                                   TEST_MEMFD_DATA) < 0) {
> > +             fail_exit("create_and_preserve_memfd for token %#x",
> > +                       TEST_MEMFD_TOKEN);
> > +     }
> > +
> > +     ksft_print_msg("[STAGE 1] Executing kexec...\n");
> > +     if (system(KEXEC_SCRIPT) != 0)
> > +             fail_exit("kexec script failed");
> > +     exit(EXIT_FAILURE);
>
> Can we separate the kexec from the test and allow the user/automation to
> trigger it however is appropriate for their system? The current
> do_kexec.sh script does not do any sort of graceful shutdown, and I bet
> everyone will have different ways of initiating kexec on their systems.

Yes, this is a good idea. I am going to do what you suggested:
1. Take the stage as an argument.
2. Let the user run the kexec command.

Thank you,
Pasha


* Re: [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation
From: Pasha Tatashin @ 2025-11-18  2:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRoXGYC4GeAoNKPl@kernel.org>

> >  static int luo_session_release(struct inode *inodep, struct file *filep)
> >  {
> >       struct luo_session *session = filep->private_data;
> >       struct luo_session_header *sh;
> > +     int err = 0;
> >
> >       /* If retrieved is set, it means this session is from incoming list */
> > -     if (session->retrieved)
> > +     if (session->retrieved) {
> >               sh = &luo_session_global.incoming;
> > -     else
> > +
> > +             err = luo_session_finish_one(session);
> > +             if (err) {
> > +                     pr_warn("Unable to finish session [%s] on release\n",
> > +                             session->name);
>
>                         return err;
>
> and then else can go away here and luo_session_remove() and
> luo_session_free() can be moved outside if (session->retrieved).

Done.

Thanks,
Pasha


* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Pasha Tatashin @ 2025-11-18  3:54 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRrtRfJaaIHw5DZN@kernel.org>

>
> The concept makes sense to me, but it's hard to review the implementation
> without an actual user.

There are three users: HugeTLB support, which is going to be posted as
an RFC in a few weeks, and updated VFIO and IOMMU series, both using
FLBs, which we are going to post in two weeks. In the meantime, this
series provides an in-kernel FLB test that verifies that multiple FLBs
can be attached to file handlers and that the basic interfaces work.


> > +struct liveupdate_flb {
> > +     const struct liveupdate_flb_ops *ops;
> > +     const char compatible[LIVEUPDATE_FLB_COMPAT_LENGTH];
> > +     struct list_head list;
> > +     void *internal;
>
> Can't list be a part of internal?

Yes, I moved it inside internal. I also removed the
liveupdate_init_flb() function (initialization now happens
automatically), switched to __private as you suggested earlier, and
removed the kmalloc() for the internal data, so FLBs can be safely used
early in boot.

> And don't we usually call this .private rather than .internal?

Renamed.

>
> >  };
> >
> >  #ifdef CONFIG_LIVEUPDATE
> > @@ -111,6 +187,17 @@ int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> >  int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> >                                 struct file *file, u64 *tokenp);
> >
> > +/* Before using FLB for the first time it should be initialized */
> > +int liveupdate_init_flb(struct liveupdate_flb *flb);
> > +
> > +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> > +                         struct liveupdate_flb *flb);
>
> While these are obvious ...
>
> > +
> > +int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp);
> > +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj);
> > +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp);
> > +void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj);
> > +
>
> ... it's not very clear what these APIs are for and how they are going to be
> used.

They give access to a global resource, either while a file is being
preserved or at any time during boot.

>
> >  #else /* CONFIG_LIVEUPDATE */
>
> ...
>
> > +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> > +                         struct liveupdate_flb *flb)
> > +{
> > +     struct luo_flb_internal *internal = flb->internal;
> > +     struct luo_flb_link *link __free(kfree) = NULL;
> > +     static DEFINE_MUTEX(register_flb_lock);
> > +     struct liveupdate_flb *gflb;
> > +     struct luo_flb_link *iter;
> > +
> > +     if (!liveupdate_enabled())
> > +             return -EOPNOTSUPP;
> > +
> > +     if (WARN_ON(!h || !flb || !internal))
> > +             return -EINVAL;
> > +
> > +     if (WARN_ON(!flb->ops->preserve || !flb->ops->unpreserve ||
> > +                 !flb->ops->retrieve || !flb->ops->finish)) {
> > +             return -EINVAL;
> > +     }
> > +
> > +     /*
> > +      * Once session/files have been deserialized, FLBs cannot be registered,
> > +      * it is too late. Deserialization uses file handlers, and FLB registers
> > +      * to file handlers.
> > +      */
> > +     if (WARN_ON(luo_session_is_deserialized()))
> > +             return -EBUSY;
> > +
> > +     /*
> > +      * File handler must already be registered, as it is initializes the
> > +      * flb_list
> > +      */
> > +     if (WARN_ON(list_empty(&h->list)))
> > +             return -EINVAL;
> > +
> > +     link = kzalloc(sizeof(*link), GFP_KERNEL);
> > +     if (!link)
> > +             return -ENOMEM;
> > +
> > +     guard(mutex)(&register_flb_lock);
> > +
> > +     /* Check that this FLB is not already linked to this file handler */
> > +     list_for_each_entry(iter, &h->flb_list, list) {
> > +             if (iter->flb == flb)
> > +                     return -EEXIST;
> > +     }
> > +
> > +     /* Is this FLB linked to global list ? */
>
> Maybe:
>
>         /*
>          * If this FLB is not linked to global list it's first time the FLB
>          * is registered
>          */

Done


> > +/**
> > + * liveupdate_flb_incoming_unlock - Unlock an incoming FLB object.
> > + * @flb: The FLB definition.
> > + * @obj: The object that was returned by the _locked call (used for validation).
> > + *
> > + * Releases the internal lock acquired by liveupdate_flb_incoming_locked().
> > + */
> > +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj)
> > +{
> > +     struct luo_flb_internal *internal = flb->internal;
> > +
> > +     lockdep_assert_held(&internal->incoming.lock);
> > +     internal->incoming.obj = obj;
>
> The comment says obj is for validation and here it's assigned to flb.
> Something is off here :)

Thank you for catching the stale comment; fixed.

> > +     mutex_unlock(&internal->incoming.lock);
> > +}
> > +
> > +/**
> > + * liveupdate_flb_outgoing_locked - Lock and retrieve the outgoing FLB object.
> > + * @flb:  The FLB definition.
> > + * @objp: Output parameter; will be populated with the live shared object.
> > + *
> > + * Acquires the FLB's internal lock and returns a pointer to its shared live
> > + * object for the outgoing (pre-reboot) path.
> > + *
> > + * This function assumes the object has already been created by the FLB's
> > + * .preserve() callback, which is triggered when the first dependent file
> > + * is preserved.
> > + *
> > + * The caller MUST call liveupdate_flb_outgoing_unlock() to release the lock.
> > + *
> > + * Return: 0 on success, or a negative errno on failure.
> > + */
> > +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp)
> > +{
> > +     struct luo_flb_internal *internal = flb->internal;
> > +
> > +     if (!liveupdate_enabled())
> > +             return -EOPNOTSUPP;
> > +
> > +     if (WARN_ON(!internal))
> > +             return -EINVAL;
> > +
> > +     mutex_lock(&internal->outgoing.lock);
> > +
> > +     /* The object must exist if any file is being preserved */
> > +     if (WARN_ON_ONCE(!internal->outgoing.obj)) {
> > +             mutex_unlock(&internal->outgoing.lock);
> > +             return -ENOENT;
> > +     }
>
> _incoming_locked() and outgoing_locked() are nearly identical, it seems we
> can have the common part in a
> static liveupdate_flb_locked(struct luo_flb_state *state).
>
> liveupdate_flb_incoming_locked() will be oneline wrapper and
> liveupdate_flb_outgoing_locked() will have this WARN_ON if obj is NULL.

Done
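
For readers following along, the refactoring suggested above might look
roughly like this. It is a userspace model, not the actual v7 code: a
pthread mutex stands in for the kernel mutex, and struct flb_state, its
fields, and the flb_* function names are assumptions made for
illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed stand-in for the per-direction state (incoming or outgoing). */
struct flb_state {
	pthread_mutex_t lock;
	void *obj;
};

/* Common part: take the lock and hand back the shared object. */
static void flb_locked(struct flb_state *state, void **objp)
{
	pthread_mutex_lock(&state->lock);
	*objp = state->obj;
}

/* Incoming side: a one-line wrapper around the common helper. */
static int flb_incoming_locked(struct flb_state *incoming, void **objp)
{
	flb_locked(incoming, objp);
	return 0;
}

/* Outgoing side: same helper, plus the "object must exist" check. */
static int flb_outgoing_locked(struct flb_state *outgoing, void **objp)
{
	flb_locked(outgoing, objp);
	if (!*objp) {
		/* Drop the lock on failure so the caller need not unlock. */
		pthread_mutex_unlock(&outgoing->lock);
		return -ENOENT;
	}
	return 0;
}

/* Release the lock taken by either *_locked() call. */
static void flb_unlock(struct flb_state *state)
{
	pthread_mutex_unlock(&state->lock);
}
```

The kernel version would keep the liveupdate_enabled() and WARN_ON()
checks shown in the quoted patch; they are left out here to keep the
model small.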


* Re: [PATCH v6 12/20] mm: shmem: allow freezing inode mapping
From: Pasha Tatashin @ 2025-11-18  4:13 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRr0CQsV16usRW1J@kernel.org>

> > +/* Must be called with inode lock taken exclusive. */
> > +static inline void shmem_i_mapping_freeze(struct inode *inode, bool freeze)
>
> _mapping usually refers to operations on struct address_space.
> It seems that all shmem methods that take inode are just shmem_<operation>,
> so shmem_freeze() looks more appropriate.

Done, renamed to shmem_freeze()

>
> > +{
> > +     if (freeze)
> > +             SHMEM_I(inode)->flags |= SHMEM_F_MAPPING_FROZEN;
> > +     else
> > +             SHMEM_I(inode)->flags &= ~SHMEM_F_MAPPING_FROZEN;
> > +}
> > +
> >  /*
> >   * If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
> >   * beyond i_size's notion of EOF, which fallocate has committed to reserving:
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 1d5036dec08a..05c3db840257 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1292,7 +1292,8 @@ static int shmem_setattr(struct mnt_idmap *idmap,
> >               loff_t newsize = attr->ia_size;
> >
> >               /* protected by i_rwsem */
> > -             if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> > +             if ((info->flags & SHMEM_F_MAPPING_FROZEN) ||
>
> A corner case: if newsize == oldsize this will be a false positive

Added a fix.
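
The fix itself is not shown in this thread. One way the corner case
could be handled is to treat an unchanged size as a no-op before looking
at the frozen flag, as in the standalone sketch below. The DEMO_* flag
values and resize_denied() are illustrative assumptions, not the
kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's values differ. */
#define DEMO_MAPPING_FROZEN	0x1u
#define DEMO_SEAL_SHRINK	0x2u
#define DEMO_SEAL_GROW		0x4u

/*
 * Return true if changing the file size from oldsize to newsize must be
 * refused (the -EPERM case in the quoted hunk).
 */
static bool resize_denied(unsigned int flags, unsigned int seals,
			  long long oldsize, long long newsize)
{
	/* Same size: nothing changes, so neither freeze nor seals apply. */
	if (newsize == oldsize)
		return false;

	/* A frozen mapping refuses any real size change. */
	if (flags & DEMO_MAPPING_FROZEN)
		return true;

	if (newsize < oldsize && (seals & DEMO_SEAL_SHRINK))
		return true;

	return newsize > oldsize && (seals & DEMO_SEAL_GROW);
}
```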

Thanks,
Pasha

>
> > +                 (newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> >                   (newsize > oldsize && (info->seals & F_SEAL_GROW)))
> >                       return -EPERM;
> >
> > @@ -3289,6 +3290,10 @@ shmem_write_begin(const struct kiocb *iocb, struct address_space *mapping,
> >                       return -EPERM;
> >       }
> >
> > +     if (unlikely((info->flags & SHMEM_F_MAPPING_FROZEN) &&
> > +                  pos + len > inode->i_size))
> > +             return -EPERM;
> > +
> >       ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
> >       if (ret)
> >               return ret;
> > @@ -3662,6 +3667,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >
> >       inode_lock(inode);
> >
> > +     if (info->flags & SHMEM_F_MAPPING_FROZEN) {
> > +             error = -EPERM;
> > +             goto out;
> > +     }
> > +
> >       if (mode & FALLOC_FL_PUNCH_HOLE) {
> >               struct address_space *mapping = file->f_mapping;
> >               loff_t unmap_start = round_up(offset, PAGE_SIZE);
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
>
> --
> Sincerely yours,
> Mike.


* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-18  4:22 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRuODFfqP-qsxa-j@kernel.org>

> You can avoid that complexity if you register the device with a different
> fops, but that's technicality.
>
> Your point about treating the incoming FDT as an underlying resource that
> failed to initialize makes sense, but nevertheless userspace needs a
> reliable way to detect it and parsing dmesg is not something we should rely
> on.

I see two solutions:

1. LUO fails to retrieve the preserved data. The user finds out because
/dev/liveupdate is missing, and studies dmesg to see what happened (in
practice, version mismatches should not happen in fleets; they should
be caught in quals).
2. Create a zombie device that returns some errno on open; the user
still studies dmesg to understand what really happened.

I think that 1 is better.

Pasha


* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-18  4:28 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRuPcjyNBZqlZuEm@kernel.org>

On Mon, Nov 17, 2025 at 4:11 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 10:09:28AM -0500, Pasha Tatashin wrote:
> >
> > > > +     }
> > > > +
> > > > +     for (int i = 0; i < sh->header_ser->count; i++) {
> > > > +             struct luo_session *session;
> > > > +
> > > > +             session = luo_session_alloc(sh->ser[i].name);
> > > > +             if (IS_ERR(session)) {
> > > > +                     pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> > > > +                             sh->ser[i].name, session);
> > > > +                     return PTR_ERR(session);
> > > > +             }
> > >
> > > The allocated sessions still need to be freed if an insert fails ;-)
> >
> > No. We have failed to deserialize, so anyways the machine will need to
> > be rebooted by the user in order to release the preserved resources.
> >
> > This is something that Jason Gunthorpe also mentioned regarding IOMMU:
> > if something is not correct (i.e., if a session cannot finish for some
> > reason), don't add complicated "undo" code that cleans up all
> > resources. Instead, treat them as a memory leak and allow a reboot to
> > perform the cleanup.
> >
> > While in this particular patch the clean-up looks simple, later in the
> > series we are adding file deserialization to each session to this
> > function. So, the clean-up will look like this: we would have to free
> > the resources for each session we deserialized, and also free the
> > resources for files that were deserialized for those sessions, only to
> > still boot into a "maintenance" mode where a bunch of resources are not
> > accessible from which the machine would have to be rebooted to get
> > back to a normal state. This code will never be tested, and never be
> > used, so let's use reboot to solve this problem, where devices are
> > going to be properly reset, and memory is going to be properly freed.
>
> A part of this explanation should be a comment in the code.

Done.

>
> --
> Sincerely yours,
> Mike.


* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-18 11:21 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bAEdNE0Rs1i7GdHz8Q3DK9Npozm8sRL8Epa+o50NOMY7A@mail.gmail.com>

On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > You can avoid that complexity if you register the device with a different
> > fops, but that's technicality.
> >
> > Your point about treating the incoming FDT as an underlying resource that
> > failed to initialize makes sense, but nevertheless userspace needs a
> > reliable way to detect it and parsing dmesg is not something we should rely
> > on.
> 
> I see two solutions:
> 
> 1. LUO fails to retrieve the preserved data, the user gets informed by
> not finding /dev/liveupdate, and studying the dmesg for what has
> happened (in reality in fleets version mismatches should not be
> happening, those should be detected in quals).
> 2. Create a zombie device to return some errno on open, and still
> study dmesg to understand what really happened.

Users should not have to study dmesg. We need another solution.
What's wrong with e.g. ioctl()?
 
> I think that 1 is better
> 
> Pasha

-- 
Sincerely yours,
Mike.


* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Mike Rapoport @ 2025-11-18 11:28 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bBxVNRkJ-8Qv1AzfHEwpxnc4fSxdzKCL_7ku0TMd6Rjow@mail.gmail.com>

On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> >
> > The concept makes sense to me, but it's hard to review the implementation
> > without an actual user.
> 
> There are three users: we will have HugeTLB support that is going to
> be posted as RFC in a few weeks. Also, in two weeks we are going to
> have an updated VFIO and IOMMU series posted both using FLBs. In the
> mean time, this series provides an FLB in-kernel test that verifies
> that multiple FLBs can be attached to File-Handlers, and the basic
> interfaces are working.
 
Which means that essentially there won't be a real kernel user for FLB for
a while.
We usually don't merge dead code just because some future patchset depends
on it.

I think it should stay in mm-nonmm-unstable, if Andrew does not mind keeping
it there, until the first user lands; FLB can then move upstream along with
that user.

If keeping FLB in mm tree is an issue we can set up an integration tree for
LUO/KHO.

-- 
Sincerely yours,
Mike.


* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
From: Mike Rapoport @ 2025-11-18 11:30 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bCfPeY558f499JHKN7aekDzsxQkZJ9Uz4e+saR0qtXyfg@mail.gmail.com>

On Mon, Nov 17, 2025 at 02:00:15PM -0500, Pasha Tatashin wrote:
> > >  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > > index df337c9c4f21..9a531096bdb5 100644
> > > --- a/kernel/liveupdate/luo_file.c
> > > +++ b/kernel/liveupdate/luo_file.c
> > > @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > >       INIT_LIST_HEAD(&fh->flb_list);
> > >       list_add_tail(&fh->list, &luo_file_handler_list);
> > >
> > > +     liveupdate_test_register(fh);
> > > +
> >
> > Why this cannot be called from the test?
> 
> Because test does not have access to all file_handlers that are being
> registered with LUO.

Unless I'm missing something, an FLB user registers a file handler and
lets LUO know that it will need an FLB. Why can't the test do the same?
 
> Pasha

-- 
Sincerely yours,
Mike.


* Re: [PATCH 0/2] man7/ip.7: Clarify PKTINFO's docs
From: Alejandro Colomar @ 2025-11-18 13:51 UTC (permalink / raw)
  To: Jakub Głogowski; +Cc: linux-man, LKML, Linux API, ej
In-Reply-To: <cover.1763130571.git.not@dzwdz.net>

[-- Attachment #1: Type: text/plain, Size: 1907 bytes --]

Hi Jakub,

On Fri, Nov 14, 2025 at 03:29:29PM +0100, Jakub Głogowski wrote:
> I found the PKTINFO docs pretty confusing, so I tried clarifying them:
> - being more specific about each field in the struct
>   (e.g. "local address of the packet" for a received packet could've
>   been interpreted in myriad ways),
> - making the differences between sendmsg(2)'s and recvmsg(2)'s handling
>   of that struct more explicit,
> - and some other slight rewording to make it (IMO) more readable - I cut
>   out most of a paragraph that wasn't really saying anything, etc.
> 
> I'm not sure if this should even be documented in ip(7) together with
> the other sockopts, though?  sendmsg(2)'s handling of in_pktinfo is
> completely unrelated to the IP_PKTINFO sockopt.  Documenting it in its
> own manual page would also give us more room for subsection headings and
> other formatting, examples, etc - instead of trying to cram it into
> what's already an enormous manpage.
> 
> Same goes for some of the other more complex sockopts, I guess.

Do you suggest moving each socket option to a manual page under
man2const/?  I think I agree with that.  There's precedent, and it makes
the pages more readable.

I'll try to do that soon.  I'll ping you when I've finished, in case you
want to apply further changes.

> PS. sorry for not signing this email, but neomutt didn't want to
> cooperate :/  I'll try to figure it out for any followup patches.

Ok.


Have a lovely day!
Alex

> Jakub Głogowski (2):
>   man/man7/ip.7: Clarify PKTINFO's semantics depending on packet
>     direction
>   man/man7/ip.7: Reword IP_PKTINFO's description
> 
>  man/man7/ip.7 | 57 +++++++++++++++++++++++++++------------------------
>  1 file changed, 30 insertions(+), 27 deletions(-)
> 
> -- 
> 2.47.3
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 14:03 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Pasha Tatashin, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <aRxWvsdv1dQz8oZ4@kernel.org>

On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > You can avoid that complexity if you register the device with a different
> > > fops, but that's technicality.
> > >
> > > Your point about treating the incoming FDT as an underlying resource that
> > > failed to initialize makes sense, but nevertheless userspace needs a
> > > reliable way to detect it and parsing dmesg is not something we should rely
> > > on.
> > 
> > I see two solutions:
> > 
> > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > not finding /dev/liveupdate, and studying the dmesg for what has
> > happened (in reality in fleets version mismatches should not be
> > happening, those should be detected in quals).
> > 2. Create a zombie device to return some errno on open, and still
> > study dmesg to understand what really happened.
> 
> User should not study dmesg. We need another solution.
> What's wrong with e.g. ioctl()?

It seems very dangerous to even boot at all if the next kernel doesn't
understand the serialization information.

IMHO we should not even be thinking about this; it is up to the
predecessor environment to prevent it from happening. The ideas to
use ELF metadata/etc to allow a pre-flight validation are the right
solution.

If we get into the next kernel and it receives information it cannot
process, it should just BUG_ON and die, or some broad equivalent.
It is a catastrophic orchestration error, and we don't need some
fine-grained recovery or userspace visibility. Crash dump the system and
reboot it.

IOW, I would not invest time in this.

Jason

^ permalink raw reply

* Re: [PATCH 1/2] man/man7/ip.7: Clarify PKTINFO's semantics depending on packet direction
From: Alejandro Colomar @ 2025-11-18 14:31 UTC (permalink / raw)
  To: Jakub Głogowski; +Cc: linux-man, LKML, Linux API, ej
In-Reply-To: <fb3980b64d1c827ad59726bb30761d735396e109.1763130571.git.not@dzwdz.net>

[-- Attachment #1: Type: text/plain, Size: 4944 bytes --]

Hi Jakub,

On Fri, Nov 14, 2025 at 03:29:30PM +0100, Jakub Głogowski wrote:
> For recvmsg(2), ipi_spec_dst is set by ipv4_pktinfo_prepare() to the
> result of fib_compute_spec_dst().  The latter was introduced in
> 	linux.git 35ebf65e851c6d97 ("ipv4: Create and use fib_compute_spec_dst() helper.").
> 
> Quoting its commit message:
> 
> > The specific destination is the host we direct unicast replies to.
> > Usually this is the original packet source address, but if we are
> > responding to a multicast or broadcast packet we have to use something
> > different.
> >
> > Specifically we must use the source address we would use if we were to
> > send a packet to the unicast source of the original packet.
> 
> Experimentation seems to confirm that behavior.
> 
> As for the note about ipi_spec_dst being on a different interface:
> - For unicast packets (for which ipi_spec_dst is the original
>   destination address), I believe this is trivially true because Linux
>   uses the weak host model (unless there's some interaction with
>   RTCF_LOCAL that I'm missing).
> - For multicast/broadcast packets, fib_compute_spec_dst() only passes the
>   original interface to the lookup in the context of L3M.  In
>   particular, the original implementation (cited above) set iif and oif
>   to 0. Also, citing
> 	linux.git e7372197e15856ec ("net/ipv4: Set oif in fib_compute_spec_dst"),
>   > If the device is not enslaved, oif is still 0 so no affect.
> 
> It doesn't seem like using an address specifically from the interface
> the packet was received on was ever the intention.  I've also confirmed
> this behavior (sending a multicast packet from another machine, whose IP
> I've routed to a dummy interface).
> 
> I'm focusing on this because that's a misconception I've had before
> digging into the code - the sendmsg behavior explained in the same
> paragraph made me think ipi_spec_dst was the (primary?) address of
> ipi_ifindex.  I think this is worth clarifying.
> 
> I've made it explicit that ipi_addr isn't used by sendmsg because that's
> another possible misconception.
> 
> The (first) extra comma in sendmsg's ipi_spec_dst's description is meant
> to emphasize that it's used as the local source address _and_ for the
> routing table lookup, as opposed to just affecting the routing table
> lookup.
> Stylistically it might be a bit weird but idk how to convey this better.
> 
> Apart from the cited commits I was referencing the linux-6.17.7 tarball.
> 
> __fib_validate_source (and the comment near it) might also be of
> interest to people trying to figure out what "specific destinations"
> are, exactly.
> 
> Signed-off-by: Jakub Głogowski <not@dzwdz.net>

Thanks!  I've applied the patch.  I've added CC tags (please, copy those
yourself in future patches).

I've also s/PKTINFO/IP_PKTINFO/ in the subject.

And I've applied minor wording and source improvements.

I've pushed here:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=b8f472450f6607e2d5bd68a1b60615a91ed3d111>
(use port 80).

> ---
>  man/man7/ip.7 | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/man/man7/ip.7 b/man/man7/ip.7
> index a92939cd0..a7f118b42 100644
> --- a/man/man7/ip.7
> +++ b/man/man7/ip.7
> @@ -809,12 +809,20 @@ .SS Socket options
>  .EE
>  .in
>  .IP
> +When returned by
> +.BR recvmsg (2) ,
>  .I ipi_ifindex
>  is the unique index of the interface the packet was received on.
>  .I ipi_spec_dst
> -is the local address of the packet and
> +is the preferred source address for replies to the given packet, and
>  .I ipi_addr
>  is the destination address in the packet header.
> +These addresses are usually the same,
> +but can differ for broadcast or multicast packets.
> +Note that, depending on the configured routes,

I've removed 'Note that,'.  It's redundant.  Everything in a manual page
should be noteworthy.


Have a lovely day!
Alex

> +.I ipi_spec_dst
> +might belong to a different interface from the one that received the packet.
> +.IP
>  If
>  .B IP_PKTINFO
>  is passed to
> @@ -822,14 +830,16 @@ .SS Socket options
>  and
>  .\" This field is grossly misnamed
>  .I ipi_spec_dst
> -is not zero, then it is used as the local source address for the routing
> -table lookup and for setting up IP source route options.
> +is not zero, then it is used as the local source address, for the routing
> +table lookup, and for setting up IP source route options.
>  When
>  .I ipi_ifindex
>  is not zero, the primary local address of the interface specified by the
>  index overwrites
>  .I ipi_spec_dst
>  for the routing table lookup.
> +.I ipi_addr
> +is ignored.
>  .IP
>  Not supported for
>  .B SOCK_STREAM
> -- 
> 2.47.3
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply
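
For readers following the ip(7) wording in the patch above, here is a minimal
C sketch of the two directions it describes. It is an illustration only, not
code from the patch or from the kernel. The helper names pktinfo_for_send()
and pktinfo_from_msg() are hypothetical. The first builds the ancillary data
one would hand to sendmsg(2), setting ipi_spec_dst (used as the local source
address) and leaving ipi_addr zeroed because the patched text says it is
ignored. The second walks the control messages the way a recvmsg(2) caller
would to pick out the struct in_pktinfo.

```c
#define _GNU_SOURCE		/* glibc exposes struct in_pktinfo under this */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill msg's control buffer with one IP_PKTINFO
 * message suitable for sendmsg(2).  ipi_spec_dst carries the desired
 * source address; ipi_addr stays zero since sendmsg(2) ignores it.
 * Returns 0 on success, -1 if buf is too small.
 */
static int pktinfo_for_send(struct msghdr *msg, void *buf, size_t buflen,
			    struct in_addr src, int ifindex)
{
	struct in_pktinfo pi;
	struct cmsghdr *cmsg;

	assert(msg != NULL && buf != NULL);
	if (buflen < CMSG_SPACE(sizeof(pi)))
		return -1;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_PKTINFO;
	cmsg->cmsg_len = CMSG_LEN(sizeof(pi));

	memset(&pi, 0, sizeof(pi));
	pi.ipi_ifindex = ifindex;	/* 0 means "do not pick an interface" */
	pi.ipi_spec_dst = src;
	memcpy(CMSG_DATA(cmsg), &pi, sizeof(pi));
	return 0;
}

/*
 * Hypothetical helper: copy the first IP_PKTINFO control message found
 * in msg into out, as a recvmsg(2) caller would after enabling the
 * IP_PKTINFO socket option.  Returns 0 if found, -1 otherwise.
 */
static int pktinfo_from_msg(struct msghdr *msg, struct in_pktinfo *out)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && out != NULL);
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_PKTINFO) {
			memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```

On the receive side, a caller that wants to reply from the right address
would use the returned ipi_spec_dst, not ipi_addr, since the two can differ
for broadcast or multicast packets.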

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-18 15:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Pasha Tatashin, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <20251118140300.GK10864@nvidia.com>

On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > You can avoid that complexity if you register the device with a different
> > > > fops, but that's technicality.
> > > >
> > > > Your point about treating the incoming FDT as an underlying resource that
> > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > on.
> > > 
> > > I see two solutions:
> > > 
> > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > happened (in reality in fleets version mismatches should not be
> > > happening, those should be detected in quals).
> > > 2. Create a zombie device to return some errno on open, and still
> > > study dmesg to understand what really happened.
> > 
> > User should not study dmesg. We need another solution.
> > What's wrong with e.g. ioctl()?
> 
> It seems very dangerous to even boot at all if the next kernel doesn't
> understand the serialization information..
> 
> IMHO I think we should not even be thinking about this, it is up to
> the predecessor environment to prevent it from happening. The ideas to
> use ELF metadata/etc to allow a pre-flight validation are the right
> solution.
> 
> If we get into the next kernel and it receives information it cannot
> process it should just BUG_ON and die, or some broad equivalent. 
> It is a catastrophic orchestration error, and we don't need some fine
> grain recovery or userspace visibility. Crash dump the system and
> reboot it.

I was under the impression that Pasha wanted to get to userspace no matter
what.

panic() in liveupdate_early_init() makes perfect sense to me. Parsing dmesg
does not.
 
> IOW, I would not invest time in this.
> 
> Jason

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-18 15:18 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Jason Gunthorpe, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <aRyLbB8yoQwUJ3dh@kernel.org>

On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > > You can avoid that complexity if you register the device with a different
> > > > > fops, but that's technicality.
> > > > >
> > > > > Your point about treating the incoming FDT as an underlying resource that
> > > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > > on.
> > > >
> > > > I see two solutions:
> > > >
> > > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > > happened (in reality in fleets version mismatches should not be
> > > > happening, those should be detected in quals).
> > > > 2. Create a zombie device to return some errno on open, and still
> > > > study dmesg to understand what really happened.
> > >
> > > User should not study dmesg. We need another solution.
> > > What's wrong with e.g. ioctl()?
> >
> > It seems very dangerous to even boot at all if the next kernel doesn't
> > understand the serialization information..
> >
> > IMHO I think we should not even be thinking about this, it is up to
> > the predecessor environment to prevent it from happening. The ideas to
> > use ELF metadata/etc to allow a pre-flight validation are the right
> > solution.

100% agreed, this is the goal.

> > If we get into the next kernel and it receives information it cannot
> > process it should just BUG_ON and die, or some broad equivalent.

I initially had a panic() that would kill the kernel, but after
further consideration, I realized that we can still boot into
"maintenance" mode and allow the user to decide when and how to reboot
the machine back to a normal state.

Crashing during early boot has its own disadvantages: the crash kernel
is not available. Also, because live-update has to be very fast, the
console is likely to be disabled. Therefore, getting to userspace and
allowing the user to investigate what happened (e.g., automatically
retrieving dmesg or a core dump and filing a bug) before rebooting
seems like the most sensible approach.

This won't leak data, as /dev/liveupdate is completely disabled, so
nothing preserved in memory will be recoverable.

Pasha

^ permalink raw reply
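
To make the two detection options discussed in this subthread concrete, here
is a rough userspace sketch in C. It is not part of the series. It assumes
only what the thread states: in option 1 the /dev/liveupdate node is missing,
and in option 2 a "zombie" device makes open() fail with some errno. The
thread does not say which errno that would be, so the sketch just reports
whatever open() returned. The enum and the luo_probe_path() helper are
hypothetical names, and opening read-only is an assumption as well.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical classification of what userspace can observe. */
enum luo_probe {
	LUO_PRESENT,		/* node exists and open() succeeded */
	LUO_ABSENT,		/* option 1: node was never created */
	LUO_OPEN_FAILED,	/* option 2: node exists, open() returned an error */
};

/*
 * Try to open the device node at path and classify the result.
 * *err receives the errno from open(), or 0 on success.
 */
static enum luo_probe luo_probe_path(const char *path, int *err)
{
	int fd;

	assert(path != NULL && err != NULL);

	fd = open(path, O_RDONLY);
	if (fd >= 0) {
		close(fd);
		*err = 0;
		return LUO_PRESENT;
	}

	*err = errno;
	return *err == ENOENT ? LUO_ABSENT : LUO_OPEN_FAILED;
}
```

A caller would pass "/dev/liveupdate" as the path. Either way the result
only says that something went wrong, not why, which is the gap Mike is
pointing at when he objects to parsing dmesg.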

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 15:36 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <CA+CK2bBFtG3LWmCtLs-5vfS8FYm_r24v=jJra9gOGPKKcs=55g@mail.gmail.com>

On Tue, Nov 18, 2025 at 10:18:28AM -0500, Pasha Tatashin wrote:
> On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> > > On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > > > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > > > You can avoid that complexity if you register the device with a different
> > > > > > fops, but that's technicality.
> > > > > >
> > > > > > Your point about treating the incoming FDT as an underlying resource that
> > > > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > > > on.
> > > > >
> > > > > I see two solutions:
> > > > >
> > > > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > > > happened (in reality in fleets version mismatches should not be
> > > > > happening, those should be detected in quals).
> > > > > 2. Create a zombie device to return some errno on open, and still
> > > > > study dmesg to understand what really happened.
> > > >
> > > > User should not study dmesg. We need another solution.
> > > > What's wrong with e.g. ioctl()?
> > >
> > > It seems very dangerous to even boot at all if the next kernel doesn't
> > > understand the serialization information..
> > >
> > > IMHO I think we should not even be thinking about this, it is up to
> > > the predecessor environment to prevent it from happening. The ideas to
> > > use ELF metadata/etc to allow a pre-flight validation are the right
> > > solution.
> 
> 100% agreed, this is the goal.
> 
> > > If we get into the next kernel and it receives information it cannot
> > > process it should just BUG_ON and die, or some broad equivalent.
> 
> I initially had a panic() that would kill the kernel, but after
> further consideration, I realized that we can still boot into
> "maintenance" mode and allow the user to decide when and how to reboot
> the machine back to a normal state.
 
> This won't leak data, as /dev/liveupdate is completely disabled, so
> nothing preserved in memory will be recoverable.

This seems reasonable, but it is still dangerous.

At a minimum, the KHO startup needs to either succeed, panic, or fail
to online most of the memory (i.e. run from the safe region only).

The above approach works better for things like VFIO or memfd, where
you can boot fairly safely. Not sure about iommu though: if
iommu doesn't deserialize properly then it probably corrupts all
memory too.

Jason

^ permalink raw reply

* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Pasha Tatashin @ 2025-11-18 15:37 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRxYQKrQeP8BzR_2@kernel.org>

On Tue, Nov 18, 2025 at 6:28 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> > >
> > > The concept makes sense to me, but it's hard to review the implementation
> > > without an actual user.
> >
> > There are three users: we will have HugeTLB support that is going to
> > be posted as RFC in a few weeks. Also, in two weeks we are going to
> > have an updated VFIO and IOMMU series posted both using FLBs. In the
> > mean time, this series provides an FLB in-kernel test that verifies
> > that multiple FLBs can be attached to File-Handlers, and the basic
> > interfaces are working.
>
> Which means that essentially there won't be a real kernel user for FLB for
> a while.
> We usually don't merge dead code because some future patchset depends on
> it.

I understand the concern. I would prefer to merge FLB with the rest of
the LUO series; I don't view it as completely dead code since I have
added the in-kernel test that specifically exercises and validates
this API.

> I think it should stay in mm-nonmm-unstable if Andrew does not mind keeping
> it there until the first user is going to land and then FLB will move
> upstream along with that user.

My reasoning for pushing for inclusion now is that there are many
developers who currently depend on the FLB functionality. Having it in
a public tree, preferably upstream, or at least linux-next, would be
highly beneficial for their development and testing.

However, to avoid blocking the entire series, I am going to move the
FLB patch and the in-kernel test patch to be the last two patches in
LUOv7.

This way, the rest of the LUO series can be merged without them if
they are blocked. In that case, however, it would be best if the two
FLB patches stayed in the mm tree to allow VFIO/IOMMU/PCI/HugeTLB
preservation developers to use them, as they all depend on functional
FLB.

Pasha

^ permalink raw reply

* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pratyush Yadav @ 2025-11-18 15:45 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-2-pasha.tatashin@soleen.com>

On Sat, Nov 15 2025, Pasha Tatashin wrote:

> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec). The primary use case is updating hypervisors with minimal
> disruption to running virtual machines. For userspace side of hypervisor
> update we have copyless migration. LUO is for updating the kernel.
>
> This initial patch lays the groundwork for the LUO subsystem.
>
> Further functionality, including the implementation of state transition
> logic, integration with KHO, and hooks for subsystems and file
> descriptors, will be added in subsequent patches.
>
> Create a character device at /dev/liveupdate.
>
> A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> structures. The magic number for IOCTL is registered in
> Documentation/userspace-api/ioctl/ioctl-number.rst.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
> diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> new file mode 100644
> index 000000000000..0e1ab19fa1cd
> --- /dev/null
> +++ b/kernel/liveupdate/luo_core.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: Live Update Orchestrator (LUO)
> + *
> + * Live Update is a specialized, kexec-based reboot process that allows a
> + * running kernel to be updated from one version to another while preserving
> + * the state of selected resources and keeping designated hardware devices
> + * operational. For these devices, DMA activity may continue throughout the
> + * kernel transition.
> + *
> + * While the primary use case driving this work is supporting live updates of
> + * the Linux kernel when it is used as a hypervisor in cloud environments, the
> + * LUO framework itself is designed to be workload-agnostic. Much like Kernel
> + * Live Patching, which applies security fixes regardless of the workload,
> + * Live Update facilitates a full kernel version upgrade for any type of system.

Nit: I think live update is very different from live patching. It has
very different limitations and advantages. In fact, I view live patching
and live update as sitting at opposite ends of the "applying security
patches" spectrum. I think this line is going to mislead or confuse people.

I think it would be better to either spend more lines explaining the
difference between the two, or just drop it from here.

> + *
> + * For example, a non-hypervisor system running an in-memory cache like
> + * memcached with many gigabytes of data can use LUO. The userspace service
> + * can place its cache into a memfd, have its state preserved by LUO, and
> + * restore it immediately after the kernel kexec.
> + *
> + * Whether the system is running virtual machines, containers, a
> + * high-performance database, or networking services, LUO's primary goal is to
> + * enable a full kernel update by preserving critical userspace state and
> + * keeping essential devices operational.
> + *
> + * The core of LUO is a mechanism that tracks the progress of a live update,
> + * along with a callback API that allows other kernel subsystems to participate
> + * in the process. Example subsystems that can hook into LUO include: kvm,
> + * iommu, interrupts, vfio, participating filesystems, and memory management.
> + *
> + * LUO uses Kexec Handover to transfer memory state from the current kernel to
> + * the next kernel. For more details see
> + * Documentation/core-api/kho/concepts.rst.
> + */
> +
[...]
> diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> new file mode 100644
> index 000000000000..44d365185f7c
> --- /dev/null
> +++ b/kernel/liveupdate/luo_ioctl.c
[...]
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Pasha Tatashin");
> +MODULE_DESCRIPTION("Live Update Orchestrator");
> +MODULE_VERSION("0.1");

Nit: do we really need the module version? I don't think LUO can even be
used as a module. What does this number mean then?

Other than these two nitpicks,

Reviewed-by: Pratyush Yadav <pratyush@kernel.org>

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-18 15:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <20251118153631.GB90703@nvidia.com>

> > This won't leak data, as /dev/liveupdate is completely disabled, so
> > nothing preserved in memory will be recoverable.
>
> This seems reasonable, but it is still dangerous.
>
> At the minimum the KHO startup either needs to succeed, panic, or fail
> to online most of the memory (ie run from the safe region only)

Allowing a degraded boot using only scratch memory sounds like a very
good compromise. This allows the live-update boot to stay alive as a
sort of "crash kernel," particularly since kdump functionality is not
available here. However, it would require some work in KHO to enable
such a feature.

> The above approach works better for things like VFIO or memfd where
> you can boot significantly safely. Not sure about iommu though, if
> iommu doesn't deserialize properly then it probably corrupts all
> memory too.

Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
recovery from a broken LUO. The KHO-preserved memory should still stay
preserved but unretrievable, so DMA activity should only happen to those
regions...

Pasha

>
> Jason

^ permalink raw reply

* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-18 16:11 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: jasonmiu, graf, rppt, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <mafs0ecpv4a4q.fsf@kernel.org>

On Tue, Nov 18, 2025 at 10:46 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Sat, Nov 15 2025, Pasha Tatashin wrote:
>
> > Introduce LUO, a mechanism intended to facilitate kernel updates while
> > keeping designated devices operational across the transition (e.g., via
> > kexec). The primary use case is updating hypervisors with minimal
> > disruption to running virtual machines. For userspace side of hypervisor
> > update we have copyless migration. LUO is for updating the kernel.
> >
> > This initial patch lays the groundwork for the LUO subsystem.
> >
> > Further functionality, including the implementation of state transition
> > logic, integration with KHO, and hooks for subsystems and file
> > descriptors, will be added in subsequent patches.
> >
> > Create a character device at /dev/liveupdate.
> >
> > A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> > structures. The magic number for IOCTL is registered in
> > Documentation/userspace-api/ioctl/ioctl-number.rst.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> [...]
> > diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> > new file mode 100644
> > index 000000000000..0e1ab19fa1cd
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_core.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +/**
> > + * DOC: Live Update Orchestrator (LUO)
> > + *
> > + * Live Update is a specialized, kexec-based reboot process that allows a
> > + * running kernel to be updated from one version to another while preserving
> > + * the state of selected resources and keeping designated hardware devices
> > + * operational. For these devices, DMA activity may continue throughout the
> > + * kernel transition.
> > + *
> > + * While the primary use case driving this work is supporting live updates of
> > + * the Linux kernel when it is used as a hypervisor in cloud environments, the
> > + * LUO framework itself is designed to be workload-agnostic. Much like Kernel
> > + * Live Patching, which applies security fixes regardless of the workload,
> > + * Live Update facilitates a full kernel version upgrade for any type of system.
>
> Nit: I think live update is very different from live patching. It has
> very different limitations and advantages. In fact, I view live patching
> and live update on two opposite ends of the "applying security patches"
> spectrum. I think this line is going to mislead or confuse people.
>
> I think it would better to either spend more lines explaining the
> difference between the two, or just drop it from here.

I removed the mention of live patching.

>
> > + *
> > + * For example, a non-hypervisor system running an in-memory cache like
> > + * memcached with many gigabytes of data can use LUO. The userspace service
> > + * can place its cache into a memfd, have its state preserved by LUO, and
> > + * restore it immediately after the kernel kexec.
> > + *
> > + * Whether the system is running virtual machines, containers, a
> > + * high-performance database, or networking services, LUO's primary goal is to
> > + * enable a full kernel update by preserving critical userspace state and
> > + * keeping essential devices operational.
> > + *
> > + * The core of LUO is a mechanism that tracks the progress of a live update,
> > + * along with a callback API that allows other kernel subsystems to participate
> > + * in the process. Example subsystems that can hook into LUO include: kvm,
> > + * iommu, interrupts, vfio, participating filesystems, and memory management.
> > + *
> > + * LUO uses Kexec Handover to transfer memory state from the current kernel to
> > + * the next kernel. For more details see
> > + * Documentation/core-api/kho/concepts.rst.
> > + */
> > +
> [...]
> > diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> > new file mode 100644
> > index 000000000000..44d365185f7c
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_ioctl.c
> [...]
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Pasha Tatashin");
> > +MODULE_DESCRIPTION("Live Update Orchestrator");
> > +MODULE_VERSION("0.1");
>
> Nit: do we really need the module version? I don't think LUO can even be
> used as a module. What does this number mean then?

Removed the above and also removed liveupdate_exit(). Also changed:
module_init(liveupdate_ioctl_init); to late_initcall(liveupdate_ioctl_init);

> Other than these two nitpicks,
>
> Reviewed-by: Pratyush Yadav <pratyush@kernel.org>

Thank you!

Pasha

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 16:15 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <CA+CK2bC6sZe1qYd4=KjqDY-eUb95RBPK-Us+-PZbvkrVsvS5Cw@mail.gmail.com>

On Tue, Nov 18, 2025 at 10:46:35AM -0500, Pasha Tatashin wrote:
> > > This won't leak data, as /dev/liveupdate is completely disabled, so
> > > nothing preserved in memory will be recoverable.
> >
> > This seems reasonable, but it is still dangerous.
> >
> > At the minimum the KHO startup either needs to succeed, panic, or fail
> > to online most of the memory (ie run from the safe region only)
> 
> Allowing degrade booting using only scratch memory sounds like a very
> good compromise. This allows the live-update boot to stay alive as a
> sort of "crash kernel," particularly since kdump functionality is not
> available here. However, it would require some work in KHO to enable
> such a feature.
> 
> > The above approach works better for things like VFIO or memfd where
> > you can boot significantly safely. Not sure about iommu though, if
> > iommu doesn't deserialize properly then it probably corrupts all
> > memory too.
> 
> Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
> broken LUO recovering, the KHO preserved memory should still stay as
> preserved but unretriable, so DMA activity should only happen to those
> regions...

If the iommu is not preserved then normal iommu boot will possibly set
the translation to identity and it will scribble over random memory.

You can't rely on the translation being present and only reaching KHO
preserved memory if the iommu can't restore itself.

Jason

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: Ned Ulbricht @ 2025-11-18 16:33 UTC (permalink / raw)
  To: H. Peter Anvin, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <2846db90-fb05-41d2-b8de-c678af75a04b@zytor.com>

On 11/15/25 16:47, H. Peter Anvin wrote:
> On 2025-11-15 13:29, Ned Ulbricht wrote:
>> |
>> | O_TTY_INIT
>>
>> https://pubs.opengroup.org/onlinepubs/9799919799/
>>
>> That's what motivates my first-glance preference to name this new flag,
>> which will have approximately opposite behavior, as O_TTY_NOINIT.
>>
>> But as a generic abstraction, I more prefer O_KEEP.
>>
> 
> O_KEEP seems a little vague, but O_KEEPCONFIG seems like a decent name.
> 
> It seems like we don't have several new flags:
> 
> 	O_EXEC
> 	O_SEARCH
> 	O_CLOFORK
> 	O_TTY_INIT
> 	O_RSYNC
> 	O_NOCLOBBER
> 
> Some of them *may* be possible to construct with existing Linux options, I'm
> not 100% sure; in particular O_SEARCH might be the same as (O_DIRECTORY|O_PATH).
> 
> O_NOCLOBBER looks like an odd in-between between O_EXCL and
> (O_EXCL|O_NOFOLLOW); stated to be specifically to implement the shell
> "noclobber" semantic.

"(O_EXCL|O_NOFOLLOW)" provokes a thought...

As essential context, fs/open.c build_open_flags() has:

if (flags & O_CREAT) {
	op->intent |= LOOKUP_CREATE;
	if (flags & O_EXCL) {
		op->intent |= LOOKUP_EXCL;
		flags |= O_NOFOLLOW;
	}
}

if (!(flags & O_NOFOLLOW))
	lookup_flags |= LOOKUP_FOLLOW;

So with that context, just imagine hypothetically implementing both a
non-zero O_TTY_INIT flag and an O_KEEPCONFIG flag. What would
build_open_flags() look like to handle the case where userspace
simultaneously asserts both flags?  Even if it's documented, specified
as unspecified behavior, what would the code actually do?

Or, alternatively, should an hypothetical standardization insist that in
any implementation, one of O_TTY_INIT, O_KEEPCONFIG must be #define'd 0?


Ned

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-18 17:31 UTC (permalink / raw)
  To: Ned Ulbricht, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <06279d25-73d6-01f5-dcf8-8667415048d2@netscape.net>

On November 18, 2025 8:33:07 AM PST, Ned Ulbricht <nedu@netscape.net> wrote:
>On 11/15/25 16:47, H. Peter Anvin wrote:
>> On 2025-11-15 13:29, Ned Ulbricht wrote:
>>> |
>>> | O_TTY_INIT
>>> 
>>> https://pubs.opengroup.org/onlinepubs/9799919799/
>>> 
>>> That's what motivates my first-glance preference to name this new flag,
>>> which will have approximately opposite behavior, as O_TTY_NOINIT.
>>> 
>>> But as a generic abstraction, I more prefer O_KEEP.
>>> 
>> 
>> O_KEEP seems a little vague, but O_KEEPCONFIG seems like a decent name.
>> 
>> It seems like we don't have several new flags:
>> 
>> 	O_EXEC
>> 	O_SEARCH
>> 	O_CLOFORK
>> 	O_TTY_INIT
>> 	O_RSYNC
>> 	O_NOCLOBBER
>> 
>> Some of them *may* be possible to construct with existing Linux options, I'm
>> not 100% sure; in particular O_SEARCH might be the same as (O_DIRECTORY|O_PATH).
>> 
>> O_NOCLOBBER looks like an odd in-between between O_EXCL and
>> (O_EXCL|O_NOFOLLOW); stated to be specifically to implement the shell
>> "noclobber" semantic.
>
>"(O_EXCL|O_NOFOLLOW)" provokes a thought...
>
>As essential context, fs/open.c build_open_flags() has:
>
>if (flags & O_CREAT) {
>	op->intent |= LOOKUP_CREATE;
>	if (flags & O_EXCL) {
>		op->intent |= LOOKUP_EXCL;
>		flags |= O_NOFOLLOW;
>	}
>}
>
>if (!(flags & O_NOFOLLOW))
>	lookup_flags |= LOOKUP_FOLLOW;
>
>So with that context, just imagine hypothetically implementing both a
>non-zero O_TTY_INIT flag and an O_KEEPCONFIG flag. What would
>build_open_flags() look like to handle the case where userspace
>simultaneously asserts both flags?  Even if it's documented, specified
>as unspecified behavior, what would the code actually do?
>
>Or, alternatively, should an hypothetical standardization insist that in
>any implementation, one of O_TTY_INIT, O_KEEPCONFIG must be #define'd 0?
>
>
>Ned

It's not actually a contradiction: it means preserve all configuration *except* the minimal termios tweaks required to bring it inside the POSIX compliant envelope, notably setting winsize to "appropriate default parameters."

Linux doesn't have a lot of such settings, but I can see at least one *very useful* one: bringing B19200 and B38400 (EXTA and EXTB), which can be tweaked by setserial, back to their proper baud rates.

There are even two ways to do that: either keep the B19200/B38400 setting and change the baud rate, or keep the baud rate and change termios to match (using BOTHER if necessary; after my changes to glibc 2.42+ BOTHER is a private interface between glibc and the kernel and thus doesn't break POSIX compliance.)

A fairly reasonable implementation would be the first with O_TTY_INIT and the second with O_TTY_INIT | O_KEEPCONFIG.

Flags in termios that probably should be cleared by O_TTY_INIT are CMSPAR, OLCUC, IUCLC, IMAXBEL, ECHOPRT, ECHOKE, FLUSHO, PENDIN, IEXTEN and EXTPROC; I'm not sure about ADDRB, CRTSCTS, IUTF8 or the line discipline. With O_KEEPCONFIG, at least CRTSCTS ought to be kept, I would think, as changing it would have an immediate effect visible to

Obviously an application that wants to absolutely minimize changes must not pass O_TTY_INIT.

The other (and simpler!) option is to simply declare O_KEEPCONFIG | O_TTY_INIT as a reserved bit combination and return -EINVAL until we find a good reason to do anything different.
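
A minimal sketch of that simpler option, for illustration only. The bit
values and the names SKETCH_O_TTY_INIT, SKETCH_O_KEEPCONFIG and
sketch_check_tty_open_flags() are assumptions made up for this example;
they are not existing kernel or libc definitions, and a real check would
presumably live somewhere near build_open_flags().

```c
#include <errno.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_O_TTY_INIT    0x1u
#define SKETCH_O_KEEPCONFIG  0x2u

/*
 * Treat "both flags set" as a reserved combination.
 * Returns 0 if the flags are acceptable, -EINVAL otherwise.
 */
static int sketch_check_tty_open_flags(unsigned int flags)
{
	const unsigned int both = SKETCH_O_TTY_INIT | SKETCH_O_KEEPCONFIG;

	if ((flags & both) == both)
		return -EINVAL;

	return 0;
}
```

Rejecting the combination up front keeps it available for a later
definition, such as the "keep the baud rate, adjust termios" behavior
described above, without breaking callers that already depend on some
other meaning.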

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: David Matlack @ 2025-11-18 17:38 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-7-pasha.tatashin@soleen.com>

On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
> This patch implements the core mechanism for managing preserved
> files throughout the live update lifecycle. It provides the logic to
> invoke the file handler callbacks (preserve, unpreserve, freeze,
> unfreeze, retrieve, and finish) at the appropriate stages.
> 
> During the reboot phase, luo_file_freeze() serializes the final
> metadata for each file (handler compatible string, token, and data
> handle) into a memory region preserved by KHO. In the new kernel,
> luo_file_deserialize() reconstructs the in-memory file list from this
> data, preparing the session for retrieval.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);

Should there be a way to unregister a file handler?

If VFIO is built as a module then I think it would need to be able to
unregister its file handler when the module is unloaded to avoid leaking
pointers to its text in LUO.
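
To make the question concrete, here is a userspace sketch of a paired
register/unregister interface. Everything in it is hypothetical: struct
sketch_file_handler is a stand-in that carries only a compatible string,
and the sketch_* functions are not the LUO API. A kernel version would
also need locking and would likely have to refuse (or defer)
unregistration while files owned by the handler are still preserved.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct liveupdate_file_handler. */
struct sketch_file_handler {
	const char *compatible;
	struct sketch_file_handler *next;
};

/* Head of the singly linked list of registered handlers. */
static struct sketch_file_handler *sketch_handlers;

/* Add a handler; reject a duplicate compatible string. */
static int sketch_register_file_handler(struct sketch_file_handler *h)
{
	struct sketch_file_handler *it;

	for (it = sketch_handlers; it; it = it->next)
		if (strcmp(it->compatible, h->compatible) == 0)
			return -EEXIST;

	h->next = sketch_handlers;
	sketch_handlers = h;
	return 0;
}

/* Unlink a handler so no pointer to it (or its module) remains. */
static int sketch_unregister_file_handler(struct sketch_file_handler *h)
{
	struct sketch_file_handler **pp;

	for (pp = &sketch_handlers; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			h->next = NULL;
			return 0;
		}
	}
	return -ENOENT;
}
```

A module exit path would call the unregister function before its text is
freed, which is the leak described above.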

^ permalink raw reply
