Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Safety of resolving untrusted paths with detached mount dirfd
From: Alyssa Ross @ 2025-11-19 13:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Demi Marie Obenour, Aleksa Sarai, Jann Horn, Eric W. Biederman,
	jlayton, Bruce Fields, Al Viro, Arnd Bergmann, shuah,
	David Howells, Andy Lutomirski, Christian Brauner, Tycho Andersen,
	linux-kernel, linux-api

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

Hello,

As we know, it's not safe to use chroot() for resolving untrusted paths
within some root, as a subdirectory could be moved outside of the
process root while walking the path[1].  On the other hand,
LOOKUP_BENEATH is supposed to be robust against this, and going by [2],
it sounds like resolving with the mount namespace root as dirfd should
also be.

My question is: would resolving an untrusted path against a detached
mount root dirfd opened with OPEN_TREE_CLONE (not necessarily a
filesystem root) also be expected to be robust against traversal issues?
i.e. can I rely on an untrusted path never resolving to a path that
isn't under the mount root?

[1]: https://lore.kernel.org/lkml/CAG48ez30WJhbsro2HOc_DR7V91M+hNFzBP5ogRMZaxbAORvqzg@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/C89D720F-3CC4-4FA9-9CBB-E41A67360A6B@amacapital.net/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply

* Wiadomość z księgowości
From: Marek Poradecki @ 2025-11-19  8:55 UTC (permalink / raw)
  To: linux-api

Dzień dobry,

pomagamy przedsiębiorcom wprowadzić model wymiany walut, który minimalizuje wahania kosztów przy rozliczeniach międzynarodowych.

Kiedyv możemy umówić się na 15-minutową rozmowę, aby zaprezentować, jak taki model mógłby działać w Państwa firmie - z gwarancją indywidualnych kursów i pełnym uproszczeniem płatności? Proszę o propozycję dogodnego terminu.

Pozdrawiam
Marek Poradecki

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-19  3:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <20251118232517.GD120075@nvidia.com>

On Tue, Nov 18, 2025 at 6:25 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Nov 18, 2025 at 05:07:15PM -0500, Pasha Tatashin wrote:
>
> > In this case, we cannot even rely on having "safe" memory, i.e. this
> > scratch only boot to preserve dmesg/core etc, this is unfortunate. Is
> > there a way to avoid defaulting to identify mode when we are booting
> > into the "maintenance" mode?
>
> Maybe one could be created?
>
> It's tricky though because you also really want to block drivers from
> using the iommu if you don't know they are quieted and you can't do
> that without parsing the KHO data, which you can't do because it
> doesn't understand it..
>
> IDK, I think the "maintenance" mode is something that is probably best
> effort and shouldn't be relied on. It will work if the iommu data is
> restored or other lucky conditions hit, so it is not useless, but it
> is certainly not robust or guaranteed.

Right, even kdump has always been best-effort; many types of crashes
do not make it to the crash kernel.

> You are better to squirt a panic message out of the serial port and

For early boot LUO mismatches, or if FLB data is inaccessible for any
reason, devices might go rogue, so triggering a panic during boot is
appropriate.

However, session and file data structures are deserialized later, when
/dev/liveupdate is first opened by userspace. If deserialization fails
at that stage, I think we should simply fail the open(/dev/liveupdate)
call with an error such as -EIO.

Pasha

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 23:25 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <CA+CK2bCguutAdsXETdDSEPCPT_=OQupgyTfGKQuxi924mOfhTQ@mail.gmail.com>

On Tue, Nov 18, 2025 at 05:07:15PM -0500, Pasha Tatashin wrote:

> In this case, we cannot even rely on having "safe" memory, i.e. this
> scratch only boot to preserve dmesg/core etc, this is unfortunate. Is
> there a way to avoid defaulting to identify mode when we are booting
> into the "maintenance" mode?

Maybe one could be created?

It's tricky though because you also really want to block drivers from
using the iommu if you don't know they are quieted and you can't do
that without parsing the KHO data, which you can't do because it
doesn't understand it..

IDK, I think the "maintenance" mode is something that is probably best
effort and shouldn't be relied on. It will work if the iommu data is
restored or other lucky conditions hit, so it is not useless, but it
is certainly not robust or guaranteed.

You are better to squirt a panic message out of the serial port and
hope for the best I guess.

Jason

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-18 22:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <20251118161526.GD90703@nvidia.com>

On Tue, Nov 18, 2025 at 11:15 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Nov 18, 2025 at 10:46:35AM -0500, Pasha Tatashin wrote:
> > > > This won't leak data, as /dev/liveupdate is completely disabled, so
> > > > nothing preserved in memory will be recoverable.
> > >
> > > This seems reasonable, but it is still dangerous.
> > >
> > > At the minimum the KHO startup either needs to succeed, panic, or fail
> > > to online most of the memory (ie run from the safe region only)
> >
> > Allowing degrade booting using only scratch memory sounds like a very
> > good compromise. This allows the live-update boot to stay alive as a
> > sort of "crash kernel," particularly since kdump functionality is not
> > available here. However, it would require some work in KHO to enable
> > such a feature.
> >
> > > The above approach works better for things like VFIO or memfd where
> > > you can boot significantly safely. Not sure about iommu though, if
> > > iommu doesn't deserialize properly then it probably corrupts all
> > > memory too.
> >
> > Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
> > broken LUO recovering, the KHO preserved memory should still stay as
> > preserved but unretriable, so DMA activity should only happen to those
> > regions...
>
> If the iommu is not preserved then normal iommu boot will possibly set
> the translation the identiy and it will scribble over random memory.
>
> You can't rely on the translation being present and only reaching kho
> preserved memroy if the iommu can't restore itself.

In this case, we cannot even rely on having "safe" memory, i.e. this
scratch only boot to preserve dmesg/core etc, this is unfortunate. Is
there a way to avoid defaulting to identify mode when we are booting
into the "maintenance" mode?

Thanks,
Pasha

>
> Jason

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pasha Tatashin @ 2025-11-18 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Pratyush Yadav, David Matlack, jasonmiu, graf, rppt, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, parav, leonro, witu, hughd, skhawaja,
	chrisl
In-Reply-To: <20251118190901.GS10864@nvidia.com>

On Tue, Nov 18, 2025 at 2:09 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Nov 18, 2025 at 12:58:20PM -0500, Pasha Tatashin wrote:
> > I actually had full unregister functionality in v4 and earlier, but I
> > dropped it from this series to minimize the footprint and get the core
> > infrastructure landed first.
>
> I don't think this will make sense, there are enough error paths we
> can't have registers without unregisters to unwind them.

I will add them back in LUOv7.

>
> Jason

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Jason Gunthorpe @ 2025-11-18 19:09 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Pratyush Yadav, David Matlack, jasonmiu, graf, rppt, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, parav, leonro, witu, hughd, skhawaja,
	chrisl
In-Reply-To: <CA+CK2bAqisSdZ7gSBd7=hGd1VbLHX5WXfBazR=rO8BOVCRx3pg@mail.gmail.com>

On Tue, Nov 18, 2025 at 12:58:20PM -0500, Pasha Tatashin wrote:
> I actually had full unregister functionality in v4 and earlier, but I
> dropped it from this series to minimize the footprint and get the core
> infrastructure landed first.

I don't think this will make sense, there are enough error paths we
can't have registers without unregisters to unwind them.

Jason

^ permalink raw reply

* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
From: Pasha Tatashin @ 2025-11-18 18:56 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRxY53gBbeH-6L0Y@kernel.org>

On Tue, Nov 18, 2025 at 6:31 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 02:00:15PM -0500, Pasha Tatashin wrote:
> > > >  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > > > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > > > index df337c9c4f21..9a531096bdb5 100644
> > > > --- a/kernel/liveupdate/luo_file.c
> > > > +++ b/kernel/liveupdate/luo_file.c
> > > > @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > > >       INIT_LIST_HEAD(&fh->flb_list);
> > > >       list_add_tail(&fh->list, &luo_file_handler_list);
> > > >
> > > > +     liveupdate_test_register(fh);
> > > > +
> > >
> > > Why this cannot be called from the test?
> >
> > Because test does not have access to all file_handlers that are being
> > registered with LUO.
>
> Unless I'm missing something, an FLB users registers a file handlers and
> let's LUO know that it will need FLB. Why the test can't do the same?

The test needs to attach to every registered file handler because we
want to ensure that FLB scales and works correctly with any file
handler. For this in-kernel test, there is no need to create our own
file type or to drive it from userspace (where a user would create a
file of that type, preserve it with LUO, so FLB can be allocated and
checked. This in-kernel test is self-sufficient.

> > Pasha
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pratyush Yadav @ 2025-11-18 18:17 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Pratyush Yadav, David Matlack, jasonmiu, graf, rppt, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl
In-Reply-To: <CA+CK2bAqisSdZ7gSBd7=hGd1VbLHX5WXfBazR=rO8BOVCRx3pg@mail.gmail.com>

On Tue, Nov 18 2025, Pasha Tatashin wrote:

> On Tue, Nov 18, 2025 at 12:43 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>>
>> On Tue, Nov 18 2025, David Matlack wrote:
>>
>> > On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
>> >> This patch implements the core mechanism for managing preserved
>> >> files throughout the live update lifecycle. It provides the logic to
>> >> invoke the file handler callbacks (preserve, unpreserve, freeze,
>> >> unfreeze, retrieve, and finish) at the appropriate stages.
>> >>
>> >> During the reboot phase, luo_file_freeze() serializes the final
>> >> metadata for each file (handler compatible string, token, and data
>> >> handle) into a memory region preserved by KHO. In the new kernel,
>> >> luo_file_deserialize() reconstructs the in-memory file list from this
>> >> data, preparing the session for retrieval.
>> >>
>> >> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>> >
>> >> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
>> >
>> > Should there be a way to unregister a file handler?
>> >
>> > If VFIO is built as module then I think it  would need to be able to
>> > unregister its file handler when the module is unloaded to avoid leaking
>> > pointers to its text in LUO.
>
> I actually had full unregister functionality in v4 and earlier, but I
> dropped it from this series to minimize the footprint and get the core
> infrastructure landed first.
>
> For now, safety is guaranteed because
> liveupdate_register_file_handler() and liveupdate_register_flb() take
> a module reference. This effectively pins any module that registers
> with LUO, meaning those driver modules cannot be unloaded or upgraded
> dynamically, they can only be updated via Live Update or full reboot.

What if liveupdate_register_flb() fails? It would need to unregister its
file handler too, since the file handler can't really work without its
FLB. Shouldn't happen in practice, but still LUO clients need a way to
handle this failure.

[...]

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-18 18:05 UTC (permalink / raw)
  To: Ned Ulbricht, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <06279d25-73d6-01f5-dcf8-8667415048d2@netscape.net>

On 2025-11-18 08:33, Ned Ulbricht wrote:
>>
>> O_NOCLOBBER looks like an odd in-between between O_EXCL and
>> (O_EXCL|O_NOFOLLOW); stated to be specifically to implement the shell
>> "noclobber" semantic.
> 
> "(O_EXCL|O_NOFOLLOW)" provokes a thought...
> 
> As essential context, fs/open.c build_open_flags() has:
> 
> if (flags & O_CREAT) {
>     op->intent |= LOOKUP_CREATE;
>     if (flags & O_EXCL) {
>         op->intent |= LOOKUP_EXCL;
>         flags |= O_NOFOLLOW;
>     }
> }
> 
> if (!(flags & O_NOFOLLOW))
>     lookup_flags |= LOOKUP_FOLLOW;
> 

Interesting. As far as O_NOCLOBBER is concerned, that is an "O_EXCL unless the
output is a special file (device node, FIFO, etc)"; presumably to allow the
shell to not flip out when doing, say "foo > /dev/ttyS0" when in noclobber mode.

I had missed the bit in the spec that says that O_CREAT|O_EXCL is required to
imply O_NOFOLLOW (as Linux indeed does as seen above.)

O_NOCLOBBER emulation in user space would seem to be possible with a loop;
first try to open O_CREAT|O_EXCL and if that fails with EEXIST then open
without either; if that succeeds test with fstat() to see if it is a regular
file, and if it is, close it and error. However, it is hardly ideal, and I
might have overlooked some mechanism by which this may fail.

	-hpa

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pasha Tatashin @ 2025-11-18 17:58 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: David Matlack, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <mafs05xb744pb.fsf@kernel.org>

On Tue, Nov 18, 2025 at 12:43 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Tue, Nov 18 2025, David Matlack wrote:
>
> > On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
> >> This patch implements the core mechanism for managing preserved
> >> files throughout the live update lifecycle. It provides the logic to
> >> invoke the file handler callbacks (preserve, unpreserve, freeze,
> >> unfreeze, retrieve, and finish) at the appropriate stages.
> >>
> >> During the reboot phase, luo_file_freeze() serializes the final
> >> metadata for each file (handler compatible string, token, and data
> >> handle) into a memory region preserved by KHO. In the new kernel,
> >> luo_file_deserialize() reconstructs the in-memory file list from this
> >> data, preparing the session for retrieval.
> >>
> >> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> >
> >> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
> >
> > Should there be a way to unregister a file handler?
> >
> > If VFIO is built as module then I think it  would need to be able to
> > unregister its file handler when the module is unloaded to avoid leaking
> > pointers to its text in LUO.

I actually had full unregister functionality in v4 and earlier, but I
dropped it from this series to minimize the footprint and get the core
infrastructure landed first.

For now, safety is guaranteed because
liveupdate_register_file_handler() and liveupdate_register_flb() take
a module reference. This effectively pins any module that registers
with LUO, meaning those driver modules cannot be unloaded or upgraded
dynamically, they can only be updated via Live Update or full reboot.

I plan to introduce unregister support in a future improvements to
relax this constraint. The design I have in mind is:
1. Unregistration will acquire the singleton lock on /dev/liveupdate
to ensure no new sessions can be created during teardown.
2. Verify that there are no incoming/outgoing sessions.
2.  File-Handler can only be unregistered if there are no FLBs
currently registered against it.

Pasha

> Good point. We also need when using FLB. You would first do
> liveupdate_register_file_handler(), and then do
> liveupdate_register_flb(). If the latter fails, you would want to
> unregister the file handler too.
>
> --
> Regards,
> Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pratyush Yadav @ 2025-11-18 17:43 UTC (permalink / raw)
  To: David Matlack
  Cc: Pasha Tatashin, pratyush, jasonmiu, graf, rppt, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRyvG308oNRVzuN7@google.com>

On Tue, Nov 18 2025, David Matlack wrote:

> On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
>> This patch implements the core mechanism for managing preserved
>> files throughout the live update lifecycle. It provides the logic to
>> invoke the file handler callbacks (preserve, unpreserve, freeze,
>> unfreeze, retrieve, and finish) at the appropriate stages.
>> 
>> During the reboot phase, luo_file_freeze() serializes the final
>> metadata for each file (handler compatible string, token, and data
>> handle) into a memory region preserved by KHO. In the new kernel,
>> luo_file_deserialize() reconstructs the in-memory file list from this
>> data, preparing the session for retrieval.
>> 
>> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
>> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
>
> Should there be a way to unregister a file handler?
>
> If VFIO is built as module then I think it  would need to be able to
> unregister its file handler when the module is unloaded to avoid leaking
> pointers to its text in LUO.

Good point. We also need when using FLB. You would first do
liveupdate_register_file_handler(), and then do
liveupdate_register_flb(). If the latter fails, you would want to
unregister the file handler too.

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: David Matlack @ 2025-11-18 17:38 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-7-pasha.tatashin@soleen.com>

On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
> This patch implements the core mechanism for managing preserved
> files throughout the live update lifecycle. It provides the logic to
> invoke the file handler callbacks (preserve, unpreserve, freeze,
> unfreeze, retrieve, and finish) at the appropriate stages.
> 
> During the reboot phase, luo_file_freeze() serializes the final
> metadata for each file (handler compatible string, token, and data
> handle) into a memory region preserved by KHO. In the new kernel,
> luo_file_deserialize() reconstructs the in-memory file list from this
> data, preparing the session for retrieval.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);

Should there be a way to unregister a file handler?

If VFIO is built as module then I think it  would need to be able to
unregister its file handler when the module is unloaded to avoid leaking
pointers to its text in LUO.

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-18 17:31 UTC (permalink / raw)
  To: Ned Ulbricht, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <06279d25-73d6-01f5-dcf8-8667415048d2@netscape.net>

On November 18, 2025 8:33:07 AM PST, Ned Ulbricht <nedu@netscape.net> wrote:
>On 11/15/25 16:47, H. Peter Anvin wrote:
>> On 2025-11-15 13:29, Ned Ulbricht wrote:
>>> |
>>> | O_TTY_INIT
>>> 
>>> https://pubs.opengroup.org/onlinepubs/9799919799/
>>> 
>>> That's what motivates my first-glance preference to name this new flag,
>>> which will have approximately opposite behavior, as O_TTY_NOINIT.
>>> 
>>> But as a generic abstraction, I more prefer O_KEEP.
>>> 
>> 
>> O_KEEP seems a little vague, but O_KEEPCONFIG seems like a decent name.
>> 
>> It seems like we don't have several new flags:
>> 
>> 	O_EXEC
>> 	O_SEARCH
>> 	O_CLOFORK
>> 	O_TTY_INIT
>> 	O_RSYNC
>> 	O_NOCLOBBER
>> 
>> Some of them *may* be possible to construct with existing Linux options, I'm
>> not 100% sure; in particular O_SEARCH might be the same as (O_DIRECTORY|O_PATH).
>> 
>> O_NOCLOBBER looks like an odd in-between between O_EXCL and
>> (O_EXCL|O_NOFOLLOW); stated to be specifically to implement the shell
>> "noclobber" semantic.
>
>"(O_EXCL|O_NOFOLLOW)" provokes a thought...
>
>As essential context, fs/open.c build_open_flags() has:
>
>if (flags & O_CREAT) {
>	op->intent |= LOOKUP_CREATE;
>	if (flags & O_EXCL) {
>		op->intent |= LOOKUP_EXCL;
>		flags |= O_NOFOLLOW;
>	}
>}
>
>if (!(flags & O_NOFOLLOW))
>	lookup_flags |= LOOKUP_FOLLOW;
>
>So with that context, just imagine hypothetically implementing both a
>non-zero O_TTY_INIT flag and an O_KEEPCONFIG flag. What would
>build_open_flags() look like to handle the case where userspace
>simultaneously asserts both flags?  Even if it's documented, specified
>as unspecified behavior, what would the code actually do?
>
>Or, alternatively, should an hypothetical standardization insist that in
>any implementation, one of O_TTY_INIT, O_KEEPCONFIG must be #define'd 0?
>
>
>Ned

It's not actually a contradiction: it means preserve all configuration *except* the minimal termios tweaks required to bring it inside the POSIX compliant envelope, notably setting winsize to "appropriate default parameters."

Linux doesn't have a lot of such settings, but I can see at least one *very useful* one: bringing B19200 and B38400 (EXTA and EXTB), which can be tweaked by setserial, back to their proper baud rates.

There are even two ways to do that: either keep the B19200/B38400 setting and change the baud rate, or keep the baud rate and change termios to match (using BOTHER if necessary; after my changes to glibc 2.42+ BOTHER is a private interface between glibc and the kernel and thus doesn't break POSIX compliance.)

A fairly reasonable implementation would be the first with O_TTY_INIT and the second with O_TTY_INIT | O_KEEPCONFIG.

Flags in termios that probably should be cleared by O_TTY_INIT are CMSPAR, OLCUC, IUCLC, IMAXBEL, ECHOPRT, ECHOKE, FLUSHO, PENDIN, IEXTEN and EXTPROC; I'm not sure about ADDRB, CRTSCTS, IUTF8 or the line discipline; at least with O_KEEPCONFIG at least CRTSCTS ought to be kept I would think, as changing it would have immediate effect visible to 

Obviously an application that wants to absolutely minimize changes must not pass O_TTY_INIT.

The other (and simpler!) option is to simply declare O_KEEPCONFIG | O_TTY_INIT as a reserved bit combination and return -EINVAL until we find a good reason to do anything different.

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: Ned Ulbricht @ 2025-11-18 16:33 UTC (permalink / raw)
  To: H. Peter Anvin, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <2846db90-fb05-41d2-b8de-c678af75a04b@zytor.com>

On 11/15/25 16:47, H. Peter Anvin wrote:
> On 2025-11-15 13:29, Ned Ulbricht wrote:
>> |
>> | O_TTY_INIT
>>
>> https://pubs.opengroup.org/onlinepubs/9799919799/
>>
>> That's what motivates my first-glance preference to name this new flag,
>> which will have approximately opposite behavior, as O_TTY_NOINIT.
>>
>> But as a generic abstraction, I more prefer O_KEEP.
>>
> 
> O_KEEP seems a little vague, but O_KEEPCONFIG seems like a decent name.
> 
> It seems like we don't have several new flags:
> 
> 	O_EXEC
> 	O_SEARCH
> 	O_CLOFORK
> 	O_TTY_INIT
> 	O_RSYNC
> 	O_NOCLOBBER
> 
> Some of them *may* be possible to construct with existing Linux options, I'm
> not 100% sure; in particular O_SEARCH might be the same as (O_DIRECTORY|O_PATH).
> 
> O_NOCLOBBER looks like an odd in-between between O_EXCL and
> (O_EXCL|O_NOFOLLOW); stated to be specifically to implement the shell
> "noclobber" semantic.

"(O_EXCL|O_NOFOLLOW)" provokes a thought...

As essential context, fs/open.c build_open_flags() has:

if (flags & O_CREAT) {
	op->intent |= LOOKUP_CREATE;
	if (flags & O_EXCL) {
		op->intent |= LOOKUP_EXCL;
		flags |= O_NOFOLLOW;
	}
}

if (!(flags & O_NOFOLLOW))
	lookup_flags |= LOOKUP_FOLLOW;

So with that context, just imagine hypothetically implementing both a
non-zero O_TTY_INIT flag and an O_KEEPCONFIG flag. What would
build_open_flags() look like to handle the case where userspace
simultaneously asserts both flags?  Even if it's documented, specified
as unspecified behavior, what would the code actually do?

Or, alternatively, should an hypothetical standardization insist that in
any implementation, one of O_TTY_INIT, O_KEEPCONFIG must be #define'd 0?


Ned

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 16:15 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <CA+CK2bC6sZe1qYd4=KjqDY-eUb95RBPK-Us+-PZbvkrVsvS5Cw@mail.gmail.com>

On Tue, Nov 18, 2025 at 10:46:35AM -0500, Pasha Tatashin wrote:
> > > This won't leak data, as /dev/liveupdate is completely disabled, so
> > > nothing preserved in memory will be recoverable.
> >
> > This seems reasonable, but it is still dangerous.
> >
> > At the minimum the KHO startup either needs to succeed, panic, or fail
> > to online most of the memory (ie run from the safe region only)
> 
> Allowing degrade booting using only scratch memory sounds like a very
> good compromise. This allows the live-update boot to stay alive as a
> sort of "crash kernel," particularly since kdump functionality is not
> available here. However, it would require some work in KHO to enable
> such a feature.
> 
> > The above approach works better for things like VFIO or memfd where
> > you can boot significantly safely. Not sure about iommu though, if
> > iommu doesn't deserialize properly then it probably corrupts all
> > memory too.
> 
> Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
> broken LUO recovering, the KHO preserved memory should still stay as
> preserved but unretriable, so DMA activity should only happen to those
> regions...

If the iommu is not preserved then normal iommu boot will possibly set
the translation the identiy and it will scribble over random memory.

You can't rely on the translation being present and only reaching kho
preserved memroy if the iommu can't restore itself.

Jason

^ permalink raw reply

* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-18 16:11 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: jasonmiu, graf, rppt, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <mafs0ecpv4a4q.fsf@kernel.org>

On Tue, Nov 18, 2025 at 10:46 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Sat, Nov 15 2025, Pasha Tatashin wrote:
>
> > Introduce LUO, a mechanism intended to facilitate kernel updates while
> > keeping designated devices operational across the transition (e.g., via
> > kexec). The primary use case is updating hypervisors with minimal
> > disruption to running virtual machines. For userspace side of hypervisor
> > update we have copyless migration. LUO is for updating the kernel.
> >
> > This initial patch lays the groundwork for the LUO subsystem.
> >
> > Further functionality, including the implementation of state transition
> > logic, integration with KHO, and hooks for subsystems and file
> > descriptors, will be added in subsequent patches.
> >
> > Create a character device at /dev/liveupdate.
> >
> > A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> > structures. The magic number for IOCTL is registered in
> > Documentation/userspace-api/ioctl/ioctl-number.rst.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> [...]
> > diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> > new file mode 100644
> > index 000000000000..0e1ab19fa1cd
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_core.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +/**
> > + * DOC: Live Update Orchestrator (LUO)
> > + *
> > + * Live Update is a specialized, kexec-based reboot process that allows a
> > + * running kernel to be updated from one version to another while preserving
> > + * the state of selected resources and keeping designated hardware devices
> > + * operational. For these devices, DMA activity may continue throughout the
> > + * kernel transition.
> > + *
> > + * While the primary use case driving this work is supporting live updates of
> > + * the Linux kernel when it is used as a hypervisor in cloud environments, the
> > + * LUO framework itself is designed to be workload-agnostic. Much like Kernel
> > + * Live Patching, which applies security fixes regardless of the workload,
> > + * Live Update facilitates a full kernel version upgrade for any type of system.
>
> Nit: I think live update is very different from live patching. It has
> very different limitations and advantages. In fact, I view live patching
> and live update on two opposite ends of the "applying security patches"
> spectrum. I think this line is going to mislead or confuse people.
>
> I think it would better to either spend more lines explaining the
> difference between the two, or just drop it from here.

I removed mentioning live-patching.

>
> > + *
> > + * For example, a non-hypervisor system running an in-memory cache like
> > + * memcached with many gigabytes of data can use LUO. The userspace service
> > + * can place its cache into a memfd, have its state preserved by LUO, and
> > + * restore it immediately after the kernel kexec.
> > + *
> > + * Whether the system is running virtual machines, containers, a
> > + * high-performance database, or networking services, LUO's primary goal is to
> > + * enable a full kernel update by preserving critical userspace state and
> > + * keeping essential devices operational.
> > + *
> > + * The core of LUO is a mechanism that tracks the progress of a live update,
> > + * along with a callback API that allows other kernel subsystems to participate
> > + * in the process. Example subsystems that can hook into LUO include: kvm,
> > + * iommu, interrupts, vfio, participating filesystems, and memory management.
> > + *
> > + * LUO uses Kexec Handover to transfer memory state from the current kernel to
> > + * the next kernel. For more details see
> > + * Documentation/core-api/kho/concepts.rst.
> > + */
> > +
> [...]
> > diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> > new file mode 100644
> > index 000000000000..44d365185f7c
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_ioctl.c
> [...]
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Pasha Tatashin");
> > +MODULE_DESCRIPTION("Live Update Orchestrator");
> > +MODULE_VERSION("0.1");
>
> Nit: do we really need the module version? I don't think LUO can even be
> used as a module. What does this number mean then?

Removed the above and also removed liveupdate_exit(). Also changed:
module_init(liveupdate_ioctl_init); to late_initcall(liveupdate_ioctl_init);

> Other than these two nitpicks,
>
> Reviewed-by: Pratyush Yadav <pratyush@kernel.org>

Thank you!

Pasha

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-18 15:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <20251118153631.GB90703@nvidia.com>

> > This won't leak data, as /dev/liveupdate is completely disabled, so
> > nothing preserved in memory will be recoverable.
>
> This seems reasonable, but it is still dangerous.
>
> At the minimum the KHO startup either needs to succeed, panic, or fail
> to online most of the memory (ie run from the safe region only)

Allowing degrade booting using only scratch memory sounds like a very
good compromise. This allows the live-update boot to stay alive as a
sort of "crash kernel," particularly since kdump functionality is not
available here. However, it would require some work in KHO to enable
such a feature.

> The above approach works better for things like VFIO or memfd where
> you can boot significantly safely. Not sure about iommu though, if
> iommu doesn't deserialize properly then it probably corrupts all
> memory too.

Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
broken LUO recovering, the KHO preserved memory should still stay as
preserved but unretriable, so DMA activity should only happen to those
regions...

Pasha

>
> Jason

^ permalink raw reply

* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pratyush Yadav @ 2025-11-18 15:45 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-2-pasha.tatashin@soleen.com>

On Sat, Nov 15 2025, Pasha Tatashin wrote:

> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec). The primary use case is updating hypervisors with minimal
> disruption to running virtual machines. For userspace side of hypervisor
> update we have copyless migration. LUO is for updating the kernel.
>
> This initial patch lays the groundwork for the LUO subsystem.
>
> Further functionality, including the implementation of state transition
> logic, integration with KHO, and hooks for subsystems and file
> descriptors, will be added in subsequent patches.
>
> Create a character device at /dev/liveupdate.
>
> A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> structures. The magic number for IOCTL is registered in
> Documentation/userspace-api/ioctl/ioctl-number.rst.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
> diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> new file mode 100644
> index 000000000000..0e1ab19fa1cd
> --- /dev/null
> +++ b/kernel/liveupdate/luo_core.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: Live Update Orchestrator (LUO)
> + *
> + * Live Update is a specialized, kexec-based reboot process that allows a
> + * running kernel to be updated from one version to another while preserving
> + * the state of selected resources and keeping designated hardware devices
> + * operational. For these devices, DMA activity may continue throughout the
> + * kernel transition.
> + *
> + * While the primary use case driving this work is supporting live updates of
> + * the Linux kernel when it is used as a hypervisor in cloud environments, the
> + * LUO framework itself is designed to be workload-agnostic. Much like Kernel
> + * Live Patching, which applies security fixes regardless of the workload,
> + * Live Update facilitates a full kernel version upgrade for any type of system.

Nit: I think live update is very different from live patching. It has
very different limitations and advantages. In fact, I view live patching
and live update on two opposite ends of the "applying security patches"
spectrum. I think this line is going to mislead or confuse people.

I think it would better to either spend more lines explaining the
difference between the two, or just drop it from here.

> + *
> + * For example, a non-hypervisor system running an in-memory cache like
> + * memcached with many gigabytes of data can use LUO. The userspace service
> + * can place its cache into a memfd, have its state preserved by LUO, and
> + * restore it immediately after the kernel kexec.
> + *
> + * Whether the system is running virtual machines, containers, a
> + * high-performance database, or networking services, LUO's primary goal is to
> + * enable a full kernel update by preserving critical userspace state and
> + * keeping essential devices operational.
> + *
> + * The core of LUO is a mechanism that tracks the progress of a live update,
> + * along with a callback API that allows other kernel subsystems to participate
> + * in the process. Example subsystems that can hook into LUO include: kvm,
> + * iommu, interrupts, vfio, participating filesystems, and memory management.
> + *
> + * LUO uses Kexec Handover to transfer memory state from the current kernel to
> + * the next kernel. For more details see
> + * Documentation/core-api/kho/concepts.rst.
> + */
> +
[...]
> diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> new file mode 100644
> index 000000000000..44d365185f7c
> --- /dev/null
> +++ b/kernel/liveupdate/luo_ioctl.c
[...]
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Pasha Tatashin");
> +MODULE_DESCRIPTION("Live Update Orchestrator");
> +MODULE_VERSION("0.1");

Nit: do we really need the module version? I don't think LUO can even be
used as a module. What does this number mean then?

Other than these two nitpicks,

Reviewed-by: Pratyush Yadav <pratyush@kernel.org>

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Pasha Tatashin @ 2025-11-18 15:37 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRxYQKrQeP8BzR_2@kernel.org>

On Tue, Nov 18, 2025 at 6:28 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> > >
> > > The concept makes sense to me, but it's hard to review the implementation
> > > without an actual user.
> >
> > There are three users: we will have HugeTLB support that is going to
> > be posted as RFC in a few weeks. Also, in two weeks we are going to
> > have an updated VFIO and IOMMU series posted both using FLBs. In the
> > mean time, this series provides an FLB in-kernel test that verifies
> > that multiple FLBs can be attached to File-Handlers, and the basic
> > interfaces are working.
>
> Which means that essentially there won't be a real kernel user for FLB for
> a while.
> We usually don't merge dead code because some future patchset depends on
> it.

I understand the concern. I would prefer to merge FLB with the rest of
the LUO series; I don't view it as completely dead code since I have
added the in-kernel test that specifically exercises and validates
this API.

> I think it should stay in mm-nonmm-unstable if Andrew does not mind keeping
> it there until the first user is going to land and then FLB will move
> upstream along with that user.

My reasoning for pushing for inclusion now is that there are many
developers who currently depend on the FLB functionality. Having it in
a public tree, preferably upstream, or at least linux-next, would be
highly beneficial for their development and testing.

However, to avoid blocking the entire series, I am going to move the
FLB patch and the in-kernel test patch to be the last two patches in
LUOv7.

This way, the rest of the LUO series can be merged without them if
they are blocked, however, in this case it would be best if the two
FLB patches stayed in mm tree to allow VFIO/IOMMU/PCI/HugeTLB
preservation developers to use them, as they all depend on functional
FLB.

Pasha

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 15:36 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <CA+CK2bBFtG3LWmCtLs-5vfS8FYm_r24v=jJra9gOGPKKcs=55g@mail.gmail.com>

On Tue, Nov 18, 2025 at 10:18:28AM -0500, Pasha Tatashin wrote:
> On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> > > On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > > > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > > > You can avoid that complexity if you register the device with a different
> > > > > > fops, but that's technicality.
> > > > > >
> > > > > > Your point about treating the incoming FDT as an underlying resource that
> > > > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > > > on.
> > > > >
> > > > > I see two solutions:
> > > > >
> > > > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > > > happened (in reality in fleets version mismatches should not be
> > > > > happening, those should be detected in quals).
> > > > > 2. Create a zombie device to return some errno on open, and still
> > > > > study dmesg to understand what really happened.
> > > >
> > > > User should not study dmesg. We need another solution.
> > > > What's wrong with e.g. ioctl()?
> > >
> > > It seems very dangerous to even boot at all if the next kernel doesn't
> > > understand the serialization information..
> > >
> > > IMHO I think we should not even be thinking about this, it is up to
> > > the predecessor environment to prevent it from happening. The ideas to
> > > use ELF metadata/etc to allow a pre-flight validation are the right
> > > solution.
> 
> 100% agreed, this is the goal.
> 
> > > If we get into the next kernel and it receives information it cannot
> > > process it should just BUG_ON and die, or some broad equivalent.
> 
> I initially had a panic() that would kill the kernel, but after
> further consideration, I realized that we can still boot into
> "maintenance" mode and allow the user to decide when and how to reboot
> the machine back to a normal state.
 
> This won't leak data, as /dev/liveupdate is completely disabled, so
> nothing preserved in memory will be recoverable.

This seems reasonable, but it is still dangerous.

At the minimum the KHO startup either needs to succeed, panic, or fail
to online most of the memory (ie run from the safe region only)

The above approach works better for things like VFIO or memfd where
you can boot significantly safely. Not sure about iommu though, if
iommu doesn't deserialize properly then it probably corrupts all
memory too.

Jason

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-18 15:18 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Jason Gunthorpe, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <aRyLbB8yoQwUJ3dh@kernel.org>

On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > > You can avoid that complexity if you register the device with a different
> > > > > fops, but that's technicality.
> > > > >
> > > > > Your point about treating the incoming FDT as an underlying resource that
> > > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > > on.
> > > >
> > > > I see two solutions:
> > > >
> > > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > > happened (in reality in fleets version mismatches should not be
> > > > happening, those should be detected in quals).
> > > > 2. Create a zombie device to return some errno on open, and still
> > > > study dmesg to understand what really happened.
> > >
> > > User should not study dmesg. We need another solution.
> > > What's wrong with e.g. ioctl()?
> >
> > It seems very dangerous to even boot at all if the next kernel doesn't
> > understand the serialization information..
> >
> > IMHO I think we should not even be thinking about this, it is up to
> > the predecessor environment to prevent it from happening. The ideas to
> > use ELF metadata/etc to allow a pre-flight validation are the right
> > solution.

100% agreed, this is the goal.

> > If we get into the next kernel and it receives information it cannot
> > process it should just BUG_ON and die, or some broad equivalent.

I initially had a panic() that would kill the kernel, but after
further consideration, I realized that we can still boot into
"maintenance" mode and allow the user to decide when and how to reboot
the machine back to a normal state.

Crashing during early boot has its own disadvantages: the crash kernel
is not available. Also, because live-update has to be very fast, the
console is likely to be disabled. Therefore, getting to userspace and
allowing the user to investigate what happened (e.g., automatically
retrieving dmesg or a core dump and filing a bug) before rebooting
seems like the most sensible approach.

This won't leak data, as /dev/liveupdate is completely disabled, so
nothing preserved in memory will be recoverable.

Pasha

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-18 15:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Pasha Tatashin, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <20251118140300.GK10864@nvidia.com>

On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > You can avoid that complexity if you register the device with a different
> > > > fops, but that's technicality.
> > > >
> > > > Your point about treating the incoming FDT as an underlying resource that
> > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > on.
> > > 
> > > I see two solutions:
> > > 
> > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > happened (in reality in fleets version mismatches should not be
> > > happening, those should be detected in quals).
> > > 2. Create a zombie device to return some errno on open, and still
> > > study dmesg to understand what really happened.
> > 
> > User should not study dmesg. We need another solution.
> > What's wrong with e.g. ioctl()?
> 
> It seems very dangerous to even boot at all if the next kernel doesn't
> understand the serialization information..
> 
> IMHO I think we should not even be thinking about this, it is up to
> the predecessor environment to prevent it from happening. The ideas to
> use ELF metadata/etc to allow a pre-flight validation are the right
> solution.
> 
> If we get into the next kernel and it receives information it cannot
> process it should just BUG_ON and die, or some broad equivalent. 
> It is a catastrophic orchestration error, and we don't need some fine
> grain recovery or userspace visibility. Crash dump the system and
> reboot it.

I was under impression Pasha wanted to get up to the userspace no matter
what.

panic() in liveupdate_early_init() makes perfect sense to me. Parsing dmesg
does not.
 
> IOW, I would not invest time in this.
> 
> Jason

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH 1/2] man/man7/ip.7: Clarify PKTINFO's semantics depending on packet direction
From: Alejandro Colomar @ 2025-11-18 14:31 UTC (permalink / raw)
  To: Jakub Głogowski; +Cc: linux-man, LKML, Linux API, ej
In-Reply-To: <fb3980b64d1c827ad59726bb30761d735396e109.1763130571.git.not@dzwdz.net>

[-- Attachment #1: Type: text/plain, Size: 4944 bytes --]

Hi Jakub,

On Fri, Nov 14, 2025 at 03:29:30PM +0100, Jakub Głogowski wrote:
> For recvmsg(2), ipi_spec_dst is set by ipv4_pktinfo_prepare() to the
> result of fib_compute_sec_dst().  The latter was introduced in
> 	linux.git 35ebf65e851c6d97 ("ipv4: Create and use fib_compute_spec_dst() helper.").
> 
> Quoting its commit message:
> 
> > The specific destination is the host we direct unicast replies to.
> > Usually this is the original packet source address, but if we are
> > responding to a multicast or broadcast packet we have to use something
> > different.
> >
> > Specifically we must use the source address we would use if we were to
> > send a packet to the unicast source of the original packet.
> 
> Experimentation seems to confirm that behavior.
> 
> As for the note about ipi_spec_dst being on a different interface:
> - For unicast packets (for which ipi_spec_dst is the original
>   destination address), I believe this is trivially true because Linux
>   uses the weak host model (unless there's some interaction with
>   RTCF_LOCAL that I'm missing).
> - For multicast/broadcast packets, fib_compute_sec_dst() only passes the
>   original interface to the lookup in the context of L3M.  In
>   particular, the original implementation (cited above) set iif and oof
>   to 0. Also, citing
> 	linux.git e7372197e15856ec ("net/ipv4: Set oif in fib_compute_spec_dst"),
>   > If the device is not enslaved, oif is still 0 so no affect.
> 
> It doesn't seem like using an address specifically from the interface
> the packet was received on was ever the intention.  I've also confirmed
> this behavior (sending a multicast packet from another machine, whose IP
> I've routed to a dummy interface).
> 
> I'm focusing on this because that's a misconception I've had before
> digging into the code - the sendmsg behavior explained in the same
> paragraph made me think ipi_spec_dst was the (primary?) address of
> ipi_ifindex.  I think this is worth clarifying.
> 
> I've made it explicit that ipi_addr isn't used by sendmsg because that's
> another possible misconception.
> 
> The (first) extra comma in sendmsg's ipi_spec_dst's description is meant
> to emphasize that it's used as the local source address _and_ for the
> routing table lookup, as opposed to just affecting the routing table
> lookup.
> Stylistically it might be a bit weird but idk how to convey this better.
> 
> Apart from the cited commits I was referencing the linux-6.17.7 tarball.
> 
> __fib_validate_source (and the comment near it) might also be of
> interest to people trying to figure out what "specific destinations"
> are, exactly.
> 
> Signed-off-by: Jakub Głogowski <not@dzwdz.net>

Thanks!  I've applied the patch.  I've added CC tags (please, copy those
yourself in future patches).

I've also s/PKTINFO/IP_PKTINFO/ in the subject.

And I've applied minor wording and source improvements.

I've pushed here:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=b8f472450f6607e2d5bd68a1b60615a91ed3d111>
(use port 80).

> ---
>  man/man7/ip.7 | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/man/man7/ip.7 b/man/man7/ip.7
> index a92939cd0..a7f118b42 100644
> --- a/man/man7/ip.7
> +++ b/man/man7/ip.7
> @@ -809,12 +809,20 @@ .SS Socket options
>  .EE
>  .in
>  .IP
> +When returned by
> +.BR recvmsg (2) ,
>  .I ipi_ifindex
>  is the unique index of the interface the packet was received on.
>  .I ipi_spec_dst
> -is the local address of the packet and
> +is the preferred source address for replies to the given packet, and
>  .I ipi_addr
>  is the destination address in the packet header.
> +These addresses are usually the same,
> +but can differ for broadcast or multicast packets.
> +Note that, depending on the configured routes,

I've removed 'Note that,'.  It's redundant.  Everything in a manual page
should be noteworthy.


Have a lovely day!
Alex

> +.I ipi_spec_dst
> +might belong to a different interface from the one that received the packet.
> +.IP
>  If
>  .B IP_PKTINFO
>  is passed to
> @@ -822,14 +830,16 @@ .SS Socket options
>  and
>  .\" This field is grossly misnamed
>  .I ipi_spec_dst
> -is not zero, then it is used as the local source address for the routing
> -table lookup and for setting up IP source route options.
> +is not zero, then it is used as the local source address, for the routing
> +table lookup, and for setting up IP source route options.
>  When
>  .I ipi_ifindex
>  is not zero, the primary local address of the interface specified by the
>  index overwrites
>  .I ipi_spec_dst
>  for the routing table lookup.
> +.I ipi_addr
> +is ignored.
>  .IP
>  Not supported for
>  .B SOCK_STREAM
> -- 
> 2.47.3
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe @ 2025-11-18 14:03 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Pasha Tatashin, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
	skhawaja, chrisl
In-Reply-To: <aRxWvsdv1dQz8oZ4@kernel.org>

On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > You can avoid that complexity if you register the device with a different
> > > fops, but that's technicality.
> > >
> > > Your point about treating the incoming FDT as an underlying resource that
> > > failed to initialize makes sense, but nevertheless userspace needs a
> > > reliable way to detect it and parsing dmesg is not something we should rely
> > > on.
> > 
> > I see two solutions:
> > 
> > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > not finding /dev/liveupdate, and studying the dmesg for what has
> > happened (in reality in fleets version mismatches should not be
> > happening, those should be detected in quals).
> > 2. Create a zombie device to return some errno on open, and still
> > study dmesg to understand what really happened.
> 
> User should not study dmesg. We need another solution.
> What's wrong with e.g. ioctl()?

It seems very dangerous to even boot at all if the next kernel doesn't
understand the serialization information..

IMHO I think we should not even be thinking about this, it is up to
the predecessor environment to prevent it from happening. The ideas to
use ELF metadata/etc to allow a pre-flight validation are the right
solution.

If we get into the next kernel and it receives information it cannot
process it should just BUG_ON and die, or some broad equivalent. 
It is a catastrophic orchestration error, and we don't need some fine
grain recovery or userspace visibility. Crash dump the system and
reboot it.

IOW, I would not invest time in this.

Jason

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox