Linux userland API discussions

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-12 15:14 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRSMsz4zy8QBbsIH@kernel.org>

> > FLB global objects act similarly to subsystem-wide data, except their
> > data has a clear creation and destruction time tied to preserved
> > files. When the first file of a particular type is added to LUO, this
> > global data is created; when the last file of that type is removed
> > (unpreserved or finished), this global data is destroyed. This is why
> > its lifetime is bound to the file lifecycle. Crucially, this global data is
> > accessible at any time while LUO owns the associated files spanning
> > the early boot update boundary.
>
> But there are no files at mm_core_init(). I'm really confused here.

This isn't about the files themselves, but about the subsystem global
data. The files are only used to describe the lifetime of this global
data.

I think mm_core_init() is too late, and the call would need to be
moved earlier to work correctly with subsystems. At the very least, we
will have to add some early FDT parsing to retrieve data during early
boot, but that would be part of the HugeTLB preservation work.

I can move liveupdate_init() inside kho_memory_init(), so we don't
need to modify mm_core_init(). Alternatively, I can rename
kho_memory_init() to kho_and_liveupdate_memory_init() and combine the
two calls into a single function in kexec_handover.c.

> > > So I think for now we can move liveupdate_init() later in boot and we will
> > > solve the problem of hugetlb reservations when we add support for hugetlb.
> >
> > HugeTLB reserves memory early in boot. If we already have preserved
> > HugeTLB pages via LUO/KHO, we must ensure they are counted against the
> > boot-time reservation. For example, if hugetlb_cma_reserve() needs to
> > reserve ten 1G pages, but LUO has already preserved seven, we only
> > need to reserve three new pages and the rest are going to be restored
> > with the files.
> >
> > Since this count is contained in the FLB global object, that data
> > needs to be available during the early reservation phase. (Pratyush is
> > working on HugeTLB preservation and can explain further).
>
> Not sure I really follow the design here, but in my understanding the gist
> here is that hugetlb reservations need to be aware of the preserved state.
> If that's the case, we definitely can move liveupdate_init() to an initcall
> and revisit this when hugetlb support for luo comes along.

This will break the in-kernel tests that ensure FLB data is accessible
and works correctly during early boot, as they use
early_initcall(liveupdate_test_early_init).

We cannot rely on early_initcall() for liveupdate_init() because it
would then run at the same initcall level as the test, and their
relative order would depend on link order. We also can't move the test
to a later initcall, as that would break the verification of what FLB
promises: early access to global data by subsystems that need it
(PCI, IOMMU Core, HugeTLB, etc.).

Thanks,
Pasha

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-12 14:58 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRSKrxfAb_GG_2Mw@kernel.org>

On Wed, Nov 12, 2025 at 8:25 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> Hi Pasha,
>
> On Tue, Nov 11, 2025 at 03:57:39PM -0500, Pasha Tatashin wrote:
> > Hi Mike,
> >
> > Thank you for the review, my comments below:
> >
> > > > This is why this call is placed first in reboot(), before any
> > > > irreversible reboot notifiers or shutdown callbacks are performed. If
> > > > an allocation problem occurs in KHO, the error is simply reported back
> > > > to userspace, and the live update is safely aborted.
>
> The call to liveupdate_reboot() is just before kernel_kexec(). Why don't we
> move it there?

Yes, I can move that call into kernel_kexec().

> And if kho_finalize() fails, all liveupdate_reboot() does is massage
> the error value before returning it to userspace. Why can't kernel_kexec()
> do the same?

We could do that. It would look something like this:

if (liveupdate_enabled())
   kho_finalize();

This is because we want to call kho_finalize() from kernel_kexec() only
when doing a live update.

> > > This is fine. But what I don't like is that we can't use kho without
> > > liveupdate. We are making debugfs optional, we have a way to call

This is exactly the fix I proposed:

1. When live update is enabled, always disable the "finalize" debugfs API.
2. When live update is disabled, always enable the "finalize" debugfs API.

Once KHO is stateless, the "finalize" debugfs API will be removed, and
KHO debugfs itself can become optional.

> > Yes you can: you can disable liveupdate (i.e. not supply liveupdate=1
> > via kernel parameter) and use KHO the old way: drive it from the
> > userspace. However, if liveupdate is enabled, liveupdate becomes the
> > driver of KHO as unfortunately KHO has these weird states at the
> > moment.
>
> The "weird state" is the point where KHO builds its FDT. Replacing the
> current memory tracker with one that does not require serialization won't
> change it. We still need a way to tell KHO that "there won't be new nodes
> in FDT, pack it".
>

See my answer below.

> > > kho_finalize() on the reboot path and it does not seem an issue to do it
> > > even without liveupdate. But then we force kho_finalize() into
> > > liveupdate_reboot() allowing weird configurations where kho is there but
> > > it's unusable.
> >
> > What do you mean KHO is there but unusable? We should not have such a state.
>
> If you compile a kernel with KEXEC_HANDOVER=y, KEXEC_HANDOVER_DEBUGFS=n and
> LIVEUPDATE=n and boot with kho=1 there is nothing to trigger
> kho_finalize().
>
> > > What I'd like to see is that we can finalize KHO on kexec reboot path even
> > > when liveupdate is not compiled and until then the patch that makes KHO
> > > debugfs optional should not go further IMO.
> > >
> > > Another thing I didn't check in this series yet is how finalization driven
> > > from debugfs interacts with liveupdate internal handling?
> >
> > I think what we can do is the following:
> > - Remove "Kconfig: make debugfs optional" from this series, and
> > instead make that change as part of stateless KHO work.
> > - This will ensure that when liveupdate=0, KHO finalize is always fully
> > supported the old way.
> > - When liveupdate=1, always disable the KHO debugfs "finalize" API, and
> > allow liveupdate to drive it automatically. It would add another
> > liveupdate_enabled() check to KHO, which is going to be removed as part
> > of stateless KHO work.
>
> KHO should not call into liveupdate. That's a layering violation.
> And "stateless KHO" does not really make it stateless, it only removes the
> memory serialization from kho_finalize(), but it's still required to pack
> the FDT.

This touches on a point I've raised in the KHO sync meetings: to be
effective, the "stateless KHO" work must also make subtree add/remove
stateless. There should not be a separate "finalize" state just to
finish the FDT. The KHO FDT is tiny (only one page), and there are
only a handful of subtrees. Adding and removing subtrees is cheap; we
should be able to open the FDT, modify it, and finish it on every
operation. There's no need for a special finalization state at kexec
time. KHO should be totally stateless.

> I think we should allow kho finalization in some form from kernel_kexec().

If we achieve that, we wouldn't need a kho_finalize() call from
kernel_kexec() at all. All KHO operations should be allowed at any
time once KHO is initialized, and they shouldn't depend on the machine
state. So, even late in shutdown or early in boot, it should be
possible to preserve KHO memory or a subtree. I'm not saying it's a
good idea to do that late in shutdown (as preservation may fail), but
that should be the caller's problem.

Thanks,
Pasha

* Re: [PATCH RFC 3/4] io-128-nonatomic: introduce io{read|write}128_{lo_hi|hi_lo}
From: Ben Dooks @ 2025-11-12 14:48 UTC
  To: Chenghai Huang, arnd, catalin.marinas, will, akpm,
	anshuman.khandual, ryan.roberts, andriy.shevchenko, herbert,
	linux-kernel, linux-arch, linux-arm-kernel, linux-crypto,
	linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <20251112015846.1842207-4-huangchenghai2@huawei.com>

On 12/11/2025 01:58, Chenghai Huang wrote:
> From: Weili Qian <qianweili@huawei.com>
> 
> To provide non-atomic versions of io{read|write}128, we define a
> number of variants of these functions in the generic iomap that
> perform non-atomic operations.
> 
> These functions are only defined if io{read|write}128 are defined.
> If they are not, then the wrappers that always use non-atomic operations
> from include/linux/io-128-nonatomic*.h will be used.
> 
> Signed-off-by: Weili Qian <qianweili@huawei.com>
> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
> ---
>   include/linux/io-128-nonatomic-hi-lo.h | 35 ++++++++++++++++++++++++++
>   include/linux/io-128-nonatomic-lo-hi.h | 34 +++++++++++++++++++++++++
>   2 files changed, 69 insertions(+)
>   create mode 100644 include/linux/io-128-nonatomic-hi-lo.h
>   create mode 100644 include/linux/io-128-nonatomic-lo-hi.h
> 
> diff --git a/include/linux/io-128-nonatomic-hi-lo.h b/include/linux/io-128-nonatomic-hi-lo.h
> new file mode 100644
> index 000000000000..b5b083a9e81b
> --- /dev/null
> +++ b/include/linux/io-128-nonatomic-hi-lo.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_IO_128_NONATOMIC_HI_LO_H_
> +#define _LINUX_IO_128_NONATOMIC_HI_LO_H_
> +
> +#include <linux/io.h>
> +#include <asm-generic/int-ll64.h>
> +
> +static inline u128 ioread128_hi_lo(const void __iomem *addr)
> +{
> +	u32 low, high;

did you mean u64 here?

> +	high = ioread64(addr + sizeof(u64));
> +	low = ioread64(addr);
> +
> +	return low + ((u128)high << 64);
> +}
> +
> +static inline void iowrite128_hi_lo(u128 val, void __iomem *addr)
> +{
> +	iowrite64(val >> 64, addr + sizeof(u64));
> +	iowrite64(val, addr);
> +}
> +
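For comparison, a userspace sketch of the hi_lo read with both halves declared as 64-bit, as the question above suggests. It assumes a compiler that provides `unsigned __int128` (GCC or Clang on a 64-bit target), and `fake_ioread64()` is a hypothetical stand-in for ioread64() that loads from ordinary memory; it models only the value assembly, not MMIO ordering or atomicity.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;   /* compiler extension, like the kernel's u128 */

/* Hypothetical stand-in for ioread64(): a plain 64-bit load. */
static uint64_t fake_ioread64(const void *addr)
{
	uint64_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

/* Read the upper 64 bits first, then the lower 64 bits. */
static u128 ioread128_hi_lo_sketch(const void *addr)
{
	/* Both halves must be 64-bit: with u32, bits 32..63 of each half are lost. */
	uint64_t low, high;

	high = fake_ioread64((const char *)addr + sizeof(uint64_t));
	low = fake_ioread64(addr);

	return (u128)low | ((u128)high << 64);
}
```

With `u32` locals the code still compiles, which is why the truncation is easy to miss: only the low 32 bits of each 64-bit half survive into the result.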

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

* Re: [PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support
From: Mark Rutland @ 2025-11-12 14:17 UTC
  To: David Laight
  Cc: Chenghai Huang, arnd, catalin.marinas, will, akpm,
	anshuman.khandual, ryan.roberts, andriy.shevchenko, herbert,
	linux-kernel, linux-arch, linux-arm-kernel, linux-crypto,
	linux-api, fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <20251112140157.24ff4f2e@pumpkin>

On Wed, Nov 12, 2025 at 02:01:57PM +0000, David Laight wrote:
> On Wed, 12 Nov 2025 12:28:01 +0000
> Mark Rutland <mark.rutland@arm.com> wrote:
> 
> > On Wed, Nov 12, 2025 at 09:58:46AM +0800, Chenghai Huang wrote:
> > > From: Weili Qian <qianweili@huawei.com>
> > > 
> > > Starting from ARMv8.4, stp and ldp instructions become atomic.  
> > 
> > That's not true for accesses to Device memory types.
> > 
> > Per ARM DDI 0487, L.b, section B2.2.1.1 ("Changes to single-copy atomicity in
> > Armv8.4"):
> > 
> >   If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load
> >   or store two 64-bit registers are single-copy atomic when all of the
> >   following conditions are true:
> >   • The overall memory access is aligned to 16 bytes.
> >   • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
> > 
> > IIUC when used for Device memory types, those can be split, and a part
> > of the access could be replayed multiple times (e.g. due to an
> > interrupt).
> 
> That can't be right.

For better or worse, the architecture permits this, and I understand
that there are implementations on which this can happen.

> IO accesses can reference hardware FIFOs, so they must only happen once.

This has nothing to do with the endpoint, and so any FIFO in the
endpoint is immaterial.

I agree that we want to ensure that the accesses only happen once, which
is why I have raised that it is unsound to use LDP/LDNP/STP in this way.

> (Or is 'Device memory' something different from 'Device register'?)

I specifically said "Device memory type", which is an attribute that the
MMU associates with a VA, and determines how the MMU (and memory system
as a whole) treats accesses to that VA.

You can find the architecture documentation I referenced at:

  https://developer.arm.com/documentation/ddi0487/lb/

> I'm also not sure that the bus cycles could get split by an interrupt;
> that would require a mid-instruction interrupt, which is very unlikely.

There are various reasons why an implementation might split the accesses
made by a single instruction, and why an interrupt (or other event)
might occur between accesses and cause a replay of some of the
constituent accesses. This has nothing to do with splitting bus cycles.

Mark.

* Re: [PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support
From: David Laight @ 2025-11-12 14:01 UTC
  To: Mark Rutland
  Cc: Chenghai Huang, arnd, catalin.marinas, will, akpm,
	anshuman.khandual, ryan.roberts, andriy.shevchenko, herbert,
	linux-kernel, linux-arch, linux-arm-kernel, linux-crypto,
	linux-api, fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <aRR9UesvUCFLdVoW@J2N7QTR9R3>

On Wed, 12 Nov 2025 12:28:01 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Wed, Nov 12, 2025 at 09:58:46AM +0800, Chenghai Huang wrote:
> > From: Weili Qian <qianweili@huawei.com>
> > 
> > Starting from ARMv8.4, stp and ldp instructions become atomic.  
> 
> That's not true for accesses to Device memory types.
> 
> Per ARM DDI 0487, L.b, section B2.2.1.1 ("Changes to single-copy atomicity in
> Armv8.4"):
> 
>   If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load
>   or store two 64-bit registers are single-copy atomic when all of the
>   following conditions are true:
>   • The overall memory access is aligned to 16 bytes.
>   • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
> 
> IIUC when used for Device memory types, those can be split, and a part
> of the access could be replayed multiple times (e.g. due to an
> interrupt).

That can't be right.
IO accesses can reference hardware FIFOs, so they must only happen once.
(Or is 'Device memory' something different from 'Device register'?)
I'm also not sure that the bus cycles could get split by an interrupt;
that would require a mid-instruction interrupt, which is very unlikely.
Interleaving is most likely to come from another CPU.

A more interesting question is whether the instructions generate a single
PCIe TLP (perhaps only most of the time).
PCIe reads are high latency, so anything that increases the
size of the TLP improves PIO throughput massively.

	David

> 
> I don't think we can add this generally. It is not atomic, and not
> generally safe.
> 
> Mark.
...

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-12 13:33 UTC
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bB8731z-EKv2K8-x5SH8rjOTTuWkfkrc4Qj6skW+Kr7-g@mail.gmail.com>

On Wed, Nov 12, 2025 at 07:46:23AM -0500, Pasha Tatashin wrote:
> On Wed, Nov 12, 2025 at 5:21 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Nov 11, 2025 at 03:42:24PM -0500, Pasha Tatashin wrote:
> > > On Tue, Nov 11, 2025 at 3:39 PM Pasha Tatashin
> > > <pasha.tatashin@soleen.com> wrote:
> > > >
> > > > > >       kho_memory_init();
> > > > > >
> > > > > > +     /* Live Update should follow right after KHO is initialized */
> > > > > > +     liveupdate_init();
> > > > > > +
> > > > >
> > > > > Why do you think it should be immediately after kho_memory_init()?
> > > > > Any reason this can't be called from start_kernel() or even later as an
> > > > > early_initcall() or core_initcall()?
> > > >
> > > > Unfortunately, no, even here it is too late, and we might need to find
> > > > a way to move the kho_init/liveupdate_init earlier. We must be able to
> > > > preserve HugeTLB pages, and those are reserved earlier in boot.
> > >
> > > Just to clarify: liveupdate_init() is needed to start using:
> > > liveupdate_flb_incoming_* API, and FLB data is needed during HugeTLB
> > > reservation.
> >
> > Since flb is "file-lifecycle-bound", it implies *file*. Early memory
> > reservations in hugetlb are not bound to files, they end up in file objects
> > way later.
> 
> FLB global objects act similarly to subsystem-wide data, except their
> data has a clear creation and destruction time tied to preserved
> files. When the first file of a particular type is added to LUO, this
> global data is created; when the last file of that type is removed
> (unpreserved or finished), this global data is destroyed. This is why
> its lifetime is bound to the file lifecycle. Crucially, this global data is
> accessible at any time while LUO owns the associated files spanning
> the early boot update boundary.

But there are no files at mm_core_init(). I'm really confused here.
 
> > So I think for now we can move liveupdate_init() later in boot and we will
> > solve the problem of hugetlb reservations when we add support for hugetlb.
> 
> HugeTLB reserves memory early in boot. If we already have preserved
> HugeTLB pages via LUO/KHO, we must ensure they are counted against the
> boot-time reservation. For example, if hugetlb_cma_reserve() needs to
> reserve ten 1G pages, but LUO has already preserved seven, we only
> need to reserve three new pages and the rest are going to be restored
> with the files.
> 
> Since this count is contained in the FLB global object, that data
> needs to be available during the early reservation phase. (Pratyush is
> working on HugeTLB preservation and can explain further).

Not sure I really follow the design here, but in my understanding the gist
here is that hugetlb reservations need to be aware of the preserved state.
If that's the case, we definitely can move liveupdate_init() to an initcall
and revisit this when hugetlb support for luo comes along.
 
> Pasha
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-12 13:25 UTC
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bAsrEqpt9d3s0KXpjcO9WPTJjymdwtiiyWVS6uq5KKNgA@mail.gmail.com>

Hi Pasha,

On Tue, Nov 11, 2025 at 03:57:39PM -0500, Pasha Tatashin wrote:
> Hi Mike,
> 
> Thank you for the review, my comments below:
> 
> > > This is why this call is placed first in reboot(), before any
> > > irreversible reboot notifiers or shutdown callbacks are performed. If
> > > an allocation problem occurs in KHO, the error is simply reported back
> > > to userspace, and the live update is safely aborted.

The call to liveupdate_reboot() is just before kernel_kexec(). Why don't we
move it there?

And if kho_finalize() fails, all liveupdate_reboot() does is massage
the error value before returning it to userspace. Why can't kernel_kexec()
do the same?

> > This is fine. But what I don't like is that we can't use kho without
> > liveupdate. We are making debugfs optional, we have a way to call
> 
> Yes you can: you can disable liveupdate (i.e. not supply liveupdate=1
> via kernel parameter) and use KHO the old way: drive it from the
> userspace. However, if liveupdate is enabled, liveupdate becomes the
> driver of KHO as unfortunately KHO has these weird states at the
> moment.

The "weird state" is the point where KHO builds its FDT. Replacing the
current memory tracker with one that does not require serialization won't
change it. We still need a way to tell KHO that "there won't be new nodes
in FDT, pack it".

> > kho_finalize() on the reboot path and it does not seem an issue to do it
> > even without liveupdate. But then we force kho_finalize() into
> > liveupdate_reboot() allowing weird configurations where kho is there but
> > it's unusable.
> 
> What do you mean KHO is there but unusable? We should not have such a state.

If you compile a kernel with KEXEC_HANDOVER=y, KEXEC_HANDOVER_DEBUGFS=n and
LIVEUPDATE=n and boot with kho=1 there is nothing to trigger
kho_finalize().

> > What I'd like to see is that we can finalize KHO on kexec reboot path even
> > when liveupdate is not compiled and until then the patch that makes KHO
> > debugfs optional should not go further IMO.
> >
> > Another thing I didn't check in this series yet is how finalization driven
> > from debugfs interacts with liveupdate internal handling?
> 
> I think what we can do is the following:
> - Remove "Kconfig: make debugfs optional" from this series, and
> instead make that change as part of stateless KHO work.
> - This will ensure that when liveupdate=0, KHO finalize is always fully
> supported the old way.
> - When liveupdate=1, always disable the KHO debugfs "finalize" API, and
> allow liveupdate to drive it automatically. It would add another
> liveupdate_enabled() check to KHO, which is going to be removed as part
> of stateless KHO work.

KHO should not call into liveupdate. That's a layering violation.
And "stateless KHO" does not really make it stateless, it only removes the
memory serialization from kho_finalize(), but it's still required to pack
the FDT.

I think we should allow kho finalization in some form from kernel_kexec().

When kho=1 and liveupdate=0, it will create the FDT if there was no
previous trigger from debugfs, or continue with the FDT created by an
explicit request via debugfs.

When liveupdate=1, liveupdate_reboot() may call a function that actually
finalizes the state to allow safe rollback (although in the current patches
it does not seem necessary). Then kho_finalize() called from
kernel_kexec() will just continue with the state created by
liveupdate_reboot(). If we already finalized the KHO state via debugfs,
liveupdate_reboot() can either error out or reset that state.
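A minimal model of the flow proposed above, assuming only a single "FDT packed" flag plus a record of who packed it. None of these names exist in KHO or LUO; they are hypothetical, and returning -EBUSY for the "error out" option is an assumption chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: tracks only whether the KHO FDT is packed, and by whom. */
struct kho_model {
	bool fdt_packed;   /* no more subtrees can be added */
	bool by_debugfs;   /* packed by an explicit debugfs request */
};

/* Explicit "finalize" request from debugfs (kho=1, liveupdate=0 path). */
static void model_debugfs_finalize(struct kho_model *k)
{
	k->fdt_packed = true;
	k->by_debugfs = true;
}

/* liveupdate=1: finalize early so a failure can still be rolled back. */
static int model_liveupdate_reboot(struct kho_model *k, bool reset_debugfs)
{
	if (k->by_debugfs) {
		if (!reset_debugfs)
			return -EBUSY;     /* option 1: error out */
		k->fdt_packed = false; /* option 2: discard it, rebuild below */
		k->by_debugfs = false;
	}
	k->fdt_packed = true;
	return 0;
}

/* kernel_kexec(): create the FDT only if nobody finalized it earlier. */
static int model_kexec_finalize(struct kho_model *k)
{
	if (!k->fdt_packed)
		k->fdt_packed = true;
	return 0;
}
```

The point of the model is that the kexec-path call is idempotent: it either creates the FDT or continues with whatever debugfs or liveupdate_reboot() already produced.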

> Pasha
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-12 12:46 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRRflLTejNQXWa1Z@kernel.org>

On Wed, Nov 12, 2025 at 5:21 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 11, 2025 at 03:42:24PM -0500, Pasha Tatashin wrote:
> > On Tue, Nov 11, 2025 at 3:39 PM Pasha Tatashin
> > <pasha.tatashin@soleen.com> wrote:
> > >
> > > > >       kho_memory_init();
> > > > >
> > > > > +     /* Live Update should follow right after KHO is initialized */
> > > > > +     liveupdate_init();
> > > > > +
> > > >
> > > > Why do you think it should be immediately after kho_memory_init()?
> > > > Any reason this can't be called from start_kernel() or even later as an
> > > > early_initcall() or core_initcall()?
> > >
> > > Unfortunately, no, even here it is too late, and we might need to find
> > > a way to move the kho_init/liveupdate_init earlier. We must be able to
> > > preserve HugeTLB pages, and those are reserved earlier in boot.
> >
> > Just to clarify: liveupdate_init() is needed to start using:
> > liveupdate_flb_incoming_* API, and FLB data is needed during HugeTLB
> > reservation.
>
> Since flb is "file-lifecycle-bound", it implies *file*. Early memory
> reservations in hugetlb are not bound to files, they end up in file objects
> way later.

FLB global objects act similarly to subsystem-wide data, except their
data has a clear creation and destruction time tied to preserved
files. When the first file of a particular type is added to LUO, this
global data is created; when the last file of that type is removed
(unpreserved or finished), this global data is destroyed. This is why
its lifetime is bound to the file lifecycle. Crucially, this global data is
accessible at any time while LUO owns the associated files spanning
the early boot update boundary.
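The lifecycle described above behaves like a reference count keyed on the number of preserved files of one type. A sketch under that reading; `struct flb_type`, `struct flb_global`, and both functions are hypothetical and are not the LUO FLB API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subsystem-wide data, e.g. a count of preserved pages. */
struct flb_global {
	unsigned long nr_preserved;
};

/* Hypothetical per-file-type bookkeeping. */
struct flb_type {
	unsigned int nr_files;      /* files of this type owned by LUO */
	struct flb_global *global;  /* NULL while no such file is preserved */
};

/* A file of this type is handed to LUO. */
static int flb_file_preserve(struct flb_type *t)
{
	if (t->nr_files == 0) {
		/* First file of this type: create the global data. */
		t->global = calloc(1, sizeof(*t->global));
		if (!t->global)
			return -1;
	}
	t->nr_files++;
	return 0;
}

/* A file of this type is unpreserved or finished. */
static void flb_file_release(struct flb_type *t)
{
	assert(t->nr_files > 0);
	if (--t->nr_files == 0) {
		/* Last file of this type is gone: destroy the global data. */
		free(t->global);
		t->global = NULL;
	}
}
```

In this model the global object exists exactly while at least one file of its type is preserved, which is the property that makes it reachable across the update boundary.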

> So I think for now we can move liveupdate_init() later in boot and we will
> solve the problem of hugetlb reservations when we add support for hugetlb.

HugeTLB reserves memory early in boot. If we already have preserved
HugeTLB pages via LUO/KHO, we must ensure they are counted against the
boot-time reservation. For example, if hugetlb_cma_reserve() needs to
reserve ten 1G pages, but LUO has already preserved seven, we only
need to reserve three new pages and the rest are going to be restored
with the files.

Since this count is contained in the FLB global object, that data
needs to be available during the early reservation phase. (Pratyush is
working on HugeTLB preservation and can explain further).
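The reservation arithmetic from the example above, as a sketch. `hugetlb_pages_to_reserve()` is a hypothetical helper, the preserved count is assumed to come from the FLB global object, and clamping to zero when more pages were preserved than requested is an assumption about the intended policy.

```c
/*
 * Hypothetical helper: number of new huge pages to reserve at boot when
 * 'preserved' pages were already carried over by LUO/KHO.
 */
static unsigned long hugetlb_pages_to_reserve(unsigned long requested,
					      unsigned long preserved)
{
	/* Preserved pages count against the boot-time reservation. */
	if (preserved >= requested)
		return 0;
	return requested - preserved;
}
```

With the numbers from the message, hugetlb_pages_to_reserve(10, 7) returns 3, which is why the preserved count has to be readable before the early reservation runs.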

Pasha

* Re: [PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support
From: Mark Rutland @ 2025-11-12 12:28 UTC
  To: Chenghai Huang
  Cc: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api, fanghao11,
	shenyang39, liulongfang, qianweili
In-Reply-To: <20251112015846.1842207-5-huangchenghai2@huawei.com>

On Wed, Nov 12, 2025 at 09:58:46AM +0800, Chenghai Huang wrote:
> From: Weili Qian <qianweili@huawei.com>
> 
> Starting from ARMv8.4, stp and ldp instructions become atomic.

That's not true for accesses to Device memory types.

Per ARM DDI 0487, L.b, section B2.2.1.1 ("Changes to single-copy atomicity in
Armv8.4"):

  If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load
  or store two 64-bit registers are single-copy atomic when all of the
  following conditions are true:
  • The overall memory access is aligned to 16 bytes.
  • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.

IIUC when used for Device memory types, those can be split, and a part
of the access could be replayed multiple times (e.g. due to an
interrupt).

I don't think we can add this generally. It is not atomic, and not
generally safe.

Mark.
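To make concrete why a 128-bit access that is not single-copy atomic matters, here is a toy model of a register read as two 64-bit halves while the device changes state in between. The struct and functions are hypothetical, and the model covers only tearing between two separate accesses; it does not model the replay of part of an access described above.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* compiler extension */

/* Toy 128-bit device register exposed as two 64-bit halves (hypothetical). */
struct toy_reg {
	uint64_t lo, hi;
};

/* The device moves the whole register to a new value. */
static void toy_device_set(struct toy_reg *r, u128 v)
{
	r->lo = (uint64_t)v;
	r->hi = (uint64_t)(v >> 64);
}

/* A 128-bit read done as hi then lo, with a device update in between. */
static u128 toy_split_read(struct toy_reg *r, u128 next)
{
	uint64_t hi = r->hi;      /* first access: old upper half */

	toy_device_set(r, next);  /* device state changes between accesses */

	uint64_t lo = r->lo;      /* second access: new lower half */

	return ((u128)hi << 64) | lo;
}
```

If the register goes from hi=1, lo=0xffffffffffffffff to the next value (hi=2, lo=0), the split read returns hi=1, lo=0: a value the device never held.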

> Currently, device drivers depend on 128-bit atomic memory IO access,
> but these are implemented within the drivers. Therefore, this introduces
> generic {__raw_read|__raw_write}128 functions for 128-bit memory access.
> 
> Signed-off-by: Weili Qian <qianweili@huawei.com>
> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
> ---
>  arch/arm64/include/asm/io.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 83e03abbb2ca..80430750a28c 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -50,6 +50,17 @@ static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>  	asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
>  }
>  
> +#define __raw_write128 __raw_write128
> +static __always_inline void __raw_write128(u128 val, volatile void __iomem *addr)
> +{
> +	u64 low, high;
> +
> +	low = val;
> +	high = (u64)(val >> 64);
> +
> +	asm volatile ("stp %x0, %x1, [%2]\n" :: "rZ"(low), "rZ"(high), "r"(addr));
> +}
> +
>  #define __raw_readb __raw_readb
>  static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
>  {
> @@ -95,6 +106,16 @@ static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
>  	return val;
>  }
>  
> +#define __raw_read128 __raw_read128
> +static __always_inline u128 __raw_read128(const volatile void __iomem *addr)
> +{
> +	u64 high, low;
> +
> +	asm volatile("ldp %0, %1, [%2]" : "=r" (low), "=r" (high) : "r" (addr));
> +
> +	return (((u128)high << 64) | (u128)low);
> +}
> +
>  /* IO barriers */
>  #define __io_ar(v)							\
>  ({									\
> -- 
> 2.33.0
> 
> 

* Re: [PATCH] man/man2/clone.2: Document CLONE_NEWPID and CLONE_NEWUSER flag
From: Alejandro Colomar @ 2025-11-12 11:23 UTC
  To: hoodit dev; +Cc: Carlos O'Donell, linux-man, linux-api, Andrew Morton
In-Reply-To: <CAFvyz33t9gYOi2HtNFNC_YAPS-_0QHiqJQwatc7YsGppstiZ7A@mail.gmail.com>

Hi,

On Wed, Oct 29, 2025 at 06:00:50PM +0900, hoodit dev wrote:
> Hi, Alejandro Colomar and Carlos
> 
> Just a friendly ping to check if you had a chance to review this patch.

I don't know enough about clone(2) to review this.  I'll wait for Carlos's
review.


Have a lovely day!
Alex

> 
> Thanks
> 
> On Fri, May 2, 2025 at 6:30 AM Alejandro Colomar <alx@kernel.org> wrote:
> >
> > Hi Carlos,
> >
> > On Mon, Apr 21, 2025 at 04:16:03AM +0900, devhoodit wrote:
> > > CLONE_NEWPID and CLONE_PARENT can be used together, but not CLONE_THREAD.  Similarly, CLONE_NEWUSER and CLONE_PARENT can be used together, but not CLONE_THREAD.
> > > This was discussed here: <https://lore.kernel.org/linux-man/06febfb3-e2e2-4363-bc34-83a07692144f@redhat.com/T/>
> > > Relevant code: <https://github.com/torvalds/linux/blob/219d54332a09e8d8741c1e1982f5eae56099de85/kernel/fork.c#L1815>
> > >
> > > Cc: Carlos O'Donell <carlos@redhat.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: devhoodit <devhoodit@gmail.com>
> >
> > Could you please review this patch?
> >
> >
> > Have a lovely night!
> > Alex
> >
> > > ---
> > >  man/man2/clone.2 | 9 +++------
> > >  1 file changed, 3 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/man/man2/clone.2 b/man/man2/clone.2
> > > index 1b74e4c92..b9561125a 100644
> > > --- a/man/man2/clone.2
> > > +++ b/man/man2/clone.2
> > > @@ -776,9 +776,7 @@ .SS The flags mask
> > >  no privileges are needed to create a user namespace.
> > >  .IP
> > >  This flag can't be specified in conjunction with
> > > -.B CLONE_THREAD
> > > -or
> > > -.BR CLONE_PARENT .
> > > +.BR CLONE_THREAD .
> > >  For security reasons,
> > >  .\" commit e66eded8309ebf679d3d3c1f5820d1f2ca332c71
> > >  .\" https://lwn.net/Articles/543273/
> > > @@ -1319,11 +1317,10 @@ .SH ERRORS
> > >  mask.
> > >  .TP
> > >  .B EINVAL
> > > +Both
> > >  .B CLONE_NEWPID
> > > -and one (or both) of
> > > +and
> > >  .B CLONE_THREAD
> > > -or
> > > -.B CLONE_PARENT
> > >  were specified in the
> > >  .I flags
> > >  mask.
> > > --
> > > 2.49.0
> > >
> >
> > --
> > <https://www.alejandro-colomar.es/>
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

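A minimal user-space sketch of the rule this patch documents. It assumes the check in the linked kernel/fork.c code rejects CLONE_THREAD combined with CLONE_NEWPID or CLONE_NEWUSER, and does not reject CLONE_PARENT with either flag. clone_ns_flags_check() is a hypothetical helper: it models only this one rule, not the other EINVAL cases listed in clone(2), such as CLONE_THREAD without CLONE_SIGHAND.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper modeling only the namespace/thread-group rule.
 * Returns 0 if this rule allows the combination, -EINVAL otherwise.
 */
static int clone_ns_flags_check(unsigned long flags)
{
	/* A child in a new PID or user namespace cannot join our thread group. */
	if ((flags & CLONE_THREAD) && (flags & (CLONE_NEWPID | CLONE_NEWUSER)))
		return -EINVAL;

	/* CLONE_PARENT is not restricted by this rule. */
	return 0;
}
```

Under this reading, CLONE_NEWPID | CLONE_PARENT and CLONE_NEWUSER | CLONE_PARENT both pass, which is the change the patch makes to the man page.
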
* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: Greg KH @ 2025-11-12 11:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <D4AF3E24-8698-4EEC-9D52-655D69897111@zytor.com>

On Mon, Nov 10, 2025 at 07:57:22PM -0800, H. Peter Anvin wrote:
> Honestly, though, I'm far less interested in what 8250-based hardware does than e.g. USB.

hahahahahahaha {snort}

Hah.  that's a good one.

Oh, you aren't kidding.

Wow, good luck with this.  USB-serial adaptors are all over the place,
some have real uarts in them (and so do buffering in the device, and
line handling in odd ways when powered up), and some are almost just a
straight pipe through to the USB host with control line handling ideas
tacked on to the side as an afterthought, if at all.

There is no standard here, they all work differently, and even work
differently across the same device type with just barely enough hints
for us to determine what is going on.

So don't worry about USB, if you throw that into the mix, all bets are
off and you should NEVER rely on that.

Remember USB->serial was explicitly rejected by the USB standard group,
only to have it come back in the "side door" through the spec process
when it turned out that Microsoft hated having to write a zillion
different vendor-specific drivers because the vendor-provided ones kept
crashing users' machines.  So what we ended up with was "just enough" to
make it through the spec process, and even then line signals are
probably never tested so you can't rely on them.

good luck!

greg "this brought up too many bad memories" k-h

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-12 10:21 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bD3hps+atqUZ2LKyuoOSRRUWpTPE+frd5g13js4EAFK8g@mail.gmail.com>

On Tue, Nov 11, 2025 at 03:42:24PM -0500, Pasha Tatashin wrote:
> On Tue, Nov 11, 2025 at 3:39 PM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
> >
> > > >       kho_memory_init();
> > > >
> > > > +     /* Live Update should follow right after KHO is initialized */
> > > > +     liveupdate_init();
> > > > +
> > >
> > > Why do you think it should be immediately after kho_memory_init()?
> > > Any reason this can't be called from start_kernel() or even later as an
> > > early_initcall() or core_initcall()?
> >
> > Unfortunately, no, even here it is too late, and we might need to find
> > a way to move the kho_init/liveupdate_init earlier. We must be able to
> > preserve HugeTLB pages, and those are reserved earlier in boot.
> 
> Just to clarify: liveupdate_init() is needed to start using:
> liveupdate_flb_incoming_* API, and FLB data is needed during HugeTLB
> reservation.

Since flb is "file-lifecycle-bound", it implies *file*. Early memory
reservations in hugetlb are not bound to files, they end up in file objects
way later.

So I think for now we can move liveupdate_init() later in boot and we will
solve the problem of hugetlb reservations when we add support for hugetlb.
 
> Pasha

-- 
Sincerely yours,
Mike.

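To make the lifetime argument concrete, here is a minimal user-space model of file-lifecycle-bound (FLB) global data as described in this thread: the first preserved file of a type creates the data, and the last one to be unpreserved or finished destroys it. All names (struct flb_model, flb_file_get(), flb_file_put(), flb_data()) are hypothetical and are not the LUO API. The model covers only the outgoing, preservation side. The incoming side, where data would have to be retrieved from preserved state before any struct file exists, is exactly what this sub-thread is debating and is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for subsystem-wide data tied to preserved files. */
struct flb_model {
	unsigned int nr_files;	/* preserved files of this type */
	int *data;		/* global object; NULL while nr_files == 0 */
};

/* First file of this type: create the global data. */
static int flb_file_get(struct flb_model *flb)
{
	if (flb->nr_files == 0) {
		flb->data = calloc(1, sizeof(*flb->data));
		if (!flb->data)
			return -1;
	}
	flb->nr_files++;
	return 0;
}

/* Last file of this type unpreserved or finished: destroy the data. */
static void flb_file_put(struct flb_model *flb)
{
	assert(flb->nr_files > 0);
	if (--flb->nr_files == 0) {
		free(flb->data);
		flb->data = NULL;
	}
}

/* Lookup: returns NULL unless some file of this type is held. */
static int *flb_data(const struct flb_model *flb)
{
	return flb->data;
}
```
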
* Re: [PATCH v6 00/17] vfs: recall-only directory delegations for knfsd
From: Christian Brauner @ 2025-11-12  9:00 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christian Brauner, linux-fsdevel, linux-kernel, linux-nfs,
	linux-cifs, samba-technical, netfs, ecryptfs, linux-unionfs,
	linux-xfs, netdev, linux-api, Miklos Szeredi, Alexander Viro,
	Jan Kara, Chuck Lever, Alexander Aring, Trond Myklebust,
	Anna Schumaker, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Tom Talpey, Bharath SM, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, David Howells, Tyler Hicks,
	NeilBrown, Olga Kornievskaia, Dai Ngo, Amir Goldstein,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Carlos Maiolino,
	Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
In-Reply-To: <20251111-dir-deleg-ro-v6-0-52f3feebb2f2@kernel.org>

On Tue, 11 Nov 2025 09:12:41 -0500, Jeff Layton wrote:
> Behold, another version of the directory delegation patchset. This
> version contains support for recall-only delegations. Support for
> CB_NOTIFY will be forthcoming (once the client-side patches have caught
> up).
> 
> The main changes here are in response to Jan's comments. I also changed
> struct delegation to use fixed-width integer types.
> 
> [...]

Applied to the vfs-6.19.directory.delegations branch of the vfs/vfs.git tree.
Patches in the vfs-6.19.directory.delegations branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.directory.delegations

[01/17] filelock: make lease_alloc() take a flags argument
        https://git.kernel.org/vfs/vfs/c/6fc5f2b19e75
[02/17] filelock: rework the __break_lease API to use flags
        https://git.kernel.org/vfs/vfs/c/4be9f3cc582a
[03/17] filelock: add struct delegated_inode
        https://git.kernel.org/vfs/vfs/c/6976ed2dd0d5
[04/17] filelock: push the S_ISREG check down to ->setlease handlers
        https://git.kernel.org/vfs/vfs/c/e6d28ebc17eb
[05/17] vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
        https://git.kernel.org/vfs/vfs/c/b46ebf9a768d
[06/17] vfs: allow mkdir to wait for delegation break on parent
        https://git.kernel.org/vfs/vfs/c/e12d203b8c88
[07/17] vfs: allow rmdir to wait for delegation break on parent
        https://git.kernel.org/vfs/vfs/c/4fa76319cd0c
[08/17] vfs: break parent dir delegations in open(..., O_CREAT) codepath
        https://git.kernel.org/vfs/vfs/c/134796f43a5e
[09/17] vfs: clean up argument list for vfs_create()
        https://git.kernel.org/vfs/vfs/c/85bbffcad730
[10/17] vfs: make vfs_create break delegations on parent directory
        https://git.kernel.org/vfs/vfs/c/c826229c6a82
[11/17] vfs: make vfs_mknod break delegations on parent directory
        https://git.kernel.org/vfs/vfs/c/e8960c1b2ee9
[12/17] vfs: make vfs_symlink break delegations on parent dir
        https://git.kernel.org/vfs/vfs/c/92bf53577f01
[13/17] filelock: lift the ban on directory leases in generic_setlease
        https://git.kernel.org/vfs/vfs/c/d0eab9fc1047
[14/17] nfsd: allow filecache to hold S_IFDIR files
        https://git.kernel.org/vfs/vfs/c/544a0ee152f0
[15/17] nfsd: allow DELEGRETURN on directories
        https://git.kernel.org/vfs/vfs/c/80c8afddc8b1
[16/17] nfsd: wire up GET_DIR_DELEGATION handling
        https://git.kernel.org/vfs/vfs/c/8b99f6a8c116
[17/17] vfs: expose delegation support to userland
        https://git.kernel.org/vfs/vfs/c/1602bad16d7d

* [PATCH RFC 0/4] Introduce 128-bit IO access
From: Chenghai Huang @ 2025-11-12  1:58 UTC (permalink / raw)
  To: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili

These patches introduce 128-bit IO access functionality. The
motivation is that current HiSilicon cryptographic devices require
128-bit MMIO accesses to be atomic when the same registers are
accessed across physical and virtual functions.

Currently, 128-bit atomic writes are already implemented in the
device driver, and the driver also depends on a 128-bit atomic
read interface. Therefore, we introduce a generic 128-bit IO access
interface to replace the driver's own inline-assembly 128-bit read
and write accessors. When the architecture does not support 128-bit
atomic operations, the driver can fall back to the non-atomic 128-bit
read and write helpers so that it remains functional.

Weili Qian (4):
  UAPI: Introduce 128-bit types and byteswap operations
  asm-generic/io.h: add io{read,write}128 accessors
  io-128-nonatomic: introduce io{read|write}128_{lo_hi|hi_lo}
  arm64/io: Add {__raw_read|__raw_write}128 support

 arch/arm64/include/asm/io.h                  | 21 +++++++++
 include/asm-generic/io.h                     | 48 ++++++++++++++++++++
 include/linux/io-128-nonatomic-hi-lo.h       | 35 ++++++++++++++
 include/linux/io-128-nonatomic-lo-hi.h       | 34 ++++++++++++++
 include/uapi/linux/byteorder/big_endian.h    |  6 +++
 include/uapi/linux/byteorder/little_endian.h |  6 +++
 include/uapi/linux/swab.h                    | 10 ++++
 include/uapi/linux/types.h                   |  3 ++
 8 files changed, 163 insertions(+)
 create mode 100644 include/linux/io-128-nonatomic-hi-lo.h
 create mode 100644 include/linux/io-128-nonatomic-lo-hi.h

-- 
2.33.0

* [PATCH RFC 3/4] io-128-nonatomic: introduce io{read|write}128_{lo_hi|hi_lo}
From: Chenghai Huang @ 2025-11-12  1:58 UTC (permalink / raw)
  To: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <20251112015846.1842207-1-huangchenghai2@huawei.com>

From: Weili Qian <qianweili@huawei.com>

To provide non-atomic fallbacks for io{read|write}128, we define
variants of these functions that perform the access as two separate
64-bit operations, in either low/high or high/low order.

The atomic io{read|write}128 accessors are only available if the
architecture defines them. If they are not defined, including
include/linux/io-128-nonatomic*.h makes io{read|write}128 use these
wrappers, which always perform non-atomic operations.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 include/linux/io-128-nonatomic-hi-lo.h | 35 ++++++++++++++++++++++++++
 include/linux/io-128-nonatomic-lo-hi.h | 34 +++++++++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100644 include/linux/io-128-nonatomic-hi-lo.h
 create mode 100644 include/linux/io-128-nonatomic-lo-hi.h

diff --git a/include/linux/io-128-nonatomic-hi-lo.h b/include/linux/io-128-nonatomic-hi-lo.h
new file mode 100644
index 000000000000..b5b083a9e81b
--- /dev/null
+++ b/include/linux/io-128-nonatomic-hi-lo.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IO_128_NONATOMIC_HI_LO_H_
+#define _LINUX_IO_128_NONATOMIC_HI_LO_H_
+
+#include <linux/io.h>
+#include <asm-generic/int-ll64.h>
+
+static inline u128 ioread128_hi_lo(const void __iomem *addr)
+{
+	u64 low, high;
+
+	high = ioread64(addr + sizeof(u64));
+	low = ioread64(addr);
+
+	return low + ((u128)high << 64);
+}
+
+static inline void iowrite128_hi_lo(u128 val, void __iomem *addr)
+{
+	iowrite64(val >> 64, addr + sizeof(u64));
+	iowrite64(val, addr);
+}
+
+#ifndef ioread128
+#define ioread128_is_nonatomic
+#define ioread128 ioread128_hi_lo
+#endif
+
+#ifndef iowrite128
+#define iowrite128_is_nonatomic
+#define iowrite128 iowrite128_hi_lo
+#endif
+
+#endif	/* _LINUX_IO_128_NONATOMIC_HI_LO_H_ */
+
diff --git a/include/linux/io-128-nonatomic-lo-hi.h b/include/linux/io-128-nonatomic-lo-hi.h
new file mode 100644
index 000000000000..0448ee5a13de
--- /dev/null
+++ b/include/linux/io-128-nonatomic-lo-hi.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IO_128_NONATOMIC_LO_HI_H_
+#define _LINUX_IO_128_NONATOMIC_LO_HI_H_
+
+#include <linux/io.h>
+#include <asm-generic/int-ll64.h>
+
+static inline u128 ioread128_lo_hi(const void __iomem *addr)
+{
+	u64 low, high;
+
+	low = ioread64(addr);
+	high = ioread64(addr + sizeof(u64));
+
+	return low + ((u128)high << 64);
+}
+
+static inline void iowrite128_lo_hi(u128 val, void __iomem *addr)
+{
+	iowrite64(val, addr);
+	iowrite64(val >> 64, addr + sizeof(u64));
+}
+
+#ifndef ioread128
+#define ioread128_is_nonatomic
+#define ioread128 ioread128_lo_hi
+#endif
+
+#ifndef iowrite128
+#define iowrite128_is_nonatomic
+#define iowrite128 iowrite128_lo_hi
+#endif
+
+#endif	/* _LINUX_IO_128_NONATOMIC_LO_HI_H_ */
-- 
2.33.0

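A user-space sketch of what the two non-atomic orderings do, with the 16-byte register modeled as two 64-bit words and every access logged. It assumes a compiler that provides unsigned __int128 (GCC or Clang on 64-bit targets), and plain loads and stores stand in for ioread64()/iowrite64(); the mmio_* names and struct mmio_model are hypothetical. Both halves are held in 64-bit variables so the high word survives the round trip.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical model of a 16-byte device register as two 64-bit words. */
struct mmio_model {
	uint64_t word[2];	/* word[0] at addr, word[1] at addr + 8 */
	int order[4];		/* which word each access touched, in sequence */
	int n;			/* number of accesses logged */
};

static uint64_t mmio_read64(struct mmio_model *m, int idx)
{
	m->order[m->n++] = idx;
	return m->word[idx];
}

static void mmio_write64(struct mmio_model *m, int idx, uint64_t v)
{
	m->order[m->n++] = idx;
	m->word[idx] = v;
}

/* lo_hi: read the low half first, then the high half. */
static u128 read128_lo_hi(struct mmio_model *m)
{
	uint64_t low = mmio_read64(m, 0);
	uint64_t high = mmio_read64(m, 1);

	return ((u128)high << 64) | low;
}

/* hi_lo: write the high half first, then the low half. */
static void write128_hi_lo(struct mmio_model *m, u128 val)
{
	mmio_write64(m, 1, (uint64_t)(val >> 64));
	mmio_write64(m, 0, (uint64_t)val);
}
```

Because the device sees two separate 64-bit accesses, another agent sharing the register (for example a VF) can observe or interleave with the intermediate state; the lo_hi/hi_lo choice only decides which half lands first.
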
* [PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support
From: Chenghai Huang @ 2025-11-12  1:58 UTC (permalink / raw)
  To: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <20251112015846.1842207-1-huangchenghai2@huawei.com>

From: Weili Qian <qianweili@huawei.com>

Starting with ARMv8.4, the stp and ldp instructions are atomic.
Device drivers currently depend on 128-bit atomic MMIO accesses, but
implement them within the drivers themselves. This patch therefore
introduces generic {__raw_read|__raw_write}128 functions for 128-bit
memory access.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 arch/arm64/include/asm/io.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 83e03abbb2ca..80430750a28c 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -50,6 +50,17 @@ static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 	asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
 }
 
+#define __raw_write128 __raw_write128
+static __always_inline void __raw_write128(u128 val, volatile void __iomem *addr)
+{
+	u64 low, high;
+
+	low = val;
+	high = (u64)(val >> 64);
+
+	asm volatile ("stp %x0, %x1, [%2]\n" :: "rZ"(low), "rZ"(high), "r"(addr));
+}
+
 #define __raw_readb __raw_readb
 static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
 {
@@ -95,6 +106,16 @@ static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
 	return val;
 }
 
+#define __raw_read128 __raw_read128
+static __always_inline u128 __raw_read128(const volatile void __iomem *addr)
+{
+	u64 high, low;
+
+	asm volatile("ldp %0, %1, [%2]" : "=r" (low), "=r" (high) : "r" (addr));
+
+	return (((u128)high << 64) | (u128)low);
+}
+
 /* IO barriers */
 #define __io_ar(v)							\
 ({									\
-- 
2.33.0

* [PATCH RFC 1/4] UAPI: Introduce 128-bit types and byteswap operations
From: Chenghai Huang @ 2025-11-12  1:58 UTC (permalink / raw)
  To: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <20251112015846.1842207-1-huangchenghai2@huawei.com>

From: Weili Qian <qianweili@huawei.com>

Architectures like ARM64 support 128-bit integer types and
operations. This patch adds a __le128 type and generic 128-bit
byte-order conversion helpers (__swab128(), __cpu_to_le128() and
__le128_to_cpu()).

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 include/uapi/linux/byteorder/big_endian.h    |  6 ++++++
 include/uapi/linux/byteorder/little_endian.h |  6 ++++++
 include/uapi/linux/swab.h                    | 10 ++++++++++
 include/uapi/linux/types.h                   |  3 +++
 4 files changed, 25 insertions(+)

diff --git a/include/uapi/linux/byteorder/big_endian.h b/include/uapi/linux/byteorder/big_endian.h
index 80aa5c41a763..318d51a18f43 100644
--- a/include/uapi/linux/byteorder/big_endian.h
+++ b/include/uapi/linux/byteorder/big_endian.h
@@ -29,6 +29,12 @@
 #define __constant_be32_to_cpu(x) ((__force __u32)(__be32)(x))
 #define __constant_cpu_to_be16(x) ((__force __be16)(__u16)(x))
 #define __constant_be16_to_cpu(x) ((__force __u16)(__be16)(x))
+
+#ifdef __SIZEOF_INT128__
+#define __cpu_to_le128(x) ((__force __le128)__swab128((x)))
+#define __le128_to_cpu(x) __swab128((__force __u128)(__le128)(x))
+#endif
+
 #define __cpu_to_le64(x) ((__force __le64)__swab64((x)))
 #define __le64_to_cpu(x) __swab64((__force __u64)(__le64)(x))
 #define __cpu_to_le32(x) ((__force __le32)__swab32((x)))
diff --git a/include/uapi/linux/byteorder/little_endian.h b/include/uapi/linux/byteorder/little_endian.h
index cd98982e7523..b2732452b825 100644
--- a/include/uapi/linux/byteorder/little_endian.h
+++ b/include/uapi/linux/byteorder/little_endian.h
@@ -29,6 +29,12 @@
 #define __constant_be32_to_cpu(x) ___constant_swab32((__force __u32)(__be32)(x))
 #define __constant_cpu_to_be16(x) ((__force __be16)___constant_swab16((x)))
 #define __constant_be16_to_cpu(x) ___constant_swab16((__force __u16)(__be16)(x))
+
+#ifdef __SIZEOF_INT128__
+#define __cpu_to_le128(x) ((__force __le128)(__u128)(x))
+#define __le128_to_cpu(x) ((__force __u128)(__le128)(x))
+#endif
+
 #define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
 #define __le64_to_cpu(x) ((__force __u64)(__le64)(x))
 #define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
index 01717181339e..7381b9a785ce 100644
--- a/include/uapi/linux/swab.h
+++ b/include/uapi/linux/swab.h
@@ -133,6 +133,16 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
 	__fswab64(x))
 #endif
 
+#ifdef __SIZEOF_INT128__
+static inline __attribute_const__ __u128 __swab128(__u128 val)
+{
+	__u64 h = val >> 64;
+	__u64 l = val;
+
+	return (((__u128)__swab64(l)) << 64) | ((__u128)(__swab64(h)));
+}
+#endif
+
 static __always_inline unsigned long __swab(const unsigned long y)
 {
 #if __BITS_PER_LONG == 64
diff --git a/include/uapi/linux/types.h b/include/uapi/linux/types.h
index 48b933938877..9624ea43cd8a 100644
--- a/include/uapi/linux/types.h
+++ b/include/uapi/linux/types.h
@@ -40,6 +40,9 @@ typedef __u32 __bitwise __be32;
 typedef __u64 __bitwise __le64;
 typedef __u64 __bitwise __be64;
 
+#ifdef __SIZEOF_INT128__
+typedef __u128 __bitwise __le128;
+#endif
 typedef __u16 __bitwise __sum16;
 typedef __u32 __bitwise __wsum;
 
-- 
2.33.0

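A portable user-space sketch of the same 128-bit swap, assuming GCC or Clang (for unsigned __int128 and __builtin_bswap64). As __swab128() in this patch does with __swab64(), it byte-reverses each 64-bit half and exchanges the halves, which reverses all 16 bytes. swab128_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Reverse all 16 bytes: byte-swap each 64-bit half, then exchange the halves. */
static u128 swab128_sketch(u128 val)
{
	uint64_t h = (uint64_t)(val >> 64);
	uint64_t l = (uint64_t)val;

	return ((u128)__builtin_bswap64(l) << 64) | __builtin_bswap64(h);
}
```

Applying the swap twice returns the original value, which is why a single helper serves both __cpu_to_le128() and __le128_to_cpu() on big-endian builds.
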
* [PATCH RFC 2/4] asm-generic/io.h: add io{read,write}128 accessors
From: Chenghai Huang @ 2025-11-12  1:58 UTC (permalink / raw)
  To: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <20251112015846.1842207-1-huangchenghai2@huawei.com>

From: Weili Qian <qianweili@huawei.com>

Architectures like ARM64 already support 128-bit memory access. Currently,
device drivers implement atomic read and write operations for 128-bit
memory using assembly. This patch adds generic io{read,write}128 access
functions, which will enable device drivers to consistently use
io{read,write}128 for 128-bit access.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 include/asm-generic/io.h | 48 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index ca5a1ce6f0f8..c419021318e6 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -146,6 +146,16 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #endif
 #endif /* CONFIG_64BIT */
 
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+#ifndef __raw_read128
+#define __raw_read128 __raw_read128
+static inline u128 __raw_read128(const volatile void __iomem *addr)
+{
+	return *(const volatile u128 __force *)addr;
+}
+#endif
+#endif /* CONFIG_ARCH_SUPPORTS_INT128 */
+
 #ifndef __raw_writeb
 #define __raw_writeb __raw_writeb
 static inline void __raw_writeb(u8 value, volatile void __iomem *addr)
@@ -180,6 +190,16 @@ static inline void __raw_writeq(u64 value, volatile void __iomem *addr)
 #endif
 #endif /* CONFIG_64BIT */
 
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+#ifndef __raw_write128
+#define __raw_write128 __raw_write128
+static inline void __raw_write128(u128 value, volatile void __iomem *addr)
+{
+	*(volatile u128 __force *)addr = value;
+}
+#endif
+#endif /* CONFIG_ARCH_SUPPORTS_INT128 */
+
 /*
  * {read,write}{b,w,l,q}() access little endian memory and return result in
  * native endianness.
@@ -917,6 +937,22 @@ static inline u64 ioread64(const volatile void __iomem *addr)
 #endif
 #endif /* CONFIG_64BIT */
 
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+#ifndef ioread128
+#define ioread128 ioread128
+static inline u128 ioread128(const volatile void __iomem *addr)
+{
+	u128 val;
+
+	__io_br();
+	val = __le128_to_cpu((__le128 __force)__raw_read128(addr));
+	__io_ar(val);
+
+	return val;
+}
+#endif
+#endif /* CONFIG_ARCH_SUPPORTS_INT128 */
+
 #ifndef iowrite8
 #define iowrite8 iowrite8
 static inline void iowrite8(u8 value, volatile void __iomem *addr)
@@ -951,6 +987,18 @@ static inline void iowrite64(u64 value, volatile void __iomem *addr)
 #endif
 #endif /* CONFIG_64BIT */
 
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+#ifndef iowrite128
+#define iowrite128 iowrite128
+static inline void iowrite128(u128 value, volatile void __iomem *addr)
+{
+	__io_bw();
+	__raw_write128((u128 __force)__cpu_to_le128(value), addr);
+	__io_aw();
+}
+#endif
+#endif /* CONFIG_ARCH_SUPPORTS_INT128 */
+
 #ifndef ioread16be
 #define ioread16be ioread16be
 static inline u16 ioread16be(const volatile void __iomem *addr)
-- 
2.33.0

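The generic ioread128() above reads the register with __raw_read128() and then applies __le128_to_cpu(), so callers get a host-endian value whatever the CPU byte order. Here is a minimal user-space sketch of just that conversion step. It assumes unsigned __int128 support and decodes byte by byte instead of using the kernel macros; le128_decode() and le128_encode() are hypothetical names. Barriers (__io_br()/__io_ar()) and atomicity are not modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Interpret 16 bytes as little-endian: byte 0 is the least significant. */
static u128 le128_decode(const uint8_t b[16])
{
	u128 v = 0;

	for (int i = 15; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Inverse: store a host value as 16 little-endian bytes. */
static void le128_encode(u128 v, uint8_t b[16])
{
	for (int i = 0; i < 16; i++) {
		b[i] = (uint8_t)v;
		v >>= 8;
	}
}
```

On a little-endian CPU this decoding matches a plain 16-byte load, which is why __cpu_to_le128() and __le128_to_cpu() in patch 1/4 are plain casts on little-endian builds.
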
* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: H. Peter Anvin @ 2025-11-11 21:28 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <20251111043803.GK2988753@mit.edu>

On 2025-11-10 20:38, Theodore Ts'o wrote:
> On Mon, Nov 10, 2025 at 07:57:22PM -0800, H. Peter Anvin wrote:
>> I really think you are looking at this from a very odd point of
>> view, and you seem to be very inconsistent. Boot time setup? Isn't
>> that what setserial is for? We have the ability to feed this
>> configuration already, but you need a file descriptor.
> 
> I'm not really fond of adding some new open flag that to me seems
> **very** serial / RS-485 specific, and so I'm trying to find some
> way to avoid it.
> 

I don't think it is.  "Opening this device for configuration."

> I also think that the GPIO-style timing requirements of RTS
> **really** should be done as a line discipline, and not in userspace.
> 

No disagreement there -- and so it is. What I want is a way to *attach*
that line discipline without poking at the serial port itself.  That's what
I keep trying to get at.

>> Honestly, though, I'm far less interested in what 8250-based hardware does than e.g. USB.
> 
> I'm quite confident that USB won't have "state" that will be preserved
> across a reboot, because the device won't even get powered up until
> the USB device is attached.  And part of the problem was that the
> requirements weren't particularly clear, and given the insistence that
> the "state" be preserved even across reboot, despite the serial port
> autoconfiguration, I had assumed you were positing the use of the COM 1/2/3/4
> ports, where autoconfiguration isn't strictly speaking necessary.
> 
> In some ways, USB ports might be easier, since it should be possible
> to specify udev rules which get passed to the driver when the USB
> serial device is inserted, and so *that* can easily be done without
> needing a file descriptor.
> 
> And for this sort of thing, it seems perfectly fair to hard code some
> specific behavior using either a boot command line or a udev rule,
> since you seem to be positing that the serial port will be dedicated
> to some kind of weird-shit RS-485 bus device, where any time RTS/DTR
> gets raised, the bus will malfunction in weird and wondrous ways....

But again, it is very much a configuration property.  You don't know where
your dynamically assigned serial port will end up -- and you *can't*, because
it is a property of the DCE -- what is plugged *into* the device.

Now you have someone writing a terminal program or something like Arduino and
decide to enumerate serial ports (which, as I stated, you can't actually do
right now without opening the devices).  This is why it makes sense for the
open() caller to declare intent; this is similar to how O_NDELAY replaced
callout devices.

It would be lovely if we could do something like
open("/dev/ttyS0/option-string") and so on, but that is well and truly a far
bigger change to the whole driver API.

	-hpa

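For context, here is roughly what userspace can do today with existing interfaces. It opens with O_NOCTTY|O_NONBLOCK, sets CLOCAL, clears HUPCL so the last close leaves the lines alone, and then deasserts DTR/RTS with TIOCMBIC. This is a sketch under assumptions, not part of the RFC: open_tty_quiet() is a hypothetical name, and O_NRESETDEV is only the flag proposed in this thread and is not used. The window hpa describes remains, because the driver may already have raised DTR/RTS inside open() before the first ioctl runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns an fd or -1. By the time open() returns,
 * the driver may already have asserted DTR/RTS; this only deasserts
 * them afterwards and keeps the last close from changing them.
 */
static int open_tty_quiet(const char *path)
{
	struct termios t;
	int bits = TIOCM_DTR | TIOCM_RTS;
	int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (tcgetattr(fd, &t) < 0)
		goto fail;
	t.c_cflag |= CLOCAL;	/* ignore modem status lines */
	t.c_cflag &= ~HUPCL;	/* leave DTR/RTS as they are on last close */
	if (tcsetattr(fd, TCSANOW, &t) < 0)
		goto fail;
	/* Devices without modem control lines (e.g. ptys) reject this. */
	if (ioctl(fd, TIOCMBIC, &bits) < 0 && errno != ENOTTY && errno != EINVAL)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```
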
* Re: [PATCH v5 05/22] liveupdate: kho: when live update add KHO image during kexec load
From: Pasha Tatashin @ 2025-11-11 20:59 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aROaJUjyyZqJ19Wo@kernel.org>

> I believe that when my concerns about "[PATCH v5 02/22] liveupdate:
> luo_core: integrate with KHO" [1] are resolved this patch won't be needed.
>
> [1] https://lore.kernel.org/all/aROZi043lxtegqWE@kernel.org/

Thank you, I replied to your comments in that patch. However, until
KHO becomes stateless, this change is needed. We *must* have the KHO
image as part of kexec load if liveupdate=1.

>
> > Pasha
>
> --
> Sincerely yours,
> Mike.

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-11 20:57 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aROZi043lxtegqWE@kernel.org>

Hi Mike,

Thank you for the review; my comments are below:

> > This is why this call is placed first in reboot(), before any
> > irreversible reboot notifiers or shutdown callbacks are performed. If
> > an allocation problem occurs in KHO, the error is simply reported back
> > to userspace, and the live update is safely aborted.
>
> This is fine. But what I don't like is that we can't use kho without
> liveupdate. We are making debugfs optional, we have a way to call

Yes, you can: you can disable liveupdate (i.e. not supply liveupdate=1
on the kernel command line) and use KHO the old way, driving it from
userspace. However, if liveupdate is enabled, liveupdate becomes the
driver of KHO, because unfortunately KHO has these weird states at the
moment.

> kho_finalize() on the reboot path and it does not seem an issue to do it
> even without liveupdate. But then we force kho_finalize() into
> liveupdate_reboot() allowing weird configurations where kho is there but
> it's unusable.

What do you mean by "KHO is there but unusable"? We should not have such a state...

> What I'd like to see is that we can finalize KHO on kexec reboot path even
> when liveupdate is not compiled and until then the patch that makes KHO
> debugfs optional should not go further IMO.
>
> Another thing I didn't check in this series yet is how finalization driven
> from debugfs interacts with liveupdate internal handling?

I think what we can do is the following:
- Remove "Kconfig: make debugfs optional" from this series, and
instead make that change as part of the stateless KHO work.
- This ensures that when liveupdate=0, KHO finalization is always
fully supported the old way.
- When liveupdate=1, always disable the KHO debugfs "finalize" API and
let liveupdate drive finalization automatically. This adds another
liveupdate_enable() check to KHO, which is going to be removed as part
of the stateless KHO work.

Pasha

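A minimal sketch of the split proposed above, with hypothetical names throughout: struct kho_model, kho_debugfs_finalize_model() and liveupdate_reboot_model() are illustrations, not the KHO or LUO API, and returning -EBUSY is an assumed error choice. With liveupdate=0 the debugfs "finalize" knob keeps working as before. With liveupdate=1 it is refused, and finalization happens on the reboot path instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state standing in for the liveupdate= parameter and KHO. */
struct kho_model {
	bool liveupdate_enabled;	/* liveupdate=1 on the command line */
	bool finalized;
};

/* debugfs "finalize" write: allowed only in the old, userspace-driven mode. */
static int kho_debugfs_finalize_model(struct kho_model *k)
{
	if (k->liveupdate_enabled)
		return -EBUSY;	/* assumed error code: liveupdate owns finalization */
	k->finalized = true;
	return 0;
}

/* Reboot path: with liveupdate=1, finalize KHO before irreversible steps. */
static int liveupdate_reboot_model(struct kho_model *k)
{
	if (!k->liveupdate_enabled)
		return 0;	/* userspace drove KHO; nothing to do here */
	k->finalized = true;
	return 0;
}
```
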
* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-11 20:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bDnaLJS9GdO_7Anhwah2uQrYYk_RhQMSiRL-YB=8ZZZWQ@mail.gmail.com>

On Tue, Nov 11, 2025 at 3:39 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> > >       kho_memory_init();
> > >
> > > +     /* Live Update should follow right after KHO is initialized */
> > > +     liveupdate_init();
> > > +
> >
> > Why do you think it should be immediately after kho_memory_init()?
> > Any reason this can't be called from start_kernel() or even later as an
> > early_initcall() or core_initcall()?
>
> Unfortunately, no, even here it is too late, and we might need to find
> a way to move the kho_init/liveupdate_init earlier. We must be able to
> preserve HugeTLB pages, and those are reserved earlier in boot.

Just to clarify: liveupdate_init() must run before the
liveupdate_flb_incoming_* API can be used, and FLB data is needed during
HugeTLB reservation.
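The ordering constraint can be illustrated with a minimal userspace model. All names are hypothetical (flb_incoming_model() stands in for the liveupdate_flb_incoming_* API, whose exact functions are not spelled out here), and -EAGAIN is an assumed error for "called too early". The model only captures the dependency chain stated in the thread: KHO before LUO, and LUO before HugeTLB reservation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of boot ordering; all names are hypothetical. */
static bool kho_ready;
static bool luo_ready;
static const char *flb_data;  /* placeholder for FLB data from the old kernel */

static void kho_memory_init_model(void)
{
	kho_ready = true;
}

static void liveupdate_init_model(void)
{
	assert(kho_ready);  /* LUO state arrives via KHO, so KHO must be up */
	luo_ready = true;
	flb_data = "hugetlb FLB data";
}

/* Stand-in for the liveupdate_flb_incoming_* API. */
static int flb_incoming_model(const char **out)
{
	if (!luo_ready)
		return -EAGAIN;  /* too early: liveupdate_init() has not run */
	*out = flb_data;
	return 0;
}

/* HugeTLB reservation consumes FLB data, so it fails if run too early. */
static int hugetlb_reserve_model(void)
{
	const char *data = NULL;

	return flb_incoming_model(&data);
}

/* Simulates one boot; returns the HugeTLB reservation result. */
static int boot_model(bool luo_before_hugetlb)
{
	int err;

	kho_ready = false;
	luo_ready = false;
	flb_data = NULL;

	kho_memory_init_model();
	if (luo_before_hugetlb) {
		liveupdate_init_model();
		err = hugetlb_reserve_model();
	} else {
		err = hugetlb_reserve_model();
		liveupdate_init_model();
	}
	return err;
}
```

In the real kernel, HugeTLB pages are reserved before mm_core_init(), which is why calling liveupdate_init() right after kho_memory_init() is still too late.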

Pasha

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-11 20:39 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRObz4bQzRHH5hJb@kernel.org>

> >       kho_memory_init();
> >
> > +     /* Live Update should follow right after KHO is initialized */
> > +     liveupdate_init();
> > +
>
> Why do you think it should be immediately after kho_memory_init()?
> Any reason this can't be called from start_kernel() or even later as an
> early_initcall() or core_initcall()?

Unfortunately, no, even here it is too late, and we might need to find
a way to move the kho_init/liveupdate_init earlier. We must be able to
preserve HugeTLB pages, and those are reserved earlier in boot.

Pasha

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-11 20:25 UTC
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-3-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:00PM -0500, Pasha Tatashin wrote:
> Integrate the LUO with the KHO framework to enable passing LUO state
> across a kexec reboot.
> 
> When LUO transitions to the "prepared" state, it tells KHO to
> finalize, so all memory segments that were added to the KHO preservation
> list get preserved. After the "prepared" state, no new segments
> can be preserved. If LUO is canceled, it also tells KHO to cancel the
> serialization, so that LUO can later go back into the prepared
> state.
> 
> This patch introduces the following changes:
> - During the KHO finalization phase, allocate an FDT blob.
> - Populate this FDT with a LUO compatibility string ("luo-v1").
> 
> LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> logic (`luo_do_*_calls`) remains unimplemented in this patch.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
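The handshake described in the commit message can be modeled as a tiny state machine. This is a userspace sketch, not the LUO implementation: the state names (LUO_MODEL_NORMAL, LUO_MODEL_PREPARED), struct handover_model, and the -EBUSY/-EINVAL returns are hypothetical. It shows that "prepared" implies KHO is finalized and no new segments can be added, and that cancel undoes finalization so LUO can be prepared again. The "luo-v1" compatible string is recorded as a plain field rather than written into a real FDT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the LUO/KHO handshake; names are hypothetical. */
enum luo_state_model { LUO_MODEL_NORMAL, LUO_MODEL_PREPARED };

struct handover_model {
	enum luo_state_model luo;
	bool kho_finalized;
	int nr_segments;            /* entries on the KHO preservation list */
	const char *fdt_compatible; /* set when the FDT blob is "allocated" */
};

/* Invariant from the commit message: KHO is finalized iff LUO is prepared. */
static void check_invariant(const struct handover_model *m)
{
	assert(m->kho_finalized == (m->luo == LUO_MODEL_PREPARED));
}

static int preserve_segment(struct handover_model *m)
{
	if (m->kho_finalized)
		return -EBUSY;  /* no new segments after "prepared" */
	m->nr_segments++;
	return 0;
}

static int luo_prepare(struct handover_model *m)
{
	if (m->luo != LUO_MODEL_NORMAL)
		return -EINVAL;
	/* LUO tells KHO to finalize; the FDT carries the compatible string. */
	m->kho_finalized = true;
	m->fdt_compatible = "luo-v1";
	m->luo = LUO_MODEL_PREPARED;
	check_invariant(m);
	return 0;
}

static int luo_cancel(struct handover_model *m)
{
	if (m->luo != LUO_MODEL_PREPARED)
		return -EINVAL;
	/* LUO tells KHO to cancel serialization, so it can prepare again. */
	m->kho_finalized = false;
	m->luo = LUO_MODEL_NORMAL;
	check_invariant(m);
	return 0;
}
```

A typical sequence: preserve one segment, prepare (further preservation fails), cancel, preserve again, and prepare a second time.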

...

> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index c6812b4dbb2e..20c850a52167 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -21,6 +21,7 @@
>  #include <linux/buffer_head.h>
>  #include <linux/kmemleak.h>
>  #include <linux/kfence.h>
> +#include <linux/liveupdate.h>
>  #include <linux/page_ext.h>
>  #include <linux/pti.h>
>  #include <linux/pgtable.h>
> @@ -2703,6 +2704,9 @@ void __init mm_core_init(void)
>  	 */
>  	kho_memory_init();
>  
> +	/* Live Update should follow right after KHO is initialized */
> +	liveupdate_init();
> +

Why do you think it should be immediately after kho_memory_init()?
Any reason this can't be called from start_kernel() or even later as an
early_initcall() or core_initcall()?

>  	memblock_free_all();
>  	mem_init();
>  	kmem_cache_init();
> -- 
> 2.51.2.1041.gc1ab5b90ca-goog
> 
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v5 05/22] liveupdate: kho: when live update add KHO image during kexec load
From: Mike Rapoport @ 2025-11-11 20:18 UTC
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bA=cQkibx4dSxJQTVxVxqkAsZPfFoPJip6rx8DqX62aEA@mail.gmail.com>

On Mon, Nov 10, 2025 at 10:31:23AM -0500, Pasha Tatashin wrote:
> On Mon, Nov 10, 2025 at 7:47 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Fri, Nov 07, 2025 at 04:03:03PM -0500, Pasha Tatashin wrote:
> > > In case KHO is driven from within the kernel via live update, finalize will
> > > always happen during reboot, so add the KHO image unconditionally.
> > >
> > > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > > ---
> > >  kernel/liveupdate/kexec_handover.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> > > index 9f0913e101be..b54ca665e005 100644
> > > --- a/kernel/liveupdate/kexec_handover.c
> > > +++ b/kernel/liveupdate/kexec_handover.c
> > > @@ -15,6 +15,7 @@
> > >  #include <linux/kexec_handover.h>
> > >  #include <linux/libfdt.h>
> > >  #include <linux/list.h>
> > > +#include <linux/liveupdate.h>
> > >  #include <linux/memblock.h>
> > >  #include <linux/page-isolation.h>
> > >  #include <linux/vmalloc.h>
> > > @@ -1489,7 +1490,7 @@ int kho_fill_kimage(struct kimage *image)
> > >       int err = 0;
> > >       struct kexec_buf scratch;
> > >
> > > -     if (!kho_out.finalized)
> > > +     if (!kho_out.finalized && !liveupdate_enabled())
> > >               return 0;
> >
> > This feels backwards, I don't think KHO should call liveupdate methods.
> 
> It is backward, but it is a requirement until KHO becomes stateless.
> LUO does not depend on userspace state about when kexec is
> loaded. In fact, the next kernel must be loaded before the brownout, as
> it is an expensive operation. The sequence of events should be:
> 
> 1. Load the next kernel in memory
> 2. Preserve resources via LUO
> 3. Do Kexec reboot

I believe that when my concerns about "[PATCH v5 02/22] liveupdate:
luo_core: integrate with KHO" [1] are resolved this patch won't be needed.

[1] https://lore.kernel.org/all/aROZi043lxtegqWE@kernel.org/
 
> Pasha

-- 
Sincerely yours,
Mike.
