* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Dave Chinner @ 2025-10-05 22:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Pavel Emelyanov, linux-fsdevel, Raphael S . Carvalho, linux-api,
linux-xfs
In-Reply-To: <aOCiCkFUOBWV_1yY@infradead.org>
On Fri, Oct 03, 2025 at 09:26:50PM -0700, Christoph Hellwig wrote:
> On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> > The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> > updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> > add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> > was suggested that this flag is propagated to a O_NOCMTIME one.
>
> skipping c/mtime is dangerous. The XFS handle code allows it to
> support HSM where data is migrated out to tape, and requires
> CAP_SYS_ADMIN. Allowing it for any file owner would expand the scope
> for too much as now everyone could skip timestamp updates.
>
> > It can be used by workloads that want to write a file but don't care
> > much about the preciese timestamp on it and can update it later with
> > utimens() call.
If you don't care about accurate c/mtime, then mount the filesystem
with '-o lazytime' to degrade c/mtime updates to "eventual
consistency" behaviour for IO operations. If inode metadata is
otherwise modified (e.g. block allocation during IO) or the
application then calls utimens(), it will update the recorded
in-memory timestamps in a persistent manner immediately.
> The workload might not care, the rest of the system does. ctime can't
> bet set to arbitrary values, so it is important for backups and as
> an audit trail.
But we can (and do) delay the persistence of IO-based timestamp
updates with the lazytime option.
> > There's another reason for having this patch. When performing AIO write,
> > the file_modified_flags() function checks whether or not to update inode
> > times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> > the check return EINTR error that quickly propagates into cb completion
> > without doing any IO. This restriction effectively prevents doing AIO
> > writes with nowait flag, as file modifications really imply time update.
>
> Well, we'll need to look into that, including maybe non-blockin
> timestamp updates.
Lazytime updates can generally be done in a non-blocking manner
right now (someone raised that in the context of io-uring on #xfs
about a month ago), but the NOWAIT behaviour for timestamp updates
is done at a higher level in the VFS and does not take into account
filesystem specific non-blocking lazytime updates at all. If we
push the NOWAIT checking behaviour down to the filesystem, we can do
this.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply
* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Andy Lutomirski @ 2025-10-04 16:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Pavel Emelyanov, linux-fsdevel, Raphael S . Carvalho, linux-api,
linux-xfs
In-Reply-To: <aOCiCkFUOBWV_1yY@infradead.org>
On Fri, Oct 3, 2025 at 9:26 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> > The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> > updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> > add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> > was suggested that this flag is propagated to a O_NOCMTIME one.
>
> skipping c/mtime is dangerous. The XFS handle code allows it to
> support HSM where data is migrated out to tape, and requires
> CAP_SYS_ADMIN. Allowing it for any file owner would expand the scope
> for too much as now everyone could skip timestamp updates.
>
> > It can be used by workloads that want to write a file but don't care
> > much about the preciese timestamp on it and can update it later with
> > utimens() call.
>
> The workload might not care, the rest of the system does. ctime can't
> bet set to arbitrary values, so it is important for backups and as
> an audit trail.
>
> > There's another reason for having this patch. When performing AIO write,
> > the file_modified_flags() function checks whether or not to update inode
> > times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> > the check return EINTR error that quickly propagates into cb completion
> > without doing any IO. This restriction effectively prevents doing AIO
> > writes with nowait flag, as file modifications really imply time update.
>
> Well, we'll need to look into that, including maybe non-blockin
> timestamp updates.
>
It's been 12 years (!), but maybe it's time to reconsider this:
https://lore.kernel.org/all/cover.1377193658.git.luto@amacapital.net/
Nothing has fundamentally changed since then, but I bet enough little
things (folios!) have changed around this series that it won't apply
without considerably massaging. I stopped working on it personally
because I moved the workload in question onto fast, fancy SSDs
resulting in my having bigger fish to fry. I don't think I'll have
the bandwidth to pick it up any time soon, but maybe one of you folks
is interested :) I never looked into the AIO path (I was interested
in the page_mkwrite path), but my series made it at least conceptually
possible to unconditionally mark the file as needing a cmtime update
when presently dirty data is written back, and I imagine that AIO
could use that too to avoid ever needing to bail out because an mtime
update would block.
To the extent that ctime is "important for backups", it's been *wrong*
for backups approximately forever -- one can read ctime, then read the
contents of a file, and get a new ctime and an old copy of the data
that preceeds the modification that logically triggered the ctime
value that was read.
--Andy
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply
* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Christoph Hellwig @ 2025-10-04 4:26 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: linux-fsdevel, Raphael S . Carvalho, linux-api, linux-xfs
In-Reply-To: <20251003093213.52624-1-xemul@scylladb.com>
On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> was suggested that this flag is propagated to a O_NOCMTIME one.
skipping c/mtime is dangerous. The XFS handle code allows it to
support HSM where data is migrated out to tape, and requires
CAP_SYS_ADMIN. Allowing it for any file owner would expand the scope
for too much as now everyone could skip timestamp updates.
> It can be used by workloads that want to write a file but don't care
> much about the preciese timestamp on it and can update it later with
> utimens() call.
The workload might not care, the rest of the system does. ctime can't
bet set to arbitrary values, so it is important for backups and as
an audit trail.
> There's another reason for having this patch. When performing AIO write,
> the file_modified_flags() function checks whether or not to update inode
> times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> the check return EINTR error that quickly propagates into cb completion
> without doing any IO. This restriction effectively prevents doing AIO
> writes with nowait flag, as file modifications really imply time update.
Well, we'll need to look into that, including maybe non-blockin
timestamp updates.
^ permalink raw reply
* Re: [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Pasha Tatashin @ 2025-10-04 2:37 UTC (permalink / raw)
To: Vipin Sharma
Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <CA+CK2bBuO5YaL8MNqb5Xo_us600vTe2SF_yMNU-O9D2_RBoMag@mail.gmail.com>
On Fri, Oct 3, 2025 at 10:07 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Fri, Oct 3, 2025 at 6:51 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On 2025-09-29 01:03:17, Pasha Tatashin wrote:
> > > diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> > > index af6e773cf98f..de7ca45d3892 100644
> > > --- a/tools/testing/selftests/liveupdate/.gitignore
> > > +++ b/tools/testing/selftests/liveupdate/.gitignore
> > > @@ -1 +1,2 @@
> > > /liveupdate
> > > +/luo_multi_kexec
> >
> > In next patches new tests are not added to gitignore.
>
> Will fix it, thanks.
>
> >
> > > diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> > > index 2a573c36016e..1cbc816ed5c5 100644
> > > --- a/tools/testing/selftests/liveupdate/Makefile
> > > +++ b/tools/testing/selftests/liveupdate/Makefile
> > > @@ -1,7 +1,38 @@
> > > # SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +KHDR_INCLUDES ?= -I../../../usr/include
> >
> > If make is run from the tools/testing/selftests/liveupdate directory, this
> > will not work because it needs one more "..".
> >
> > If this is built using selftest Makefile from root directory
> >
> > make -C tools/testing/selftests TARGETS=liveupdate
> >
> > there will not be build errors because tools/testing/selftests/Makefile
> > defines KHDR_INCLUDES, so above definition will never happen.
> >
> > > CFLAGS += -Wall -O2 -Wno-unused-function
> > > CFLAGS += $(KHDR_INCLUDES)
> > > +LDFLAGS += -static
> >
> > Why static? Can't we let user pass extra flags if they prefer static
>
> Because these tests are executed in a VM and not on the host, static
> makes sense to be able to run in a different environment.
>
> > > +
> > > +# --- Test Configuration (Edit this section when adding new tests) ---
> > > +LUO_SHARED_SRCS := luo_test_utils.c
> > > +LUO_SHARED_HDRS += luo_test_utils.h
> > > +
> > > +LUO_MANUAL_TESTS += luo_multi_kexec
> > > +
> > > +TEST_FILES += do_kexec.sh
> > >
> > > TEST_GEN_PROGS += liveupdate
> > >
> > > +# --- Automatic Rule Generation (Do not edit below) ---
> > > +
> > > +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> > > +
> > > +# Define the full list of sources for each manual test.
> > > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > > + $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> > > +
> > > +# This loop automatically generates an explicit build rule for each manual test.
> > > +# It includes dependencies on the shared headers and makes the output
> > > +# executable.
> > > +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> > > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > > + $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> > > + $(call msg,LINK,,$$@) ; \
> > > + $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> > > + $(Q)chmod +x $$@ \
> > > + ) \
> > > +)
> > > +
> > > include ../lib.mk
> >
> > make is not building LUO_MANUAL_TESTS, it is only building liveupdate.
> > How to build them?
>
> I am building them out of tree:
> make O=x86_64 -s -C tools/testing/selftests TARGETS=liveupdate install
> make O=x86_64 -s -C tools/testing/selftests TARGETS=kho install
Actually, I just tested in-tree and everything works for me, could you
please verify:
make mrproper # Clean the tree
cat tools/testing/selftests/liveupdate/config > .config # Copy LUO depends.
make olddefconfig # make a def config with LUO
make kvm_guest.config # Build minimal KVM guest with LUO
make headers # Make uAPI headers
make -C tools/testing/selftests TARGETS=liveupdate install # make and
install liveupdate selftests
# Show that self tests are properly installed:
ls -1 tools/testing/selftests/kselftest_install/liveupdate/
config
do_kexec.sh
liveupdate
luo_multi_file
luo_multi_kexec
luo_multi_session
luo_unreclaimed
Pasha
^ permalink raw reply
* Re: [PATCH v4 18/30] selftests/liveupdate: add subsystem/state tests
From: Pasha Tatashin @ 2025-10-04 2:08 UTC (permalink / raw)
To: Vipin Sharma
Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20251003231712.GA2144931.vipinsh@google.com>
On Fri, Oct 3, 2025 at 7:17 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On 2025-09-29 01:03:09, Pasha Tatashin wrote:
> > diff --git a/tools/testing/selftests/liveupdate/config b/tools/testing/selftests/liveupdate/config
> > new file mode 100644
> > index 000000000000..382c85b89570
> > --- /dev/null
> > +++ b/tools/testing/selftests/liveupdate/config
> > @@ -0,0 +1,6 @@
> > +CONFIG_KEXEC_FILE=y
> > +CONFIG_KEXEC_HANDOVER=y
> > +CONFIG_KEXEC_HANDOVER_DEBUG=y
> > +CONFIG_LIVEUPDATE=y
> > +CONFIG_LIVEUPDATE_SYSFS_API=y
>
> Where is this one?
I removed the v4 SYSFS interface, and this line is a leftover, I will fix it.
Thanks,
Pasha
>
^ permalink raw reply
* Re: [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Pasha Tatashin @ 2025-10-04 2:07 UTC (permalink / raw)
To: Vipin Sharma
Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20251003225120.GA2035091.vipinsh@google.com>
On Fri, Oct 3, 2025 at 6:51 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On 2025-09-29 01:03:17, Pasha Tatashin wrote:
> > diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> > index af6e773cf98f..de7ca45d3892 100644
> > --- a/tools/testing/selftests/liveupdate/.gitignore
> > +++ b/tools/testing/selftests/liveupdate/.gitignore
> > @@ -1 +1,2 @@
> > /liveupdate
> > +/luo_multi_kexec
>
> In next patches new tests are not added to gitignore.
Will fix it, thanks.
>
> > diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> > index 2a573c36016e..1cbc816ed5c5 100644
> > --- a/tools/testing/selftests/liveupdate/Makefile
> > +++ b/tools/testing/selftests/liveupdate/Makefile
> > @@ -1,7 +1,38 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> > +
> > +KHDR_INCLUDES ?= -I../../../usr/include
>
> If make is run from the tools/testing/selftests/liveupdate directory, this
> will not work because it needs one more "..".
>
> If this is built using selftest Makefile from root directory
>
> make -C tools/testing/selftests TARGETS=liveupdate
>
> there will not be build errors because tools/testing/selftests/Makefile
> defines KHDR_INCLUDES, so above definition will never happen.
>
> > CFLAGS += -Wall -O2 -Wno-unused-function
> > CFLAGS += $(KHDR_INCLUDES)
> > +LDFLAGS += -static
>
> Why static? Can't we let user pass extra flags if they prefer static
Because these tests are executed in a VM and not on the host, static
makes sense to be able to run in a different environment.
> > +
> > +# --- Test Configuration (Edit this section when adding new tests) ---
> > +LUO_SHARED_SRCS := luo_test_utils.c
> > +LUO_SHARED_HDRS += luo_test_utils.h
> > +
> > +LUO_MANUAL_TESTS += luo_multi_kexec
> > +
> > +TEST_FILES += do_kexec.sh
> >
> > TEST_GEN_PROGS += liveupdate
> >
> > +# --- Automatic Rule Generation (Do not edit below) ---
> > +
> > +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> > +
> > +# Define the full list of sources for each manual test.
> > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > + $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> > +
> > +# This loop automatically generates an explicit build rule for each manual test.
> > +# It includes dependencies on the shared headers and makes the output
> > +# executable.
> > +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > + $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> > + $(call msg,LINK,,$$@) ; \
> > + $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> > + $(Q)chmod +x $$@ \
> > + ) \
> > +)
> > +
> > include ../lib.mk
>
> make is not building LUO_MANUAL_TESTS, it is only building liveupdate.
> How to build them?
I am building them out of tree:
make O=x86_64 -s -C tools/testing/selftests TARGETS=liveupdate install
make O=x86_64 -s -C tools/testing/selftests TARGETS=kho install
And for me it worked, but I forgot to test with the normal make
options, thank you for reporting, and providing your fixes, I will
address them.
Pasha
^ permalink raw reply
* Re: [PATCH v4 18/30] selftests/liveupdate: add subsystem/state tests
From: Vipin Sharma @ 2025-10-03 23:17 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-19-pasha.tatashin@soleen.com>
On 2025-09-29 01:03:09, Pasha Tatashin wrote:
> diff --git a/tools/testing/selftests/liveupdate/config b/tools/testing/selftests/liveupdate/config
> new file mode 100644
> index 000000000000..382c85b89570
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/config
> @@ -0,0 +1,6 @@
> +CONFIG_KEXEC_FILE=y
> +CONFIG_KEXEC_HANDOVER=y
> +CONFIG_KEXEC_HANDOVER_DEBUG=y
> +CONFIG_LIVEUPDATE=y
> +CONFIG_LIVEUPDATE_SYSFS_API=y
Where is this one?
^ permalink raw reply
* Re: [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Vipin Sharma @ 2025-10-03 22:51 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-27-pasha.tatashin@soleen.com>
On 2025-09-29 01:03:17, Pasha Tatashin wrote:
> diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> index af6e773cf98f..de7ca45d3892 100644
> --- a/tools/testing/selftests/liveupdate/.gitignore
> +++ b/tools/testing/selftests/liveupdate/.gitignore
> @@ -1 +1,2 @@
> /liveupdate
> +/luo_multi_kexec
In next patches new tests are not added to gitignore.
> diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> index 2a573c36016e..1cbc816ed5c5 100644
> --- a/tools/testing/selftests/liveupdate/Makefile
> +++ b/tools/testing/selftests/liveupdate/Makefile
> @@ -1,7 +1,38 @@
> # SPDX-License-Identifier: GPL-2.0-only
> +
> +KHDR_INCLUDES ?= -I../../../usr/include
If make is run from the tools/testing/selftests/liveupdate directory, this
will not work because it needs one more "..".
If this is built using selftest Makefile from root directory
make -C tools/testing/selftests TARGETS=liveupdate
there will not be build errors because tools/testing/selftests/Makefile
defines KHDR_INCLUDES, so above definition will never happen.
> CFLAGS += -Wall -O2 -Wno-unused-function
> CFLAGS += $(KHDR_INCLUDES)
> +LDFLAGS += -static
Why static? Can't we let user pass extra flags if they prefer static
> +
> +# --- Test Configuration (Edit this section when adding new tests) ---
> +LUO_SHARED_SRCS := luo_test_utils.c
> +LUO_SHARED_HDRS += luo_test_utils.h
> +
> +LUO_MANUAL_TESTS += luo_multi_kexec
> +
> +TEST_FILES += do_kexec.sh
>
> TEST_GEN_PROGS += liveupdate
>
> +# --- Automatic Rule Generation (Do not edit below) ---
> +
> +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> +
> +# Define the full list of sources for each manual test.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> + $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> +
> +# This loop automatically generates an explicit build rule for each manual test.
> +# It includes dependencies on the shared headers and makes the output
> +# executable.
> +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> + $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> + $(call msg,LINK,,$$@) ; \
> + $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> + $(Q)chmod +x $$@ \
> + ) \
> +)
> +
> include ../lib.mk
make is not building LUO_MANUAL_TESTS, it is only building liveupdate.
How to build them?
I ended up making bunch of changes in the Makefile to fix these issues.
Following is the diff (it is based on last patch of the series). It
allows in-tree build, out-of-tree build, and build other tests as well.
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 25a6dec790bb..fbcacbd1b798 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -1,10 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-KHDR_INCLUDES ?= -I../../../usr/include
-CFLAGS += -Wall -O2 -Wno-unused-function
-CFLAGS += $(KHDR_INCLUDES)
-LDFLAGS += -static
-
# --- Test Configuration (Edit this section when adding new tests) ---
LUO_SHARED_SRCS := luo_test_utils.c
LUO_SHARED_HDRS += luo_test_utils.h
@@ -25,6 +20,12 @@ TEST_GEN_PROGS := $(LUO_MAIN_TESTS)
liveupdate_SOURCES := liveupdate.c $(LUO_SHARED_SRCS)
+include ../lib.mk
+
+CFLAGS += -Wall -O2 -Wno-unused-function
+CFLAGS += $(KHDR_INCLUDES)
+LDFLAGS += -static
+
$(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
$(call msg,LINK,,$@)
$(Q)$(LINK.c) $^ $(LDLIBS) -o $@
@@ -33,16 +34,16 @@ $(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
$(foreach test,$(LUO_MANUAL_TESTS), \
$(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
+define BUILD_RULE_TEMPLATE
+$(OUTPUT)/$(1): $($(1)_SOURCES) $(LUO_SHARED_HDRS)
+ $(call msg,LINK,,$$@)
+ $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@
+ $(Q)chmod +x $$@
+endef
# This loop automatically generates an explicit build rule for each manual test.
# It includes dependencies on the shared headers and makes the output
# executable.
# Note the use of '$$' to escape automatic variables for the 'eval' command.
$(foreach test,$(LUO_MANUAL_TESTS), \
- $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
- $(call msg,LINK,,$$@) ; \
- $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
- $(Q)chmod +x $$@ \
- ) \
+ $(eval $(call BUILD_RULE_TEMPLATE,$(test))) \
)
-
-include ../lib.mk
^ permalink raw reply related
* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Aleksa Sarai @ 2025-10-03 4:22 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Askar Safin, brauner, dhowells, g.branden.robinson, jack,
linux-api, linux-fsdevel, linux-kernel, linux-man, mtk.manpages,
viro
In-Reply-To: <5ukckeqipdkz6aigdy7rmtsmy5zav5x4rw2hrgbxiwfflrcmgb@jy7yr34cwyat>
[-- Attachment #1: Type: text/plain, Size: 2178 bytes --]
On 2025-10-01, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
>
> On Wed, Oct 01, 2025 at 05:35:45PM +1000, Aleksa Sarai wrote:
> > On 2025-10-01, Askar Safin <safinaskar@gmail.com> wrote:
> > > Aleksa Sarai <cyphar@cyphar.com>:
> > > > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > > + &attr, sizeof(attr));
> > >
> > > Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> > > calls. :)
> > >
> > > I think you meant open_tree_attr here.
> >
> > Oops.
> >
> > >
> > > > +\&
> > > > +/* Create a new copy with the id-mapping cleared */
> > > > +memset(&attr, 0, sizeof(attr));
> > > > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > > > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > > + &attr, sizeof(attr));
> > >
> > > And here.
> >
> > Oops x2.
> >
> > > Otherwise your whole patchset looks good. Add to whole patchset:
> > > Reviewed-by: Askar Safin <safinaskar@gmail.com>
>
> I've applied the patch, with the following amendment:
>
> diff --git i/man/man2/open_tree.2 w/man/man2/open_tree.2
> index 8b48f3b78..f6f2fbecd 100644
> --- i/man/man2/open_tree.2
> +++ w/man/man2/open_tree.2
> @@ -683,14 +683,14 @@ .SS open_tree_attr()
> .\" Using .attr_clr is not strictly necessary but makes the intent clearer.
> attr.attr_set = MOUNT_ATTR_IDMAP;
> attr.userns_fd = nsfd2;
> -mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> - &attr, sizeof(attr));
> +mntfd2 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
> + &attr, sizeof(attr));
> \&
> /* Create a new copy with the id-mapping cleared */
> memset(&attr, 0, sizeof(attr));
> attr.attr_clr = MOUNT_ATTR_IDMAP;
> -mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> - &attr, sizeof(attr));
> +mntfd3 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
> + &attr, sizeof(attr));
> .EE
> .in
> .P
>
>
> (Hopefully I got it right.)
That looks correct -- thanks!
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]
^ permalink raw reply
* Re: [PATCH v5 0/8] man2: document "new" mount API
From: Alejandro Colomar @ 2025-10-01 18:20 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
[-- Attachment #1: Type: text/plain, Size: 10437 bytes --]
Hi Aleksa, Askar,
On Thu, Sep 25, 2025 at 01:31:22AM +1000, Aleksa Sarai wrote:
> Back in 2019, the new mount API was merged[1]. David Howells then set
> about writing man pages for these new APIs, and sent some patches back
> in 2020[2].
>
[...]
>
> In addition, I have also included a man page for open_tree_attr(2) (as a
> subsection of the new open_tree(2) man page), which was merged in Linux
> 6.15.
>
> [1]: https://lore.kernel.org/all/20190507204921.GL23075@ZenIV.linux.org.uk/
> [2]: https://lore.kernel.org/linux-man/159680892602.29015.6551860260436544999.stgit@warthog.procyon.org.uk/
> [3]: https://github.com/brauner/man-pages-md
>
> Co-authored-by: David Howells <dhowells@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Co-authored-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The full patch set has been merged now. I've done a merge commit where
I've pasted this cover letter, and amended it so that Aleksa is the
author of the merge commit. I've also included Askar's Reviewed-by tag
in the merge commit itself.
I'll have it in a separate branch for a few days, in case I need to fix
anything. You can check it here:
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?h=fs>
I editorialized the titles, but other than that, I didn't do much.
I think I mentioned most of the changes in replies to each patch.
Thanks a lot for your contributions!
Have a lovely night!
Alex
> ---
> Changes in v5:
> - `sed -i s|file descriptor based|file-descriptor-based|`.
> [Alejandro Colomar]
> - fsconfig(2): use bullets instead of ordered list for workflow
> description. [Alejandro Colomar]
> - mount_setattr(2): fix minor wording nit in new attribute-parameter
> subsection.
> - fsopen(2): remove brackets around "message" for message retrieval
> interface description. [Alejandro Colomar]
> - {move_mount,fspick}(2): fix remaining incorrect no-automount text.
> [Askar Safin]
> - {fsmount,open_tree}(2): `sed -i s|MOUNT_DETACH|MNT_DETACH|g`.
> [Askar Safin]
> - mount_setattr(2): fix copy-paste snafu in attribute-parameter
> subsection. [Askar Safin]
> - *: clean `make -R build-catman-troff`. [Alejandro Colomar]
> - *: switch to \[em]\c where appropriate.
> - open_tree(2): clean up MNT_DETACH-on-close description and make it
> slightly more prominent. [Alejandro Colomar]
> - open_tree(2): mention the distinction from open(O_PATH) with regards
> to automounts. Askar suggested it be put in the section about
> ~OPEN_TREE_CLONE, but the change in behaviour also applies to
> OPEN_TREE_CLONE and it looked awkward to include it in the
> dentry_open() case because O_PATH only gets mentioned in the following
> paragraph (where I've put the text now). [Askar Safin]
> - {move_mount,open_tree{,_attr}}(2): fix column-width-related "make -R
> check" failures.
> - *: fix remaining "make -R lint" failures.
> - open_tree_attr(2): add example using MOUNT_ATTR_IDMAP.
> - v4: <https://lore.kernel.org/r/20250919-new-mount-api-v4-0-1261201ab562@cyphar.com>
>
> Changes in v4:
> - `sed -i s|\\% |\\%|g`.
> - Remove unneeded quotes in SYNOPSIS. [Alejandro Colomar]
> - open_tree(2): fix leftover confusing usages of "attach" when referring
> to file descriptors being associated with mount objects.
> - open_tree(2): rename "Anonymous mount namespaces" NOTES subsection to
> the far more informative "Mount propagation" and clean up the wording
> a little.
> - open_tree_attr(2): add a code comment about
> <https://lore.kernel.org/all/20250808-open_tree_attr-bugfix-idmap-v1-0-0ec7bc05646c@cyphar.com/>
> - {fsconfig,open_tree_attr}(2): use _Nullable.
> - {fsmount,open_tree}(2): mention the the unmount-on-close behaviour is
> actually lazy (a-la MNT_DETACH).
> - {fsconfig,mount_setattr}(2): improve "mount attributes and filesystem
> parameters" wording to make it clearer that superblock and mount flags
> are sibling properties, not the same thing.
> - open_tree(2): mention that any mount propagation events while the
> mount object is detached are completely lost -- i.e., they don't get
> replayed once you attach the mount somewhere.
> - fsconfig(2): fix minor grammatical / missing joining word issues.
> - fsconfig(2): fix final leftover `.IR A " and " B` cases.
> - fsconfig(2): explain that failed fsconfig(FSCONFIG_CMD_*) operations
> render the filesystem context invalid.
> - fsconfig(2): rework the description of superblock reuse, as the
> previous text was very wrong. (Though there has been discussion about
> changing this behaviour...)
> - fsconfig(2): remove misleading wording in FSCONFIG_CMD_CREATE_EXCL
> about how we are requesting a new filesystem instance -- in theory
> filesystems could take this request into account but in practice none
> do (and it seems unlikely any ever will).
> - fsconfig(2): mention that key, value, and aux must be 0 or NULL for
> FSCONFIG_CMD_RECONF.
> - fsmount(2): fix usage of "filesystem instance" in relation to
> fsmount() and open_tree() comparison. [Askar Safin]
> - move_mount(2): "as attached" -> "as a detached" [Askar Safin]
> - fspick(2): add note about filesystem parameter list being copied
> rather than reset with FSCONFIG_CMD_RECONFIGURE. [Askar Safin]
> - v3: <https://lore.kernel.org/r/20250809-new-mount-api-v3-0-f61405c80f34@cyphar.com>
>
> Changes in v3:
> - `sed -i s|Co-developed-by|Co-authored-by|g`. [Alejandro Colomar]
> - Add Signed-off-by for co-authors. [Christian Brauner]
> - `sed -i s|needs-mount|awaiting-mount|g`, to match the kernel parlance.
> - Fix VERSIONS/HISTORY mixup in mount_attr(2type) that was copied from
> open_how(2type). [Alejandro Colomar]
> - Fix incorrect .BR usage in SYNOPSIS.
> - Some more semantic newlines fixes. [Alejandro Colomar]
> - Minor fixes suggested by Alejandro. [Alejandro Colomar]
> - open_tree_attr(2): heavily reword everything to be better formatted
> and more explicit about its behaviour.
> - open_tree(2): write proper explanatory paragraphs for the EXAMPLES.
> - mount_setattr(2): fix stray doublequote in SYNOPSIS. [Askar Safin]
> - fsopen(2): rework structure of the DESCRIPTION introduction.
> - fsopen(2): explicitly say that read(2) errors in the message retrieval
> interface are actual errors, not return 0. [Askar Safin]
> - fsopen(2): add BUGS section to describe the unfortunate -ENODATA
> message dropping behaviour that should be fixed by
> <https://lore.kernel.org/r/20250807-fscontext-log-cleanups-v3-0-8d91d6242dc3@cyphar.com/>.
> - fsconfig(2): add a NOTES subsection about generic filesystem
> parameters.
> - fsconfig(2): add comment about the weirdness surrounding
> FSCONFIG_SET_PATH.
> - {fspick,open_tree}(2): Correct AT_NO_AUTOMOUNT description (copied
> from David, who probably copied it from statx(2)) -- AT_NO_AUTOMOUNT
> applies to all path components, not just the final one. [Christian
> Brauner]
> - statx(2): fix AT_NO_AUTOMOUNT documentation.
> - open_tree(2): swap open(2) reference for openat(2) when saying that
> the result is identical. [Askar Safin]
> - fsmount(2): fix DESCRIPTION introduction, and rework attr_flags
> description to better reference mount_setattr(2).
> - {fsopen,fspick,fsmount,open_tree}(2): don't use "attach" when talking
> about the file descriptors we return that reference in-kernel objects,
> to avoid confusing readers with mount object attachment status.
> - fsconfig(2): remove pidns argument example, as it was kind of unclear
> and referenced kernel features not yet merged.
> - fsconfig(2): remove rambling FSCONFIG_SET_PATH_EMPTY text (which
> mostly describes an academic issue that doesn't apply to any existing
> filesystem), and instead add a CAVEATS section which touches on the
> weird type behaviour of fsconfig(2).
> - v2: <https://lore.kernel.org/r/20250807-new-mount-api-v2-0-558a27b8068c@cyphar.com>
>
> Changes in v2:
> - `make -R lint-man`. [Alejandro Colomar]
> - `sed -i s|Glibc|glibc|g`. [Alejandro Colomar]
> - `sed -i s|pathname|path|g` [Alejandro Colomar]
> - Clean up macro usage, example code, and synopsis. [Alejandro Colomar]
> - Try to use semantic newlines. [Alejandro Colomar]
> - Make sure the usage of "filesystem context", "filesystem instance",
> and "mount object" are consistent. [Askar Safin]
> - Avoid referring to these syscalls without an "at" suffix as "*at()
> syscalls". [Askar Safin]
> - Use \% to avoid hyphenation of constants. [Askar Safin, G. Branden Robinson]
> - Add a new subsection to mount_setattr(2) to describe the distinction
> between mount attributes and filesystem parameters.
> - (Under protest) double-space-after-period formatted commit messages.
> - v1: <https://lore.kernel.org/r/20250806-new-mount-api-v1-0-8678f56c6ee0@cyphar.com>
>
> ---
> Aleksa Sarai (8):
> man/man2/fsopen.2: document "new" mount API
> man/man2/fspick.2: document "new" mount API
> man/man2/fsconfig.2: document "new" mount API
> man/man2/fsmount.2: document "new" mount API
> man/man2/move_mount.2: document "new" mount API
> man/man2/open_tree.2: document "new" mount API
> man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
> man/man2/{fsconfig,mount_setattr}.2: add note about attribute-parameter distinction
>
> man/man2/fsconfig.2 | 741 ++++++++++++++++++++++++++++++++++++++++++++++
> man/man2/fsmount.2 | 231 +++++++++++++++
> man/man2/fsopen.2 | 385 ++++++++++++++++++++++++
> man/man2/fspick.2 | 343 +++++++++++++++++++++
> man/man2/mount_setattr.2 | 39 +++
> man/man2/move_mount.2 | 646 ++++++++++++++++++++++++++++++++++++++++
> man/man2/open_tree.2 | 709 ++++++++++++++++++++++++++++++++++++++++++++
> man/man2/open_tree_attr.2 | 1 +
> 8 files changed, 3095 insertions(+)
> ---
> base-commit: f17990c243eafc1891ff692f90b6ce42e6449be8
> change-id: 20250802-new-mount-api-436db984f432
>
>
> Kind regards,
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/
>
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Alejandro Colomar @ 2025-10-01 18:02 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Askar Safin, brauner, dhowells, g.branden.robinson, jack,
linux-api, linux-fsdevel, linux-kernel, linux-man, mtk.manpages,
viro
In-Reply-To: <2025-10-01-brawny-bronze-taste-mounds-zp8G2b@cyphar.com>
[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]
Hi Aleksa,
On Wed, Oct 01, 2025 at 05:35:45PM +1000, Aleksa Sarai wrote:
> On 2025-10-01, Askar Safin <safinaskar@gmail.com> wrote:
> > Aleksa Sarai <cyphar@cyphar.com>:
> > > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > + &attr, sizeof(attr));
> >
> > Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> > calls. :)
> >
> > I think you meant open_tree_attr here.
>
> Oops.
>
> >
> > > +\&
> > > +/* Create a new copy with the id-mapping cleared */
> > > +memset(&attr, 0, sizeof(attr));
> > > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > + &attr, sizeof(attr));
> >
> > And here.
>
> Oops x2.
>
> > Otherwise your whole patchset looks good. Add to whole patchset:
> > Reviewed-by: Askar Safin <safinaskar@gmail.com>
I've applied the patch, with the following amendment:
diff --git i/man/man2/open_tree.2 w/man/man2/open_tree.2
index 8b48f3b78..f6f2fbecd 100644
--- i/man/man2/open_tree.2
+++ w/man/man2/open_tree.2
@@ -683,14 +683,14 @@ .SS open_tree_attr()
.\" Using .attr_clr is not strictly necessary but makes the intent clearer.
attr.attr_set = MOUNT_ATTR_IDMAP;
attr.userns_fd = nsfd2;
-mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
- &attr, sizeof(attr));
+mntfd2 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
+ &attr, sizeof(attr));
\&
/* Create a new copy with the id-mapping cleared */
memset(&attr, 0, sizeof(attr));
attr.attr_clr = MOUNT_ATTR_IDMAP;
-mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
- &attr, sizeof(attr));
+mntfd3 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
+ &attr, sizeof(attr));
.EE
.in
.P
(Hopefully I got it right.)
Cheers,
Alex
>
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v5 6/8] man/man2/open_tree.2: document "new" mount API
From: Alejandro Colomar @ 2025-10-01 17:59 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
In-Reply-To: <20250925-new-mount-api-v5-6-028fb88023f2@cyphar.com>
[-- Attachment #1: Type: text/plain, Size: 14015 bytes --]
Hi Aleksa,
On Thu, Sep 25, 2025 at 01:31:28AM +1000, Aleksa Sarai wrote:
> This is loosely based on the original documentation written by David
> Howells and later maintained by Christian Brauner, but has been
> rewritten to be more from a user perspective (as well as fixing a few
> critical mistakes).
>
> Co-authored-by: David Howells <dhowells@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Co-authored-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
Patch applied. Thanks!
Have a lovely night!
Alex
> man/man2/open_tree.2 | 518 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 518 insertions(+)
>
> diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
> new file mode 100644
> index 0000000000000000000000000000000000000000..6b04a80927a8b6a394cf7ab341b8d6b29d42d304
> --- /dev/null
> +++ b/man/man2/open_tree.2
> @@ -0,0 +1,518 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH open_tree 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +open_tree \- open path or create detached mount object and attach to fd
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.BR "#define _GNU_SOURCE " "/* See feature_test_macros(7) */"
> +.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */"
> +.B #include <sys/mount.h>
> +.P
> +.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags );
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR open_tree ()
> +system call is part of
> +the suite of file-descriptor-based mount facilities in Linux.
> +.IP \[bu] 3
> +If
> +.I flags
> +contains
> +.BR \%OPEN_TREE_CLONE ,
> +.BR open_tree ()
> +creates a detached mount object
> +which consists of a bind-mount of
> +the path specified by the
> +.IR path .
> +A new file descriptor
> +associated with the detached mount object
> +is then returned.
> +The mount object is equivalent to a bind-mount
> +that would be created by
> +.BR mount (2)
> +called with
> +.BR \%MS_BIND ,
> +except that it is tied to a file descriptor
> +and is not mounted onto the filesystem.
> +.IP
> +As with file descriptors returned from
> +.BR fsmount (2),
> +the resultant file descriptor can then be used with
> +.BR move_mount (2),
> +.BR mount_setattr (2),
> +or other such system calls to do further mount operations.
> +.IP
> +This mount object will be unmounted and destroyed
> +when the file descriptor is closed
> +if it was not otherwise attached to a mount point
> +by calling
> +.BR move_mount (2).
> +This implicit unmount operation is lazy\[em]\c
> +akin to calling
> +.BR umount2 (2)
> +with
> +.BR \%MNT_DETACH ;
> +thus,
> +any existing open references to files
> +from the mount object
> +will continue to work,
> +and the mount object will only be completely destroyed
> +once it ceases to be busy.
> +.IP \[bu]
> +If
> +.I flags
> +does not contain
> +.BR \%OPEN_TREE_CLONE ,
> +.BR open_tree ()
> +returns a file descriptor
> +that is exactly equivalent to
> +one produced by
> +.BR openat (2)
> +when called with the same
> +.I dirfd
> +and
> +.IR path .
> +.P
> +In either case, the resultant file descriptor
> +acts the same as one produced by
> +.BR open (2)
> +with
> +.BR O_PATH ,
> +meaning it can also be used as a
> +.I dirfd
> +argument to
> +"*at()" system calls.
> +However,
> +unlike
> +.BR open (2)
> +called with
> +.BR O_PATH ,
> +automounts will
> +by default
> +be triggered by
> +.BR open_tree ()
> +unless
> +.B \%AT_NO_AUTOMOUNT
> +is included in
> +.IR flags .
> +.P
> +As with "*at()" system calls,
> +.BR open_tree ()
> +uses the
> +.I dirfd
> +argument in conjunction with the
> +.I path
> +argument to determine the path to operate on, as follows:
> +.IP \[bu] 3
> +If the pathname given in
> +.I path
> +is absolute, then
> +.I dirfd
> +is ignored.
> +.IP \[bu]
> +If the pathname given in
> +.I path
> +is relative and
> +.I dirfd
> +is the special value
> +.BR \%AT_FDCWD ,
> +then
> +.I path
> +is interpreted relative to
> +the current working directory
> +of the calling process (like
> +.BR open (2)).
> +.IP \[bu]
> +If the pathname given in
> +.I path
> +is relative,
> +then it is interpreted relative to
> +the directory referred to by the file descriptor
> +.I dirfd
> +(rather than relative to
> +the current working directory
> +of the calling process,
> +as is done by
> +.BR open (2)
> +for a relative pathname).
> +In this case,
> +.I dirfd
> +must be a directory
> +that was opened for reading
> +.RB ( \%O_RDONLY )
> +or using the
> +.B O_PATH
> +flag.
> +.IP \[bu]
> +If
> +.I path
> +is an empty string,
> +and
> +.I flags
> +contains
> +.BR \%AT_EMPTY_PATH ,
> +then the file descriptor
> +.I dirfd
> +is operated on directly.
> +In this case,
> +.I dirfd
> +may refer to any type of file,
> +not just a directory.
> +.P
> +See
> +.BR openat (2)
> +for an explanation of why the
> +.I dirfd
> +argument is useful.
> +.P
> +.I flags
> +can be used to control aspects of the path lookup
> +and properties of the returned file descriptor.
> +A value for
> +.I flags
> +is constructed by bitwise ORing
> +zero or more of the following constants:
> +.RS
> +.TP
> +.B \%AT_EMPTY_PATH
> +If
> +.I path
> +is an empty string, operate on the file referred to by
> +.I dirfd
> +(which may have been obtained from
> +.BR open (2),
> +.BR fsmount (2),
> +or from another
> +.BR open_tree ()
> +call).
> +In this case,
> +.I dirfd
> +may refer to any type of file, not just a directory.
> +If
> +.I dirfd
> +is
> +.BR \%AT_FDCWD ,
> +.BR open_tree ()
> +will operate on the current working directory
> +of the calling process.
> +This flag is Linux-specific;
> +define
> +.B \%_GNU_SOURCE
> +to obtain its definition.
> +.TP
> +.B \%AT_NO_AUTOMOUNT
> +Do not automount the terminal ("basename") component of
> +.I path
> +if it is a directory that is an automount point.
> +This allows you to create a handle to the automount point itself,
> +rather than the location it would mount.
> +This flag has no effect if the mount point has already been mounted over.
> +This flag is Linux-specific;
> +define
> +.B \%_GNU_SOURCE
> +to obtain its definition.
> +.TP
> +.B \%AT_SYMLINK_NOFOLLOW
> +If
> +.I path
> +is a symbolic link, do not dereference it;
> +instead,
> +create either a handle to the link itself
> +or a bind-mount of it.
> +The resultant file descriptor is indistinguishable from one produced by
> +.BR openat (2)
> +with
> +.BR \%O_PATH | O_NOFOLLLOW .
> +.TP
> +.B \%OPEN_TREE_CLOEXEC
> +Set the close-on-exec
> +.RB ( FD_CLOEXEC )
> +flag on the new file descriptor.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +for reasons why this may be useful.
> +.TP
> +.B \%OPEN_TREE_CLONE
> +Rather than creating an
> +.BR openat (2)-style
> +.B O_PATH
> +file descriptor,
> +create a bind-mount of
> +.I path
> +(akin to
> +.IR \%mount\~\-\-bind )
> +as a detached mount object.
> +In order to do this operation,
> +the calling process must have the
> +.B \%CAP_SYS_ADMIN
> +capability.
> +.TP
> +.B \%AT_RECURSIVE
> +Create a recursive bind-mount of the path
> +(akin to
> +.IR \%mount\~\-\-rbind )
> +as a detached mount object.
> +This flag is only permitted in conjunction with
> +.BR \%OPEN_TREE_CLONE .
> +.SH RETURN VALUE
> +On success, a new file descriptor is returned.
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EACCES
> +Search permission is denied for one of the directories
> +in the path prefix of
> +.IR path .
> +(See also
> +.BR path_resolution (7).)
> +.TP
> +.B EBADF
> +.I path
> +is relative but
> +.I dirfd
> +is neither
> +.B \%AT_FDCWD
> +nor a valid file descriptor.
> +.TP
> +.B EFAULT
> +.I path
> +is NULL
> +or a pointer to a location
> +outside the calling process's accessible address space.
> +.TP
> +.B EINVAL
> +Invalid flag specified in
> +.IR flags .
> +.TP
> +.B ELOOP
> +Too many symbolic links encountered when resolving
> +.IR path .
> +.TP
> +.B EMFILE
> +The calling process has too many open files to create more.
> +.TP
> +.B ENAMETOOLONG
> +.I path
> +is longer than
> +.BR PATH_MAX .
> +.TP
> +.B ENFILE
> +The system has too many open files to create more.
> +.TP
> +.B ENOENT
> +A component of
> +.I path
> +does not exist, or is a dangling symbolic link.
> +.TP
> +.B ENOENT
> +.I path
> +is an empty string, but
> +.B AT_EMPTY_PATH
> +is not specified in
> +.IR flags .
> +.TP
> +.B ENOTDIR
> +A component of the path prefix of
> +.I path
> +is not a directory, or
> +.I path
> +is relative and
> +.I dirfd
> +is a file descriptor referring to a file other than a directory.
> +.TP
> +.B ENOSPC
> +The "anonymous" mount namespace
> +necessary to contain the
> +.B \%OPEN_TREE_CLONE
> +detached bind-mount mount object
> +could not be allocated,
> +as doing so would exceed
> +the configured per-user limit on
> +the number of mount namespaces in the current user namespace.
> +(See also
> +.BR namespaces (7).)
> +.TP
> +.B ENOMEM
> +The kernel could not allocate sufficient memory to complete the operation.
> +.TP
> +.B EPERM
> +.I flags
> +contains
> +.B \%OPEN_TREE_CLONE
> +but the calling process does not have the required
> +.B CAP_SYS_ADMIN
> +capability.
> +.SH STANDARDS
> +Linux.
> +.SH HISTORY
> +Linux 5.2.
> +.\" commit a07b20004793d8926f78d63eb5980559f7813404
> +.\" commit 400913252d09f9cfb8cce33daee43167921fc343
> +glibc 2.36.
> +.SH NOTES
> +.SS Mount propagation
> +The bind-mount mount objects created by
> +.BR open_tree ()
> +with
> +.B \%OPEN_TREE_CLONE
> +are not associated with
> +the mount namespace of the calling process.
> +Instead, each mount object is placed
> +in a newly allocated "anonymous" mount namespace
> +associated with the calling process.
> +.P
> +One of the side-effects of this is that
> +(unlike bind-mounts created with
> +.BR mount (2)),
> +mount propagation
> +(as described in
> +.BR mount_namespaces (7))
> +will not be applied to bind-mounts created by
> +.BR open_tree ()
> +until the bind-mount is attached with
> +.BR move_mount (2),
> +at which point the mount object
> +will be associated with the mount namespace
> +where it was attached
> +and mount propagation will resume.
> +Note that any mount propagation events that occurred
> +before the mount object was attached
> +will
> +.I not
> +be propagated to the mount object,
> +even after it is attached.
> +.SH EXAMPLES
> +The following examples show how
> +.BR open_tree ()
> +can be used in place of more traditional
> +.BR mount (2)
> +calls with
> +.BR MS_BIND .
> +.P
> +.in +4n
> +.EX
> +int srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE);
> +move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +First,
> +a detached bind-mount mount object of
> +.I /var
> +is created
> +and associated with the file descriptor
> +.IR srcfd .
> +Then, the mount object is attached to
> +.I /mnt
> +using
> +.BR move_mount (2)
> +with
> +.B \%MOVE_MOUNT_F_EMPTY_PATH
> +to request that the detached mount object
> +associated with the file descriptor
> +.I srcfd
> +be moved (and thus attached) to
> +.IR /mnt .
> +.P
> +The above procedure is functionally equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/var", "/mnt", NULL, MS_BIND, NULL);
> +.EE
> +.in
> +.P
> +.B \%OPEN_TREE_CLONE
> +can be combined with
> +.B \%AT_RECURSIVE
> +to create recursive detached bind-mount mount objects,
> +which in turn can be attached to mount points
> +to create recursive bind-mounts.
> +.P
> +.in +4n
> +.EX
> +int srcfd = open_tree(AT_FDCWD, "/var",
> + OPEN_TREE_CLONE | AT_RECURSIVE);
> +move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +The above procedure is functionally equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/var", "/mnt", NULL, MS_BIND | MS_REC, NULL);
> +.EE
> +.in
> +.P
> +One of the primary benefits of using
> +.BR open_tree ()
> +and
> +.BR move_mount (2)
> +over the traditional
> +.BR mount (2)
> +is that operating with
> +.IR dirfd -style
> +file descriptors is far easier and more intuitive.
> +.P
> +.in +4n
> +.EX
> +int srcfd = open_tree(100, "", AT_EMPTY_PATH | OPEN_TREE_CLONE);
> +move_mount(srcfd, "", 200, "foo", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +The above procedure is roughly equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/proc/self/fd/100",
> + "/proc/self/fd/200/foo",
> + NULL, MS_BIND, NULL);
> +.EE
> +.in
> +.P
> +In addition, you can use the file descriptor returned by
> +.BR open_tree ()
> +as the
> +.I dirfd
> +argument to any "*at()" system calls:
> +.P
> +.in +4n
> +.EX
> +int dirfd, fd;
> +\&
> +dirfd = open_tree(AT_FDCWD, "/etc", OPEN_TREE_CLONE);
> +fd = openat(dirfd, "passwd", O_RDONLY);
> +fchmodat(dirfd, "shadow", 0000, 0);
> +close(dirfd);
> +close(fd);
> +/* The bind-mount is now destroyed */
> +.EE
> +.in
> +.SH SEE ALSO
> +.BR fsconfig (2),
> +.BR fsmount (2),
> +.BR fsopen (2),
> +.BR fspick (2),
> +.BR mount (2),
> +.BR mount_setattr (2),
> +.BR move_mount (2),
> +.BR mount_namespaces (7)
>
> --
> 2.51.0
>
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Aleksa Sarai @ 2025-10-01 7:37 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Askar Safin, brauner, dhowells, g.branden.robinson, jack,
linux-api, linux-fsdevel, linux-kernel, linux-man, mtk.manpages,
viro
In-Reply-To: <ugko3x7tuqrmbyb326aw3dvtvmdozvtps6hc6ff3lmtsijoube@aem2acyk6t2q>
[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]
On 2025-10-01, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Askar,
>
> On Wed, Oct 01, 2025 at 03:38:41AM +0300, Askar Safin wrote:
> > Aleksa Sarai <cyphar@cyphar.com>:
> > > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > + &attr, sizeof(attr));
> >
> > Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> > calls. :)
> >
> > I think you meant open_tree_attr here.
>
> I'll wait for Aleksa to confirm before applying and amending.
Yeah, Askar is right, they were a copy-paste snafu.
> > > +\&
> > > +/* Create a new copy with the id-mapping cleared */
> > > +memset(&attr, 0, sizeof(attr));
> > > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > + &attr, sizeof(attr));
> >
> > And here.
> >
> > Otherwise your whole patchset looks good. Add to whole patchset:
> > Reviewed-by: Askar Safin <safinaskar@gmail.com>
>
> Thanks! I'll retro-fit that to the commits I've aplied already too, as
> I haven't pushed them to master yet.
>
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es>
> Use port 80 (that is, <...:80/>).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]
^ permalink raw reply
* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Aleksa Sarai @ 2025-10-01 7:35 UTC (permalink / raw)
To: Askar Safin
Cc: alx, brauner, dhowells, g.branden.robinson, jack, linux-api,
linux-fsdevel, linux-kernel, linux-man, mtk.manpages, viro
In-Reply-To: <20251001003841.510494-1-safinaskar@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 856 bytes --]
On 2025-10-01, Askar Safin <safinaskar@gmail.com> wrote:
> Aleksa Sarai <cyphar@cyphar.com>:
> > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > + &attr, sizeof(attr));
>
> Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> calls. :)
>
> I think you meant open_tree_attr here.
Oops.
>
> > +\&
> > +/* Create a new copy with the id-mapping cleared */
> > +memset(&attr, 0, sizeof(attr));
> > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > + &attr, sizeof(attr));
>
> And here.
Oops x2.
> Otherwise your whole patchset looks good. Add to whole patchset:
> Reviewed-by: Askar Safin <safinaskar@gmail.com>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]
^ permalink raw reply
* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Alejandro Colomar @ 2025-10-01 6:45 UTC (permalink / raw)
To: Askar Safin
Cc: cyphar, brauner, dhowells, g.branden.robinson, jack, linux-api,
linux-fsdevel, linux-kernel, linux-man, mtk.manpages, viro
In-Reply-To: <20251001003841.510494-1-safinaskar@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]
Hi Askar,
On Wed, Oct 01, 2025 at 03:38:41AM +0300, Askar Safin wrote:
> Aleksa Sarai <cyphar@cyphar.com>:
> > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > + &attr, sizeof(attr));
>
> Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> calls. :)
>
> I think you meant open_tree_attr here.
I'll wait for Aleksa to confirm before applying and amending.
> > +\&
> > +/* Create a new copy with the id-mapping cleared */
> > +memset(&attr, 0, sizeof(attr));
> > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > + &attr, sizeof(attr));
>
> And here.
>
> Otherwise your whole patchset looks good. Add to whole patchset:
> Reviewed-by: Askar Safin <safinaskar@gmail.com>
Thanks! I'll retro-fit that to the commits I've aplied already too, as
I haven't pushed them to master yet.
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Askar Safin @ 2025-10-01 0:38 UTC (permalink / raw)
To: cyphar
Cc: alx, brauner, dhowells, g.branden.robinson, jack, linux-api,
linux-fsdevel, linux-kernel, linux-man, mtk.manpages, viro
In-Reply-To: <20250925-new-mount-api-v5-7-028fb88023f2@cyphar.com>
Aleksa Sarai <cyphar@cyphar.com>:
> +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> + &attr, sizeof(attr));
Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
calls. :)
I think you meant open_tree_attr here.
> +\&
> +/* Create a new copy with the id-mapping cleared */
> +memset(&attr, 0, sizeof(attr));
> +attr.attr_clr = MOUNT_ATTR_IDMAP;
> +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> + &attr, sizeof(attr));
And here.
Otherwise your whole patchset looks good. Add to whole patchset:
Reviewed-by: Askar Safin <safinaskar@gmail.com>
--
Askar Safin
^ permalink raw reply
* Re: [PATCH-RFC] init: simplify initrd code (was Re: [PATCH RESEND 00/62] initrd: remove classic initrd support).
From: David Disseldorp @ 2025-09-29 9:13 UTC (permalink / raw)
To: nschichan
Cc: akpm, andy.shevchenko, axboe, brauner, cyphar, devicetree,
ecurtin, email2tema, graf, gregkh, hca, hch, hsiangkao, initramfs,
jack, julian.stecklina, kees, linux-acpi, linux-alpha, linux-api,
linux-arch, linux-arm-kernel, linux-block, linux-csky, linux-doc,
linux-efi, linux-ext4, linux-fsdevel, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-openrisc, linux-parisc, linux-riscv,
linux-s390, linux-sh, linux-snps-arc, linux-um, linuxppc-dev,
loongarch, mcgrof, mingo, monstr, mzxreary, patches, rob,
safinaskar, sparclinux, thomas.weissschuh, thorsten.blum,
torvalds, tytso, viro, x86
In-Reply-To: <20250925131055.3933381-1-nschichan@freebox.fr>
Hi Nicolas,
On Thu, 25 Sep 2025 15:10:56 +0200, nschichan@freebox.fr wrote:
> From: Nicolas Schichan <nschichan@freebox.fr>
>
> - drop prompt_ramdisk and ramdisk_start kernel parameters
> - drop compression support
> - drop image autodetection, the whole /initrd.image content is now
> copied into /dev/ram0
> - remove rd_load_disk() which doesn't seem to be used anywhere.
>
> There is now no more limitation on the type of initrd filesystem that
> can be loaded since the code trying to guess the initrd filesystem
> size is gone (the whole /initrd.image file is used).
>
> A few global variables in do_mounts_rd.c are now put as local
> variables in rd_load_image() since they do not need to be visible
> outside this function.
> ---
>
> Hello,
>
> Hopefully my email config is now better and reaches gmail users
> correctly.
>
> The patch below could probably split in a few patches, but I think
> this simplify the code greatly without removing the functionality we
> depend on (and this allows now to use EROFS initrd images).
>
> Coupled with keeping the function populate_initrd_image() in
> init/initramfs.c, this will keep what we need from the initrd code.
>
> This removes support of loading bzip/gz/xz/... compressed images as
> well, not sure if many user depend on this feature anymore.
>
> No signoff because I'm only seeking comments about those changes right
> now.
>
> init/do_mounts.h | 2 -
> init/do_mounts_rd.c | 243 +-------------------------------------------
> 2 files changed, 4 insertions(+), 241 deletions(-)
This seems like a reasonable improvement to me. FWIW, one alternative
approach to clean up the FS specific code here was proposed by Al:
https://lore.kernel.org/all/20250321020826.GB2023217@ZenIV/
...
> diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
> index ac021ae6e6fa..5a69ff43f5ee 100644
> --- a/init/do_mounts_rd.c
> +++ b/init/do_mounts_rd.c
> @@ -14,173 +14,9 @@
>
> #include <linux/decompress/generic.h>
>
> -static struct file *in_file, *out_file;
> -static loff_t in_pos, out_pos;
> -
> -static int __init prompt_ramdisk(char *str)
> -{
> - pr_warn("ignoring the deprecated prompt_ramdisk= option\n");
> - return 1;
> -}
> -__setup("prompt_ramdisk=", prompt_ramdisk);
> -
> -int __initdata rd_image_start; /* starting block # of image */
> -
> -static int __init ramdisk_start_setup(char *str)
> -{
> - rd_image_start = simple_strtol(str,NULL,0);
> - return 1;
> -}
> -__setup("ramdisk_start=", ramdisk_start_setup);
There are a couple of other places that mention these parameters, which
should also be cleaned up.
...
> static unsigned long nr_blocks(struct file *file)
> {
> - struct inode *inode = file->f_mapping->host;
> -
> - if (!S_ISBLK(inode->i_mode))
> - return 0;
> - return i_size_read(inode) >> 10;
> + return i_size_read(file->f_mapping->host) >> 10;
This should be >> BLOCK_SIZE_BITS, and dropped as a wrapper function
IMO.
^ permalink raw reply
* [PATCH v4 30/30] selftests/liveupdate: Add tests for per-session state and cancel cycles
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
Introduce two new, non-kexec selftests to validate the state transition
logic for individual LUO sessions, with a focus on the PREPARE, FREEZE,
and CANCEL events. While other tests cover the full kexec lifecycle, it
is critical to also test the internal per-session state machine's logic
and rollback capabilities in isolation. These tests provide this focused
coverage, ensuring the core session management ioctls behave as
expected.
The new test cases are:
1. session_prepare_cancel_cycle:
- Verifies the fundamental NORMAL -> PREPARED -> NORMAL state
transition path.
- It creates a session, preserves a file, sends a per-session PREPARE
event, asserts the state is PREPARED, then sends a CANCEL event and
asserts the state has correctly returned to NORMAL.
2. session_freeze_cancel_cycle:
- Extends the first test by validating the more critical ... ->
FROZEN -> NORMAL rollback path.
- It follows the same steps but adds a FREEZE event after PREPARE,
asserting the session enters the FROZEN state.
- It then sends a CANCEL event, verifying that a session can be rolled
back even from this final pre-kexec state. This is essential for
robustly handling aborts.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/Makefile | 9 ++-
.../testing/selftests/liveupdate/liveupdate.c | 56 +++++++++++++++++++
2 files changed, 64 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index ffce73233149..25a6dec790bb 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -16,11 +16,18 @@ LUO_MANUAL_TESTS += luo_unreclaimed
TEST_FILES += do_kexec.sh
-TEST_GEN_PROGS += liveupdate
+LUO_MAIN_TESTS += liveupdate
# --- Automatic Rule Generation (Do not edit below) ---
TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
+TEST_GEN_PROGS := $(LUO_MAIN_TESTS)
+
+liveupdate_SOURCES := liveupdate.c $(LUO_SHARED_SRCS)
+
+$(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
+ $(call msg,LINK,,$@)
+ $(Q)$(LINK.c) $^ $(LDLIBS) -o $@
# Define the full list of sources for each manual test.
$(foreach test,$(LUO_MANUAL_TESTS), \
diff --git a/tools/testing/selftests/liveupdate/liveupdate.c b/tools/testing/selftests/liveupdate/liveupdate.c
index 7c0ceaac0283..804aa25ce5ae 100644
--- a/tools/testing/selftests/liveupdate/liveupdate.c
+++ b/tools/testing/selftests/liveupdate/liveupdate.c
@@ -17,6 +17,7 @@
#include <sys/mman.h>
#include <linux/liveupdate.h>
+#include "luo_test_utils.h"
#include "../kselftest.h"
#include "../kselftest_harness.h"
@@ -52,6 +53,16 @@ const char *const luo_state_str[] = {
[LIVEUPDATE_STATE_UPDATED] = "updated",
};
+static int get_session_state(int session_fd)
+{
+ struct liveupdate_session_get_state arg = { .size = sizeof(arg) };
+
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_GET_STATE, &arg) < 0)
+ return -errno;
+
+ return arg.state;
+}
+
static int run_luo_selftest_cmd(int fd_dbg, __u64 cmd_code,
struct luo_arg_subsystem *subsys_arg)
{
@@ -345,4 +356,49 @@ TEST_F(subsystem, prepare_fail)
ASSERT_EQ(0, unregister_subsystem(self->fd_dbg, &self->si[i]));
}
+TEST_F(state, session_freeze_cancel_cycle)
+{
+ int session_fd;
+ const char *session_name = "freeze_cancel_session";
+ const int memfd_token = 5678;
+
+ session_fd = luo_create_session(self->fd, session_name);
+ ASSERT_GE(session_fd, 0);
+
+ ASSERT_EQ(0, create_and_preserve_memfd(session_fd, memfd_token,
+ "freeze test data"));
+
+ ASSERT_EQ(0, luo_set_session_event(session_fd, LIVEUPDATE_PREPARE));
+ ASSERT_EQ(get_session_state(session_fd), LIVEUPDATE_STATE_PREPARED);
+
+ ASSERT_EQ(0, luo_set_session_event(session_fd, LIVEUPDATE_FREEZE));
+ ASSERT_EQ(get_session_state(session_fd), LIVEUPDATE_STATE_FROZEN);
+
+ ASSERT_EQ(0, luo_set_session_event(session_fd, LIVEUPDATE_CANCEL));
+ ASSERT_EQ(get_session_state(session_fd), LIVEUPDATE_STATE_NORMAL);
+
+ close(session_fd);
+}
+
+TEST_F(state, session_prepare_cancel_cycle)
+{
+ const char *session_name = "prepare_cancel_session";
+ const int memfd_token = 1234;
+ int session_fd;
+
+ session_fd = luo_create_session(self->fd, session_name);
+ ASSERT_GE(session_fd, 0);
+
+ ASSERT_EQ(0, create_and_preserve_memfd(session_fd, memfd_token,
+ "prepare test data"));
+
+ ASSERT_EQ(0, luo_set_session_event(session_fd, LIVEUPDATE_PREPARE));
+ ASSERT_EQ(get_session_state(session_fd), LIVEUPDATE_STATE_PREPARED);
+
+ ASSERT_EQ(0, luo_set_session_event(session_fd, LIVEUPDATE_CANCEL));
+ ASSERT_EQ(get_session_state(session_fd), LIVEUPDATE_STATE_NORMAL);
+
+ close(session_fd);
+}
+
TEST_HARNESS_MAIN
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 29/30] selftests/liveupdate: Add test for unreclaimed resource cleanup
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
Introduce a new selftest, luo_unreclaimed, to specifically validate that
the LUO framework correctly identifies and cleans up preserved
resources that are not restored by userspace after a kexec reboot.
Ensuring proper cleanup of unreclaimed (or "abandoned") resources is
critical for preventing resource leaks in the kernel. This test provides
a focused scenario to verify this cleanup path, which is a key aspect of
the LUO's robustness.
The test performs a full kexec cycle with the following simple flow:
1. Pre-kexec:
- A single session is created.
- Two memfd files are preserved: File A (which will be restored) and
File B (which will be abandoned).
- The global LIVEUPDATE_PREPARE event is triggered, and the system
reboots.
2. Post-kexec:
- The preserved session is retrieved.
- Only File A is restored and its contents are verified to confirm the
basic preservation mechanism is working.
- File B is intentionally not restored.
- The global LIVEUPDATE_FINISH event is triggered.
3. Verification:
- The test passes if File A is verified successfully.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/Makefile | 1 +
.../selftests/liveupdate/luo_unreclaimed.c | 107 ++++++++++++++++++
2 files changed, 108 insertions(+)
create mode 100644 tools/testing/selftests/liveupdate/luo_unreclaimed.c
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 72892942dd61..ffce73233149 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -12,6 +12,7 @@ LUO_SHARED_HDRS += luo_test_utils.h
LUO_MANUAL_TESTS += luo_multi_file
LUO_MANUAL_TESTS += luo_multi_kexec
LUO_MANUAL_TESTS += luo_multi_session
+LUO_MANUAL_TESTS += luo_unreclaimed
TEST_FILES += do_kexec.sh
diff --git a/tools/testing/selftests/liveupdate/luo_unreclaimed.c b/tools/testing/selftests/liveupdate/luo_unreclaimed.c
new file mode 100644
index 000000000000..c3921b21b97b
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_unreclaimed.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#include "luo_test_utils.h"
+#include "../kselftest.h"
+
+#define KEXEC_SCRIPT "./do_kexec.sh"
+
+#define SESSION_NAME "unreclaimed_session"
+#define TOKEN_A 100
+#define TOKEN_B 200
+#define DATA_A "This is file A, the one we retrieve."
+#define DATA_B "This is file B, the one we abandon."
+
+static void run_pre_kexec(int luo_fd)
+{
+ int session_fd;
+
+ ksft_print_msg("[PRE-KEXEC] Starting workload...\n");
+
+ session_fd = luo_create_session(luo_fd, SESSION_NAME);
+ if (session_fd < 0)
+ fail_exit("Failed to create session '%s'", SESSION_NAME);
+
+ ksft_print_msg("[PRE-KEXEC] Preserving memfd A (to be restored).\n");
+ if (create_and_preserve_memfd(session_fd, TOKEN_A, DATA_A) < 0)
+ fail_exit("Failed to preserve memfd A");
+
+ ksft_print_msg("[PRE-KEXEC] Preserving memfd B (to be abandoned).\n");
+ if (create_and_preserve_memfd(session_fd, TOKEN_B, DATA_B) < 0)
+ fail_exit("Failed to preserve memfd B");
+
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("Failed to set global PREPARE event");
+
+ ksft_print_msg("[PRE-KEXEC] System is ready. Executing kexec...\n");
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+
+ sleep(10);
+ exit(EXIT_FAILURE);
+}
+
+static void run_post_kexec(int luo_fd)
+{
+ int session_fd, mfd_a;
+
+ ksft_print_msg("[POST-KEXEC] Starting workload...\n");
+
+ session_fd = luo_retrieve_session(luo_fd, SESSION_NAME);
+ if (session_fd < 0)
+ fail_exit("Failed to retrieve session '%s'", SESSION_NAME);
+
+ ksft_print_msg("[POST-KEXEC] Restoring and verifying memfd A (token %d)...\n",
+ TOKEN_A);
+ mfd_a = restore_and_verify_memfd(session_fd, TOKEN_A, DATA_A);
+ if (mfd_a < 0)
+ fail_exit("Failed to restore or verify memfd A");
+ close(mfd_a);
+ ksft_print_msg(" Data verification PASSED for memfd A.\n");
+
+ ksft_print_msg("[POST-KEXEC] NOT restoring memfd B (token %d) to test cleanup.\n",
+ TOKEN_B);
+
+ ksft_print_msg("[POST-KEXEC] Driving global state to FINISH...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_FINISH) < 0)
+ fail_exit("Failed to set global FINISH event");
+
+ close(session_fd);
+
+ ksft_print_msg("\n--- TEST PASSED ---\n");
+ ksft_print_msg("Check dmesg for cleanup log of token %d in session '%s'.\n",
+ TOKEN_B, SESSION_NAME);
+}
+
+int main(int argc, char *argv[])
+{
+ enum liveupdate_state state;
+ int luo_fd;
+
+ luo_fd = luo_open_device();
+ if (luo_fd < 0) {
+ ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
+ LUO_DEVICE);
+ }
+
+ if (luo_get_global_state(luo_fd, &state) < 0)
+ fail_exit("Failed to get LUO state");
+
+ switch (state) {
+ case LIVEUPDATE_STATE_NORMAL:
+ run_pre_kexec(luo_fd);
+ break;
+ case LIVEUPDATE_STATE_UPDATED:
+ run_post_kexec(luo_fd);
+ break;
+ default:
+ fail_exit("Test started in an unexpected state: %d", state);
+ }
+
+ close(luo_fd);
+ ksft_exit_pass();
+}
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 28/30] selftests/liveupdate: Add multi-session workflow and state interaction test
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
Introduce a new, luo_multi_session, test to validate the orchestration
of multiple LUO sessions with differing lifecycles through a full kexec
reboot.
The test validates interactions between per-session and global state
transitions:
1. Mixed State Preparation: Before the first kexec, sessions are put
into different states to test the global PREPARE event's behavior:
- Session A & C: Are individually transitioned to PREPARED via a
per-session ioctl. The test verifies that the subsequent global
PREPARE correctly handles these already-prepared sessions.
- Session B: Is transitioned to PREPARED and then immediately back to
NORMAL via a per-session CANCEL. This validates the rollback
mechanism and ensures the session is correctly picked up and
prepared by the subsequent global PREPARE.
- Session D: Is left in the NORMAL state, verifying that the global
PREPARE correctly transitions sessions that have not been
individually managed.
2. Unreclaimed Session Cleanup:
- After the kexec reboot, sessions A, B, C, and D are all retrieved
and verified to ensure they were preserved correctly, regardless of
their pre-kexec transition path.
- Session E: Is intentionally not retrieved. This validates that the
global FINISH event correctly identifies and cleans up an entire
unreclaimed session and all of its preserved file resources,
preventing leaks.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/Makefile | 1 +
.../selftests/liveupdate/luo_multi_session.c | 155 ++++++++++++++++++
2 files changed, 156 insertions(+)
create mode 100644 tools/testing/selftests/liveupdate/luo_multi_session.c
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index f43b7d03e017..72892942dd61 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -11,6 +11,7 @@ LUO_SHARED_HDRS += luo_test_utils.h
LUO_MANUAL_TESTS += luo_multi_file
LUO_MANUAL_TESTS += luo_multi_kexec
+LUO_MANUAL_TESTS += luo_multi_session
TEST_FILES += do_kexec.sh
diff --git a/tools/testing/selftests/liveupdate/luo_multi_session.c b/tools/testing/selftests/liveupdate/luo_multi_session.c
new file mode 100644
index 000000000000..9ea96d7b997f
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_multi_session.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#include "luo_test_utils.h"
+#include "../kselftest.h"
+
+#define KEXEC_SCRIPT "./do_kexec.sh"
+
+#define NUM_SESSIONS 5
+#define FILES_PER_SESSION 5
+
+/* Helper to manage one session and its files */
+static void setup_session(int luo_fd, struct session_info *s, int session_idx)
+{
+ int i;
+
+ snprintf(s->name, sizeof(s->name), "session-%c", 'A' + session_idx);
+
+ s->fd = luo_create_session(luo_fd, s->name);
+ if (s->fd < 0)
+ fail_exit("Failed to create session '%s'", s->name);
+
+ /* Create and preserve all files for this session */
+ for (i = 0; i < FILES_PER_SESSION; i++) {
+ s->file_tokens[i] = (session_idx * 100) + i;
+ snprintf(s->file_data[i], sizeof(s->file_data[i]),
+ "Data for %.*s-File%d",
+ LIVEUPDATE_SESSION_NAME_LENGTH,
+ s->name, i);
+
+ if (create_and_preserve_memfd(s->fd, s->file_tokens[i],
+ s->file_data[i]) < 0) {
+ fail_exit("Failed to preserve token %d in session '%s'",
+ s->file_tokens[i], s->name);
+ }
+ }
+}
+
+/* Helper to re-initialize the expected session data post-reboot */
+static void reinit_sessions(struct session_info *sessions)
+{
+ int i, j;
+
+ for (i = 0; i < NUM_SESSIONS; i++) {
+ snprintf(sessions[i].name, sizeof(sessions[i].name),
+ "session-%c", 'A' + i);
+ for (j = 0; j < FILES_PER_SESSION; j++) {
+ sessions[i].file_tokens[j] = (i * 100) + j;
+ snprintf(sessions[i].file_data[j],
+ sizeof(sessions[i].file_data[j]),
+ "Data for %.*s-File%d",
+ LIVEUPDATE_SESSION_NAME_LENGTH,
+ sessions[i].name, j);
+ }
+ }
+}
+
+static void run_pre_kexec(int luo_fd)
+{
+ struct session_info sessions[NUM_SESSIONS] = {0};
+ int i;
+
+ ksft_print_msg("[PRE-KEXEC] Starting workload...\n");
+
+ ksft_print_msg("[PRE-KEXEC] Setting up %d sessions with %d files each...\n",
+ NUM_SESSIONS, FILES_PER_SESSION);
+ for (i = 0; i < NUM_SESSIONS; i++)
+ setup_session(luo_fd, &sessions[i], i);
+ ksft_print_msg("[PRE-KEXEC] Setup complete.\n");
+
+ ksft_print_msg("[PRE-KEXEC] Performing individual session state transitions...\n");
+ ksft_print_msg(" - Preparing Session A...\n");
+ if (luo_set_session_event(sessions[0].fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("Failed to prepare Session A");
+
+ ksft_print_msg(" - Preparing and then Canceling Session B...\n");
+ if (luo_set_session_event(sessions[1].fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("Failed to prepare Session B");
+ if (luo_set_session_event(sessions[1].fd, LIVEUPDATE_CANCEL) < 0)
+ fail_exit("Failed to cancel Session B");
+
+ ksft_print_msg(" - Preparing Session C...\n");
+ if (luo_set_session_event(sessions[2].fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("Failed to prepare Session C");
+
+ ksft_print_msg(" - Sessions D & E remain in NORMAL state.\n");
+
+ ksft_print_msg("[PRE-KEXEC] Triggering global PREPARE event...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("Failed to set global PREPARE event");
+
+ ksft_print_msg("[PRE-KEXEC] System is ready. Executing kexec...\n");
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+
+ sleep(10);
+ exit(EXIT_FAILURE);
+}
+
+static void run_post_kexec(int luo_fd)
+{
+ struct session_info sessions[NUM_SESSIONS] = {0};
+
+ ksft_print_msg("[POST-KEXEC] Starting workload...\n");
+
+ reinit_sessions(sessions);
+
+ ksft_print_msg("[POST-KEXEC] Verifying preserved sessions (A, B, C, D)...\n");
+ verify_session_and_get_fd(luo_fd, &sessions[0]);
+ verify_session_and_get_fd(luo_fd, &sessions[1]);
+ verify_session_and_get_fd(luo_fd, &sessions[2]);
+ verify_session_and_get_fd(luo_fd, &sessions[3]);
+
+ ksft_print_msg("[POST-KEXEC] NOT retrieving session E to test cleanup.\n");
+
+ ksft_print_msg("[POST-KEXEC] Driving global state to FINISH...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_FINISH) < 0)
+ fail_exit("Failed to set global FINISH event");
+
+ ksft_print_msg("\n--- TEST PASSED ---\n");
+ ksft_print_msg("Check dmesg for cleanup log of session E.\n");
+}
+
+int main(int argc, char *argv[])
+{
+ enum liveupdate_state state;
+ int luo_fd;
+
+ luo_fd = luo_open_device();
+ if (luo_fd < 0) {
+ ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
+ LUO_DEVICE);
+ }
+
+ if (luo_get_global_state(luo_fd, &state) < 0)
+ fail_exit("Failed to get LUO state");
+
+ switch (state) {
+ case LIVEUPDATE_STATE_NORMAL:
+ run_pre_kexec(luo_fd);
+ break;
+ case LIVEUPDATE_STATE_UPDATED:
+ run_post_kexec(luo_fd);
+ break;
+ default:
+ fail_exit("Test started in an unexpected state: %d", state);
+ }
+
+ close(luo_fd);
+ ksft_exit_pass();
+}
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 27/30] selftests/liveupdate: Add multi-file and unreclaimed file test
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
Introduce a new selftest, luo_multi_file, to validate two key aspects of
the Live Update Orchestrator file preservation mechanism: the ability to
handle multiple files within a single session, and the correct cleanup
of unreclaimed files.
The test implements a full kexec cycle with the following flow:
1. Pre-kexec:
- A single session is created.
- Three distinct memfd files (A, B, and C) are created, populated with
unique data, and preserved within this session.
- The global LIVEUPDATE_PREPARE event is triggered, and the system
reboots via kexec.
2. Post-kexec:
- The preserved session is retrieved.
- Files A and C are restored and their contents are verified to ensure
that multiple files can be successfully restored from a single
session.
- File B is intentionally not restored.
- The global LIVEUPDATE_FINISH event is triggered.
3. Verification:
- The test is considered successful if files A and C are verified
correctly.
- The user is prompted to check the kernel log (dmesg) for a message
confirming that the unreclaimed file (B) was identified and cleaned
up by the LUO core, thus validating the cleanup path.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/Makefile | 1 +
.../selftests/liveupdate/luo_multi_file.c | 119 ++++++++++++++++++
2 files changed, 120 insertions(+)
create mode 100644 tools/testing/selftests/liveupdate/luo_multi_file.c
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 1cbc816ed5c5..f43b7d03e017 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -9,6 +9,7 @@ LDFLAGS += -static
LUO_SHARED_SRCS := luo_test_utils.c
LUO_SHARED_HDRS += luo_test_utils.h
+LUO_MANUAL_TESTS += luo_multi_file
LUO_MANUAL_TESTS += luo_multi_kexec
TEST_FILES += do_kexec.sh
diff --git a/tools/testing/selftests/liveupdate/luo_multi_file.c b/tools/testing/selftests/liveupdate/luo_multi_file.c
new file mode 100644
index 000000000000..ae38fe8aba4c
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_multi_file.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#include "luo_test_utils.h"
+
+#define KEXEC_SCRIPT "./do_kexec.sh"
+
+#define SESSION_NAME "multi_file_session"
+#define TOKEN_A 101
+#define TOKEN_B 102
+#define TOKEN_C 103
+
+#define DATA_A "Alpha file data"
+#define DATA_B "Bravo file data which will be unreclaimed"
+#define DATA_C "Charlie file data"
+
+static void run_pre_kexec(int luo_fd)
+{
+ int session_fd;
+
+ ksft_print_msg("[PRE-KEXEC] Starting workload...\n");
+
+ session_fd = luo_create_session(luo_fd, SESSION_NAME);
+ if (session_fd < 0)
+ fail_exit("Failed to create session '%s'", SESSION_NAME);
+
+ ksft_print_msg("[PRE-KEXEC] Preserving 3 memfds (A, B, C)...\n");
+ if (create_and_preserve_memfd(session_fd, TOKEN_A, DATA_A) < 0)
+ fail_exit("Failed to preserve memfd A");
+ if (create_and_preserve_memfd(session_fd, TOKEN_B, DATA_B) < 0)
+ fail_exit("Failed to preserve memfd B");
+ if (create_and_preserve_memfd(session_fd, TOKEN_C, DATA_C) < 0)
+ fail_exit("Failed to preserve memfd C");
+ ksft_print_msg("[PRE-KEXEC] All memfds preserved.\n");
+
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("Failed to set global PREPARE event");
+
+ ksft_print_msg("[PRE-KEXEC] System is ready. Executing kexec...\n");
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+
+ sleep(10); /* Should not be reached */
+ exit(EXIT_FAILURE);
+}
+
+static void run_post_kexec(int luo_fd)
+{
+ int session_fd, mfd_a, mfd_c;
+
+ ksft_print_msg("[POST-KEXEC] Starting workload...\n");
+
+ session_fd = luo_retrieve_session(luo_fd, SESSION_NAME);
+ if (session_fd < 0)
+ fail_exit("Failed to retrieve session '%s'", SESSION_NAME);
+
+ /* 1. VERIFY SUCCESS: Restore and verify memfd A. */
+ ksft_print_msg("[POST-KEXEC] Restoring and verifying memfd A (token %d)...\n",
+ TOKEN_A);
+ mfd_a = restore_and_verify_memfd(session_fd, TOKEN_A, DATA_A);
+ if (mfd_a < 0)
+ fail_exit("Failed to restore or verify memfd A");
+ close(mfd_a);
+ ksft_print_msg(" Success.\n");
+
+ /* 2. VERIFY SUCCESS: Restore and verify memfd C. */
+ ksft_print_msg("[POST-KEXEC] Restoring and verifying memfd C (token %d)...\n",
+ TOKEN_C);
+ mfd_c = restore_and_verify_memfd(session_fd, TOKEN_C, DATA_C);
+ if (mfd_c < 0)
+ fail_exit("Failed to restore or verify memfd C");
+ close(mfd_c);
+ ksft_print_msg(" Success.\n");
+
+ ksft_print_msg("[POST-KEXEC] NOT restoring memfd B (token %d) to test cleanup.\n",
+ TOKEN_B);
+
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_FINISH) < 0)
+ fail_exit("Failed to set global FINISH event");
+
+ close(session_fd);
+
+ ksft_print_msg("\n--- TEST PASSED ---\n");
+ ksft_print_msg("Check dmesg for cleanup log of token %d in session '%s'.\n",
+ TOKEN_B, SESSION_NAME);
+}
+
+int main(int argc, char *argv[])
+{
+ enum liveupdate_state state;
+ int luo_fd;
+
+ luo_fd = luo_open_device();
+ if (luo_fd < 0) {
+ ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
+ LUO_DEVICE);
+ }
+
+ if (luo_get_global_state(luo_fd, &state) < 0)
+ fail_exit("Failed to get LUO state");
+
+ switch (state) {
+ case LIVEUPDATE_STATE_NORMAL:
+ run_pre_kexec(luo_fd);
+ break;
+ case LIVEUPDATE_STATE_UPDATED:
+ run_post_kexec(luo_fd);
+ break;
+ default:
+ fail_exit("Test started in an unexpected state: %d", state);
+ }
+
+ close(luo_fd);
+ ksft_exit_pass();
+}
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
Introduce multi-stage selftest, luo_multi_kexec, to validate the
end-to-end lifecycle of Live Update Orchestrator sessions
across multiple kexec reboots.
The test operates in three stages, using a preserved memfd within a
dedicated "state_session" to track its progress across reboots. This
avoids reliance on filesystem flags and tests the core preservation
mechanism itself.
The test validates the following critical LUO functionalities:
1. Initial Preservation (Stage 1 -> 2):
- Creates multiple sessions (session-A, session-B, session-C) and
populates them with memfd files containing unique data.
- Triggers a global LIVEUPDATE_PREPARE event and executes the first
kexec.
2. Intermediate State Management (Stage 2 -> 3):
- After the first reboot, it verifies that all sessions were correctly
preserved.
- It then tests divergent session lifecycles:
- Session A: Is retrieved and explicitly finalized with a
per-session LIVEUPDATE_FINISH event. This validates that a finished
session is not carried over to the next kexec.
- Session B: Is retrieved but left open. This validates that an
active, retrieved session is correctly re-preserved during the next
global PREPARE.
- Session C: Is deliberately not retrieved. This validates that the
global LIVEUPDATE_FINISH event correctly identifies and cleans up
stale, unreclaimed sessions.
- The state-tracking memfd is updated by un-preserving and re-preserving
it, testing in-place modification of a session's contents.
- A global FINISH followed by a global PREPARE is triggered before the
second kexec.
3. Final Verification (Stage 3):
- After the second reboot, it confirms the final state:
- Asserts that session-B (the re-preserved session) and the updated
state session have survived.
- Asserts that session-A (explicitly finished) and session-C
(unreclaimed) were correctly cleaned up and no longer exist.
Example output:
root@debian-vm:~/liveupdate$ ./luo_multi_kexec
LUO state is NORMAL. Starting Stage 1.
[STAGE 1] Creating state file for next stage (2)...
[STAGE 1] Setting up Sessions A, B, C for first kexec...
- Session 'session-A' created.
- Session 'session-B' created.
- Session 'session-C' created.
[STAGE 1] Triggering global PREPARE...
[STAGE 1] Executing kexec...
<---- cut reboot messages ---->
Debian GNU/Linux 12 debian-vm ttyS0
debian-vm login: root (automatic login)
root@debian-vm:~$ cd liveupdate/
root@debian-vm:~/liveupdate$ ./luo_multi_kexec
LUO state is UPDATED. Restoring state to determine stage...
State file indicates we are entering Stage 2.
[STAGE 2] Partially reclaiming and preparing for second kexec...
- Verifying session 'session-A'...
Success. All files verified.
- Verifying session 'session-B'...
Success. All files verified.
- Finishing state session to allow modification...
- Updating state file for next stage (3)...
- Session A verified. Sending per-session FINISH.
- Session B verified. Keeping FD open for next kexec.
- NOT retrieving Session C to test global finish cleanup.
[STAGE 2] Triggering global FINISH...
[STAGE 2] Triggering global PREPARE for next kexec...
[STAGE 2] Executing second kexec...
<---- cut reboot messages ---->
Debian GNU/Linux 12 debian-vm ttyS0
debian-vm login: root (automatic login)
root@debian-vm:~$ cd liveupdate/
root@debian-vm:~/liveupdate$ ./luo_multi_kexec
LUO state is UPDATED. Restoring state to determine stage...
State file indicates we are entering Stage 3.
[STAGE 3] Final verification...
[STAGE 3] Verifying surviving sessions...
- Verifying session 'session-B'...
Success. All files verified.
[STAGE 3] Verifying Session A was cleaned up...
Success. Session A not found as expected.
[STAGE 3] Verifying Session C was cleaned up...
Success. Session C not found as expected.
[STAGE 3] Triggering final global FINISH...
--- TEST PASSED ---
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/.gitignore | 1 +
tools/testing/selftests/liveupdate/Makefile | 31 +++
.../testing/selftests/liveupdate/do_kexec.sh | 6 +
.../selftests/liveupdate/luo_multi_kexec.c | 182 +++++++++++++
.../selftests/liveupdate/luo_test_utils.c | 241 ++++++++++++++++++
.../selftests/liveupdate/luo_test_utils.h | 51 ++++
6 files changed, 512 insertions(+)
create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
create mode 100644 tools/testing/selftests/liveupdate/luo_multi_kexec.c
create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h
diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
index af6e773cf98f..de7ca45d3892 100644
--- a/tools/testing/selftests/liveupdate/.gitignore
+++ b/tools/testing/selftests/liveupdate/.gitignore
@@ -1 +1,2 @@
/liveupdate
+/luo_multi_kexec
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 2a573c36016e..1cbc816ed5c5 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -1,7 +1,38 @@
# SPDX-License-Identifier: GPL-2.0-only
+
+KHDR_INCLUDES ?= -I../../../usr/include
CFLAGS += -Wall -O2 -Wno-unused-function
CFLAGS += $(KHDR_INCLUDES)
+LDFLAGS += -static
+
+# --- Test Configuration (Edit this section when adding new tests) ---
+LUO_SHARED_SRCS := luo_test_utils.c
+LUO_SHARED_HDRS += luo_test_utils.h
+
+LUO_MANUAL_TESTS += luo_multi_kexec
+
+TEST_FILES += do_kexec.sh
TEST_GEN_PROGS += liveupdate
+# --- Automatic Rule Generation (Do not edit below) ---
+
+TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
+
+# Define the full list of sources for each manual test.
+$(foreach test,$(LUO_MANUAL_TESTS), \
+ $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
+
+# This loop automatically generates an explicit build rule for each manual test.
+# It includes dependencies on the shared headers and makes the output
+# executable.
+# Note the use of '$$' to escape automatic variables for the 'eval' command.
+$(foreach test,$(LUO_MANUAL_TESTS), \
+ $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
+ $(call msg,LINK,,$$@) ; \
+ $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
+ $(Q)chmod +x $$@ \
+ ) \
+)
+
include ../lib.mk
diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
new file mode 100755
index 000000000000..bb396a92c3b8
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/do_kexec.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+set -e
+
+kexec -l -s --reuse-cmdline /boot/bzImage
+kexec -e
diff --git a/tools/testing/selftests/liveupdate/luo_multi_kexec.c b/tools/testing/selftests/liveupdate/luo_multi_kexec.c
new file mode 100644
index 000000000000..1f350990ee67
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_multi_kexec.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#include "luo_test_utils.h"
+
+#define KEXEC_SCRIPT "./do_kexec.sh"
+
+#define NUM_SESSIONS 3
+
+/* Helper to set up one session and all its files */
+static void setup_session(int luo_fd, struct session_info *s, int session_idx)
+{
+ int i;
+
+ snprintf(s->name, sizeof(s->name), "session-%c", 'A' + session_idx);
+
+ s->fd = luo_create_session(luo_fd, s->name);
+ if (s->fd < 0)
+ fail_exit("luo_create_session for %s", s->name);
+
+ for (i = 0; i < 2; i++) {
+ s->file_tokens[i] = (session_idx * 100) + i;
+ snprintf(s->file_data[i], sizeof(s->file_data[i]),
+ "Data for %.*s-File%d",
+ (int)sizeof(s->name), s->name, i);
+
+ if (create_and_preserve_memfd(s->fd, s->file_tokens[i],
+ s->file_data[i]) < 0)
+ fail_exit("create_and_preserve_memfd for token %d",
+ s->file_tokens[i]);
+ }
+}
+
+/* Run before the first kexec */
+static void run_stage_1(int luo_fd)
+{
+ struct session_info sessions[NUM_SESSIONS] = {0};
+ int i;
+
+ ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
+ create_state_file(luo_fd, 2);
+
+ ksft_print_msg("[STAGE 1] Setting up Sessions A, B, C for first kexec...\n");
+ for (i = 0; i < NUM_SESSIONS; i++) {
+ setup_session(luo_fd, &sessions[i], i);
+ ksft_print_msg(" - Session '%s' created.\n", sessions[i].name);
+ }
+
+ ksft_print_msg("[STAGE 1] Triggering global PREPARE...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("luo_set_global_event(PREPARE)");
+
+ ksft_print_msg("[STAGE 1] Executing kexec...\n");
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+
+ /* Should not be reached */
+ sleep(10);
+ exit(EXIT_FAILURE);
+}
+
+/* Run after first kexec, before second kexec */
+static void run_stage_2(int luo_fd, int state_session_fd)
+{
+ struct session_info sessions[NUM_SESSIONS] = {0};
+ int session_fd_A;
+
+ ksft_print_msg("[STAGE 2] Partially reclaiming and preparing for second kexec...\n");
+
+ reinit_all_sessions(sessions, NUM_SESSIONS);
+
+ session_fd_A = verify_session_and_get_fd(luo_fd, &sessions[0]);
+ verify_session_and_get_fd(luo_fd, &sessions[1]);
+
+ ksft_print_msg(" - Finishing state session to allow modification...\n");
+ if (luo_set_session_event(state_session_fd, LIVEUPDATE_FINISH) < 0)
+ fail_exit("luo_set_session_event(FINISH) for state_session");
+
+ ksft_print_msg(" - Updating state file for next stage (3)...\n");
+ update_state_file(state_session_fd, 3);
+
+ ksft_print_msg(" - Session A verified. Sending per-session FINISH.\n");
+ if (luo_set_session_event(session_fd_A, LIVEUPDATE_FINISH) < 0)
+ fail_exit("luo_set_session_event(FINISH) for Session A");
+ close(session_fd_A);
+
+ ksft_print_msg(" - Session B verified. Its FD will be auto-closed for next kexec.\n");
+ ksft_print_msg(" - NOT retrieving Session C to test global finish cleanup.\n");
+
+ ksft_print_msg("[STAGE 2] Triggering global FINISH...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_FINISH) < 0)
+ fail_exit("luo_set_global_event(FINISH)");
+
+ ksft_print_msg("[STAGE 2] Triggering global PREPARE for next kexec...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_PREPARE) < 0)
+ fail_exit("luo_set_global_event(PREPARE)");
+
+ ksft_print_msg("[STAGE 2] Executing second kexec...\n");
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+
+ sleep(10);
+ exit(EXIT_FAILURE);
+}
+
+/* Run after second kexec */
+static void run_stage_3(int luo_fd)
+{
+ struct session_info sessions[NUM_SESSIONS] = {0};
+ int ret;
+
+ ksft_print_msg("[STAGE 3] Final verification...\n");
+
+ reinit_all_sessions(sessions, NUM_SESSIONS);
+
+ ksft_print_msg("[STAGE 3] Verifying surviving sessions...\n");
+ /* Session B */
+ verify_session_and_get_fd(luo_fd, &sessions[1]);
+
+ ksft_print_msg("[STAGE 3] Verifying Session A was cleaned up...\n");
+ ret = luo_retrieve_session(luo_fd, sessions[0].name);
+ if (ret != -ENOENT)
+ fail_exit("Expected ENOENT for Session A, but got %d", ret);
+ ksft_print_msg(" Success. Session A not found as expected.\n");
+
+ ksft_print_msg("[STAGE 3] Verifying Session C was cleaned up...\n");
+ ret = luo_retrieve_session(luo_fd, sessions[2].name);
+ if (ret != -ENOENT)
+ fail_exit("Expected ENOENT for Session C, but got %d", ret);
+ ksft_print_msg(" Success. Session C not found as expected.\n");
+
+ ksft_print_msg("[STAGE 3] Triggering final global FINISH...\n");
+ if (luo_set_global_event(luo_fd, LIVEUPDATE_FINISH) < 0)
+ fail_exit("luo_set_global_event(FINISH)");
+
+ ksft_print_msg("\n--- MULTI-KEXEC TEST PASSED ---\n");
+}
+
+int main(int argc, char *argv[])
+{
+ enum liveupdate_state state;
+ int luo_fd, stage = 0;
+
+ luo_fd = luo_open_device();
+ if (luo_fd < 0) {
+ ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
+ LUO_DEVICE);
+ }
+
+ if (luo_get_global_state(luo_fd, &state) < 0)
+ fail_exit("luo_get_global_state");
+
+ if (state == LIVEUPDATE_STATE_NORMAL) {
+ ksft_print_msg("LUO state is NORMAL. Starting Stage 1.\n");
+ run_stage_1(luo_fd);
+ } else if (state == LIVEUPDATE_STATE_UPDATED) {
+ int state_session_fd;
+
+ ksft_print_msg("LUO state is UPDATED. Restoring state to determine stage...\n");
+ state_session_fd = restore_and_read_state(luo_fd, &stage);
+ if (state_session_fd < 0)
+ fail_exit("Could not restore test state");
+
+ if (stage == 2) {
+ ksft_print_msg("State file indicates we are entering Stage 2.\n");
+ run_stage_2(luo_fd, state_session_fd);
+ } else if (stage == 3) {
+ ksft_print_msg("State file indicates we are entering Stage 3.\n");
+ run_stage_3(luo_fd);
+ } else {
+ fail_exit("Invalid stage found in state file: %d",
+ stage);
+ }
+ }
+
+ close(luo_fd);
+ ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/luo_test_utils.c
new file mode 100644
index 000000000000..c0840e6e66fd
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <errno.h>
+#include <stdarg.h>
+
+#include "luo_test_utils.h"
+#include "../kselftest.h"
+
+/* The fail_exit function is now a macro in the header. */
+
+int luo_open_device(void)
+{
+ return open(LUO_DEVICE, O_RDWR);
+}
+
+int luo_create_session(int luo_fd, const char *name)
+{
+ struct liveupdate_ioctl_create_session arg = { .size = sizeof(arg) };
+
+ snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
+ LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
+ if (ioctl(luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &arg) < 0)
+ return -errno;
+ return arg.fd;
+}
+
+int luo_retrieve_session(int luo_fd, const char *name)
+{
+ struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
+
+ snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
+ LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
+ if (ioctl(luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &arg) < 0)
+ return -errno;
+ return arg.fd;
+}
+
+int create_and_preserve_memfd(int session_fd, int token, const char *data)
+{
+ struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
+ long page_size = sysconf(_SC_PAGE_SIZE);
+ void *map = MAP_FAILED;
+ int mfd = -1, ret = -1;
+
+ mfd = memfd_create("test_mfd", 0);
+ if (mfd < 0)
+ return -errno;
+
+ if (ftruncate(mfd, page_size) != 0)
+ goto out;
+
+ map = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, mfd, 0);
+ if (map == MAP_FAILED)
+ goto out;
+
+ snprintf(map, page_size, "%s", data);
+ munmap(map, page_size);
+
+ arg.fd = mfd;
+ arg.token = token;
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
+ goto out;
+
+ ret = 0; /* Success */
+out:
+ if (ret != 0 && errno != 0)
+ ret = -errno;
+ if (mfd >= 0)
+ close(mfd);
+ return ret;
+}
+
+int restore_and_verify_memfd(int session_fd, int token,
+ const char *expected_data)
+{
+ struct liveupdate_session_restore_fd arg = { .size = sizeof(arg) };
+ long page_size = sysconf(_SC_PAGE_SIZE);
+ void *map = MAP_FAILED;
+ int mfd = -1, ret = -1;
+
+ arg.token = token;
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_RESTORE_FD, &arg) < 0)
+ return -errno;
+ mfd = arg.fd;
+
+ map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0);
+ if (map == MAP_FAILED)
+ goto out;
+
+ if (expected_data && strcmp(expected_data, map) != 0) {
+ ksft_print_msg("Data mismatch for token %d!\n", token);
+ ret = -EINVAL;
+ goto out_munmap;
+ }
+
+ ret = mfd; /* Success, return the new fd */
+out_munmap:
+ munmap(map, page_size);
+out:
+ if (ret < 0 && errno != 0)
+ ret = -errno;
+ if (ret < 0 && mfd >= 0)
+ close(mfd);
+ return ret;
+}
+
+int luo_set_session_event(int session_fd, enum liveupdate_event event)
+{
+ struct liveupdate_session_set_event arg = { .size = sizeof(arg) };
+
+ arg.event = event;
+ return ioctl(session_fd, LIVEUPDATE_SESSION_SET_EVENT, &arg);
+}
+
+int luo_set_global_event(int luo_fd, enum liveupdate_event event)
+{
+ struct liveupdate_ioctl_set_event arg = { .size = sizeof(arg) };
+
+ arg.event = event;
+ return ioctl(luo_fd, LIVEUPDATE_IOCTL_SET_EVENT, &arg);
+}
+
+int luo_get_global_state(int luo_fd, enum liveupdate_state *state)
+{
+ struct liveupdate_ioctl_get_state arg = { .size = sizeof(arg) };
+
+ if (ioctl(luo_fd, LIVEUPDATE_IOCTL_GET_STATE, &arg) < 0)
+ return -errno;
+ *state = arg.state;
+ return 0;
+}
+
+void create_state_file(int luo_fd, int next_stage)
+{
+ char buf[32];
+ int state_session_fd;
+
+ state_session_fd = luo_create_session(luo_fd, STATE_SESSION_NAME);
+ if (state_session_fd < 0)
+ fail_exit("luo_create_session failed");
+
+ snprintf(buf, sizeof(buf), "%d", next_stage);
+ if (create_and_preserve_memfd(state_session_fd,
+ STATE_MEMFD_TOKEN, buf) < 0) {
+ fail_exit("create_and_preserve_memfd failed");
+ }
+}
+
+int restore_and_read_state(int luo_fd, int *stage)
+{
+ char buf[32] = {0};
+ int state_session_fd, mfd;
+
+ state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);
+ if (state_session_fd < 0)
+ return state_session_fd;
+
+ mfd = restore_and_verify_memfd(state_session_fd, STATE_MEMFD_TOKEN,
+ NULL);
+ if (mfd < 0)
+ fail_exit("failed to restore state memfd");
+
+ if (read(mfd, buf, sizeof(buf) - 1) < 0)
+ fail_exit("failed to read state mfd");
+
+ *stage = atoi(buf);
+
+ close(mfd);
+ return state_session_fd;
+}
+
+void update_state_file(int session_fd, int next_stage)
+{
+ char buf[32];
+ struct liveupdate_session_unpreserve_fd arg = { .size = sizeof(arg) };
+
+ arg.token = STATE_MEMFD_TOKEN;
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_UNPRESERVE_FD, &arg) < 0)
+ fail_exit("unpreserve failed");
+
+ snprintf(buf, sizeof(buf), "%d", next_stage);
+ if (create_and_preserve_memfd(session_fd, STATE_MEMFD_TOKEN, buf) < 0)
+ fail_exit("create_and_preserve failed");
+}
+
+void reinit_all_sessions(struct session_info *sessions, int num)
+{
+ int i, j;
+
+ for (i = 0; i < num; i++) {
+ snprintf(sessions[i].name, sizeof(sessions[i].name),
+ "session-%c", 'A' + i);
+ for (j = 0; j < 2; j++) {
+ sessions[i].file_tokens[j] = (i * 100) + j;
+ snprintf(sessions[i].file_data[j],
+ sizeof(sessions[i].file_data[j]),
+ "Data for %.*s-File%d",
+ LIVEUPDATE_SESSION_NAME_LENGTH,
+ sessions[i].name, j);
+ }
+ }
+}
+
+int verify_session_and_get_fd(int luo_fd, struct session_info *s)
+{
+ int i, session_fd;
+
+ ksft_print_msg(" - Verifying session '%s'...\n", s->name);
+
+ session_fd = luo_retrieve_session(luo_fd, s->name);
+ if (session_fd < 0)
+ fail_exit("luo_retrieve_session for %s", s->name);
+
+ for (i = 0; i < 2; i++) {
+ int mfd = restore_and_verify_memfd(session_fd,
+ s->file_tokens[i],
+ s->file_data[i]);
+ if (mfd < 0) {
+ fail_exit("restore_and_verify_memfd for token %d",
+ s->file_tokens[i]);
+ }
+ close(mfd);
+ }
+ ksft_print_msg(" Success. All files verified.\n");
+ return session_fd;
+}
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/luo_test_utils.h
new file mode 100644
index 000000000000..e30cfcb0a596
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef LUO_TEST_UTILS_H
+#define LUO_TEST_UTILS_H
+
+#include <errno.h>
+#include <string.h>
+#include <linux/liveupdate.h>
+#include "../kselftest.h"
+
+#define LUO_DEVICE "/dev/liveupdate"
+#define STATE_SESSION_NAME "state_session"
+#define STATE_MEMFD_TOKEN 999
+
+#define MAX_FILES_PER_SESSION 5
+
+struct session_info {
+ char name[LIVEUPDATE_SESSION_NAME_LENGTH];
+ int fd;
+ int file_tokens[MAX_FILES_PER_SESSION];
+ char file_data[MAX_FILES_PER_SESSION][128];
+};
+
+#define fail_exit(fmt, ...) \
+ ksft_exit_fail_msg("[%s] " fmt " (errno: %s)\n", \
+ __func__, ##__VA_ARGS__, strerror(errno))
+
+int luo_open_device(void);
+
+int luo_create_session(int luo_fd, const char *name);
+int luo_retrieve_session(int luo_fd, const char *name);
+
+int create_and_preserve_memfd(int session_fd, int token, const char *data);
+int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
+int verify_session_and_get_fd(int luo_fd, struct session_info *s);
+
+int luo_set_session_event(int session_fd, enum liveupdate_event event);
+int luo_set_global_event(int luo_fd, enum liveupdate_event event);
+int luo_get_global_state(int luo_fd, enum liveupdate_state *state);
+
+void create_state_file(int luo_fd, int next_stage);
+int restore_and_read_state(int luo_fd, int *stage);
+void update_state_file(int session_fd, int next_stage);
+void reinit_all_sessions(struct session_info *sessions, int num);
+
+#endif /* LUO_TEST_UTILS_H */
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 25/30] docs: add documentation for memfd preservation via LUO
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
From: Pratyush Yadav <ptyadav@amazon.de>
Add the documentation under the "Preserving file descriptors" section of
LUO's documentation. The doc describes the properties preserved,
behaviour of the file under different LUO states, serialization format,
and current limitations.
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
Documentation/core-api/liveupdate.rst | 7 ++
Documentation/mm/index.rst | 1 +
Documentation/mm/memfd_preservation.rst | 138 ++++++++++++++++++++++++
MAINTAINERS | 1 +
4 files changed, 147 insertions(+)
create mode 100644 Documentation/mm/memfd_preservation.rst
diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index 7c1c3af6f960..b44710d75088 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -23,6 +23,13 @@ LUO Preserving File Descriptors
.. kernel-doc:: kernel/liveupdate/luo_file.c
:doc: LUO file descriptors
+The following types of file descriptors can be preserved
+
+.. toctree::
+ :maxdepth: 1
+
+ ../mm/memfd_preservation
+
Public API
==========
.. kernel-doc:: include/linux/liveupdate.h
diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index ba6a8872849b..7aa2a8886908 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -48,6 +48,7 @@ documentation, or deleted if it has served its purpose.
hugetlbfs_reserv
ksm
memory-model
+ memfd_preservation
mmu_notifier
multigen_lru
numa
diff --git a/Documentation/mm/memfd_preservation.rst b/Documentation/mm/memfd_preservation.rst
new file mode 100644
index 000000000000..3fc612e1288c
--- /dev/null
+++ b/Documentation/mm/memfd_preservation.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+==========================
+Memfd Preservation via LUO
+==========================
+
+Overview
+========
+
+Memory file descriptors (memfd) can be preserved over a kexec using the Live
+Update Orchestrator (LUO) file preservation. This allows userspace to transfer
+its memory contents to the next kernel after a kexec.
+
+The preservation is not intended to be transparent. Only select properties of
+the file are preserved. All others are reset to default. The preserved
+properties are described below.
+
+.. note::
+ The LUO API is not stabilized yet, so the preserved properties of a memfd are
+ also not stable and are subject to backwards incompatible changes.
+
+.. note::
+ Currently a memfd backed by Hugetlb is not supported. Memfds created
+ with ``MFD_HUGETLB`` will be rejected.
+
+Preserved Properties
+====================
+
+The following properties of the memfd are preserved across kexec:
+
+File Contents
+ All data stored in the file is preserved.
+
+File Size
+ The size of the file is preserved. Holes in the file are filled by allocating
+ pages for them during preservation.
+
+File Position
+ The current file position is preserved, allowing applications to continue
+ reading/writing from their last position.
+
+File Status Flags
+ memfds are always opened with ``O_RDWR`` and ``O_LARGEFILE``. This property is
+ maintained.
+
+Non-Preserved Properties
+========================
+
+All properties which are not preserved must be assumed to be reset to default.
+This section describes some of those properties which may be more of note.
+
+``FD_CLOEXEC`` flag
+ A memfd can be created with the ``MFD_CLOEXEC`` flag that sets the
+ ``FD_CLOEXEC`` on the file. This flag is not preserved and must be set again
+ after restore via ``fcntl()``.
+
+Seals
+ File seals are not preserved. The file is unsealed on restore and if needed,
+ must be sealed again via ``fcntl()``.
+
+Behavior with LUO states
+========================
+
+This section described the behavior of the memfd in the different LUO states.
+
+Normal Phase
+ During the normal phase, the memfd can be marked for preservation using the
+ ``LIVEUPDATE_SESSION_PRESERVE_FD`` ioctl. The memfd acts as a regular memfd
+ during this phase with no additional restrictions.
+
+Prepared Phase
+ After LUO enters ``LIVEUPDATE_STATE_PREPARED``, the memfd is serialized and
+ prepared for the next kernel. During this phase, the below things happen:
+
+ - All the folios are pinned. If some folios reside in ``ZONE_MIGRATE``, they
+ are migrated out. This ensures none of the preserved folios land in KHO
+ scratch area.
+ - Pages in swap are swapped in. Currently, there is no way to pass pages in
+ swap over KHO, so all swapped out pages are swapped back in and pinned.
+ - The memfd goes into "frozen mapping" mode. The file can no longer grow or
+ shrink, or punch holes. This ensures the serialized mappings stay in sync.
+ The file can still be read from or written to or mmap-ed.
+
+Freeze Phase
+ Updates the current file position in the serialized data to capture any
+ changes that occurred between prepare and freeze phases. After this, the FD is
+ not allowed to be accessed.
+
+Restoration Phase
+ After being restored, the memfd is functional as normal with the properties
+ listed above restored.
+
+Cancellation
+ If the liveupdate is cancelled after going into prepared phase, the memfd
+ functions like in normal phase.
+
+Serialization format
+====================
+
+The state is serialized in an FDT with the following structure::
+
+ /dts-v1/;
+
+ / {
+ compatible = "memfd-v1";
+ pos = <current_file_position>;
+ size = <file_size_in_bytes>;
+ folios = <array_of_preserved_folio_descriptors>;
+ };
+
+Each folio descriptor contains:
+
+- PFN + flags (8 bytes)
+
+ - Physical frame number (PFN) of the preserved folio (bits 63:12).
+ - Folio flags (bits 11:0):
+
+ - ``PRESERVED_FLAG_DIRTY`` (bit 0)
+ - ``PRESERVED_FLAG_UPTODATE`` (bit 1)
+
+- Folio index within the file (8 bytes).
+
+Limitations
+===========
+
+The current implementation has the following limitations:
+
+Size
+ Currently the size of the file is limited by the size of the FDT. The FDT can
+ be at of most ``MAX_PAGE_ORDER`` order. By default this is 4 MiB with 4K
+ pages. Each page in the file is tracked using 16 bytes. This limits the
+ maximum size of the file to 1 GiB.
+
+See Also
+========
+
+- :doc:`Live Update Orchestrator </core-api/liveupdate>`
+- :doc:`/core-api/kho/concepts`
diff --git a/MAINTAINERS b/MAINTAINERS
index a17e4e077174..a9941e920ef6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14438,6 +14438,7 @@ L: linux-kernel@vger.kernel.org
S: Maintained
F: Documentation/ABI/testing/sysfs-kernel-liveupdate
F: Documentation/core-api/liveupdate.rst
+F: Documentation/mm/memfd_preservation.rst
F: Documentation/userspace-api/liveupdate.rst
F: include/linux/liveupdate.h
F: include/uapi/linux/liveupdate.h
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 24/30] luo: allow preserving memfd
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
From: Pratyush Yadav <ptyadav@amazon.de>
The ability to preserve a memfd allows userspace to use KHO and LUO to
transfer its memory contents to the next kernel. This is useful in many
ways. For one, it can be used with IOMMUFD as the backing store for
IOMMU page tables. Preserving IOMMUFD is essential for performing a
hypervisor live update with passthrough devices. memfd support provides
the first building block for making that possible.
For another, applications with a large amount of memory that takes time
to reconstruct, reboots to consume kernel upgrades can be very
expensive. memfd with LUO gives those applications reboot-persistent
memory that they can use to quickly save and reconstruct that state.
While memfd is backed by either hugetlbfs or shmem, currently only
support on shmem is added. To be more precise, support for anonymous
shmem files is added.
The handover to the next kernel is not transparent. All the properties
of the file are not preserved; only its memory contents, position, and
size. The recreated file gets the UID and GID of the task doing the
restore, and the task's cgroup gets charged with the memory.
After LUO is in prepared state, the file cannot grow or shrink, and all
its pages are pinned to avoid migrations and swapping. The file can
still be read from or written to.
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
---
MAINTAINERS | 2 +
mm/Makefile | 1 +
mm/memfd_luo.c | 523 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 526 insertions(+)
create mode 100644 mm/memfd_luo.c
diff --git a/MAINTAINERS b/MAINTAINERS
index e99af6101d3c..a17e4e077174 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14433,6 +14433,7 @@ F: tools/testing/selftests/livepatch/
LIVE UPDATE
M: Pasha Tatashin <pasha.tatashin@soleen.com>
+R: Pratyush Yadav <pratyush@kernel.org>
L: linux-kernel@vger.kernel.org
S: Maintained
F: Documentation/ABI/testing/sysfs-kernel-liveupdate
@@ -14441,6 +14442,7 @@ F: Documentation/userspace-api/liveupdate.rst
F: include/linux/liveupdate.h
F: include/uapi/linux/liveupdate.h
F: kernel/liveupdate/
+F: mm/memfd_luo.c
F: tools/testing/selftests/liveupdate/
LLC (802.2)
diff --git a/mm/Makefile b/mm/Makefile
index 21abb3353550..7738ec416f00 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_NUMA) += memory-tiers.o
obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
+obj-$(CONFIG_LIVEUPDATE) += memfd_luo.o
obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
ifdef CONFIG_SWAP
diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
new file mode 100644
index 000000000000..221e31c1197e
--- /dev/null
+++ b/mm/memfd_luo.c
@@ -0,0 +1,523 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ * Changyuan Lyu <changyuanl@google.com>
+ *
+ * Copyright (C) 2025 Amazon.com Inc. or its affiliates.
+ * Pratyush Yadav <ptyadav@amazon.de>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/file.h>
+#include <linux/io.h>
+#include <linux/libfdt.h>
+#include <linux/liveupdate.h>
+#include <linux/kexec_handover.h>
+#include <linux/shmem_fs.h>
+#include <linux/bits.h>
+#include "internal.h"
+
+#define PRESERVED_PFN_MASK GENMASK(63, 12)
+#define PRESERVED_PFN_SHIFT 12
+#define PRESERVED_FLAG_DIRTY BIT(0)
+#define PRESERVED_FLAG_UPTODATE BIT(1)
+
+#define PRESERVED_FOLIO_PFN(desc) (((desc) & PRESERVED_PFN_MASK) >> PRESERVED_PFN_SHIFT)
+#define PRESERVED_FOLIO_FLAGS(desc) ((desc) & ~PRESERVED_PFN_MASK)
+#define PRESERVED_FOLIO_MKDESC(pfn, flags) (((pfn) << PRESERVED_PFN_SHIFT) | (flags))
+
+struct memfd_luo_preserved_folio {
+ /*
+ * The folio descriptor is made of 2 parts. The bottom 12 bits are used
+ * for storing flags, the others for storing the PFN.
+ */
+ u64 foliodesc;
+ u64 index;
+};
+
+static int memfd_luo_preserve_folios(struct memfd_luo_preserved_folio *pfolios,
+ struct folio **folios,
+ unsigned int nr_folios)
+{
+ int err;
+ long i;
+
+ for (i = 0; i < nr_folios; i++) {
+ struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+ struct folio *folio = folios[i];
+ unsigned int flags = 0;
+ unsigned long pfn;
+
+ err = kho_preserve_folio(folio);
+ if (err)
+ goto err_unpreserve;
+
+ pfn = folio_pfn(folio);
+ if (folio_test_dirty(folio))
+ flags |= PRESERVED_FLAG_DIRTY;
+ if (folio_test_uptodate(folio))
+ flags |= PRESERVED_FLAG_UPTODATE;
+
+ pfolio->foliodesc = PRESERVED_FOLIO_MKDESC(pfn, flags);
+ pfolio->index = folio->index;
+ }
+
+ return 0;
+
+err_unpreserve:
+ i--;
+ for (; i >= 0; i--)
+ WARN_ON_ONCE(kho_unpreserve_folio(folios[i]));
+ return err;
+}
+
+static void memfd_luo_unpreserve_folios(const struct memfd_luo_preserved_folio *pfolios,
+ unsigned int nr_folios)
+{
+ unsigned int i;
+
+ for (i = 0; i < nr_folios; i++) {
+ const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+ struct folio *folio;
+
+ if (!pfolio->foliodesc)
+ continue;
+
+ folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+
+ WARN_ON_ONCE(kho_unpreserve_folio(folio));
+ unpin_folio(folio);
+ }
+}
+
+static void *memfd_luo_create_fdt(unsigned long size)
+{
+ unsigned int order = get_order(size);
+ struct folio *fdt_folio;
+ int err = 0;
+ void *fdt;
+
+ if (order > MAX_PAGE_ORDER)
+ return NULL;
+
+ fdt_folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
+ if (!fdt_folio)
+ return NULL;
+
+ fdt = folio_address(fdt_folio);
+
+ err |= fdt_create(fdt, (1 << (order + PAGE_SHIFT)));
+ err |= fdt_finish_reservemap(fdt);
+ err |= fdt_begin_node(fdt, "");
+ if (err)
+ goto free;
+
+ return fdt;
+
+free:
+ folio_put(fdt_folio);
+ return NULL;
+}
+
+static int memfd_luo_finish_fdt(void *fdt)
+{
+ int err;
+
+ err = fdt_end_node(fdt);
+ if (err)
+ return err;
+
+ return fdt_finish(fdt);
+}
+
+static int memfd_luo_prepare(struct liveupdate_file_handler *handler,
+ struct file *file, u64 *data)
+{
+ struct memfd_luo_preserved_folio *preserved_folios;
+ struct inode *inode = file_inode(file);
+ unsigned int max_folios, nr_folios = 0;
+ int err = 0, preserved_size;
+ struct folio **folios;
+ long size, nr_pinned;
+ pgoff_t offset;
+ void *fdt;
+ u64 pos;
+
+ inode_lock(inode);
+ shmem_i_mapping_freeze(inode, true);
+
+ size = i_size_read(inode);
+ if ((PAGE_ALIGN(size) / PAGE_SIZE) > UINT_MAX) {
+ err = -E2BIG;
+ goto err_unlock;
+ }
+
+ /*
+ * Guess the number of folios based on inode size. Real number might end
+ * up being smaller if there are higher order folios.
+ */
+ max_folios = PAGE_ALIGN(size) / PAGE_SIZE;
+ folios = kvmalloc_array(max_folios, sizeof(*folios), GFP_KERNEL);
+ if (!folios) {
+ err = -ENOMEM;
+ goto err_unfreeze;
+ }
+
+ /*
+ * Pin the folios so they don't move around behind our back. This also
+ * ensures none of the folios are in CMA -- which ensures they don't
+ * fall in KHO scratch memory. It also moves swapped out folios back to
+ * memory.
+ *
+ * A side effect of doing this is that it allocates a folio for all
+ * indices in the file. This might waste memory on sparse memfds. If
+ * that is really a problem in the future, we can have a
+ * memfd_pin_folios() variant that does not allocate a page on empty
+ * slots.
+ */
+ nr_pinned = memfd_pin_folios(file, 0, size - 1, folios, max_folios,
+ &offset);
+ if (nr_pinned < 0) {
+ err = nr_pinned;
+ pr_err("failed to pin folios: %d\n", err);
+ goto err_free_folios;
+ }
+ /* nr_pinned won't be more than max_folios which is also unsigned int. */
+ nr_folios = (unsigned int)nr_pinned;
+
+ preserved_size = sizeof(struct memfd_luo_preserved_folio) * nr_folios;
+ if (check_mul_overflow(sizeof(struct memfd_luo_preserved_folio),
+ nr_folios, &preserved_size)) {
+ err = -E2BIG;
+ goto err_unpin;
+ }
+
+ /*
+ * Most of the space should be taken by preserved folios. So take its
+ * size, plus a page for other properties.
+ */
+ fdt = memfd_luo_create_fdt(PAGE_ALIGN(preserved_size) + PAGE_SIZE);
+ if (!fdt) {
+ err = -ENOMEM;
+ goto err_unpin;
+ }
+
+ pos = file->f_pos;
+ err = fdt_property(fdt, "pos", &pos, sizeof(pos));
+ if (err)
+ goto err_free_fdt;
+
+ err = fdt_property(fdt, "size", &size, sizeof(size));
+ if (err)
+ goto err_free_fdt;
+
+ err = fdt_property_placeholder(fdt, "folios", preserved_size,
+ (void **)&preserved_folios);
+ if (err) {
+ pr_err("Failed to reserve folios property in FDT: %s\n",
+ fdt_strerror(err));
+ err = -ENOMEM;
+ goto err_free_fdt;
+ }
+
+ err = memfd_luo_preserve_folios(preserved_folios, folios, nr_folios);
+ if (err)
+ goto err_free_fdt;
+
+ err = memfd_luo_finish_fdt(fdt);
+ if (err)
+ goto err_unpreserve;
+
+ err = kho_preserve_folio(virt_to_folio(fdt));
+ if (err)
+ goto err_unpreserve;
+
+ kvfree(folios);
+ inode_unlock(inode);
+
+ *data = virt_to_phys(fdt);
+ return 0;
+
+err_unpreserve:
+ memfd_luo_unpreserve_folios(preserved_folios, nr_folios);
+err_free_fdt:
+ folio_put(virt_to_folio(fdt));
+err_unpin:
+ unpin_folios(folios, nr_pinned);
+err_free_folios:
+ kvfree(folios);
+err_unfreeze:
+ shmem_i_mapping_freeze(inode, false);
+err_unlock:
+ inode_unlock(inode);
+ return err;
+}
+
+static int memfd_luo_freeze(struct liveupdate_file_handler *handler,
+ struct file *file, u64 *data)
+{
+ u64 pos = file->f_pos;
+ void *fdt;
+ int err;
+
+ if (WARN_ON_ONCE(!*data))
+ return -EINVAL;
+
+ fdt = phys_to_virt(*data);
+
+ /*
+ * The pos might have changed since prepare. Everything else stays the
+ * same.
+ */
+ err = fdt_setprop(fdt, 0, "pos", &pos, sizeof(pos));
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static void memfd_luo_cancel(struct liveupdate_file_handler *handler,
+ struct file *file, u64 data)
+{
+ const struct memfd_luo_preserved_folio *pfolios;
+ struct inode *inode = file_inode(file);
+ struct folio *fdt_folio;
+ void *fdt;
+ int len;
+
+ if (WARN_ON_ONCE(!data))
+ return;
+
+ inode_lock(inode);
+ shmem_i_mapping_freeze(inode, false);
+
+ fdt = phys_to_virt(data);
+ fdt_folio = virt_to_folio(fdt);
+ pfolios = fdt_getprop(fdt, 0, "folios", &len);
+ if (pfolios)
+ memfd_luo_unpreserve_folios(pfolios, len / sizeof(*pfolios));
+
+ kho_unpreserve_folio(fdt_folio);
+ folio_put(fdt_folio);
+ inode_unlock(inode);
+}
+
+static struct folio *memfd_luo_get_fdt(u64 data)
+{
+ return kho_restore_folio((phys_addr_t)data);
+}
+
+static void memfd_luo_discard_folios(const struct memfd_luo_preserved_folio *pfolios,
+ unsigned int nr_folios)
+{
+ unsigned int i;
+
+ for (i = 0; i < nr_folios; i++) {
+ const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+ struct folio *folio;
+ phys_addr_t phys;
+
+ if (!pfolio->foliodesc)
+ continue;
+
+ phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+ folio = kho_restore_folio(phys);
+ if (!folio) {
+ pr_warn_ratelimited("Unable to restore folio at physical address: %llx\n",
+ phys);
+ continue;
+ }
+
+ folio_put(folio);
+ }
+}
+
+static void memfd_luo_finish(struct liveupdate_file_handler *handler,
+ struct file *file, u64 data, bool reclaimed)
+{
+ const struct memfd_luo_preserved_folio *pfolios;
+ struct folio *fdt_folio;
+ int len;
+
+ if (reclaimed)
+ return;
+
+ fdt_folio = memfd_luo_get_fdt(data);
+
+ pfolios = fdt_getprop(folio_address(fdt_folio), 0, "folios", &len);
+ if (pfolios)
+ memfd_luo_discard_folios(pfolios, len / sizeof(*pfolios));
+
+ folio_put(fdt_folio);
+}
+
+static int memfd_luo_retrieve(struct liveupdate_file_handler *handler, u64 data,
+ struct file **file_p)
+{
+ const struct memfd_luo_preserved_folio *pfolios;
+ int nr_pfolios, len, ret = 0, i = 0;
+ struct address_space *mapping;
+ struct folio *folio, *fdt_folio;
+ const u64 *pos, *size;
+ struct inode *inode;
+ struct file *file;
+ const void *fdt;
+
+ fdt_folio = memfd_luo_get_fdt(data);
+ if (!fdt_folio)
+ return -ENOENT;
+
+ fdt = page_to_virt(folio_page(fdt_folio, 0));
+
+ pfolios = fdt_getprop(fdt, 0, "folios", &len);
+ if (!pfolios || len % sizeof(*pfolios)) {
+ pr_err("invalid 'folios' property\n");
+ ret = -EINVAL;
+ goto put_fdt;
+ }
+ nr_pfolios = len / sizeof(*pfolios);
+
+ size = fdt_getprop(fdt, 0, "size", &len);
+ if (!size || len != sizeof(u64)) {
+ pr_err("invalid 'size' property\n");
+ ret = -EINVAL;
+ goto put_folios;
+ }
+
+ pos = fdt_getprop(fdt, 0, "pos", &len);
+ if (!pos || len != sizeof(u64)) {
+ pr_err("invalid 'pos' property\n");
+ ret = -EINVAL;
+ goto put_folios;
+ }
+
+ file = shmem_file_setup("", 0, VM_NORESERVE);
+
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ pr_err("failed to setup file: %d\n", ret);
+ goto put_folios;
+ }
+
+ inode = file->f_inode;
+ mapping = inode->i_mapping;
+ vfs_setpos(file, *pos, MAX_LFS_FILESIZE);
+
+ for (; i < nr_pfolios; i++) {
+ const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+ phys_addr_t phys;
+ u64 index;
+ int flags;
+
+ if (!pfolio->foliodesc)
+ continue;
+
+ phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+ folio = kho_restore_folio(phys);
+ if (!folio) {
+ pr_err("Unable to restore folio at physical address: %llx\n",
+ phys);
+ goto put_file;
+ }
+ index = pfolio->index;
+ flags = PRESERVED_FOLIO_FLAGS(pfolio->foliodesc);
+
+ /* Set up the folio for insertion. */
+ __folio_set_locked(folio);
+ __folio_set_swapbacked(folio);
+
+ ret = mem_cgroup_charge(folio, NULL, mapping_gfp_mask(mapping));
+ if (ret) {
+ pr_err("shmem: failed to charge folio index %d: %d\n",
+ i, ret);
+ goto unlock_folio;
+ }
+
+ ret = shmem_add_to_page_cache(folio, mapping, index, NULL,
+ mapping_gfp_mask(mapping));
+ if (ret) {
+ pr_err("shmem: failed to add to page cache folio index %d: %d\n",
+ i, ret);
+ goto unlock_folio;
+ }
+
+ if (flags & PRESERVED_FLAG_UPTODATE)
+ folio_mark_uptodate(folio);
+ if (flags & PRESERVED_FLAG_DIRTY)
+ folio_mark_dirty(folio);
+
+ ret = shmem_inode_acct_blocks(inode, 1);
+ if (ret) {
+ pr_err("shmem: failed to account folio index %d: %d\n",
+ i, ret);
+ goto unlock_folio;
+ }
+
+ shmem_recalc_inode(inode, 1, 0);
+ folio_add_lru(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+ inode->i_size = *size;
+ *file_p = file;
+ folio_put(fdt_folio);
+ return 0;
+
+unlock_folio:
+ folio_unlock(folio);
+ folio_put(folio);
+put_file:
+ fput(file);
+ i++;
+put_folios:
+ for (; i < nr_pfolios; i++) {
+ const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+
+ folio = kho_restore_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+ if (folio)
+ folio_put(folio);
+ }
+
+put_fdt:
+ folio_put(fdt_folio);
+ return ret;
+}
+
+static bool memfd_luo_can_preserve(struct liveupdate_file_handler *handler,
+ struct file *file)
+{
+ struct inode *inode = file_inode(file);
+
+ return shmem_file(file) && !inode->i_nlink;
+}
+
+static const struct liveupdate_file_ops memfd_luo_file_ops = {
+ .prepare = memfd_luo_prepare,
+ .freeze = memfd_luo_freeze,
+ .cancel = memfd_luo_cancel,
+ .finish = memfd_luo_finish,
+ .retrieve = memfd_luo_retrieve,
+ .can_preserve = memfd_luo_can_preserve,
+ .owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler memfd_luo_handler = {
+ .ops = &memfd_luo_file_ops,
+ .compatible = "memfd-v1",
+};
+
+static int __init memfd_luo_init(void)
+{
+ int err;
+
+ err = liveupdate_register_file_handler(&memfd_luo_handler);
+ if (err)
+ pr_err("Could not register luo filesystem handler: %d\n", err);
+
+ return err;
+}
+late_initcall(memfd_luo_init);
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
* [PATCH v4 23/30] mm: shmem: export some functions to internal.h
From: Pasha Tatashin @ 2025-09-29 1:03 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, changyuanl, pasha.tatashin, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
From: Pratyush Yadav <ptyadav@amazon.de>
shmem_inode_acct_blocks(), shmem_recalc_inode(), and
shmem_add_to_page_cache() are used by shmem_alloc_and_add_folio(). This
functionality will also be used in the future by Live Update
Orchestrator (LUO) to recreate memfd files after a live update.
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
mm/internal.h | 6 ++++++
mm/shmem.c | 10 +++++-----
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..4ba155524f80 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1562,6 +1562,12 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
int priority);
+int shmem_add_to_page_cache(struct folio *folio,
+ struct address_space *mapping,
+ pgoff_t index, void *expected, gfp_t gfp);
+int shmem_inode_acct_blocks(struct inode *inode, long pages);
+bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped);
+
#ifdef CONFIG_SHRINKER_DEBUG
static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/shmem.c b/mm/shmem.c
index bd7d9afe5a27..4647a0b2831c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -219,7 +219,7 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
}
-static int shmem_inode_acct_blocks(struct inode *inode, long pages)
+int shmem_inode_acct_blocks(struct inode *inode, long pages)
{
struct shmem_inode_info *info = SHMEM_I(inode);
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
@@ -435,7 +435,7 @@ static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
*
* Return: true if swapped was incremented from 0, for shmem_writeout().
*/
-static bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
+bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
{
struct shmem_inode_info *info = SHMEM_I(inode);
bool first_swapped = false;
@@ -861,9 +861,9 @@ static void shmem_update_stats(struct folio *folio, int nr_pages)
/*
* Somewhat like filemap_add_folio, but error if expected item has gone.
*/
-static int shmem_add_to_page_cache(struct folio *folio,
- struct address_space *mapping,
- pgoff_t index, void *expected, gfp_t gfp)
+int shmem_add_to_page_cache(struct folio *folio,
+ struct address_space *mapping,
+ pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
unsigned long nr = folio_nr_pages(folio);
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox