Linux userland API discussions
* Re: [PATCH v4 03/30] kho: drop notifiers
From: Pratyush Yadav @ 2025-10-06 17:01 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-4-pasha.tatashin@soleen.com>

On Mon, Sep 29 2025, Pasha Tatashin wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> The KHO framework uses a notifier chain as the mechanism for clients to
> participate in the finalization process. While this works for a single,
> central state machine, it is too restrictive for kernel-internal
> components like pstore/reserve_mem or IMA. These components need a
> simpler, direct way to register their state for preservation (e.g.,
> during their initcall) without being part of a complex,
> shutdown-time notifier sequence. The notifier model forces all
> participants into a single finalization flow and makes direct
> preservation from an arbitrary context difficult.
> This patch refactors the client participation model by removing the
> notifier chain and introducing a direct API for managing FDT subtrees.
>
> The core kho_finalize() and kho_abort() state machine remains, but
> clients now register their data with KHO beforehand.
>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
> diff --git a/mm/memblock.c b/mm/memblock.c
> index e23e16618e9b..c4b2d4e4c715 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2444,53 +2444,18 @@ int reserve_mem_release_by_name(const char *name)
>  #define MEMBLOCK_KHO_FDT "memblock"
>  #define MEMBLOCK_KHO_NODE_COMPATIBLE "memblock-v1"
>  #define RESERVE_MEM_KHO_NODE_COMPATIBLE "reserve-mem-v1"
> -static struct page *kho_fdt;
> -
> -static int reserve_mem_kho_finalize(struct kho_serialization *ser)
> -{
> -	int err = 0, i;
> -
> -	for (i = 0; i < reserved_mem_count; i++) {
> -		struct reserve_mem_table *map = &reserved_mem_table[i];
> -		struct page *page = phys_to_page(map->start);
> -		unsigned int nr_pages = map->size >> PAGE_SHIFT;
> -
> -		err |= kho_preserve_pages(page, nr_pages);
> -	}
> -
> -	err |= kho_preserve_folio(page_folio(kho_fdt));
> -	err |= kho_add_subtree(ser, MEMBLOCK_KHO_FDT, page_to_virt(kho_fdt));
> -
> -	return notifier_from_errno(err);
> -}
> -
> -static int reserve_mem_kho_notifier(struct notifier_block *self,
> -				    unsigned long cmd, void *v)
> -{
> -	switch (cmd) {
> -	case KEXEC_KHO_FINALIZE:
> -		return reserve_mem_kho_finalize((struct kho_serialization *)v);
> -	case KEXEC_KHO_ABORT:
> -		return NOTIFY_DONE;
> -	default:
> -		return NOTIFY_BAD;
> -	}
> -}
> -
> -static struct notifier_block reserve_mem_kho_nb = {
> -	.notifier_call = reserve_mem_kho_notifier,
> -};
>  
>  static int __init prepare_kho_fdt(void)
>  {
>  	int err = 0, i;
> +	struct page *fdt_page;
>  	void *fdt;
>  
> -	kho_fdt = alloc_page(GFP_KERNEL);
> -	if (!kho_fdt)
> +	fdt_page = alloc_page(GFP_KERNEL);
> +	if (!fdt_page)
>  		return -ENOMEM;
>  
> -	fdt = page_to_virt(kho_fdt);
> +	fdt = page_to_virt(fdt_page);
>  
>  	err |= fdt_create(fdt, PAGE_SIZE);
>  	err |= fdt_finish_reservemap(fdt);
> @@ -2499,7 +2464,10 @@ static int __init prepare_kho_fdt(void)
>  	err |= fdt_property_string(fdt, "compatible", MEMBLOCK_KHO_NODE_COMPATIBLE);
>  	for (i = 0; i < reserved_mem_count; i++) {
>  		struct reserve_mem_table *map = &reserved_mem_table[i];
> +		struct page *page = phys_to_page(map->start);
> +		unsigned int nr_pages = map->size >> PAGE_SHIFT;
>  
> +		err |= kho_preserve_pages(page, nr_pages);
>  		err |= fdt_begin_node(fdt, map->name);
>  		err |= fdt_property_string(fdt, "compatible", RESERVE_MEM_KHO_NODE_COMPATIBLE);
>  		err |= fdt_property(fdt, "start", &map->start, sizeof(map->start));
> @@ -2507,13 +2475,14 @@ static int __init prepare_kho_fdt(void)
>  		err |= fdt_end_node(fdt);
>  	}
>  	err |= fdt_end_node(fdt);
> -
>  	err |= fdt_finish(fdt);
>  
> +	err |= kho_preserve_folio(page_folio(fdt_page));
> +	err |= kho_add_subtree(MEMBLOCK_KHO_FDT, fdt);
> +
>  	if (err) {
>  		pr_err("failed to prepare memblock FDT for KHO: %d\n", err);
> -		put_page(kho_fdt);
> -		kho_fdt = NULL;
> +		put_page(fdt_page);

This adds the subtree to KHO even if the FDT might be invalid, and then
leaves a dangling reference to the FDT in KHO in case of an error. I
think you should either do this check after
kho_preserve_folio(page_folio(fdt_page)) and do a clean error check for
kho_add_subtree(), or call kho_remove_subtree() in the error block.

I prefer the former since if kho_add_subtree() is the one that fails,
there is little sense in removing a subtree that was never added.
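
To make the ordering concrete, here is a rough userspace sketch of the
first option. The stub_*() functions and struct stub_state are
hypothetical stand-ins for the fdt_*(), kho_preserve_folio(),
kho_add_subtree() and put_page() calls, not the real KHO API; only the
control flow is meant to carry over.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel calls; not the real KHO API. */
struct stub_state {
	int build_err;      /* injected result of the fdt_*() sequence */
	int preserve_err;   /* injected result of the folio preservation */
	int add_err;        /* injected result of the subtree registration */
	bool subtree_added; /* true once KHO would hold a reference to the FDT */
	bool page_freed;    /* true once the FDT page would have been released */
};

static int stub_build_fdt(struct stub_state *s)
{
	return s->build_err;
}

static int stub_preserve_folio(struct stub_state *s)
{
	return s->preserve_err;
}

static int stub_add_subtree(struct stub_state *s)
{
	if (s->add_err)
		return s->add_err;
	s->subtree_added = true;
	return 0;
}

static void stub_put_page(struct stub_state *s)
{
	s->page_freed = true;
}

/*
 * Check the FDT and the preservation first, and only then register the
 * subtree, with its own error check. On any failure the page is freed
 * while KHO holds no reference to it. Undoing a successful preservation
 * when the registration fails is left out of this sketch.
 */
static int prepare_fdt_sketch(struct stub_state *s)
{
	int err;

	err = stub_build_fdt(s);
	if (!err)
		err = stub_preserve_folio(s);
	if (err)
		goto err_free;

	err = stub_add_subtree(s);
	if (err)
		goto err_free;

	return 0;

err_free:
	stub_put_page(s);
	return err;
}
```

With this shape there is no path on which the page is freed after the
subtree has been registered, so no remove call is needed in the error
block.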

>  	}
>  
>  	return err;
> @@ -2529,13 +2498,6 @@ static int __init reserve_mem_init(void)
>  	err = prepare_kho_fdt();
>  	if (err)
>  		return err;
> -
> -	err = register_kho_notifier(&reserve_mem_kho_nb);
> -	if (err) {
> -		put_page(kho_fdt);
> -		kho_fdt = NULL;
> -	}
> -
>  	return err;
>  }
>  late_initcall(reserve_mem_init);

-- 
Regards,
Pratyush Yadav


* Re: [PATCH v4 02/30] kho: make debugfs interface optional
From: Pratyush Yadav @ 2025-10-06 16:55 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-3-pasha.tatashin@soleen.com>

Hi Pasha,

On Mon, Sep 29 2025, Pasha Tatashin wrote:

> Currently, KHO is controlled via debugfs interface, but once LUO is
> introduced, it can control KHO, and the debug interface becomes
> optional.
>
> Add a separate config CONFIG_KEXEC_HANDOVER_DEBUG that enables
> the debugfs interface, and allows to inspect the tree.
>
> Move all debugfs related code to a new file to keep the .c files
> clear of ifdefs.
>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
> @@ -662,36 +660,24 @@ static void __init kho_reserve_scratch(void)
>  	kho_enable = false;
>  }
>  
> -struct fdt_debugfs {
> -	struct list_head list;
> -	struct debugfs_blob_wrapper wrapper;
> -	struct dentry *file;
> +struct kho_out {
> +	struct blocking_notifier_head chain_head;
> +	struct mutex lock; /* protects KHO FDT finalization */
> +	struct kho_serialization ser;
> +	bool finalized;
> +	struct kho_debugfs dbg;
>  };
>  
> -static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir,
> -			       const char *name, const void *fdt)
> -{
> -	struct fdt_debugfs *f;
> -	struct dentry *file;
> -
> -	f = kmalloc(sizeof(*f), GFP_KERNEL);
> -	if (!f)
> -		return -ENOMEM;
> -
> -	f->wrapper.data = (void *)fdt;
> -	f->wrapper.size = fdt_totalsize(fdt);
> -
> -	file = debugfs_create_blob(name, 0400, dir, &f->wrapper);
> -	if (IS_ERR(file)) {
> -		kfree(f);
> -		return PTR_ERR(file);
> -	}
> -
> -	f->file = file;
> -	list_add(&f->list, list);
> -
> -	return 0;
> -}
> +static struct kho_out kho_out = {
> +	.chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head),
> +	.lock = __MUTEX_INITIALIZER(kho_out.lock),
> +	.ser = {
> +		.track = {
> +			.orders = XARRAY_INIT(kho_out.ser.track.orders, 0),
> +		},
> +	},
> +	.finalized = false,
> +};

There is already one definition for struct kho_out and a static struct
kho_out early in the file. This is a second declaration and definition.
And I was super confused when I saw patch 3 since it seemed to be making
unrelated changes to this struct (and removing an instance of this,
which should be done in this patch instead). In fact, this patch doesn't
even build due to this problem. I think some patch massaging is needed
to fix this all up.

>  
>  /**
>   * kho_add_subtree - record the physical address of a sub FDT in KHO root tree.
[...]

-- 
Regards,
Pratyush Yadav


* Re: [PATCH v4 03/30] kho: drop notifiers
From: Pratyush Yadav @ 2025-10-06 16:38 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Pasha Tatashin, jasonmiu, graf, changyuanl, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl, steven.sistare
In-Reply-To: <mafs0bjmkp0gb.fsf@kernel.org>

Hi,

On Mon, Oct 06 2025, Pratyush Yadav wrote:

> Hi Pasha,
>
> On Mon, Sep 29 2025, Pasha Tatashin wrote:
>
>> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>>
>> The KHO framework uses a notifier chain as the mechanism for clients to
>> participate in the finalization process. While this works for a single,
>> central state machine, it is too restrictive for kernel-internal
>> components like pstore/reserve_mem or IMA. These components need a
>> simpler, direct way to register their state for preservation (e.g.,
>> during their initcall) without being part of a complex,
>> shutdown-time notifier sequence. The notifier model forces all
>> participants into a single finalization flow and makes direct
>> preservation from an arbitrary context difficult.
>> This patch refactors the client participation model by removing the
>> notifier chain and introducing a direct API for managing FDT subtrees.
>>
>> The core kho_finalize() and kho_abort() state machine remains, but
>> clients now register their data with KHO beforehand.
>>
>> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> This patch breaks build of test_kho.c (under CONFIG_TEST_KEXEC_HANDOVER):
>
> 	lib/test_kho.c:49:14: error: ‘KEXEC_KHO_ABORT’ undeclared (first use in this function)
> 	   49 |         case KEXEC_KHO_ABORT:
> 	      |              ^~~~~~~~~~~~~~~
> 	[...]
> 	lib/test_kho.c:51:14: error: ‘KEXEC_KHO_FINALIZE’ undeclared (first use in this function)
> 	   51 |         case KEXEC_KHO_FINALIZE:
> 	      |              ^~~~~~~~~~~~~~~~~~
> 	[...]
>
> I think you need to update it as well to drop notifier usage.

Here's the fix. Build passes now and the test succeeds under my qemu
test setup.

--- 8< ---
From a8e6b5dfef38bfbcd41f3dd08598cb79a0701d7e Mon Sep 17 00:00:00 2001
From: Pratyush Yadav <pratyush@kernel.org>
Date: Mon, 6 Oct 2025 18:35:20 +0200
Subject: [PATCH] fixup! kho: drop notifiers

Update KHO test to drop the notifiers as well.

Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
---
 lib/test_kho.c | 32 +++-----------------------------
 1 file changed, 3 insertions(+), 29 deletions(-)

diff --git a/lib/test_kho.c b/lib/test_kho.c
index fe8504e3407b5..e9462a1e4b93b 100644
--- a/lib/test_kho.c
+++ b/lib/test_kho.c
@@ -38,33 +38,6 @@ struct kho_test_state {
 
 static struct kho_test_state kho_test_state;
 
-static int kho_test_notifier(struct notifier_block *self, unsigned long cmd,
-			     void *v)
-{
-	struct kho_test_state *state = &kho_test_state;
-	struct kho_serialization *ser = v;
-	int err = 0;
-
-	switch (cmd) {
-	case KEXEC_KHO_ABORT:
-		return NOTIFY_DONE;
-	case KEXEC_KHO_FINALIZE:
-		/* Handled below */
-		break;
-	default:
-		return NOTIFY_BAD;
-	}
-
-	err |= kho_preserve_folio(state->fdt);
-	err |= kho_add_subtree(ser, KHO_TEST_FDT, folio_address(state->fdt));
-
-	return err ? NOTIFY_BAD : NOTIFY_DONE;
-}
-
-static struct notifier_block kho_test_nb = {
-	.notifier_call = kho_test_notifier,
-};
-
 static int kho_test_save_data(struct kho_test_state *state, void *fdt)
 {
 	phys_addr_t *folios_info;
@@ -111,6 +84,7 @@ static int kho_test_prepare_fdt(struct kho_test_state *state)
 
 	fdt = folio_address(state->fdt);
 
+	err |= kho_preserve_folio(state->fdt);
 	err |= fdt_create(fdt, fdt_size);
 	err |= fdt_finish_reservemap(fdt);
 
@@ -194,7 +168,7 @@ static int kho_test_save(void)
 	if (err)
 		goto err_free_folios;
 
-	err = register_kho_notifier(&kho_test_nb);
+	err = kho_add_subtree(KHO_TEST_FDT, folio_address(state->fdt));
 	if (err)
 		goto err_free_fdt;
 
@@ -309,7 +283,7 @@ static void kho_test_cleanup(void)
 
 static void __exit kho_test_exit(void)
 {
-	unregister_kho_notifier(&kho_test_nb);
+	kho_remove_subtree(folio_address(kho_test_state.fdt));
 	kho_test_cleanup();
 }
 module_exit(kho_test_exit);
-- 
Regards,
Pratyush Yadav


* Re: [PATCH v4 02/30] kho: make debugfs interface optional
From: Pratyush Yadav @ 2025-10-06 16:30 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-3-pasha.tatashin@soleen.com>

On Mon, Sep 29 2025, Pasha Tatashin wrote:

> Currently, KHO is controlled via debugfs interface, but once LUO is
> introduced, it can control KHO, and the debug interface becomes
> optional.
>
> Add a separate config CONFIG_KEXEC_HANDOVER_DEBUG that enables
> the debugfs interface, and allows to inspect the tree.
>
> Move all debugfs related code to a new file to keep the .c files
> clear of ifdefs.
>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  MAINTAINERS                      |   3 +-
>  kernel/Kconfig.kexec             |  10 ++
>  kernel/Makefile                  |   1 +
>  kernel/kexec_handover.c          | 255 +++++--------------------------
>  kernel/kexec_handover_debug.c    | 218 ++++++++++++++++++++++++++
>  kernel/kexec_handover_internal.h |  44 ++++++
>  6 files changed, 311 insertions(+), 220 deletions(-)
>  create mode 100644 kernel/kexec_handover_debug.c
>  create mode 100644 kernel/kexec_handover_internal.h
>
[...]
> --- a/kernel/Kconfig.kexec
> +++ b/kernel/Kconfig.kexec
> @@ -109,6 +109,16 @@ config KEXEC_HANDOVER
>  	  to keep data or state alive across the kexec. For this to work,
>  	  both source and target kernels need to have this option enabled.
>  
> +config KEXEC_HANDOVER_DEBUG

Nit: can we call it KEXEC_HANDOVER_DEBUGFS instead? I think we would
like to add a KEXEC_HANDOVER_DEBUG at some point to control debug
asserts for KHO, and the naming would get confusing. And renaming config
symbols is kind of a pain.

> +	bool "kexec handover debug interface"
> +	depends on KEXEC_HANDOVER
> +	depends on DEBUG_FS
> +	help
> +	  Allow to control kexec handover device tree via debugfs
> +	  interface, i.e. finalize the state or aborting the finalization.
> +	  Also, enables inspecting the KHO fdt trees with the debugfs binary
> +	  blobs.
> +
[...]

-- 
Regards,
Pratyush Yadav


* Re: [PATCH v4 03/30] kho: drop notifiers
From: Pasha Tatashin @ 2025-10-06 16:17 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <mafs0bjmkp0gb.fsf@kernel.org>

On Mon, Oct 6, 2025 at 10:30 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> Hi Pasha,
>
> On Mon, Sep 29 2025, Pasha Tatashin wrote:
>
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > The KHO framework uses a notifier chain as the mechanism for clients to
> > participate in the finalization process. While this works for a single,
> > central state machine, it is too restrictive for kernel-internal
> > components like pstore/reserve_mem or IMA. These components need a
> > simpler, direct way to register their state for preservation (e.g.,
> > during their initcall) without being part of a complex,
> > shutdown-time notifier sequence. The notifier model forces all
> > participants into a single finalization flow and makes direct
> > preservation from an arbitrary context difficult.
> > This patch refactors the client participation model by removing the
> > notifier chain and introducing a direct API for managing FDT subtrees.
> >
> > The core kho_finalize() and kho_abort() state machine remains, but
> > clients now register their data with KHO beforehand.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> This patch breaks build of test_kho.c (under CONFIG_TEST_KEXEC_HANDOVER):
>
>         lib/test_kho.c:49:14: error: ‘KEXEC_KHO_ABORT’ undeclared (first use in this function)
>            49 |         case KEXEC_KHO_ABORT:
>               |              ^~~~~~~~~~~~~~~
>         [...]
>         lib/test_kho.c:51:14: error: ‘KEXEC_KHO_FINALIZE’ undeclared (first use in this function)
>            51 |         case KEXEC_KHO_FINALIZE:
>               |              ^~~~~~~~~~~~~~~~~~
>         [...]
>
> I think you need to update it as well to drop notifier usage.

Yes, thank you Pratyush. I missed this change in my patch.

Pasha


* Re: [PATCH v6 4/6] fs: make vfs_fileattr_[get|set] return -EOPNOSUPP
From: Jan Kara @ 2025-10-06 15:39 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Andrey Albershteyn, Amir Goldstein, Arnd Bergmann,
	Casey Schaufler, Christian Brauner, Jan Kara, Pali Rohár,
	Paul Moore, linux-api, linux-fsdevel, linux-kernel, linux-xfs,
	selinux, Andrey Albershteyn
In-Reply-To: <a622643f-1585-40b0-9441-cf7ece176e83@kernel.org>

On Mon 06-10-25 13:09:05, Jiri Slaby wrote:
> On 30. 06. 25, 18:20, Andrey Albershteyn wrote:
> > Future patches will add new syscalls which use these functions. As
> > this interface won't be used for ioctls only, the EOPNOSUPP is more
> > appropriate return code.
> > 
> > This patch converts return code from ENOIOCTLCMD to EOPNOSUPP for
> > vfs_fileattr_get and vfs_fileattr_set. To save old behavior translate
> > EOPNOSUPP back for current users - overlayfs, encryptfs and fs/ioctl.c.
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> ...
> > @@ -292,6 +294,8 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
> >   			fileattr_fill_flags(&fa, flags);
> >   			err = vfs_fileattr_set(idmap, dentry, &fa);
> >   			mnt_drop_write_file(file);
> > +			if (err == -EOPNOTSUPP)
> > +				err = -ENOIOCTLCMD;
> 
> This breaks borg code (unit tests already) as it expects EOPNOTSUPP, not
> ENOIOCTLCMD/ENOTTY:
> https://github.com/borgbackup/borg/blob/1c6ef7a200c7f72f8d1204d727fea32168616ceb/src/borg/platform/linux.pyx#L147
> 
> I.e. setflags now returns ENOIOCTLCMD/ENOTTY for cases where 6.16 used to
> return EOPNOTSUPP.
> 
> This minimal testcase program doing ioctl(fd2, FS_IOC_SETFLAGS,
> &FS_NODUMP_FL):
> https://github.com/jirislaby/collected_sources/tree/master/ioctl_setflags
> 
> dumps in 6.16:
> sf: ioctl: Operation not supported
> 
> with the above patch:
> sf: ioctl: Inappropriate ioctl for device
> 
> Is this expected?

No, that's a bug and a clear userspace regression so we need to fix it. I
think we need to revert this commit and instead convert ENOIOCTLCMD from
vfs_fileattr_get/set() to EOPNOTSUPP in appropriate places. Andrey?
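
A minimal sketch of what such a conversion could look like for a
non-ioctl caller. The helper name fileattr_err_for_syscall() is made up
for illustration, and SKETCH_ENOIOCTLCMD only mirrors the kernel-internal
ENOIOCTLCMD value (515) so that this compiles in userspace; it is not
taken from any posted patch.

```c
#include <errno.h>

/* Mirrors the kernel-internal ENOIOCTLCMD; userspace never sees it. */
#define SKETCH_ENOIOCTLCMD 515

/*
 * Hypothetical helper: a caller that is not an ioctl (e.g. a new
 * syscall) maps "no fileattr callback" to EOPNOTSUPP at its own
 * boundary and passes every other error through unchanged.
 */
static int fileattr_err_for_syscall(int err)
{
	return err == -SKETCH_ENOIOCTLCMD ? -EOPNOTSUPP : err;
}
```

The ioctl path would then need no translation at all, which keeps the
error codes userspace has seen so far.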

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v4 03/30] kho: drop notifiers
From: Pratyush Yadav @ 2025-10-06 14:30 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-4-pasha.tatashin@soleen.com>

Hi Pasha,

On Mon, Sep 29 2025, Pasha Tatashin wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> The KHO framework uses a notifier chain as the mechanism for clients to
> participate in the finalization process. While this works for a single,
> central state machine, it is too restrictive for kernel-internal
> components like pstore/reserve_mem or IMA. These components need a
> simpler, direct way to register their state for preservation (e.g.,
> during their initcall) without being part of a complex,
> shutdown-time notifier sequence. The notifier model forces all
> participants into a single finalization flow and makes direct
> preservation from an arbitrary context difficult.
> This patch refactors the client participation model by removing the
> notifier chain and introducing a direct API for managing FDT subtrees.
>
> The core kho_finalize() and kho_abort() state machine remains, but
> clients now register their data with KHO beforehand.
>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

This patch breaks build of test_kho.c (under CONFIG_TEST_KEXEC_HANDOVER):

	lib/test_kho.c:49:14: error: ‘KEXEC_KHO_ABORT’ undeclared (first use in this function)
	   49 |         case KEXEC_KHO_ABORT:
	      |              ^~~~~~~~~~~~~~~
	[...]
	lib/test_kho.c:51:14: error: ‘KEXEC_KHO_FINALIZE’ undeclared (first use in this function)
	   51 |         case KEXEC_KHO_FINALIZE:
	      |              ^~~~~~~~~~~~~~~~~~
	[...]

I think you need to update it as well to drop notifier usage.

[...]

-- 
Regards,
Pratyush Yadav


* Re: [PATCH v6 4/6] fs: make vfs_fileattr_[get|set] return -EOPNOSUPP
From: Arnd Bergmann @ 2025-10-06 11:43 UTC (permalink / raw)
  To: Jiri Slaby, Andrey Albershteyn, Amir Goldstein, Casey Schaufler,
	Christian Brauner, Jan Kara, Pali Rohár, Paul Moore
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-xfs, selinux,
	Andrey Albershteyn
In-Reply-To: <a622643f-1585-40b0-9441-cf7ece176e83@kernel.org>

On Mon, Oct 6, 2025, at 13:09, Jiri Slaby wrote:
> On 30. 06. 25, 18:20, Andrey Albershteyn wrote:
>> Future patches will add new syscalls which use these functions. As
>> this interface won't be used for ioctls only, the EOPNOSUPP is more
>> appropriate return code.
>> 
>> This patch converts return code from ENOIOCTLCMD to EOPNOSUPP for
>> vfs_fileattr_get and vfs_fileattr_set. To save old behavior translate
>> EOPNOSUPP back for current users - overlayfs, encryptfs and fs/ioctl.c.
>> 
...
> dumps in 6.16:
> sf: ioctl: Operation not supported
>
> with the above patch:
> sf: ioctl: Inappropriate ioctl for device
>
>
> Is this expected?

This does look like an unintentional bug: As far as I can see, the
-ENOIOCTLCMD was previously used to indicate that a particular filesystem
does not have a fileattr_{get,set} callback at all, while individual
filesystems used EOPNOSUPP to indicate that a particular attribute
flag is unsupported. With the double conversion, both error codes
get turned into a single one.
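
The double conversion can be modelled in a few lines of userspace C.
This is only a sketch of the behaviour as described above, not the
actual fs/ code: the function names, enum fs_case and
SKETCH_ENOIOCTLCMD (mirroring the kernel-internal value 515) are all
made up for illustration.

```c
#include <errno.h>

/* Mirrors the kernel-internal ENOIOCTLCMD; userspace sees ENOTTY. */
#define SKETCH_ENOIOCTLCMD 515

enum fs_case {
	NO_CALLBACK,      /* filesystem has no fileattr_set callback */
	FLAG_UNSUPPORTED, /* callback exists but rejects this flag */
	SUPPORTED,
};

/* Before the patch: two distinct error codes. */
static int vfs_set_before(enum fs_case c)
{
	switch (c) {
	case NO_CALLBACK:
		return -SKETCH_ENOIOCTLCMD;
	case FLAG_UNSUPPORTED:
		return -EOPNOTSUPP;
	default:
		return 0;
	}
}

/* After the patch: a missing callback is reported as EOPNOTSUPP. */
static int vfs_set_after(enum fs_case c)
{
	int err = vfs_set_before(c);

	return err == -SKETCH_ENOIOCTLCMD ? -EOPNOTSUPP : err;
}

/* After the patch: the ioctl path translates EOPNOTSUPP back. */
static int ioctl_setflags_after(enum fs_case c)
{
	int err = vfs_set_after(c);

	if (err == -EOPNOTSUPP)
		err = -SKETCH_ENOIOCTLCMD;
	return err;
}

/* On the way out of the ioctl syscall, ENOIOCTLCMD becomes ENOTTY. */
static int to_user(int err)
{
	return err == -SKETCH_ENOIOCTLCMD ? -ENOTTY : err;
}
```

In this model FLAG_UNSUPPORTED reaches userspace as EOPNOTSUPP before
the patch and as ENOTTY after it, which matches the reported change.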

     Arnd


* Re: [PATCH v6 4/6] fs: make vfs_fileattr_[get|set] return -EOPNOSUPP
From: Jiri Slaby @ 2025-10-06 11:09 UTC (permalink / raw)
  To: Andrey Albershteyn, Amir Goldstein, Arnd Bergmann,
	Casey Schaufler, Christian Brauner, Jan Kara, Pali Rohár,
	Paul Moore
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-xfs, selinux,
	Andrey Albershteyn
In-Reply-To: <20250630-xattrat-syscall-v6-4-c4e3bc35227b@kernel.org>

On 30. 06. 25, 18:20, Andrey Albershteyn wrote:
> Future patches will add new syscalls which use these functions. As
> this interface won't be used for ioctls only, the EOPNOSUPP is more
> appropriate return code.
> 
> This patch converts return code from ENOIOCTLCMD to EOPNOSUPP for
> vfs_fileattr_get and vfs_fileattr_set. To save old behavior translate
> EOPNOSUPP back for current users - overlayfs, encryptfs and fs/ioctl.c.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
...
> @@ -292,6 +294,8 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
>   			fileattr_fill_flags(&fa, flags);
>   			err = vfs_fileattr_set(idmap, dentry, &fa);
>   			mnt_drop_write_file(file);
> +			if (err == -EOPNOTSUPP)
> +				err = -ENOIOCTLCMD;

This breaks borg code (unit tests already) as it expects EOPNOTSUPP, not 
ENOIOCTLCMD/ENOTTY:
https://github.com/borgbackup/borg/blob/1c6ef7a200c7f72f8d1204d727fea32168616ceb/src/borg/platform/linux.pyx#L147

I.e. setflags now returns ENOIOCTLCMD/ENOTTY for cases where 6.16 used 
to return EOPNOTSUPP.

This minimal testcase program doing ioctl(fd2, FS_IOC_SETFLAGS, 
&FS_NODUMP_FL):
https://github.com/jirislaby/collected_sources/tree/master/ioctl_setflags

dumps in 6.16:
sf: ioctl: Operation not supported

with the above patch:
sf: ioctl: Inappropriate ioctl for device


Is this expected?
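
For readers who do not want to follow the link, a probe along these
lines shows the same thing. It is a sketch written for this message, not
the linked testcase, and try_set_nodump() is a made-up name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Try to set FS_NODUMP_FL on @path via FS_IOC_SETFLAGS.
 * Returns 0 on success or a negative errno value.
 * Note: this overwrites any flags already set on the file, so only
 * run it on a scratch file.
 */
static int try_set_nodump(const char *path)
{
	int flags = FS_NODUMP_FL;
	int ret = 0;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

On a filesystem that rejects the flag, the return value is the one to
compare between kernels: -EOPNOTSUPP versus -ENOTTY.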

thanks,
-- 
js
suse labs



* Re: [PATCH 00/62] initrd: remove classic initrd support
From: Askar Safin @ 2025-10-06  6:19 UTC (permalink / raw)
  To: rob
  Cc: akpm, andy.shevchenko, axboe, brauner, cyphar, devicetree,
	email2tema, graf, gregkh, hca, hch, hsiangkao, initramfs, jack,
	julian.stecklina, kees, linux-acpi, linux-alpha, linux-api,
	linux-arch, linux-block, linux-csky, linux-doc, linux-efi,
	linux-ext4, linux-fsdevel, linux-hexagon, linux-kernel,
	linux-m68k, linux-mips, linux-openrisc, linux-parisc, linux-riscv,
	linux-s390, linux-sh, linux-snps-arc, linux-um, linuxppc-dev,
	loongarch, mcgrof, mingo, monstr, mzxreary, patches, sparclinux,
	thomas.weissschuh, thorsten.blum, torvalds, tytso, viro, x86
In-Reply-To: <0342fbda-9901-4293-afa7-ba6085eb1688@landley.net>

Rob Landley <rob@landley.net>:
> Still useful for embedded systems that can memory map flash, but it's

They can use the workaround suggested in the cover letter.

> While you're at it, could you fix static/builtin initramfs so PID 1 has 
> a valid stdin/stdout/stderr?

This is on my low-priority TODO list. I want to help you. I will possibly do
this in a month or two or three...

> I posted various patches to make CONFIG_DEVTMPFS_MOUNT work for initmpfs

My solution will be different: I will create static /dev/console and /dev/null
after unpacking the builtin and external initramfs. (/dev/null because of
that bionic problem you wrote about somewhere.)

> Oh hey, somebody using mkroot. Cool. :)

Yeah, thank you for mkroot.

> Now that lkml.iu.edu is back up (yay!) all the links in 
> ramfs-rootfs-initramfs.txt can theoretically be fixed just by switching 
> the domain name.

Yes, I plan to replace them with lore.kernel.org ones. This is on my
low-priority TODO list, too.

> > For example, I renamed the following global variables:
> > 
> > __initramfs_start
> > __initramfs_size
> 
> That already said initramfs, and you renamed it.

Yes, to distinguish builtin and external initramfs.

> > phys_initrd_start
> > phys_initrd_size
> > initrd_start
> > initrd_end
> 
> Which is data delivered through grub's "initrd" command. Here's how I've 

My plan is to change the "official" names for these things.
"initramfs" will refer both to the .cpio archive itself and to the loading
mechanism. The name of GRUB's "initrd" command will become "wrong, kept for
compatibility".

But I plan to do all these renamings after I fully remove initrd support,
which will happen in September 2026, as I explained in another email.

> 3) rootfs is (for some reason) the name of the mounted filesystem in 
> /proc/mounts (because letting it say "ramfs" or "tmpfs" like normal in 
> /proc/mounts would be consistent and immediately understandable, so they 
> couldn't have that).

I totally agree. I want to change it to ramfs/tmpfs. But this change
may break something, so I think we need some strong motivation to
do this. So I will wait for the removal of nommu support. Arnd Bergmann said
"NOMMU removal maybe 2027" ( https://lwn.net/Articles/1035727/ ,
https://static.sched.com/hosted_files/osseu2025/75/32-bit%20Linux%20in%202025%20%28OSS%20Europe%29.pdf ,
slide 20). (He also said 32-bit support will be removed.)
After that I will remove ramfs (yeah, I love to remove things),
and, while we are here, I will rename "rootfs" to "tmpfs" in
/proc/mounts (hopefully I will get away with this).

> > __builtin_initramfs_start
> > __builtin_initramfs_size
> > phys_external_initramfs_start
> > phys_external_initramfs_size
> > virt_external_initramfs_start
> > virt_external_initramfs_end
> 
> Do you believe people will understand what the slightly longer names are 
> without looking them up?

No. But I still hope the new names are better. As I said above, all of these
will be named "initramfs" under my new plan. But again, all of this
will happen after full initrd removal, which will happen in Sep 2026.

> I'm all for removing obsolete code, but a partial cleanup that still 
> leaves various sharp edges around isn't necessarily a net improvement. 
> Did you remove the NFS mount code from init/do_mounts.c? Part of the 

Okay, I have put this on my low-priority TODO list.

> The one config symbol that really seems to bite people in this area is 
> BLK_DEV_INITRD because a common thing people running from initramfs want 
> to do is yank the block layer entirely (CONFIG_BLOCK=n) and use 
> initramfs instead, and needing to enable CONFIG_BLK_DEV_INITRD while
> 
> And the INSANE part is they generally want a static initrd to do it so 
> they're not using the external loader, but Kconfig has INITRAMFS_SOURCE 
> under CONFIG_BLK_DEV_INITRD and it's a mess. Renaming THAT symbol would 
> be good.

You mean renaming CONFIG_BLK_DEV_INITRD will be good?
I do exactly that.
And while we are here, I also rename CONFIG_RD_*,
because configs will be broken anyway.

Also, recently we got keyword "transitional" to help with such
renamings: https://www.phoronix.com/news/Linux-6.18-Transitional .
I will use it.

> To you. I'm not entirely sure what virt_external means. (Yes I could go 

It means "virtual address of external initramfs". But, yes, Borislav Petkov
told me in another email that kernel devs usually use "va" for virtual
address and "pa" for physical, so I will use these terms (in Sep 2026).

> Meanwhile 35 years of installed base expertise in other people's heads 
> has been discarded and developed version skew for anyone maintaining an 

I'm still not convinced. Ideally I want to remove the word "initrd" from Linux
sources completely.

The decision whether to merge my patches is up to the maintainers anyway. They
will decide whether these renamings are a good idea.

> > - Removed kernel command line parameter "ramdisk_start",
> > which was used for initrd only (not for initramfs)
> 
> Some bootloaders appended that to the kernel command line to specify 
> where in memory they've loaded the initrd image, which could be a 
> cpio.gz once upon a time. No idea what regressions happened since though.

I double-checked: ramdisk_start is used for initrd code path only
in modern kernels, not for initramfs code path.

"initrd=" is used in both code paths, and I keep it.

==

While we are here, let me answer your other emails, too.

Here is my answer to https://lore.kernel.org/all/94023988-8498-4070-bdb7-6758dbe4b91d@landley.net/ .

> There used to be a way to feed a the kernel config a text file listing 
> what to make in the cpio file instead of just pointing it at a 
> directory, and my old Aboriginal Linux build used that mechanism 
...
> But kernel commit 469e87e89fd6 broke that mechanism because somebody 
> dunning-krugered it away ("I don't understand why we need this therefore 

I will consider fixing this, too. I have put it on my low-priority TODO list.

But it is possible that I will instead remove gen-init-cpio completely.
(I will do some experiments before deciding.)
If it was broken, and nobody except for you cared, then this means that
nobody except for you uses it.

Of course, I will do that after sending a patch that unconditionally creates
/dev/console and /dev/null, so you are safe.

> And again: you ONLY need this for static initramfs. Dynamic initramfs 
> has code create /dev/console (at boot time, not build time):
>
> https://github.com/torvalds/linux/blob/v6.16/init/noinitramfs.c#L27

Your explanation is wrong here. As you can see in Makefile, noinitramfs.c
is not built if there is BLK_DEV_INITRD.

If you don't have BLK_DEV_INITRD, then noinitramfs.c
is built, and it creates /dev/console.

If there is BLK_DEV_INITRD and there is no INITRAMFS_SOURCE, then
the default built-in initramfs is used, which is specified here:
https://elixir.bootlin.com/linux/v6.17/source/usr/default_cpio_list
(and it happens to be equivalent to what noinitramfs.c creates).

If there are both BLK_DEV_INITRD and INITRAMFS_SOURCE, then
INITRAMFS_SOURCE is used instead of default built-in initramfs,
so there is no /dev/console.

I am totally sure that my explanation is correct.
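
A minimal sketch of the three cases above, written as shell. The function
name console_source and its arguments are made up for illustration; nothing
like it exists in the kernel tree. It only restates which piece of code ends
up responsible for /dev/console:

```shell
#!/bin/sh
# console_source BLK_DEV_INITRD INITRAMFS_SOURCE
#   $1: "y" if CONFIG_BLK_DEV_INITRD is enabled, anything else otherwise
#   $2: value of CONFIG_INITRAMFS_SOURCE (empty string if unset)
# Hypothetical helper: prints which source, per the explanation above,
# is responsible for /dev/console in the built-in rootfs.
console_source() {
    if [ "$1" != y ]; then
        echo "init/noinitramfs.c"
    elif [ -z "$2" ]; then
        echo "usr/default_cpio_list"
    else
        echo "INITRAMFS_SOURCE only"
    fi
}

console_source n ""
console_source y ""
console_source y /path/to/rootfs
```

In the last case /dev/console exists only if the given source itself
provides it.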

> I could emit cpio contents with xxd -r from a HERE document hexdump or

There is no need for "xxd -r". cpio encoding of /dev/console is ASCII
(except for some null bytes). See:

$ echo /dev/console | cpio --create --format=newc --quiet | xxd
00000000: 3037 3037 3031 3030 3030 3030 3043 3030  0707010000000C00
00000010: 3030 3231 3830 3030 3030 3030 3030 3030  0021800000000000
00000020: 3030 3030 3030 3030 3030 3030 3031 3638  0000000000000168
00000030: 4438 4337 4241 3030 3030 3030 3030 3030  D8C7BA0000000000
00000040: 3030 3030 3030 3030 3030 3030 3036 3030  0000000000000600
00000050: 3030 3030 3035 3030 3030 3030 3031 3030  0000050000000100
00000060: 3030 3030 3044 3030 3030 3030 3030 2f64  00000D00000000/d
00000070: 6576 2f63 6f6e 736f 6c65 0000 3037 3037  ev/console..0707
00000080: 3031 3030 3030 3030 3030 3030 3030 3030  0100000000000000
00000090: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
000000a0: 3030 3030 3030 3030 3031 3030 3030 3030  0000000001000000
000000b0: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
000000c0: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
000000d0: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
000000e0: 3042 3030 3030 3030 3030 5452 4149 4c45  0B00000000TRAILE
000000f0: 5221 2121 0000 0000 0000 0000 0000 0000  R!!!............
00000100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

So, I think the following will work (not tested):

==
printf '%s' '0707010000000C0000218000000000000000000000000168D8C7BA00000000000000000000000600000005000000010000000D00000000/dev/console' > out.cpio
printf '\0\0' >> out.cpio
==

Maybe even the final '\0\0' is not needed.
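
A small sketch of where the two null bytes come from, assuming the usual
newc layout (a 6-byte magic followed by 13 eight-digit hex fields, with
header plus name padded to a multiple of 4). The variable names are just
labels for the values from the dump above. This only does the arithmetic
and was not checked against the kernel unpacker:

```shell
#!/bin/sh
# Values copied from the xxd dump above, in newc field order.
magic=070701
ino=0000000C mode=00002180 uid=00000000 gid=00000000
nlink=00000001 mtime=68D8C7BA filesize=00000000
devmajor=00000000 devminor=00000006
rdevmajor=00000005 rdevminor=00000001
namesize=0000000D check=00000000
name=/dev/console

hdr="$magic$ino$mode$uid$gid$nlink$mtime$filesize$devmajor$devminor$rdevmajor$rdevminor$namesize$check"
hdr_len=${#hdr}
name_len=$(( ${#name} + 1 ))        # the name is stored with its trailing NUL
unpadded=$(( hdr_len + name_len ))
pad=$(( (4 - unpadded % 4) % 4 ))   # bytes needed to reach a multiple of 4

echo "header bytes:       $hdr_len"
echo "namesize (decimal): $((0x$namesize)), computed: $name_len"
echo "padding bytes:      $pad"
echo "NULs after name:    $(( 1 + pad ))"
```

So one NUL is the terminator counted in the namesize field and the other is
padding. Dropping both would leave the entry shorter than its own namesize
says; whether the unpacker tolerates that is a separate question.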

Also, this email of yours ( https://lore.kernel.org/all/94023988-8498-4070-bdb7-6758dbe4b91d@landley.net/ )
for some reason didn't end up on https://lore.kernel.org/lkml .

As you can see here https://lore.kernel.org/lkml/94023988-8498-4070-bdb7-6758dbe4b91d@landley.net/ ,
the full list of lore mailing lists that received it is linux-snps-arc, linux-riscv and linux-sh .

I wrote about this to public-inbox:
http://public-inbox.org/meta/CAPnZJGB7ugY5rytS+hO-QzvPQBNjCh1jzs4WVkuakafBM9c_=w@mail.gmail.com/T/#u .
But it is possible that the problem is on your side.

Maybe this is why people ignore your emails? Maybe they simply don't get them?

Consider applying for linux.dev email ( https://linux.dev ). They are free for linux devs.

==

Now let me answer https://lore.kernel.org/lkml/8f595eec-e85e-4c1f-acb0-5069a01c1012@landley.net/T/#u .

> I find the community an elaborate bureaucracy unresponsive to hobbyists. 
> Documentation/process/submitting-patches.rst being a 934 line document 
> with a bibliography, plus a 24 step checklist not counting the a) b) c) 
> subsections are just symptoms. The real problem is following those is 
> not sufficient to navigate said bureaucracy.

I totally agree.

Still I somehow was able to manage this.

Again: I totally agree. I just want to share some practical advice that helped me
to get my patches merged.

As you can see, I was able to get my patches merged:
https://lore.kernel.org/all/?q=f:%22Askar%20Safin%22 .

And this is despite nobody paying me for this. I do this in my own free time.

As far as I understand, you are doing embedded Linux development as your job,
so you are in a better position.

My patches get merged even though my productivity is low. I am a very slow person.

You don't need to remember all of submitting-patches.rst . Just do this:

- Run checkpatch.pl . It accepts git ranges, e. g. "checkpatch.pl origin/HEAD..HEAD"
- After posting patches respond to comments, apply their edits, send new version, then again and again

When sending patches and responding to comments don't write too long letters.
Nobody will carefully read long letters and respond to them.
I respond to such letters, because I'm autistic, and I feel responsibility to carefully
read and respond to each letter. But other people don't do this.

In particular, when sending patches and responding to comments don't write long
paragraphs about good things you did in the past and about how you are disappointed
in the entire world, such as these:

> Let's see, I wrote the initramfs documentation in 2005:
>
> https://lwn.net/Articles/157676/
>
> Was already correcting kernel developers on how it actually worked 
> (rather than theoretically worked) in 2006:
>
> https://lkml.iu.edu/hypermail//linux/kernel/0603.2/2760.html
>
> I added tmpfs support to it in 2013 (because nobody else had bothered 
> for EIGHT YEARS):
>
> https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html
>
> I've maintained my own cpio implementation in toybox for over a decade:
>
> https://github.com/landley/toybox/commit/a2d558151a63
>
> The successor to aboriginal (above) is a 400 line bash script that 
> builds a dozen archtectures that each boot to a shell prompt in qemu:
>
> https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh
> https://landley.net/bin/mkroot/latest/
>
> With automated regression test infrastructure to boot them all under 
> qemu and confirm that it runs, the clocks are set right, the network 
> works, and it can read from -hda:
>
> https://github.com/landley/toybox/blob/master/mkroot/testroot.sh
>
> So yes I _can_ create my own bespoke C program to modify the file in 
> arbitrary ways, I have my reasons not to do that, and have thought about 
> them for a while now.

Again: I'm not trying to insult you. I'm just trying to give advice on how
to get your patches merged.

When my patches are ready, I send them using something like this:

==
UPSTREAM=origin/HEAD
MERGE_BASE="$(git merge-base "$UPSTREAM" HEAD)"

mkdir -p /tmp/patches

# For --signoff
export GIT_COMMITTER_EMAIL=me@example.com

# Prepare patches
# --base for "base-commit:" footer
git format-patch --cover-letter --find-renames --base="$MERGE_BASE" --signoff -o /tmp/patches \
  --subject-prefix='PATCH v2' "$MERGE_BASE"

editor /tmp/patches/0000-cover-letter.patch

# Send
# "--batch-size=1 --relogin-delay=20" to insert delays between patches. Hopefully
# this will help me to cope with my mailserver limits
# "--confirm=" to give myself chance to cancel
git send-email --batch-size=1 --relogin-delay=20 --confirm=always --to=a@example.com --cc=b@example.com \
  /tmp/patches
==

This script will automatically generate a nice diffstat in the cover letter.

This script is not tested. Actually I use my own 182-line Rust program, which does
the same thing.

This is the checklist I plan to follow when sending the v2 version of this initrd patchset:
- Read all answers to prev. version, respond and apply edits
- checkpatch.pl
- Check that my patchset doesn't conflict with linux-next
- Check that every commit compiles for x86_64 with "W=1"
- Test everything using mkroot.sh rewritten in Rust

> Why keep the section when you removed the old mechanism?

This section still contains useful info, so I kept it.
But okay, I agree, I will rewrite it to not mention initrd.
I will do this after full removal of initrd, i. e. in Sep 2026.

If you want me to send some patch to this document _now_,
then just ask me, I will try to do this.

> Those two lines you just touched contradict each other

Will fix in Sep 2026, too.

> The init/noinitramfs.c file does init/mkdir("/dev") and 
> init_mknod("/dev/console") because calling the syscall_blah() functions 
> directly was considered icky so they created gratuitous wrappers to do

You cannot directly call a syscall from kernel code if the syscall
takes string arguments. Reasons are here: https://lwn.net/Articles/832121/ .

The mkdir syscall expects a string located in user memory. So you
cannot call it from the kernel and pass a kernel string to it.
Thus you need a separate init_mkdir.

> Anyway, that's why the 130+ byte archive was there. It wasn't actually 
> empty, even when initramfs was disabled.

I just double-checked. If BLK_DEV_INITRD is disabled, then
there is no builtin initramfs at all. If BLK_DEV_INITRD is
disabled, then initramfs_data.S is not built, as we can see here:

https://elixir.bootlin.com/linux/v6.17/source/usr/Makefile#L15

And initramfs_data.S contains the symbol __initramfs_size, so, yes,
initramfs_data.S is the actual builtin initramfs.

In fact, that "obj-$(CONFIG_BLK_DEV_INITRD) :=" trick
is not needed, because whole usr/ dir is compiled out,
if there is no BLK_DEV_INITRD:
https://elixir.bootlin.com/linux/v6.17/source/init/Kconfig#L1455

Again: I acknowledge the bug with the missing /dev/console. In fact,
I was able to reproduce it. I plan to fix it in a month or two.

> > +If the kernel has CONFIG_BLK_DEV_INITRD enabled, an external cpio.gz archive can also
>
> You renamed that symbol, then even you use the old name here.

I rename it in a later commit.

> > -This has the memory efficiency advantages of initramfs (no ramdisk block
> > -device) but the separate packaging of initrd (which is nice if you have
> > +This is nice if you have
> >   non-GPL code you'd like to run from initramfs, without conflating it with
> > -the GPL licensed Linux kernel binary).
> > +the GPL licensed Linux kernel binary.
>
> IANAL: Whether or not this qualifies as "mere aggregation" had yet to go 
> to court last I heard.

It is possible that a court will use this file as an argument.
So let's keep this paragraph here. :)

There is an example where the FAQ on the FSF site was actually
used as an argument in court: https://www.sonarsource.com/blog/will-the-new-judicial-ruling-in-the-vizio-lawsuit-strengthen-the-gpl/ .

I mean this quote:

> Vizio “did not dispute” the first two questions, focusing instead on the “expectations” of the contracting parties.
> Relying on the Free Software Foundation’s (FSF) GPL FAQs, it argued that the FSF never intended for third parties to enforce the contract,
> and therefore the parties to the contract could not have intended it.


> >     echo init | cpio -o -H newc | gzip > test.cpio.gz
> > -  # Testing external initramfs using the initrd loading mechanism.
> > +  # Testing external initramfs.
>
> Does grub not still call it "initrd"?

Yes, grub still calls it "initrd".
As I said, in Sep 2026 I will rename bootloader loading mechanism to "initramfs",
and name of grub command "initrd" will simply become "wrong".

> A) they added -hda so you don't have to give it a dummy /dev/zero anymore.

Ok, I will fix.

> B) there's no longer a "qemu" defaulting to the current architecture,

Ok, I will fix.

-- 
Askar Safin

* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Theodore Ts'o @ 2025-10-06  2:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Pavel Emelyanov, linux-fsdevel,
	Raphael S . Carvalho, linux-api, linux-xfs
In-Reply-To: <aOMBbKUlvv2uYLzD@dread.disaster.area>

On Mon, Oct 06, 2025 at 10:38:20AM +1100, Dave Chinner wrote:
> We have already provided a safe method for minimising the overhead
> of c/mtime updates in the IO path - it's called lazytime.  The
> lazytime mount option provides eventual consistency for c/mtime
> updates for IO operations instead of immediate consistency.
> 
> Timestamps are still updated to have the correct values, but the
> latency/performance of the timestamp updates is greatly improved by
> holding them purely in memory until some other trigger forces them
> to be persisted to disk.

Specifically, the timestamps are persisted to stable storage (a) when
the file system is unmounted, (b) when the inode needs to be pushed
out of memory due to memory pressure, (c) when the inode is forcibly
persisted using fsync(), (d) when some other inode field is updated
and the inode gets written out, or (e) after 24 hours.

As a result, the on-disk timestamps will be at most 24 hours stale.
But this is POSIX compliant, because if you read the timestamps using
stat(1), you will get the updated values, and what happens after a
crash in the absence of an fsync(2) is not defined.

The reason why we implemented this at $WORK is that if you are
constantly updating a database using fdatasync(2), and you care about
99.9th percentile I/O latency, the 4k writes to the inode table will
eventually trigger a hard drive's Adjacent Track Interference (ATI)
mitigation, which involves rewriting a set of disk tracks to avoid the
analog signal for adjacent tracks getting weakened by the hot-spot
writes, and this is measurable if you are looking at long-tail I/O
latencies.  (And yes, we had to talk to our HDD vendors to figure out
this is what was going on, since performance is out of scope of the
SCSI/SATA specifications.  Hence, random long-tail ATI latencies to
preserve data integrity are allowed, and in fact, actually a good
thing.  :-)

					- Ted

* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Dave Chinner @ 2025-10-05 23:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pavel Emelyanov, linux-fsdevel, Raphael S . Carvalho, linux-api,
	linux-xfs
In-Reply-To: <aOCiCkFUOBWV_1yY@infradead.org>

On Fri, Oct 03, 2025 at 09:26:50PM -0700, Christoph Hellwig wrote:
> On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> > The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> > updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> > add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> > was suggested that this flag is propagated to a O_NOCMTIME one.
> 
> skipping c/mtime is dangerous.  The XFS handle code allows it to
> support HSM where data is migrated out to tape, and requires
> CAP_SYS_ADMIN.  Allowing it for any file owner would expand the scope
> far too much as now everyone could skip timestamp updates.

We have already provided a safe method for minimising the overhead
of c/mtime updates in the IO path - it's called lazytime.  The
lazytime mount option provides eventual consistency for c/mtime
updates for IO operations instead of immediate consistency.

Timestamps are still updated to have the correct values, but the
latency/performance of the timestamp updates is greatly improved by
holding them purely in memory until some other trigger forces them
to be persisted to disk.

> > It can be used by workloads that want to write a file but don't care
> > much about the precise timestamp on it and can update it later with
> > utimens() call.
> 
> The workload might not care, the rest of the system does.  ctime can't
> be set to arbitrary values, so it is important for backups and as
> an audit trail.

Lazytime works for this use case; a call to utimens() will cause a
persistent update of the timestamps. As will any other inode
modification that has persistence requirements (e.g.  block
allocation during IO or other syscalls that modify inode metadata).

> > There's another reason for having this patch. When performing AIO write,
> > the file_modified_flags() function checks whether or not to update inode
> > times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> > the check return EINTR error that quickly propagates into cb completion
> > without doing any IO. This restriction effectively prevents doing AIO
> > writes with nowait flag, as file modifications really imply time update.
> 
> Well, we'll need to look into that, including maybe non-blocking
> timestamp updates.

This came up recently on #xfs w.r.t. lazytime behaviour - we need to
pass the NOWAIT decision semantics down to the filesystem to allow
lazytime to be truly non-blocking.  At the moment the high level VFS
NOWAIT checks (via inode_needs_update_time()) have no visibility of
this filesystem specific functionality, so even if we can do the
lazy timestamp update without blocking we still give an -EAGAIN if
IOCB_NOWAIT is set.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Dave Chinner @ 2025-10-05 22:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pavel Emelyanov, linux-fsdevel, Raphael S . Carvalho, linux-api,
	linux-xfs
In-Reply-To: <aOCiCkFUOBWV_1yY@infradead.org>

On Fri, Oct 03, 2025 at 09:26:50PM -0700, Christoph Hellwig wrote:
> On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> > The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> > updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> > add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> > was suggested that this flag is propagated to a O_NOCMTIME one.
> 
> skipping c/mtime is dangerous.  The XFS handle code allows it to
> support HSM where data is migrated out to tape, and requires
> CAP_SYS_ADMIN.  Allowing it for any file owner would expand the scope
> far too much as now everyone could skip timestamp updates.
> 
> > It can be used by workloads that want to write a file but don't care
> > much about the precise timestamp on it and can update it later with
> > utimens() call.

If you don't care about accurate c/mtime, then mount the filesystem
with '-o lazytime' to degrade c/mtime updates to "eventual
consistency" behaviour for IO operations. If inode metadata is
otherwise modified (e.g. block allocation during IO) or the
application then calls utimens(), it will update the recorded
in-memory timestamps in a persistent manner immediately.
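
As a hedged illustration of the above: lazytime is an ordinary mount option,
so it can be turned on with a remount and then looked up in /proc/mounts.
The helper name has_lazytime and the /data mount point are placeholders for
this sketch, not anything from this thread:

```shell
#!/bin/sh
# has_lazytime MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeeds if the last entry for MOUNTPOINT in
# MOUNTS_FILE (default /proc/mounts) lists "lazytime" among its options.
has_lazytime() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END { exit (index("," opts ",", ",lazytime,") ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Typical use (needs root, so it is left commented out here):
#   mount -o remount,lazytime /data
#   has_lazytime /data && echo "lazytime is active on /data"
```

The option list is wrapped in commas before matching so that only a whole
"lazytime" entry counts, not a substring of some other option.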

> The workload might not care, the rest of the system does.  ctime can't
> be set to arbitrary values, so it is important for backups and as
> an audit trail.

But we can (and do) delay the persistence of IO-based timestamp
updates with the lazytime option.

> > There's another reason for having this patch. When performing AIO write,
> > the file_modified_flags() function checks whether or not to update inode
> > times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> > the check return EINTR error that quickly propagates into cb completion
> > without doing any IO. This restriction effectively prevents doing AIO
> > writes with nowait flag, as file modifications really imply time update.
> 
> Well, we'll need to look into that, including maybe non-blocking
> timestamp updates.

Lazytime updates can generally be done in a non-blocking manner
right now (someone raised that in the context of io-uring on #xfs
about a month ago), but the NOWAIT behaviour for timestamp updates
is done at a higher level in the VFS and does not take into account
filesystem specific non-blocking lazytime updates at all.  If we
push the NOWAIT checking behaviour down to the filesystem, we can do
this.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Andy Lutomirski @ 2025-10-04 16:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pavel Emelyanov, linux-fsdevel, Raphael S . Carvalho, linux-api,
	linux-xfs
In-Reply-To: <aOCiCkFUOBWV_1yY@infradead.org>

On Fri, Oct 3, 2025 at 9:26 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> > The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> > updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> > add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> > was suggested that this flag is propagated to a O_NOCMTIME one.
>
> skipping c/mtime is dangerous.  The XFS handle code allows it to
> support HSM where data is migrated out to tape, and requires
> CAP_SYS_ADMIN.  Allowing it for any file owner would expand the scope
> far too much as now everyone could skip timestamp updates.
>
> > It can be used by workloads that want to write a file but don't care
> > much about the precise timestamp on it and can update it later with
> > utimens() call.
>
> The workload might not care, the rest of the system does.  ctime can't
> be set to arbitrary values, so it is important for backups and as
> an audit trail.
>
> > There's another reason for having this patch. When performing AIO write,
> > the file_modified_flags() function checks whether or not to update inode
> > times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> > the check return EINTR error that quickly propagates into cb completion
> > without doing any IO. This restriction effectively prevents doing AIO
> > writes with nowait flag, as file modifications really imply time update.
>
> Well, we'll need to look into that, including maybe non-blocking
> timestamp updates.
>

It's been 12 years (!), but maybe it's time to reconsider this:

https://lore.kernel.org/all/cover.1377193658.git.luto@amacapital.net/

Nothing has fundamentally changed since then, but I bet enough little
things (folios!) have changed around this series that it won't apply
without considerable massaging.  I stopped working on it personally
because I moved the workload in question onto fast, fancy SSDs
resulting in my having bigger fish to fry.  I don't think I'll have
the bandwidth to pick it up any time soon, but maybe one of you folks
is interested :)  I never looked into the AIO path (I was interested
in the page_mkwrite path), but my series made it at least conceptually
possible to unconditionally mark the file as needing a cmtime update
when presently dirty data is written back, and I imagine that AIO
could use that too to avoid ever needing to bail out because an mtime
update would block.

To the extent that ctime is "important for backups", it's been *wrong*
for backups approximately forever -- one can read ctime, then read the
contents of a file, and get a new ctime and an old copy of the data
that precedes the modification that logically triggered the ctime
value that was read.

--Andy
Andy Lutomirski
AMA Capital Management, LLC

* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Christoph Hellwig @ 2025-10-04  4:26 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: linux-fsdevel, Raphael S . Carvalho, linux-api, linux-xfs
In-Reply-To: <20251003093213.52624-1-xemul@scylladb.com>

On Fri, Oct 03, 2025 at 12:32:13PM +0300, Pavel Emelyanov wrote:
> The FMODE_NOCMTIME flag tells that ctime and mtime stamps are not
> updated on IO. The flag was introduced long ago by 4d4be482a4 ([XFS]
> add a FMODE flag to make XFS invisible I/O less hacky. Back then it
> was suggested that this flag is propagated to a O_NOCMTIME one.

skipping c/mtime is dangerous.  The XFS handle code allows it to
support HSM where data is migrated out to tape, and requires
CAP_SYS_ADMIN.  Allowing it for any file owner would expand the scope
far too much as now everyone could skip timestamp updates.

> It can be used by workloads that want to write a file but don't care
> much about the precise timestamp on it and can update it later with
> utimens() call.

The workload might not care, the rest of the system does.  ctime can't
be set to arbitrary values, so it is important for backups and as
an audit trail.

> There's another reason for having this patch. When performing AIO write,
> the file_modified_flags() function checks whether or not to update inode
> times. In case update is needed and iocb carries the RWF_NOWAIT flag,
> the check return EINTR error that quickly propagates into cb completion
> without doing any IO. This restriction effectively prevents doing AIO
> writes with nowait flag, as file modifications really imply time update.

Well, we'll need to look into that, including maybe non-blocking
timestamp updates.


* Re: [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Pasha Tatashin @ 2025-10-04  2:37 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <CA+CK2bBuO5YaL8MNqb5Xo_us600vTe2SF_yMNU-O9D2_RBoMag@mail.gmail.com>

On Fri, Oct 3, 2025 at 10:07 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Fri, Oct 3, 2025 at 6:51 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On 2025-09-29 01:03:17, Pasha Tatashin wrote:
> > > diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> > > index af6e773cf98f..de7ca45d3892 100644
> > > --- a/tools/testing/selftests/liveupdate/.gitignore
> > > +++ b/tools/testing/selftests/liveupdate/.gitignore
> > > @@ -1 +1,2 @@
> > >  /liveupdate
> > > +/luo_multi_kexec
> >
> > In next patches new tests are not added to gitignore.
>
> Will fix it, thanks.
>
> >
> > > diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> > > index 2a573c36016e..1cbc816ed5c5 100644
> > > --- a/tools/testing/selftests/liveupdate/Makefile
> > > +++ b/tools/testing/selftests/liveupdate/Makefile
> > > @@ -1,7 +1,38 @@
> > >  # SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +KHDR_INCLUDES ?= -I../../../usr/include
> >
> > If make is run from the tools/testing/selftests/liveupdate directory, this
> > will not work because it needs one more "..".
> >
> > If this is built using selftest Makefile from root directory
> >
> >   make -C tools/testing/selftests TARGETS=liveupdate
> >
> > there will not be build errors because tools/testing/selftests/Makefile
> > defines KHDR_INCLUDES, so above definition will never happen.
> >
> > >  CFLAGS += -Wall -O2 -Wno-unused-function
> > >  CFLAGS += $(KHDR_INCLUDES)
> > > +LDFLAGS += -static
> >
> > Why static? Can't we let user pass extra flags if they prefer static
>
> Because these tests are executed in a VM and not on the host, static
> makes sense to be able to run in a different environment.
>
> > > +
> > > +# --- Test Configuration (Edit this section when adding new tests) ---
> > > +LUO_SHARED_SRCS := luo_test_utils.c
> > > +LUO_SHARED_HDRS += luo_test_utils.h
> > > +
> > > +LUO_MANUAL_TESTS += luo_multi_kexec
> > > +
> > > +TEST_FILES += do_kexec.sh
> > >
> > >  TEST_GEN_PROGS += liveupdate
> > >
> > > +# --- Automatic Rule Generation (Do not edit below) ---
> > > +
> > > +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> > > +
> > > +# Define the full list of sources for each manual test.
> > > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > > +     $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> > > +
> > > +# This loop automatically generates an explicit build rule for each manual test.
> > > +# It includes dependencies on the shared headers and makes the output
> > > +# executable.
> > > +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> > > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > > +     $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> > > +             $(call msg,LINK,,$$@) ; \
> > > +             $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> > > +             $(Q)chmod +x $$@ \
> > > +     ) \
> > > +)
> > > +
> > >  include ../lib.mk
> >
> > make is not building LUO_MANUAL_TESTS, it is only building liveupdate.
> > How to build them?
>
> I am building them out of tree:
> make O=x86_64 -s -C tools/testing/selftests TARGETS=liveupdate install
> make O=x86_64 -s -C tools/testing/selftests TARGETS=kho install

Actually, I just tested in-tree and everything works for me. Could you
please verify:

make mrproper  # Clean the tree
cat tools/testing/selftests/liveupdate/config > .config # Copy LUO depends.
make olddefconfig  # make a def config with LUO
make kvm_guest.config # Build minimal KVM guest with LUO
make headers # Make uAPI headers
# Build and install the liveupdate selftests
make -C tools/testing/selftests TARGETS=liveupdate install

# Show that self tests are properly installed:
ls -1 tools/testing/selftests/kselftest_install/liveupdate/
config
do_kexec.sh
liveupdate
luo_multi_file
luo_multi_kexec
luo_multi_session
luo_unreclaimed
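
A check like the listing above could also be scripted. This is only a
minimal sketch: check_installed is a hypothetical helper, and the scratch
directory merely stands in for kselftest_install/liveupdate so the example
runs anywhere.

```sh
# check_installed DIR NAME...: print each NAME missing from DIR and
# return non-zero if at least one is missing. (Hypothetical helper.)
check_installed() {
    dir=$1; shift
    missing=0
    for name in "$@"; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Scratch directory standing in for the real install directory.
demo=$(mktemp -d)
touch "$demo/liveupdate" "$demo/do_kexec.sh"

# Two files exist, one does not, so the helper reports the absent one.
status=0
report=$(check_installed "$demo" liveupdate do_kexec.sh luo_multi_kexec) || status=$?
echo "$report"    # prints: missing: luo_multi_kexec

rm -rf "$demo"
```

Pointing the helper at tools/testing/selftests/kselftest_install/liveupdate
with the test names from the listing would give a quick pass/fail answer
after an install.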

Pasha


* Re: [PATCH v4 18/30] selftests/liveupdate: add subsystem/state tests
From: Pasha Tatashin @ 2025-10-04  2:08 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20251003231712.GA2144931.vipinsh@google.com>

On Fri, Oct 3, 2025 at 7:17 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On 2025-09-29 01:03:09, Pasha Tatashin wrote:
> > diff --git a/tools/testing/selftests/liveupdate/config b/tools/testing/selftests/liveupdate/config
> > new file mode 100644
> > index 000000000000..382c85b89570
> > --- /dev/null
> > +++ b/tools/testing/selftests/liveupdate/config
> > @@ -0,0 +1,6 @@
> > +CONFIG_KEXEC_FILE=y
> > +CONFIG_KEXEC_HANDOVER=y
> > +CONFIG_KEXEC_HANDOVER_DEBUG=y
> > +CONFIG_LIVEUPDATE=y
> > +CONFIG_LIVEUPDATE_SYSFS_API=y
>
> Where is this one?

I removed the v4 SYSFS interface, and this line is a leftover. I will fix it.

Thanks,
Pasha

>


* Re: [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Pasha Tatashin @ 2025-10-04  2:07 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20251003225120.GA2035091.vipinsh@google.com>

On Fri, Oct 3, 2025 at 6:51 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On 2025-09-29 01:03:17, Pasha Tatashin wrote:
> > diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> > index af6e773cf98f..de7ca45d3892 100644
> > --- a/tools/testing/selftests/liveupdate/.gitignore
> > +++ b/tools/testing/selftests/liveupdate/.gitignore
> > @@ -1 +1,2 @@
> >  /liveupdate
> > +/luo_multi_kexec
>
> In next patches new tests are not added to gitignore.

Will fix it, thanks.

>
> > diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> > index 2a573c36016e..1cbc816ed5c5 100644
> > --- a/tools/testing/selftests/liveupdate/Makefile
> > +++ b/tools/testing/selftests/liveupdate/Makefile
> > @@ -1,7 +1,38 @@
> >  # SPDX-License-Identifier: GPL-2.0-only
> > +
> > +KHDR_INCLUDES ?= -I../../../usr/include
>
> If make is run from the tools/testing/selftests/liveupdate directory, this
> will not work because it needs one more "..".
>
> If this is built using selftest Makefile from root directory
>
>   make -C tools/testing/selftests TARGETS=liveupdate
>
> there will not be build errors because tools/testing/selftests/Makefile
> defines KHDR_INCLUDES, so above definition will never happen.
>
> >  CFLAGS += -Wall -O2 -Wno-unused-function
> >  CFLAGS += $(KHDR_INCLUDES)
> > +LDFLAGS += -static
>
> Why static? Can't we let user pass extra flags if they prefer static

Because these tests are executed in a VM and not on the host, static
makes sense to be able to run in a different environment.

> > +
> > +# --- Test Configuration (Edit this section when adding new tests) ---
> > +LUO_SHARED_SRCS := luo_test_utils.c
> > +LUO_SHARED_HDRS += luo_test_utils.h
> > +
> > +LUO_MANUAL_TESTS += luo_multi_kexec
> > +
> > +TEST_FILES += do_kexec.sh
> >
> >  TEST_GEN_PROGS += liveupdate
> >
> > +# --- Automatic Rule Generation (Do not edit below) ---
> > +
> > +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> > +
> > +# Define the full list of sources for each manual test.
> > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > +     $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> > +
> > +# This loop automatically generates an explicit build rule for each manual test.
> > +# It includes dependencies on the shared headers and makes the output
> > +# executable.
> > +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> > +$(foreach test,$(LUO_MANUAL_TESTS), \
> > +     $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> > +             $(call msg,LINK,,$$@) ; \
> > +             $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> > +             $(Q)chmod +x $$@ \
> > +     ) \
> > +)
> > +
> >  include ../lib.mk
>
> make is not building LUO_MANUAL_TESTS, it is only building liveupdate.
> How to build them?

I am building them out of tree:
make O=x86_64 -s -C tools/testing/selftests TARGETS=liveupdate install
make O=x86_64 -s -C tools/testing/selftests TARGETS=kho install

That worked for me, but I forgot to test with the normal make
options. Thank you for reporting this and providing your fixes; I will
address them.

Pasha


* Re: [PATCH v4 18/30] selftests/liveupdate: add subsystem/state tests
From: Vipin Sharma @ 2025-10-03 23:17 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-19-pasha.tatashin@soleen.com>

On 2025-09-29 01:03:09, Pasha Tatashin wrote:
> diff --git a/tools/testing/selftests/liveupdate/config b/tools/testing/selftests/liveupdate/config
> new file mode 100644
> index 000000000000..382c85b89570
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/config
> @@ -0,0 +1,6 @@
> +CONFIG_KEXEC_FILE=y
> +CONFIG_KEXEC_HANDOVER=y
> +CONFIG_KEXEC_HANDOVER_DEBUG=y
> +CONFIG_LIVEUPDATE=y
> +CONFIG_LIVEUPDATE_SYSFS_API=y

Where is this one?



* Re: [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test
From: Vipin Sharma @ 2025-10-03 22:51 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <20250929010321.3462457-27-pasha.tatashin@soleen.com>

On 2025-09-29 01:03:17, Pasha Tatashin wrote:
> diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> index af6e773cf98f..de7ca45d3892 100644
> --- a/tools/testing/selftests/liveupdate/.gitignore
> +++ b/tools/testing/selftests/liveupdate/.gitignore
> @@ -1 +1,2 @@
>  /liveupdate
> +/luo_multi_kexec

The new tests introduced in later patches are not added to .gitignore.

> diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> index 2a573c36016e..1cbc816ed5c5 100644
> --- a/tools/testing/selftests/liveupdate/Makefile
> +++ b/tools/testing/selftests/liveupdate/Makefile
> @@ -1,7 +1,38 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> +
> +KHDR_INCLUDES ?= -I../../../usr/include

If make is run from the tools/testing/selftests/liveupdate directory, this
will not work because it needs one more "..".

If this is built using the selftests Makefile from the root directory

  make -C tools/testing/selftests TARGETS=liveupdate

there will be no build errors, because tools/testing/selftests/Makefile
defines KHDR_INCLUDES, so the above definition never takes effect.
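
To illustrate the relative-path point, here is a minimal sketch. It uses a
scratch directory that only mimics the kernel tree layout; the scratch tree
and the variable names are assumptions for illustration and are not part of
the patch.

```sh
# Scratch tree that only mimics the kernel layout (an assumption for
# illustration; no real kernel sources are involved).
root=$(mktemp -d)
mkdir -p "$root/usr/include" "$root/tools/testing/selftests/liveupdate"
root_real=$(cd "$root" && pwd -P)

cd "$root/tools/testing/selftests/liveupdate" || exit 1

# Directories that "../../.." and "../../../.." name from the test directory.
three_up=$(cd ../../.. && pwd -P)
four_up=$(cd ../../../.. && pwd -P)

# Print both relative to the scratch root.
echo "../../..    -> ${three_up#"$root_real"}/"    # prints: ../../..    -> /tools/
echo "../../../.. -> ${four_up#"$root_real"}/"     # prints: ../../../.. -> /

# Leave the scratch tree before removing it.
cd / && rm -rf "$root"
```

With three ".." components the include path lands under tools/, so
usr/include at the top of the tree is only reached with a fourth "..".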

>  CFLAGS += -Wall -O2 -Wno-unused-function
>  CFLAGS += $(KHDR_INCLUDES)
> +LDFLAGS += -static

Why static? Can't we let the user pass extra flags if they prefer static linking?

> +
> +# --- Test Configuration (Edit this section when adding new tests) ---
> +LUO_SHARED_SRCS := luo_test_utils.c
> +LUO_SHARED_HDRS += luo_test_utils.h
> +
> +LUO_MANUAL_TESTS += luo_multi_kexec
> +
> +TEST_FILES += do_kexec.sh
>  
>  TEST_GEN_PROGS += liveupdate
>  
> +# --- Automatic Rule Generation (Do not edit below) ---
> +
> +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> +
> +# Define the full list of sources for each manual test.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> +	$(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> +
> +# This loop automatically generates an explicit build rule for each manual test.
> +# It includes dependencies on the shared headers and makes the output
> +# executable.
> +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> +	$(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> +		$(call msg,LINK,,$$@) ; \
> +		$(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> +		$(Q)chmod +x $$@ \
> +	) \
> +)
> +
>  include ../lib.mk

make is not building LUO_MANUAL_TESTS; it is only building liveupdate.
How do I build them?

I ended up making a bunch of changes in the Makefile to fix these issues.
The diff follows (it is based on the last patch of the series). It
allows in-tree builds, out-of-tree builds, and building the other tests as well.

diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 25a6dec790bb..fbcacbd1b798 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -1,10 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-KHDR_INCLUDES ?= -I../../../usr/include
-CFLAGS += -Wall -O2 -Wno-unused-function
-CFLAGS += $(KHDR_INCLUDES)
-LDFLAGS += -static
-
 # --- Test Configuration (Edit this section when adding new tests) ---
 LUO_SHARED_SRCS := luo_test_utils.c
 LUO_SHARED_HDRS += luo_test_utils.h
@@ -25,6 +20,12 @@ TEST_GEN_PROGS := $(LUO_MAIN_TESTS)
 
 liveupdate_SOURCES := liveupdate.c $(LUO_SHARED_SRCS)
 
+include ../lib.mk
+
+CFLAGS += -Wall -O2 -Wno-unused-function
+CFLAGS += $(KHDR_INCLUDES)
+LDFLAGS += -static
+
 $(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
        $(call msg,LINK,,$@)
        $(Q)$(LINK.c) $^ $(LDLIBS) -o $@
@@ -33,16 +34,16 @@ $(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
 $(foreach test,$(LUO_MANUAL_TESTS), \
        $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
 
+define BUILD_RULE_TEMPLATE
+$(OUTPUT)/$(1): $($(1)_SOURCES) $(LUO_SHARED_HDRS)
+       $(call msg,LINK,,$$@)
+       $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@
+       $(Q)chmod +x $$@
+endef
 # This loop automatically generates an explicit build rule for each manual test.
 # It includes dependencies on the shared headers and makes the output
 # executable.
 # Note the use of '$$' to escape automatic variables for the 'eval' command.
 $(foreach test,$(LUO_MANUAL_TESTS), \
-       $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
-               $(call msg,LINK,,$$@) ; \
-               $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
-               $(Q)chmod +x $$@ \
-       ) \
+       $(eval $(call BUILD_RULE_TEMPLATE,$(test))) \
 )
-
-include ../lib.mk


* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Aleksa Sarai @ 2025-10-03  4:22 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Askar Safin, brauner, dhowells, g.branden.robinson, jack,
	linux-api, linux-fsdevel, linux-kernel, linux-man, mtk.manpages,
	viro
In-Reply-To: <5ukckeqipdkz6aigdy7rmtsmy5zav5x4rw2hrgbxiwfflrcmgb@jy7yr34cwyat>

On 2025-10-01, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
> 
> On Wed, Oct 01, 2025 at 05:35:45PM +1000, Aleksa Sarai wrote:
> > On 2025-10-01, Askar Safin <safinaskar@gmail.com> wrote:
> > > Aleksa Sarai <cyphar@cyphar.com>:
> > > > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > > +                   &attr, sizeof(attr));
> > > 
> > > Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> > > calls. :)
> > > 
> > > I think you meant open_tree_attr here.
> > 
> > Oops.
> > 
> > > 
> > > > +\&
> > > > +/* Create a new copy with the id-mapping cleared */
> > > > +memset(&attr, 0, sizeof(attr));
> > > > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > > > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > > +                   &attr, sizeof(attr));
> > > 
> > > And here.
> > 
> > Oops x2.
> > 
> > > Otherwise your whole patchset looks good. Add to whole patchset:
> > > Reviewed-by: Askar Safin <safinaskar@gmail.com>
> 
> I've applied the patch, with the following amendment:
> 
> 	diff --git i/man/man2/open_tree.2 w/man/man2/open_tree.2
> 	index 8b48f3b78..f6f2fbecd 100644
> 	--- i/man/man2/open_tree.2
> 	+++ w/man/man2/open_tree.2
> 	@@ -683,14 +683,14 @@ .SS open_tree_attr()
> 	 .\" Using .attr_clr is not strictly necessary but makes the intent clearer.
> 	 attr.attr_set = MOUNT_ATTR_IDMAP;
> 	 attr.userns_fd = nsfd2;
> 	-mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> 	-                   &attr, sizeof(attr));
> 	+mntfd2 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
> 	+                        &attr, sizeof(attr));
> 	 \&
> 	 /* Create a new copy with the id-mapping cleared */
> 	 memset(&attr, 0, sizeof(attr));
> 	 attr.attr_clr = MOUNT_ATTR_IDMAP;
> 	-mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> 	-                   &attr, sizeof(attr));
> 	+mntfd3 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
> 	+                        &attr, sizeof(attr));
> 	 .EE
> 	 .in
> 	 .P
> 
> 
> (Hopefully I got it right.)

That looks correct -- thanks!

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


* Re: [PATCH v5 0/8] man2: document "new" mount API
From: Alejandro Colomar @ 2025-10-01 18:20 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
	G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
	linux-kernel, David Howells, Christian Brauner
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>

Hi Aleksa, Askar,

On Thu, Sep 25, 2025 at 01:31:22AM +1000, Aleksa Sarai wrote:
> Back in 2019, the new mount API was merged[1]. David Howells then set
> about writing man pages for these new APIs, and sent some patches back
> in 2020[2].
> 

[...]

> 
> In addition, I have also included a man page for open_tree_attr(2) (as a
> subsection of the new open_tree(2) man page), which was merged in Linux
> 6.15.
> 
> [1]: https://lore.kernel.org/all/20190507204921.GL23075@ZenIV.linux.org.uk/
> [2]: https://lore.kernel.org/linux-man/159680892602.29015.6551860260436544999.stgit@warthog.procyon.org.uk/
> [3]: https://github.com/brauner/man-pages-md
> 
> Co-authored-by: David Howells <dhowells@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Co-authored-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

The full patch set has been merged now.  I've done a merge commit where
I've pasted this cover letter, and amended it so that Aleksa is the
author of the merge commit.  I've also included Askar's Reviewed-by tag
in the merge commit itself.

I'll have it in a separate branch for a few days, in case I need to fix
anything.  You can check it here:

<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?h=fs>

I editorialized the titles, but other than that, I didn't do much.
I think I mentioned most of the changes in replies to each patch.

Thanks a lot for your contributions!


Have a lovely night!
Alex

> ---
> Changes in v5:
> - `sed -i s|file descriptor based|file-descriptor-based|`.
>   [Alejandro Colomar]
> - fsconfig(2): use bullets instead of ordered list for workflow
>   description. [Alejandro Colomar]
> - mount_setattr(2): fix minor wording nit in new attribute-parameter
>   subsection.
> - fsopen(2): remove brackets around "message" for message retrieval
>   interface description. [Alejandro Colomar]
> - {move_mount,fspick}(2): fix remaining incorrect no-automount text.
>   [Askar Safin]
> - {fsmount,open_tree}(2): `sed -i s|MOUNT_DETACH|MNT_DETACH|g`.
>   [Askar Safin]
> - mount_setattr(2): fix copy-paste snafu in attribute-parameter
>   subsection. [Askar Safin]
> - *: clean `make -R build-catman-troff`. [Alejandro Colomar]
> - *: switch to \[em]\c where appropriate.
> - open_tree(2): clean up MNT_DETACH-on-close description and make it
>   slightly more prominent. [Alejandro Colomar]
> - open_tree(2): mention the distinction from open(O_PATH) with regards
>   to automounts. Askar suggested it be put in the section about
>   ~OPEN_TREE_CLONE, but the change in behaviour also applies to
>   OPEN_TREE_CLONE and it looked awkward to include it in the
>   dentry_open() case because O_PATH only gets mentioned in the following
>   paragraph (where I've put the text now). [Askar Safin]
> - {move_mount,open_tree{,_attr}}(2): fix column-width-related "make -R
>   check" failures.
> - *: fix remaining "make -R lint" failures.
> - open_tree_attr(2): add example using MOUNT_ATTR_IDMAP.
> - v4: <https://lore.kernel.org/r/20250919-new-mount-api-v4-0-1261201ab562@cyphar.com>
> 
> Changes in v4:
> - `sed -i s|\\% |\\%|g`.
> - Remove unneeded quotes in SYNOPSIS. [Alejandro Colomar]
> - open_tree(2): fix leftover confusing usages of "attach" when referring
>   to file descriptors being associated with mount objects.
> - open_tree(2): rename "Anonymous mount namespaces" NOTES subsection to
>   the far more informative "Mount propagation" and clean up the wording
>   a little.
> - open_tree_attr(2): add a code comment about
>   <https://lore.kernel.org/all/20250808-open_tree_attr-bugfix-idmap-v1-0-0ec7bc05646c@cyphar.com/>
> - {fsconfig,open_tree_attr}(2): use _Nullable.
> - {fsmount,open_tree}(2): mention that the unmount-on-close behaviour is
>   actually lazy (a-la MNT_DETACH).
> - {fsconfig,mount_setattr}(2): improve "mount attributes and filesystem
>   parameters" wording to make it clearer that superblock and mount flags
>   are sibling properties, not the same thing.
> - open_tree(2): mention that any mount propagation events while the
>   mount object is detached are completely lost -- i.e., they don't get
>   replayed once you attach the mount somewhere.
> - fsconfig(2): fix minor grammatical / missing joining word issues.
> - fsconfig(2): fix final leftover `.IR A " and " B` cases.
> - fsconfig(2): explain that failed fsconfig(FSCONFIG_CMD_*) operations
>   render the filesystem context invalid.
> - fsconfig(2): rework the description of superblock reuse, as the
>   previous text was very wrong. (Though there has been discussion about
>   changing this behaviour...)
> - fsconfig(2): remove misleading wording in FSCONFIG_CMD_CREATE_EXCL
>   about how we are requesting a new filesystem instance -- in theory
>   filesystems could take this request into account but in practice none
>   do (and it seems unlikely any ever will).
> - fsconfig(2): mention that key, value, and aux must be 0 or NULL for
>   FSCONFIG_CMD_RECONF.
> - fsmount(2): fix usage of "filesystem instance" in relation to
>   fsmount() and open_tree() comparison. [Askar Safin]
> - move_mount(2): "as attached" -> "as a detached" [Askar Safin]
> - fspick(2): add note about filesystem parameter list being copied
>   rather than reset with FSCONFIG_CMD_RECONFIGURE. [Askar Safin]
> - v3: <https://lore.kernel.org/r/20250809-new-mount-api-v3-0-f61405c80f34@cyphar.com>
> 
> Changes in v3:
> - `sed -i s|Co-developed-by|Co-authored-by|g`. [Alejandro Colomar]
>   - Add Signed-off-by for co-authors. [Christian Brauner]
> - `sed -i s|needs-mount|awaiting-mount|g`, to match the kernel parlance.
> - Fix VERSIONS/HISTORY mixup in mount_attr(2type) that was copied from
>   open_how(2type). [Alejandro Colomar]
> - Fix incorrect .BR usage in SYNOPSIS.
> - Some more semantic newlines fixes. [Alejandro Colomar]
> - Minor fixes suggested by Alejandro. [Alejandro Colomar]
> - open_tree_attr(2): heavily reword everything to be better formatted
>   and more explicit about its behaviour.
> - open_tree(2): write proper explanatory paragraphs for the EXAMPLES.
> - mount_setattr(2): fix stray doublequote in SYNOPSIS. [Askar Safin]
> - fsopen(2): rework structure of the DESCRIPTION introduction.
> - fsopen(2): explicitly say that read(2) errors in the message retrieval
>   interface are actual errors, not return 0. [Askar Safin]
> - fsopen(2): add BUGS section to describe the unfortunate -ENODATA
>   message dropping behaviour that should be fixed by
>   <https://lore.kernel.org/r/20250807-fscontext-log-cleanups-v3-0-8d91d6242dc3@cyphar.com/>.
> - fsconfig(2): add a NOTES subsection about generic filesystem
>   parameters.
> - fsconfig(2): add comment about the weirdness surrounding
>   FSCONFIG_SET_PATH.
> - {fspick,open_tree}(2): Correct AT_NO_AUTOMOUNT description (copied
>   from David, who probably copied it from statx(2)) -- AT_NO_AUTOMOUNT
>   applies to all path components, not just the final one. [Christian
>   Brauner]
> - statx(2): fix AT_NO_AUTOMOUNT documentation.
> - open_tree(2): swap open(2) reference for openat(2) when saying that
>   the result is identical. [Askar Safin]
> - fsmount(2): fix DESCRIPTION introduction, and rework attr_flags
>   description to better reference mount_setattr(2).
> - {fsopen,fspick,fsmount,open_tree}(2): don't use "attach" when talking
>   about the file descriptors we return that reference in-kernel objects,
>   to avoid confusing readers with mount object attachment status.
> - fsconfig(2): remove pidns argument example, as it was kind of unclear
>   and referenced kernel features not yet merged.
> - fsconfig(2): remove rambling FSCONFIG_SET_PATH_EMPTY text (which
>   mostly describes an academic issue that doesn't apply to any existing
>   filesystem), and instead add a CAVEATS section which touches on the
>   weird type behaviour of fsconfig(2).
> - v2: <https://lore.kernel.org/r/20250807-new-mount-api-v2-0-558a27b8068c@cyphar.com>
> 
> Changes in v2:
> - `make -R lint-man`. [Alejandro Colomar]
> - `sed -i s|Glibc|glibc|g`. [Alejandro Colomar]
> - `sed -i s|pathname|path|g` [Alejandro Colomar]
> - Clean up macro usage, example code, and synopsis. [Alejandro Colomar]
> - Try to use semantic newlines. [Alejandro Colomar]
> - Make sure the usage of "filesystem context", "filesystem instance",
>   and "mount object" are consistent. [Askar Safin]
> - Avoid referring to these syscalls without an "at" suffix as "*at()
>   syscalls". [Askar Safin]
> - Use \% to avoid hyphenation of constants. [Askar Safin, G. Branden Robinson]
> - Add a new subsection to mount_setattr(2) to describe the distinction
>   between mount attributes and filesystem parameters.
> - (Under protest) double-space-after-period formatted commit messages.
> - v1: <https://lore.kernel.org/r/20250806-new-mount-api-v1-0-8678f56c6ee0@cyphar.com>
> 
> ---
> Aleksa Sarai (8):
>       man/man2/fsopen.2: document "new" mount API
>       man/man2/fspick.2: document "new" mount API
>       man/man2/fsconfig.2: document "new" mount API
>       man/man2/fsmount.2: document "new" mount API
>       man/man2/move_mount.2: document "new" mount API
>       man/man2/open_tree.2: document "new" mount API
>       man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
>       man/man2/{fsconfig,mount_setattr}.2: add note about attribute-parameter distinction
> 
>  man/man2/fsconfig.2       | 741 ++++++++++++++++++++++++++++++++++++++++++++++
>  man/man2/fsmount.2        | 231 +++++++++++++++
>  man/man2/fsopen.2         | 385 ++++++++++++++++++++++++
>  man/man2/fspick.2         | 343 +++++++++++++++++++++
>  man/man2/mount_setattr.2  |  39 +++
>  man/man2/move_mount.2     | 646 ++++++++++++++++++++++++++++++++++++++++
>  man/man2/open_tree.2      | 709 ++++++++++++++++++++++++++++++++++++++++++++
>  man/man2/open_tree_attr.2 |   1 +
>  8 files changed, 3095 insertions(+)
> ---
> base-commit: f17990c243eafc1891ff692f90b6ce42e6449be8
> change-id: 20250802-new-mount-api-436db984f432
> 
> 
> Kind regards,
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).


* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Alejandro Colomar @ 2025-10-01 18:02 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Askar Safin, brauner, dhowells, g.branden.robinson, jack,
	linux-api, linux-fsdevel, linux-kernel, linux-man, mtk.manpages,
	viro
In-Reply-To: <2025-10-01-brawny-bronze-taste-mounds-zp8G2b@cyphar.com>

Hi Aleksa,

On Wed, Oct 01, 2025 at 05:35:45PM +1000, Aleksa Sarai wrote:
> On 2025-10-01, Askar Safin <safinaskar@gmail.com> wrote:
> > Aleksa Sarai <cyphar@cyphar.com>:
> > > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > +                   &attr, sizeof(attr));
> > 
> > Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> > calls. :)
> > 
> > I think you meant open_tree_attr here.
> 
> Oops.
> 
> > 
> > > +\&
> > > +/* Create a new copy with the id-mapping cleared */
> > > +memset(&attr, 0, sizeof(attr));
> > > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > +                   &attr, sizeof(attr));
> > 
> > And here.
> 
> Oops x2.
> 
> > Otherwise your whole patchset looks good. Add to whole patchset:
> > Reviewed-by: Askar Safin <safinaskar@gmail.com>

I've applied the patch, with the following amendment:

	diff --git i/man/man2/open_tree.2 w/man/man2/open_tree.2
	index 8b48f3b78..f6f2fbecd 100644
	--- i/man/man2/open_tree.2
	+++ w/man/man2/open_tree.2
	@@ -683,14 +683,14 @@ .SS open_tree_attr()
	 .\" Using .attr_clr is not strictly necessary but makes the intent clearer.
	 attr.attr_set = MOUNT_ATTR_IDMAP;
	 attr.userns_fd = nsfd2;
	-mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
	-                   &attr, sizeof(attr));
	+mntfd2 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
	+                        &attr, sizeof(attr));
	 \&
	 /* Create a new copy with the id-mapping cleared */
	 memset(&attr, 0, sizeof(attr));
	 attr.attr_clr = MOUNT_ATTR_IDMAP;
	-mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
	-                   &attr, sizeof(attr));
	+mntfd3 = open_tree_attr(mntfd1, "", OPEN_TREE_CLONE,
	+                        &attr, sizeof(attr));
	 .EE
	 .in
	 .P


(Hopefully I got it right.)


Cheers,
Alex

> 
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/



-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).


* Re: [PATCH v5 6/8] man/man2/open_tree.2: document "new" mount API
From: Alejandro Colomar @ 2025-10-01 17:59 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
	G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
	linux-kernel, David Howells, Christian Brauner
In-Reply-To: <20250925-new-mount-api-v5-6-028fb88023f2@cyphar.com>

Hi Aleksa,

On Thu, Sep 25, 2025 at 01:31:28AM +1000, Aleksa Sarai wrote:
> This is loosely based on the original documentation written by David
> Howells and later maintained by Christian Brauner, but has been
> rewritten to be more from a user perspective (as well as fixing a few
> critical mistakes).
> 
> Co-authored-by: David Howells <dhowells@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Co-authored-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---

Patch applied.  Thanks!


Have a lovely night!
Alex

>  man/man2/open_tree.2 | 518 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 518 insertions(+)
> 
> diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
> new file mode 100644
> index 0000000000000000000000000000000000000000..6b04a80927a8b6a394cf7ab341b8d6b29d42d304
> --- /dev/null
> +++ b/man/man2/open_tree.2
> @@ -0,0 +1,518 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH open_tree 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +open_tree \- open path or create detached mount object and attach to fd
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.BR "#define _GNU_SOURCE         " "/* See feature_test_macros(7) */"
> +.BR "#include <fcntl.h>" "          /* Definition of " AT_* " constants */"
> +.B #include <sys/mount.h>
> +.P
> +.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags );
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR open_tree ()
> +system call is part of
> +the suite of file-descriptor-based mount facilities in Linux.
> +.IP \[bu] 3
> +If
> +.I flags
> +contains
> +.BR \%OPEN_TREE_CLONE ,
> +.BR open_tree ()
> +creates a detached mount object
> +which consists of a bind-mount of
> +the path specified by
> +.IR path .
> +A new file descriptor
> +associated with the detached mount object
> +is then returned.
> +The mount object is equivalent to a bind-mount
> +that would be created by
> +.BR mount (2)
> +called with
> +.BR \%MS_BIND ,
> +except that it is tied to a file descriptor
> +and is not mounted onto the filesystem.
> +.IP
> +As with file descriptors returned from
> +.BR fsmount (2),
> +the resultant file descriptor can then be used with
> +.BR move_mount (2),
> +.BR mount_setattr (2),
> +or other such system calls to do further mount operations.
> +.IP
> +This mount object will be unmounted and destroyed
> +when the file descriptor is closed
> +if it was not otherwise attached to a mount point
> +by calling
> +.BR move_mount (2).
> +This implicit unmount operation is lazy\[em]\c
> +akin to calling
> +.BR umount2 (2)
> +with
> +.BR \%MNT_DETACH ;
> +thus,
> +any existing open references to files
> +from the mount object
> +will continue to work,
> +and the mount object will only be completely destroyed
> +once it ceases to be busy.
> +.IP \[bu]
> +If
> +.I flags
> +does not contain
> +.BR \%OPEN_TREE_CLONE ,
> +.BR open_tree ()
> +returns a file descriptor
> +that is exactly equivalent to
> +one produced by
> +.BR openat (2)
> +when called with the same
> +.I dirfd
> +and
> +.IR path .
> +.P
> +In either case, the resultant file descriptor
> +acts the same as one produced by
> +.BR open (2)
> +with
> +.BR O_PATH ,
> +meaning it can also be used as a
> +.I dirfd
> +argument to
> +"*at()" system calls.
> +However,
> +unlike
> +.BR open (2)
> +called with
> +.BR O_PATH ,
> +automounts will
> +by default
> +be triggered by
> +.BR open_tree ()
> +unless
> +.B \%AT_NO_AUTOMOUNT
> +is included in
> +.IR flags .
> +.P
> +As with "*at()" system calls,
> +.BR open_tree ()
> +uses the
> +.I dirfd
> +argument in conjunction with the
> +.I path
> +argument to determine the path to operate on, as follows:
> +.IP \[bu] 3
> +If the pathname given in
> +.I path
> +is absolute, then
> +.I dirfd
> +is ignored.
> +.IP \[bu]
> +If the pathname given in
> +.I path
> +is relative and
> +.I dirfd
> +is the special value
> +.BR \%AT_FDCWD ,
> +then
> +.I path
> +is interpreted relative to
> +the current working directory
> +of the calling process (like
> +.BR open (2)).
> +.IP \[bu]
> +If the pathname given in
> +.I path
> +is relative,
> +then it is interpreted relative to
> +the directory referred to by the file descriptor
> +.I dirfd
> +(rather than relative to
> +the current working directory
> +of the calling process,
> +as is done by
> +.BR open (2)
> +for a relative pathname).
> +In this case,
> +.I dirfd
> +must be a directory
> +that was opened for reading
> +.RB ( \%O_RDONLY )
> +or using the
> +.B O_PATH
> +flag.
> +.IP \[bu]
> +If
> +.I path
> +is an empty string,
> +and
> +.I flags
> +contains
> +.BR \%AT_EMPTY_PATH ,
> +then the file descriptor
> +.I dirfd
> +is operated on directly.
> +In this case,
> +.I dirfd
> +may refer to any type of file,
> +not just a directory.
> +.P
> +See
> +.BR openat (2)
> +for an explanation of why the
> +.I dirfd
> +argument is useful.
> +.P
> +.I flags
> +can be used to control aspects of the path lookup
> +and properties of the returned file descriptor.
> +A value for
> +.I flags
> +is constructed by bitwise ORing
> +zero or more of the following constants:
> +.RS
> +.TP
> +.B \%AT_EMPTY_PATH
> +If
> +.I path
> +is an empty string, operate on the file referred to by
> +.I dirfd
> +(which may have been obtained from
> +.BR open (2),
> +.BR fsmount (2),
> +or from another
> +.BR open_tree ()
> +call).
> +In this case,
> +.I dirfd
> +may refer to any type of file, not just a directory.
> +If
> +.I dirfd
> +is
> +.BR \%AT_FDCWD ,
> +.BR open_tree ()
> +will operate on the current working directory
> +of the calling process.
> +This flag is Linux-specific;
> +define
> +.B \%_GNU_SOURCE
> +to obtain its definition.
> +.TP
> +.B \%AT_NO_AUTOMOUNT
> +Do not automount the terminal ("basename") component of
> +.I path
> +if it is a directory that is an automount point.
> +This allows you to create a handle to the automount point itself,
> +rather than the location it would mount.
> +This flag has no effect if the mount point has already been mounted over.
> +This flag is Linux-specific;
> +define
> +.B \%_GNU_SOURCE
> +to obtain its definition.
> +.TP
> +.B \%AT_SYMLINK_NOFOLLOW
> +If
> +.I path
> +is a symbolic link, do not dereference it;
> +instead,
> +create either a handle to the link itself
> +or a bind-mount of it.
> +The resultant file descriptor is indistinguishable from one produced by
> +.BR openat (2)
> +with
> +.BR \%O_PATH | O_NOFOLLOW .
> +.TP
> +.B \%OPEN_TREE_CLOEXEC
> +Set the close-on-exec
> +.RB ( FD_CLOEXEC )
> +flag on the new file descriptor.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +for reasons why this may be useful.
> +.TP
> +.B \%OPEN_TREE_CLONE
> +Rather than creating an
> +.BR openat (2)-style
> +.B O_PATH
> +file descriptor,
> +create a bind-mount of
> +.I path
> +(akin to
> +.IR \%mount\~\-\-bind )
> +as a detached mount object.
> +In order to do this operation,
> +the calling process must have the
> +.B \%CAP_SYS_ADMIN
> +capability.
> +.TP
> +.B \%AT_RECURSIVE
> +Create a recursive bind-mount of the path
> +(akin to
> +.IR \%mount\~\-\-rbind )
> +as a detached mount object.
> +This flag is only permitted in conjunction with
> +.BR \%OPEN_TREE_CLONE .
> +.SH RETURN VALUE
> +On success, a new file descriptor is returned.
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EACCES
> +Search permission is denied for one of the directories
> +in the path prefix of
> +.IR path .
> +(See also
> +.BR path_resolution (7).)
> +.TP
> +.B EBADF
> +.I path
> +is relative but
> +.I dirfd
> +is neither
> +.B \%AT_FDCWD
> +nor a valid file descriptor.
> +.TP
> +.B EFAULT
> +.I path
> +is NULL
> +or a pointer to a location
> +outside the calling process's accessible address space.
> +.TP
> +.B EINVAL
> +Invalid flag specified in
> +.IR flags .
> +.TP
> +.B ELOOP
> +Too many symbolic links encountered when resolving
> +.IR path .
> +.TP
> +.B EMFILE
> +The calling process has too many open files to create more.
> +.TP
> +.B ENAMETOOLONG
> +.I path
> +is longer than
> +.BR PATH_MAX .
> +.TP
> +.B ENFILE
> +The system has too many open files to create more.
> +.TP
> +.B ENOENT
> +A component of
> +.I path
> +does not exist, or is a dangling symbolic link.
> +.TP
> +.B ENOENT
> +.I path
> +is an empty string, but
> +.B AT_EMPTY_PATH
> +is not specified in
> +.IR flags .
> +.TP
> +.B ENOTDIR
> +A component of the path prefix of
> +.I path
> +is not a directory, or
> +.I path
> +is relative and
> +.I dirfd
> +is a file descriptor referring to a file other than a directory.
> +.TP
> +.B ENOSPC
> +The "anonymous" mount namespace
> +necessary to contain the
> +.B \%OPEN_TREE_CLONE
> +detached bind-mount mount object
> +could not be allocated,
> +as doing so would exceed
> +the configured per-user limit on
> +the number of mount namespaces in the current user namespace.
> +(See also
> +.BR namespaces (7).)
> +.TP
> +.B ENOMEM
> +The kernel could not allocate sufficient memory to complete the operation.
> +.TP
> +.B EPERM
> +.I flags
> +contains
> +.B \%OPEN_TREE_CLONE
> +but the calling process does not have the required
> +.B CAP_SYS_ADMIN
> +capability.
> +.SH STANDARDS
> +Linux.
> +.SH HISTORY
> +Linux 5.2.
> +.\" commit a07b20004793d8926f78d63eb5980559f7813404
> +.\" commit 400913252d09f9cfb8cce33daee43167921fc343
> +glibc 2.36.
> +.SH NOTES
> +.SS Mount propagation
> +The bind-mount mount objects created by
> +.BR open_tree ()
> +with
> +.B \%OPEN_TREE_CLONE
> +are not associated with
> +the mount namespace of the calling process.
> +Instead, each mount object is placed
> +in a newly allocated "anonymous" mount namespace
> +associated with the calling process.
> +.P
> +One of the side-effects of this is that
> +(unlike bind-mounts created with
> +.BR mount (2)),
> +mount propagation
> +(as described in
> +.BR mount_namespaces (7))
> +will not be applied to bind-mounts created by
> +.BR open_tree ()
> +until the bind-mount is attached with
> +.BR move_mount (2),
> +at which point the mount object
> +will be associated with the mount namespace
> +where it was attached
> +and mount propagation will resume.
> +Note that any mount propagation events that occurred
> +before the mount object was attached
> +will
> +.I not
> +be propagated to the mount object,
> +even after it is attached.
> +.SH EXAMPLES
> +The following examples show how
> +.BR open_tree ()
> +can be used in place of more traditional
> +.BR mount (2)
> +calls with
> +.BR MS_BIND .
> +.P
> +.in +4n
> +.EX
> +int srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE);
> +move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +First,
> +a detached bind-mount mount object of
> +.I /var
> +is created
> +and associated with the file descriptor
> +.IR srcfd .
> +Then, the mount object is attached to
> +.I /mnt
> +using
> +.BR move_mount (2)
> +with
> +.B \%MOVE_MOUNT_F_EMPTY_PATH
> +to request that the detached mount object
> +associated with the file descriptor
> +.I srcfd
> +be moved (and thus attached) to
> +.IR /mnt .
> +.P
> +The above procedure is functionally equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/var", "/mnt", NULL, MS_BIND, NULL);
> +.EE
> +.in
> +.P
> +.B \%OPEN_TREE_CLONE
> +can be combined with
> +.B \%AT_RECURSIVE
> +to create recursive detached bind-mount mount objects,
> +which in turn can be attached to mount points
> +to create recursive bind-mounts.
> +.P
> +.in +4n
> +.EX
> +int srcfd = open_tree(AT_FDCWD, "/var",
> +                      OPEN_TREE_CLONE | AT_RECURSIVE);
> +move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +The above procedure is functionally equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/var", "/mnt", NULL, MS_BIND | MS_REC, NULL);
> +.EE
> +.in
> +.P
> +One of the primary benefits of using
> +.BR open_tree ()
> +and
> +.BR move_mount (2)
> +over the traditional
> +.BR mount (2)
> +is that operating with
> +.IR dirfd -style
> +file descriptors is far easier and more intuitive.
> +.P
> +.in +4n
> +.EX
> +int srcfd = open_tree(100, "", AT_EMPTY_PATH | OPEN_TREE_CLONE);
> +move_mount(srcfd, "", 200, "foo", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +The above procedure is roughly equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/proc/self/fd/100",
> +      "/proc/self/fd/200/foo",
> +      NULL, MS_BIND, NULL);
> +.EE
> +.in
> +.P
> +In addition, you can use the file descriptor returned by
> +.BR open_tree ()
> +as the
> +.I dirfd
> +argument to any "*at()" system calls:
> +.P
> +.in +4n
> +.EX
> +int dirfd, fd;
> +\&
> +dirfd = open_tree(AT_FDCWD, "/etc", OPEN_TREE_CLONE);
> +fd = openat(dirfd, "passwd", O_RDONLY);
> +fchmodat(dirfd, "shadow", 0000, 0);
> +close(dirfd);
> +close(fd);
> +/* The bind-mount is now destroyed */
> +.EE
> +.in
> +.SH SEE ALSO
> +.BR fsconfig (2),
> +.BR fsmount (2),
> +.BR fsopen (2),
> +.BR fspick (2),
> +.BR mount (2),
> +.BR mount_setattr (2),
> +.BR move_mount (2),
> +.BR mount_namespaces (7)
> 
> -- 
> 2.51.0
> 
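
In case it helps anyone trying the first EXAMPLES snippet, here is a
minimal sketch of it with error checking.  It is not part of the patch.
The helper name bind_mount_fd() is made up for illustration, and I've
used raw syscall(2) with fallback definitions so that it also builds
where glibc is older than 2.36; with a newer glibc the open_tree() and
move_mount() wrappers from <sys/mount.h> can be called directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>          /* AT_FDCWD, O_CLOEXEC */
#include <sys/syscall.h>    /* SYS_* numbers */
#include <unistd.h>

/* Fallbacks for older headers; the values mirror the Linux UAPI. */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif

/*
 * Hypothetical helper: bind-mount src onto dst, roughly like
 * mount(src, dst, NULL, MS_BIND, NULL).
 * Returns 0 on success, or -1 with errno set.
 */
int bind_mount_fd(const char *src, const char *dst)
{
	int srcfd, saved;
	long ret;

	/* Create the detached bind-mount and get a descriptor for it. */
	srcfd = (int) syscall(SYS_open_tree, AT_FDCWD, src,
	                      OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
	if (srcfd == -1)
		return -1;

	/* Attach the detached mount object at dst. */
	ret = syscall(SYS_move_mount, srcfd, "", AT_FDCWD, dst,
	              MOVE_MOUNT_F_EMPTY_PATH);

	/*
	 * Closing srcfd is safe in both cases: once attached, the mount
	 * survives; if move_mount() failed, the detached mount is
	 * lazily unmounted and destroyed.
	 */
	saved = errno;
	close(srcfd);
	errno = saved;

	return ret == -1 ? -1 : 0;
}
```

A caller would use it as bind_mount_fd("/var", "/mnt") and check errno
on failure; EPERM is expected without CAP_SYS_ADMIN.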

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
From: Aleksa Sarai @ 2025-10-01  7:37 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Askar Safin, brauner, dhowells, g.branden.robinson, jack,
	linux-api, linux-fsdevel, linux-kernel, linux-man, mtk.manpages,
	viro
In-Reply-To: <ugko3x7tuqrmbyb326aw3dvtvmdozvtps6hc6ff3lmtsijoube@aem2acyk6t2q>

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

On 2025-10-01, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Askar,
> 
> On Wed, Oct 01, 2025 at 03:38:41AM +0300, Askar Safin wrote:
> > Aleksa Sarai <cyphar@cyphar.com>:
> > > +mntfd2 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > +                   &attr, sizeof(attr));
> > 
> > Your whole so-called "open_tree_attr example" doesn't contain any open_tree_attr
> > calls. :)
> > 
> > I think you meant open_tree_attr here.
> 
> I'll wait for Aleksa to confirm before applying and amending.

Yeah, Askar is right, they were a copy-paste snafu.
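
For reference, a rough sketch of what the second call looks like with
the right function name.  This is illustrative only, not the text of the
amended patch.  It goes through syscall(2) because I'm not assuming a
libc wrapper for open_tree_attr() is available, and it uses a local
mirror of struct mount_attr (the name mount_attr_sketch and the helper
clone_without_idmap() are made up) to avoid depending on which header
provides it.  I've also passed AT_EMPTY_PATH here, since open_tree.2
documents ENOENT for an empty path without it.

```c
#define _GNU_SOURCE
#include <fcntl.h>          /* AT_EMPTY_PATH */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; the values mirror the Linux UAPI. */
#ifndef SYS_open_tree_attr
#define SYS_open_tree_attr 467
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef MOUNT_ATTR_IDMAP
#define MOUNT_ATTR_IDMAP 0x00100000
#endif

/* Local stand-in with the same layout as the UAPI struct mount_attr. */
struct mount_attr_sketch {
	uint64_t attr_set;
	uint64_t attr_clr;
	uint64_t propagation;
	uint64_t userns_fd;
};

/*
 * Hypothetical helper: clone the mount referred to by mntfd into a new
 * detached mount object with the id-mapping cleared.
 * Returns the new file descriptor, or -1 with errno set.
 */
int clone_without_idmap(int mntfd)
{
	struct mount_attr_sketch attr;

	memset(&attr, 0, sizeof(attr));
	attr.attr_clr = MOUNT_ATTR_IDMAP;

	return (int) syscall(SYS_open_tree_attr, mntfd, "",
	                     AT_EMPTY_PATH | OPEN_TREE_CLONE,
	                     &attr, sizeof(attr));
}
```

On kernels without the system call this fails with ENOSYS, so callers
should be prepared to handle that.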

> > > +\&
> > > +/* Create a new copy with the id-mapping cleared */
> > > +memset(&attr, 0, sizeof(attr));
> > > +attr.attr_clr = MOUNT_ATTR_IDMAP;
> > > +mntfd3 = open_tree(mntfd1, "", OPEN_TREE_CLONE,
> > > +                   &attr, sizeof(attr));
> > 
> > And here.
> > 
> > Otherwise your whole patchset looks good. Add to whole patchset:
> > Reviewed-by: Askar Safin <safinaskar@gmail.com>
> 
> Thanks!  I'll retro-fit that to the commits I've applied already too, as
> I haven't pushed them to master yet.
> 
> 
> Have a lovely day!
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es>
> Use port 80 (that is, <...:80/>).



-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]

^ permalink raw reply

