Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Volunteering to do more reviews
From: Konstantin Ryabitsev @ 2026-04-15  3:08 UTC (permalink / raw)
  To: linux-doc, corbet

Jon and others:

I need more direct hands-on experience doing reviews and using my own tooling,
so I'd like to offer to do more reviewing of patches sent to linux-doc, if
that sort of thing is welcome and I won't be stepping on anyone's toes.

Best wishes,
-- 
KR

^ permalink raw reply

* Re: maintainer profiles
From: Randy Dunlap @ 2026-04-15  2:03 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Linux Documentation,
	Linux Kernel Mailing List
  Cc: Jonathan Corbet, Linux Kernel Workflows
In-Reply-To: <72a4accd-5f94-45f7-8392-bb659167f078@kernel.org>



On 4/14/26 4:18 AM, Krzysztof Kozlowski wrote:
> On 10/04/2026 02:18, Randy Dunlap wrote:
>> Hi,
>>
>> Is there supposed to be a difference (or distinction) in the contents of
>>
>> Documentation/process/maintainer-handbooks.rst
>> and
>> Documentation/maintainer/maintainer-entry-profile.rst
>> ?
>>
>> Can they be combined into one location?
> 
> Yes, please! Including also the location of actual profiles. I am mostly
> looking at them in the sources directly, not web docs, so confusing and
> annoying to find them distributed.

I agree completely but I'm not sure if anyone else does.

-- 
~Randy


^ permalink raw reply

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Jiayuan Chen @ 2026-04-15  1:47 UTC (permalink / raw)
  To: mkf, bpf
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David Ahern, netdev, linux-doc, linux-kernel
In-Reply-To: <42c1fed84a84519c2432163aa46f587f2d624fef.camel@163.com>


On 4/14/26 11:37 PM, mkf wrote:
> On Tue, 2026-04-14 at 18:57 +0800, Jiayuan Chen wrote:


[...]

> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -475,12 +475,21 @@ struct tcp_sock {
>   	u8	bpf_sock_ops_cb_flags;  /* Control calling BPF programs
>   					 * values defined in uapi/linux/tcp.h
>   					 */
> -	u8	bpf_chg_cc_inprogress:1; /* In the middle of
> +	u8	bpf_chg_cc_inprogress:1, /* In the middle of
>   					  * bpf_setsockopt(TCP_CONGESTION),
>   					  * it is to avoid the bpf_tcp_cc->init()
>   					  * to recur itself by calling
>   					  * bpf_setsockopt(TCP_CONGESTION, "itself").
>   					  */
> +		bpf_hdr_opt_len_cb_inprogress:1; /* It is set before invoking the
> +						  * callback so that a nested
> +						  * bpf_setsockopt(TCP_NODELAY) or
> +						  * bpf_setsockopt(TCP_CORK) cannot
> +						  * trigger tcp_push_pending_frames(),
> +						  * which would call tcp_current_mss()
> +						  * -> bpf_skops_hdr_opt_len(), causing
> +						  * infinite recursion.
> +						  */
>   #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
>   #else
>   #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158fb0..518699429a7a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5483,6 +5483,10 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
>   	if (sk->sk_protocol != IPPROTO_TCP)
>   		return -EINVAL;
>   
> +	if ((optname == TCP_NODELAY || optname == TCP_CORK) &&
> +	    tcp_sk(sk)->bpf_hdr_opt_len_cb_inprogress)
> +		return -EBUSY;
> +
> TCP_CORK is not support in sol_tcp_sockopt(), return -EINVAL by default. and put the check here
> could also prevent us from calling getsockopt(TCP_NODELAY) below.
>
>>   	switch (optname) {
>>   	case TCP_NODELAY:
>>   	case TCP_MAXSEG:
>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
>> index dafb63b923d0..fb06c464ac16 100644
>> --- a/net/ipv4/tcp_minisocks.c
>> +++ b/net/ipv4/tcp_minisocks.c
>> @@ -663,6 +663,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
>>   	RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
>>   
>>   	newtp->bpf_chg_cc_inprogress = 0;
>> +	newtp->bpf_hdr_opt_len_cb_inprogress = 0;
>>   	tcp_bpf_clone(sk, newsk);
>>   
>>   	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 326b58ff1118..c9654e690e1a 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -475,6 +475,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
>>   				  unsigned int *remaining)
>>   {
>>   	struct bpf_sock_ops_kern sock_ops;
>> +	struct tcp_sock *tp = tcp_sk(sk);
>>   	int err;
>>   
>>   	if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
>> @@ -519,7 +520,9 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
>>   	if (skb)
>>   		bpf_skops_init_skb(&sock_ops, skb, 0);
>>   
>> +	tp->bpf_hdr_opt_len_cb_inprogress = 1;
> we check the BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG before calling BPF_CGROUP_RUN_PROG_SOCK_OPS_SK,
> could this flag use for the same purpose? so we don't need to add an extra field.
>
> 	if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
> 					   BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG)) ||
> 	    !*remaining)
> 		return;


Hi Martin, I saw your patch. Your solution is better, please ignore mine :)




^ permalink raw reply

* Re: [PATCH] crash: Support high memory reservation for range syntax
From: kernel test robot @ 2026-04-15  1:43 UTC (permalink / raw)
  To: Youling Tang, Andrew Morton, Baoquan He, Jonathan Corbet
  Cc: llvm, oe-kbuild-all, Linux Memory Management List, Vivek Goyal,
	Dave Young, kexec, linux-kernel, linux-doc, youling.tang,
	Youling Tang
In-Reply-To: <20260404074103.506793-1-youling.tang@linux.dev>

Hi Youling,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Youling-Tang/crash-Support-high-memory-reservation-for-range-syntax/20260414-205035
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260404074103.506793-1-youling.tang%40linux.dev
patch subject: [PATCH] crash: Support high memory reservation for range syntax
config: s390-defconfig (https://download.01.org/0day-ci/archive/20260415/202604150941.oDguaP7A-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604150941.oDguaP7A-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604150941.oDguaP7A-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/crash_reserve.c:264:19: warning: expression which evaluates to zero treated as a null pointer constant of type 'char *' [-Wnon-literal-null-conversion]
     264 |         char *first_gt = false;
         |                          ^~~~~
>> kernel/crash_reserve.c:324:16: error: use of undeclared identifier 'DEFAULT_CRASH_KERNEL_LOW_SIZE'
     324 |                         *low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
         |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 warning and 1 error generated.


vim +/DEFAULT_CRASH_KERNEL_LOW_SIZE +324 kernel/crash_reserve.c

   282	
   283	static int __init __parse_crashkernel(char *cmdline,
   284				     unsigned long long system_ram,
   285				     unsigned long long *crash_size,
   286				     unsigned long long *crash_base,
   287				     const char *suffix,
   288				     bool *high,
   289				     unsigned long long *low_size)
   290	{
   291		char *first_colon, *first_space;
   292		char *ck_cmdline;
   293		char *name = "crashkernel=";
   294		unsigned long long boundary = 0;
   295		int ret;
   296	
   297		BUG_ON(!crash_size || !crash_base);
   298		*crash_size = 0;
   299		*crash_base = 0;
   300	
   301		ck_cmdline = get_last_crashkernel(cmdline, name, suffix);
   302		if (!ck_cmdline)
   303			return -ENOENT;
   304	
   305		ck_cmdline += strlen(name);
   306	
   307		if (suffix)
   308			return parse_crashkernel_suffix(ck_cmdline, crash_size,
   309					suffix);
   310		/*
   311		 * if the commandline contains a ':', then that's the extended
   312		 * syntax -- if not, it must be the classic syntax
   313		 */
   314		first_colon = strchr(ck_cmdline, ':');
   315		first_space = strchr(ck_cmdline, ' ');
   316		if (first_colon && (!first_space || first_colon < first_space)) {
   317			ret = parse_crashkernel_mem(ck_cmdline, system_ram,
   318					crash_size, crash_base);
   319	
   320			/* Handle optional ',>boundary' condition for range ':' syntax only. */
   321			parse_crashkernel_boundary(ck_cmdline, &boundary);
   322			if (!ret && *crash_size > boundary) {
   323				*high = true;
 > 324				*low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
   325			}
   326	
   327			return ret;
   328		}
   329	
   330		return parse_crashkernel_simple(ck_cmdline, crash_size, crash_base);
   331	}
   332	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [RFC PATCH 7/7] Docs/admin-guide/mm/damon/lru_sort: update for entire memory monitoring
From: SeongJae Park @ 2026-04-15  1:20 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm
In-Reply-To: <20260415012048.76508-1-sj@kernel.org>

Update DAMON_LRU_SORT usage document for the changed default monitoring
target region selection.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/admin-guide/mm/damon/lru_sort.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst
index 25e2f042a383f..796b0a028555d 100644
--- a/Documentation/admin-guide/mm/damon/lru_sort.rst
+++ b/Documentation/admin-guide/mm/damon/lru_sort.rst
@@ -246,7 +246,8 @@ monitor_region_start
 Start of target memory region in physical address.
 
 The start physical address of memory region that DAMON_LRU_SORT will do work
-against.  By default, biggest System RAM is used as the region.
+against.  By default, the system's entire phyiscal memory is used as the
+region.
 
 monitor_region_end
 ------------------
@@ -254,7 +255,8 @@ monitor_region_end
 End of target memory region in physical address.
 
 The end physical address of memory region that DAMON_LRU_SORT will do work
-against.  By default, biggest System RAM is used as the region.
+against.  By default, the system's entire physical memory is used as the
+region.
 
 addr_unit
 ---------
-- 
2.47.3

^ permalink raw reply related

* [RFC PATCH 6/7] Docs/admin-guide/mm/damon/reclaim: update for entire memory monitoring
From: SeongJae Park @ 2026-04-15  1:20 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm
In-Reply-To: <20260415012048.76508-1-sj@kernel.org>

Update DAMON_RECLAIM usage document for the changed default monitoring
target region selection.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/admin-guide/mm/damon/reclaim.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/reclaim.rst b/Documentation/admin-guide/mm/damon/reclaim.rst
index b14a065586271..ec7e3e32b4ac6 100644
--- a/Documentation/admin-guide/mm/damon/reclaim.rst
+++ b/Documentation/admin-guide/mm/damon/reclaim.rst
@@ -240,7 +240,8 @@ Start of target memory region in physical address.
 
 The start physical address of memory region that DAMON_RECLAIM will do work
 against.  That is, DAMON_RECLAIM will find cold memory regions in this region
-and reclaims.  By default, biggest System RAM is used as the region.
+and reclaims.  By default, the system's entire physical memory is used as the
+region.
 
 monitor_region_end
 ------------------
@@ -249,7 +250,8 @@ End of target memory region in physical address.
 
 The end physical address of memory region that DAMON_RECLAIM will do work
 against.  That is, DAMON_RECLAIM will find cold memory regions in this region
-and reclaims.  By default, biggest System RAM is used as the region.
+and reclaims.  By default, the system's entire physical memory is used as the
+region.
 
 addr_unit
 ---------
-- 
2.47.3

^ permalink raw reply related

* [RFC PATCH 0/7] mm/damon/reclaim,lru_sort: monitor all system rams by default
From: SeongJae Park @ 2026-04-15  1:20 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm

DAMON_RECLAIM and DAMON_LRU_SORT set the biggest 'System RAM' resource
of the system as the default monitoring target address range.  The main
intention behind the design is to minimize the overhead coming from
monitoring of non-System RAM areas.

This could result in an odd setup when there are multiple discrete
System RAMs of considerable sizes.  For example, there are System RAMs
each having 500 GiB size.  In this case, only the first 500 GiB will be
set as the monitoring region by default.  This is particularly common on
NUMA systems.  Hence the modules allow users to set the monitoring
target address range using the module parameters if the default setup
doesn't work for them.  In other words, the current design trades ease
of setup for lower overhead.

However, because DAMON utilizes the sampling based access check and the
adaptive regions adjustment mechanisms, the overhead from the monitoring
of non-System RAM areas should be negligible in most setups.  Meanwhile,
the setup complexity is causing real headaches for users who need to run
those modules on various types of systems.  That is, the current
tradeoff is not a good deal.

Set the physical address range that can cover all System RAM areas of
the system as the default monitoring regions for DAMON_RECLAIM and
DAMON_LRU_SORT.

Technically speaking, this is changing documented behavior.  However, it
makes no sense to believe there is a real use case that really depends
on the old weird default behavior.  If the old default behavior was
working for them in the reasonable way, this change will only add a
negligible amount of monitoring overhead.  If it didn't work, the users
may already be using manual monitoring regions setup, and they will not
be affected by this change.

Patches Sequence
================

Patch 1 introduces a new core function that will be used for the new
default monitoring target region setup.  Patch 2 and 3 update
DAMON_RECLAIM and DAMON_LRU_SORT to use the new function instead of the
old one, respectively.  Patch 4 removes the old core function that was
replaced by the new one, as there is no more user of it.  Patch 5
updates DAMON_STAT to use the new one instead of its in-house
nearly-duplicate self implementation of the functionality.  Finally
patches 6 and 7 update the DAMON_RECLAIM and DAMON_LRU_SORT user
documentation for the new behaviors, respectively.

SeongJae Park (7):
  mm/damon: introduce damon_set_region_system_rams_default()
  mm/damon/reclaim: cover all system rams
  mm/damon/lru_sort: cover all system rams
  mm/damon/core: remove damon_set_region_biggest_system_ram_default()
  mm/damon/stat: use damon_set_region_system_rams_default()
  Docs/admin-guide/mm/damon/reclaim: update for entire memory monitoring
  Docs/admin-guide/mm/damon/lru_sort: update for entire memory
    monitoring

 .../admin-guide/mm/damon/lru_sort.rst         |  6 ++-
 .../admin-guide/mm/damon/reclaim.rst          |  6 ++-
 include/linux/damon.h                         |  2 +-
 mm/damon/core.c                               | 49 +++++++++--------
 mm/damon/lru_sort.c                           |  8 +--
 mm/damon/reclaim.c                            | 14 ++---
 mm/damon/stat.c                               | 53 ++-----------------
 7 files changed, 50 insertions(+), 88 deletions(-)

base-commit: 11bcd10460e9446785fc04deb5d175806a00400b
-- 
2.47.3

^ permalink raw reply

* Re: [PATCH RFC 4/4] docs: auto-generate maintainer entry profile links
From: Dan Williams @ 2026-04-15  0:59 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, linux-kernel, linux-riscv, workflows,
	Albert Ou, Alexandre Ghiti, Dan Williams, Palmer Dabbelt,
	Paul Walmsley, Randy Dunlap, Shuah Khan
In-Reply-To: <9228f77b0339b8e5dea4a201ab6d4feb30cef5c2.1776176108.git.mchehab+huawei@kernel.org>

Mauro Carvalho Chehab wrote:
> Instead of manually creating a TOC tree for them, use the new
> tag to auto-generate its TOC.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  .../maintainer/maintainer-entry-profile.rst     | 17 ++---------------
>  Documentation/process/maintainer-handbooks.rst  | 10 +---------
>  2 files changed, 3 insertions(+), 24 deletions(-)
> 
> diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
> index 6020d188e13d..48ecabd4ce13 100644
> --- a/Documentation/maintainer/maintainer-entry-profile.rst
> +++ b/Documentation/maintainer/maintainer-entry-profile.rst
> @@ -98,18 +98,5 @@ Existing profiles
>  For now, existing maintainer profiles are listed here; we will likely want
>  to do something different in the near future.

Given the "near future" is now, I would say go ahead and delete this.

> -.. toctree::
> -   :maxdepth: 1
> -
> -   ../doc-guide/maintainer-profile
> -   ../nvdimm/maintainer-entry-profile
> -   ../arch/riscv/patch-acceptance
> -   ../process/maintainer-soc
> -   ../process/maintainer-soc-clean-dts
> -   ../driver-api/media/maintainer-entry-profile
> -   ../process/maintainer-netdev
> -   ../driver-api/vfio-pci-device-specific-driver-acceptance
> -   ../nvme/feature-and-quirk-policy
> -   ../filesystems/nfs/nfsd-maintainer-entry-profile
> -   ../filesystems/xfs/xfs-maintainer-entry-profile
> -   ../mm/damon/maintainer-profile
> +See Documentation/process/maintainer-handbooks.rst for subsystem-specific
> +profiles.
> diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
> index 3d72ad25fc6a..d3d74c719018 100644
> --- a/Documentation/process/maintainer-handbooks.rst
> +++ b/Documentation/process/maintainer-handbooks.rst
> @@ -9,12 +9,4 @@ which is supplementary to the general development process handbook

Given this whole thing started with a question about why there are 2
files you can go ahead and fold in:

---
For developers, see below for all the known subsystem specific guides.
If the subsystem you are contributing to does not have a guide listed
here, it is fair to seek clarification of questions raised in
Documentation/maintainer/maintainer-entry-profile.rst.

For maintainers, consider documenting additional requirements and
expectations if submissions routinely overlook specific submission
criteria. See Documentation/maintainer/maintainer-entry-profile.rst and
the P: tag in MAINTAINERS.
---

...suggestion with a:

Co-developed-by: Dan Williams <djbw@kernel.org>
Signed-off-by: Dan Williams <djbw@kernel.org>

...if you want me to submit that separately, just holler.

^ permalink raw reply

* Re: [PATCH RFC 3/4] MAINTAINERS: add maintainer-tip.rst to X86
From: Dan Williams @ 2026-04-15  0:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, linux-kernel, linux-riscv, workflows,
	Dan Williams, Randy Dunlap
In-Reply-To: <970434c647aa1e1e9a81c87b4d5fed934d4018a7.1776176108.git.mchehab+huawei@kernel.org>

Mauro Carvalho Chehab wrote:
> While the maintainer's profile for tip is there, it is not
> at X86 maintainer's entry.

nit. should this be MAINTAINERS since it is referring to the file.

> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Acked-by: Dan Williams <djbw@kernel.org>

^ permalink raw reply

* Re: [PATCH RFC 1/4] docs: maintainers_include: auto-generate maintainer profile TOC
From: Dan Williams @ 2026-04-15  0:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Corbet, Linux Doc Mailing List,
	Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, linux-kernel, linux-riscv, workflows,
	Dan Williams, Randy Dunlap, Shuah Khan
In-Reply-To: <4e9512a3d05942c98361d06d60a118d7c78762b6.1776176108.git.mchehab+huawei@kernel.org>

Mauro Carvalho Chehab wrote:
> Add a feature to allow auto-generating media entry profiles from the
> corresponding field inside MAINTAINERS file(s).
> 
> Suggested-by: Dan Williams <djbw@kernel.org>
> Closes: https://lore.kernel.org/linux-doc/69dd6299440be_147c801005b@djbw-dev.notmuch/
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Nice!

Acked-by: Dan Williams <djbw@kernel.org>

^ permalink raw reply

* Re: maintainer profiles
From: Dan Williams @ 2026-04-15  0:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Dan Williams
  Cc: Jonathan Corbet, Randy Dunlap, Linux Documentation,
	Linux Kernel Mailing List, Linux Kernel Workflows
In-Reply-To: <20260414163204.08f94002@localhost>

Mauro Carvalho Chehab wrote:
[..] 
> If you transform this diff into a patch, it would make sense to
> add together with the next version of my RFC ;-)

I am ok if you steal whatever you want from it with a Suggested-by.

The bulk of the important work is your maintainers_include.py changes.

^ permalink raw reply

* Re: [PATCH RFC 2/4] MAINTAINERS: add an entry for media maintainers profile
From: Randy Dunlap @ 2026-04-15  0:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Corbet, Linux Doc Mailing List
  Cc: linux-kernel, linux-riscv, workflows, Dan Williams
In-Reply-To: <5af4aa6a716228eea4d59dc26b97d642e1e7d419.1776176108.git.mchehab+huawei@kernel.org>



On 4/14/26 7:29 AM, Mauro Carvalho Chehab wrote:
> While media has a maintainers entry profile, its entry is
> missing at MAINTAINERS.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Acked-by: Randy Dunlap <rdunlap@infradead.org>

> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f0b106a4dd96..620219e48f98 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16115,6 +16115,7 @@ S:	Maintained
>  W:	https://linuxtv.org
>  Q:	http://patchwork.kernel.org/project/linux-media/list/
>  T:	git git://linuxtv.org/media.git
> +P:	Documentation/driver-api/media/maintainer-entry-profile.rst
>  F:	Documentation/admin-guide/media/
>  F:	Documentation/devicetree/bindings/media/
>  F:	Documentation/driver-api/media/

-- 
~Randy

^ permalink raw reply

* Re: [PATCH RFC 3/4] MAINTAINERS: add maintainer-tip.rst to X86
From: Randy Dunlap @ 2026-04-15  0:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Corbet, Linux Doc Mailing List
  Cc: linux-kernel, linux-riscv, workflows, Dan Williams
In-Reply-To: <970434c647aa1e1e9a81c87b4d5fed934d4018a7.1776176108.git.mchehab+huawei@kernel.org>



On 4/14/26 7:29 AM, Mauro Carvalho Chehab wrote:
> While the maintainer's profile for tip is there, it is not
> at X86 maintainer's entry.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Acked-by: Randy Dunlap <rdunlap@infradead.org>

> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 620219e48f98..a85fcae5f56e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -28560,6 +28560,7 @@ M:	Ingo Molnar <mingo@redhat.com>
>  M:	Borislav Petkov <bp@alien8.de>
>  M:	Dave Hansen <dave.hansen@linux.intel.com>
>  M:	x86@kernel.org
> +P:	Documentation/process/maintainer-tip.rst
>  R:	"H. Peter Anvin" <hpa@zytor.com>
>  L:	linux-kernel@vger.kernel.org
>  S:	Maintained

-- 
~Randy

^ permalink raw reply

* Re: [PATCH RFC 4/4] docs: auto-generate maintainer entry profile links
From: Randy Dunlap @ 2026-04-15  0:34 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Corbet, Linux Doc Mailing List
  Cc: linux-kernel, linux-riscv, workflows, Albert Ou, Alexandre Ghiti,
	Dan Williams, Palmer Dabbelt, Paul Walmsley, Shuah Khan
In-Reply-To: <9228f77b0339b8e5dea4a201ab6d4feb30cef5c2.1776176108.git.mchehab+huawei@kernel.org>



On 4/14/26 7:29 AM, Mauro Carvalho Chehab wrote:
> Instead of manually creating a TOC tree for them, use the new
> tag to auto-generate its TOC.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  .../maintainer/maintainer-entry-profile.rst     | 17 ++---------------
>  Documentation/process/maintainer-handbooks.rst  | 10 +---------
>  2 files changed, 3 insertions(+), 24 deletions(-)
> 
> diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
> index 6020d188e13d..48ecabd4ce13 100644
> --- a/Documentation/maintainer/maintainer-entry-profile.rst
> +++ b/Documentation/maintainer/maintainer-entry-profile.rst
> @@ -98,18 +98,5 @@ Existing profiles
>  For now, existing maintainer profiles are listed here; we will likely want
>  to do something different in the near future.
>  
> -.. toctree::
> -   :maxdepth: 1
> -
> -   ../doc-guide/maintainer-profile
> -   ../nvdimm/maintainer-entry-profile
> -   ../arch/riscv/patch-acceptance
> -   ../process/maintainer-soc
> -   ../process/maintainer-soc-clean-dts
> -   ../driver-api/media/maintainer-entry-profile
> -   ../process/maintainer-netdev
> -   ../driver-api/vfio-pci-device-specific-driver-acceptance
> -   ../nvme/feature-and-quirk-policy
> -   ../filesystems/nfs/nfsd-maintainer-entry-profile
> -   ../filesystems/xfs/xfs-maintainer-entry-profile
> -   ../mm/damon/maintainer-profile
> +See Documentation/process/maintainer-handbooks.rst for subsystem-specific
> +profiles.
> diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
> index 3d72ad25fc6a..d3d74c719018 100644
> --- a/Documentation/process/maintainer-handbooks.rst
> +++ b/Documentation/process/maintainer-handbooks.rst
> @@ -9,12 +9,4 @@ which is supplementary to the general development process handbook
>  
>  Contents:
>  
> -.. toctree::
> -   :numbered:
> -   :maxdepth: 2
> -
> -   maintainer-netdev
> -   maintainer-soc
> -   maintainer-soc-clean-dts
> -   maintainer-tip
> -   maintainer-kvm-x86
> +.. maintainers-profile-toc::

It appears that the maintainer profile entries should be listed
here, following "Contents:". Is that correct?
I see nothing following "Contents:" except for the page footer.


-- 
~Randy


^ permalink raw reply

* Re: [PATCH] crash: Support high memory reservation for range syntax
From: kernel test robot @ 2026-04-15  0:28 UTC (permalink / raw)
  To: Youling Tang, Andrew Morton, Baoquan He, Jonathan Corbet
  Cc: llvm, oe-kbuild-all, Linux Memory Management List, Vivek Goyal,
	Dave Young, kexec, linux-kernel, linux-doc, youling.tang,
	Youling Tang
In-Reply-To: <20260404074103.506793-1-youling.tang@linux.dev>

Hi Youling,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Youling-Tang/crash-Support-high-memory-reservation-for-range-syntax/20260414-205035
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260404074103.506793-1-youling.tang%40linux.dev
patch subject: [PATCH] crash: Support high memory reservation for range syntax
config: loongarch-randconfig-001-20260415 (https://download.01.org/0day-ci/archive/20260415/202604150808.7HxFp5b4-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604150808.7HxFp5b4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604150808.7HxFp5b4-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/crash_reserve.c:264:19: warning: expression which evaluates to zero treated as a null pointer constant of type 'char *' [-Wnon-literal-null-conversion]
     264 |         char *first_gt = false;
         |                          ^~~~~
   1 warning generated.


vim +264 kernel/crash_reserve.c

   254	
   255	/*
   256	 * This function parses command lines in the format
   257	 *
   258	 *   crashkernel=ramsize-range:size[,...][@offset],>boundary
   259	 */
   260	static void __init parse_crashkernel_boundary(char *ck_cmdline,
   261						unsigned long long *boundary)
   262	{
   263		char *cur = ck_cmdline, *next;
 > 264		char *first_gt = false;
   265	
   266		first_gt = strchr(cur, '>');
   267		if (!first_gt)
   268			return;
   269	
   270		cur = first_gt + 1;
   271		if (*cur == '\0' || *cur == ' ' || *cur == ',') {
   272			pr_warn("crashkernel: '>' specified without boundary size, ignoring\n");
   273			return;
   274		}
   275	
   276		*boundary = memparse(cur, &next);
   277		if (cur == next) {
   278			pr_warn("crashkernel: invalid boundary size after '>'\n");
   279			return;
   280		}
   281	}
   282	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Darrick J. Wong @ 2026-04-15  0:15 UTC (permalink / raw)
  To: John Groves
  Cc: Miklos Szeredi, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
	Christian Brauner, Randy Dunlap, Jeff Layton, Amir Goldstein,
	Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
	Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
	Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad7MC5Em4l72nJ6u@groves.net>

On Tue, Apr 14, 2026 at 06:53:30PM -0500, John Groves wrote:
> On 26/04/14 11:57AM, Darrick J. Wong wrote:
> > On Tue, Apr 14, 2026 at 08:41:42AM -0500, John Groves wrote:
> > > On 26/04/14 03:19PM, Miklos Szeredi wrote:
> > > > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > 
> > > > > Overall, my intention with bringing this up is just to make sure we're
> > > > > at least aware of this alternative before anything is merged and
> > > > > permanent. If Miklos and you think we should land this series, then
> > > > > I'm on board with that.
> > > > 
> > > > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > > > absolutely necessary.  This was the main sticking point originally,
> > > > but there seemed to be no better alternative.
> > > > 
> > > > However with the bpf approach this would be gone, which is great.
> > 
> > Well... you can't get away with having *no* mapping interface at all.
> > You still have to define a UABI that BPF programs can use to convey
> > mapping data into fsdax/iomap.  BTF is a nice piece of work that smooths
> > over minor fluctuations in struct layout between a running kernel and
> > a precompiled BPF program, but fundamentally we still need a fuse-native
> > representation.
> 
> A couple of points here, that are really top level observations.
> 
> The call path from fuse into famfs largely looks like:
> 
> if (passthrough)
> 	return passthrough_call()
> else if (virtiofs)
> 	return virtiofs_call()
> else if (famfs)
> 	return famfs_call()
> 
> So from a hooking in standpoint I was trying to be compliant.
> 
> Second point: iomap is an overloaded term. The famfs iomap usage is stolen
> from xfs' fs-dax iomap call patterns. I *think* that is distinct from the
> stuff called iomap that handles block I/O. Because maybe not everybody who
> reads this will understand that famfs is, uh, kinda like hugetlbfs except
> that the memory is from devdax (in 'famfs' mode, because the old mode
> stopped working for file-backed maps. Famfs files are never sparse, and
> they never use the page cache - which is super, super different from a
> conventional file system.
> 
> the famfs_filemap_fault() path calls dax_iomap_fault() path (which I added 
> to devdax in the new famfs mode, because it was in pmem but not devdax)
> always just updates a page table beause the page is always present. That
> means that the fault path is SUPER PERFORMANCE CRITICAL because in heavy
> use there can be millions of these faults per second - and with famfs there
> is NEVER EVER a read from storage to amortize the call overhead over. 
> 
> This is a super-important point. famfs_filemap_fault() is a in the
> vm_operations_struct. It is called to remind the CPU where an address maps
> to, because the TLB and PTE had been purged (which happens ALL THE TIME).
> 
> The ask here is to insert a BPF program as a vma fault handler. Can it work?
> Probably. Will it perform? I HAVE NO IDEA, BUT THERE ARE REASONS TO WORRY
> THAT IT MIGHT NOT.
> 
> I don't think this suggestion was made from a full understanding of the
> performance requirements of this code path.
> 
> This is why we need a discussion with fs/mm/bpf experts. We should be able 
> to assemble an understanding of what the overhead of calling the BPF program
> are and how many nanoseconds (or microseconds) that could possibly add.
> Anything longer than the current famfs_filemap_fault() path is potentially
> disastrous because the whole point of famfs is to expose memory via files,
> and avoid sabotaging the performance.
> 
> An L3 cache miss costs 100ns in round numbers on fast local DRAM, and
> 3-5x as long on switched disaggregated memory. We cannot afford an expensive
> code path resolving these mappings.
> 
> This is why, at the last two LSFMMs and in the famfs documentation, I said 
> things like "we're exposing memory, and it must run at memory speeds".
> 
> Famfs also registers with the memory provider (devdax in famfs mode) to
> receive notifications of memory failures, and uses a 'holder_operations'
> pattern copied from pmem. This stuff is not in generic iomap (correct me
> if that's wrong).
> 
> And finally since I've core dumped quite a bit here, I'll go ahead and add
> a thought experiment that *might* rule out using a BPF program as a vma
> fault handler. Could we do that with hugetlbfs without damaging performance
> for memory-intensive workloads? Hugetlbfs is a pretty solid stand-in for
> famfs: it never does data-movement faults, it's never sparse, and it needs
> to resolve TLB/PTE/PMD/PUD faults FAST.
> 
> > 
> > That last sentence was an indirect way of saying: No, we're not going
> > to export struct iomap to userspace.  The fuse-iomap patchset provides
> > all the UABI pieces we need for regular filesystems (ext4) and hardware
> > adjacent filesystems (famfs) to exchange file mapping data with the
> > kernel.  This has been out for review since last October, but the lack
> > of engagement with that patchset (or its February resubmission) doesn't
> > leave me with confidence that any of it is going anywhere.
> > 
> > Note: The reason for bolting BPF atop fuse-iomap is so that famfs can
> > upload bpf programs to generate interleaved mappings.  It's not so hard
> > to convert famfs' iomapping paths to use fuse-iomap, but I haven't
> > helped him do that because:
> > 
> > a) I have no idea what Miklos' thoughts are about merging any of the
> > famfs stuff.
> > 
> > b) I also have no idea what his thoughts are about fuse-iomap.  The
> > sparse replies are not encouraging.
> > 
> > c) It didn't seem fair to John to make him take on a whole new patchset
> > dependency given (a) and (b).
> > 
> > d) Nobody ever replied to my reply to the LSFMM thread about "can we do
> > some code review of fuse iomap without waiting three months for LSFMM?"
> > I've literally done nothing with fuse-iomap for two of the three months
> > requested.
> > 
> > > > So let us please at least have a try at this. I'm not into bpf yet,
> > > > but willing to learn.
> > 
> > I sent out the patches to enable exactly this sort of experimentation
> > two months ago, and have not received any responses:
> > 
> > https://lore.kernel.org/linux-fsdevel/177188736765.3938194.6770791688236041940.stgit@frogsfrogsfrogs/
> > 
> > I would like to say this as gently as possible: I don't know what the
> > problem here is, Miklos -- are you uninterested in the work?  Do you
> > have too many other things to do inside RH that you can't talk about?
> > Is it too difficult to figure out how the iomap stuff fits into the rest
> > of the fuse codebase?  Do you need help from the rest of us to get
> > reviews done?  Is there something else with which I could help?
> > 
> > Because ... over the past few years, many of my team's filesystem
> > projects have endured monthslong review cycles and often fail to get
> > merged.  This has led to burnout and frustration among my teammates such
> > that many of them chose to move on to other things.  For the remaining
> > people, it was very difficult to justify continuing headcount when
> > progress on projects is so slow that individuals cannot achieve even one
> > milestone per quarter on any project.
> > 
> > There's now nobody left here but me.
> > 
> > I'm not blaming you (Miklos) for any of this, but that is the current
> > deplorable state of things.
> > 
> > > > Thanks,
> > > > Miklos
> > > 
> > > Thanks for responding...
> > > 
> > > My short response: Noooooooooo!!!!!!
> > > 
> > > I very strongly object to making this a prerequisite to merging. This
> > > is an untested idea that will certainly delay us by at least a couple
> > > of merge windows when products are shipping now, and the existing approach
> > > has been in circulation for a long time. It is TOO LATE!!!!!!
> > 
> > /me notes that has "we're shipping so you have to merge it over peoples'
> > concerns" rarely carries the day in LKML land, and has never ended well
> > in the few cases that it happens.  As Ted is fond of saying, this is a
> > team sport, not an individual effort.  Unfortunately, to abuse your
> > sports metaphor, we all play for the ******* A's.
> 
> That's totally fair. This process has been very long and grueling, and I'm
> not always thinking clearly.

I wish the peer review part were easier.  It's stressful enough to get
the darned thing to work the way you want it to and not do anything
weird... and computers are generally better about that than they were in
the 80s.

> > That said, you're clearly pissed at the goalposts changing yet again,
> > and that's really not fair that we collectively keep moving them.
> > 
> > It's a rotten situation that I could have even helped you to solve both
> > our problems via fuse-iomap, but I just couldn't motivate myself to
> > entwine our two projects until the technical direction questions got
> > answered.
> > 
> > > Famfs is not a science project, it's enablement for actual products and
> > > early versions are available now!!!
> > > 
> > > That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
> > 
> > Heck, the fuse command field is a u32.  There are plenty of numberspace
> > left, and the kernel can just *stop issuing them*.
> > 
> > > What are the risks of converting to BPF?
> > > 
> > > - I don't know how to do it - so it'll be slow (kinda like my fuse learning
> > >   curve cost about a year because this is not that similar to anything
> > >   else that was already in fuse.
> > 
> > ...and per above, BPF isn't some magic savior that avoids the expansion
> > of the UABI.
> > 
> > > - Those of us who are involved don't fully understand either the security
> > >   or performance implications of this. It 
> > 
> > Correct.  I sure think it's swell that people can inject IR programs
> > that jit/link into the kernel.  Don't ask which secondary connotation of
> > "swell" I'm talking about.
> > 
> > > - Famfs is enabling access to memory and mapping fault handling must be
> > >   at "memory speed". We know that BPF walks some data structures when a 
> > >   program executes. That exposes us to additional serialized L3 cache 
> > >   misses each time we service a mapping fault (any TLB & page table miss).
> > >   This should be studied side-by-side with the existing approach under
> > >   multiple loads before being adopted for production.
> > 
> > Yes, it should.  AFAICT if one switched to a per-inode bpf program, then
> > you could do per-inode bpf programs.  Then you don't even need the bpf
> > map, and the ->iomap_begin becomes an indirect call into JITted x86_64
> > math code.
> > 
> > (The downside is that dyn code can't be meaningfully signed, requires
> > clang on the system, and you have to deal with inode eviction issues.)
> > 
> > > - This has never been done in production, and we're throwing it in the way
> > >   of a project that has been soaking for years and needs to support early
> > >   shipments of products.
> > 
> > Correct.  I haven't even implemented BPF-iomap for fuse4fs.  This BPF
> > integration stuff is *highly* experimental code.
> > 
> > > If this is the only path, I'd like to revive famfs as a standalone file
> > > system. I'm still maintaining that and it's still in use.
> > 
> > Honestly, you should probably just ship that to your users.  As long as
> > the ondisk format doesn't change much, switching the implementation at a
> > later date is at least still possible.
> > 
> > --D
> 
> And apologies to the polite universe for being a bit raw earlier. Getting
> this far has been quite a grind...

Oh believe me, I had much angrier things to say elsewhere in 2023-24
about grueling slowass reviews.  That is, indirectly, why I'm now
working on /this/ project. :(

--D

^ permalink raw reply

* [PATCH v2] docs: fix typos in kernel documentation
From: fru1tworld @ 2026-04-15  0:12 UTC (permalink / raw)
  To: corbet; +Cc: skhan, linux-doc, Hyeonjin Kim
In-Reply-To: <20260414084553.22762-1-fruitworld.planet@gmail.com>

From: Hyeonjin Kim <fruitworld.planet@gmail.com>

reinitalizes => reinitializes
unpriviledged => unprivileged
sub-struture => sub-structure

Signed-off-by: Hyeonjin Kim <fruitworld.planet@gmail.com>
---
 Documentation/block/data-integrity.rst | 2 +-
 Documentation/core-api/list.rst        | 2 +-
 Documentation/gpu/drm-uapi.rst         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/block/data-integrity.rst b/Documentation/block/data-integrity.rst
index 99905e880a0e..b7b10c8abbcc 100644
--- a/Documentation/block/data-integrity.rst
+++ b/Documentation/block/data-integrity.rst
@@ -154,7 +154,7 @@ bio_free() will automatically free the bip.
 ----------------
 
 Block devices can set up the integrity information in the integrity
-sub-struture of the queue_limits structure.
+sub-structure of the queue_limits structure.
 
 Layered block devices will need to pick a profile that's appropriate
 for all subdevices.  queue_limits_stack_integrity() can help with that.  DM
diff --git a/Documentation/core-api/list.rst b/Documentation/core-api/list.rst
index 86873ce9adbf..7ff112770c51 100644
--- a/Documentation/core-api/list.rst
+++ b/Documentation/core-api/list.rst
@@ -752,7 +752,7 @@ This is because list_splice() did not reinitialize the list_head it took
 entries from, leaving its pointer pointing into what is now a different list.
 
 If we want to avoid this situation, list_splice_init() can be used. It does the
-same thing as list_splice(), except reinitalizes the donor list_head after the
+same thing as list_splice(), except reinitializes the donor list_head after the
 transplant.
 
 Concurrency considerations
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index d98428a592f1..14ecaf98df90 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -568,7 +568,7 @@ ENOSPC:
 EPERM/EACCES:
         Returned for an operation that is valid, but needs more privileges.
         E.g. root-only or much more common, DRM master-only operations return
-        this when called by unpriviledged clients. There's no clear
+        this when called by unprivileged clients. There's no clear
         difference between EACCES and EPERM.
 
 ENODEV:
-- 
2.52.0


^ permalink raw reply related

* Re: [PATCH V10 00/10] famfs: port into fuse
From: John Groves @ 2026-04-15  0:10 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Darrick J. Wong, Miklos Szeredi, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
	Christian Brauner, Randy Dunlap, Jeff Layton, Amir Goldstein,
	Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
	Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
	Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJnrk1ZgcMuwfMpT1fXvUwBBiq9eWFHWVeOFQFFKiamGGe1RJg@mail.gmail.com>

On 26/04/14 03:13PM, Joanne Koong wrote:
> On Tue, Apr 14, 2026 at 11:57 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Apr 14, 2026 at 08:41:42AM -0500, John Groves wrote:
> > > On 26/04/14 03:19PM, Miklos Szeredi wrote:
> > > > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > > Overall, my intention with bringing this up is just to make sure we're
> > > > > at least aware of this alternative before anything is merged and
> > > > > permanent. If Miklos and you think we should land this series, then
> > > > > I'm on board with that.
> > > >
> > > > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > > > absolutely necessary.  This was the main sticking point originally,
> > > > but there seemed to be no better alternative.
> > > >
> > > > However with the bpf approach this would be gone, which is great.
> >
> > Well... you can't get away with having *no* mapping interface at all.
> 
> Yes but the mapping interface should be *generic*, not one that is so
> specifically tailored to one server. fuse will have to support this
> forever.

Mapping interfaces being generic is a nice idea, but I'm no sure it's
realistic in a generalized sense. But other mitigating comments below.

> 
> > You still have to define a UABI that BPF programs can use to convey
> > mapping data into fsdax/iomap.  BTF is a nice piece of work that smooths
> > over minor fluctuations in struct layout between a running kernel and
> > a precompiled BPF program, but fundamentally we still need a fuse-native
> > representation.
> >
> > That last sentence was an indirect way of saying: No, we're not going
> > to export struct iomap to userspace.  The fuse-iomap patchset provides
> > all the UABI pieces we need for regular filesystems (ext4) and hardware
> > adjacent filesystems (famfs) to exchange file mapping data with the
> > kernel.  This has been out for review since last October, but the lack
> > of engagement with that patchset (or its February resubmission) doesn't
> > leave me with confidence that any of it is going anywhere.
> >
> > Note: The reason for bolting BPF atop fuse-iomap is so that famfs can
> > upload bpf programs to generate interleaved mappings.  It's not so hard
> > to convert famfs' iomapping paths to use fuse-iomap, but I haven't
> > helped him do that because:
> >
> > a) I have no idea what Miklos' thoughts are about merging any of the
> > famfs stuff.
> >
> > b) I also have no idea what his thoughts are about fuse-iomap.  The
> > sparse replies are not encouraging.
> >
> > c) It didn't seem fair to John to make him take on a whole new patchset
> > dependency given (a) and (b).
> >
> > d) Nobody ever replied to my reply to the LSFMM thread about "can we do
> > some code review of fuse iomap without waiting three months for LSFMM?"
> > I've literally done nothing with fuse-iomap for two of the three months
> > requested.
> >
> > > > So let us please at least have a try at this. I'm not into bpf yet,
> > > > but willing to learn.
> >
> > I sent out the patches to enable exactly this sort of experimentation
> > two months ago, and have not received any responses:
> >
> > https://lore.kernel.org/linux-fsdevel/177188736765.3938194.6770791688236041940.stgit@frogsfrogsfrogs/
> >
> > I would like to say this as gently as possible: I don't know what the
> > problem here is, Miklos -- are you uninterested in the work?  Do you
> > have too many other things to do inside RH that you can't talk about?
> > Is it too difficult to figure out how the iomap stuff fits into the rest
> > of the fuse codebase?  Do you need help from the rest of us to get
> > reviews done?  Is there something else with which I could help?
> >
> > Because ... over the past few years, many of my team's filesystem
> > projects have endured monthslong review cycles and often fail to get
> > merged.  This has led to burnout and frustration among my teammates such
> > that many of them chose to move on to other things.  For the remaining
> > people, it was very difficult to justify continuing headcount when
> > progress on projects is so slow that individuals cannot achieve even one
> > milestone per quarter on any project.
> >
> > There's now nobody left here but me.
> >
> > I'm not blaming you (Miklos) for any of this, but that is the current
> > deplorable state of things.
> >
> > > > Thanks,
> > > > Miklos
> > >
> > > Thanks for responding...
> > >
> > > My short response: Noooooooooo!!!!!!
> > >
> > > I very strongly object to making this a prerequisite to merging. This
> > > is an untested idea that will certainly delay us by at least a couple
> > > of merge windows when products are shipping now, and the existing approach
> > > has been in circulation for a long time. It is TOO LATE!!!!!!
> >
> > /me notes that has "we're shipping so you have to merge it over peoples'
> > concerns" rarely carries the day in LKML land, and has never ended well
> > in the few cases that it happens.  As Ted is fond of saying, this is a
> > team sport, not an individual effort.  Unfortunately, to abuse your
> > sports metaphor, we all play for the ******* A's.
> >
> > That said, you're clearly pissed at the goalposts changing yet again,
> > and that's really not fair that we collectively keep moving them.
> >
> > It's a rotten situation that I could have even helped you to solve both
> > our problems via fuse-iomap, but I just couldn't motivate myself to
> > entwine our two projects until the technical direction questions got
> > answered.
> >
> > > Famfs is not a science project, it's enablement for actual products and
> > > early versions are available now!!!
> > >
> > > That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
> >
> > Heck, the fuse command field is a u32.  There are plenty of numberspace
> > left, and the kernel can just *stop issuing them*.
> 
> I don't think the problem is the command field. As I understand it, if
> this lands and is converted over later, none of the famfs code in this
> series can be removed from fuse. If fuse has native non-bpf support
> for famfs, then it will always need to have that. That's the part that
> worries me.

I believe this basic premise is completely wrong. Here is why:

There is a FUSE_DAX_FMAP capability that the kernel may advertise or not
at init time; this capability "is" the famfs GET_FMAP AND GET_DAXDEV 
commands. In the future, if we find a way to use BPF (or some other 
mechanism) to avoid needing those fuse messages, the kernel could be updated 
to NEVER advertise the FUSE_DAX_FMAP capability. All of the famfs-specific 
code could be taken out of kernels that never advertise that capability.

Simple, really. Can't re-use the message opcodes, but as Darrick pointed out
those are not a scarce resource.

> 
> >
> > > What are the risks of converting to BPF?
> 
> I think maybe there is a misinterpretation of what the alternative
> approach entails. From my point of view, the alternative approach is
> not that different from what is already in this series. The only piece
> of the famfs logic that would need to use bpf is the logic for
> finding/computing the extent mappings (which is the famfs-specific
> logic that would not be applicable to any other server). That famfs
> bpf code is minimal and already written [1], as it is just the logic
> that is in patch 6 [2] in this series copied over. No other part of
> famfs touches bpf. The rest is renaming the functions in
> fs/fuse/famfs.c to generic fuse_iomap_dax_XXX names (the logic is the
> same logic in this series, eg invoking the lower-level calls to
> dax_iomap_rw/fault/etc) and moving the daxdev setup/initialization to
> connection initialization time where the server passes that daxdev
> setup info/configs upfront. I don't think this would delay things by
> several merge windows, as the code is already mostly written. If it
> would be helpful, I can clean up what's in the prototype and send that
> out.
> 
> I think the part that is not clear yet and needs to be verified is
> whether this approach runs into any technical limitations on famfs's
> production workloads. For example, does the overhead of using bpf maps
> lead to a noticeable performance drop on real workloads? In the
> future, will there be too many extent mappings on high-scale systems
> to make this feasible? etc. If there are technical reasons why the
> famfs logic has to be in fuse, then imo we should figure that out and
> ideally that's the discussion we should be having. I am not a cxl
> expert so perhaps there is something missing in the approach that
> makes it not sufficient on production systems. If we don't end up
> going with the alternative approach, I still think this series should
> try to make the famfs uapi additions to fuse as generic as possible
> since that will be irreversible.
> 
> If we expedited the alternative approach in terms of reviewing and
> merging, would that suffice? Is the main pushback the timing of it, eg
> that it would take too long to get reviewed, merged, and shipped?
> 
> > >
> > > - I don't know how to do it - so it'll be slow (kinda like my fuse learning
> > >   curve cost about a year because this is not that similar to anything
> > >   else that was already in fuse.
> >
> > ...and per above, BPF isn't some magic savior that avoids the expansion
> > of the UABI.
> 
> It doesn't avoid the expansion of the UABI but it makes the UABI
> generic (eg plenty of future servers can/will use the generic iomap
> layer).

Um, advertised capabilities allow contraction of the UABI-handling code with 
only some small cruft. Code that is only reachable in the presence of dead 
capability can totally be removed.

> 
> >
> > > - Those of us who are involved don't fully understand either the security
> > >   or performance implications of this. It
> >
> > Correct.  I sure think it's swell that people can inject IR programs
> > that jit/link into the kernel.  Don't ask which secondary connotation of
> > "swell" I'm talking about.
> 
> bpf is used elsewhere in the kernel (eg networking, scheduling). If it
> is the case that it is unsafe (which maybe it is, I don't know), then
> wouldn't those other areas have the same issues?

See my long comment to Darrick's prior email.

I suspect that this would be the only place BPF has been tried for a vma
fault handler. That is a special, performance critical path - especially
for famfs. In discussion with the right people we can probably reason
through whether this is a non-starter or not.

> 
> >
> > > - Famfs is enabling access to memory and mapping fault handling must be
> > >   at "memory speed". We know that BPF walks some data structures when a
> > >   program executes. That exposes us to additional serialized L3 cache
> > >   misses each time we service a mapping fault (any TLB & page table miss).
> > >   This should be studied side-by-side with the existing approach under
> > >   multiple loads before being adopted for production.
> >
> > Yes, it should.  AFAICT if one switched to a per-inode bpf program, then
> > you could do per-inode bpf programs.  Then you don't even need the bpf
> > map, and the ->iomap_begin becomes an indirect call into JITted x86_64
> > math code.
> >
> > (The downside is that dyn code can't be meaningfully signed, requires
> > clang on the system, and you have to deal with inode eviction issues.)
> >
> > > - This has never been done in production, and we're throwing it in the way
> > >   of a project that has been soaking for years and needs to support early
> > >   shipments of products.
> >
> > Correct.  I haven't even implemented BPF-iomap for fuse4fs.  This BPF
> > integration stuff is *highly* experimental code.
> 
> I think what fuse4fs needs for bpf is significantly more complicated
> and intensive than what famfs needs. For famfs, the extent mapping
> logic is straightforward computation.
> 
> >
> > > If this is the only path, I'd like to revive famfs as a standalone file
> > > system. I'm still maintaining that and it's still in use.
> >
> > Honestly, you should probably just ship that to your users.  As long as
> > the ondisk format doesn't change much, switching the implementation at a
> > later date is at least still possible.
> 
> I recognize this is an unfair situation John as you've already spent
> years working on this and did what the community asked with rewriting
> it. What I'm hoping to convey is that the approach where the extent
> computing/finding logic gets moved to bpf is not radically different
> from the famfs logic already in this patchset. In my view, moving this
> logic to bpf is more advantageous for both fuse *and* famfs
> (decoupling famfs releases from kernel releases) - it would be great
> to consider this on technical merits if expediting the timeline of the
> alternative approach would suffice.
> 
> Thanks,
> Joanne
> 
> [1] https://github.com/joannekoong/libfuse/blob/444fa27fa9fd2118a0dc332933197faf9bbf25aa/example/famfs.bpf.c
> [2] https://lore.kernel.org/linux-fsdevel/0100019d43e79794-0eadcf5e-b659-43f7-8fdc-dec9f4ccce14-000000@email.amazonses.com/
> 
> >
> > --D

Regards,
John


^ permalink raw reply

* [RFC PATCH v1.1 2/2] Docs/admin-guide/mm/damon/stat: document kdamond_pid parameter
From: SeongJae Park @ 2026-04-14 23:59 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm
In-Reply-To: <20260414235912.98174-1-sj@kernel.org>

Update DAMON_STAT usage document for newly added kdamond_pid parameter.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/admin-guide/mm/damon/stat.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/admin-guide/mm/damon/stat.rst b/Documentation/admin-guide/mm/damon/stat.rst
index c4b14daeb2dd6..46c5dd96aa2ed 100644
--- a/Documentation/admin-guide/mm/damon/stat.rst
+++ b/Documentation/admin-guide/mm/damon/stat.rst
@@ -89,3 +89,10 @@ percentiles of the idle time values via this read-only parameter.  Reading the
 parameter returns 101 idle time values in milliseconds, separated by comma.
 Each value represents 0-th, 1st, 2nd, 3rd, ..., 99th and 100th percentile idle
 times.
+
+kdamond_pid
+-----------
+
+PID of the DAMON thread.
+
+If DAMON_STAT is enabled, this becomes the PID of the worker thread.  Else, -1.
-- 
2.47.3

^ permalink raw reply related

* [RFC PATCH v1.1 0/2] mm/damon/stat: add kdamond_pid parameter
From: SeongJae Park @ 2026-04-14 23:59 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm

DAMON_STAT doesn't provide the pid of its kdamond, unlike DAMON_RECLAIM
and DAMON_LRU_SORT.  This makes user-space management of DAMON_STAT
unnecessarily complicated.  Provide the information via a new parameter,
namely kdamond_pid, and document it.

Changes from RFC
- rfc: https://lore.kernel.org/20260414053742.90296-1-sj@kernel.org
- Fix damon_kdamond_pid() failure handling.

SeongJae Park (2):
  mm/damon/stat: add a parameter for reading kdamond pid
  Docs/admin-guide/mm/damon/stat: document kdamond_pid parameter

 Documentation/admin-guide/mm/damon/stat.rst |  7 +++++++
 mm/damon/stat.c                             | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)


base-commit: 02784c37a710fa3c8c3e7be4f27a5cfa3356dc00
-- 
2.47.3

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: John Groves @ 2026-04-14 23:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Miklos Szeredi, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
	Christian Brauner, Randy Dunlap, Jeff Layton, Amir Goldstein,
	Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
	Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
	Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <20260414185740.GA604658@frogsfrogsfrogs>

On 26/04/14 11:57AM, Darrick J. Wong wrote:
> On Tue, Apr 14, 2026 at 08:41:42AM -0500, John Groves wrote:
> > On 26/04/14 03:19PM, Miklos Szeredi wrote:
> > > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > 
> > > > Overall, my intention with bringing this up is just to make sure we're
> > > > at least aware of this alternative before anything is merged and
> > > > permanent. If Miklos and you think we should land this series, then
> > > > I'm on board with that.
> > > 
> > > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > > absolutely necessary.  This was the main sticking point originally,
> > > but there seemed to be no better alternative.
> > > 
> > > However with the bpf approach this would be gone, which is great.
> 
> Well... you can't get away with having *no* mapping interface at all.
> You still have to define a UABI that BPF programs can use to convey
> mapping data into fsdax/iomap.  BTF is a nice piece of work that smooths
> over minor fluctuations in struct layout between a running kernel and
> a precompiled BPF program, but fundamentally we still need a fuse-native
> representation.

A couple of points here, that are really top level observations.

The call path from fuse into famfs largely looks like:

if (passthrough)
	return passthrough_call()
else if (virtiofs)
	return virtiofs_call()
else if (famfs)
	return famfs_call()

So from a hooking in standpoint I was trying to be compliant.

Second point: iomap is an overloaded term. The famfs iomap usage is stolen
from xfs' fs-dax iomap call patterns. I *think* that is distinct from the
stuff called iomap that handles block I/O. Because maybe not everybody who
reads this will understand that famfs is, uh, kinda like hugetlbfs except
that the memory is from devdax (in 'famfs' mode, because the old mode
stopped working for file-backed maps. Famfs files are never sparse, and
they never use the page cache - which is super, super different from a
conventional file system.

the famfs_filemap_fault() path calls dax_iomap_fault() path (which I added 
to devdax in the new famfs mode, because it was in pmem but not devdax)
always just updates a page table beause the page is always present. That
means that the fault path is SUPER PERFORMANCE CRITICAL because in heavy
use there can be millions of these faults per second - and with famfs there
is NEVER EVER a read from storage to amortize the call overhead over. 

This is a super-important point. famfs_filemap_fault() is a in the
vm_operations_struct. It is called to remind the CPU where an address maps
to, because the TLB and PTE had been purged (which happens ALL THE TIME).

The ask here is to insert a BPF program as a vma fault handler. Can it work?
Probably. Will it perform? I HAVE NO IDEA, BUT THERE ARE REASONS TO WORRY
THAT IT MIGHT NOT.

I don't think this suggestion was made from a full understanding of the
performance requirements of this code path.

This is why we need a discussion with fs/mm/bpf experts. We should be able 
to assemble an understanding of what the overhead of calling the BPF program
are and how many nanoseconds (or microseconds) that could possibly add.
Anything longer than the current famfs_filemap_fault() path is potentially
disastrous because the whole point of famfs is to expose memory via files,
and avoid sabotaging the performance.

An L3 cache miss costs 100ns in round numbers on fast local DRAM, and
3-5x as long on switched disaggregated memory. We cannot afford an expensive
code path resolving these mappings.

This is why, at the last two LSFMMs and in the famfs documentation, I said 
things like "we're exposing memory, and it must run at memory speeds".

Famfs also registers with the memory provider (devdax in famfs mode) to
receive notifications of memory failures, and uses a 'holder_operations'
pattern copied from pmem. This stuff is not in generic iomap (correct me
if that's wrong).

And finally since I've core dumped quite a bit here, I'll go ahead and add
a thought experiment that *might* rule out using a BPF program as a vma
fault handler. Could we do that with hugetlbfs without damaging performance
for memory-intensive workloads? Hugetlbfs is a pretty solid stand-in for
famfs: it never does data-movement faults, it's never sparse, and it needs
to resolve TLB/PTE/PMD/PUD faults FAST.

> 
> That last sentence was an indirect way of saying: No, we're not going
> to export struct iomap to userspace.  The fuse-iomap patchset provides
> all the UABI pieces we need for regular filesystems (ext4) and hardware
> adjacent filesystems (famfs) to exchange file mapping data with the
> kernel.  This has been out for review since last October, but the lack
> of engagement with that patchset (or its February resubmission) doesn't
> leave me with confidence that any of it is going anywhere.
> 
> Note: The reason for bolting BPF atop fuse-iomap is so that famfs can
> upload bpf programs to generate interleaved mappings.  It's not so hard
> to convert famfs' iomapping paths to use fuse-iomap, but I haven't
> helped him do that because:
> 
> a) I have no idea what Miklos' thoughts are about merging any of the
> famfs stuff.
> 
> b) I also have no idea what his thoughts are about fuse-iomap.  The
> sparse replies are not encouraging.
> 
> c) It didn't seem fair to John to make him take on a whole new patchset
> dependency given (a) and (b).
> 
> d) Nobody ever replied to my reply to the LSFMM thread about "can we do
> some code review of fuse iomap without waiting three months for LSFMM?"
> I've literally done nothing with fuse-iomap for two of the three months
> requested.
> 
> > > So let us please at least have a try at this. I'm not into bpf yet,
> > > but willing to learn.
> 
> I sent out the patches to enable exactly this sort of experimentation
> two months ago, and have not received any responses:
> 
> https://lore.kernel.org/linux-fsdevel/177188736765.3938194.6770791688236041940.stgit@frogsfrogsfrogs/
> 
> I would like to say this as gently as possible: I don't know what the
> problem here is, Miklos -- are you uninterested in the work?  Do you
> have too many other things to do inside RH that you can't talk about?
> Is it too difficult to figure out how the iomap stuff fits into the rest
> of the fuse codebase?  Do you need help from the rest of us to get
> reviews done?  Is there something else with which I could help?
> 
> Because ... over the past few years, many of my team's filesystem
> projects have endured monthslong review cycles and often fail to get
> merged.  This has led to burnout and frustration among my teammates such
> that many of them chose to move on to other things.  For the remaining
> people, it was very difficult to justify continuing headcount when
> progress on projects is so slow that individuals cannot achieve even one
> milestone per quarter on any project.
> 
> There's now nobody left here but me.
> 
> I'm not blaming you (Miklos) for any of this, but that is the current
> deplorable state of things.
> 
> > > Thanks,
> > > Miklos
> > 
> > Thanks for responding...
> > 
> > My short response: Noooooooooo!!!!!!
> > 
> > I very strongly object to making this a prerequisite to merging. This
> > is an untested idea that will certainly delay us by at least a couple
> > of merge windows when products are shipping now, and the existing approach
> > has been in circulation for a long time. It is TOO LATE!!!!!!
> 
> /me notes that has "we're shipping so you have to merge it over peoples'
> concerns" rarely carries the day in LKML land, and has never ended well
> in the few cases that it happens.  As Ted is fond of saying, this is a
> team sport, not an individual effort.  Unfortunately, to abuse your
> sports metaphor, we all play for the ******* A's.

That's totally fair. This process has been very long and grueling, and I'm
not always thinking clearly.

> 
> That said, you're clearly pissed at the goalposts changing yet again,
> and that's really not fair that we collectively keep moving them.
> 
> It's a rotten situation that I could have even helped you to solve both
> our problems via fuse-iomap, but I just couldn't motivate myself to
> entwine our two projects until the technical direction questions got
> answered.
> 
> > Famfs is not a science project, it's enablement for actual products and
> > early versions are available now!!!
> > 
> > That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
> 
> Heck, the fuse command field is a u32.  There are plenty of numberspace
> left, and the kernel can just *stop issuing them*.
> 
> > What are the risks of converting to BPF?
> > 
> > - I don't know how to do it - so it'll be slow (kinda like my fuse learning
> >   curve cost about a year because this is not that similar to anything
> >   else that was already in fuse.
> 
> ...and per above, BPF isn't some magic savior that avoids the expansion
> of the UABI.
> 
> > - Those of us who are involved don't fully understand either the security
> >   or performance implications of this. It 
> 
> Correct.  I sure think it's swell that people can inject IR programs
> that jit/link into the kernel.  Don't ask which secondary connotation of
> "swell" I'm talking about.
> 
> > - Famfs is enabling access to memory and mapping fault handling must be
> >   at "memory speed". We know that BPF walks some data structures when a 
> >   program executes. That exposes us to additional serialized L3 cache 
> >   misses each time we service a mapping fault (any TLB & page table miss).
> >   This should be studied side-by-side with the existing approach under
> >   multiple loads before being adopted for production.
> 
> Yes, it should.  AFAICT if one switched to a per-inode bpf program, then
> you could do per-inode bpf programs.  Then you don't even need the bpf
> map, and the ->iomap_begin becomes an indirect call into JITted x86_64
> math code.
> 
> (The downside is that dyn code can't be meaningfully signed, requires
> clang on the system, and you have to deal with inode eviction issues.)
> 
> > - This has never been done in production, and we're throwing it in the way
> >   of a project that has been soaking for years and needs to support early
> >   shipments of products.
> 
> Correct.  I haven't even implemented BPF-iomap for fuse4fs.  This BPF
> integration stuff is *highly* experimental code.
> 
> > If this is the only path, I'd like to revive famfs as a standalone file
> > system. I'm still maintaining that and it's still in use.
> 
> Honestly, you should probably just ship that to your users.  As long as
> the ondisk format doesn't change much, switching the implementation at a
> later date is at least still possible.
> 
> --D

And apologies to the polite universe for being a bit raw earlier. Getting
this far has been quite a grind...

Thanks,
John


^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-14 23:37 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <CAEvNRgFkusZeKxGctUpTTbYjdi7nZL1ZZar-gT7XRUOCZ2xtpw@mail.gmail.com>

On Wed, Apr 01, 2026 at 03:38:12PM -0700, Ackerley Tng wrote:
> Michael Roth <michael.roth@amd.com> writes:
> 
> >
> > [...snip...]
> >
> >>  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> >>  {
> >> @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> >>  		return -EINVAL;
> >>  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> >>  		return -EINVAL;
> >> +	if (attrs->error_offset)
> >> +		return -EINVAL;
> >>  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
> >>  		if (attrs->reserved[i])
> >>  			return -EINVAL;
> >> @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >>  		return 1;
> >>  	case KVM_CAP_GUEST_MEMFD_FLAGS:
> >>  		return kvm_gmem_get_supported_flags(kvm);
> >> +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> >> +		if (vm_memory_attributes)
> >> +			return 0;
> >> +
> >> +		return kvm_supported_mem_attributes(kvm);
> >
> > Based on the discussion from the PUCK call this morning,
> 
> Thanks for copying the discussion here, I'll start attending PUCK to
> catch those discussions too :)
> 
> > it sounds like it
> > would be a good idea to limit kvm_supported_mem_attributes() to only
> > reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
> > implementation has all the necessary enablement to support in-place
> > conversion via guest_memfd. In the case of SNP, there is a
> > documentation/parameter check in snp_launch_update() that needs to be
> > relaxed in order for userspace to be able to pass in a NULL 'src'
> > parameter (since, for in-place conversion, it would be initialized in place
> > as shared memory prior to the call, since by the time kvm_gmem_poulate()
> > it will have been set to private and therefore cannot be faulted in via
> > GUP (and if it could, we'd be unecessarily copying the src back on top
> > of itself since src/dst are the same).
> 
> Could this be a separate thing? If I'm understanding you correctly, it's
> not strictly a requirement for snp_launch_update() to first support a
> NULL 'src' parameter before this series lands.

I think we are already sync'd up on this during PUCK, but for the benefit
of others: Sean pointed out that if we don't then we'll need to add yet
another capability so userspace can determine when it can actually do
in-place conversion for SNP.

Right now, this series effectively advertises in place conversion at the
point where KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reports
'KVM_MEMORY_ATTRIBUTE_PRIVATE', so I slightly reworked the series to
include the snp_launch_update() change prior to that point in time in
the series. Thanks to prereqs and changes/requirements you've already
pulled in, it's just one additional patch now:

 KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE 

I also did some minor updates (prefixed with a "[squash]" tag) to advertise
the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by
userspace for SNP/TDX in the kvm_gmem_populate() path as agreed upon
during PUCK.

The branch is here, with the patches moved to where I think they
should remain (or be squashed in for the [squash] ones):

  https://github.com/AMDESE/linux/commits/guest_memfd-inplace-conversion-v4-snp2/

I've also updated the QEMU patches to use the agreed-upon API flow and
pushed them here:

  https://github.com/AMDESE/qemu/commits/snp-inplace-for-v4-wip2/

To start an SNP guest with in-place conversion:

  qemu-system-x86 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object sev-snp-guest,id=sev0,...,convert-in-place=true \
  -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false

To start an normal non-CoCo guest backed by guest_memfd with shared memory:

  qemu-system-x86 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false

Thanks,

Mike

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Darrick J. Wong @ 2026-04-14 23:36 UTC (permalink / raw)
  To: Joanne Koong
  Cc: John Groves, Miklos Szeredi, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
	Christian Brauner, Randy Dunlap, Jeff Layton, Amir Goldstein,
	Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
	Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
	Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJnrk1ZgcMuwfMpT1fXvUwBBiq9eWFHWVeOFQFFKiamGGe1RJg@mail.gmail.com>

On Tue, Apr 14, 2026 at 03:13:57PM -0700, Joanne Koong wrote:
> On Tue, Apr 14, 2026 at 11:57 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Apr 14, 2026 at 08:41:42AM -0500, John Groves wrote:
> > > On 26/04/14 03:19PM, Miklos Szeredi wrote:
> > > > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > > Overall, my intention with bringing this up is just to make sure we're
> > > > > at least aware of this alternative before anything is merged and
> > > > > permanent. If Miklos and you think we should land this series, then
> > > > > I'm on board with that.
> > > >
> > > > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > > > absolutely necessary.  This was the main sticking point originally,
> > > > but there seemed to be no better alternative.
> > > >
> > > > However with the bpf approach this would be gone, which is great.
> >
> > Well... you can't get away with having *no* mapping interface at all.
> 
> Yes but the mapping interface should be *generic*, not one that is so
> specifically tailored to one server. fuse will have to support this
> forever.

<nod> On second thought, there's a way to read Miklos' sentence that I
hadn't thought of before:

"However, with the [fuse-iomap] bpf approach, this [famfs specific
mapping interface] would be gone, which is great."

vs. the way I had thought:

"However, with the bpf approach, this [famfs specific mapping interface]
would be gone [in favor of filling out a struct iomap directly], which
is great."

So maybe Miklos actually /has/ at least read all the way through the
February posting, though I have no data to make such a conclusion. :/

> > You still have to define a UABI that BPF programs can use to convey
> > mapping data into fsdax/iomap.  BTF is a nice piece of work that smooths
> > over minor fluctuations in struct layout between a running kernel and
> > a precompiled BPF program, but fundamentally we still need a fuse-native
> > representation.
> >
> > That last sentence was an indirect way of saying: No, we're not going
> > to export struct iomap to userspace.  The fuse-iomap patchset provides
> > all the UABI pieces we need for regular filesystems (ext4) and hardware
> > adjacent filesystems (famfs) to exchange file mapping data with the
> > kernel.  This has been out for review since last October, but the lack
> > of engagement with that patchset (or its February resubmission) doesn't
> > leave me with confidence that any of it is going anywhere.
> >
> > Note: The reason for bolting BPF atop fuse-iomap is so that famfs can
> > upload bpf programs to generate interleaved mappings.  It's not so hard
> > to convert famfs' iomapping paths to use fuse-iomap, but I haven't
> > helped him do that because:
> >
> > a) I have no idea what Miklos' thoughts are about merging any of the
> > famfs stuff.
> >
> > b) I also have no idea what his thoughts are about fuse-iomap.  The
> > sparse replies are not encouraging.
> >
> > c) It didn't seem fair to John to make him take on a whole new patchset
> > dependency given (a) and (b).
> >
> > d) Nobody ever replied to my reply to the LSFMM thread about "can we do
> > some code review of fuse iomap without waiting three months for LSFMM?"
> > I've literally done nothing with fuse-iomap for two of the three months
> > requested.
> >
> > > > So let us please at least have a try at this. I'm not into bpf yet,
> > > > but willing to learn.
> >
> > I sent out the patches to enable exactly this sort of experimentation
> > two months ago, and have not received any responses:
> >
> > https://lore.kernel.org/linux-fsdevel/177188736765.3938194.6770791688236041940.stgit@frogsfrogsfrogs/
> >
> > I would like to say this as gently as possible: I don't know what the
> > problem here is, Miklos -- are you uninterested in the work?  Do you
> > have too many other things to do inside RH that you can't talk about?
> > Is it too difficult to figure out how the iomap stuff fits into the rest
> > of the fuse codebase?  Do you need help from the rest of us to get
> > reviews done?  Is there something else with which I could help?
> >
> > Because ... over the past few years, many of my team's filesystem
> > projects have endured monthslong review cycles and often fail to get
> > merged.  This has led to burnout and frustration among my teammates such
> > that many of them chose to move on to other things.  For the remaining
> > people, it was very difficult to justify continuing headcount when
> > progress on projects is so slow that individuals cannot achieve even one
> > milestone per quarter on any project.
> >
> > There's now nobody left here but me.
> >
> > I'm not blaming you (Miklos) for any of this, but that is the current
> > deplorable state of things.
> >
> > > > Thanks,
> > > > Miklos
> > >
> > > Thanks for responding...
> > >
> > > My short response: Noooooooooo!!!!!!
> > >
> > > I very strongly object to making this a prerequisite to merging. This
> > > is an untested idea that will certainly delay us by at least a couple
> > > of merge windows when products are shipping now, and the existing approach
> > > has been in circulation for a long time. It is TOO LATE!!!!!!
> >
> > /me notes that has "we're shipping so you have to merge it over peoples'
> > concerns" rarely carries the day in LKML land, and has never ended well
> > in the few cases that it happens.  As Ted is fond of saying, this is a
> > team sport, not an individual effort.  Unfortunately, to abuse your
> > sports metaphor, we all play for the ******* A's.
> >
> > That said, you're clearly pissed at the goalposts changing yet again,
> > and that's really not fair that we collectively keep moving them.
> >
> > It's a rotten situation that I could have even helped you to solve both
> > our problems via fuse-iomap, but I just couldn't motivate myself to
> > entwine our two projects until the technical direction questions got
> > answered.
> >
> > > Famfs is not a science project, it's enablement for actual products and
> > > early versions are available now!!!
> > >
> > > That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
> >
> > Heck, the fuse command field is a u32.  There are plenty of numberspace
> > left, and the kernel can just *stop issuing them*.
> 
> I don't think the problem is the command field. As I understand it, if
> this lands and is converted over later, none of the famfs code in this
> series can be removed from fuse. If fuse has native non-bpf support
> for famfs, then it will always need to have that. That's the part that
> worries me.
> 
> >
> > > What are the risks of converting to BPF?
> 
> I think maybe there is a misinterpretation of what the alternative
> approach entails. From my point of view, the alternative approach is
> not that different from what is already in this series. The only piece
> of the famfs logic that would need to use bpf is the logic for
> finding/computing the extent mappings (which is the famfs-specific
> logic that would not be applicable to any other server). That famfs
> bpf code is minimal and already written [1], as it is just the logic

Remember where struct fuse_iomap_io came from -- the fuse-iomap
patchset.  It would be rather odd to start accepting fuse_iomap_io
objects from a user's bpf program without examining the rest of the fuse
iomap stuff.

> that is in patch 6 [2] in this series copied over. No other part of
> famfs touches bpf. The rest is renaming the functions in
> fs/fuse/famfs.c to generic fuse_iomap_dax_XXX names (the logic is the
> same logic in this series, eg invoking the lower-level calls to
> dax_iomap_rw/fault/etc) and moving the daxdev setup/initialization to
> connection initialization time where the server passes that daxdev
> setup info/configs upfront. I don't think this would delay things by
> several merge windows, as the code is already mostly written. If it
> would be helpful, I can clean up what's in the prototype and send that
> out.

I agree that you and I and John could probably get the code and review
part wrapped up in perhaps two merge windows -- one for fuse-iomap,
and the second for famfs.  The userspace parts of both are more or less
done, which would minimize the amount of rework when we get to the
libfuse part.

(Let's be honest, with LSFMM happening during the week between -rc2 and
-rc3 and everyone's travel thereto, that's going to blow a big hole in
the 7.2 schedule)

The question is, would Miklos acquiesce to merging a large ball of code
that the three of us have been collaborating on?  Even if he wasn't
deeply involved in that collaboration?

> I think the part that is not clear yet and needs to be verified is
> whether this approach runs into any technical limitations on famfs's
> production workloads. For example, does the overhead of using bpf maps
> lead to a noticeable performance drop on real workloads? In the

I see a custom hashtable map implementation in kernel/bpf/hashtab.c,
and no particular evidence that it can rehash itself to cut down on
bucket list chasing.  That's too bad, because rhashtable rehashing is
generally effective at keeping the xfs icache pointer chasing down.

If we have a per-inode famfs_file_meta object, I wonder if we could just
attach it to the fuse_inode as a void *private pointer?  That wouldn't
be any worse than current famfs.

> future, will there be too many extent mappings on high-scale systems
> to make this feasible? etc. If there are technical reasons why the

I've asked that question (are we going to have millions of mappings?)
before.  From what John has told me and what I've seen with cxl and pmem
devices before that, the memory manager is heavily incentivized to give
out large static(ish) allocations to constrain the metadata overhead,
enable the use of PMD/PGD TLB entries, and minimize pointer chasing
through mapping structures.

The only reason we let that happen in the disk filesystems is that the
IO service times are so high nobody cares about L3 misses.

> famfs logic has to be in fuse, then imo we should figure that out and
> ideally that's the discussion we should be having. I am not a cxl
> expert so perhaps there is something missing in the approach that
> makes it not sufficient on production systems. If we don't end up
> going with the alternative approach, I still think this series should
> try to make the famfs uapi additions to fuse as generic as possible
> since that will be irreversible.

<nod>

> If we expedited the alternative approach in terms of reviewing and
> merging, would that suffice? Is the main pushback the timing of it, eg
> that it would take too long to get reviewed, merged, and shipped?

I think John's been pretty clear that he doesn't want to drag this out
even a day longer.  Given current trends this month, I might run out of
time soon too.

> > > - I don't know how to do it - so it'll be slow (kinda like my fuse learning
> > >   curve cost about a year because this is not that similar to anything
> > >   else that was already in fuse.
> >
> > ...and per above, BPF isn't some magic savior that avoids the expansion
> > of the UABI.
> 
> It doesn't avoid the expansion of the UABI but it makes the UABI
> generic (eg plenty of future servers can/will use the generic iomap
> layer).

(Oh good, nobody's talking about going the evil route and just fill out
struct iomap directly!)

> >
> > > - Those of us who are involved don't fully understand either the security
> > >   or performance implications of this. It
> >
> > Correct.  I sure think it's swell that people can inject IR programs
> > that jit/link into the kernel.  Don't ask which secondary connotation of
> > "swell" I'm talking about.
> 
> bpf is used elsewhere in the kernel (eg networking, scheduling). If it
> is the case that it is unsafe (which maybe it is, I don't know), then
> wouldn't those other areas have the same issues?

Well ok, here we go -- I don't think there's any serious technical
problems with BPF.  The ability to read (and in some cases write) to
kernel memory looks like it's flexible enough to do the classification
and data collection stuff that most current bpf users want to do.

The issues I was alluding to are BPF being used as a means to get around
slow/unresponsive maintainers; and the kernel community's collective
refusal to explore any other path to building new user APIs besides
designing everything generically perfectly up front in the kernel UABI
along with all the stress that involves.

Once upon a time I tried to push on these UAPI stressfulness issues and
Linus told me I had a loose grip on reality.  He's probably right.

> > > - Famfs is enabling access to memory and mapping fault handling must be
> > >   at "memory speed". We know that BPF walks some data structures when a
> > >   program executes. That exposes us to additional serialized L3 cache
> > >   misses each time we service a mapping fault (any TLB & page table miss).
> > >   This should be studied side-by-side with the existing approach under
> > >   multiple loads before being adopted for production.
> >
> > Yes, it should.  AFAICT if one switched to a per-inode bpf program, then
> > you could do per-inode bpf programs.  Then you don't even need the bpf
> > map, and the ->iomap_begin becomes an indirect call into JITted x86_64
> > math code.
> >
> > (The downside is that dyn code can't be meaningfully signed, requires
> > clang on the system, and you have to deal with inode eviction issues.)
> >
> > > - This has never been done in production, and we're throwing it in the way
> > >   of a project that has been soaking for years and needs to support early
> > >   shipments of products.
> >
> > Correct.  I haven't even implemented BPF-iomap for fuse4fs.  This BPF
> > integration stuff is *highly* experimental code.
> 
> I think what fuse4fs needs for bpf is significantly more complicated
> and intensive than what famfs needs. For famfs, the extent mapping
> logic is straightforward computation.

Agreed.  For fuse4fs I'm content to let it manage the iomap cache.

> > > If this is the only path, I'd like to revive famfs as a standalone file
> > > system. I'm still maintaining that and it's still in use.
> >
> > Honestly, you should probably just ship that to your users.  As long as
> > the ondisk format doesn't change much, switching the implementation at a
> > later date is at least still possible.
> 
> I recognize this is an unfair situation John as you've already spent
> years working on this and did what the community asked with rewriting
> it. What I'm hoping to convey is that the approach where the extent
> computing/finding logic gets moved to bpf is not radically different
> from the famfs logic already in this patchset. In my view, moving this
> logic to bpf is more advantageous for both fuse *and* famfs
> (decoupling famfs releases from kernel releases) - it would be great
> to consider this on technical merits if expediting the timeline of the
> alternative approach would suffice.
> 
> Thanks,
> Joanne
> 
> [1] https://github.com/joannekoong/libfuse/blob/444fa27fa9fd2118a0dc332933197faf9bbf25aa/example/famfs.bpf.c
> [2] https://lore.kernel.org/linux-fsdevel/0100019d43e79794-0eadcf5e-b659-43f7-8fdc-dec9f4ccce14-000000@email.amazonses.com/
> 
> >
> > --D
> 

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-14 22:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Groves, Miklos Szeredi, Joanne Koong, Bernd Schubert,
	John Groves, Dan Williams, Bernd Schubert, Alison Schofield,
	John Groves, Jonathan Corbet, Shuah Khan, Vishal Verma,
	Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
	David Hildenbrand, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <20260414185740.GA604658@frogsfrogsfrogs>

On Tue, Apr 14, 2026 at 11:57:40AM -0700, Darrick J. Wong wrote:
> > 
> > I very strongly object to making this a prerequisite to merging. This
> > is an untested idea that will certainly delay us by at least a couple
> > of merge windows when products are shipping now, and the existing approach
> > has been in circulation for a long time. It is TOO LATE!!!!!!
> 
...
> 
> That said, you're clearly pissed at the goalposts changing yet again,
> and that's really not fair that we collectively keep moving them.
> 

This seems a bit more than moving a goalpost.

We're now gating working software, for real working hardware, on a novel,
unproven BPF ops structure that controls page table mappings on page table
faults which would be used by exactly 1 user : FAMFS.

And that singular user is harmed because it turns an O(1) offset
calculation into a pointer chase - on the hottest path (every fault).

John is right to push back here.

---

That said - I'm looking at fs/fuse/famfs.c and I'm asking myself what in
here is actually famfs-specific.  If you just s/FAMFS/DAX/g - the file
just reads like a simple DAX-iomap backend with optional striping.

Would it be reasonable to refactor the dax layer (and users) to
create an ops structure that becomes the basis for the BPF solution?

We don't even know what the whole BPF scope is, and it seems wholly
unfair to John's and his users to make that solely their problem (for
negative value!).

~Gregory

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Joanne Koong @ 2026-04-14 22:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Groves, Miklos Szeredi, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
	Christian Brauner, Randy Dunlap, Jeff Layton, Amir Goldstein,
	Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
	Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
	Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <20260414185740.GA604658@frogsfrogsfrogs>

On Tue, Apr 14, 2026 at 11:57 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Apr 14, 2026 at 08:41:42AM -0500, John Groves wrote:
> > On 26/04/14 03:19PM, Miklos Szeredi wrote:
> > > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > > Overall, my intention with bringing this up is just to make sure we're
> > > > at least aware of this alternative before anything is merged and
> > > > permanent. If Miklos and you think we should land this series, then
> > > > I'm on board with that.
> > >
> > > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > > absolutely necessary.  This was the main sticking point originally,
> > > but there seemed to be no better alternative.
> > >
> > > However with the bpf approach this would be gone, which is great.
>
> Well... you can't get away with having *no* mapping interface at all.

Yes but the mapping interface should be *generic*, not one that is so
specifically tailored to one server. fuse will have to support this
forever.

> You still have to define a UABI that BPF programs can use to convey
> mapping data into fsdax/iomap.  BTF is a nice piece of work that smooths
> over minor fluctuations in struct layout between a running kernel and
> a precompiled BPF program, but fundamentally we still need a fuse-native
> representation.
>
> That last sentence was an indirect way of saying: No, we're not going
> to export struct iomap to userspace.  The fuse-iomap patchset provides
> all the UABI pieces we need for regular filesystems (ext4) and hardware
> adjacent filesystems (famfs) to exchange file mapping data with the
> kernel.  This has been out for review since last October, but the lack
> of engagement with that patchset (or its February resubmission) doesn't
> leave me with confidence that any of it is going anywhere.
>
> Note: The reason for bolting BPF atop fuse-iomap is so that famfs can
> upload bpf programs to generate interleaved mappings.  It's not so hard
> to convert famfs' iomapping paths to use fuse-iomap, but I haven't
> helped him do that because:
>
> a) I have no idea what Miklos' thoughts are about merging any of the
> famfs stuff.
>
> b) I also have no idea what his thoughts are about fuse-iomap.  The
> sparse replies are not encouraging.
>
> c) It didn't seem fair to John to make him take on a whole new patchset
> dependency given (a) and (b).
>
> d) Nobody ever replied to my reply to the LSFMM thread about "can we do
> some code review of fuse iomap without waiting three months for LSFMM?"
> I've literally done nothing with fuse-iomap for two of the three months
> requested.
>
> > > So let us please at least have a try at this. I'm not into bpf yet,
> > > but willing to learn.
>
> I sent out the patches to enable exactly this sort of experimentation
> two months ago, and have not received any responses:
>
> https://lore.kernel.org/linux-fsdevel/177188736765.3938194.6770791688236041940.stgit@frogsfrogsfrogs/
>
> I would like to say this as gently as possible: I don't know what the
> problem here is, Miklos -- are you uninterested in the work?  Do you
> have too many other things to do inside RH that you can't talk about?
> Is it too difficult to figure out how the iomap stuff fits into the rest
> of the fuse codebase?  Do you need help from the rest of us to get
> reviews done?  Is there something else with which I could help?
>
> Because ... over the past few years, many of my team's filesystem
> projects have endured monthslong review cycles and often fail to get
> merged.  This has led to burnout and frustration among my teammates such
> that many of them chose to move on to other things.  For the remaining
> people, it was very difficult to justify continuing headcount when
> progress on projects is so slow that individuals cannot achieve even one
> milestone per quarter on any project.
>
> There's now nobody left here but me.
>
> I'm not blaming you (Miklos) for any of this, but that is the current
> deplorable state of things.
>
> > > Thanks,
> > > Miklos
> >
> > Thanks for responding...
> >
> > My short response: Noooooooooo!!!!!!
> >
> > I very strongly object to making this a prerequisite to merging. This
> > is an untested idea that will certainly delay us by at least a couple
> > of merge windows when products are shipping now, and the existing approach
> > has been in circulation for a long time. It is TOO LATE!!!!!!
>
> /me notes that has "we're shipping so you have to merge it over peoples'
> concerns" rarely carries the day in LKML land, and has never ended well
> in the few cases that it happens.  As Ted is fond of saying, this is a
> team sport, not an individual effort.  Unfortunately, to abuse your
> sports metaphor, we all play for the ******* A's.
>
> That said, you're clearly pissed at the goalposts changing yet again,
> and that's really not fair that we collectively keep moving them.
>
> It's a rotten situation that I could have even helped you to solve both
> our problems via fuse-iomap, but I just couldn't motivate myself to
> entwine our two projects until the technical direction questions got
> answered.
>
> > Famfs is not a science project, it's enablement for actual products and
> > early versions are available now!!!
> >
> > That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
>
> Heck, the fuse command field is a u32.  There are plenty of numberspace
> left, and the kernel can just *stop issuing them*.

I don't think the problem is the command field. As I understand it, if
this lands and is converted over later, none of the famfs code in this
series can be removed from fuse. If fuse has native non-bpf support
for famfs, then it will always need to have that. That's the part that
worries me.

>
> > What are the risks of converting to BPF?

I think maybe there is a misinterpretation of what the alternative
approach entails. From my point of view, the alternative approach is
not that different from what is already in this series. The only piece
of the famfs logic that would need to use bpf is the logic for
finding/computing the extent mappings (which is the famfs-specific
logic that would not be applicable to any other server). That famfs
bpf code is minimal and already written [1], as it is just the logic
that is in patch 6 [2] in this series copied over. No other part of
famfs touches bpf. The rest is renaming the functions in
fs/fuse/famfs.c to generic fuse_iomap_dax_XXX names (the logic is the
same logic in this series, eg invoking the lower-level calls to
dax_iomap_rw/fault/etc) and moving the daxdev setup/initialization to
connection initialization time where the server passes that daxdev
setup info/configs upfront. I don't think this would delay things by
several merge windows, as the code is already mostly written. If it
would be helpful, I can clean up what's in the prototype and send that
out.

I think the part that is not clear yet and needs to be verified is
whether this approach runs into any technical limitations on famfs's
production workloads. For example, does the overhead of using bpf maps
lead to a noticeable performance drop on real workloads? In the
future, will there be too many extent mappings on high-scale systems
to make this feasible? etc. If there are technical reasons why the
famfs logic has to be in fuse, then imo we should figure that out and
ideally that's the discussion we should be having. I am not a cxl
expert so perhaps there is something missing in the approach that
makes it not sufficient on production systems. If we don't end up
going with the alternative approach, I still think this series should
try to make the famfs uapi additions to fuse as generic as possible
since that will be irreversible.

If we expedited the alternative approach in terms of reviewing and
merging, would that suffice? Is the main pushback the timing of it, eg
that it would take too long to get reviewed, merged, and shipped?

> >
> > - I don't know how to do it - so it'll be slow (kinda like my fuse learning
> >   curve cost about a year because this is not that similar to anything
> >   else that was already in fuse.
>
> ...and per above, BPF isn't some magic savior that avoids the expansion
> of the UABI.

It doesn't avoid the expansion of the UABI but it makes the UABI
generic (eg plenty of future servers can/will use the generic iomap
layer).

>
> > - Those of us who are involved don't fully understand either the security
> >   or performance implications of this. It
>
> Correct.  I sure think it's swell that people can inject IR programs
> that jit/link into the kernel.  Don't ask which secondary connotation of
> "swell" I'm talking about.

bpf is used elsewhere in the kernel (eg networking, scheduling). If it
is the case that it is unsafe (which maybe it is, I don't know), then
wouldn't those other areas have the same issues?

>
> > - Famfs is enabling access to memory and mapping fault handling must be
> >   at "memory speed". We know that BPF walks some data structures when a
> >   program executes. That exposes us to additional serialized L3 cache
> >   misses each time we service a mapping fault (any TLB & page table miss).
> >   This should be studied side-by-side with the existing approach under
> >   multiple loads before being adopted for production.
>
> Yes, it should.  AFAICT if one switched to a per-inode bpf program, then
> you could do per-inode bpf programs.  Then you don't even need the bpf
> map, and the ->iomap_begin becomes an indirect call into JITted x86_64
> math code.
>
> (The downside is that dyn code can't be meaningfully signed, requires
> clang on the system, and you have to deal with inode eviction issues.)
>
> > - This has never been done in production, and we're throwing it in the way
> >   of a project that has been soaking for years and needs to support early
> >   shipments of products.
>
> Correct.  I haven't even implemented BPF-iomap for fuse4fs.  This BPF
> integration stuff is *highly* experimental code.

I think what fuse4fs needs for bpf is significantly more complicated
and intensive than what famfs needs. For famfs, the extent mapping
logic is straightforward computation.

>
> > If this is the only path, I'd like to revive famfs as a standalone file
> > system. I'm still maintaining that and it's still in use.
>
> Honestly, you should probably just ship that to your users.  As long as
> the ondisk format doesn't change much, switching the implementation at a
> later date is at least still possible.

I recognize this is an unfair situation John as you've already spent
years working on this and did what the community asked with rewriting
it. What I'm hoping to convey is that the approach where the extent
computing/finding logic gets moved to bpf is not radically different
from the famfs logic already in this patchset. In my view, moving this
logic to bpf is more advantageous for both fuse *and* famfs
(decoupling famfs releases from kernel releases) - it would be great
to consider this on technical merits if expediting the timeline of the
alternative approach would suffice.

Thanks,
Joanne

[1] https://github.com/joannekoong/libfuse/blob/444fa27fa9fd2118a0dc332933197faf9bbf25aa/example/famfs.bpf.c
[2] https://lore.kernel.org/linux-fsdevel/0100019d43e79794-0eadcf5e-b659-43f7-8fdc-dec9f4ccce14-000000@email.amazonses.com/

>
> --D

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox