Linux Documentation
* Re: [PATCH v4 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
From: Amit Machhiwal @ 2026-06-23 11:11 UTC
  To: Vaibhav Jain
  Cc: Amit Machhiwal, linuxppc-dev, Madhavan Srinivasan,
	Anushree Mathur, Paolo Bonzini, Nicholas Piggin, Michael Ellerman,
	Christophe Leroy (CS GROUP), Jonathan Corbet, Shuah Khan, kvm,
	linux-kernel, linux-doc, lkp
In-Reply-To: <871pe3cazk.fsf@vajain21.in.ibm.com>

Hi Vaibhav,

Thanks for the detailed review. My responses are inline below.

On 2026/06/19 11:44 AM, Vaibhav Jain wrote:
> Hi Amit.
> 
> Thanks for the patch and incorporating V3 review comments. Further
> review comments inline below:
> 
> Amit Machhiwal <amachhiw@linux.ibm.com> writes:
> 
> > Introduce a new capability and ioctl to expose CPU compatibility modes
> > supported by the host processor for nested guests.
> >
> > On IBM POWER systems, newer processor generations (N) can operate in
> > compatibility modes corresponding to earlier generations, like (N-1) and
> > (N-2). This is particularly relevant for nested virtualization, where
> > nested KVM guests may need to run with a specific processor compatibility
> > level.
> >
> > Introduce KVM_CAP_PPC_COMPAT_CAPS capability and the corresponding
> > KVM_PPC_GET_COMPAT_CAPS vm ioctl. The ioctl returns a bitmap describing
> > the compatibility modes supported by the host in respective bit numbers,
> > allowing userspace (e.g., QEMU) to select an appropriate compatibility
> > level when configuring nested KVM guests.
> >
> > The ioctl handling is added in kvm_arch_vm_ioctl() and retrieves host
> > CPU compatibility capabilities via a PowerPC-specific backend
> > implementation when available. The implementation validates the structure
> > size from userspace to ensure forward compatibility and returns
> > appropriate error codes (EINVAL for invalid size, EFAULT for copy
> > failures, ENOTTY if backend is not implemented). The struct
> > kvm_ppc_compat_caps includes a size field to support future ABI
> > extensions.
> >
> > Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > ---
> >  arch/powerpc/include/asm/kvm_ppc.h  |  1 +
> >  arch/powerpc/include/uapi/asm/kvm.h |  7 ++++++
> >  arch/powerpc/kvm/powerpc.c          | 35 +++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h            |  4 ++++
> >  4 files changed, 47 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> > index 0953f2daa466..169ea6a7fbad 100644
> > --- a/arch/powerpc/include/asm/kvm_ppc.h
> > +++ b/arch/powerpc/include/asm/kvm_ppc.h
> > @@ -319,6 +319,7 @@ struct kvmppc_ops {
> >  	bool (*hash_v3_possible)(void);
> >  	int (*create_vm_debugfs)(struct kvm *kvm);
> >  	int (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry);
> > +	int (*get_compat_caps)(struct kvm_ppc_compat_caps *host_caps);
> >  };
> >  
> >  extern struct kvmppc_ops *kvmppc_hv_ops;
> > diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> > index 077c5437f521..8a38be6c3b03 100644
> > --- a/arch/powerpc/include/uapi/asm/kvm.h
> > +++ b/arch/powerpc/include/uapi/asm/kvm.h
> > @@ -437,6 +437,13 @@ struct kvm_ppc_cpu_char {
> >  	__u64	behaviour_mask;		/* valid bits in behaviour */
> >  };
> >  
> > +/* For KVM_PPC_GET_COMPAT_CAPS */
> > +struct kvm_ppc_compat_caps {
> > +	__u64	flags;			/* Reserved for future use */
> > +	__u64	size;			/* Size of this structure */
> I suggest moving 'size' to be the first member of the struct. That way,
> copying the struct from userspace becomes a bit easier.

Agreed, that makes more sense and simplifies the copy_from_user() call.
I will change the struct in v5 to:

  struct kvm_ppc_compat_caps {
  	__u64	size;
  	__u64	flags;
  	__u64	compat_capabilities;
  };

> 
> > +	__u64	compat_capabilities;	/* Capabilities supported by the host */
> > +};
> > +
> >  /*
> >   * Values for character and character_mask.
> >   * These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 98de68379b18..9153b0034b45 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -701,6 +701,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  			}
> >  		}
> >  		break;
> > +#if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
> > +	case KVM_CAP_PPC_COMPAT_CAPS:
> > +		r = 0;
> > +		if (kvmhv_on_pseries())
> > +			r = 1;
> > +		break;
> > +#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> >  	default:
> >  		r = 0;
> >  		break;
> > @@ -2467,6 +2474,34 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> >  		r = kvm->arch.kvm_ops->svm_off(kvm);
> >  		break;
> >  	}
> > +	case KVM_PPC_GET_COMPAT_CAPS: {
> > +		struct kvm_ppc_compat_caps host_caps;
> > +		u64 user_size;
> > +
> > +		r = -EFAULT;
> > +		/* First, get the size field from userspace to validate */
> > +		if (copy_from_user(&user_size, &((struct kvm_ppc_compat_caps
> > +		     __user *)argp)->size, sizeof(user_size))) {
> Move the struct's size member to the first field. That way the
> copy_from_user() call is simplified and you won't have to do the weird
> pointer arithmetic.

Will do as mentioned above.

> 
> 
> > +			goto out;
> > +		}
> > +
> > +		/* Validate size - must be at least the current structure size */
> > +		r = -EINVAL;
> > +		if (user_size < sizeof(host_caps))
> > +			goto out;
> The check should be strengthened to
>  if (user_size != sizeof(host_caps))
> so that if userspace sends a struct larger than what the kernel knows
> about, it will be rejected. This will prevent surprises in the future in
> case a VMM sends a larger struct expecting the kernel to know about it,
> but an older kernel only knows about the older, smaller struct. Also
> look at the review comment below.

Agreed. I'll change the validation to use strict equality. This is
simpler and clearer - userspace must provide exactly the size the kernel
expects.
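
A minimal userspace sketch of the planned check, assuming the v5 layout
with 'size' as the first member. The names get_compat_caps_sketch() and
struct compat_caps are hypothetical, and memcpy() only stands in for
copy_from_user()/copy_to_user(); this is not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Layout planned for v5: 'size' is the first member. */
struct compat_caps {
	uint64_t size;                /* must equal sizeof(struct compat_caps) */
	uint64_t flags;               /* reserved */
	uint64_t compat_capabilities; /* bitmap filled in by the "kernel" side */
};

/*
 * Hypothetical stand-in for the ioctl handler. 'user_buf' plays the role
 * of the userspace pointer, and 'host_bitmap' stands in for whatever the
 * backend would report.
 */
static int get_compat_caps_sketch(void *user_buf, uint64_t host_bitmap)
{
	struct compat_caps host_caps;
	uint64_t user_size;

	assert(user_buf != NULL);

	/* 'size' sits at offset 0, so no member-address arithmetic is needed. */
	memcpy(&user_size, user_buf, sizeof(user_size));

	/* Strict equality: both smaller and larger structs are rejected. */
	if (user_size != sizeof(host_caps))
		return -EINVAL;

	memset(&host_caps, 0, sizeof(host_caps));
	host_caps.size = sizeof(host_caps);
	host_caps.compat_capabilities = host_bitmap;
	memcpy(user_buf, &host_caps, sizeof(host_caps));
	return 0;
}
```

With strict equality, a rejected request leaves the caller's buffer
untouched, so userspace never sees a partially filled struct.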

> 
> > +
> > +		r = -ENOTTY;
> > +		memset(&host_caps, 0, sizeof(host_caps));
> > +		if (!kvm->arch.kvm_ops->get_compat_caps)
> > +			goto out;
> > +
> > +		r = kvm->arch.kvm_ops->get_compat_caps(&host_caps);
> > +		/* Set the actual size of the structure we're returning */
> > +		host_caps.size = sizeof(host_caps);
> > +		if (!r && copy_to_user(argp, &host_caps, sizeof(host_caps)))
> > +			r = -EFAULT;
> You are allowing a future userspace VMM to potentially send a larger
> 'struct kvm_ppc_compat_caps' than what the kernel knows about. This makes
> error handling in userspace a bit involved, since some fields in the
> 'struct kvm_ppc_compat_caps' passed from userspace may remain
> uninitialized when userspace sees it. So please mention this subtle
> behaviour in the patch description and also update the doc in the
> later patch.

With the strict equality check (user_size != sizeof(host_caps)), this
concern should be addressed - we won't accept larger structs from
userspace. However, I'll still improve the documentation to:

1. In the commit message:
   - Explain the size field validation
   - Document that exact size match is required
   - Clarify error handling behavior

2. In Documentation/virt/kvm/api.rst:
   - Add improved documentation for KVM_PPC_GET_COMPAT_CAPS
   - Document the size field requirement and validation

Thanks,
Amit

> 
> > +		break;
> > +	}
> >  	default: {
> >  		struct kvm *kvm = filp->private_data;
> >  		r = kvm->arch.kvm_ops->arch_vm_ioctl(filp, ioctl, arg);
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 6c8afa2047bf..1788a0068662 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -996,6 +996,7 @@ struct kvm_enable_cap {
> >  #define KVM_CAP_S390_USER_OPEREXEC 246
> >  #define KVM_CAP_S390_KEYOP 247
> >  #define KVM_CAP_S390_VSIE_ESAMODE 248
> > +#define KVM_CAP_PPC_COMPAT_CAPS 249
> >  
> >  struct kvm_irq_routing_irqchip {
> >  	__u32 irqchip;
> > @@ -1349,6 +1350,9 @@ struct kvm_s390_keyop {
> >  #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
> >  #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
> >  
> > +/* Available with KVM_CAP_PPC_COMPAT_CAPS */
> > +#define KVM_PPC_GET_COMPAT_CAPS	_IOR(KVMIO,  0xe4, struct kvm_ppc_compat_caps)
> > +
> >  /*
> >   * ioctls for vcpu fds
> >   */
> > -- 
> > 2.50.1 (Apple Git-155)
> >
> >
> 
> -- 
> Cheers
> ~ Vaibhav


* Re: [PATCH v7 06/10] tracing/probes: Type casting always involves nested calls
From: Masami Hiramatsu @ 2026-06-23 11:50 UTC
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178217910709.643090.6379198143797177881.stgit@devnote2>


Sashiko commented:

> > +	}
> > +	*close = '\0';
> >  
> > -		ctx->offset += 1;	/* for the '(' */
> > -		/* We need to parse the nested one */
> > -		ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
> > -				pcode, end, ctx);
> > -		if (ret < 0)
> > -			return ret;
> > -		ctx->nested_level--;
> > -		clear_struct_btf(ctx);
> > +	/* We need to parse the nested one */
> > +	ret = parse_probe_arg(tmp, find_fetch_type(NULL, ctx->flags),
> > +			      pcode, end, ctx);
> 
> Does routing the nested argument through parse_probe_arg() instead of 
> parse_btf_arg() break the documented eprobe syntax (STRUCT)FIELD->MEMBER?
> Since this patch removes the TPARG_FL_TEVENT handling block from
> parse_btf_arg(), and handle_typecast() now recursively invokes 
> parse_probe_arg() on the extracted FIELD name, a bare eprobe FIELD name 
> that does not start with a '$' prefix will hit the default case in 
> parse_probe_arg().
> This causes it to be unconditionally rejected with -EINVAL (NOSUP_BTFARG) 
> because eprobe flags (TPARG_FL_TEVENT) do not satisfy 
> tparg_is_function_entry() or tparg_is_function_return(). This acts as an 
> ABI breakage for existing user-space scripts relying on this eprobe syntax.

Wait... Did the type-casting support patch accidentally allow access to
event fields without the "$"?

Hmm, if so, it should be documented, and it needs to be supported
correctly both with and without a typecast.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: Issue cloning kernel-doc-zh from HUST mirror
From: Weijie Yuan @ 2026-06-23 12:01 UTC
  To: Dongliang Mu; +Cc: Siwei Chen, linux-doc, si.yanteng
In-Reply-To: <b03f244b-46b8-47e8-b7f5-d98d714ae15c@hust.edu.cn>

On Tue, Jun 23, 2026 at 04:51:20PM +0800, Dongliang Mu wrote:
> The curl 52 Empty reply from server error is not a Git or Ubuntu
> compatibility issue. It happens because the kernel-doc-zh repository is
> extremely large, and the HUST mirror server closes the HTTPS connection
> early due to timeout or proxy limits.
> 
> You can try the following commands:
> 
>      1. Shallow clone first (most reliable)
> 
>      git clone --depth 1
>      https://mirrors.hust.edu.cn/git/kernel-doc-zh.git linux
> 
>      Then fetch full history:
> 
>      git fetch --unshallow
> 
> If still failing, increase Git buffer like:
> 
> git config --global http.postBuffer 1073741824
> 
>      Finally, I will contact maintainers of HUST mirror site and try
>      some attempts to resolve this issue.

Thanks, and yes, shallow clone could work:

user@debian:~$ git clone --depth 1 https://mirrors.hust.edu.cn/git/kernel-doc-zh.git linux
Cloning into 'linux'...
remote: Enumerating objects: 93130, done.
remote: Counting objects: 100% (93130/93130), done.
remote: Compressing objects: 100% (90511/90511), done.
remote: Total 93130 (delta 7145), reused 20322 (delta 1615), pack-reused 0
Receiving objects: 100% (93130/93130), 259.15 MiB | 4.71 MiB/s, done.
Resolving deltas: 100% (7145/7145), done.
Updating files: 100% (87897/87897), done.

But:

user@debian:~$ cd linux
user@debian:~/linux$ git fetch --unshallow

(after ~10 minutes or so)

remote: Enumerating objects: 10638034, done.
remote: Counting objects: 100% (10638019/10638019), done.
remote: Compressing objects: 100% (1819658/1819658), done.
error: RPC failed; curl 56 GnuTLS recv error (-9): Error decoding the received TLS packet.
error: 5476 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

And on a Mac (26.5.1) with Git 2.54, the result is probably the same:

$ git clone https://mirrors.hust.edu.cn/git/kernel-doc-zh.git linux
Cloning into 'linux'...
remote: Enumerating objects: 11406904, done.
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 7537 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

After setting http.postBuffer to 1073741824 on Debian:

user@debian:~/linux$ git config --global http.postBuffer 1073741824
user@debian:~/linux$ git fetch --unshallow
(after ~10 minutes)
remote: Enumerating objects: 10638034, done.
remote: Counting objects: 100% (10638019/10638019), done.
remote: Compressing objects: 100% (1819658/1819658), done.
error: RPC failed; curl 56 GnuTLS recv error (-9): Error decoding the received TLS packet.
error: 5917 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

Apparently the above is of limited value, as it is straightforward to
reproduce.

> Please use the first git repo:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/alexs/linux.git

The average speed of cloning from git.kernel.org is less than 10 KiB/s
from Chengdu ;-)

But of course, a shallow clone is enough for translation work ;-)

Thanks!


* [PATCH][v2] mm/dmapool: Untangle CONFIG_SLUB_DEBUG_ON abuse and switch to static key
From: lirongqing @ 2026-06-23 12:12 UTC
  To: Jonathan Corbet, Shuah Khan, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-doc, linux-kernel,
	linux-mm
  Cc: Li RongQing

From: Li RongQing <lirongqing@baidu.com>

The dmapool subsystem historically wrapped its debugging logic inside
an #ifdef CONFIG_SLUB_DEBUG_ON block. This approach is fundamentally
flawed because CONFIG_SLUB_DEBUG_ON merely defines compile-time defaults
for SLUB, and it causes two problems:

On production kernels where CONFIG_SLUB_DEBUG=y but
CONFIG_SLUB_DEBUG_ON=n, dmapool debugging was completely compiled out
at compile time, leaving no way to enable it without rebuilding the
kernel.

On kernels with CONFIG_SLUB_DEBUG_ON=y, dmapool debugging stayed
unconditionally active even if a user explicitly disabled slub debugging
at boot time.

Clean up this mess by removing the #ifdef and switching to a runtime
static key (dmapool_debug_enabled), allowing dmapool debugging to be
toggled cleanly via its own boot parameter: dmapool_debug

Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
---
Diff with v1:  Move the static key check out of pool_init_page etc 

 Documentation/admin-guide/kernel-parameters.txt |  5 +++
 mm/dmapool.c                                    | 57 ++++++++++++++-----------
 2 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 19c9a19..66d853c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1304,6 +1304,11 @@ Kernel parameters
 
 	dis_ucode_ldr	[X86] Disable the microcode loader.
 
+	dmapool_debug	[MM]
+			Enable DMA pool debugging. This enables memory
+			poisoning and validation for DMA pool allocations.
+			Useful for debugging DMA API misuse.
+
 	dma_debug=off	If the kernel is compiled with DMA_API_DEBUG support,
 			this option disables the debugging code at boot.
 
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 5d8af6e..7bd037a 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -35,10 +35,23 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/static_key.h>
+#include <linux/init.h>
 
-#ifdef CONFIG_SLUB_DEBUG_ON
-#define DMAPOOL_DEBUG 1
-#endif
+/*
+ * Debugging support for dmapool using static key.
+ *
+ * This allows enabling dmapool debug at boot time via:
+ *   dmapool_debug
+ */
+static DEFINE_STATIC_KEY_FALSE(dmapool_debug_enabled);
+
+static int __init dmapool_debug_setup(char *str)
+{
+	static_branch_enable(&dmapool_debug_enabled);
+	return 1;
+}
+__setup("dmapool_debug", dmapool_debug_setup);
 
 struct dma_block {
 	struct dma_block *next_block;
@@ -92,7 +105,6 @@ static ssize_t pools_show(struct device *dev, struct device_attribute *attr, cha
 
 static DEVICE_ATTR_RO(pools);
 
-#ifdef DMAPOOL_DEBUG
 static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
 			     gfp_t mem_flags)
 {
@@ -161,23 +173,6 @@ static void pool_init_page(struct dma_pool *pool, struct dma_page *page)
 {
 	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 }
-#else
-static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
-			     gfp_t mem_flags)
-{
-}
-
-static bool pool_block_err(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
-{
-	if (want_init_on_free())
-		memset(vaddr, 0, pool->size);
-	return false;
-}
-
-static void pool_init_page(struct dma_pool *pool, struct dma_page *page)
-{
-}
-#endif
 
 static struct dma_block *pool_block_pop(struct dma_pool *pool)
 {
@@ -305,7 +300,9 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 	unsigned int next_boundary = pool->boundary, offset = 0;
 	struct dma_block *block, *first = NULL, *last = NULL;
 
-	pool_init_page(pool, page);
+	if (static_branch_unlikely(&dmapool_debug_enabled))
+		pool_init_page(pool, page);
+
 	while (offset + pool->size <= pool->allocation) {
 		if (offset + pool->size > next_boundary) {
 			offset = next_boundary;
@@ -433,7 +430,10 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 	spin_unlock_irqrestore(&pool->lock, flags);
 
 	*handle = block->dma;
-	pool_check_block(pool, block, mem_flags);
+
+	if (static_branch_unlikely(&dmapool_debug_enabled))
+		pool_check_block(pool, block, mem_flags);
+
 	if (want_init_on_alloc(mem_flags))
 		memset(block, 0, pool->size);
 
@@ -454,9 +454,18 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 {
 	struct dma_block *block = vaddr;
 	unsigned long flags;
+	bool err = false;
 
 	spin_lock_irqsave(&pool->lock, flags);
-	if (!pool_block_err(pool, vaddr, dma)) {
+
+	if (static_branch_unlikely(&dmapool_debug_enabled))
+		err = pool_block_err(pool, vaddr, dma);
+	else {
+		if (want_init_on_free())
+			memset(vaddr, 0, pool->size);
+	}
+
+	if (!err) {
 		pool_block_push(pool, block, dma);
 		pool->nr_active--;
 	}
-- 
2.9.4
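
The behavioural change described in the commit message can be pictured
with a small userspace analogy. This is only a sketch under stated
assumptions: a plain bool stands in for the static key, the *_sketch
names are hypothetical, and the poison value is illustrative. A real
static key patches the branch at runtime instead of testing a variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_POISON_FREED_SKETCH 0xa7 /* illustrative poison byte */

/* Plain flag standing in for the static key dmapool_debug_enabled. */
static bool debug_enabled_sketch;

/*
 * Stand-in for the __setup() handler: the mere presence of the option
 * turns debugging on. The substring match is deliberately naive.
 */
static void parse_cmdline_sketch(const char *cmdline)
{
	assert(cmdline != NULL);
	if (strstr(cmdline, "dmapool_debug"))
		debug_enabled_sketch = true;
}

/* Free path: poison when debugging, otherwise honour init-on-free. */
static void block_free_sketch(unsigned char *block, size_t size,
			      bool init_on_free)
{
	if (debug_enabled_sketch)
		memset(block, POOL_POISON_FREED_SKETCH, size);
	else if (init_on_free)
		memset(block, 0, size);
}
```

The point of the sketch is that the decision moves from build time to
boot time: the same binary either poisons freed blocks or does not,
depending only on the command line.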



* Re: [PATCH] crypto: af_alg - Document the deprecation of AF_ALG
From: Bastien Nocera @ 2026-06-23 12:44 UTC
  To: Eric Biggers, linux-crypto, Herbert Xu, Marcel Holtmann,
	Luiz Augusto von Dentz
  Cc: linux-doc, linux-api, linux-kernel, netdev, Linus Torvalds,
	linux-bluetooth, ell
In-Reply-To: <20260430011544.31823-1-ebiggers@kernel.org>

Hey,

Replying to this older patch.

On Wed, 2026-04-29 at 18:15 -0700, Eric Biggers wrote:
<snip>
> This isn't intended to change anything overnight.  After all, most Linux
> distros won't be able to disable the kconfig options quite yet, mainly
> because of iwd.  But this should create a bit more impetus for these
> userspace programs to be fixed, and the documentation update should also
> help prevent more users from appearing.

There are 2 other users that I know of: bluez, and the ell library
(used by iwd and bluez).

From what I could tell, bluetoothd uses AF_ALG for cryptography:
https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/src/shared/crypto.c
https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/tools/mesh-gatt/crypto.c

It uses "ecb(aes)" and "cmac(aes)" as algorithms.

Finally, it also uses them both again:
https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/mesh/crypto.c
through ell:
https://git.kernel.org/pub/scm/libs/ell/ell.git/tree/ell/cipher.c

Because that's a question that also came up, bluetoothd also uses the
CAP_NET_ADMIN capability.

I'll let Luiz and Marcel take it over from here.

Cheers


* Re: [RFC PATCH v2 03/10] kvm: Prepare core VM structs and helpers for LUO support
From: tarunsahu @ 2026-06-23 12:48 UTC
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgGharGxs9s_ow0Z4iiQ9PCzdghch-4Fk6UMjiPP9tX-5g@mail.gmail.com>


Hi,

Thanks for reviewing the patch.


Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>> Introduce core infrastructure to support VM preservation with LUO.
>>
>> First two changes are just refactoring, no functional change, third
>> change introduces a new member in struct kvm.
>> - Move ITOA_MAX_LEN to kvm_mm.h for reuse by upcoming kvm_luo code.
>> - Add a public kvm_create_vm_file() helper wrapping kvm_create_vm()
>>   and anon_inode_getfile() to provide a unified VM file creation API.
>> - Track a weak reference to the backing file in struct kvm under
>>   CONFIG_LIVEUPDATE_GUEST_MEMFD to enable reverse file resolution
>>   without circular lifetime dependencies.
>>
>
> Given the above, I think this should be separate patches.

Okay.

>
>> Signed-off-by: Tarun Sahu <tarunsahu@google.com>
>> ---
>>  include/linux/kvm_host.h | 14 +++++++
>>  virt/kvm/kvm_main.c      | 79 +++++++++++++++++++++++++++++-----------
>>  virt/kvm/kvm_mm.h        |  3 ++
>>  3 files changed, 75 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 4c14aee1fb06..9111a28637af 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -874,6 +874,18 @@ struct kvm {
>>  #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>>  	/* Protected by slots_lock (for writes) and RCU (for reads) */
>>  	struct xarray mem_attr_array;
>> +#endif
>> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
>> +	/*
>> +	 * Weak reference to the VFS file backing this KVM instance. Stored
>> +	 * without incrementing the file refcount to prevent a circular lifetime
>> +	 * dependency (since file->private_data already pins this struct kvm).
>> +	 * Used exclusively to resolve the file pointer back from struct kvm.
>> +	 *
>> +	 * Written/cleared via rcu_assign_pointer() and read locklessly under
>> +	 * RCU (e.g. via get_file_active() to prevent ABA races).
>> +	 */
>> +	struct file *vm_file;
>>  #endif
>
> We didn't really talk about this during the calls, but it seems weird to
> preserve a vm_file with pretty much nothing other than the vm type. The
> entire VM is re-created, which means it could potentially be a
> completely different VM?
>
> In some sense it's more flexible since the guest_memfd can be restored
> with some completely different VM, but it seems like it could introduce
> other issues.
>
> I think other KVM folks would probably have more thoughts here.

IIUC, you are asking: "Why preserve vm_fd with guest_memfd when we only
preserve vm_type?"

We discussed this. It is also explained here (and copied below):
[RFC PATCH v2 04/10] kvm: kvm_luo: Allow kvm preservation with LUO
https://lore.kernel.org/all/8730c0e11acbd0d645a8b7187cd5cd7de373380e.1780676742.git.tarunsahu@google.com/

and
https://lore.kernel.org/all/cover.1780667929.git.tarunsahu@google.com/
(This cover letter was sent separately from the patches due to a problem
in my automated script)

vm_fd is needed for guest_memfd retrieval, because guest_memfd cannot
be retrieved without struct kvm and there is no other way to pass it.
(We talked about alternatives such as a LINK ioctl, or splitting the
CREATE_GUEST_MEMFD ioctl into two ioctls: one that only creates the
guest_memfd and another that attaches it to the vm_file (struct kvm).)
We discarded the alternatives because they change the guest_memfd design.

This patch also sets up the infrastructure to preserve the vm_fd, which
will be extended later when we introduce private support, where
TDX-related data (sPTE) might be preserved via struct kvm. vCPU state,
the IRQ routing table, etc. can also be preserved if needed.


>> +	struct file *vm_file;

If you are asking about the diff above (why vm_file is there): there is
otherwise no way to get the vm_file from struct kvm, and it is needed
during the freeze call of guest_memfd preservation to preserve the token
of the vm_fd. That token is used at retrieval time.
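
As a rough single-threaded illustration of the weak back-reference,
assuming hypothetical *_sketch types: the file pins the VM, while the VM
only records the file without taking a reference. The sketch leaves out
RCU and get_file_active(), which the real code relies on for lockless
readers.

```c
#include <assert.h>
#include <stddef.h>

struct vm_sketch;

struct file_sketch {
	int refcount;                   /* owners of the file */
	struct vm_sketch *private_data; /* strong: the file pins the VM */
};

struct vm_sketch {
	int refcount;
	struct file_sketch *vm_file;    /* weak: no file reference is taken */
};

/* Link a file to a VM; the VM's creation reference is handed to the file. */
static void attach_sketch(struct file_sketch *f, struct vm_sketch *vm)
{
	f->refcount = 1;
	f->private_data = vm;
	vm->vm_file = f; /* back-pointer only, f->refcount is unchanged */
}

/* Final release of the file: clear the weak pointer, then drop the VM ref. */
static void release_sketch(struct file_sketch *f)
{
	struct vm_sketch *vm = f->private_data;

	assert(vm != NULL);
	vm->vm_file = NULL; /* no dangling back-pointer outlives the file */
	f->private_data = NULL;
	f->refcount = 0;
	vm->refcount--;     /* drop the reference the file held */
}

/* Resolve the file from the VM; NULL once the file has been released. */
static struct file_sketch *vm_to_file_sketch(struct vm_sketch *vm)
{
	return vm->vm_file;
}
```

Because the VM never holds a file reference, closing the last file
descriptor can still tear the file down even while vCPUs keep the VM
alive, which is the circular dependency the weak pointer avoids.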


I have sent V3 as well here:
https://lore.kernel.org/all/20260622184851.2309827-1-tarunsahu@google.com/

V3 includes a few minor fixes suggested by Sashiko. We can continue
reviewing on either V2 or V3; I will include all of the suggestions in V4.

>
>>  	char stats_id[KVM_STATS_NAME_SIZE];
>>  };
>> @@ -1074,7 +1086,9 @@ void kvm_get_kvm(struct kvm *kvm);
>>  bool kvm_get_kvm_safe(struct kvm *kvm);
>>  void kvm_put_kvm(struct kvm *kvm);
>>  bool file_is_kvm(struct file *file);
>> +struct file *kvm_create_vm_file(unsigned long type, const char *fdname);
>>  void kvm_put_kvm_no_destroy(struct kvm *kvm);
>> +void kvm_uevent_notify_vm_create(struct kvm *kvm);
>>
>>  static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
>>  {
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 89489996fbc1..65f0c5fb353e 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -67,9 +67,6 @@
>>  #include <linux/kvm_dirty_ring.h>
>>
>>
>> -/* Worst case buffer size needed for holding an integer. */
>> -#define ITOA_MAX_LEN 12
>> -
>>  MODULE_AUTHOR("Qumranet");
>>  MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
>>  MODULE_LICENSE("GPL");
>> @@ -1349,6 +1346,19 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>>  {
>>  	struct kvm *kvm = filp->private_data;
>>
>> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
>> +	/*
>> +	 * Clear the weak reference of the vm file.
>> +	 * In case vm file is closed by userspace, but kvm still has
>> +	 * other users like vCPUs, clearing this pointer ensures
>> +	 * that we don't have a dangling pointer to a closed file.
>> +	 *
>> +	 * Cleared via rcu_assign_pointer() to ensure proper memory visibility
>> +	 * for concurrent lockless readers under RCU.
>> +	 */
>> +	rcu_assign_pointer(kvm->vm_file, NULL);
>> +#endif
>> +
>>  	kvm_irqfd_release(kvm);
>>
>>  	kvm_put_kvm(kvm);
>> @@ -5476,11 +5486,47 @@ bool file_is_kvm(struct file *file)
>>  }
>>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
>>
>> +struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
>> +{
>> +	struct kvm *kvm = kvm_create_vm(type, fdname);
>> +	struct file *file;
>> +
>> +	if (IS_ERR(kvm))
>> +		return ERR_CAST(kvm);
>> +
>> +	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
>> +	if (IS_ERR(file)) {
>> +		kvm_put_kvm(kvm);
>> +		return file;
>> +	}
>> +
>> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
>> +	/*
>> +	 * Weak reference to the file (without get_file()) to prevent a circular
>> +	 * dependency. Safe because the file's release path clears this pointer
>> +	 * and drops its reference to the VM.
>> +	 *
>> +	 * Written via rcu_assign_pointer() because the pointer can be read
>> +	 * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
>> +	 * get_file_active() to prevent lockless ABA races).
>> +	 */
>> +	rcu_assign_pointer(kvm->vm_file, file);
>> +#endif
>> +
>> +	/*
>> +	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
>> +	 * already set, with ->release() being kvm_vm_release().  In error
>> +	 * cases it will be called by the final fput(file) and will take
>> +	 * care of doing kvm_put_kvm(kvm).
>> +	 */
>> +
>> +	return file;
>> +}
>> +
>>  static int kvm_dev_ioctl_create_vm(unsigned long type)
>>  {
>>  	char fdname[ITOA_MAX_LEN + 1];
>>  	int r, fd;
>> -	struct kvm *kvm;
>>  	struct file *file;
>>
>>  	fd = get_unused_fd_flags(O_CLOEXEC);
>> @@ -5489,31 +5535,17 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>>
>>  	snprintf(fdname, sizeof(fdname), "%d", fd);
>>
>> -	kvm = kvm_create_vm(type, fdname);
>> -	if (IS_ERR(kvm)) {
>> -		r = PTR_ERR(kvm);
>> -		goto put_fd;
>> -	}
>> -
>> -	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
>> +	file = kvm_create_vm_file(type, fdname);
>>  	if (IS_ERR(file)) {
>>  		r = PTR_ERR(file);
>> -		goto put_kvm;
>> +		goto put_fd;
>>  	}
>>
>> -	/*
>> -	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
>> -	 * already set, with ->release() being kvm_vm_release().  In error
>> -	 * cases it will be called by the final fput(file) and will take
>> -	 * care of doing kvm_put_kvm(kvm).
>> -	 */
>> -	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
>> +	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, file->private_data);
>
> Notifying with file->private_data threw me off... I would rather inline
> the rcu_assign_pointer() in this function and have this line read
> notify(..., kvm) like before.
>
>>
>>  	fd_install(fd, file);
>>  	return fd;
>>
>> -put_kvm:
>> -	kvm_put_kvm(kvm);
>>  put_fd:
>>  	put_unused_fd(fd);
>>  	return r;
>> @@ -6341,6 +6373,11 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
>>  	kfree(env);
>>  }
>>
>> +void kvm_uevent_notify_vm_create(struct kvm *kvm)
>> +{
>> +	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
>> +}
>> +
>>  static void kvm_init_debug(void)
>>  {
>>  	const struct file_operations *fops;
>> diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
>> index 9fcc5d5b7f8d..7aa1d65c3d46 100644
>> --- a/virt/kvm/kvm_mm.h
>> +++ b/virt/kvm/kvm_mm.h
>> @@ -3,6 +3,9 @@
>>  #ifndef __KVM_MM_H__
>>  #define __KVM_MM_H__ 1
>>
>> +/* Worst case buffer size needed for holding an integer as a string. */
>> +#define ITOA_MAX_LEN 12
>> +
>>  /*
>>   * Architectures can choose whether to use an rwlock or spinlock
>>   * for the mmu_lock.  These macros, for use in common code
>> --
>> 2.54.0.1032.g2f8565e1d1-goog


* Re: [PATCH] docs/mm: clarify that we are not looking for LLM generated content
From: Jonathan Corbet @ 2026-06-23 12:59 UTC
  To: David Hildenbrand (Arm), linux-doc
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Matthew Wilcox, Harry Yoo, linux-mm, linux-kernel
In-Reply-To: <d421c081-8686-4d46-8452-e543401b0503@kernel.org>

"David Hildenbrand (Arm)" <david@kernel.org> writes:

> On 4/20/26 23:03, David Hildenbrand (Arm) wrote:
>> Let's make it clear that we are not looking for LLM generated content
>> from contributors not familiar with the details of MM, as it shifts the
>> real work onto reviewers.
>> 
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>> ---
>>  Documentation/mm/index.rst | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>> 
>> diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
>> index 7aa2a8886908..13a79f5d092c 100644
>> --- a/Documentation/mm/index.rst
>> +++ b/Documentation/mm/index.rst
>> @@ -7,6 +7,19 @@ of Linux.  If you are looking for advice on simply allocating memory,
>>  see the :ref:`memory_allocation`.  For controlling and tuning guides,
>>  see the :doc:`admin guide <../admin-guide/mm/index>`.
>>  
>> +.. note::
>> +
>> +  Unfortunately, parts of this guide are still incomplete or missing.
>> +  While we appreciate contributions, documentation in this area is hard
>> +  to get right and requires a lot of attention to detail.  New contributors
>> +  should reach out to the relevant maintainers early.
>> +
>> +  This guide is expected to reflect reality, which requires contributors
>> +  to have a detailed understanding.  Documentation generated with LLMs
>> +  by contributors unfamiliar with these details shifts the real work onto
>> +  reviewers, which is why such contributions will be rejected without
>> +  further comment.
>> +
>>  .. toctree::
>>     :maxdepth: 1
>>  
>> 
>> ---
>> base-commit: da6b5aae84beb0917ecb0c9fbc71169d145397ff
>> change-id: 20260420-llmdoc-21bf5fadbd6f
>> 
>> Best regards,
>
> I assume this was not picked up yet? (via documentation or mm tree?)

I had figured Andrew would grab it; I can certainly do so if you'd like.

jon

^ permalink raw reply

* Re: [PATCH v4 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
From: Amit Machhiwal @ 2026-06-23 13:01 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Amit Machhiwal, linuxppc-dev, Madhavan Srinivasan,
	Anushree Mathur, Paolo Bonzini, Nicholas Piggin, Michael Ellerman,
	Christophe Leroy (CS GROUP), Jonathan Corbet, Shuah Khan, kvm,
	linux-kernel, linux-doc, lkp
In-Reply-To: <878q8bcbh6.fsf@vajain21.in.ibm.com>

Hi Vaibhav,

Thanks for reviewing this patch. My response is inline.

On 2026/06/19 11:34 AM, Vaibhav Jain wrote:
> Hi Amit.
> 
> Thanks for the patch and incorporating V3 review comments. Further
> review comments inline below:
> 
> Amit Machhiwal <amachhiw@linux.ibm.com> writes:
> 
> > On POWER systems, the host CPU may run in a compatibility mode (e.g., a
> > Power11 processor operating in Power10 compatibility mode). In such
> > cases, the effective CPU level exposed to guests differs from the
> > physical processor generation.
> >
> > When running nested KVM guests, QEMU derives the host CPU type using
> > mfpvr(), which reflects the physical processor version. This can result
> > in a mismatch between the CPU model selected by QEMU and the
> > compatibility mode enforced by the host, leading to guest boot failures.
> >
> > For example, booting a nested guest on a Power11 LPAR configured in
> > Power10 compatibility mode fails with:
> >
> >   KVM-NESTEDv2: couldn't set guest wide elements
> >   [..KVM reg dump..]
> >
> > This occurs because QEMU selects a CPU model corresponding to the
> > physical processor (via mfpvr()), while the host operates in a lower
> > compatibility mode. As a result, KVM rejects the requested compatibility
> > level during guest initialization.
> >
> > Add support for retrieving host CPU compatibility capabilities for
> > nested guests on PowerVM (PAPR nested API v2). The hypervisor provides
> > the effective compatibility levels via the H_GUEST_GET_CAPABILITIES
> > hcall, which reflects the processor modes negotiated between the Power
> > hypervisor (L0) and the host partition (L1).
> >
> > On pseries systems, obtain the capability bitmap using
> > plpar_guest_get_capabilities() and return it via struct
> > kvm_ppc_compat_caps. The implementation defines KVM-specific capability
> > constants (KVM_PPC_COMPAT_CAP_POWER9/10/11) and applies masking to ensure
> > only supported processor modes are exposed to userspace. This information
> > is then exposed through the KVM_PPC_GET_COMPAT_CAPS ioctl.
> >
> > Hook the implementation into the Book3S HV kvmppc_ops so that it can be
> > invoked by the generic KVM ioctl handling code.
> >
> > Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > ---
> >  arch/powerpc/include/uapi/asm/kvm.h | 11 ++++++++++-
> >  arch/powerpc/kvm/book3s_hv.c        | 17 +++++++++++++++++
> >  2 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> > index 8a38be6c3b03..730488681443 100644
> > --- a/arch/powerpc/include/uapi/asm/kvm.h
> > +++ b/arch/powerpc/include/uapi/asm/kvm.h
> > @@ -443,7 +443,16 @@ struct kvm_ppc_compat_caps {
> >  	__u64	size;			/* Size of this structure */
> >  	__u64	compat_capabilities;	/* Capabilities supported by the host */
> >  };
> > -
> > +/*
> > + * Capability bits for compat_capabilities field in kvm_ppc_compat_caps.
> > + * These bits indicate which processor compatibility modes are supported.
> > + */
> > +#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
> > +#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
> > +#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)
> > +#define KVM_PPC_COMPAT_BITMASK		(KVM_PPC_COMPAT_CAP_POWER9 | \
> > +					 KVM_PPC_COMPAT_CAP_POWER10 | \
> > +					 KVM_PPC_COMPAT_CAP_POWER11)
> >  /*
> >   * Values for character and character_mask.
> >   * These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index f9380ef65750..f674386df62c 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -6523,6 +6523,22 @@ static bool kvmppc_hash_v3_possible(void)
> >  	return true;
> >  }
> >  
> > +
> > +static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> > +{
> > +	unsigned long capabilities = 0;
> > +	long rc = -EINVAL;
> > +
> > +	if (kvmhv_on_pseries()) {
> > +		if (kvmhv_is_nestedv2())
> > +			rc = plpar_guest_get_capabilities(0,
> > &capabilities);
> I think instead of making the hcall you should use the
> 'nested_capabilities' extern symbol as it would already hold the same
> value. This symbol is already accessible in 'book3s_hv.c'

Agreed! I will use nested_capabilities directly instead of making the
hcall. This avoids the overhead of an extra hcall, since the value is
already cached during module initialization (in kvmhv_nested_init()).
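
For illustration only, the masking step applied to an already-cached
value can be sketched in userspace C as below. The bit definitions are
copied from this patch; compat_caps_from_cached() and its cached_caps
argument are hypothetical names standing in for the kernel code and for
nested_capabilities, not the actual v5 change.

```c
#include <assert.h>
#include <stdint.h>

/* Bit definitions copied from the uapi hunk in this patch. */
#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)
#define KVM_PPC_COMPAT_BITMASK		(KVM_PPC_COMPAT_CAP_POWER9 | \
					 KVM_PPC_COMPAT_CAP_POWER10 | \
					 KVM_PPC_COMPAT_CAP_POWER11)

/*
 * Hypothetical helper: filter a cached capability word down to the
 * compat bits that are exposed to userspace. No hcall is involved.
 */
static uint64_t compat_caps_from_cached(uint64_t cached_caps)
{
	uint64_t caps = cached_caps & KVM_PPC_COMPAT_BITMASK;

	/* Masking can only clear bits, never set new ones. */
	assert((caps & ~cached_caps) == 0);
	return caps;
}
```

Any bit outside KVM_PPC_COMPAT_BITMASK in the cached word is dropped, so
userspace only ever sees the three processor-mode bits.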

Thanks,
Amit

> 
> > +		host_caps->compat_capabilities = capabilities &
> > +							KVM_PPC_COMPAT_BITMASK;
> > +	}
> > +
> > +	return rc;
> > +}
> > +
> >  static struct kvmppc_ops kvm_ops_hv = {
> >  	.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
> >  	.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
> > @@ -6565,6 +6581,7 @@ static struct kvmppc_ops kvm_ops_hv = {
> >  	.hash_v3_possible = kvmppc_hash_v3_possible,
> >  	.create_vcpu_debugfs = kvmppc_arch_create_vcpu_debugfs_hv,
> >  	.create_vm_debugfs = kvmppc_arch_create_vm_debugfs_hv,
> > +	.get_compat_caps = kvmppc_get_compat_caps,
> >  };
> >  
> >  static int kvm_init_subcore_bitmap(void)
> > -- 
> > 2.50.1 (Apple Git-155)
> >
> >
> 
> -- 
> Cheers
> ~ Vaibhav

^ permalink raw reply

* Re: [RFC PATCH] reserve_mem: add support for static memory
From: Pratyush Yadav @ 2026-06-23 13:10 UTC (permalink / raw)
  To: Shyam Saini
  Cc: linux-mm, linux-doc, linux-kernel, rppt, akpm, kees, tony.luck,
	gpiccoli, bp, rdunlap, peterz, feng.tang, dapeng1.mi, elver,
	enelsonmoore, kuba, lirongqing, ebiggers
In-Reply-To: <20260618224018.117978-1-shyamsaini@linux.microsoft.com>

On Thu, Jun 18 2026, Shyam Saini wrote:

> reserve_mem relies on dynamic memory allocation, this limits the
> usecase where memory and its address is required to be preserved
> across the boots. Eg: ramoops memory reservation on ACPI platforms
>
> So add support to pass a pre-determined static address and reserve
> memory at this specified address. This enables use case like ramoops
> on ACPI platforms to reliably access ramoops region across the boots.

Doesn't memmap= do exactly this? How is this different?

I always thought the point of reserve_mem was that you _don't_ have to
provide an explicit address; one is chosen for your machine
automatically.

>
> Also skip parsing of "align" parameter when static address is passed.
>
> Example syntax for static address
>  reserve_mem=4M@0x1E0000000:oops ramoops.mem_name=oops
>
> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
[...]

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v4 1/5] mm/zswap: Extend shrink_memcg() writeback capability
From: Hao Jia @ 2026-06-23 13:22 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia
In-Reply-To: <ajnB8IZrFZwbIr9P@google.com>



On 2026/6/23 07:33, Yosry Ahmed wrote:
> On Thu, Jun 18, 2026 at 12:48:53PM +0800, Hao Jia wrote:
>> From: Hao Jia <jiahao1@lixiang.com>
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 761cd699e0a3..d7d031dee4cd 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -160,6 +160,11 @@ struct zswap_pool {
>>   	char tfm_name[CRYPTO_MAX_ALG_NAME];
>>   };
>>   
>> +struct zswap_shrink_walk_arg {
>> +	unsigned long bytes_written;
>> +	bool encountered_page_in_swapcache;
>> +};
>> +
>>   /* Global LRU lists shared by all zswap pools. */
>>   static struct list_lru zswap_list_lru;
>>   
>> @@ -1089,8 +1094,9 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>>   				       void *arg)
>>   {
>>   	struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
>> -	bool *encountered_page_in_swapcache = (bool *)arg;
>> +	struct zswap_shrink_walk_arg *walk_arg = arg;
>>   	swp_entry_t swpentry;
>> +	unsigned int length;
>>   	enum lru_status ret = LRU_REMOVED_RETRY;
>>   	int writeback_result;
>>   
>> @@ -1135,8 +1141,13 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>>   	 * Once the lru lock is dropped, the entry might get freed. The
>>   	 * swpentry is copied to the stack, and entry isn't deref'd again
>>   	 * until the entry is verified to still be alive in the tree.
>> +	 *
>> +	 * entry->length is also copied while the lock is held, because
>> +	 * zswap_writeback_entry() frees the entry on success and we still
>> +	 * need its compressed size to account for writeback.
> 
> Hmm that's unnecessary, just update "The swpentry is copied to the
> stack.." above to "Copy needed fields to the stack.." or something.

I'll do this, thanks.

> 
>>   	 */
>>   	swpentry = entry->swpentry;
>> +	length = entry->length;
>>   
>>   	/*
>>   	 * It's safe to drop the lock here because we return either
>> @@ -1155,12 +1166,13 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>>   		 * into the warmer region. We should terminate shrinking (if we're in the dynamic
>>   		 * shrinker context).
>>   		 */
>> -		if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
>> +		if (writeback_result == -EEXIST) {
>>   			ret = LRU_STOP;
>> -			*encountered_page_in_swapcache = true;
>> +			walk_arg->encountered_page_in_swapcache = true;
>>   		}
>>   	} else {
>>   		zswap_written_back_pages++;
>> +		walk_arg->bytes_written += length;
>>   	}
>>   
>>   	return ret;
>> @@ -1169,8 +1181,11 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>>   static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
>>   		struct shrink_control *sc)
>>   {
>> +	struct zswap_shrink_walk_arg walk_arg = {
>> +		.bytes_written = 0,
>> +		.encountered_page_in_swapcache = false,
>> +	};
>>   	unsigned long shrink_ret;
>> -	bool encountered_page_in_swapcache = false;
>>   
>>   	if (!zswap_shrinker_enabled ||
>>   			!mem_cgroup_zswap_writeback_enabled(sc->memcg)) {
>> @@ -1179,9 +1194,9 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
>>   	}
>>   
>>   	shrink_ret = list_lru_shrink_walk(&zswap_list_lru, sc, &shrink_memcg_cb,
>> -		&encountered_page_in_swapcache);
>> +		&walk_arg);
>>   
>> -	if (encountered_page_in_swapcache)
>> +	if (walk_arg.encountered_page_in_swapcache)
>>   		return SHRINK_STOP;
>>   
>>   	return shrink_ret ? shrink_ret : SHRINK_STOP;
>> @@ -1275,10 +1290,32 @@ static struct shrinker *zswap_alloc_shrinker(void)
>>   	return shrinker;
>>   }
>>   
>> -static int shrink_memcg(struct mem_cgroup *memcg)
>> -{
>> -	int nid, shrunk = 0, scanned = 0;
>> +/*
>> + * The maximum acceptable scan cost factor for writing back
>> + * PAGE_SIZE bytes of compressed data.
>> + */
>> +#define ZSWAP_WB_SCAN_FACTOR	16UL
>> +#define NR_ZSWAP_WB_BATCH	64UL
>>   
>> +/*
>> + * Iterate over the per-node zswap LRUs of @memcg in batches, writing back
>> + * up to @nr_to_writeback * PAGE_SIZE bytes of compressed data.
>> + *
>> + * Return: The number of bytes written back, or -ENOENT if @memcg has
>> + * writeback disabled, is a zombie cgroup, or has empty zswap LRUs.
>> + */
>> +static long shrink_memcg(struct mem_cgroup *memcg,
>> +			 unsigned long nr_to_writeback)
> 
> 
> Is nr_to_writeback supposed to be the number of pages we want to
> writeback (regardless of their compressed size), or the compressed bytes
> we want to writeback divided by PAGE_SIZE?
> 
> The way it's being used below seems like it's the latter, but the batch
> size should be in terms of scanned pages (i.e. uncompressed pages). So
> this is confusing.
> 
> The zswap_store() path expects to reclaim one uncompressed page, but
> this will reclaim PAGE_SIZE worth of compressed memory when passing 1
> IIUC (actually maybe more, see below).
> 
>> +{
>> +	struct zswap_shrink_walk_arg walk_arg = {
>> +		.bytes_written = 0,
>> +		.encountered_page_in_swapcache = false,
>> +	};
>> +	u64 bytes_to_writeback = nr_to_writeback << PAGE_SHIFT;
>> +	bool memcg_list_is_empty = true;
>> +	int nid;
>> +
>> +	/* Memcg with zswap writeback disabled are not candidates. */
> 
> The comment is unnecessary here, it should be obvious.

I'll do this, thanks.
> 
>>   	if (!mem_cgroup_zswap_writeback_enabled(memcg))
>>   		return -ENOENT;
>>   
>> @@ -1290,24 +1327,65 @@ static int shrink_memcg(struct mem_cgroup *memcg)
>>   		return -ENOENT;
>>   
>>   	for_each_node_state(nid, N_NORMAL_MEMORY) {
>> -		unsigned long nr_to_walk = 1;
>> +		unsigned long nr_to_scan, nr_scanned = 0;
>> +		unsigned long remain;
>> +		walk_arg.encountered_page_in_swapcache = false;
>> +		/*
>> +		 * Cap by LRU length: bounds rewalks when referenced
>> +		 * entries keep rotating to the tail.
>> +		 */
>> +		nr_to_scan = list_lru_count_one(&zswap_list_lru, nid, memcg);
>> +		if (!nr_to_scan)
>> +			continue;
> 
> Hmm generally if we are running out of pages to scan then we should scan
> the rotated entries, and reclaim them on the second pass, right? So this
> should be working as intended. But I guess this doesn't work well when
> iterating multiple memcgs, as we don't want to drain referenced entries
> in one memcg before reclaiming already rotated entries on another.
> 
> So I think the assumption here is that the caller will retry if needed,
> handling balancing scanning between multiple memcgs if needed. Maybe we
> should document this in the function doc above? We should explain that
> referenced entries will be rotated but not reclaimed as part of the same
> call.
> 
>> +		memcg_list_is_empty = false;
>> +
>> +		/*
>> +		 * Cap by SCAN_FACTOR * remain budget: bounds scan cost
>> +		 * to the remaining writeback budget.
>> +		 */
>> +		remain = DIV_ROUND_UP(bytes_to_writeback - walk_arg.bytes_written, PAGE_SIZE);
>> +		nr_to_scan = min(nr_to_scan,
>> +				 remain * ZSWAP_WB_SCAN_FACTOR);
> 
> For the zswap_store() path bytes_to_writeback=PAGE_SIZE, so remain will
> initially be 1. But then we multiply by this factor and now scan 16
> pages? Also, where did this factor and equation come from?
> 
> We'll also loop over nodes, so we may end up scanning 32 or more pages
> depending on the number of nodes in the system.
> 
> If this is just a heuristic, we should really just start simple and add
> heuristics later as needed. The caller should probably pass in the
> number of pages to scan (i.e. uncompressed pages), and leave it to the
> caller to decide when to retry if the actual memory savings are
> realized.
> 
>>   
>> -		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
>> -					    &shrink_memcg_cb, NULL, &nr_to_walk);
>> -		scanned += 1 - nr_to_walk;
>> +		while (nr_scanned < nr_to_scan) {
>> +			unsigned long nr_to_walk = min(NR_ZSWAP_WB_BATCH,
>> +						       nr_to_scan - nr_scanned);
>> +
>> +			/*
>> +			 * Account for the committed budget rather than the walker's
>> +			 * actual delta. If the list is emptied concurrently, the
>> +			 * walker visits nothing and nr_scanned would never advance.
>> +			 */
>> +			nr_scanned += nr_to_walk;
>> +
>> +			list_lru_walk_one(&zswap_list_lru, nid, memcg,
>> +					  &shrink_memcg_cb,
>> +					  &walk_arg,
>> +					  &nr_to_walk);
>> +
>> +			if (walk_arg.bytes_written >= bytes_to_writeback)
>> +				return walk_arg.bytes_written;
>> +
>> +			if (walk_arg.encountered_page_in_swapcache)
>> +				break;
>> +
>> +			cond_resched();
>> +		}
> 
> If the caller is expected to have a retry loop anyway, should we
> simplify this and just scan each per-node LRU once?
> 
> We should also probably bail early if the number of scanned pages has
> already been reached? Currently shrink_memcg() scans one page at a time,
> so if it scans a bit more to balance between the nodes it's probably
> fine.
> 
> But with batching, we could end up scanning hundreds of extra pages just
> to balance between all nodes. Is node imbalance a real issue?
> 

My initial thought was that if cold memory is evenly distributed across 
nodes and we are doing a large writeback, it would be better to balance 
the zswap entry writeback across all nodes rather than just draining 
node 0 first. However, since we currently lack a proper metric to 
represent hot/cold memory (such as age-based tracking), doing this 
probably doesn't make much sense right now.

So, perhaps we want something like this? Please correct me if I'm wrong.

static long shrink_memcg(struct mem_cgroup *memcg,
        unsigned long nr_to_scan)
{
   struct zswap_shrink_walk_arg walk_arg = {
     .bytes_written = 0,
     .encountered_page_in_swapcache = false,
   };
   unsigned long nr_remaining = nr_to_scan;
   bool memcg_list_is_empty = true;
   int nid;

   if (!mem_cgroup_zswap_writeback_enabled(memcg))
     return -ENOENT;

   if (memcg && !mem_cgroup_online(memcg))
     return -ENOENT;

   for_each_node_state(nid, N_NORMAL_MEMORY) {
     unsigned long nr_to_walk;

     /*
      * Cap the per-node scan by the current LRU length. A referenced
      * entry is only rotated to the tail (second chance) and may be
      * revisited within a single walk; without this cap those rotated
      * entries could drain the shared scan budget on one node.
      */
     nr_to_walk = min(nr_remaining,
          list_lru_count_one(&zswap_list_lru, nid, memcg));
     if (!nr_to_walk)
       continue;
     memcg_list_is_empty = false;

     nr_remaining -= nr_to_walk;
     list_lru_walk_one(&zswap_list_lru, nid, memcg,
           &shrink_memcg_cb, &walk_arg, &nr_to_walk);
     /* Return the unused share of the budget to the pool. */
     nr_remaining += nr_to_walk;

     /* Bail out once the whole scan budget has been spent. */
     if (!nr_remaining)
       break;

     cond_resched();
   }

   if (memcg_list_is_empty)
     return -ENOENT;

   return walk_arg.bytes_written;
}
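
To sanity-check the budget accounting in the loop above, here is a small
userspace model. It is only a sketch under stated assumptions:
struct fake_node, fake_walk_one() and scan_nodes() are hypothetical
names, fake_walk_one() merely imitates how list_lru_walk_one()
decrements *nr_to_walk per visited entry, and "counted" vs. "actual"
models a list that shrank between list_lru_count_one() and the walk.

```c
#include <assert.h>
#include <stddef.h>

/* One per-node LRU: length seen by the count vs. length seen by the walk. */
struct fake_node {
	unsigned long counted;
	unsigned long actual;
};

/* Visit up to *nr_to_walk entries, decrementing it once per entry. */
static unsigned long fake_walk_one(struct fake_node *node,
				   unsigned long *nr_to_walk)
{
	unsigned long visited = 0;

	while (*nr_to_walk && node->actual) {
		node->actual--;
		(*nr_to_walk)--;
		visited++;
	}
	return visited;
}

/* Share one scan budget across all nodes; return entries visited. */
static unsigned long scan_nodes(struct fake_node *nodes, size_t nr_nodes,
				unsigned long nr_to_scan)
{
	unsigned long nr_remaining = nr_to_scan;
	unsigned long total = 0;
	size_t nid;

	assert(nodes || !nr_nodes);

	for (nid = 0; nid < nr_nodes; nid++) {
		unsigned long nr_to_walk = nodes[nid].counted;

		/* Cap the per-node walk by the remaining budget. */
		if (nr_to_walk > nr_remaining)
			nr_to_walk = nr_remaining;
		if (!nr_to_walk)
			continue;

		nr_remaining -= nr_to_walk;
		total += fake_walk_one(&nodes[nid], &nr_to_walk);
		/* Return the unused share of the budget to the pool. */
		nr_remaining += nr_to_walk;

		if (!nr_remaining)
			break;
	}
	return total;
}
```

With a budget of 8 and a first node that reports 4 entries but only
holds 2 by the time it is walked, the 2 unused slots flow back into the
pool and are spent on the next node.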

Thanks,
Hao

^ permalink raw reply

* Re: [PATCH 1/4] nfs: store the full NFS fileid in inode->i_ino
From: Mark Brown @ 2026-06-23 13:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan,
	linux-nfs, linux-kernel, linux-doc
In-Reply-To: <e5ebc36c9a7e356c8d1b98ce3a9d1f3420177334.camel@kernel.org>

On Tue, Jun 23, 2026 at 07:04:47AM -0400, Jeff Layton wrote:
> On Mon, 2026-06-22 at 18:38 -0400, Jeff Layton wrote:

> > Note that it's trying to stuff the inode number field into an unsigned
> > long. Before this patch, the maps file would have printed the old
> > (hashed) inode number on 32-bit. Now, it prints the full 64-bit inode
> > number.

...

> > We could argue that this is a bug in the testcase. It assumes that the
> > maps file will never print a value larger than ULONG_MAX in that field,
> > and I don't see why it would make that assumption in this day and age.

It wouldn't be the first LTP test that had a bug in it.
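
As an illustration of the width issue (this is not the LTP patch
mentioned below, just a sketch), a parser that reads the inode field of
a maps line into a 64-bit variable keeps working on 32-bit builds where
unsigned long is too narrow. maps_line_inode() is a hypothetical helper
name, and the example assumes the usual "start-end perms offset
major:minor inode path" line layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the inode field from one line of
 * /proc/<pid>/maps. Returns 0 on success, -1 if the line does not parse.
 */
static int maps_line_inode(const char *line, uint64_t *inode)
{
	unsigned long long start, end, offset;
	unsigned int dev_major, dev_minor;
	char perms[5];

	assert(line && inode);

	/* The inode is scanned as a 64-bit value, independent of ULONG_MAX. */
	if (sscanf(line, "%llx-%llx %4s %llx %x:%x %" SCNu64,
		   &start, &end, perms, &offset,
		   &dev_major, &dev_minor, inode) != 7)
		return -1;

	return 0;
}
```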

> > Are there actual programs in the field that scrape the maps file that
> > might be affected by this change?

Not to my knowledge.

> This testcase patch should fix it. I'll plan to send this to the LTP
> list, but it would be nice if someone could confirm the fix on arm32:

I'll try to give it a spin, though my test setup for LTP makes that very
awkward (it's embedded into a rootfs image and built as part of that) so
I wouldn't wait for me.

^ permalink raw reply

* Re: [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
From: Amit Machhiwal @ 2026-06-23 13:31 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Amit Machhiwal, linuxppc-dev, Madhavan Srinivasan,
	Anushree Mathur, Paolo Bonzini, Nicholas Piggin, Michael Ellerman,
	Christophe Leroy (CS GROUP), Jonathan Corbet, Shuah Khan, kvm,
	linux-kernel, linux-doc, lkp
In-Reply-To: <875x3fcb3x.fsf@vajain21.in.ibm.com>

Hi Vaibhav,

Thanks for reviewing this patch. Please find my response inline.

On 2026/06/19 11:42 AM, Vaibhav Jain wrote:
> Hi Amit.
> 
> Thanks for the patch and incorporating V3 review comments. Further
> review comments inline below:
> 
> Amit Machhiwal <amachhiw@linux.ibm.com> writes:
> 
> > Currently, when booting a compatibility-mode KVM guest (L1) on a PowerNV
> > hypervisor (L0), the guest runs with the expected processor
> > compatibility level. However, when booting a nested KVM guest (L2)
> > inside the L1, QEMU derives the CPU model from the raw host PVR and
> > attempts to run the nested guest at that level, instead of honoring the
> > compatibility mode of the L1.
> >
> > Extend host CPU compatibility capability reporting to support nested
> > virtualization on PowerNV systems (PAPR nested API v1).
> >
> > For nested API v2 (PowerVM), compatibility capabilities are obtained
> > from the hypervisor via the H_GUEST_GET_CAPABILITIES hcall. This
> > information is not available on PowerNV systems.
> >
> > For nested API v1, derive the compatibility capabilities from the L1
> > guest by reading the "cpu-version" property from the device tree, which
> > reflects the effective (logical) processor compatibility level. Map this
> > value to the corresponding compatibility capability bitmap using
> > KVM-specific constants.
> >
> > Introduce a helper to translate CPU version values into KVM_PPC_COMPAT_CAP
> > bits and integrate it into kvmppc_get_compat_caps(). The implementation
> > applies masking to ensure only supported processor modes are exposed.
> >
> > This allows userspace to query host CPU compatibility modes on both
> > PowerVM and PowerNV platforms via the KVM_PPC_GET_COMPAT_CAPS ioctl.
> >
> > Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > ---
> >  arch/powerpc/kvm/book3s_hv.c | 37 +++++++++++++++++++++++++++++++++++-
> >  1 file changed, 36 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index f674386df62c..375e7a7fa9f8 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -6523,15 +6523,50 @@ static bool kvmppc_hash_v3_possible(void)
> >  	return true;
> >  }
> >  
> > +static int kvmppc_map_compat_capabilities(const __be32 cpu_version,
> > +				      unsigned long *capabilities)
> > +{
> > +	switch (cpu_version) {
> > +	case PVR_ARCH_31_P11:
> > +		*capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
> Do you need to do 'break' here instead of falling through. Since P11
> host can support P10 and P9 compat modes

I had addressed a similar comment from Harsh in v1 of the series here:

  https://lore.kernel.org/all/20260507202740.96fb259f-22-amachhiw@linux.ibm.com/

The current implementation with break statements is intentional. This
function (kvmppc_map_compat_capabilities()) is called only when booting
a nested KVM guest (L2) on **KVM on PowerNV**.

When the L1 KVM guest is booted in a compat mode, L2 is supposed to boot
with the **same PVR version** as that of the L1, which is already taken
care of with the current changes. If L2 needs to boot with a different
*lower* compat mode, it would use max-cpu-compat, which takes a
different code path for setting the compat.

Even if I included all lower compat modes in the compat caps for
**APIv1**, I don't think we would use those lower compat bits unless we
wanted to block a specific older compat mode for a given PVR level,
which we are neither doing in this series nor likely to want as a
restriction for APIv1.
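
To make the difference between the two behaviours concrete, here is a
userspace sketch (not kernel code). map_exact() mirrors the mapping with
break statements used in this patch, while map_cumulative() is a
hypothetical alternative that also reports every lower level. The
capability bits are copied from patch 2/4; the PVR_ARCH_* values are
assumed to match asm/reg.h and are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit definitions copied from patch 2/4 of this series. */
#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)

/* Logical PVR values, assumed to match asm/reg.h. */
#define PVR_ARCH_300	0x0f000005
#define PVR_ARCH_31	0x0f000006
#define PVR_ARCH_31_P11	0x0f000007

/* Exact mapping: report only the level the L1 itself runs at. */
static int map_exact(uint32_t cpu_version, uint64_t *caps)
{
	assert(caps);

	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*caps |= KVM_PPC_COMPAT_CAP_POWER11;
		break;
	case PVR_ARCH_31:
		*caps |= KVM_PPC_COMPAT_CAP_POWER10;
		break;
	case PVR_ARCH_300:
		*caps |= KVM_PPC_COMPAT_CAP_POWER9;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}

/* Hypothetical alternative: also report every lower level. */
static int map_cumulative(uint32_t cpu_version, uint64_t *caps)
{
	assert(caps);

	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*caps |= KVM_PPC_COMPAT_CAP_POWER11 |
			 KVM_PPC_COMPAT_CAP_POWER10 |
			 KVM_PPC_COMPAT_CAP_POWER9;
		break;
	case PVR_ARCH_31:
		*caps |= KVM_PPC_COMPAT_CAP_POWER10 |
			 KVM_PPC_COMPAT_CAP_POWER9;
		break;
	case PVR_ARCH_300:
		*caps |= KVM_PPC_COMPAT_CAP_POWER9;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

For an L1 at the Power11 level, the exact mapping yields only the
POWER11 bit, whereas the cumulative variant would set all three bits.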

Please let me know if you think otherwise.

> 
> > +		break;
> > +	case PVR_ARCH_31:
> > +		*capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
> > +		break;
> > +	case PVR_ARCH_300:
> > +		*capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> >  
> >  static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> >  {
> > +	struct device_node *np;
> >  	unsigned long capabilities = 0;
> > +	const __be32 *prop = NULL;
> >  	long rc = -EINVAL;
> > +	u32 cpu_version;
> >  
> >  	if (kvmhv_on_pseries()) {
> > -		if (kvmhv_is_nestedv2())
> > +		if (kvmhv_is_nestedv2()) {
> >  			rc = plpar_guest_get_capabilities(0, &capabilities);
> > +		} else {
> > +			for_each_node_by_type(np, "cpu") {
> > +				prop = of_get_property(np, "cpu-version", NULL);
> > +				if (prop) {
> > +					cpu_version = be32_to_cpup(prop);
> > +					break;
> > +				}
> > +			}
> > +			if (!prop)
> > +				return -EINVAL;
> > +			rc = kvmppc_map_compat_capabilities(cpu_version,
> > +								&capabilities);
> > +		}
> should you check for 'rc' error here before assigning 'capabilities' to
> 'host_caps->compat_capabilities' . I understand it will be set to '0'
> due to its initialization at the top of the function. But would be
> better to make it more explicit

Sure. The return value rc is checked by the caller but more error
checking is always good I guess. :) 

I'll add a check for rc along these lines:

  if (rc) {
  	return -EINVAL;
  }

  host_caps->compat_capabilities = capabilities &
					KVM_PPC_COMPAT_BITMASK;

Thanks,
Amit

> 
> >  		host_caps->compat_capabilities = capabilities &
> >  							KVM_PPC_COMPAT_BITMASK;
> >  	}
> > -- 
> > 2.50.1 (Apple Git-155)
> >
> >
> 
> -- 
> Cheers
> ~ Vaibhav

^ permalink raw reply

* Re: [PATCH v7 00/10] tracing/probes: Add more typecast features
From: Masami Hiramatsu @ 2026-06-23 13:54 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178217904992.643090.15726197350652241270.stgit@devnote2>

On Tue, 23 Jun 2026 10:44:10 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> Hi,
> 
> Here is the 7th version of series to introduce more typecast features
> to probe events. The previous version is here:
> 
>  https://lore.kernel.org/all/178201238795.570818.15573963115625446598.stgit@devnote2/
> 
> In this version, I added 2 new fix and cleanup patches and update
> according to Sashiko's review. [1/10] is a long-lived issue about
> @+FOFFS, which was wrongly adding offset twice. [2/10] is a clean
> up patch for renaming fetch_op name (good to dump it). 
> This is applicable against probes/core branch on linux-trace tree.

I'll take the first 2 patches to probes/core, since those
are an obvious fix and a cleanup.

Thanks,

> 
> Steve introduced BTF typecast feature for eprobe[1].
> This series extends it and add more options:
> 
> 1. Expanding BTF typecast to kprobe and fprobe.
>    (currently only function entry/exit)
> 
> 2. Introduce container_of like typecast. This adds a "assigned
>    member" option to the typecast.
> 
>    (STRUCT,MEMBER)VAR->ANOTHER_MEMBER
> 
>    This casts VAR to STRUCT type but the VAR is as the address
>    of STRUCT.MEMBER. In C, it is:
> 
>    container_of(VAR, STRUCT, MEMBER)->ANOTHER_MEMBER
> 
> 3. Support nested typecast, e.g.
> 
>    (STRUCT)((STRUCT2)VAR->MEMBER2)->MEMBER
> 
>    the nest level must be smaller than 3.
> 
> 4. Add $current variable to point "current" task_struct.
>    This is useful with typecast, e.g.
> 
>    (task_struct)$current->pid
> 
> 5. per-cpu dereference support.
> 
>    Introduce this_cpu_read(VAR) and this_cpu_ptr(VAR) to
>    access per-cpu data on the current CPU (accessing other CPU
>    data is not stable, because it can be changed.)
> 
>    You can access the member of per-cpu data structure using
>    typecast like:
> 
>    (STRUCT)this_cpu_ptr(VAR)->MEMBER
> 
> And added fetcharg dump feature (for debug) and updated test scripts
> to test part of them.
> 
> Thanks,
> 
> ---
> base-commit: 3ec75d0067f30eb5e0730f033766d6ab2feca7ae
> 
> Masami Hiramatsu (Google) (10):
>       tracing/probes: Fix double addition of offset for @+FOFFSET
>       tracing/probes: Rename FETCH_OP_DATA to FETCH_OP_IMMSTR
>       tracing/probes: Support dumping fetcharg program for debugging dynamic events
>       tracing/probes: Support typecast for various probe events
>       tracing/probes: Support nested typecast
>       tracing/probes: Type casting always involves nested calls
>       tracing/probes: Support field specifier option for typecast
>       tracing/probes: Add $current variable support
>       tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
>       tracing/probes: Add a new testcase for BTF typecasts
> 
> 
>  Documentation/trace/eprobetrace.rst                |    9 
>  Documentation/trace/fprobetrace.rst                |   10 
>  Documentation/trace/kprobetrace.rst                |   11 
>  kernel/trace/Kconfig                               |   11 
>  kernel/trace/trace.c                               |    8 
>  kernel/trace/trace_eprobe.c                        |    2 
>  kernel/trace/trace_fprobe.c                        |    2 
>  kernel/trace/trace_kprobe.c                        |    2 
>  kernel/trace/trace_probe.c                         |  582 ++++++++++++++++----
>  kernel/trace/trace_probe.h                         |   98 ++-
>  kernel/trace/trace_probe_tmpl.h                    |   27 +
>  kernel/trace/trace_uprobe.c                        |    3 
>  samples/trace_events/trace-events-sample.c         |   40 +
>  samples/trace_events/trace-events-sample.h         |   34 +
>  .../ftrace/test.d/dynevent/btf_probe_event.tc      |   51 ++
>  .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |   11 
>  .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc   |   11 
>  .../ftrace/test.d/kprobe/uprobe_syntax_errors.tc   |    5 
>  18 files changed, 756 insertions(+), 161 deletions(-)
>  create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
> 
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH] docs/mm: clarify that we are not looking for LLM generated content
From: David Hildenbrand (Arm) @ 2026-06-23 13:56 UTC (permalink / raw)
  To: Jonathan Corbet, linux-doc
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Matthew Wilcox, Harry Yoo, linux-mm, linux-kernel
In-Reply-To: <87wlvpct0b.fsf@trenco.lwn.net>

On 6/23/26 14:59, Jonathan Corbet wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> 
>> On 4/20/26 23:03, David Hildenbrand (Arm) wrote:
>>> Let's make it clear that we are not looking for LLM generated content
>>> from contributors not familiar with the details of MM, as it shifts the
>>> real work onto reviewers.
>>>
>>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>>> ---
>>>  Documentation/mm/index.rst | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
>>> index 7aa2a8886908..13a79f5d092c 100644
>>> --- a/Documentation/mm/index.rst
>>> +++ b/Documentation/mm/index.rst
>>> @@ -7,6 +7,19 @@ of Linux.  If you are looking for advice on simply allocating memory,
>>>  see the :ref:`memory_allocation`.  For controlling and tuning guides,
>>>  see the :doc:`admin guide <../admin-guide/mm/index>`.
>>>  
>>> +.. note::
>>> +
>>> +  Unfortunately, parts of this guide are still incomplete or missing.
>>> +  While we appreciate contributions, documentation in this area is hard
>>> +  to get right and requires a lot of attention to detail.  New contributors
>>> +  should reach out to the relevant maintainers early.
>>> +
>>> +  This guide is expected to reflect reality, which requires contributors
>>> +  to have a detailed understanding.  Documentation generated with LLMs
>>> +  by contributors unfamiliar with these details shifts the real work onto
>>> +  reviewers, which is why such contributions will be rejected without
>>> +  further comment.
>>> +
>>>  .. toctree::
>>>     :maxdepth: 1
>>>  
>>>
>>> ---
>>> base-commit: da6b5aae84beb0917ecb0c9fbc71169d145397ff
>>> change-id: 20260420-llmdoc-21bf5fadbd6f
>>>
>>> Best regards,
>>
>> I assume this was not picked up yet? (via documentation or mm tree?)
> 
> I had figured Andrew would grab it; I can certainly do so if you'd like.

Yes please. I guess I'll soon start grabbing stuff myself. Stay tuned. :)

-- 
Cheers,

David

^ permalink raw reply

* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: tarunsahu @ 2026-06-23 14:02 UTC (permalink / raw)
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgFEHciT3T9y+qEYRvXhDwfrggoU7Rm=f9hT3OrV+wgpNQ@mail.gmail.com>


Thanks for reviewing!

Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>> This patch introduces the freeze on gmem_inode which prevents
>
> Can't find the reference now, but commit messages should take the
> imperative mood and avoid "this patch" [*]
>
> [*] https://lore.kernel.org/all/YKRWNaqzo4GVDxHP@google.com/
>

ACK. Will take care of it.

>> the fallocate call and any new page fault allocation. This will avoid
>> gmem file modification when it is being preserved
>>
>> Used srcu lock to synchronise the freeze call, where write blocks
>> until all the reads are free. And reads are re-entrant.
>>
>> Incase fault fails, It return -EPERM and VM_EXIT to userspace. userspace
>> must handle this properly as every new fault will fail.
>>
>> Signed-off-by: Tarun Sahu <tarunsahu@google.com>
>>
>> [...snip...]
>>
>> @@ -105,12 +108,20 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>>  	if (!IS_ERR(folio))
>>  		return folio;
>>
>> +	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
>> +	if (kvm_gmem_is_frozen(inode)) {
>> +		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>> +		return ERR_PTR(-EPERM);
>> +	}
>> +
>>  	policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
>>  	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
>>  					 FGP_LOCK | FGP_CREAT,
>>  					 mapping_gfp_mask(inode->i_mapping), policy);
>>  	mpol_cond_put(policy);
>>
>> +	srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>> +
>>  	/*
>>  	 * External interfaces like kvm_gmem_get_pfn() support dealing
>>  	 * with hugepages to a degree, but internally, guest_memfd currently
>> @@ -273,16 +284,30 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
>>  static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
>>  			       loff_t len)
>>  {
>> +	struct inode *inode = file_inode(file);
>>  	int ret;
>> +	int idx;
>>
>> -	if (!(mode & FALLOC_FL_KEEP_SIZE))
>> -		return -EOPNOTSUPP;
>> +	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
>> +	if (kvm_gmem_is_frozen(inode)) {
>> +		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>> +		return -EPERM;
>> +	}
>
> fallocate may eventually go to kvm_gmem_get_folio(), so that would check
> kvm_gmem_is_frozen() twice. Is this meant to catch the punch hole case?
>

Right, this is to catch the punch hole case. Since the SRCU read lock is
re-entrant, I blocked the fallocate call completely.

>>
>> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
>> -		return -EOPNOTSUPP;
>> +	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>>
>> -	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
>> -		return -EINVAL;
>> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>> +
>> +	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>
> There's some reordering here. Why not let the validation happen like
> before, then check kvm_gmem_is_frozen()?
>
>>
>>  	if (mode & FALLOC_FL_PUNCH_HOLE)
>>  		ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
>>
>> [...snip...]
>>
>> +
>> +/**
>> + * kvm_gmem_freeze - Freeze or unfreeze a guest_memfd inode mapping.
>> + * @inode: The guest_memfd inode.
>> + * @freeze: True to freeze, false to unfreeze.
>> + *
>> + * This API is used strictly during the live update / preservation transition
>> + * window to prevent host userspace and guest-side faults from making any
>> + * mapping modifications (such as fallocate or page fault allocation)
>> + * to the guest_memfd page cache.
>> + *
>> + * Synchronization Strategy (Sleepable RCU):
>> + * To avoid high-contention VFS locks (like inode_lock or
>> + * filemap_invalidate_lock) on the vCPU page fault hot paths, this subsystem
>> + * implements a lightweight, system-wide Sleepable RCU (SRCU) mechanism
>> + * (`kvm_gmem_freeze_srcu`):
>> + *
>> + * Global vs. Per-Inode SRCU
>> + * ======================
>> + * A single system-wide global static `srcu_struct` is used instead of a
>> + * per-inode SRCU structure to completely prevent unprivileged users from
>> + * exhausting the host's per-CPU memory allocator. Because
>> + * `init_srcu_struct()` allocates per-CPU memory via `alloc_percpu()`, which
>> + * is not accounted by memory cgroups (memcg),
>> + * a per-inode SRCU structure would allow a tenant to bypass cgroup limits and
>> + * trigger a system-wide Out-of-Memory (OOM) crash simply by spawning a large
>> + * number of guest_memfd file descriptors (bounded only by RLIMIT_NOFILE).
>> + *
>> + * Flag Modification Note:
>> + * Since `GUEST_MEMFD_F_MAPPING_FROZEN` is the ONLY flag in
>> + * `GMEM_I(inode)->flags` that is mutated dynamically at runtime (all other
>> + * flags are creation-time flags which remain strictly read-only), there is
>> + * no possibility of concurrent bit-modification races. Therefore, a standard
>> + * `WRITE_ONCE` is fully safe and does not require complex `cmpxchg`
>> + * synchronization loops.
>> + */
>> +void kvm_gmem_freeze(struct inode *inode, bool freeze)
>> +{
>> +	u64 flags = READ_ONCE(GMEM_I(inode)->flags);
>> +
>> +	if (freeze)
>> +		flags |= GUEST_MEMFD_F_MAPPING_FROZEN;
>> +	else
>> +		flags &= ~GUEST_MEMFD_F_MAPPING_FROZEN;
>> +
>> +	WRITE_ONCE(GMEM_I(inode)->flags, flags);
>> +
>> +	if (freeze)
>> +		synchronize_srcu(&kvm_gmem_freeze_srcu);
>
> Why only synchronize on freeze but not unfreeze?

It was not needed, for the following reasons.

Freeze => True
When a user sets freeze to true:

"Preservation will be stalled until all currently ongoing allocations
have finished, and future allocations are already stopped."

Freeze => False
When a user unfreezes, in-flight allocation/fallocate calls will
return -EPERM, and future ones will succeed because freeze is set
to false. Synchronization would only stall the user; the behaviour
does not change.

Unless the user expects unfreeze to wait for all the ongoing
operations to drain.

>
>> +}
>> +
>>
>> [...snip...]
>>

^ permalink raw reply

* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: tarunsahu @ 2026-06-23 14:03 UTC (permalink / raw)
  To: Sean Christopherson, Ackerley Tng
  Cc: Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin, Shuah Khan,
	sagis, aneesh.kumar, skhawaja, vipinsh, Pratyush Yadav, david,
	dmatlack, mark.rutland, Paolo Bonzini, Mike Rapoport,
	Alexander Graf, axelrasmussen, linux-kselftest, kexec,
	linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <ajnOnzdknfwbuJ9g@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Jun 22, 2026, Ackerley Tng wrote:
>> Tarun Sahu <tarunsahu@google.com> writes:
>> 
>> > This patch introduces the freeze on gmem_inode which prevents
>> 
>> Can't find the reference now, but commit messages should take the
>> imperative mood and avoid "this patch" [*]
>
> From Documentation/process/submitting-patches.rst:
>
>   Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>   instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>   to do frotz", as if you are giving orders to the codebase to change
>   its behaviour.
>
> Documentation/process/maintainer-tip.rst and Documentation/process/maintainer-kvm-x86.rst
> elaborate more on the preferred style (I do most of the guest_memfd maintenance,
> and so for all intents and purpose it's bound by KVM x86 "rules").


Thanks! Will take care of that.

^ permalink raw reply

* Re: [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
From: Amit Machhiwal @ 2026-06-23 14:05 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Amit Machhiwal, linuxppc-dev, Madhavan Srinivasan,
	Anushree Mathur, Paolo Bonzini, Nicholas Piggin, Michael Ellerman,
	Christophe Leroy (CS GROUP), Jonathan Corbet, Shuah Khan, kvm,
	linux-kernel, linux-doc, lkp
In-Reply-To: <8733yjcb0y.fsf@vajain21.in.ibm.com>

Hi Vaibhav,

Thanks for reviewing this patch. Please find my response below.

On 2026/06/19 11:44 AM, Vaibhav Jain wrote:
> Hi Amit,
> 
> Thanks for the patch and incorporating V3 review comments. Further
> review comments inline below:
> 
> Amit Machhiwal <amachhiw@linux.ibm.com> writes:
> 
> > Add documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl to the KVM API
> > documentation.
> >
> > The ioctl exposes host processor compatibility modes supported for
> > nested KVM guests on PowerPC systems. The documentation includes
> > comprehensive error code descriptions, structure field definitions
> > including the size field for forward compatibility, and KVM-specific
> > capability bit constants.
> >
> > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 47 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 52bbbb553ce1..ba6feba74d7d 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6553,6 +6553,53 @@ KVM_S390_KEYOP_SSKE
> >    Sets the storage key for the guest address ``guest_addr`` to the key
> >    specified in ``key``, returning the previous value in ``key``.
> >  
> > +4.145 KVM_PPC_GET_COMPAT_CAPS
> > +-----------------------------
> > +:Capability: KVM_CAP_PPC_COMPAT_CAPS
> > +:Architectures: powerpc
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_ppc_compat_caps (out)
> > +:Returns: 0 on success, negative value on failure
> > +
> > +Errors include:
> > +
> > +  ======== ============================================================
> > +  EFAULT   if ``struct kvm_ppc_compat_caps`` cannot be read from or
> > +           written to userspace
> > +  EINVAL   if the ``size`` field is smaller than the current structure
> > +           size, or if the backend implementation fails to retrieve or
> > +           map CPU compatibility capabilities
> > +  ENOTTY   if the backend does not implement the ``get_compat_caps``
> > +           operation (e.g., on non-pseries platforms or when the
> > +           required KVM operations are not available)
> > +  ======== ============================================================
> > +
> > +IBM POWER system server-based processors provide a compatibility mode feature
> > +where an Nth generation processor can operate in modes consistent with earlier
> > +generations such as (N-1) and (N-2).
> > +
> > +This ioctl provides userspace with information about the CPU compatibility modes
> > +supported by the current host processor for booting the nested KVM guests on
> > +PowerNV (KVM nested APIv1) and PowerVM (KVM nested APIv2) platforms.
> > +
> 
> Please add a detail on how returned 'size' field can be less than what
> the userspace has sent and how it should be handled.

Good point! However, with the suggested change from patch-1 (using a
strict equality check `user_size != sizeof(host_caps)`), the returned
size will always equal what userspace sent, because we reject any size
mismatch with -EINVAL.

I'll update the documentation to clarify the strict size matching
behavior:

- Userspace must set size to sizeof(struct kvm_ppc_compat_caps) before
  the ioctl
- The kernel validates this with strict equality (!= check)
- Returns -EINVAL if size doesn't match
- On success, the returned size will always equal what was sent

This should eliminate the forward-compatibility scenario where the
returned size could differ from the sent size.
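
A minimal userspace sketch of the strict size contract described above,
not the actual kernel code. The struct below is a local mirror of the
layout quoted in this patch, and the names compat_caps_sketch,
caps_prepare() and caps_check_size() are hypothetical, introduced only
for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct quoted in this patch; not the real uapi header. */
struct compat_caps_sketch {
	uint64_t flags;               /* reserved for future use */
	uint64_t size;                /* size of this structure */
	uint64_t compat_capabilities; /* capabilities supported by the host */
};

/* Userspace side: zero the struct and fill in the size before the ioctl. */
static void caps_prepare(struct compat_caps_sketch *caps)
{
	assert(caps != NULL);
	memset(caps, 0, sizeof(*caps));
	caps->size = sizeof(*caps);
}

/*
 * Hypothetical model of the kernel-side validation: any size that is
 * not exactly sizeof(struct) is rejected with -EINVAL.
 */
static int caps_check_size(uint64_t user_size)
{
	if (user_size != sizeof(struct compat_caps_sketch))
		return -EINVAL;
	return 0;
}
```

With this check, both a smaller and a larger size are rejected, so a
successful call never returns a size different from the one sent.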

Thanks,
Amit

> 
> > +::
> > +
> > +  struct kvm_ppc_compat_caps {
> > +	__u64	flags;			/* Reserved for future use */
> > +	__u64	size;			/* Size of this structure */
> > +	__u64	compat_capabilities;	/* Capabilities supported by the host */
> > +  };
> > +
> > +The ``compat_capabilities`` bit field describes the processor compatibility
> > +modes supported by the host. For example, the following bits indicate support
> > +for specific processor modes.
> > +
> > +::
> > +
> > +  KVM_PPC_COMPAT_CAP_POWER9  (bit 1): KVM guests can run in Power9 processor mode
> > +  KVM_PPC_COMPAT_CAP_POWER10 (bit 2): KVM guests can run in Power10 processor mode
> > +  KVM_PPC_COMPAT_CAP_POWER11 (bit 3): KVM guests can run in Power11 processor mode
> > +
> >  .. _kvm_run:
> >  
> >  5. The kvm_run structure
> > -- 
> > 2.50.1 (Apple Git-155)
> >
> 
> -- 
> Cheers
> ~ Vaibhav

^ permalink raw reply

* [dlech:b4/iio-adc-ti-ads122c14 10/10] htmldocs: Documentation/iio/ads112c14.rst:57: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
From: kernel test robot @ 2026-06-23 14:05 UTC (permalink / raw)
  To: David Lechner (TI); +Cc: oe-kbuild-all, linux-doc

tree:   https://github.com/dlech/linux b4/iio-adc-ti-ads122c14
head:   9a48214781dd8525063938516a2a00ffb7b3c23c
commit: 9a48214781dd8525063938516a2a00ffb7b3c23c [10/10] WIP
compiler: clang version 22.1.8 (https://github.com/llvm/llvm-project ca7933e47d3a3451d81e72ac174dcb5aa28b59d1)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260623/202606231501.hG7m0kHG-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606231501.hG7m0kHG-lkp@intel.com/

All warnings (new ones prefixed by >>):

   in0_lcrit_fault_log             Set to 1 by a VIN undervoltage fault occurring.
   in0_crit_fault_log              Set to 1 by a VIN overvoltage fault occurring.
   curr1_crit_fault_log            Set to 1 by an overcurrent fault occurring.
   =======================         ========================================== [docutils]
   Documentation/iio/ads112c14.rst:56: ERROR: Unexpected indentation. [docutils]
>> Documentation/iio/ads112c14.rst:57: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
   WARNING: ./include/linux/damon.h:909 struct member 'probes' not described in 'damon_ctx'
   WARNING: ./include/linux/damon.h:909 struct member 'rnd_state' not described in 'damon_ctx'
   WARNING: ./include/linux/damon.h:909 struct member 'probes' not described in 'damon_ctx'
   WARNING: ./include/linux/damon.h:909 struct member 'rnd_state' not described in 'damon_ctx'
   Documentation/mm/memfd_preservation:7: ./mm/memfd_luo.c:13: ERROR: Unexpected section title.
--
   Documentation/userspace-api/landlock:596: ./include/uapi/linux/landlock.h:58: ERROR: Unknown target name: "network flags". [docutils]
   Documentation/userspace-api/landlock:596: ./include/uapi/linux/landlock.h:63: ERROR: Unknown target name: "scope flags". [docutils]
   Documentation/userspace-api/landlock:596: ./include/uapi/linux/landlock.h:24: ERROR: Unknown target name: "filesystem flags". [docutils]
   Documentation/userspace-api/landlock:605: ./include/uapi/linux/landlock.h:228: ERROR: Unknown target name: "filesystem flags". [docutils]
   Documentation/userspace-api/landlock:605: ./include/uapi/linux/landlock.h:251: ERROR: Unknown target name: "network flags". [docutils]
>> Documentation/iio/ads112c14.rst:22: WARNING: unknown document: '../devicetree/bindings/iio/adc/ti,ads112c14' [ref.doc]
   Documentation/networking/checksum-offloads:157: ./include/linux/skbuff.h:181: WARNING: Failed to create a cross reference. A title or caption not found: 'crc' [ref.ref]


vim +57 Documentation/iio/ads112c14.rst

    21	
  > 22	See :doc:`../devicetree/bindings/iio/adc/ti,ads112c14` for the complete
    23	binding description.
    24	
    25	The minimal device tree node looks like this::
    26	
    27	    adc@40 {
    28	        compatible = "ti,ads122c14";
    29	        reg = <0x40>;
    30	        avdd-supply = <&avdd>;
    31	        dvdd-supply = <&dvdd>;
    32	
    33	        /* TODO: add channel@ nodes here */
    34	    };
    35	
    36	Effect of devicetree on registers
    37	---------------------------------
    38	
    39	Device properties have the following effects on registers:
    40	
    41	* ``CLK_SEL`` and ``GPIO3_CFG`` are set if the ``clocks`` property is present.
    42	* ``GPIO2_SRC`` may be set when ``interrupts`` and ``interrupt-names`` for
    43	  "fault" is present.
    44	* ``GPIO3_SRC`` may be set when ``interrupts`` and ``interrupt-names`` for
    45	  "drdy" is present.
    46	
    47	Channel-specific properties have the following effects on registers only when
    48	a measurement is active on the channel:
    49	
    50	* ``AINP`` and ``AINN`` are set according to the ``single-channel`` or
    51	  ``differential-channel`` properties of each channel.
    52	* ``REF_SEL`` and ``REF_VAL`` will be set according to the value of the
    53	  ``ti,ref-source`` property.
    54	* When ``ti,ref-source`` is set to REFP-REFN:
    55	  * If ``refp-supply`` is given and not the same phandle as ``avdd-supply``,
    56	    ``REFP_BUF_EN`` will be set to 1, otherwise it will be 0.
  > 57	  * If ``refn-supply`` is given, ``REFN_BUF_EN`` will be set to 1, otherwise it
    58	    will be 0.
    59	* ``I1MUX`` and ``I2MUX`` will be set to the values of the ``excitation-channels``
    60	   property.
    61	* ``IUNIT``, ``I1MAG`` and ``I2MAG`` will be set to best fit according to the
    62	   values of the ``excitation-current-nanoamp`` property.
    63	* ``GC_EN`` will be set to 1 if the ``input-channel-rotation`` property is present,
    64	  otherwise it will be 0.
    65	* ``CODING`` is set based on the ``bipolar`` property.
    66	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [RFC PATCH 0/6] mm/damon: hardware-sampled access reports
From: Zeng Heng @ 2026-06-23 14:08 UTC (permalink / raw)
  To: Ravi Jonnalagadda, sj, akinobu.mita, damon, linux-mm,
	linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
In-Reply-To: <20260529165640.820-1-ravis.opensrc@gmail.com>

Hi Ravi,

On 2026/5/30 0:56, Ravi Jonnalagadda wrote:
> This series introduces a vendor and PMU-agnostic substrate inside DAMON
> that consumes hardware-sampled access reports through the standard
> perf-event interface.  Userspace selects the PMU through sysfs (raw
> type/config knobs), driving either Intel PEBS L3-miss sampling or AMD
> IBS Op sampling.
> 

[...]

> 
> Ravi Jonnalagadda (6):
>    mm/damon: add struct damon_perf_event{,_attr} and per-ctx perf_events
>      list
>    mm/damon/sysfs-sample: expose perf_events configuration via sysfs
>    mm/damon/sysfs: install perf_events on apply
>    mm/damon/core: per-CPU SPSC ring drain and damon_perf_event lifecycle
>    mm/damon/vaddr: implement perf-event access check
>    mm/damon: add damos_node_eligible_mem_bp tracepoint
> 
>   include/linux/damon.h        |  80 +++++
>   include/trace/events/damon.h |  49 +++
>   mm/damon/core.c              | 403 ++++++++++++++++++++----
>   mm/damon/ops-common.h        |  39 +++
>   mm/damon/sysfs-common.h      |   6 +
>   mm/damon/sysfs-sample.c      | 579 +++++++++++++++++++++++++++++++++++
>   mm/damon/sysfs.c             |   3 +
>   mm/damon/vaddr.c             | 267 ++++++++++++++++
>   8 files changed, 1370 insertions(+), 56 deletions(-)
> 
> 
> base-commit: 4c8ad15abf15eb480d3ad85f902001e35465ef18

I wasn't able to apply this patch series to mainline Linux or to
linux-next, and I also had trouble identifying where the base commit
comes from.

Would you mind sharing where this baseline is from?

Thanks in advance!


Best Regards,
Zeng Heng

^ permalink raw reply

* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: tarunsahu @ 2026-06-23 14:36 UTC (permalink / raw)
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgFEHciT3T9y+qEYRvXhDwfrggoU7Rm=f9hT3OrV+wgpNQ@mail.gmail.com>

Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>>  static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
>>  			       loff_t len)
>>  {
>> +	struct inode *inode = file_inode(file);
>>  	int ret;
>> +	int idx;
>>
>> -	if (!(mode & FALLOC_FL_KEEP_SIZE))
>> -		return -EOPNOTSUPP;
>> +	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
>> +	if (kvm_gmem_is_frozen(inode)) {
>> +		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>> +		return -EPERM;
>> +	}
>
> fallocate may eventually go to kvm_gmem_get_folio(), so that would check
> kvm_gmem_is_frozen() twice. Is this meant to catch the punch hole case?
>
>>
>> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
>> -		return -EOPNOTSUPP;
>> +	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>>
>> -	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
>> -		return -EINVAL;
>> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>> +
>> +	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>
> There's some reordering here. Why not let the validation happen like
> before, then check kvm_gmem_is_frozen()?

To align with the design: "stop the fallocate call if the inode is
frozen; no need to go further". I don't have a strong opinion on this.
I am fine with moving the check to the punch hole path as well to make
it more fine-grained. But then it can no longer be described as stopping
the fallocate call (the allocation case is stopped in a separate path,
the fault path), though functionally it does the same thing.
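
For comparison, a rough userspace sketch of the alternative ordering
(argument validation first, frozen check afterwards). This is only a
model of the control flow, not the guest_memfd code: the SKETCH_*
constants mirror the values in <linux/falloc.h>, the 4096-byte page size
is an assumption, and gmem_fallocate_precheck() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror <linux/falloc.h>; defined locally so the sketch stands alone. */
#define SKETCH_FL_KEEP_SIZE  0x01
#define SKETCH_FL_PUNCH_HOLE 0x02
#define SKETCH_PAGE_SIZE     4096 /* assumed page size */

/*
 * Validate the arguments exactly as before the patch, and only then
 * consult the frozen state. Invalid calls keep their original error
 * codes; valid calls on a frozen inode get -EPERM.
 */
static int gmem_fallocate_precheck(int mode, int64_t offset, int64_t len,
				   bool frozen)
{
	if (!(mode & SKETCH_FL_KEEP_SIZE))
		return -EOPNOTSUPP;
	if (mode & ~(SKETCH_FL_KEEP_SIZE | SKETCH_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;
	if ((offset % SKETCH_PAGE_SIZE) || (len % SKETCH_PAGE_SIZE))
		return -EINVAL;
	if (frozen)
		return -EPERM;
	return 0;
}

/* Usage illustration: invalid arguments are reported even while frozen. */
static void gmem_precheck_demo(void)
{
	assert(gmem_fallocate_precheck(0, 0, 4096, true) == -EOPNOTSUPP);
	assert(gmem_fallocate_precheck(SKETCH_FL_KEEP_SIZE, 0, 4096, true) == -EPERM);
}
```

The only user-visible difference from the posted patch is which error a
malformed call sees while the inode is frozen.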

WDYT?

~Tarun

^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-23 14:55 UTC (permalink / raw)
  To: Janani Sunil
  Cc: David Lechner, Nuno Sá, Rodrigo Alencar, Conor Dooley,
	Janani Sunil, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, linux-iio, devicetree,
	linux-kernel, linux-doc, Mark Brown
In-Reply-To: <0f778d06-726f-4aec-ab3b-f6e091cc9421@gmail.com>

On Tue, 23 Jun 2026 12:07:52 +0200
Janani Sunil <jan.sun97@gmail.com> wrote:

> On 6/22/26 17:36, David Lechner wrote:
> > On 6/22/26 7:20 AM, Nuno Sá wrote:  
> >> On Mon, Jun 22, 2026 at 12:51:20PM +0100, Rodrigo Alencar wrote:  
> >>> On 22/06/26 11:29, Nuno Sá wrote:  
> >>>> On Mon, Jun 22, 2026 at 10:24:05AM +0100, Rodrigo Alencar wrote:  
> >>>>> On 21/06/26 15:33, Jonathan Cameron wrote:  
> >>>>>> On Fri, 19 Jun 2026 16:54:11 +0100
> >>>>>> Nuno Sá <noname.nuno@gmail.com> wrote:
> >>>>>>  
> >>>>>>> On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:  
> >>>>>>>> On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:  
> >>>>>>>>> On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:  
> >>>>>>>>>> On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:  
> >>>>>>>>>>> On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:  
> >>>>>>>>>>>> On 6/14/26 21:44, Jonathan Cameron wrote:  
> >>>>>>>>>>>>> On Tue, 9 Jun 2026 16:47:23 +0200
> >>>>>>>>>>>>> Janani Sunil <jan.sun97@gmail.com> wrote:
> >>>>>>>>>>>>>      
> >>>>>>>>>>>>>> On 5/26/26 15:11, Rodrigo Alencar wrote:  
> >>>>>>>>>>>>>>> On 26/05/19 05:42PM, Janani Sunil wrote:  
> >>>>>>>>>>>>>>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> >>>>>>>>>>>>>>>> buffered voltage output digital-to-analog converter (DAC) with an
> >>>>>>>>>>>>>>>> integrated precision reference.  
> >>>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>> Probably others may comment on that, but...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This parent node may support device addressing for multi-device support through
> >>>>>>>>>>>>>>> those ID pins. I suppose that each device may have its own power supplies or
> >>>>>>>>>>>>>>> other resources like the toggle pins or reset and enable.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That way I suppose that an example would look like...  
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +patternProperties:
> >>>>>>>>>>>>>>>> +  "^channel@([0-9]|1[0-5])$":
> >>>>>>>>>>>>>>>> +    type: object
> >>>>>>>>>>>>>>>> +    description: Child nodes for individual channel configuration
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    properties:
> >>>>>>>>>>>>>>>> +      reg:
> >>>>>>>>>>>>>>>> +        description: Channel number.
> >>>>>>>>>>>>>>>> +        minimum: 0
> >>>>>>>>>>>>>>>> +        maximum: 15
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +      adi,output-range-microvolt:
> >>>>>>>>>>>>>>>> +        description: |
> >>>>>>>>>>>>>>>> +          Output voltage range for this channel as [min, max] in microvolts.
> >>>>>>>>>>>>>>>> +          If not specified, defaults to 0V to 5V range.
> >>>>>>>>>>>>>>>> +        oneOf:
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: 0
> >>>>>>>>>>>>>>>> +              - enum: [5000000, 10000000, 20000000, 40000000]
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -5000000
> >>>>>>>>>>>>>>>> +              - const: 5000000
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -10000000
> >>>>>>>>>>>>>>>> +              - const: 10000000
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -15000000
> >>>>>>>>>>>>>>>> +              - const: 15000000
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -20000000
> >>>>>>>>>>>>>>>> +              - const: 20000000
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    required:
> >>>>>>>>>>>>>>>> +      - reg
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    additionalProperties: false
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +required:
> >>>>>>>>>>>>>>>> +  - compatible
> >>>>>>>>>>>>>>>> +  - reg
> >>>>>>>>>>>>>>>> +  - vdd-supply
> >>>>>>>>>>>>>>>> +  - avdd-supply
> >>>>>>>>>>>>>>>> +  - hvdd-supply
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +dependencies:
> >>>>>>>>>>>>>>>> +  spi-cpha: [ spi-cpol ]
> >>>>>>>>>>>>>>>> +  spi-cpol: [ spi-cpha ]
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +allOf:
> >>>>>>>>>>>>>>>> +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +unevaluatedProperties: false
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +examples:
> >>>>>>>>>>>>>>>> +  - |
> >>>>>>>>>>>>>>>> +    #include <dt-bindings/gpio/gpio.h>
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    spi {
> >>>>>>>>>>>>>>>> +        #address-cells = <1>;
> >>>>>>>>>>>>>>>> +        #size-cells = <0>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +        dac@0 {
> >>>>>>>>>>>>>>>> +            compatible = "adi,ad5529r-16";
> >>>>>>>>>>>>>>>> +            reg = <0>;
> >>>>>>>>>>>>>>>> +            spi-max-frequency = <25000000>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            vdd-supply = <&vdd_regulator>;
> >>>>>>>>>>>>>>>> +            avdd-supply = <&avdd_regulator>;
> >>>>>>>>>>>>>>>> +            hvdd-supply = <&hvdd_regulator>;
> >>>>>>>>>>>>>>>> +            hvss-supply = <&hvss_regulator>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            #address-cells = <1>;
> >>>>>>>>>>>>>>>> +            #size-cells = <0>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            channel@0 {
> >>>>>>>>>>>>>>>> +                reg = <0>;
> >>>>>>>>>>>>>>>> +                adi,output-range-microvolt = <0 5000000>;
> >>>>>>>>>>>>>>>> +            };
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            channel@1 {
> >>>>>>>>>>>>>>>> +                reg = <1>;
> >>>>>>>>>>>>>>>> +                adi,output-range-microvolt = <(-10000000) 10000000>;
> >>>>>>>>>>>>>>>> +            };
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            channel@2 {
> >>>>>>>>>>>>>>>> +                reg = <2>;
> >>>>>>>>>>>>>>>> +                adi,output-range-microvolt = <0 40000000>;
> >>>>>>>>>>>>>>>> +            };
> >>>>>>>>>>>>>>>> +        };
> >>>>>>>>>>>>>>>> +    };  
> >>>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 	spi {
> >>>>>>>>>>>>>>> 		#address-cells = <1>;
> >>>>>>>>>>>>>>> 		#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 		multi-dac@0 {
> >>>>>>>>>>>>>>> 			compatible = "adi,ad5529r-16";
> >>>>>>>>>>>>>>> 			reg = <0>;
> >>>>>>>>>>>>>>> 			spi-max-frequency = <25000000>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 			#address-cells = <1>;
> >>>>>>>>>>>>>>> 			#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 			dac@0 {
> >>>>>>>>>>>>>>> 				reg = <0>;
> >>>>>>>>>>>>>>> 				vdd-supply = <&vdd_regulator>;
> >>>>>>>>>>>>>>> 				avdd-supply = <&avdd_regulator>;
> >>>>>>>>>>>>>>> 				hvdd-supply = <&hvdd_regulator>;
> >>>>>>>>>>>>>>> 				hvss-supply = <&hvss_regulator>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				#address-cells = <1>;
> >>>>>>>>>>>>>>> 				#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@0 {
> >>>>>>>>>>>>>>> 					reg = <0>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <0 5000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@1 {
> >>>>>>>>>>>>>>> 					reg = <1>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <(-10000000) 10000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@2 {
> >>>>>>>>>>>>>>> 					reg = <2>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <0 40000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>> 			};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 			dac@1 {
> >>>>>>>>>>>>>>> 				reg = <1>;
> >>>>>>>>>>>>>>> 				vdd-supply = <&vdd_regulator>;
> >>>>>>>>>>>>>>> 				avdd-supply = <&avdd_regulator>;
> >>>>>>>>>>>>>>> 				hvdd-supply = <&hvdd_regulator>;
> >>>>>>>>>>>>>>> 				hvss-supply = <&hvss_regulator>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				#address-cells = <1>;
> >>>>>>>>>>>>>>> 				#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@0 {
> >>>>>>>>>>>>>>> 					reg = <0>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <0 5000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@1 {
> >>>>>>>>>>>>>>> 					reg = <1>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <(-10000000) 10000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>> 			};
> >>>>>>>>>>>>>>> 		};
> >>>>>>>>>>>>>>> 	};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> then you might need something like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 	patternProperties:
> >>>>>>>>>>>>>>> 		"^dac@[0-3]$":
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> and put most of the things under this node pattern.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So the main driver that you're putting together might need to handle up to four instances.
> >>>>>>>>>>>>>>> Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Need to double check if each dac node needs a separate compatible, so you would maybe populate
> >>>>>>>>>>>>>>> a platform data to be shared with the child nodes, which would be a separate driver.
> >>>>>>>>>>>>>>> (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).  
> >>>>>>>>>>>>>> Hi Rodrigo,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you for looking at this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> >>>>>>>>>>>>>> hardware/use case we have only needs one device node and the driver is written around that model as well.
> >>>>>>>>>>>>>> While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> >>>>>>>>>>>>>> that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> >>>>>>>>>>>>>> speculatively without a validating use case.  
> >>>>>>>>>>>>> Interesting feature - kind of similar to address control on a typical i2c bus device, or
> >>>>>>>>>>>>> looking at it another way a kind of distributed SPI mux.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Challenge of a binding is we need to anticipate the future.  So I think we do need something
> >>>>>>>>>>>>> like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> >>>>>>>>>>>>> That would leave the path open to supporting the addressing at a later date.
> >>>>>>>>>>>>> An alternative might be to look at it like a chained device setup. In those we pretend there
> >>>>>>>>>>>>> is just one device with a lot of channels etc.  The snag is that here things are more loosely
> >>>>>>>>>>>>> coupled whereas for those devices it tends to be you have to read / write the same register
> >>>>>>>>>>>>> in all devices in the chain as one big SPI message.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> +CC Mark Brown as he may know of some precedent for this feature. For his reference..
> >>>>>>>>>>>>> - Each of these devices has 2 ID pins.  The SPI transfers have to contain the 2 bit
> >>>>>>>>>>>>> value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> >>>>>>>>>>>>> be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> >>>>>>>>>>>>> longer term how to support it cleanly in SPI.  
> >>>>>>>>>>> I'd swear I have seen this before, from some Microchip devices. Let me
> >>>>>>>>>>> see if I can find what I am thinking of...  
> >>>>>>>>>>
> >>>>>>>>>> microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> >>>>>>>>>> slightly different properties.
> >>>>>>>>>>
> >>>>>>>>>>    microchip,device-addr:
> >>>>>>>>>>      description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> >>>>>>>>>>      $ref: /schemas/types.yaml#/definitions/uint32
> >>>>>>>>>>      enum: [0, 1, 2, 3]
> >>>>>>>>>>      default: 0
> >>>>>>>>>>
> >>>>>>>>>> and
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>    microchip,hw-device-address:
> >>>>>>>>>>      $ref: /schemas/types.yaml#/definitions/uint32
> >>>>>>>>>>      minimum: 0
> >>>>>>>>>>      maximum: 3
> >>>>>>>>>>      description:
> >>>>>>>>>>        The address is set on a per-device basis by fuses in the factory,
> >>>>>>>>>>        configured on request. If not requested, the fuses are set for 0x1.
> >>>>>>>>>>        The device address is part of the device markings to avoid
> >>>>>>>>>>        potential confusion. This address is coded on two bits, so four possible
> >>>>>>>>>>        addresses are available when multiple devices are present on the same
> >>>>>>>>>>        SPI bus with only one Chip Select line for all devices.
> >>>>>>>>>>        Each device communication starts by a CS falling edge, followed by the
> >>>>>>>>>>        clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> >>>>>>>>>>        which is first one on the wire).
> >>>>>>>>>>
> >>>>>>>>>> This sounds exactly like the sort of feature that you're dealing with
> >>>>>>>>>> here?
> >>>>>>>>>>      
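For reference, the addressing scheme in the quoted microchip,hw-device-address
description can be sketched as below. This is only a minimal illustration,
assuming the 2-bit address sits in bits [7:6] of the first byte on the wire as
that description says. The helper names and the use of the low six bits for
the rest of the command are assumptions for the example, not taken from any
datasheet or driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: place a 2-bit hardware address in bits [7:6] of the
 * first byte of a transfer. The low six bits carry the rest of the command
 * (their layout is an assumption made for this sketch).
 */
static inline uint8_t cmd_byte_with_addr(uint8_t dev_addr, uint8_t cmd_low6)
{
	assert(dev_addr <= 3);     /* two ID bits, so addresses 0..3 */
	assert(cmd_low6 <= 0x3f);  /* six bits remain for the command */
	return (uint8_t)((dev_addr << 6) | cmd_low6);
}

/*
 * Hypothetical model of the device side: a chip sharing the chip select
 * only acts on a frame whose address bits match its own ID.
 */
static inline int frame_is_for_device(uint8_t first_byte, uint8_t my_addr)
{
	return (first_byte >> 6) == my_addr;
}
```

With this, cmd_byte_with_addr(3, 0x01) yields 0xC1, and only the chip strapped
to address 3 would accept that frame. Everything else on the same CS ignores it.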
> >>>>>>>>> The core idea yes but for this chip, things are a bit more annoying (but
> >>>>>>>>> Janani can correct me if I'm wrong). Here, each device can, in theory,
> >>>>>>>>> have its own supplies, pins and at the very least, channels with maybe
> >>>>>>>>> different scales. That is why Janani is proposing dac nodes. Given I
> >>>>>>>>> honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> >>>>>>>>> about solving this at the spi level.
> >>>>>>>>>
> >>>>>>>>> Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> >>>>>>>>> together in the same bus.  
> >>>>>>>> I'm definitely missing something, because that property for the
> >>>>>>>> microchip devices is not impacted by what else is on the bus. AFAICT, you
> >>>>>>>> could have an mcp3911 and an mcp3564 on the same bus even though both
> >>>>>>>> are completely different devices with different drivers. They have
> >>>>>>>> individual device nodes and their own supplies etc etc. These aren't
> >>>>>>>> per-channel properties on an adc or dac, they're per child device on a
> >>>>>>>> spi bus.  
> >>>>>>> Maybe I'm the one missing something :). IIRC, spi would not allow two
> >>>>>>> devices on the same CS right? Because for this chip we would need
> >>>>>>> something like:
> >>>>>>>
> >>>>>>> spi {
> >>>>>>> 	dac@0 {
> >>>>>>> 		reg = <0>;
> >>>>>>> 		adi,pin-id = <0>;
> >>>>>>> 	};
> >>>>>>>
> >>>>>>> 	dac@1 {
> >>>>>>> 		reg = <0>; // which seems already problematic?
> >>>>>>> 		adi,pin-id = <1>;
> >>>>>>> 	};
> >>>>>>>
> >>>>>>> 	...
> >>>>>>>
> >>>>>>> 	//up to 4
> >>>>>>> };  
> >>>>>> Yeah. It's not clear to me how that works for the microchip devices
> >>>>>> (I suspect it doesn't!)
> >>>>>>
> >>>>>> Just thinking as I type, but could we do something a bit nasty with
> >>>>>> a gpio mux that doesn't actually switch but represents the GPIO being
> >>>>>> shared?  Given this is all tied to the spi bus that should all happen
> >>>>>> under serializing locks.
> >>>>>>
> >>>>>> Agreed though that this would be nicer as an SPI thing that let
> >>>>>> us specify that a single CS is shared by multiple devices and there
> >>>>>> is some other signal acting to select which one we are talking to.
> >>>>>>  
> >>>>> If the device-addressing on the same chip-select is to be handled
> >>>>> by the spi framework, wouldn't we lose device-specific features?
> >>>>>
> >>>>> I understand that this multi-device feature is there mostly to extend the
> >>>>> channel count from 16 to 32, 48 or 64. I suppose the command:
> >>>>>
> >>>>> 	"MULTI DEVICE SW LDAC MODE"
> >>>>>
> >>>>> exists so that software can update channel values across multiple devices.  
> >>>> Right! You do have a point! I agree the main driver for a feature like
> >>>> this is likely to extend the channel count and effectively "aggregate"
> >>>> devices.
> >>>>
> >>>> But I would say that even with the spi solution the MULTI DEVICE stuff
> >>>> should be doable (as we still need a sort of adi,pin-id property).  
> >>> I don't think we can have something like an IIO buffer shared by multiple
> >>> devices. Synchronizing separate devices would be doable with proper hardware
> >>> support for this (probably involving an FPGA).  
> >> True!
> >>  
> >>>     
> >>>> But yes, I do feel that the whole feature is for aggregation so seeing
> >>>> one device with 32 channels is the expectation here? Rather than seeing
> >>>> two devices with 16 channels.  
> >>> Yes, I think aggregation is the whole point there... so that the IIO driver
> >>> is multi-device-aware.  
> >> Which makes me feel that different pins per device might be possible
> >> from an HW point of view but does not make much sense. For example, for
> >> the buffer example I would expect LDAC to be shared between all the
> >> devices.
> >>
> >> - Nuno Sá  
> > I think I mentioned this on a previous revision, but I still think the
> > simplest way to go about it would be to assume that all chips treated
> > as an aggregate device have everything wired in parallel and just add
> > support for per-chip wiring on an as-needed basis. This is how we have
> > handled daisy-chained devices so far.  
> 
> Hi David,
> 
> One thing about this approach is that it does not cover a combination of 12 and 16 bit parts in the chain,
> since the compatible string would be at the top level and apply to all chips. To handle this without per-chip child nodes or a per-chip compatible,
> I propose an "adi,resolution" property as an integer array, indexed by the device position:
> 

To me, mixed parts smell like a case that we will probably never see.  If
a customer actually has a design doing that, we can add optional dt
bindings at that time.

So I'd stick to separate compatibles. 

> 
> dac@0 {
>      compatible = "adi,ad5529r";
>      reg = <0>;
>      adi,device-addrs = <0 1>;
>      adi,resolution   = <16 12>;   /* per-chip, indexed by position */
>      reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
>      vdd-supply  = <&vdd_reg>;
>      hvdd-supply = <&hvdd_reg>;
> 
>      channel@0  { reg = <0>;  adi,output-range-microvolt = <0 5000000>; };
>      channel@16 { reg = <16>; adi,output-range-microvolt = <0 40000000>; };




> };
> 
> 
> 1) This follows the daisy-chain/aggregated model as you suggested, exposing N*16 channels as a single IIO device.
> 2) Keeps the binding flat- no phantom compatible at a parent bus node, no per-chip child nodes.
> 3) Enables a 12 bit + 16 bit device combination in the chain, without needing a per-chip compatible.
> 4) adi,device-addrs specifies the HW address, allowing the driver to encode it into the SPI frame.
> 5) Supplies and GPIOs remain simple- assuming parallel wiring across all chips.
> 
> Jonathan, you had earlier suggested using separate compatibles
>   (adi,ad5529r-16 and adi,ad5529r-12) to handle the
>   resolution difference.
> However, with the aggregated flat binding model,
>   separate per-chip compatibles would require child nodes which brings
> back the phantom compatible problem at the parent level. The
>   adi,resolution array is intended as an alternative that achieves the
> same goal-expressing per-chip resolution, without needing a per-chip
>   compatible or child node structure.
> 
> Does this look reasonable?
> 
> Best Regards,
> Janani Sunil
> 
> 
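
To make the aggregated model above concrete, here is a rough sketch of how a
flat channel number could be resolved to a chip and a local channel. It assumes
16 channels per chip and per-position arrays shaped like the proposed
adi,device-addrs and adi,resolution properties. The struct and function names
are hypothetical and not from the posted driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS_PER_CHIP 16u

/* Hypothetical mirror of the proposed properties, indexed by chip position. */
struct agg_dac {
	size_t num_chips;
	const uint8_t *device_addrs;  /* e.g. adi,device-addrs = <0 1> */
	const uint8_t *resolution;    /* e.g. adi,resolution   = <16 12> */
};

struct chan_target {
	uint8_t hw_addr;     /* value strapped on the chip's ID pins */
	uint8_t local_chan;  /* 0..15 within that chip */
	uint8_t bits;        /* DAC resolution of that chip */
};

/* Resolve a flat channel number; returns 0 on success, -1 if out of range. */
static int agg_dac_resolve(const struct agg_dac *dac, unsigned int reg,
			   struct chan_target *out)
{
	size_t pos = reg / CHANNELS_PER_CHIP;  /* position in the arrays */

	assert(dac && out);
	if (pos >= dac->num_chips)
		return -1;

	out->hw_addr = dac->device_addrs[pos];
	out->local_chan = (uint8_t)(reg % CHANNELS_PER_CHIP);
	out->bits = dac->resolution[pos];
	return 0;
}
```

With the example node above, channel@16 would resolve to hardware address 1,
local channel 0, 12 bits. With one compatible per aggregate, the per-position
resolution array would not be needed at all.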


^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-23 14:57 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: Nuno Sá, Conor Dooley, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <5guhkvoxhkzevjm4b45hhk3772akpnz3givr3gb56x5ywamzyf@oiycreun4y62>

On Tue, 23 Jun 2026 09:09:14 +0100
Rodrigo Alencar <455.rodrigo.alencar@gmail.com> wrote:

> On 22/06/26 13:20, Nuno Sá wrote:
> > On Mon, Jun 22, 2026 at 12:51:20PM +0100, Rodrigo Alencar wrote:  
> > > On 22/06/26 11:29, Nuno Sá wrote:  
> > > > On Mon, Jun 22, 2026 at 10:24:05AM +0100, Rodrigo Alencar wrote:  
> > > > > On 21/06/26 15:33, Jonathan Cameron wrote:  
> > > > > > On Fri, 19 Jun 2026 16:54:11 +0100
> > > > > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > > > > >   
> > > > > > > On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:  
> > > > > > > > On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:    
> > > > > > > > > On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:    
> > > > > > > > > > On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:    
> > > > > > > > > > > On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:    
> > > > > > > > > > > > 
> > > > > > > > > > > > On 6/14/26 21:44, Jonathan Cameron wrote:    
> > > > > > > > > > > > > On Tue, 9 Jun 2026 16:47:23 +0200
> > > > > > > > > > > > > Janani Sunil <jan.sun97@gmail.com> wrote:
> > > > > > > > > > > > >     
> > > > > > > > > > > > > > On 5/26/26 15:11, Rodrigo Alencar wrote:    
> > > > > > > > > > > > > > > On 26/05/19 05:42PM, Janani Sunil wrote:    
> > > > > > > > > > > > > > > > Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> > > > > > > > > > > > > > > > buffered voltage output digital-to-analog converter (DAC) with an
> > > > > > > > > > > > > > > > integrated precision reference.    
> > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > Probably others may comment on that, but...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > This parent node may support device addressing for multi-device support through
> > > > > > > > > > > > > > > those ID pins. I suppose that each device may have its own power supplies or
> > > > > > > > > > > > > > > other resources like the toggle pins or reset and enable.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > That way I suppose that an example would look like...    
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +patternProperties:
> > > > > > > > > > > > > > > > +  "^channel@([0-9]|1[0-5])$":
> > > > > > > > > > > > > > > > +    type: object
> > > > > > > > > > > > > > > > +    description: Child nodes for individual channel configuration
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    properties:
> > > > > > > > > > > > > > > > +      reg:
> > > > > > > > > > > > > > > > +        description: Channel number.
> > > > > > > > > > > > > > > > +        minimum: 0
> > > > > > > > > > > > > > > > +        maximum: 15
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      adi,output-range-microvolt:
> > > > > > > > > > > > > > > > +        description: |
> > > > > > > > > > > > > > > > +          Output voltage range for this channel as [min, max] in microvolts.
> > > > > > > > > > > > > > > > +          If not specified, defaults to 0V to 5V range.
> > > > > > > > > > > > > > > > +        oneOf:
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: 0
> > > > > > > > > > > > > > > > +              - enum: [5000000, 10000000, 20000000, 40000000]
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -5000000
> > > > > > > > > > > > > > > > +              - const: 5000000
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -10000000
> > > > > > > > > > > > > > > > +              - const: 10000000
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -15000000
> > > > > > > > > > > > > > > > +              - const: 15000000
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -20000000
> > > > > > > > > > > > > > > > +              - const: 20000000
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    required:
> > > > > > > > > > > > > > > > +      - reg
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    additionalProperties: false
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +required:
> > > > > > > > > > > > > > > > +  - compatible
> > > > > > > > > > > > > > > > +  - reg
> > > > > > > > > > > > > > > > +  - vdd-supply
> > > > > > > > > > > > > > > > +  - avdd-supply
> > > > > > > > > > > > > > > > +  - hvdd-supply
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +dependencies:
> > > > > > > > > > > > > > > > +  spi-cpha: [ spi-cpol ]
> > > > > > > > > > > > > > > > +  spi-cpol: [ spi-cpha ]
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +allOf:
> > > > > > > > > > > > > > > > +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +unevaluatedProperties: false
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +examples:
> > > > > > > > > > > > > > > > +  - |
> > > > > > > > > > > > > > > > +    #include <dt-bindings/gpio/gpio.h>
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    spi {
> > > > > > > > > > > > > > > > +        #address-cells = <1>;
> > > > > > > > > > > > > > > > +        #size-cells = <0>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +        dac@0 {
> > > > > > > > > > > > > > > > +            compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > > > > +            reg = <0>;
> > > > > > > > > > > > > > > > +            spi-max-frequency = <25000000>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > > +            avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > > +            hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > > +            hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            #address-cells = <1>;
> > > > > > > > > > > > > > > > +            #size-cells = <0>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            channel@0 {
> > > > > > > > > > > > > > > > +                reg = <0>;
> > > > > > > > > > > > > > > > +                adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            channel@1 {
> > > > > > > > > > > > > > > > +                reg = <1>;
> > > > > > > > > > > > > > > > +                adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            channel@2 {
> > > > > > > > > > > > > > > > +                reg = <2>;
> > > > > > > > > > > > > > > > +                adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > > +        };
> > > > > > > > > > > > > > > > +    };    
> > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 	spi {
> > > > > > > > > > > > > > > 		#address-cells = <1>;
> > > > > > > > > > > > > > > 		#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 		multi-dac@0 {
> > > > > > > > > > > > > > > 			compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > > > 			reg = <0>;
> > > > > > > > > > > > > > > 			spi-max-frequency = <25000000>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			#address-cells = <1>;
> > > > > > > > > > > > > > > 			#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			dac@0 {
> > > > > > > > > > > > > > > 				reg = <0>;
> > > > > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@2 {
> > > > > > > > > > > > > > > 					reg = <2>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 			}
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			dac@1 {
> > > > > > > > > > > > > > > 				reg = <1>;
> > > > > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 			}
> > > > > > > > > > > > > > > 		};
> > > > > > > > > > > > > > > 	};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > then you might need something like:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 	patternProperties:
> > > > > > > > > > > > > > > 		"^dac@[0-3]$":
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > and put most of the things under this node pattern.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > So the main driver that you're putting together might need to handle up to four instances.
> > > > > > > > > > > > > > > > Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Need to double check if each dac node needs a separate compatible, so you would maybe populate
> > > > > > > > > > > > > > > a platform data to be shared with the child nodes, which would be a separate driver.
> > > > > > > > > > > > > > > (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).    
> > > > > > > > > > > > > > Hi Rodrigo,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Thank you for looking at this.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> > > > > > > > > > > > > > hardware/use case we have only needs one device node and the driver is written around that model as well.
> > > > > > > > > > > > > > While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> > > > > > > > > > > > > > that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> > > > > > > > > > > > > > speculatively without a validating use case.    
> > > > > > > > > > > > > Interesting feature - kind of similar to address control on a typical i2c bus device, or
> > > > > > > > > > > > > looking at it another way a kind of distributed SPI mux.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Challenge of a binding is we need to anticipate the future.  So I think we do need something
> > > > > > > > > > > > > like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> > > > > > > > > > > > > That would leave the path open to supporting the addressing at a later date.
> > > > > > > > > > > > > An alternative might be to look at it like a chained device setup. In those we pretend there
> > > > > > > > > > > > > is just one device with a lot of channels etc.  The snag is that here things are more loosely
> > > > > > > > > > > > > coupled whereas for those devices it tends to be you have to read / write the same register
> > > > > > > > > > > > > in all devices in the chain as one big SPI message.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > +CC Mark Brown as he may know of some precedent for this feature. For his reference..
> > > > > > > > > > > > > > - Each of these devices has 2 ID pins.  The SPI transfers have to contain the 2 bit
> > > > > > > > > > > > > value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> > > > > > > > > > > > > be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> > > > > > > > > > > > > longer term how to support it cleanly in SPI.    
> > > > > > > > > > > 
> > > > > > > > > > > I'd swear I have seen this before, from some Microchip devices. Let me
> > > > > > > > > > > see if I can find what I am thinking of...    
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> > > > > > > > > > slightly different properties.
> > > > > > > > > > 
> > > > > > > > > >   microchip,device-addr:
> > > > > > > > > >     description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> > > > > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > > > > >     enum: [0, 1, 2, 3]
> > > > > > > > > >     default: 0
> > > > > > > > > > 
> > > > > > > > > > and
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > >   microchip,hw-device-address:
> > > > > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > > > > >     minimum: 0
> > > > > > > > > >     maximum: 3
> > > > > > > > > >     description:
> > > > > > > > > >       The address is set on a per-device basis by fuses in the factory,
> > > > > > > > > >       configured on request. If not requested, the fuses are set for 0x1.
> > > > > > > > > >       The device address is part of the device markings to avoid
> > > > > > > > > >       potential confusion. This address is coded on two bits, so four possible
> > > > > > > > > >       addresses are available when multiple devices are present on the same
> > > > > > > > > >       SPI bus with only one Chip Select line for all devices.
> > > > > > > > > >       Each device communication starts by a CS falling edge, followed by the
> > > > > > > > > >       clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> > > > > > > > > >       which is first one on the wire).
> > > > > > > > > > 
> > > > > > > > > > This sounds exactly like the sort of feature that you're dealing with
> > > > > > > > > > here?
> > > > > > > > > >     
> > > > > > > > > 
> > > > > > > > > The core idea yes but for this chip, things are a bit more annoying (but
> > > > > > > > > Janani can correct me if I'm wrong). Here, each device can, in theory,
> > > > > > > > > > have its own supplies, pins and at the very least, channels with maybe
> > > > > > > > > different scales. That is why Janani is proposing dac nodes. Given I
> > > > > > > > > honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> > > > > > > > > about solving this at the spi level.
> > > > > > > > > 
> > > > > > > > > Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> > > > > > > > > together in the same bus.    
> > > > > > > > 
> > > > > > > > I'm definitely missing something, because that property for the
> > > > > > > > > microchip devices is not impacted by what else is on the bus. AFAICT, you
> > > > > > > > could have an mcp3911 and an mcp3564 on the same bus even though both
> > > > > > > > are completely different devices with different drivers. They have
> > > > > > > > individual device nodes and their own supplies etc etc. These aren't
> > > > > > > > per-channel properties on an adc or dac, they're per child device on a
> > > > > > > > spi bus.    
> > > > > > > 
> > > > > > > Maybe I'm the one missing something :). IIRC, spi would not allow two
> > > > > > > devices on the same CS right? Because for this chip we would need
> > > > > > > something like:
> > > > > > > 
> > > > > > > spi {
> > > > > > > 	dac@0 {
> > > > > > > 		reg = <0>;
> > > > > > > 		adi,pin-id = <0>;
> > > > > > > 	};
> > > > > > > 
> > > > > > > 	dac@1 {
> > > > > > > 		reg = <0>; // which seems already problematic?
> > > > > > > 		adi,pin-id <1>;
> > > > > > > 	};
> > > > > > > 
> > > > > > > 	...
> > > > > > > 
> > > > > > > 	//up to 4
> > > > > > > };  
> > > > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > > > (I suspect it doesn't!)
> > > > > > 
> > > > > > Just thinking as I type, but could we do something a bit nasty with
> > > > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > > > under serializing locks. 
> > > > > > 
> > > > > > Agreed though that this would be nicer as an SPI thing that let
> > > > > > > us specify that a single CS is shared by multiple devices and there
> > > > > > > is some other signal acting to select which one we are talking to.
> > > > > >   
> > > > > 
> > > > > If the device-addressing on the same chip-select is to be handled
> > > > > by the spi framework, wouldn't we lose device-specific features?
> > > > > 
> > > > > I understand that this multi-device feature is there mostly to extend the
> > > > > channel count from 16 to 32, 48 or 64. I suppose the command:
> > > > > 
> > > > > 	"MULTI DEVICE SW LDAC MODE"
> > > > > 
> > > > > > exists so that software can update channel values across multiple devices.  
> > > > 
> > > > Right! You do have a point! I agree the main driver for a feature like
> > > > this is likely to extend the channel count and effectively "aggregate"
> > > > devices.
> > > > 
> > > > But I would say that even with the spi solution the MULTI DEVICE stuff
> > > > should be doable (as we still need a sort of adi,pin-id property).   
> > > 
> > > I don't think we can have something like an IIO buffer shared by multiple
> > > devices. Synchronizing separate devices would be doable with proper hardware
> > > > support for this (probably involving an FPGA).  
> > 
> > True!
> >   
> > >    
> > > > But yes, I do feel that the whole feature is for aggregation so seeing
> > > > one device with 32 channels is the expectation here? Rather than seeing
> > > > two devices with 16 channels.  
> > > 
> > > Yes, I think aggregation is the whole point there... so that the IIO driver
> > > is multi-device-aware.  
> > 
> > Which makes me feel that different pins per device might be possible
> > from an HW point of view but does not make much sense. For example, for
> > the buffer example I would expect LDAC to be shared between all the
> > devices.  
> 
> That is why I would still suggest the multi-dac node in the middle...
> the parent node can hold shared resources, while the dac children can
> have their own, overriding or inheriting stuff.
> 
Before going down that path I'd want confirmation this is something we
actually think anyone will build.

Jonathan



^ permalink raw reply

* Re: [PATCH][v2] mm/dmapool: Untangle CONFIG_SLUB_DEBUG_ON abuse and switch to static key
From: Liam R . Howlett @ 2026-06-23 14:58 UTC (permalink / raw)
  To: lirongqing
  Cc: Jonathan Corbet, Shuah Khan, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-doc, linux-kernel,
	linux-mm
In-Reply-To: <20260623121245.2221-1-lirongqing@baidu.com>


When generating new versions of a patch, using git format-patch -v[#]
will make the subject [PATCH v2], as opposed to appending v2 after
[PATCH] as it looks like you did.

On 26/06/23 08:12PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> The dmapool subsystem historically wrapped its debugging logic inside an
> This approach is fundamentally flawed because CONFIG_SLUB_DEBUG_ON
> merely defines compile-time defaults for SLUB and caused two flaws:

This looks truncated? "inside an This approach.."

> 
> On production kernels where CONFIG_SLUB_DEBUG=y but
> CONFIG_SLUB_DEBUG_ON=n, dmapool debugging was completely compiled out
> at compile time, leaving no way to enable it without rebuilding the
> kernel.
> 
> On kernels with CONFIG_SLUB_DEBUG_ON=y, dmapool debugging stayed
> unconditionally active even if a user explicitly disabled slub debugging
> at boot time.
> 
> Clean up this mess by removing the #ifdef and switching to a runtime
> static key (dmapool_debug_enabled), allowing dmapool debugging to be
> toggled cleanly via its own boot parameter: dmapool_debug
> 
> Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Liam R. Howlett <liam@infradead.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
> Diff with v1:  Move the static key check out of pool_init_page etc 

Link to v1 here is a good idea so that people can see the diff and the
conversation.

> 
>  Documentation/admin-guide/kernel-parameters.txt |  5 +++
>  mm/dmapool.c                                    | 57 ++++++++++++++-----------
>  2 files changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 19c9a19..66d853c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1304,6 +1304,11 @@ Kernel parameters
>  
>  	dis_ucode_ldr	[X86] Disable the microcode loader.
>  
> +	dmapool_debug	[MM]
> +			Enable DMA pool debugging. This enables memory
> +			poisoning and validation for DMA pool allocations.
> +			Useful for debugging DMA API misuse.
> +
>  	dma_debug=off	If the kernel is compiled with DMA_API_DEBUG support,
>  			this option disables the debugging code at boot.
>  
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 5d8af6e..7bd037a 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -35,10 +35,23 @@
>  #include <linux/string.h>
>  #include <linux/types.h>
>  #include <linux/wait.h>
> +#include <linux/static_key.h>
> +#include <linux/init.h>
>  
> -#ifdef CONFIG_SLUB_DEBUG_ON
> -#define DMAPOOL_DEBUG 1
> -#endif
> +/*
> + * Debugging support for dmapool using static key.
> + *
> + * This allows enabling dmapool debug at boot time via:
> + *   dmapool_debug
> + */
> +static DEFINE_STATIC_KEY_FALSE(dmapool_debug_enabled);
> +
> +static int __init dmapool_debug_setup(char *str)
> +{
> +	static_branch_enable(&dmapool_debug_enabled);
> +	return 1;
> +}
> +__setup("dmapool_debug", dmapool_debug_setup);
>  
>  struct dma_block {
>  	struct dma_block *next_block;
> @@ -92,7 +105,6 @@ static ssize_t pools_show(struct device *dev, struct device_attribute *attr, cha
>  
>  static DEVICE_ATTR_RO(pools);
>  
> -#ifdef DMAPOOL_DEBUG
>  static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
>  			     gfp_t mem_flags)
>  {
> @@ -161,23 +173,6 @@ static void pool_init_page(struct dma_pool *pool, struct dma_page *page)
>  {
>  	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
>  }
> -#else
> -static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
> -			     gfp_t mem_flags)
> -{
> -}
> -
> -static bool pool_block_err(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
> -{
> -	if (want_init_on_free())
> -		memset(vaddr, 0, pool->size);
> -	return false;
> -}
> -
> -static void pool_init_page(struct dma_pool *pool, struct dma_page *page)
> -{
> -}
> -#endif
>  
>  static struct dma_block *pool_block_pop(struct dma_pool *pool)
>  {
> @@ -305,7 +300,9 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
>  	unsigned int next_boundary = pool->boundary, offset = 0;
>  	struct dma_block *block, *first = NULL, *last = NULL;
>  
> -	pool_init_page(pool, page);
> +	if (static_branch_unlikely(&dmapool_debug_enabled))
> +		pool_init_page(pool, page);
> +
>  	while (offset + pool->size <= pool->allocation) {
>  		if (offset + pool->size > next_boundary) {
>  			offset = next_boundary;
> @@ -433,7 +430,10 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>  	spin_unlock_irqrestore(&pool->lock, flags);
>  
>  	*handle = block->dma;
> -	pool_check_block(pool, block, mem_flags);
> +
> +	if (static_branch_unlikely(&dmapool_debug_enabled))
> +		pool_check_block(pool, block, mem_flags);
> +
>  	if (want_init_on_alloc(mem_flags))
>  		memset(block, 0, pool->size);
>  
> @@ -454,9 +454,18 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
>  {
>  	struct dma_block *block = vaddr;
>  	unsigned long flags;
> +	bool err = false;
>  
>  	spin_lock_irqsave(&pool->lock, flags);
> -	if (!pool_block_err(pool, vaddr, dma)) {
> +
> +	if (static_branch_unlikely(&dmapool_debug_enabled))
> +		err = pool_block_err(pool, vaddr, dma);
> +	else {
> +		if (want_init_on_free())
> +			memset(vaddr, 0, pool->size);
> +	}
> +
> +	if (!err) {
>  		pool_block_push(pool, block, dma);
>  		pool->nr_active--;
>  	}
> -- 
> 2.9.4
> 
> 
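
For anyone skimming the diff, the behavioural change in dma_pool_free() can
be modelled in plain userspace C. This is only a sketch under stated
assumptions: a bool stands in for the static key (real static keys patch the
branch at runtime), parse_option(), struct pool and pool_free_block() are
hypothetical names, the poison value is assumed, and poisoning on free is an
assumption about what pool_block_err() does, since that body is not part of
the quoted hunks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_POISON_FREED 0xa7	/* value assumed for illustration */

/* Stand-in for the static key; set once from a boot-style option. */
static bool dmapool_debug_enabled;

/* Hypothetical pool carrying only the field this sketch needs. */
struct pool {
	size_t size;	/* bytes per block */
};

/* Enable debugging when the option string is "dmapool_debug". */
static void parse_option(const char *opt)
{
	if (strcmp(opt, "dmapool_debug") == 0)
		dmapool_debug_enabled = true;
}

/*
 * Free-path handling: take the debug path when the flag is set,
 * otherwise zero the block only if init_on_free is requested.
 * This mirrors the if/else added to dma_pool_free() in the patch.
 */
static void pool_free_block(const struct pool *p, void *vaddr,
			    bool init_on_free)
{
	if (dmapool_debug_enabled)
		memset(vaddr, POOL_POISON_FREED, p->size);
	else if (init_on_free)
		memset(vaddr, 0, p->size);
}
```

The point of the model is that the decision moves from build time to boot
time: the same binary takes either branch depending on the option.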

^ permalink raw reply

* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Luiz Augusto von Dentz @ 2026-06-23 15:04 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour, Andy Lutomirski
In-Reply-To: <20260622234803.6982-1-ebiggers@kernel.org>

Hi Eric,

On Mon, Jun 22, 2026 at 7:50 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> AF_ALG is a frequent source of vulnerabilities and a maintenance
> nightmare.  It exposes far more functionality to userspace than ever
> should have been exposed, especially to unprivileged processes.  Recent
> exploits have targeted kernel internal implementation details like
> "authencesn" that have zero use case for userspace access.
>
> Fortunately, AF_ALG is rarely used in practice, as userspace crypto
> libraries exist.  And when it is used, only some functionality is known
> to be used, and many users are known to hold capabilities already.
> iwd for example requires CAP_NET_ADMIN and has a known algorithm list
> (https://lore.kernel.org/linux-crypto/bcbbef00-5881-421b-8892-7be6c04b832d@gmail.com/).
>
> Thus, let's restrict the set of allowed algorithms by default, depending
> on the capabilities held.
>
> Add a sysctl /proc/sys/crypto/af_alg_restrict with meaning:
>
>     0: unrestricted
>     1: limited functionality
>     2: completely disabled
>
> Set the default value to 1, which enables an algorithm allowlist for
> unprivileged processes and a slightly longer allowlist for privileged
> processes.
>
> Note that the list may be tweaked in the future.  However, the common
> use cases such as iwd and bluez are taken into account already.  I've
> tested that iwd still works with the default value of 1.
>
> Signed-off-by: Eric Biggers <ebiggers@kernel.org>
> ---
>  Documentation/admin-guide/sysctl/crypto.rst | 36 +++++++++++
>  Documentation/crypto/userspace-if.rst       | 13 +++-
>  crypto/af_alg.c                             | 72 +++++++++++++++++++--
>  crypto/algif_aead.c                         | 11 ++++
>  crypto/algif_hash.c                         | 24 +++++++
>  crypto/algif_rng.c                          |  9 +++
>  crypto/algif_skcipher.c                     | 20 ++++++
>  include/crypto/if_alg.h                     |  8 +++
>  8 files changed, 184 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/crypto.rst b/Documentation/admin-guide/sysctl/crypto.rst
> index b707bd314a64..9a1bd53287f4 100644
> --- a/Documentation/admin-guide/sysctl/crypto.rst
> +++ b/Documentation/admin-guide/sysctl/crypto.rst
> @@ -5,10 +5,46 @@
>  These files show up in ``/proc/sys/crypto/``, depending on the
>  kernel configuration:
>
>  .. contents:: :local:
>
> +.. _af_alg_restrict:
> +
> +af_alg_restrict
> +===============
> +
> +Controls the level of restriction of AF_ALG.
> +
> +AF_ALG is a deprecated and rarely-used userspace interface that is a
> +frequent source of vulnerabilities. It also unnecessarily exposes a
> +large number of kernel implementation details. For more information
> +about AF_ALG, see :ref:`Documentation/crypto/userspace-if.rst
> +<crypto_userspace_interface>`.
> +
> +Starting in Linux v7.3, AF_ALG supports only a limited set of
> +algorithms by default. This sysctl allows the system administrator to
> +remove this restriction when needed for compatibility reasons, or to
> +go further and disable AF_ALG entirely. The default value is 1.
> +
> +===  ==================================================================
> +0    AF_ALG is unrestricted.
> +
> +1    AF_ALG is supported with a limited list of algorithms. The list
> +     is designed for compatibility with known users such as iwd and
> +     bluez that haven't yet been fixed to use userspace crypto code.

Is the expectation that we go shopping for userspace crypto here? This
would mean the Bluetooth stack would use 2 different crypto stacks:
the kernel for module parts and a userspace library for components
like bluetoothd, resulting in 2 different sources to audit. So, while
I agree we should restrict it to just the algorithms in use and with
capabilities, I don't think we should assume the current userspace
users are broken and need fixing.

> +     Specifically, there is an allowlist for unprivileged processes
> +     and a somewhat longer allowlist for processes that hold
> +     CAP_SYS_ADMIN or CAP_NET_ADMIN in the initial user namespace.
> +
> +     Attempts to bind() an AF_ALG socket with a disallowed algorithm
> +     fail with ENOENT.
> +
> +2    AF_ALG is completely disabled. Attempts to create an AF_ALG
> +     socket fail with EAFNOSUPPORT.
> +===  ==================================================================
> +
>  fips_enabled
>  ============
>
>  Read-only flag that indicates whether FIPS mode is enabled.
>
> diff --git a/Documentation/crypto/userspace-if.rst b/Documentation/crypto/userspace-if.rst
> index ab93300c8e04..d6194346e366 100644
> --- a/Documentation/crypto/userspace-if.rst
> +++ b/Documentation/crypto/userspace-if.rst
> @@ -1,5 +1,7 @@
> +.. _crypto_userspace_interface:
> +
>  User Space Interface
>  ====================
>
>  Introduction
>  ------------
> @@ -10,13 +12,18 @@ code.
>
>  AF_ALG is insecure and is deprecated. Originally added to the kernel in 2010,
>  most kernel developers now consider it to be a mistake. Support for hardware
>  accelerators, which was the original purpose of AF_ALG, has been removed.
>
> -AF_ALG continues to be supported only for backwards compatibility. On systems
> -where no programs using AF_ALG remain, the support for it should be disabled by
> -disabling ``CONFIG_CRYPTO_USER_API_*``.
> +AF_ALG continues to be supported only for backwards compatibility.
> +
> +Starting in Linux v7.3, the set of algorithms supported by AF_ALG is limited by
> +default. See :ref:`/proc/sys/crypto/af_alg_restrict <af_alg_restrict>`.
> +
> +On systems where no programs using AF_ALG remain, the support for it should be
> +disabled entirely by setting ``/proc/sys/crypto/af_alg_restrict`` to 2 or by
> +disabling ``CONFIG_CRYPTO_USER_API_*`` in the kernel configuration.
>
>  Deprecation
>  -----------
>
>  AF_ALG was originally intended to provide userspace programs access to crypto
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index cce000e8590e..34b801568fba 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -6,10 +6,11 @@
>   *
>   * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
>   */
>
>  #include <linux/atomic.h>
> +#include <linux/capability.h>
>  #include <crypto/if_alg.h>
>  #include <linux/crypto.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/key.h>
> @@ -20,14 +21,32 @@
>  #include <linux/rwsem.h>
>  #include <linux/sched.h>
>  #include <linux/sched/signal.h>
>  #include <linux/security.h>
>  #include <linux/string.h>
> +#include <linux/sysctl.h>
> +#include <linux/user_namespace.h>
>  #include <keys/user-type.h>
>  #include <keys/trusted-type.h>
>  #include <keys/encrypted-type.h>
>
> +static int af_alg_restrict = 1;
> +
> +static const struct ctl_table af_alg_table[] = {
> +       {
> +               .procname       = "af_alg_restrict",
> +               .data           = &af_alg_restrict,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec_minmax,
> +               .extra1         = SYSCTL_ZERO,
> +               .extra2         = SYSCTL_TWO,
> +       },
> +};
> +
> +static struct ctl_table_header *af_alg_header;
> +
>  struct alg_type_list {
>         const struct af_alg_type *type;
>         struct list_head list;
>  };
>
> @@ -108,10 +127,43 @@ int af_alg_unregister_type(const struct af_alg_type *type)
>
>         return err;
>  }
>  EXPORT_SYMBOL_GPL(af_alg_unregister_type);
>
> +static bool af_alg_capable(void)
> +{
> +       return ns_capable_noaudit(&init_user_ns, CAP_NET_ADMIN) ||
> +              capable(CAP_SYS_ADMIN);
> +}
> +
> +int af_alg_check_restriction(const char *name,
> +                            const struct af_alg_allowlist_entry allowlist[])
> +{
> +       int level = READ_ONCE(af_alg_restrict);
> +
> +       if (level == 0)
> +               return 0;
> +       if (level == 1) {
> +               for (const struct af_alg_allowlist_entry *ent = allowlist;
> +                    ent->name; ent++) {
> +                       if (strcmp(name, ent->name) == 0 &&
> +                           (!ent->privileged || af_alg_capable()))
> +                               return 0;
> +               }
> +       }
> +       /*
> +        * Use -ENOENT (the error code for "algorithm not found") instead of
> +        * -EACCES or -EPERM, for the highest chance of correctly triggering
> +        * fallback code paths in userspace programs.
> +        *
> +        * Don't log a warning, since it would be noisy.  iwd tries to bind a
> +        * bunch of algorithms that it never uses.
> +        */
> +       return -ENOENT;
> +}
> +EXPORT_SYMBOL_GPL(af_alg_check_restriction);
> +
>  static void alg_do_release(const struct af_alg_type *type, void *private)
>  {
>         if (!type)
>                 return;
>
> @@ -504,10 +556,13 @@ static int alg_create(struct net *net, struct socket *sock, int protocol,
>                       int kern)
>  {
>         struct sock *sk;
>         int err;
>
> +       if (READ_ONCE(af_alg_restrict) == 2)
> +               return -EAFNOSUPPORT;
> +
>         if (sock->type != SOCK_SEQPACKET)
>                 return -ESOCKTNOSUPPORT;
>         if (protocol != 0)
>                 return -EPROTONOSUPPORT;
>
> @@ -1220,31 +1275,36 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
>  }
>  EXPORT_SYMBOL_GPL(af_alg_get_rsgl);
>
>  static int __init af_alg_init(void)
>  {
> -       int err = proto_register(&alg_proto, 0);
> +       int err;
> +
> +       af_alg_header = register_sysctl("crypto", af_alg_table);
>
> +       err = proto_register(&alg_proto, 0);
>         if (err)
> -               goto out;
> +               goto out_unregister_sysctl;
>
>         err = sock_register(&alg_family);
> -       if (err != 0)
> +       if (err)
>                 goto out_unregister_proto;
>
> -out:
> -       return err;
> +       return 0;
>
>  out_unregister_proto:
>         proto_unregister(&alg_proto);
> -       goto out;
> +out_unregister_sysctl:
> +       unregister_sysctl_table(af_alg_header);
> +       return err;
>  }
>
>  static void __exit af_alg_exit(void)
>  {
>         sock_unregister(PF_ALG);
>         proto_unregister(&alg_proto);
> +       unregister_sysctl_table(af_alg_header);
>  }
>
>  module_init(af_alg_init);
>  module_exit(af_alg_exit);
>  MODULE_DESCRIPTION("Crypto userspace interface");
> diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> index 787aac8aeb24..b9217f9086aa 100644
> --- a/crypto/algif_aead.c
> +++ b/crypto/algif_aead.c
> @@ -32,10 +32,15 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/net.h>
>  #include <net/sock.h>
>
> +static const struct af_alg_allowlist_entry aead_allowlist[] = {
> +       { "ccm(aes)", true }, /* bluez */
> +       {},
> +};
> +
>  static inline bool aead_sufficient_data(struct sock *sk)
>  {
>         struct alg_sock *ask = alg_sk(sk);
>         struct sock *psk = ask->parent;
>         struct alg_sock *pask = alg_sk(psk);
> @@ -342,10 +347,16 @@ static struct proto_ops algif_aead_ops_nokey = {
>         .poll           =       af_alg_poll,
>  };
>
>  static void *aead_bind(const char *name)
>  {
> +       int err;
> +
> +       err = af_alg_check_restriction(name, aead_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         return crypto_alloc_aead(name, 0, AF_ALG_CRYPTOAPI_MASK);
>  }
>
>  static void aead_release(void *private)
>  {
> diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
> index 5452ad6c1506..a8d958d51ece 100644
> --- a/crypto/algif_hash.c
> +++ b/crypto/algif_hash.c
> @@ -14,10 +14,28 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/net.h>
>  #include <net/sock.h>
>
> +static const struct af_alg_allowlist_entry hash_allowlist[] = {
> +       { "cmac(aes)", true }, /* iwd, bluez */
> +       { "hmac(md5)", true }, /* iwd */
> +       { "hmac(sha1)", true }, /* iwd */
> +       { "hmac(sha224)", true }, /* iwd */
> +       { "hmac(sha256)", true }, /* iwd */
> +       { "hmac(sha384)", true }, /* iwd */
> +       { "hmac(sha512)", true }, /* iwd, sha512hmac */
> +       { "md4", true }, /* iwd */
> +       { "md5", true }, /* iwd */
> +       { "sha1", false }, /* iwd, iproute2 < 7.0 */
> +       { "sha224", true }, /* iwd */
> +       { "sha256", true }, /* iwd */
> +       { "sha384", true }, /* iwd */
> +       { "sha512", true }, /* iwd */
> +       {},
> +};
> +
>  struct hash_ctx {
>         struct af_alg_sgl sgl;
>
>         u8 *result;
>
> @@ -380,10 +398,16 @@ static struct proto_ops algif_hash_ops_nokey = {
>         .accept         =       hash_accept_nokey,
>  };
>
>  static void *hash_bind(const char *name)
>  {
> +       int err;
> +
> +       err = af_alg_check_restriction(name, hash_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         return crypto_alloc_ahash(name, 0, AF_ALG_CRYPTOAPI_MASK);
>  }
>
>  static void hash_release(void *private)
>  {
> diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
> index 4dfe7899f8fa..bd522915d56d 100644
> --- a/crypto/algif_rng.c
> +++ b/crypto/algif_rng.c
> @@ -48,10 +48,14 @@
>
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Stephan Mueller <smueller@chronox.de>");
>  MODULE_DESCRIPTION("User-space interface for random number generators");
>
> +static const struct af_alg_allowlist_entry rng_allowlist[] = {
> +       {},
> +};
> +
>  struct rng_ctx {
>  #define MAXSIZE 128
>         unsigned int len;
>         struct crypto_rng *drng;
>         u8 *addtl;
> @@ -199,10 +203,15 @@ static struct proto_ops __maybe_unused algif_rng_test_ops = {
>
>  static void *rng_bind(const char *name)
>  {
>         struct rng_parent_ctx *pctx;
>         struct crypto_rng *rng;
> +       int err;
> +
> +       err = af_alg_check_restriction(name, rng_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
>
>         pctx = kzalloc_obj(*pctx);
>         if (!pctx)
>                 return ERR_PTR(-ENOMEM);
>
> diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> index df20bdfe1f1f..2b8069667974 100644
> --- a/crypto/algif_skcipher.c
> +++ b/crypto/algif_skcipher.c
> @@ -32,10 +32,24 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/net.h>
>  #include <net/sock.h>
>
> +static const struct af_alg_allowlist_entry skcipher_allowlist[] = {
> +       { "adiantum(xchacha12,aes)", false }, /* cryptsetup */
> +       { "adiantum(xchacha20,aes)", false }, /* cryptsetup */
> +       { "cbc(aes)", true }, /* iwd */
> +       { "cbc(des)", true }, /* iwd */
> +       { "cbc(des3_ede)", true }, /* iwd */
> +       { "ctr(aes)", true }, /* iwd */
> +       { "ecb(aes)", true }, /* iwd, bluez */
> +       { "ecb(des)", true }, /* iwd */
> +       { "hctr2(aes)", false }, /* cryptsetup */
> +       { "xts(aes)", false }, /* cryptsetup benchmark */
> +       {},
> +};
> +
>  static int skcipher_sendmsg(struct socket *sock, struct msghdr *msg,
>                             size_t size)
>  {
>         struct sock *sk = sock->sk;
>         struct alg_sock *ask = alg_sk(sk);
> @@ -307,10 +321,16 @@ static struct proto_ops algif_skcipher_ops_nokey = {
>         .poll           =       af_alg_poll,
>  };
>
>  static void *skcipher_bind(const char *name)
>  {
> +       int err;
> +
> +       err = af_alg_check_restriction(name, skcipher_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         return crypto_alloc_skcipher(name, 0, AF_ALG_CRYPTOAPI_MASK);
>  }
>
>  static void skcipher_release(void *private)
>  {
> diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
> index 7643ba954125..4e9ed8e73403 100644
> --- a/include/crypto/if_alg.h
> +++ b/include/crypto/if_alg.h
> @@ -159,13 +159,21 @@ struct af_alg_ctx {
>         unsigned int len;
>
>         unsigned int inflight;
>  };
>
> +struct af_alg_allowlist_entry {
> +       const char *name;
> +       bool privileged;
> +};
> +
>  int af_alg_register_type(const struct af_alg_type *type);
>  int af_alg_unregister_type(const struct af_alg_type *type);
>
> +int af_alg_check_restriction(const char *name,
> +                            const struct af_alg_allowlist_entry allowlist[]);
> +
>  int af_alg_release(struct socket *sock);
>  void af_alg_release_parent(struct sock *sk);
>  int af_alg_accept(struct sock *sk, struct socket *newsock,
>                   struct proto_accept_arg *arg);
>
>
> base-commit: 1dc18801be29bc54709aa355b8acd80e183b03cd
> --
> 2.54.0
>
>
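
To make the level-1 behaviour concrete from the userspace side, here is a
minimal userspace model of the check in the quoted patch. It is a sketch, not
the kernel code: check_restriction() and struct allowlist_entry are
hypothetical names, 'level' stands in for the sysctl value, 'capable' stands
in for the CAP_NET_ADMIN/CAP_SYS_ADMIN test, and only two entries of the
quoted hash allowlist are reproduced.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the patch's struct af_alg_allowlist_entry. */
struct allowlist_entry {
	const char *name;
	bool privileged;	/* true: caller must hold a capability */
};

/* Two entries copied from the quoted hash allowlist. */
static const struct allowlist_entry hash_allowlist[] = {
	{ "sha1", false },
	{ "sha256", true },
	{ NULL, false },	/* terminator */
};

/*
 * Userspace model of af_alg_check_restriction().
 * Returns 0 if bind() would proceed, -ENOENT otherwise.
 */
static int check_restriction(int level, bool capable, const char *name,
			     const struct allowlist_entry *allowlist)
{
	const struct allowlist_entry *ent;

	if (level == 0)
		return 0;	/* unrestricted */
	if (level == 1) {
		for (ent = allowlist; ent->name; ent++) {
			if (strcmp(name, ent->name) == 0 &&
			    (!ent->privileged || capable))
				return 0;
		}
	}
	return -ENOENT;		/* not listed, or level 2 */
}
```

With these entries, an unprivileged bind to "sha256" is rejected while the
same bind from a capable process succeeds, which is the case that matters
for daemons such as bluetoothd.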


-- 
Luiz Augusto von Dentz

^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-23 15:08 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Nuno Sá, Rodrigo Alencar, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <20260622-captive-tux-067efd31ceac@spud>

On Mon, 22 Jun 2026 19:39:43 +0100
Conor Dooley <conor@kernel.org> wrote:

> On Mon, Jun 22, 2026 at 05:29:11PM +0100, Jonathan Cameron wrote:
> > > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > > (I suspect it doesn't!)
> > > > > 
> > > > > Just thinking as I type, but could we do something a bit nasty with
> > > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > > under serializing locks. 
> > > > > 
> > > > > Agreed though that this would be nicer as an SPI thing that let
> > > > > > us specify that a single CS is shared by multiple devices and there
> > > > > > is some other signal acting to select which one we are talking to.
> > > > >     
> > > > 
> > > > If the device-addressing on the same chip-select is to be handled
> > > > by the spi framework, wouldn't we lose device-specific features?
> > > > 
> > > > I understand that this multi-device feature is there mostly to extend the
> > > > channel count from 16 to 32, 48 or 64. I suppose the command:
> > > > 
> > > > 	"MULTI DEVICE SW LDAC MODE"
> > > > 
> > > > > exists so that software can update channel values across multiple devices.    
> > > 
> > > Right! You do have a point! I agree the main driver for a feature like
> > > this is likely to extend the channel count and effectively "aggregate"
> > > devices.
> > > 
> > > But I would say that even with the spi solution the MULTI DEVICE stuff
> > > should be doable (as we still need a sort of adi,pin-id property). 
> > > 
> > > But yes, I do feel that the whole feature is for aggregation so seeing
> > > one device with 32 channels is the expectation here? Rather than seeing
> > > two devices with 16 channels.  
> > 
> > Agreed - if we have messages that address both devices at once that needs
> > to be a unified driver and given they are about triggering simultaneous
> > update of all channels it needs to look like one big device.
> > This ends up similar to how we handle daisy chain devices.
> > 
> > The question of what to do on devices that don't have this feature
> > is rather different. Good thing you read the datasheet :)  
> 
> I'm not sure it really is, the intent for the microchip devices I think
> is pretty similar. The mcp3911 datasheet cites three-phase power
> metering using three devices as a typical use-case, for example.
> Probably creating an amalgamated device is a good fit there too?
> 
> I assume an amalgamated device for this ADI product means per-channel ID
> properties?

Per device as they are 8 channels each, but effectively the same.
(microchip devices are 2 channels each I think)

> If so, I think they should be made generic and the Microchip
> products retrofitted to use them, with a fallback to the proprietary
> property. Not going to ask for the support for multiple devices in those
> drivers, since the current way doesn't work and there'd be no loss of
> support. Someone from Microchip can do that. The proprietary property
> to generic conversion should be straightforward and provides weight to
> an argument for this being generic, since that'd be three devices that
> can all share?

I think the practical difference is that for the ADI part we have to support
it as a single device as some commands effectively don't use the address
bits in the message.  For the microchip ones we could register multiple
separate SPI devices and just provide one address for each.

Equally we could treat the microchip ones as a unified device in which case
the binding will be the same and there will be a property with a bunch
of addresses (one for each device present). I'm fine with this option given
as you say the datasheet calls out using this for channel expansion.
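
As a rough illustration of the unified-device option, the driver would turn a
channel index of the aggregated device into a device address plus a local
channel. The sketch below is plain C with hypothetical names (struct
unified_dev, map_channel()); the address list stands in for whatever the
property with a bunch of addresses ends up being called, and the channels per
chip is a parameter rather than a fixed value.

```c
#include <stddef.h>

/* Hypothetical description of an aggregated device. */
struct unified_dev {
	const unsigned int *addrs;	/* one bus address per physical chip */
	size_t num_chips;
	unsigned int chans_per_chip;
};

/*
 * Map a channel index of the unified device to the chip address and
 * the channel within that chip.
 * Returns 0 on success, -1 if the channel is out of range.
 */
static int map_channel(const struct unified_dev *d, unsigned int chan,
		       unsigned int *addr, unsigned int *local)
{
	size_t chip;

	if (d->chans_per_chip == 0)
		return -1;

	chip = chan / d->chans_per_chip;
	if (chip >= d->num_chips)
		return -1;

	*addr = d->addrs[chip];
	*local = chan % d->chans_per_chip;
	return 0;
}
```

Commands that ignore the address bits would bypass this mapping and go to
all chips at once, which is why the ADI part cannot be split into separate
SPI devices.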

Jonathan




^ permalink raw reply

