Linux Documentation


* [PATCH v3 8/9] selftests: kvm: Split ____vm_create() to expose init helpers
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Refactor `____vm_create()` in the KVM selftest library to extract its
initialization steps into separate, reusable internal helpers.

Introduce `vm_init_fields()` and `vm_init_memory_properties()`. This
allows advanced test setups to initialize VM fields or memory
properties independently, which is required by upcoming test cases
that restore preserved VMs. No functional change for existing tests.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
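The split described in the commit message can be pictured with a small
userspace model. This is only a sketch, not the selftest library:
struct toy_vm and every toy_* function are hypothetical stand-ins, the
fd value 100 is a placeholder for what vm_open() would obtain, and the
restore path is an assumption about how the upcoming tests might reuse
the two helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_vm; not the selftest type. */
struct toy_vm {
	int mode;		/* set by the "fields" step */
	int fd;			/* opened fresh, or adopted on restore */
	int *vpages_valid;	/* set by the "memory properties" step */
};

/* Models vm_init_fields(): fill in state that needs no VM fd. */
static void toy_vm_init_fields(struct toy_vm *vm, int mode)
{
	vm->mode = mode;
	vm->fd = -1;
}

/* Models vm_init_memory_properties(): allocate address tracking. */
static void toy_vm_init_memory_properties(struct toy_vm *vm)
{
	vm->vpages_valid = calloc(1, sizeof(*vm->vpages_valid));
	assert(vm->vpages_valid != NULL);
}

/* Models ____vm_create(): fields, open a new fd, memory properties. */
static struct toy_vm *toy_vm_create(int mode)
{
	struct toy_vm *vm = calloc(1, sizeof(*vm));

	assert(vm != NULL);
	toy_vm_init_fields(vm, mode);
	vm->fd = 100;	/* placeholder for vm_open() */
	toy_vm_init_memory_properties(vm);
	return vm;
}

/* Assumed restore path: same helpers, but adopt an existing fd. */
static struct toy_vm *toy_vm_restore(int mode, int preserved_fd)
{
	struct toy_vm *vm = calloc(1, sizeof(*vm));

	assert(vm != NULL);
	toy_vm_init_fields(vm, mode);
	vm->fd = preserved_fd;	/* no open step on this path */
	toy_vm_init_memory_properties(vm);
	return vm;
}

static void toy_vm_free(struct toy_vm *vm)
{
	free(vm->vpages_valid);
	free(vm);
}
```

The point of the model is that both paths share the two init steps and
differ only in where the fd comes from.
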
 .../testing/selftests/kvm/include/kvm_util.h  |  2 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 26 +++++++++++++------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 04a9101..88de0e7 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -471,6 +471,8 @@ const char *vm_guest_mode_string(u32 i);
 
 void kvm_vm_free(struct kvm_vm *vmp);
 void kvm_vm_restart(struct kvm_vm *vmp);
+void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape);
+void vm_init_memory_properties(struct kvm_vm *vm);
 void kvm_vm_release(struct kvm_vm *vmp);
 void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename);
 int kvm_memfd_alloc(size_t size, bool hugepages);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 195f3fd..dc576b8 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -276,13 +276,8 @@ __weak void vm_populate_gva_bitmap(struct kvm_vm *vm)
 		(1ULL << (vm->va_bits - 1)) >> vm->page_shift);
 }
 
-struct kvm_vm *____vm_create(struct vm_shape shape)
+void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape)
 {
-	struct kvm_vm *vm;
-
-	vm = calloc(1, sizeof(*vm));
-	TEST_ASSERT(vm != NULL, "Insufficient Memory");
-
 	INIT_LIST_HEAD(&vm->vcpus);
 	vm->regions.gpa_tree = RB_ROOT;
 	vm->regions.hva_tree = RB_ROOT;
@@ -380,9 +375,10 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 	if (vm->pa_bits != 40)
 		vm->type = KVM_VM_TYPE_ARM_IPA_SIZE(vm->pa_bits);
 #endif
+}
 
-	vm_open(vm);
-
+void vm_init_memory_properties(struct kvm_vm *vm)
+{
 	/* Limit to VA-bit canonical virtual addresses. */
 	vm->vpages_valid = sparsebit_alloc();
 	vm_populate_gva_bitmap(vm);
@@ -392,6 +388,20 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 
 	/* Allocate and setup memory for guest. */
 	vm->vpages_mapped = sparsebit_alloc();
+}
+
+struct kvm_vm *____vm_create(struct vm_shape shape)
+{
+	struct kvm_vm *vm;
+
+	vm = calloc(1, sizeof(*vm));
+	TEST_ASSERT(vm != NULL, "Insufficient Memory");
+
+	vm_init_fields(vm, shape);
+
+	vm_open(vm);
+
+	vm_init_memory_properties(vm);
 
 	return vm;
 }
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Set up the basic infrastructure to preserve guest_memfd. Currently,
only fully shared guest_memfd backed by PAGE_SIZE pages is supported.

The INIT_SHARED flag is used to check that the file is shared, and
kvm_arch_has_private_mem() is used to check that converting memory to
private is not supported.

Preservation is straightforward: it walks through the folios and
serializes them.

kvm_gmem_freeze() is called on preserve to freeze the guest_memfd
inode. This prevents fallocate calls from changing the inode mapping
and fails any new fault allocation during or after preservation.

This change also updates the MAINTAINERS list.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
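The per-folio record in the ABI header packs a 52-bit PFN and 12 bits
of flags into one 64-bit word, followed by a 64-bit index. The sketch
below is a userspace copy of that layout for illustration only: the
toy_* names are hypothetical, and it assumes a GCC- or Clang-style
compiler that accepts 64-bit bit-fields and the packed attribute.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the folio record layout; not the kernel type. */
struct toy_folio_ser {
	uint64_t pfn:52;
	uint64_t flags:12;
	uint64_t index;
} __attribute__((packed));

/* Mirrors bit 0 of the per-folio flags in the patch. */
#define TOY_FOLIO_UPTODATE	(1ULL << 0)

/* Under the assumptions above, each record is two 64-bit words. */
_Static_assert(sizeof(struct toy_folio_ser) == 16,
	       "folio record should be 16 bytes");

/* Fill one record the same way the folio walk in the patch does. */
static struct toy_folio_ser toy_folio_ser_make(uint64_t pfn, uint64_t index,
					       int uptodate)
{
	struct toy_folio_ser rec = {
		.pfn = pfn,
		.flags = uptodate ? TOY_FOLIO_UPTODATE : 0,
		.index = index,
	};

	/* A PFN wider than 52 bits would be silently truncated. */
	assert(pfn < (1ULL << 52));
	return rec;
}
```

With 4 KiB pages, a 52-bit PFN plus the 12-bit page shift spans a full
64-bit physical address, so the bit-field width is not a practical
limit under that assumption.
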
 MAINTAINERS                 |   1 +
 include/linux/kho/abi/kvm.h |  79 +++++-
 virt/kvm/Makefile.kvm       |   2 +-
 virt/kvm/guest_memfd_luo.c  | 497 ++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c         |   7 +
 virt/kvm/kvm_mm.h           |   4 +
 6 files changed, 583 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/guest_memfd_luo.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c000e6..d1d699ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14420,6 +14420,7 @@ L:	kexec@lists.infradead.org
 L:	kvm@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F:	virt/kvm/guest_memfd_luo.c
 F:	virt/kvm/kvm_luo.c
 
 KVM PARAVIRT (KVM/paravirt)
diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
index 718db68..42074d7 100644
--- a/include/linux/kho/abi/kvm.h
+++ b/include/linux/kho/abi/kvm.h
@@ -9,20 +9,23 @@
 #define _LINUX_KHO_ABI_KVM_H
 
 #include <linux/types.h>
+#include <linux/bits.h>
 #include <linux/kho/abi/kexec_handover.h>
 
 /**
- * DOC: KVM Live Update ABI
+ * DOC: KVM and guest_memfd Live Update ABI
  *
- * KVM uses the ABI defined below for preserving its state
+ * KVM and guest_memfd use the ABI defined below for preserving their states
  * across a kexec reboot using the LUO.
  *
- * The state is serialized into a packed structure `struct kvm_luo_ser`
- * which is handed over to the next kernel via the KHO mechanism.
+ * The state is serialized into packed structures (struct kvm_luo_ser and
+ * struct guest_memfd_luo_ser) which are handed over to the next kernel via
+ * the KHO mechanism.
  *
- * This interface is a contract. Any modification to the structure layout
+ * This interface is a contract. Any modification to the structure layouts
  * constitutes a breaking change. Such changes require incrementing the
- * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
+ * version number in the KVM_LUO_FH_COMPATIBLE or
+ * GUEST_MEMFD_LUO_FH_COMPATIBLE compatibility strings.
  */
 
 /**
@@ -36,4 +39,68 @@ struct kvm_luo_ser {
 /* The compatibility string for KVM VM file handler */
 #define KVM_LUO_FH_COMPATIBLE	"kvm_vm_luo_v1"
 
+/**
+ * struct guest_memfd_luo_folio_ser - Serialization layout for a single folio in guest_memfd.
+ * @pfn:   Page Frame Number of the folio.
+ * @index: Page offset of the folio within the file.
+ * @flags: State flags associated with the folio.
+ */
+struct guest_memfd_luo_folio_ser {
+	u64 pfn:52;
+	u64 flags:12;
+	u64 index;
+} __packed;
+
+/**
+ * GUEST_MEMFD_LUO_FOLIO_UPTODATE - The folio is up-to-date.
+ *
+ * This flag is per folio to check if the folio is uptodate.
+ */
+#define GUEST_MEMFD_LUO_FOLIO_UPTODATE	BIT(0)
+
+
+/**
+ * GUEST_MEMFD_LUO_FLAG_MMAP - The guest_memfd supports mmap.
+ *
+ * This flag indicates that the guest_memfd supports host-side mmap.
+ */
+#define GUEST_MEMFD_LUO_FLAG_MMAP		BIT(0)
+
+/**
+ * GUEST_MEMFD_LUO_FLAG_INIT_SHARED - Initialize memory as shared.
+ *
+ * This flag indicates that the guest_memfd has been initialized as shared
+ * memory.
+ */
+#define GUEST_MEMFD_LUO_FLAG_INIT_SHARED	BIT(1)
+
+/**
+ * GUEST_MEMFD_LUO_SUPPORTED_FLAGS - Supported guest_memfd LUO flags mask.
+ *
+ * A mask of all guest_memfd preservation flags supported by this version
+ * of the KVM LUO ABI.
+ */
+#define GUEST_MEMFD_LUO_SUPPORTED_FLAGS	(GUEST_MEMFD_LUO_FLAG_MMAP | \
+						 GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
+
+/**
+ * struct guest_memfd_luo_ser - Main serialization structure for guest_memfd.
+ * @size:      The size of the file in bytes.
+ * @flags:     File-level flags.
+ * @nr_folios: Number of folios in the folios array.
+ * @vm_token:  Token of the associated KVM VM instance.
+ * @folios:    KHO vmalloc descriptor pointing to the array of
+ *             struct guest_memfd_luo_folio_ser.
+ */
+struct guest_memfd_luo_ser {
+	u64 size;
+	u64 flags;
+	u64 nr_folios;
+	u64 vm_token;
+	struct kho_vmalloc folios;
+} __packed;
+
+/* The compatibility string for GUEST_MEMFD file handler */
+#define GUEST_MEMFD_LUO_FH_COMPATIBLE	"guest_memfd_luo_v1"
+
 #endif /* _LINUX_KHO_ABI_KVM_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index c1a9621..d30fca0 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,4 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
-kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
+kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/guest_memfd_luo.o $(KVM)/kvm_luo.o
diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
new file mode 100644
index 0000000..c242b1d
--- /dev/null
+++ b/virt/kvm/guest_memfd_luo.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * Guestmemfd Preservation for Live Update Orchestrator (LUO)
+ */
+
+/**
+ * DOC: Guestmemfd Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * Guest memory file descriptors (guest_memfd) can be preserved over a kexec
+ * reboot using the Live Update Orchestrator (LUO) file preservation. This
+ * allows userspace to preserve VM memory across kexec reboots.
+ *
+ * The preservation is not intended to be transparent. Only select properties
+ * of the guest_memfd are preserved, while others are reset to default.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of guest_memfd are preserved across kexec:
+ *
+ * File Size
+ *   The size of the file is preserved.
+ *
+ * File Contents
+ *   All folios present in the page cache are preserved.
+ *
+ * File-level Flags
+ *   The file-level flags (such as MMAP support and INIT_SHARED default mapping)
+ *   are preserved.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * NUMA Memory Policy
+ *   NUMA memory policies associated with the guest_memfd are not preserved.
+ */
+#include <linux/liveupdate.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/anon_inodes.h>
+#include <linux/magic.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/kexec_handover.h>
+#include <linux/kho/abi/kvm.h>
+#include "guest_memfd.h"
+#include "kvm_mm.h"
+
+
+static int kvm_gmem_luo_walk_folios(struct address_space *mapping,
+		pgoff_t end_index, struct guest_memfd_luo_folio_ser *folios_ser,
+		u64 *out_count)
+{
+	struct folio_batch fbatch;
+	pgoff_t index = 0;
+	u64 count = 0;
+	int err = 0;
+
+	folio_batch_init(&fbatch);
+	while (index < end_index) {
+		unsigned int nr, i;
+
+		nr = filemap_get_folios(mapping, &index, end_index - 1, &fbatch);
+		if (nr == 0)
+			break;
+
+		for (i = 0; i < nr; i++) {
+			struct folio *folio = fbatch.folios[i];
+
+			if (folios_ser) {
+				if (folio_test_hwpoison(folio)) {
+					err = -EHWPOISON;
+					folio_batch_release(&fbatch);
+					goto out;
+				}
+				err = kho_preserve_folio(folio);
+				if (err) {
+					folio_batch_release(&fbatch);
+					goto out;
+				}
+
+				folios_ser[count].pfn = folio_pfn(folio);
+				folios_ser[count].index = folio->index;
+				folios_ser[count].flags = folio_test_uptodate(folio) ?
+							  GUEST_MEMFD_LUO_FOLIO_UPTODATE : 0;
+			}
+			count++;
+		}
+		folio_batch_release(&fbatch);
+		cond_resched();
+	}
+
+out:
+	*out_count = count;
+	return err;
+}
+
+static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct gmem_file *gmem_file;
+	struct kvm *kvm;
+
+	if (inode->i_sb->s_magic != GUEST_MEMFD_MAGIC)
+		return 0;
+
+	gmem_file = file->private_data;
+	if (!gmem_file)
+		return 0;
+
+	/*
+	 * Only Fully-shared guest_memfd preservation is supported
+	 */
+	if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+		return 0;
+
+	/*
+	 * It makes sure that no memory can converted to private
+	 * even if it was initially fully shared (in-place conversions are
+	 * prevented).
+	 */
+	kvm = gmem_file->kvm;
+	if (kvm_arch_has_private_mem(kvm))
+		return 0;
+
+	if (mapping_large_folio_support(inode->i_mapping))
+		return 0;
+
+	return 1;
+}
+
+static int kvm_gmem_luo_preserve(struct liveupdate_file_op_args *args)
+{
+	DECLARE_KHOSER_PTR(sd, struct guest_memfd_luo_ser *);
+	struct guest_memfd_luo_folio_ser *folios_ser = NULL;
+	u64 count = 0, gmem_flags, abi_flags = 0;
+	struct guest_memfd_luo_ser *ser;
+	struct address_space *mapping;
+	struct gmem_file *gmem_file;
+	struct inode *inode;
+	pgoff_t end_index;
+	struct kvm *kvm;
+	int err = 0;
+	long size;
+
+	inode = file_inode(args->file);
+	kvm_gmem_freeze(inode, true);
+
+	mapping = inode->i_mapping;
+	size = i_size_read(inode);
+	if (!size) {
+		err = -EINVAL;
+		goto err_unfreeze_inode;
+	}
+
+	if (WARN_ON_ONCE(!PAGE_ALIGNED(size))) {
+		err = -EINVAL;
+		goto err_unfreeze_inode;
+	}
+
+	gmem_file = args->file->private_data;
+	kvm = gmem_file->kvm;
+
+	gmem_flags = READ_ONCE(GMEM_I(inode)->flags);
+	if (gmem_flags & ~(GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED
+				| GUEST_MEMFD_F_MAPPING_FROZEN)) {
+		err = -EOPNOTSUPP;
+		goto err_unfreeze_inode;
+	}
+
+	if (gmem_flags & GUEST_MEMFD_FLAG_MMAP)
+		abi_flags |= GUEST_MEMFD_LUO_FLAG_MMAP;
+	if (gmem_flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+		abi_flags |= GUEST_MEMFD_LUO_FLAG_INIT_SHARED;
+
+	end_index = size >> PAGE_SHIFT;
+
+	ser = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(ser)) {
+		err = PTR_ERR(ser);
+		goto err_unfreeze_inode;
+	}
+
+	/* First pass: Count the folios present in the page cache */
+	err = kvm_gmem_luo_walk_folios(mapping, end_index, NULL, &count);
+	if (err)
+		goto err_free_ser;
+
+	ser->size = size;
+	ser->flags = abi_flags;
+	ser->nr_folios = count;
+	ser->vm_token = 0; // It will be set during the kvm_gmem_luo_freeze()
+
+	if (count > 0) {
+		folios_ser = vcalloc(count, sizeof(*folios_ser));
+		if (!folios_ser) {
+			err = -ENOMEM;
+			goto err_free_ser;
+		}
+
+		/* Second pass: Fill the metadata array and preserve folios */
+		err = kvm_gmem_luo_walk_folios(mapping, end_index, folios_ser, &count);
+		if (err)
+			goto err_unpreserve_unlocked;
+
+		if (WARN_ON_ONCE(count != ser->nr_folios)) {
+			err = -EINVAL;
+			goto err_unpreserve_unlocked;
+		}
+	}
+
+	if (count > 0) {
+		err = kho_preserve_vmalloc(folios_ser, &ser->folios);
+		if (err)
+			goto err_unpreserve_unlocked;
+	}
+
+	KHOSER_STORE_PTR(sd, ser);
+	KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
+	args->private_data = folios_ser;
+
+	return 0;
+
+err_unpreserve_unlocked:
+	for (long i = (long)count - 1; i >= 0; i--) {
+		struct folio *folio = pfn_folio(folios_ser[i].pfn);
+
+		kho_unpreserve_folio(folio);
+	}
+	vfree(folios_ser);
+err_free_ser:
+	kho_unpreserve_free(ser);
+err_unfreeze_inode:
+	kvm_gmem_freeze(inode, false);
+	return err;
+}
+
+static int kvm_gmem_luo_freeze(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_ser *ser;
+	struct gmem_file *gmem_file;
+	struct kvm *kvm;
+	struct file *kvm_file;
+	u64 vm_token;
+	int err;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (WARN_ON_ONCE(!ser))
+		return -EINVAL;
+
+	gmem_file = args->file->private_data;
+	kvm = gmem_file->kvm;
+
+	/*
+	 * Obtain a strong reference to kvm->vm_file to prevent the SLAB_TYPESAFE_BY_RCU
+	 * file memory from being reallocated while it is being processed.
+	 */
+	kvm_file = get_file_active(&kvm->vm_file);
+	if (!kvm_file)
+		return -ENOENT;
+
+	err = liveupdate_get_token_outgoing(args->session, kvm_file, &vm_token);
+	fput(kvm_file);
+	if (err)
+		return err;
+
+	ser->vm_token = vm_token;
+	return 0;
+}
+
+static void kvm_gmem_luo_discard_folios(
+	const struct guest_memfd_luo_folio_ser *folios_ser,
+	u64 nr_folios, u64 start_idx)
+{
+	long i;
+
+	for (i = start_idx; i < nr_folios; i++) {
+		struct folio *folio;
+		phys_addr_t phys;
+
+		if (!folios_ser[i].pfn)
+			continue;
+
+		phys = PFN_PHYS(folios_ser[i].pfn);
+		folio = kho_restore_folio(phys);
+		if (folio)
+			folio_put(folio);
+	}
+}
+
+static void kvm_gmem_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_folio_ser *folios_ser = args->private_data;
+	struct guest_memfd_luo_ser *ser;
+	long i;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (WARN_ON_ONCE(!ser))
+		return;
+
+	if (ser->nr_folios > 0)
+		kho_unpreserve_vmalloc(&ser->folios);
+	for (i = ser->nr_folios - 1; i >= 0; i--) {
+		struct folio *folio;
+
+		if (!folios_ser[i].pfn)
+			continue;
+
+		folio = pfn_folio(folios_ser[i].pfn);
+		kho_unpreserve_folio(folio);
+	}
+	vfree(folios_ser);
+
+	kho_unpreserve_free(ser);
+	kvm_gmem_freeze(file_inode(args->file), false);
+}
+
+static int kvm_gmem_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_folio_ser *folios_ser = NULL;
+	struct guest_memfd_luo_ser *ser;
+	struct kvm *kvm = NULL;
+	struct file *vm_file;
+	struct inode *inode;
+	struct file *file;
+	u64 gmem_flags = 0;
+	int err = 0;
+	long i = 0;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return -EINVAL;
+
+	if (ser->flags & ~GUEST_MEMFD_LUO_SUPPORTED_FLAGS) {
+		err = -EOPNOTSUPP;
+		goto err_free_ser;
+	}
+
+	if (ser->flags & GUEST_MEMFD_LUO_FLAG_MMAP)
+		gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
+	if (ser->flags & GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
+		gmem_flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
+
+	err = liveupdate_get_file_incoming(args->session, ser->vm_token, &vm_file);
+	if (err) {
+		pr_warn("gmem: provided VM FD token (%llx) on preserve is incorrect\n",
+						ser->vm_token);
+		goto err_free_ser;
+	}
+
+	if (file_is_kvm(vm_file))
+		kvm = vm_file->private_data;
+
+	/*
+	 * Release the temporary reference taken by the liveupdate_get_file_incoming
+	 * call. LUO still holds a reference.
+	 */
+	fput(vm_file);
+
+	if (!kvm) {
+		err = -EINVAL;
+		goto err_free_ser;
+	}
+
+	file = __kvm_gmem_create_file(kvm, ser->size, gmem_flags);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_free_ser;
+	}
+
+	inode = file_inode(file);
+
+	if (ser->nr_folios) {
+		folios_ser = kho_restore_vmalloc(&ser->folios);
+		if (!folios_ser) {
+			err = -EINVAL;
+			goto err_destroy_file;
+		}
+
+		for (i = 0; i < ser->nr_folios; i++) {
+			struct folio *folio;
+			phys_addr_t phys;
+
+			if (!folios_ser[i].pfn)
+				continue;
+
+			phys = PFN_PHYS(folios_ser[i].pfn);
+			folio = kho_restore_folio(phys);
+			if (!folio) {
+				pr_err("gmem: failed to restore folio at %llx\n", phys);
+				err = -EIO;
+				goto err_put_remaining_folios;
+			}
+
+			err = filemap_add_folio(inode->i_mapping, folio, folios_ser[i].index,
+						GFP_KERNEL);
+			if (err) {
+				pr_err("gmem: failed to add folio to page cache\n");
+				folio_put(folio);
+				goto err_put_remaining_folios;
+			}
+
+			if (folios_ser[i].flags & GUEST_MEMFD_LUO_FOLIO_UPTODATE)
+				folio_mark_uptodate(folio);
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+		vfree(folios_ser);
+	}
+
+	args->file = file;
+	kho_restore_free(ser);
+	return 0;
+
+err_put_remaining_folios:
+	i++;
+err_destroy_file:
+	fput(file);
+err_free_ser:
+	if (ser->nr_folios) {
+		if (!folios_ser)
+			folios_ser = kho_restore_vmalloc(&ser->folios);
+		if (folios_ser) {
+			kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, i);
+			vfree(folios_ser);
+		}
+	}
+	kho_restore_free(ser);
+	return err;
+}
+
+static void kvm_gmem_luo_finish(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_ser *ser;
+	struct guest_memfd_luo_folio_ser *folios_ser;
+
+	/* Nothing to be done here, if retrieve_status was successful or errored,
+	 * Cleanup is taken care of in retrieval call.
+	 */
+	if (args->retrieve_status)
+		return;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return;
+
+	if (ser->nr_folios) {
+		folios_ser = kho_restore_vmalloc(&ser->folios);
+		if (folios_ser) {
+			kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, 0);
+			vfree(folios_ser);
+		}
+	}
+
+	kho_restore_free(ser);
+}
+
+static const struct liveupdate_file_ops kvm_gmem_luo_file_ops = {
+	.can_preserve = kvm_gmem_luo_can_preserve,
+	.preserve = kvm_gmem_luo_preserve,
+	.freeze = kvm_gmem_luo_freeze,
+	.retrieve = kvm_gmem_luo_retrieve,
+	.unpreserve = kvm_gmem_luo_unpreserve,
+	.finish = kvm_gmem_luo_finish,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler kvm_gmem_luo_handler = {
+	.ops = &kvm_gmem_luo_file_ops,
+	.compatible = GUEST_MEMFD_LUO_FH_COMPATIBLE,
+};
+
+int kvm_gmem_luo_init(void)
+{
+	int err = liveupdate_register_file_handler(&kvm_gmem_luo_handler);
+
+	if (err && err != -EOPNOTSUPP) {
+		pr_err("Could not register luo filesystem handler: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
+
+void kvm_gmem_luo_exit(void)
+{
+	liveupdate_unregister_file_handler(&kvm_gmem_luo_handler);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9c3dd1..e8e2f10 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6581,6 +6581,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (r)
 		goto err_luo;
 
+	r = kvm_gmem_luo_init();
+	if (r)
+		goto err_gmem_luo;
+
 	/*
 	 * Registration _must_ be the very last thing done, as this exposes
 	 * /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6594,6 +6598,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	return 0;
 
 err_register:
+	kvm_gmem_luo_exit();
+err_gmem_luo:
 	kvm_luo_exit();
 err_luo:
 	kvm_uninit_virtualization();
@@ -6625,6 +6631,7 @@ void kvm_exit(void)
 	 */
 	misc_deregister(&kvm_dev);
 
+	kvm_gmem_luo_exit();
 	kvm_luo_exit();
 
 	kvm_uninit_virtualization();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 8719871..1295ff8 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -103,9 +103,13 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 #ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
 int kvm_luo_init(void);
 void kvm_luo_exit(void);
+int kvm_gmem_luo_init(void);
+void kvm_gmem_luo_exit(void);
 #else
 static inline int kvm_luo_init(void) { return 0; }
 static inline void kvm_luo_exit(void) {}
+static inline int kvm_gmem_luo_init(void) { return 0; }
+static inline void kvm_gmem_luo_exit(void) {}
 #endif /* CONFIG_LIVEUPDATE_GUEST_MEMFD */
 
 #endif /* __KVM_MM_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Add the documentation under the "Preserving file descriptors" section
of LUO's documentation.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
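The two VMM strategies described in the new document (pause the VM, or
react to -EPERM after the fact) can be reduced to a small decision
helper. This is a sketch under stated assumptions: enum toy_vmm_action
and toy_vmm_on_gmem_error() are hypothetical names, and a real VMM
would also have to tell a frozen guest_memfd apart from other sources
of -EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical VMM reactions; the names are illustrative only. */
enum toy_vmm_action {
	TOY_VMM_CONTINUE,	/* no error, keep running */
	TOY_VMM_PAUSE_VM,	/* stop vCPUs, carry on with the update */
	TOY_VMM_ABORT_UPDATE,	/* unpreserve and abandon the update */
	TOY_VMM_FATAL,		/* unrelated error, not handled here */
};

/*
 * Map a negative errno from a guest_memfd fault or fallocate to an
 * action. -EPERM is the value the document gives for a frozen file.
 */
static enum toy_vmm_action toy_vmm_on_gmem_error(int err, bool prefer_abort)
{
	assert(err <= 0);

	if (err == 0)
		return TOY_VMM_CONTINUE;
	if (err == -EPERM)
		return prefer_abort ? TOY_VMM_ABORT_UPDATE : TOY_VMM_PAUSE_VM;
	return TOY_VMM_FATAL;
}
```

Pausing all vCPUs before preservation, as the document recommends,
means this path should not be reached in the common case.
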
 Documentation/core-api/liveupdate.rst |   1 +
 Documentation/liveupdate/vmm.rst      | 107 ++++++++++++++++++++++++++
 MAINTAINERS                           |   1 +
 virt/kvm/guest_memfd_luo.c            |   4 +-
 4 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/liveupdate/vmm.rst

diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index 5a292d0..bac58a3 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -34,6 +34,7 @@ The following types of file descriptors can be preserved
    :maxdepth: 1
 
    ../mm/memfd_preservation
+   ../liveupdate/vmm
 
 Public API
 ==========
diff --git a/Documentation/liveupdate/vmm.rst b/Documentation/liveupdate/vmm.rst
new file mode 100644
index 0000000..8353e23
--- /dev/null
+++ b/Documentation/liveupdate/vmm.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=============================
+VM & Guest_Memfd Preservation
+=============================
+
+.. kernel-doc:: virt/kvm/kvm_luo.c
+   :doc: KVM VM Preservation via LUO
+
+.. kernel-doc:: virt/kvm/guest_memfd_luo.c
+   :doc: Guestmemfd Preservation via LUO
+
+VMM Instructions
+================
+
+This section describes the requirements, scope, conditions, and
+ordering constraints that a Virtual Machine Monitor (VMM) must adhere
+to for successful preservation and retrieval of guest_memfd files
+across a Live Update Orchestrator (LUO) sequence.
+
+Scope and Limitations
+---------------------
+
+At this stage, the scope of guest_memfd preservation is restricted to:
+
+1. **Fully Shared guest_memfd**:
+   Currently, only fully shared guest_memfd is supported. Any system
+   that supports CoCo VMs (which use private guest_memfd) will not
+   support preservation.
+
+2. **Standard Page Size**:
+   Only guest_memfd backed by standard page size (``PAGE_SIZE``,
+   order-0) pages is supported. Large/huge page backing (e.g.,
+   hugetlb guest_memfd) is not supported.
+
+Any Virtual Machine (VM) whose memory is fully backed by such
+guest_memfd files can be preserved across live update.
+
+VMM Actions and Conditions during Live Update
+---------------------------------------------
+
+During the live update sequence, the kernel introduces a *freezing*
+phase for the guest_memfd inode. Freezing prevents any modifications to
+the guest_memfd page cache. Specifically, once a guest_memfd mapping is
+frozen:
+
+- Any subsequent ``fallocate`` calls on the guest_memfd file descriptor
+  will fail and return ``-EPERM``.
+- Any new page faults (guest-side or host-userspace-side) that require
+  folio allocation will fail and return ``-EPERM``.
+
+To prevent vCPUs or VMM helper threads from failing due to these
+``-EPERM`` errors, the VMM must implement one of the following
+strategies:
+
+1. **Pause the VM (Recommended)**:
+   The VMM should pause/suspend all vCPUs before invoking the
+   preservation or freezing of the VM and guest_memfd files. This
+   ensures no new page faults or memory accesses can occur while the
+   guest_memfd is frozen.
+
+2. **Handle Fault Failures**:
+   If the VM is not paused, the VMM must be prepared to handle VM
+   exits or user page fault errors resulting from the ``-EPERM``
+   failures. The VMM must take appropriate action, such as
+   immediately pausing the VM, or aborting the live update sequence
+   (by tearing down or unpreserving the live update session).
+
+Preservation and Retrieval Ordering
+-----------------------------------
+
+Preservation Order
+~~~~~~~~~~~~~~~~~~
+
+There is no strict ordering requirement for initiating the
+preservation of the KVM VM file and the guest_memfd files; they are
+preserved independently. However, if kexec is triggered with a
+guest_memfd preserved but not its VM file, kexec will fail.
+
+Retrieval Order
+~~~~~~~~~~~~~~~
+
+Similarly, there is no strict ordering required for retrieving the VM
+and guest_memfd files. The files can be retrieved in any order.
+
+If a guest_memfd file is retrieved but its VM file is not, and
+luo_finish is called, the VM file is lost while the guest_memfd file
+is left behind without it.
+
+NOTE: Before initiating preservation or retrieval, make sure that the
+kvm module is loaded (/dev/kvm must be available).
+
+
+VM & Guest_Memfd Preservation ABI
+=================================
+
+.. kernel-doc:: include/linux/kho/abi/kvm.h
+   :doc: KVM and guest_memfd Live Update ABI
+
+.. kernel-doc:: include/linux/kho/abi/kvm.h
+   :internal:
+
+See Also
+========
+
+- :doc:`/core-api/liveupdate`
+- :doc:`/userspace-api/liveupdate`
diff --git a/MAINTAINERS b/MAINTAINERS
index d1d699ce..e27b677 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14420,6 +14420,7 @@ L:	kexec@lists.infradead.org
 L:	kvm@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F:	Documentation/liveupdate/vmm.rst
 F:	virt/kvm/guest_memfd_luo.c
 F:	virt/kvm/kvm_luo.c
 
diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
index c242b1d..8411fe8 100644
--- a/virt/kvm/guest_memfd_luo.c
+++ b/virt/kvm/guest_memfd_luo.c
@@ -119,11 +119,11 @@ static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, s
 	/*
 	 * Only Fully-shared guest_memfd preservation is supported
 	 */
-	if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+	if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
 		return 0;
 
 	/*
-	 * It makes sure that no memory can converted to private
+	 * It makes sure that no memory can be converted to private
 	 * even if it was initially fully shared (in-place conversions are
 	 * prevented).
 	 */
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce a freeze state on the gmem inode that blocks fallocate calls
and any new page-fault allocation. This prevents the gmem file from
being modified while it is being preserved.

SRCU is used to synchronise the freeze call: the writer blocks until
all readers have finished, and readers are re-entrant.

If a fault fails, it returns -EPERM and causes a VM exit to userspace.
Userspace must handle this properly, as every new fault will fail.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 virt/kvm/guest_memfd.c | 117 +++++++++++++++++++++++++++++++++++++----
 virt/kvm/guest_memfd.h |   5 ++
 2 files changed, 111 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fe1adc9b..a4d9d34 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,11 +7,13 @@
 #include <linux/mempolicy.h>
 #include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
+#include <linux/srcu.h>
 #include "guest_memfd.h"
 
 #include "kvm_mm.h"
 
 static struct vfsmount *kvm_gmem_mnt;
+static struct srcu_struct kvm_gmem_freeze_srcu;
 
 
 #define kvm_gmem_for_each_file(f, inode) \
@@ -96,6 +98,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	/* TODO: Support huge pages. */
 	struct mempolicy *policy;
 	struct folio *folio;
+	int idx;
 
 	/*
 	 * Fast-path: See if folio is already present in mapping to avoid
@@ -105,12 +108,20 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	if (!IS_ERR(folio))
 		return folio;
 
+	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
+	if (kvm_gmem_is_frozen(inode)) {
+		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+		return ERR_PTR(-EPERM);
+	}
+
 	policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
 	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
 					 FGP_LOCK | FGP_CREAT,
 					 mapping_gfp_mask(inode->i_mapping), policy);
 	mpol_cond_put(policy);
 
+	srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+
 	/*
 	 * External interfaces like kvm_gmem_get_pfn() support dealing
 	 * with hugepages to a degree, but internally, guest_memfd currently
@@ -273,16 +284,30 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
 static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
 			       loff_t len)
 {
+	struct inode *inode = file_inode(file);
 	int ret;
+	int idx;
 
-	if (!(mode & FALLOC_FL_KEEP_SIZE))
-		return -EOPNOTSUPP;
+	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
+	if (kvm_gmem_is_frozen(inode)) {
+		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+		return -EPERM;
+	}
 
-	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
-		return -EOPNOTSUPP;
+	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
-	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
-		return -EINVAL;
+	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
 		ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
@@ -291,6 +316,9 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
 
 	if (!ret)
 		file_modified(file);
+
+out:
+	srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
 	return ret;
 }
 
@@ -948,7 +976,9 @@ static void kvm_gmem_destroy_inode(struct inode *inode)
 
 static void kvm_gmem_free_inode(struct inode *inode)
 {
-	kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode));
+	struct gmem_inode *gi = GMEM_I(inode);
+
+	kmem_cache_free(kvm_gmem_inode_cachep, gi);
 }
 
 static const struct super_operations kvm_gmem_super_operations = {
@@ -1005,12 +1035,21 @@ int kvm_gmem_init(struct module *module)
 	if (!kvm_gmem_inode_cachep)
 		return -ENOMEM;
 
+	ret = init_srcu_struct(&kvm_gmem_freeze_srcu);
+	if (ret)
+		goto err_cache;
+
 	ret = kvm_gmem_init_mount();
-	if (ret) {
-		kmem_cache_destroy(kvm_gmem_inode_cachep);
-		return ret;
-	}
+	if (ret)
+		goto err_srcu;
+
 	return 0;
+
+err_srcu:
+	cleanup_srcu_struct(&kvm_gmem_freeze_srcu);
+err_cache:
+	kmem_cache_destroy(kvm_gmem_inode_cachep);
+	return ret;
 }
 
 void kvm_gmem_exit(void)
@@ -1018,5 +1057,61 @@ void kvm_gmem_exit(void)
 	kern_unmount(kvm_gmem_mnt);
 	kvm_gmem_mnt = NULL;
 	rcu_barrier();
+	cleanup_srcu_struct(&kvm_gmem_freeze_srcu);
 	kmem_cache_destroy(kvm_gmem_inode_cachep);
 }
+
+/**
+ * kvm_gmem_freeze - Freeze or unfreeze a guest_memfd inode mapping.
+ * @inode: The guest_memfd inode.
+ * @freeze: True to freeze, false to unfreeze.
+ *
+ * This API is used strictly during the live update / preservation transition
+ * window to prevent host userspace and guest-side faults from making any
+ * mapping modifications (such as fallocate or page fault allocation)
+ * to the guest_memfd page cache.
+ *
+ * Synchronization Strategy (Sleepable RCU):
+ * To avoid high-contention VFS locks (like inode_lock or
+ * filemap_invalidate_lock) on the vCPU page fault hot paths, this subsystem
+ * implements a lightweight, system-wide Sleepable RCU (SRCU) mechanism
+ * (`kvm_gmem_freeze_srcu`):
+ *
+ * Global vs. Per-Inode SRCU
+ * =========================
+ * A single system-wide global static `srcu_struct` is used instead of a
+ * per-inode SRCU structure to completely prevent unprivileged users from
+ * exhausting the host's per-CPU memory allocator. Because
+ * `init_srcu_struct()` allocates per-CPU memory via `alloc_percpu()`, which
+ * is not accounted by memory cgroups (memcg),
+ * a per-inode SRCU structure would allow a tenant to bypass cgroup limits and
+ * trigger a system-wide Out-of-Memory (OOM) crash simply by spawning a large
+ * number of guest_memfd file descriptors (bounded only by RLIMIT_NOFILE).
+ *
+ * Flag Modification Note:
+ * Since `GUEST_MEMFD_F_MAPPING_FROZEN` is the ONLY flag in
+ * `GMEM_I(inode)->flags` that is mutated dynamically at runtime (all other
+ * flags are creation-time flags which remain strictly read-only), there is
+ * no possibility of concurrent bit-modification races. Therefore, a standard
+ * `WRITE_ONCE` is fully safe and does not require complex `cmpxchg`
+ * synchronization loops.
+ */
+void kvm_gmem_freeze(struct inode *inode, bool freeze)
+{
+	u64 flags = READ_ONCE(GMEM_I(inode)->flags);
+
+	if (freeze)
+		flags |= GUEST_MEMFD_F_MAPPING_FROZEN;
+	else
+		flags &= ~GUEST_MEMFD_F_MAPPING_FROZEN;
+
+	WRITE_ONCE(GMEM_I(inode)->flags, flags);
+
+	if (freeze)
+		synchronize_srcu(&kvm_gmem_freeze_srcu);
+}
+
+bool kvm_gmem_is_frozen(struct inode *inode)
+{
+	return READ_ONCE(GMEM_I(inode)->flags) & GUEST_MEMFD_F_MAPPING_FROZEN;
+}
diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h
index c528b04..028c348 100644
--- a/virt/kvm/guest_memfd.h
+++ b/virt/kvm/guest_memfd.h
@@ -29,11 +29,16 @@ struct gmem_inode {
 	u64 flags;
 };
 
+/* Internal kernel-only flags (must not overlap with UAPI flags) */
+#define GUEST_MEMFD_F_MAPPING_FROZEN	(1ULL << 63)
+
 static inline struct gmem_inode *GMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct gmem_inode, vfs_inode);
 }
 
 struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags);
+void kvm_gmem_freeze(struct inode *inode, bool freeze);
+bool kvm_gmem_is_frozen(struct inode *inode);
 
 #endif /* __KVM_GUEST_MEMFD_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 4/9] kvm: guest_memfd: Move internal definitions and helper to new header
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

To support guest_memfd memory preservation with LUO, the guest_memfd LUO
code needs to access guest_memfd internals and reconstruct guest_memfd
file instances from preserved state.

Extract gmem_file, gmem_inode, and the GMEM_I() helper from guest_memfd.c
into a new internal header virt/kvm/guest_memfd.h.

Additionally, split __kvm_gmem_create() to expose a non-static
__kvm_gmem_create_file() helper. This helper returns a struct file
instead of a file descriptor, enabling file creation and initialization
without installing it into a file descriptor table.
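
The shape of this split can be pictured with a small userspace model. This
is only a sketch under stated assumptions: ERR_PTR(), IS_ERR() and PTR_ERR()
are re-implemented locally in simplified form, and struct obj, the toy fd
table, create_file() and create_fd() are hypothetical stand-ins for the
kernel objects, not the code in this patch. The inner helper returns either
an object or an encoded error and never touches the fd table; the outer
wrapper reserves a slot first and releases it if creation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical object standing in for a struct file. */
struct obj {
	long size;
};

/* Hypothetical toy fd table: a slot is reserved first, installed later. */
#define NR_FDS 4
static struct obj *fd_table[NR_FDS];
static int fd_reserved[NR_FDS];

static inline int reserve_fd(void)
{
	for (int fd = 0; fd < NR_FDS; fd++) {
		if (!fd_reserved[fd]) {
			fd_reserved[fd] = 1;
			return fd;
		}
	}
	return -EMFILE;
}

static inline void put_fd(int fd)
{
	fd_reserved[fd] = 0;
}

static inline void install_fd(int fd, struct obj *o)
{
	assert(fd >= 0 && fd < NR_FDS);
	fd_table[fd] = o;
}

/* Inner helper: returns the object or an ERR_PTR(), no fd involved. */
static inline struct obj *create_file(long size)
{
	struct obj *o;

	if (size <= 0)
		return ERR_PTR(-EINVAL);

	o = calloc(1, sizeof(*o));
	if (!o)
		return ERR_PTR(-ENOMEM);

	o->size = size;
	return o;
}

/* Outer wrapper: reserve a slot, create, then install or roll back. */
static inline int create_fd(long size)
{
	struct obj *o;
	int fd = reserve_fd();

	if (fd < 0)
		return fd;

	o = create_file(size);
	if (IS_ERR(o)) {
		put_fd(fd);
		return (int)PTR_ERR(o);
	}

	install_fd(fd, o);
	return fd;
}
```

A caller that wants the object without a visible fd would call the inner
helper directly, which is the property the restore path relies on.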

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 virt/kvm/guest_memfd.c | 68 +++++++++++++++++-------------------------
 virt/kvm/guest_memfd.h | 39 ++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 40 deletions(-)
 create mode 100644 virt/kvm/guest_memfd.h

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8669068..fe1adc9b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,38 +7,12 @@
 #include <linux/mempolicy.h>
 #include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
+#include "guest_memfd.h"
 
 #include "kvm_mm.h"
 
 static struct vfsmount *kvm_gmem_mnt;
 
-/*
- * A guest_memfd instance can be associated multiple VMs, each with its own
- * "view" of the underlying physical memory.
- *
- * The gmem's inode is effectively the raw underlying physical storage, and is
- * used to track properties of the physical memory, while each gmem file is
- * effectively a single VM's view of that storage, and is used to track assets
- * specific to its associated VM, e.g. memslots=>gmem bindings.
- */
-struct gmem_file {
-	struct kvm *kvm;
-	struct xarray bindings;
-	struct list_head entry;
-};
-
-struct gmem_inode {
-	struct shared_policy policy;
-	struct inode vfs_inode;
-	struct list_head gmem_file_list;
-
-	u64 flags;
-};
-
-static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
-{
-	return container_of(inode, struct gmem_inode, vfs_inode);
-}
 
 #define kvm_gmem_for_each_file(f, inode) \
 	list_for_each_entry(f, &GMEM_I(inode)->gmem_file_list, entry)
@@ -557,23 +531,17 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
 	return true;
 }
 
-static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags)
 {
 	static const char *name = "[kvm-gmem]";
 	struct gmem_file *f;
 	struct inode *inode;
 	struct file *file;
-	int fd, err;
-
-	fd = get_unused_fd_flags(0);
-	if (fd < 0)
-		return fd;
+	int err;
 
 	f = kzalloc_obj(*f);
-	if (!f) {
-		err = -ENOMEM;
-		goto err_fd;
-	}
+	if (!f)
+		return ERR_PTR(-ENOMEM);
 
 	/* __fput() will take care of fops_put(). */
 	if (!fops_get(&kvm_gmem_fops)) {
@@ -612,8 +580,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	xa_init(&f->bindings);
 	list_add(&f->entry, &GMEM_I(inode)->gmem_file_list);
 
-	fd_install(fd, file);
-	return fd;
+	return file;
 
 err_inode:
 	iput(inode);
@@ -621,7 +588,28 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	fops_put(&kvm_gmem_fops);
 err_gmem:
 	kfree(f);
-err_fd:
+	return ERR_PTR(err);
+}
+
+static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+{
+	struct file *file;
+	int fd, err;
+
+	fd = get_unused_fd_flags(0);
+	if (fd < 0)
+		return fd;
+
+	file = __kvm_gmem_create_file(kvm, size, flags);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_put_fd;
+	}
+
+	fd_install(fd, file);
+	return fd;
+
+err_put_fd:
 	put_unused_fd(fd);
 	return err;
 }
diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h
new file mode 100644
index 0000000..c528b04
--- /dev/null
+++ b/virt/kvm/guest_memfd.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_GUEST_MEMFD_H__
+#define __KVM_GUEST_MEMFD_H__ 1
+
+#include <linux/kvm_host.h>
+#include <linux/fs.h>
+#include <linux/mempolicy.h>
+
+/*
+ * A guest_memfd instance can be associated multiple VMs, each with its own
+ * "view" of the underlying physical memory.
+ *
+ * The gmem's inode is effectively the raw underlying physical storage, and is
+ * used to track properties of the physical memory, while each gmem file is
+ * effectively a single VM's view of that storage, and is used to track assets
+ * specific to its associated VM, e.g. memslots=>gmem bindings.
+ */
+struct gmem_file {
+	struct kvm *kvm;
+	struct xarray bindings;
+	struct list_head entry;
+};
+
+struct gmem_inode {
+	struct shared_policy policy;
+	struct inode vfs_inode;
+	struct list_head gmem_file_list;
+
+	u64 flags;
+};
+
+static inline struct gmem_inode *GMEM_I(struct inode *inode)
+{
+	return container_of(inode, struct gmem_inode, vfs_inode);
+}
+
+struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags);
+
+#endif /* __KVM_GUEST_MEMFD_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce KVM VM preservation support for Live Update Orchestrator.

Register an LUO file handler for KVM files to serialize and
deserialize necessary VM state across live updates. Currently, this
preserves the VM type. This implementation provides the necessary
infrastructure and dependencies for the upcoming guest_memfd
preservation support, and can be extended to preserve more VM state in
the future.

Retrieval simply creates the VM and populates it with the retrieved data.
The one catch is that there is no way to know which fd will be assigned
to this KVM file, so an atomically incremented ID is used for the fdname.
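
The naming scheme can be sketched in userspace with C11 atomics. This is an
illustration under assumptions, not the kernel code: next_restored_name()
and restored_id are hypothetical names, and atomic_fetch_add() plus one is
used here to approximate the value an increment-and-return operation would
yield. The resulting name is just a unique label; it is not expected to
match whatever fd number userspace later receives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same value the series uses for the worst-case integer string length. */
#define ITOA_MAX_LEN 12

/* Static storage, so the counter starts at zero. */
static atomic_int restored_id;

/*
 * Format the next unique ID into buf and return the ID that was used.
 * atomic_fetch_add() returns the old value, so adding one gives the
 * post-increment value; the first name produced is therefore "1".
 */
static inline int next_restored_name(char *buf, size_t len)
{
	int id = atomic_fetch_add(&restored_id, 1) + 1;

	assert(len >= ITOA_MAX_LEN + 1);
	snprintf(buf, len, "%d", id);
	return id;
}
```

Because the increment is atomic, two concurrent retrievals cannot be handed
the same name, even though neither knows its eventual fd.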

This change also updates the MAINTAINERS list for kvm_luo.c.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 MAINTAINERS                 |  11 ++
 include/linux/kho/abi/kvm.h |  39 ++++++++
 virt/kvm/Makefile.kvm       |   1 +
 virt/kvm/kvm_luo.c          | 195 ++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c         |   8 ++
 virt/kvm/kvm_mm.h           |   8 ++
 6 files changed, 262 insertions(+)
 create mode 100644 include/linux/kho/abi/kvm.h
 create mode 100644 virt/kvm/kvm_luo.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dbc8a6..7c000e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14411,6 +14411,17 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
 F:	drivers/video/backlight/ktz8866.c
 
+KVM LIVE UPDATE
+M:	Pasha Tatashin <pasha.tatashin@soleen.com>
+M:	Mike Rapoport <rppt@kernel.org>
+M:	Pratyush Yadav <pratyush@kernel.org>
+R:	Tarun Sahu <tarunsahu@google.com>
+L:	kexec@lists.infradead.org
+L:	kvm@vger.kernel.org
+S:	Maintained
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F:	virt/kvm/kvm_luo.c
+
 KVM PARAVIRT (KVM/paravirt)
 M:	Paolo Bonzini <pbonzini@redhat.com>
 R:	Vitaly Kuznetsov <vkuznets@redhat.com>
diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
new file mode 100644
index 0000000..718db68
--- /dev/null
+++ b/include/linux/kho/abi/kvm.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * KVM Preservation ABI for Live Update Orchestrator (LUO)
+ */
+#ifndef _LINUX_KHO_ABI_KVM_H
+#define _LINUX_KHO_ABI_KVM_H
+
+#include <linux/types.h>
+#include <linux/kho/abi/kexec_handover.h>
+
+/**
+ * DOC: KVM Live Update ABI
+ *
+ * KVM uses the ABI defined below for preserving its state
+ * across a kexec reboot using the LUO.
+ *
+ * The state is serialized into a packed structure `struct kvm_luo_ser`
+ * which is handed over to the next kernel via the KHO mechanism.
+ *
+ * This interface is a contract. Any modification to the structure layout
+ * constitutes a breaking change. Such changes require incrementing the
+ * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
+ */
+
+/**
+ * struct kvm_luo_ser - Main serialization structure for a KVM VM.
+ * @type:         The type of VM.
+ */
+struct kvm_luo_ser {
+	u64 type;
+} __packed;
+
+/* The compatibility string for KVM VM file handler */
+#define KVM_LUO_FH_COMPATIBLE	"kvm_vm_luo_v1"
+
+#endif /* _LINUX_KHO_ABI_KVM_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index d047d4c..c1a9621 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,3 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
diff --git a/virt/kvm/kvm_luo.c b/virt/kvm/kvm_luo.c
new file mode 100644
index 0000000..6728877
--- /dev/null
+++ b/virt/kvm/kvm_luo.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * KVM VM Preservation for Live Update Orchestrator (LUO)
+ */
+
+/**
+ * DOC: KVM VM Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * KVM virtual machines (VMs) can be preserved over a kexec reboot using the
+ * Live Update Orchestrator (LUO) file preservation. This allows userspace
+ * to preserve KVM VM state across kexec reboots.
+ *
+ * The preservation is not intended to be fully transparent. Only specific
+ * VM configuration and state are preserved, while other aspects of the VM
+ * must be re-established or re-configured by userspace after retrieval.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of the KVM VM are preserved across kexec:
+ *
+ * VM Type
+ *   The VM type (e.g., on x86 architecture, the vm_type parameter) is
+ *   preserved.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * The preservation does not cover:
+ *
+ * - vCPUs and vCPU states
+ * - Memory slot layout (memslots)
+ * - Interrupt controllers and IRQ routings
+ * - Coalesced MMIO zones
+ * - Device bindings (VFIO/Eventfds)
+ * - Active paging or guest registers state
+ * - etc
+ */
+#include <linux/liveupdate.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/anon_inodes.h>
+#include <linux/magic.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/kexec_handover.h>
+#include <linux/kho/abi/kvm.h>
+#include "kvm_mm.h"
+
+static bool kvm_luo_can_preserve(struct liveupdate_file_handler *handler,
+				 struct file *file)
+{
+	return file_is_kvm(file);
+}
+
+static int kvm_luo_preserve(struct liveupdate_file_op_args *args)
+{
+	DECLARE_KHOSER_PTR(sd, struct kvm_luo_ser *);
+	struct kvm *kvm = args->file->private_data;
+	struct kvm_luo_ser *ser;
+
+	if (kvm->vm_dead || kvm->vm_bugged)
+		return -EINVAL;
+
+	ser = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(ser))
+		return PTR_ERR(ser);
+
+#if defined(CONFIG_X86)
+	ser->type = kvm->arch.vm_type;
+#elif defined(CONFIG_ARM64)
+	ser->type = kvm_phys_shift(&kvm->arch.mmu);
+	if (kvm_vm_is_protected(kvm))
+		ser->type |= KVM_VM_TYPE_ARM_PROTECTED;
+
+#else
+	ser->type = 0;
+#endif
+
+	KHOSER_STORE_PTR(sd, ser);
+	KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
+
+	return 0;
+}
+
+static atomic_t restored_vm_id = ATOMIC_INIT(0);
+
+static int kvm_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+	char fdname[ITOA_MAX_LEN + 1];
+	struct kvm_luo_ser *ser;
+	struct file *file;
+	struct kvm *kvm;
+	int err = 0;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return -EINVAL;
+
+	snprintf(fdname, sizeof(fdname), "%d",
+		 atomic_inc_return(&restored_vm_id));
+
+	file = kvm_create_vm_file(ser->type, fdname);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_free_ser;
+	}
+
+	kvm = file->private_data;
+
+	args->file = file;
+	kho_restore_free(ser);
+
+	kvm_uevent_notify_vm_create(kvm);
+	return 0;
+
+err_free_ser:
+	kho_restore_free(ser);
+	return err;
+}
+
+static void kvm_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+	struct kvm_luo_ser *ser;
+
+	/*
+	 * If preservation failed, args->serialized_data is NULL and
+	 * kvm_luo_preserve() has already cleaned up. If preservation
+	 * succeeded, the check below does not trigger and this function
+	 * frees the serialized state.
+	 */
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (WARN_ON_ONCE(!ser))
+		return;
+
+	kho_unpreserve_free(ser);
+}
+
+static void kvm_luo_finish(struct liveupdate_file_op_args *args)
+{
+	struct kvm_luo_ser *ser;
+
+	/*
+	 * If retrieve_status is true or set to error, nothing to do here.
+	 * Already cleaned up in kvm_luo_retrieve().
+	 */
+	if (args->retrieve_status)
+		return;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return;
+
+	kho_restore_free(ser);
+}
+
+static const struct liveupdate_file_ops kvm_luo_file_ops = {
+	.can_preserve = kvm_luo_can_preserve,
+	.preserve = kvm_luo_preserve,
+	.retrieve = kvm_luo_retrieve,
+	.unpreserve = kvm_luo_unpreserve,
+	.finish = kvm_luo_finish,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler kvm_luo_handler = {
+	.ops = &kvm_luo_file_ops,
+	.compatible = KVM_LUO_FH_COMPATIBLE,
+};
+
+int kvm_luo_init(void)
+{
+	int err = liveupdate_register_file_handler(&kvm_luo_handler);
+
+	if (err && err != -EOPNOTSUPP) {
+		pr_err("Could not register kvm_vm_luo handler: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
+
+void kvm_luo_exit(void)
+{
+	liveupdate_unregister_file_handler(&kvm_luo_handler);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14c3254..d9c3dd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6577,6 +6577,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (r)
 		goto err_virt;
 
+	r = kvm_luo_init();
+	if (r)
+		goto err_luo;
+
 	/*
 	 * Registration _must_ be the very last thing done, as this exposes
 	 * /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6590,6 +6594,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	return 0;
 
 err_register:
+	kvm_luo_exit();
+err_luo:
 	kvm_uninit_virtualization();
 err_virt:
 	kvm_gmem_exit();
@@ -6619,6 +6625,8 @@ void kvm_exit(void)
 	 */
 	misc_deregister(&kvm_dev);
 
+	kvm_luo_exit();
+
 	kvm_uninit_virtualization();
 
 	debugfs_remove_recursive(kvm_debugfs_dir);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 6241617..8719871 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -100,4 +100,12 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 }
 #endif /* CONFIG_KVM_GUEST_MEMFD */
 
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+int kvm_luo_init(void);
+void kvm_luo_exit(void);
+#else
+static inline int kvm_luo_init(void) { return 0; }
+static inline void kvm_luo_exit(void) {}
+#endif /* CONFIG_LIVEUPDATE_GUEST_MEMFD */
+
 #endif /* __KVM_MM_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce core infrastructure to support VM preservation with LUO.

The first two changes are pure refactoring with no functional change; the
third introduces a new member in struct kvm.
- Move ITOA_MAX_LEN to kvm_mm.h for reuse by upcoming kvm_luo code.
- Add a public kvm_create_vm_file() helper wrapping kvm_create_vm()
  and anon_inode_getfile() to provide a unified VM file creation API.
- Track a weak reference to the backing file in struct kvm under
  CONFIG_LIVEUPDATE_GUEST_MEMFD to enable reverse file resolution
  without circular lifetime dependencies.
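
As a quick sanity check on the ITOA_MAX_LEN value mentioned in the first
item, the following userspace sketch measures how many characters "%d"
needs. It assumes a 32-bit int, where INT_MIN prints as "-2147483648" (11
characters, 12 bytes with the terminating NUL). decimal_len() and
fits_in_itoa_buf() are hypothetical helper names used only for this
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Same value as the macro moved into kvm_mm.h by this patch. */
#define ITOA_MAX_LEN 12

/* Number of characters "%d" needs for v, excluding the terminating NUL. */
static inline int decimal_len(int v)
{
	return snprintf(NULL, 0, "%d", v);
}

/* True if v prints without truncation into an ITOA_MAX_LEN + 1 buffer. */
static inline int fits_in_itoa_buf(int v)
{
	char buf[ITOA_MAX_LEN + 1];
	int n = snprintf(buf, sizeof(buf), "%d", v);

	return n >= 0 && (size_t)n < sizeof(buf);
}
```

With the callers in this series declaring buffers of ITOA_MAX_LEN + 1 bytes,
any int formatted with "%d" fits with room to spare under that assumption.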

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 include/linux/kvm_host.h | 14 +++++++
 virt/kvm/kvm_main.c      | 79 +++++++++++++++++++++++++++++-----------
 virt/kvm/kvm_mm.h        |  3 ++
 3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab8cfae..cbb5eb9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -874,6 +874,18 @@ struct kvm {
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	/* Protected by slots_lock (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
+#endif
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Weak reference to the VFS file backing this KVM instance. Stored
+	 * without incrementing the file refcount to prevent a circular lifetime
+	 * dependency (since file->private_data already pins this struct kvm).
+	 * Used exclusively to resolve the file pointer back from struct kvm.
+	 *
+	 * Written/cleared via rcu_assign_pointer() and read locklessly under
+	 * RCU (e.g. via get_file_active() to prevent ABA races).
+	 */
+	struct file *vm_file;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
 };
@@ -1074,7 +1086,9 @@ void kvm_get_kvm(struct kvm *kvm);
 bool kvm_get_kvm_safe(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
 bool file_is_kvm(struct file *file);
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname);
 void kvm_put_kvm_no_destroy(struct kvm *kvm);
+void kvm_uevent_notify_vm_create(struct kvm *kvm);
 
 static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e44c20c..14c3254 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -67,9 +67,6 @@
 #include <linux/kvm_dirty_ring.h>
 
 
-/* Worst case buffer size needed for holding an integer. */
-#define ITOA_MAX_LEN 12
-
 MODULE_AUTHOR("Qumranet");
 MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
 MODULE_LICENSE("GPL");
@@ -1349,6 +1346,19 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 {
 	struct kvm *kvm = filp->private_data;
 
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Clear the weak reference of the vm file.
+	 * In case vm file is closed by userspace, but kvm still has
+	 * other users like vCPUs, clearing this pointer ensures
+	 * that we don't have a dangling pointer to a closed file.
+	 *
+	 * Cleared via rcu_assign_pointer() to ensure proper memory visibility
+	 * for concurrent lockless readers under RCU.
+	 */
+	rcu_assign_pointer(kvm->vm_file, NULL);
+#endif
+
 	kvm_irqfd_release(kvm);
 
 	kvm_put_kvm(kvm);
@@ -5477,11 +5487,47 @@ bool file_is_kvm(struct file *file)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
 
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
+{
+	struct kvm *kvm = kvm_create_vm(type, fdname);
+	struct file *file;
+
+	if (IS_ERR(kvm))
+		return ERR_CAST(kvm);
+
+	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	if (IS_ERR(file)) {
+		kvm_put_kvm(kvm);
+		return file;
+	}
+
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Weak reference to the file (without get_file()) to prevent a circular
+	 * dependency. Safe because the file's release path clears this pointer
+	 * and drops its reference to the VM.
+	 *
+	 * Written via rcu_assign_pointer() because the pointer can be read
+	 * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
+	 * get_file_active() to prevent lockless ABA races).
+	 */
+	rcu_assign_pointer(kvm->vm_file, file);
+#endif
+
+	/*
+	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
+	 * already set, with ->release() being kvm_vm_release().  In error
+	 * cases it will be called by the final fput(file) and will take
+	 * care of doing kvm_put_kvm(kvm).
+	 */
+
+	return file;
+}
+
 static int kvm_dev_ioctl_create_vm(unsigned long type)
 {
 	char fdname[ITOA_MAX_LEN + 1];
 	int r, fd;
-	struct kvm *kvm;
 	struct file *file;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
@@ -5490,31 +5536,17 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 
 	snprintf(fdname, sizeof(fdname), "%d", fd);
 
-	kvm = kvm_create_vm(type, fdname);
-	if (IS_ERR(kvm)) {
-		r = PTR_ERR(kvm);
-		goto put_fd;
-	}
-
-	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	file = kvm_create_vm_file(type, fdname);
 	if (IS_ERR(file)) {
 		r = PTR_ERR(file);
-		goto put_kvm;
+		goto put_fd;
 	}
 
-	/*
-	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
-	 * already set, with ->release() being kvm_vm_release().  In error
-	 * cases it will be called by the final fput(file) and will take
-	 * care of doing kvm_put_kvm(kvm).
-	 */
-	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
+	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, file->private_data);
 
 	fd_install(fd, file);
 	return fd;
 
-put_kvm:
-	kvm_put_kvm(kvm);
 put_fd:
 	put_unused_fd(fd);
 	return r;
@@ -6342,6 +6374,11 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
 	kfree(env);
 }
 
+void kvm_uevent_notify_vm_create(struct kvm *kvm)
+{
+	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
+}
+
 static void kvm_init_debug(void)
 {
 	const struct file_operations *fops;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 7510ca9..6241617 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -6,6 +6,9 @@
 #include <linux/kvm.h>
 #include <linux/kvm_types.h>
 
+/* Worst case buffer size needed for holding an integer as a string. */
+#define ITOA_MAX_LEN 12
+
 /*
  * Architectures can choose whether to use an rwlock or spinlock
  * for the mmu_lock.  These macros, for use in common code
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 1/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce the LIVEUPDATE_GUEST_MEMFD Kconfig option. This option
enables live update support for KVM guest_memfd files, enabling
guest_memfd-backed memory preservation across kernel upgrades.

Currently this supports only guest_memfd files that are fully shared
and pre-faulted.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 kernel/liveupdate/Kconfig | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index c13af38..2490f9a 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -86,4 +86,19 @@ config LIVEUPDATE_MEMFD
 
 	  If unsure, say N.
 
+config LIVEUPDATE_GUEST_MEMFD
+	bool "Live update support for guest_memfd"
+	depends on LIVEUPDATE
+	depends on KVM_GUEST_MEMFD
+	default LIVEUPDATE
+	help
+	  Enable live update support for KVM guest_memfd files. This allows
+	  preserving VM memory backed by guest_memfd files across kernel live
+	  updates.
+
+	  This can only be used for guest_memfd files that are fully shared
+	  and pre-faulted.
+
+	  If unsure, say N.
+
 endmenu
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel

Hello,
This is the non-RFC patch series for guest_memfd preservation. After
multiple discussions in the hypervisor live update meeting and the
guest_memfd bi-weekly meeting, the design for basic guest_memfd
preservation support is final. This series covers guest_memfd files
that are fully shared (no private memory support) and backed by
PAGE_SIZE pages.

Steps to test:
1. Compile Kernel with CONFIG_LIVEUPDATE_GUEST_MEMFD=y
2. boot kernel with command line: kho=on liveupdate=on
3. run the following kselftest
	$ .selftests/kvm/guest_memfd_preservation_test --stage 1
	$ <kexec> --reuse-cmdline
	$ .selftests/kvm/guest_memfd_preservation_test --stage 2

NOTE: Assert the following:
	$ ls /dev/liveupdate
	$ ls /dev/kvm
	$ dmesg | grep liveupdate # (should have kvm_vm_luo &&
		# guest_memfd_luo handler registered)

The changes are rebased on:
	kvm/next + liveupdate/next (merge) + [3] + [4] + [5]
	Where,
	[3]: luo: conversion of serialized_data to KHOSER_PTR
	[4]: luo: APIs to retrieve file internally from session
	[5]: selftests: liveupdate selftests library
Here is the github repo:
	https://github.com/tar-unix/linux/tree/gmem-pre

V3 <- RFC V2 [2]
1. Finalize the design
2. resolve sashiko reported bugs
3. Use of KHOSER_PTR instead of raw serialized_data as per [3]

RFC V2 [2] <- RFC V1 [1]
1. Removed mem_attr_array as it is not needed for fully-shared
2. Removed pre-faulted condition
3. Added vm_type preservation for ARM64.
4. Removed liveupdate_get_file_incoming api patch as it is sent
   separately [4] by Samiullah.

[1] https://lore.kernel.org/all/cover.1779080766.git.tarunsahu@google.com/
[2] https://lore.kernel.org/all/c054ba0fb2639932bbe354420d3f4f84cce84905.1780676742.git.tarunsahu@google.com/
[3] https://lore.kernel.org/all/20260622111215.4157974-1-tarunsahu@google.com/
[4] https://lore.kernel.org/all/20260613012521.835490-1-skhawaja@google.com/
[5] https://lore.kernel.org/all/20260612214512.464146-1-vipinsh@google.com/

Tarun Sahu (9):
  liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
  kvm: Prepare core VM structs and helpers for LUO support
  kvm: kvm_luo: Allow kvm preservation with LUO
  kvm: guest_memfd: Move internal definitions and helper to new header
  kvm: guest_memfd: Add support for freezing and unfreezing mappings
  kvm: guest_memfd_luo: add support for guest_memfd preservation
  docs: add documentation for guest_memfd preservation via LUO
  selftests: kvm: Split ____vm_create() to expose init helpers
  selftests: kvm: Add guest_memfd_preservation_test

 Documentation/core-api/liveupdate.rst         |   1 +
 Documentation/liveupdate/vmm.rst              | 107 ++++
 MAINTAINERS                                   |  14 +
 include/linux/kho/abi/kvm.h                   | 106 ++++
 include/linux/kvm_host.h                      |  14 +
 kernel/liveupdate/Kconfig                     |  15 +
 tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
 .../kvm/guest_memfd_preservation_test.c       | 236 +++++++++
 .../testing/selftests/kvm/include/kvm_util.h  |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  26 +-
 virt/kvm/Makefile.kvm                         |   1 +
 virt/kvm/guest_memfd.c                        | 185 +++++--
 virt/kvm/guest_memfd.h                        |  44 ++
 virt/kvm/guest_memfd_luo.c                    | 497 ++++++++++++++++++
 virt/kvm/kvm_luo.c                            | 195 +++++++
 virt/kvm/kvm_main.c                           |  94 +++-
 virt/kvm/kvm_mm.h                             |  15 +
 17 files changed, 1477 insertions(+), 81 deletions(-)
 create mode 100644 Documentation/liveupdate/vmm.rst
 create mode 100644 include/linux/kho/abi/kvm.h
 create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c
 create mode 100644 virt/kvm/guest_memfd.h
 create mode 100644 virt/kvm/guest_memfd_luo.c
 create mode 100644 virt/kvm/kvm_luo.c

-- 
2.55.0.rc0.786.g65d90a0328-goog



* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Conor Dooley @ 2026-06-22 18:39 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Nuno Sá, Rodrigo Alencar, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <20260622172911.48259a0c@jic23-huawei>


On Mon, Jun 22, 2026 at 05:29:11PM +0100, Jonathan Cameron wrote:
> > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > (I suspect it doesn't!)
> > > > 
> > > > Just thinking as I type, but could we do something a bit nasty with
> > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > under serializing locks. 
> > > > 
> > > > Agreed though that this would be nicer as an SPI thing that let
> > > > us specify that a single CS is shared by multiple devices and there
> > > > is some other signal acting to select which one we are talking to.
> > > >   
> > > 
> > > If the device-addressing on the same chip-select is to be handled
> > > by the spi framework, wouldn't we lose device-specific features?
> > > 
> > > I understand that this multi-device feature is there mostly to extend the
> > > channel count from 16 to 32, 48 or 64. I suppose the command:
> > > 
> > > 	"MULTI DEVICE SW LDAC MODE"
> > > 
> > > exists so that software can update channel values across multiple devices.
> > 
> > Right! You do have a point! I agree the main driver for a feature like
> > this is likely to extend the channel count and effectively "aggregate"
> > devices.
> > 
> > But I would say that even with the spi solution the MULTI DEVICE stuff
> > should be doable (as we still need a sort of adi,pin-id property). 
> > 
> > But yes, I do feel that the whole feature is for aggregation so seeing
> > one device with 32 channels is the expectation here? Rather than seeing
> > two devices with 16 channels.
> 
> Agreed - if we have messages that address both devices at once that needs
> to be a unified driver and given they are about triggering simultaneous
> update of all channels it needs to look like one big device.
> This ends up similar to how we handle daisy chain devices.
> 
> The question of what to do on devices that don't have this feature
> is rather different. Good thing you read the datasheet :)

I'm not sure it really is, the intent for the microchip devices I think
is pretty similar. The mcp3911 datasheet cites three-phase power
metering using three devices as a typical use-case, for example.
Probably creating an amalgamated device is a good fit there too?

I assume an amalgamated device for this ADI product means per-channel ID
properties? If so, I think they should be made generic and the Microchip
products retrofitted to use them, with a fallback to the proprietary
property. Not going to ask for the support for multiple devices in those
drivers, since the current way doesn't work and there'd be no loss of
support. Someone from Microchip can do that. The proprietary property
to generic conversion should be straightforward and provides weight to
an argument for this being generic, since that'd be three devices that
can all share?



* [PATCH] Docs/driver-api/uio-howto: document mmap_prepare callback
From: Doehyun Baek @ 2026-06-22 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jonathan Corbet, Shuah Khan
  Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, linux-doc,
	linux-kernel, Doehyun Baek

The UIO howto still documents an mmap callback in struct uio_info.
That field was replaced by mmap_prepare, which takes a struct
vm_area_desc.

A UIO driver following the current howto no longer builds because
struct uio_info has no mmap member. Update the documented callback
signature and matching text to match the current API.

Fixes: 933f05f58ac6 ("uio: replace deprecated mmap hook with mmap_prepare in uio_info")
Signed-off-by: Doehyun Baek <doehyunbaek@gmail.com>
---
 Documentation/driver-api/uio-howto.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst
index 907ffa3b38f5..c08472dfbcfe 100644
--- a/Documentation/driver-api/uio-howto.rst
+++ b/Documentation/driver-api/uio-howto.rst
@@ -246,10 +246,10 @@ the members are required, others are optional.
    hardware interrupt number. The flags given here will be used in the
    call to :c:func:`request_irq()`.
 
--  ``int (*mmap)(struct uio_info *info, struct vm_area_struct *vma)``:
+-  ``int (*mmap_prepare)(struct uio_info *info, struct vm_area_desc *desc)``:
    Optional. If you need a special :c:func:`mmap()`
    function, you can set it here. If this pointer is not NULL, your
-   :c:func:`mmap()` will be called instead of the built-in one.
+   ``mmap_prepare`` will be called instead of the built-in one.
 
 -  ``int (*open)(struct uio_info *info, struct inode *inode)``:
    Optional. You might want to have your own :c:func:`open()`,

base-commit: 1dc18801be29bc54709aa355b8acd80e183b03cd
-- 
2.43.0



* Re: [RFC PATCH 0/2] kasan: hw_tags: Add option to tag only at allocation time
From: Catalin Marinas @ 2026-06-22 17:13 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Dev Jain, ryabinin.a.a, akpm, corbet, glider, andreyknvl, dvyukov,
	vincenzo.frascino, kasan-dev, linux-mm, linux-kernel, skhan,
	workflows, linux-doc, linux-arm-kernel, ryan.roberts,
	anshuman.khandual, kaleshsingh, 21cnbao, david, will
In-Reply-To: <2208123f-8a51-483b-aa93-c35d8d053d25@kernel.org>

Hi Harry,

On Mon, Jun 22, 2026 at 09:42:10PM +0900, Harry Yoo wrote:
> On 6/19/26 10:19 PM, Catalin Marinas wrote:
> > On Thu, Jun 18, 2026 at 10:35:15PM +0900, Harry Yoo wrote:
> >> On 6/12/26 1:44 PM, Dev Jain wrote:
> >>> Now, when a memory object will be freed, it will retain the random tag it
> >>> had at allocation time. This compromises on catching UAF bugs, till the
> >>> time the object is not reallocated, at which point it will have a new
> >>> random tag.
> >>>
> >>> Hence, not catching "use-after-free-before-reallocation" and not catching
> >>> "double-free" will be the compromise for reduced KASAN overhead.
> >>
> >> I doubt users who care about security enough to enable HW_TAGS KASAN
> >> are willing to compromise on security just to save a few instructions
> >> to store tags in the free path.
> >>
> >> To me, it looks like too much of a compromise on security for little
> >> performance gain.
> > 
> > I don't think there's much compromise on security for use-after-free.
> 
> I think it depends... OH, WAIT! I see what you mean.
> 
> You mean use-after-free before reallocation does not lead to much
> compromise on security because objects are initialized after allocation?
> 
> You're probably right.
> 
> Hmm, but stores to e.g.) free pointer, fields initialized by
> constructor or accessed by SLAB_TYPESAFE_BY_RCU semantics after free
> will be undiscovered if they happen before reallocation.

Even with SLAB_TYPESAFE_BY_RCU, the object isn't tagged on free either
(or realloc, only if the actual slab page ends up freed). But we don't
get type confusion for such slab.

However, without tagging on free, one could argue that it reduces
security for cases where the page is re-allocated as untagged - e.g. all
user pages mapped without PROT_MTE. Currently we have a deterministic
tag check fault if the page is coloured as KASAN_TAG_INVALID. I think
for this patch, it might be better to only do such skip on free in
kasan_poison_slab() rather than kasan_poison(). Freed pages would then
be tagged.

An alternative would be tagging on free only with a new tag and skipping
it on re-alloc. But we'd need to track when it's a completely new
allocation or a reused object (I haven't looked, but I'm pretty sure it's
doable).

-- 
Catalin


* Re: [PATCH RFC v5 6/6] iio: osf: register IIO devices from capabilities
From: Jonathan Cameron @ 2026-06-22 17:07 UTC (permalink / raw)
  To: Jinseob Kim
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, David Lechner,
	Nuno Sá, Andy Shevchenko, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-doc, linux-kernel
In-Reply-To: <20260616072242.3942-7-kimjinseob88@gmail.com>

On Tue, 16 Jun 2026 16:22:42 +0900
Jinseob Kim <kimjinseob88@gmail.com> wrote:

> Register IIO devices for supported Open Sensor Fusion capability entries
> and push received samples into IIO buffers when enabled.
> 
> Signed-off-by: Jinseob Kim <kimjinseob88@gmail.com>
Sashiko had a few comments.  The last one on the uninitialized heap
memory needs a new version of the fix from me.

Hopefully I'll get to that in the next few days,

https://sashiko.dev/#/patchset/20260529121005.1470-1-kimjinseob88%40gmail.com

The one about intermediate build issues (if correct) suggests you didn't
ensure this series builds after each patch. Please make sure to do that
to avoid breaking bisectability of the kernel.

Thanks,

Jonathan

> ---
>  drivers/iio/opensensorfusion/Kconfig    |  11 +-
>  drivers/iio/opensensorfusion/Makefile   |   3 +-
>  drivers/iio/opensensorfusion/osf_core.c | 253 ++++++++++++++++++++--
>  drivers/iio/opensensorfusion/osf_core.h |  52 +++++
>  drivers/iio/opensensorfusion/osf_iio.c  | 275 ++++++++++++++++++++++++
>  drivers/iio/opensensorfusion/osf_iio.h  |  22 ++
>  6 files changed, 586 insertions(+), 30 deletions(-)
>  create mode 100644 drivers/iio/opensensorfusion/osf_iio.c
>  create mode 100644 drivers/iio/opensensorfusion/osf_iio.h
> 
> diff --git a/drivers/iio/opensensorfusion/Kconfig b/drivers/iio/opensensorfusion/Kconfig
> index d393eb3aa..8b9376d28 100644
> --- a/drivers/iio/opensensorfusion/Kconfig
> +++ b/drivers/iio/opensensorfusion/Kconfig
> @@ -5,11 +5,10 @@ config OPEN_SENSOR_FUSION
>  	depends on IIO
>  	depends on SERIAL_DEV_BUS
>  	select CRC32
> +	select IIO_BUFFER
> +	select IIO_KFIFO_BUF
>  	help
> -	  Build the Open Sensor Fusion UART receive path.
> +	  Build the Open Sensor Fusion UART IIO driver.
>  
> -	  The driver receives OSF protocol frames over a serdev UART.
> -	  Frames are decoded and validated before being passed to the
> -	  driver core.
> -	  This patch only adds the transport path.
> -	  IIO device registration is added separately.
> +	  The driver receives OSF protocol frames over a serdev UART and
> +	  registers IIO devices for supported capability entries.
Avoid this churn. I wouldn't worry about the text being a little
forward-looking when added in the earlier patch; go directly to the
final text there.

> diff --git a/drivers/iio/opensensorfusion/osf_core.c b/drivers/iio/opensensorfusion/osf_core.c
> index 137fb7166..61ef55646 100644
> --- a/drivers/iio/opensensorfusion/osf_core.c
> +++ b/drivers/iio/opensensorfusion/osf_core.c

>  
> -static int osf_core_validate_sensor_sample(const struct osf_frame *frame)
> +static int osf_core_register_capabilities(struct osf_device *osf,
> +					  const struct osf_capability_cache *cache)
>  {
> +	struct iio_dev *indio_dev;
> +	unsigned int i;
> +	int ret;
> +
> +	if (osf->capability_cache.valid)
> +		return 0;
> +
> +	for (i = 0; i < cache->capability_count; i++) {
> +		if (!osf_iio_sensor_supported(cache->entries[i].sensor_type,
> +					      cache->entries[i].channel_count))
> +			continue;
> +
> +		if (osf_core_capability_is_duplicate(cache, i))
> +			return -EEXIST;
> +	}
> +
> +	for (i = 0; i < cache->capability_count; i++) {
> +		if (!osf_iio_sensor_supported(cache->entries[i].sensor_type,
> +					      cache->entries[i].channel_count))
> +			continue;
> +
> +		ret = osf_iio_register_sensor(osf->dev, &cache->entries[i],
> +					      osf, &indio_dev);
> +		if (ret)
> +			goto err_unregister;
> +
> +		osf->iio_devs[osf->iio_dev_count].sensor_type =
> +			cache->entries[i].sensor_type;
> +		osf->iio_devs[osf->iio_dev_count].sensor_index =
> +			cache->entries[i].sensor_index;
> +		osf->iio_devs[osf->iio_dev_count].indio_dev = indio_dev;
> +		osf->iio_dev_count++;

Probably use a designated initializer for this one
		osf->iio_devs[osf->iio_dev_count++] = (struct osf_iio_binding) {
			.sensor_type = ...

		};

Not a problem if the lines are over 80 chars given this should be generally easier
to read.
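
For illustration, a minimal userspace sketch of that pattern follows.
The demo_* names, the table size and the field layout are assumptions
made up for this example; they are not the types from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_BINDINGS 4

/* Hypothetical stand-in for one entry of osf->iio_devs[]. */
struct demo_binding {
	uint16_t sensor_type;
	uint16_t sensor_index;
	void *indio_dev;
};

/* Hypothetical stand-in for the parts of struct osf_device used here. */
struct demo_device {
	struct demo_binding bindings[DEMO_MAX_BINDINGS];
	unsigned int binding_count;
};

/* Append one binding; returns 0 on success, -1 if the table is full. */
static int demo_add_binding(struct demo_device *d, uint16_t type,
			    uint16_t index, void *indio_dev)
{
	if (d->binding_count >= DEMO_MAX_BINDINGS)
		return -1;

	/* One assignment fills the slot; members not named are zeroed. */
	d->bindings[d->binding_count++] = (struct demo_binding) {
		.sensor_type = type,
		.sensor_index = index,
		.indio_dev = indio_dev,
	};

	return 0;
}
```

The bounds check is part of the sketch only; whether the real code
needs one depends on how iio_devs[] is sized.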

> +
> +static int osf_core_handle_sensor_sample(struct osf_device *osf,
> +					 const struct osf_frame *frame)
> +{
> +	struct osf_latest_sample *latest;
>  	struct osf_sensor_sample sample;
> +	struct iio_dev *indio_dev;
> +	s32 values[OSF_MAX_SAMPLE_CHANNELS] = { };
> +	unsigned int i;
> +	int ret;
> +
> +	ret = osf_protocol_decode_sensor_sample(frame, &sample);
> +	if (ret)
> +		return ret;
> +
> +	if (sample.channel_count > OSF_MAX_SAMPLE_CHANNELS)
> +		return -E2BIG;
> +
> +	for (i = 0; i < sample.channel_count; i++) {
> +		ret = osf_protocol_sensor_sample_value(&sample, i, &values[i]);
> +		if (ret)
> +			return ret;
> +	}
>  
> -	return osf_protocol_decode_sensor_sample(frame, &sample);
> +	mutex_lock(&osf->latest_lock);

This may well be better as a scoped_guard()

> +	latest = osf_core_find_latest_sample(osf, sample.sensor_type,
> +					     sample.sensor_index);
> +	if (!latest) {
> +		mutex_unlock(&osf->latest_lock);

scoped_guard() would allow you to return here without worrying
about the manual unlock.

> +		return -E2BIG;
> +	}
> +
> +	memcpy(latest->values, values, sizeof(values));
> +	latest->sensor_type = sample.sensor_type;
> +	latest->sensor_index = sample.sensor_index;
> +	latest->channel_count = sample.channel_count;
> +	latest->sample_format = sample.sample_format;
> +	latest->scale_nano = sample.scale_nano;
> +	latest->sequence = frame->sequence;
> +	latest->timestamp_us = frame->timestamp_us;
> +	latest->valid = true;
> +	osf->last_sequence = frame->sequence;
> +	mutex_unlock(&osf->latest_lock);
> +
> +	indio_dev = osf_core_find_iio_dev(osf, sample.sensor_type,
> +					  sample.sensor_index);
> +	if (!indio_dev)
> +		return 0;
> +
> +	return osf_iio_push_sample(indio_dev, values, sample.channel_count);
>  }

>  
> @@ -73,27 +260,47 @@ int osf_core_receive_frame(struct osf_device *osf, const u8 *buf, size_t len)
>  
>  	switch (frame.message_type) {
>  	case OSF_MSG_SENSOR_SAMPLE:
> -		ret = osf_core_validate_sensor_sample(&frame);
> -		break;
> +		return osf_core_handle_sensor_sample(osf, &frame);
>  	case OSF_MSG_DEVICE_STATUS:
> -		ret = osf_core_validate_device_status(&frame);
> -		break;
> +		return osf_core_handle_device_status(osf, &frame);
>  	case OSF_MSG_CAPABILITY_REPORT:
> -		ret = osf_core_validate_capability_report(&frame);
> -		break;
> +		return osf_core_handle_capability_report(osf, &frame);
>  	default:
>  		if (frame.message_type >= OSF_RESERVED_MSG_FIRST &&
>  		    frame.message_type <= OSF_RESERVED_MSG_LAST)
> -			ret = 0;
> -		else if (frame.message_type >= OSF_VENDOR_PRIVATE_FIRST)
> -			ret = 0;
> -		else
> -			ret = -EOPNOTSUPP;
> -		break;
> +			return 0;
> +		if (frame.message_type >= OSF_VENDOR_PRIVATE_FIRST)
> +			return 0;
> +		return -EOPNOTSUPP;
>  	}

See if you can rework original code to reduce the churn here.

> +}
> +
> +int osf_core_read_latest_sample(struct osf_device *osf, u16 sensor_type,
> +				u16 sensor_index, unsigned int channel,
> +				s32 *value)
> +{
> +	const struct osf_latest_sample *latest;
> +	unsigned int i;
> +	int ret = -ENODATA;
> +
> +	if (!osf || !value)
> +		return -EINVAL;
> +
> +	mutex_lock(&osf->latest_lock);

Looks like a good place to use guard(mutex)(&osf->latest_lock);
Remember to include cleanup.h

> +	for (i = 0; i < osf->latest_sample_count; i++) {
> +		latest = &osf->latest_samples[i];
> +		if (latest->sensor_type != sensor_type ||
> +		    latest->sensor_index != sensor_index)
> +			continue;
> +
> +		if (!latest->valid || channel >= latest->channel_count)
> +			break;
>  
> -	if (!ret)
> -		osf->last_sequence = frame.sequence;
> +		*value = latest->values[channel];
> +		ret = 0;
With guard, you can return directly here.
> +		break;
> +	}
> +	mutex_unlock(&osf->latest_lock);
This gets handled automatically on leaving scope

Then if you get here you can just do
	return -ENODATA;

>  
>  	return ret;
>  }
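
To show the shape of the guard() suggestion outside the kernel, here is
a rough userspace sketch. It emulates the idea with the GCC/Clang
cleanup attribute and a pthread mutex; DEMO_GUARD, demo_unlock() and
struct demo_latest are hypothetical names for this example, and the
kernel's guard(mutex) from <linux/cleanup.h> is the real thing to use
in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Called automatically when the guard variable goes out of scope. */
static void demo_unlock(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Userspace stand-in for guard(mutex)(lock): lock now, unlock at scope exit. */
#define DEMO_GUARD(lock) \
	pthread_mutex_t *demo_guard_ \
		__attribute__((cleanup(demo_unlock), unused)) = \
		(pthread_mutex_lock(lock), (lock))

/* Hypothetical, single-entry version of the latest-sample cache. */
struct demo_latest {
	pthread_mutex_t lock;
	int valid;
	unsigned int channel_count;
	int values[4];
};

static int demo_read_latest(struct demo_latest *l, unsigned int channel,
			    int *value)
{
	if (!l || !value)
		return -EINVAL;

	DEMO_GUARD(&l->lock);

	/* Early return: the lock is dropped without a manual unlock. */
	if (!l->valid || channel >= l->channel_count)
		return -ENODATA;

	*value = l->values[channel];
	return 0;
}
```

With this shape the ret variable and the break statements are no longer
needed, since every exit path releases the lock on its own.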


> diff --git a/drivers/iio/opensensorfusion/osf_iio.c b/drivers/iio/opensensorfusion/osf_iio.c
> new file mode 100644
> index 000000000..862a797f4
> --- /dev/null
> +++ b/drivers/iio/opensensorfusion/osf_iio.c

> +
> +bool osf_iio_sensor_supported(u16 sensor_type, u16 channel_count)
> +{
> +	return !!osf_iio_find_sensor_spec(sensor_type, channel_count);
The !! is getting used a lot less in modern kernel code. Linus Torvalds
once pointed out how hard it is to read.  Maybe != NULL is clearer and
let the compiler do the optimization if it wants.

> +}
> +
> +const char *osf_iio_sensor_name(u16 sensor_type)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(osf_iio_sensor_specs); i++) {
> +		if (osf_iio_sensor_specs[i].sensor_type == sensor_type)
> +			return osf_iio_sensor_specs[i].name;
> +	}
> +
> +	return NULL;
> +}

> +}


> +
> +int osf_iio_push_sample(struct iio_dev *indio_dev, const s32 *values,
> +			unsigned int channel_count)

As you are comparing it with the reported number of channels from
spec->channel_count, I would match the type with that (u16 I think).

> +{
> +	struct osf_iio_state *state = iio_priv(indio_dev);
> +	s64 timestamp;
> +
> +	if (channel_count != state->spec->channel_count)
> +		return -EPROTO;
> +
> +	/* This is only a fast path; IIO rechecks buffer state while pushing. */
> +	if (!iio_buffer_enabled(indio_dev))
> +		return 0;
> +
> +	timestamp = iio_get_time_ns(indio_dev);
> +
> +	return iio_push_to_buffers_with_ts_unaligned(indio_dev, values,
> +						     channel_count * sizeof(*values),
> +						     timestamp);
> +}



* Re: [PATCH v3 8/8] docs: misc: amd-sbi: Document SBTSI userspace interface
From: Randy Dunlap @ 2026-06-22 16:57 UTC (permalink / raw)
  To: Akshay Gupta, linux-doc, linux-kernel, linux-hwmon
  Cc: corbet, skhan, linux, arnd, gregkh, NaveenKrishna.Chatradhi,
	Anand.Umarji, Prathima.Lk
In-Reply-To: <20260622135821.2190260-9-Akshay.Gupta@amd.com>



On 6/22/26 6:58 AM, Akshay Gupta wrote:
> From: Prathima <Prathima.Lk@amd.com>
> 
> - Document AMD sideband IOCTL description defined
>   for SBTSI and its usage.
>   User space C-APIs are made available by esmi_oob_library [1],
>   which is provided by the E-SMS project [2].
> 
>   Link: https://github.com/amd/esmi_oob_library [1]
>   Link: https://www.amd.com/en/developer/e-sms.html [2]
> 
> Include a user-space open example for /dev/sbtsi-* and list auxiliary
> bus sysfs paths.
> 
> Reviewed-by: Akshay Gupta <Akshay.Gupta@amd.com>
> Signed-off-by: Prathima <Prathima.Lk@amd.com>
> ---
> Changes since v2:
> - Update misc node names info as per socket
> 
> Changes since v1:
> - Elaborate the document
>  Documentation/misc-devices/amd-sbi.rst | 68 ++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/Documentation/misc-devices/amd-sbi.rst b/Documentation/misc-devices/amd-sbi.rst
> index f91ddadefe48..fbbbc504119f 100644
> --- a/Documentation/misc-devices/amd-sbi.rst
> +++ b/Documentation/misc-devices/amd-sbi.rst
> @@ -48,6 +48,60 @@ Access restrictions:
>   * APML Mailbox messages and Register xfer access are read-write,
>   * CPUID and MCA_MSR access is read-only.
>  
> +SBTSI device
> +============
> +
> +sbtsi driver under the drivers/misc/amd-sbi creates miscdevice

   The sbtsi driver in the drivers/misc/amd-sbi/ directory creates a miscdevice

> +/dev/sbtsi-* to let user space programs run APML TSI register xfer

                                                                 transfer
?

> +commands.
> +
> +The driver supports both I2C and I3C transports for SB-TSI targets.
> +The transport is selected by the bus where the device is enumerated.
> +
> +Misc device:
> + * In 1P socket 0: /dev/sbtsi-4c
> + * In 2P socket 0: /dev/sbtsi-4c, socket 1: /dev/sbtsi-48
> +
> +.. code-block:: bash
> +
> +   $ ls -al /dev/sbtsi-4c
> +   crw-------    1 root     root       10, 116 Apr  2 05:22 /dev/sbtsi-4c
> +
> +
> +Access restrictions:
> + * Only root user is allowed to open the file.
> + * APML TSI Register xfer access is read-write.

                        transfer
?

> +
> +SBTSI hwmon interface
> +=====================
[snip]

-- 
~Randy



* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22 16:51 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Peter Zijlstra, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Julia Lawall, Yury Norov, linux-doc,
	linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <08b3c961-18bb-43d9-8d7f-8a87bcad0afa@infradead.org>

On Mon, 22 Jun 2026 09:40:45 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> > Did you forget your C 101 class? If you use a function, you gotta
> > include the relevant header.  
> 
> Also item #1 in Documentation/process/submit-checklist.rst.

What is that? Remove all trace_printk()s before you submit?

Because that is what you should do. But now you also need to remember
to remove the include <linux/trace_printk.h> too. Or, I guess if
someone uses it a lot, they may just keep it in their files without the
trace_printk()s.

-- Steve


* Re: [PATCH RFC v5 5/6] iio: osf: add UART transport
From: Jonathan Cameron @ 2026-06-22 16:49 UTC (permalink / raw)
  To: Jinseob Kim
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, David Lechner,
	Nuno Sá, Andy Shevchenko, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-doc, linux-kernel
In-Reply-To: <20260616072242.3942-6-kimjinseob88@gmail.com>

On Tue, 16 Jun 2026 16:22:41 +0900
Jinseob Kim <kimjinseob88@gmail.com> wrote:

> Add the serdev UART transport and the initial OSF core receive path.
> 
> Enable the required vcc regulator with devm_regulator_get_enable()
> before opening the UART, keeping power handling limited to the simple
> probe-time requirement for this RFC.
> 
> Signed-off-by: Jinseob Kim <kimjinseob88@gmail.com>
A few things inline.

Thanks,

Jonathan

> diff --git a/drivers/iio/opensensorfusion/Kconfig b/drivers/iio/opensensorfusion/Kconfig
> new file mode 100644
> index 000000000..d393eb3aa
> --- /dev/null
> +++ b/drivers/iio/opensensorfusion/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config OPEN_SENSOR_FUSION
> +	tristate "Open Sensor Fusion UART IIO driver"
> +	depends on IIO
> +	depends on SERIAL_DEV_BUS
> +	select CRC32
> +	help
> +	  Build the Open Sensor Fusion UART receive path.
> +
> +	  The driver receives OSF protocol frames over a serdev UART.
> +	  Frames are decoded and validated before being passed to the
> +	  driver core.
> +	  This patch only adds the transport path.
> +	  IIO device registration is added separately.

Don't talk about a patch in here.  Talk about what is supported, then
add the other bits in later patches if you really want to.  Mostly
this help is generic enough that we don't need to modify it more than
once in a series.

> diff --git a/drivers/iio/opensensorfusion/osf_core.c b/drivers/iio/opensensorfusion/osf_core.c
> new file mode 100644
> index 000000000..137fb7166
> --- /dev/null
> +++ b/drivers/iio/opensensorfusion/osf_core.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/errno.h>
> +#include <linux/string.h>
> +#include <linux/types.h>
> +
> +#include "osf_core.h"
> +#include "osf_protocol.h"
> +
> +#define OSF_RESERVED_MSG_FIRST		0x7f00
> +#define OSF_RESERVED_MSG_LAST		0x7fff
> +#define OSF_VENDOR_PRIVATE_FIRST	0x8000
> +
> +void osf_core_init(struct osf_device *osf, struct device *dev)
> +{
> +	memset(osf, 0, sizeof(*osf));
	*osf = (struct osf_device){
		.dev = dev,
	};

is guaranteed to also clear all other fields (per the new C spec as
well as the options the kernel has long been built with),
so this is how I would always handle cases of zeroing and then
setting fields like this.
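
A small userspace sketch of that behaviour, assuming a made-up struct
demo_osf (the real struct osf_device has different members). It only
demonstrates that named members other than .dev end up zero; padding
bytes are a separate question that this example does not test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct osf_device. */
struct demo_osf {
	void *dev;
	uint32_t last_sequence;
	int capability_valid;
};

static void demo_osf_init(struct demo_osf *osf, void *dev)
{
	/* Replaces memset() plus assignment: unnamed members become zero. */
	*osf = (struct demo_osf) {
		.dev = dev,
	};
}
```

Calling demo_osf_init() on a structure that already holds stale values
leaves last_sequence and capability_valid at zero, which is the same
end state as the memset() version for these members.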

> +	osf->dev = dev;
> +}


> diff --git a/drivers/iio/opensensorfusion/osf_serdev.c b/drivers/iio/opensensorfusion/osf_serdev.c
> new file mode 100644
> index 000000000..624cb01fe
> --- /dev/null
> +++ b/drivers/iio/opensensorfusion/osf_serdev.c

> +
> +static void osf_serdev_remove(struct serdev_device *serdev)
> +{
> +	struct osf_serdev *osf_uart = serdev_device_get_drvdata(serdev);
> +
> +	serdev_device_close(serdev);
> +	osf_stream_reset(&osf_uart->stream);
> +	osf_core_unregister_iio(&osf_uart->osf);

My gut feeling is this should be first to tear down the device
interfaces as soon as possible.  They will have been initialized
after the serdev was opened so should be unregistered before it is closed.
If there is a reason for this specific order add a comment.

> +}

> +
> +static struct serdev_device_driver osf_serdev_driver = {
> +	.probe = osf_serdev_probe,
> +	.remove = osf_serdev_remove,
> +	.driver = {
> +		.name = "open-sensor-fusion-uart",
> +		.of_match_table = osf_serdev_of_match,
> +	},
> +};
> +

No blank line here as the macro is extremely tightly coupled
with the structure and it is nice to have the visual cue.

> +module_serdev_device_driver(osf_serdev_driver);
> +
> +MODULE_DESCRIPTION("Open Sensor Fusion IIO driver");
> +MODULE_LICENSE("GPL");



* Re: [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group
From: Reinette Chatre @ 2026-06-22 16:47 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <280f2dab-56be-49b9-982f-16f01727a732@amd.com>

Hi Babu,

On 6/18/26 6:29 PM, Babu Moger wrote:
> On 6/16/26 18:42, Reinette Chatre wrote:
>> On 4/30/26 4:24 PM, Babu Moger wrote:

...

>>> +/**
>>> + * rdtgroup_config_kmode_clear() - Tear down the kernel-mode binding on @rdtgrp
>>> + * @rdtgrp:    Resctrl group whose kernel-mode binding is being released.
>>> + *        May be %NULL when no group is currently bound, in which case
>>> + *        this is a no-op.
>>> + * @kmode:    Kernel-mode policy currently active on @rdtgrp, as a
>>> + *        BIT(&enum resctrl_kernel_modes) value.  When this is
>>> + *        BIT(INHERIT_CTRL_AND_MON) the hardware tear-down is skipped
>>> + *        because no MSR was previously programmed.
>>> + *
>>> + * Disables the kernel-mode binding on the CPUs @rdtgrp covers (its
>>> + * @kmode_cpu_mask, or all online CPUs when that mask is empty) and resets
>>> + * the per-group bookkeeping (@kmode and @kmode_cpu_mask).  This is the
>>> + * disable counterpart of rdtgroup_config_kmode() and exists so that a write
>>> + * that transitions the active mode to BIT(INHERIT_CTRL_AND_MON) -- which
>>> + * skips rdtgroup_config_kmode() entirely -- still tears down the previously
>>> + * bound group instead of leaving stale enable bits behind.
>>> + *
>>> + * On allocation failure the function returns -ENOMEM and leaves both the
>>> + * hardware state and @rdtgrp's bookkeeping unchanged so the caller can fail
>>> + * the operation atomically and last_cmd_status reflects reality.
>>> + *
>>> + * Context: Caller must hold rdtgroup_mutex.
>>> + *
>>> + * Return: 0 on success (including the @rdtgrp == %NULL and INHERIT cases),
>>> + * -ENOMEM if cpumask allocation fails.
>>> + */
>>> +static int rdtgroup_config_kmode_clear(struct rdtgroup *rdtgrp, int kmode)
>>> +{
>>> +    cpumask_var_t disable_mask;
>>> +    u32 closid, rmid;
>>> +
>>> +    if (!rdtgrp)
>>> +        return 0;
>>> +
>>> +    if (kmode == BIT(INHERIT_CTRL_AND_MON))
>>> +        goto out_clear;
>>> +
>>> +    if (!zalloc_cpumask_var(&disable_mask, GFP_KERNEL))
>>> +        return -ENOMEM;
>>> +
>>> +    if (rdtgrp->type == RDTMON_GROUP) {
>>> +        closid = rdtgrp->mon.parent->closid;
>>> +        rmid = rdtgrp->mon.rmid;
>>> +    } else {
>>> +        closid = rdtgrp->closid;
>>> +        rmid = rdtgrp->mon.rmid;
>>> +    }
>>
> 
> I can directly use it like below. I don't need to check for RDTMON_GROUP.
> 
>     closid = rdtgrp->closid;
>      rmid = rdtgrp->mon.rmid;
> 
> 
>> Same comment as above ... but actually, why is closid/rmid needed at all? This
>> function is intended to *reset* the kernel mode so needing a valid/active closid and
>> rmid does not look right.
> 
> This is a bit tricky. I may need CLOSID/RMID in
> resctrl_arch_configure_kmode(). According to the specification, only
> the PLZA_EN field is allowed to differ across CPUs where PLZA is
> enabled; all other fields must remain consistent across CPUs within
> the same domain. If CLOSID/RMID are not passed, it could result in
> inconsistent values across CPUs.


I see. Let's revisit this in the next version. It is not quite clear to me how
the rework of cpu_mask wrangling will impact the resctrl_arch_configure_kmode()
calls. To simplify this for now resctrl could continue to provide closid and rmid
to architecture (with the API documentation in include/linux/resctrl.h documenting
why it is provided and that it may be unused by architecture). 



>>> +
>>> +    /*
>>> +     * Split "<mode>:group=<spec>"; the ":group=<spec>" suffix is optional
>>> +     * and when omitted the default control group (&rdtgroup_default) is used.
>>> +     */
>>> +    group_str = strstr(buf, ":group=");
>>> +    if (group_str) {
>>> +        *group_str = '\0';
>>> +        group_str += strlen(":group=");
>>> +    }
>>> +    mode_str = buf;
>>> +
>>> +    mutex_lock(&rdtgroup_mutex);
>>> +    rdt_last_cmd_clear();
>>> +
>>> +    for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++)
>>> +        if (!strcmp(mode_str, resctrl_mode_str[i]))
>>> +            break;
>>> +    if (i == RESCTRL_NUM_KERNEL_MODES) {
>>> +        rdt_last_cmd_puts("Unknown kernel mode\n");
>>> +        ret = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    if (!(resctrl_kcfg.kmode & BIT(i))) {
>>> +        rdt_last_cmd_puts("Kernel mode not available\n");
>>> +        ret = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    kmode = BIT(i);
>>
>> Can kmode be of enum type to be assigned the actual enum value to avoid all these BIT(enum value) usages?
> 
> You mean?
> 
> enum resctrl_kernel_modes {
>     INHERIT_CTRL_AND_MON        = 1U << 0,  /* 1 */
>     GLOBAL_ASSIGN_CTRL_INHERIT_MON    = 1U << 1,  /* 2 */
>     GLOBAL_ASSIGN_CTRL_ASSIGN_MON    = 1U << 2,  /* 4 */
> };
> 
> #define RESCTRL_NUM_KERNEL_MODES  3

No. I mean:
	enum resctrl_kernel_mode kmode;
... with a change like this code like below can be simplified:

>>> +    if (kmode == BIT(GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU) &&

	kmode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU

>>> +        rdtgrp->type != RDTMON_GROUP) {
>>> +        rdt_last_cmd_puts("global_assign_ctrl_assign_mon_per_cpu requires a monitor group\n");
>>> +        ret = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>> +    if (kmode == BIT(GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU) &&

	kmode == GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU

>>> +        rdtgrp->type != RDTCTRL_GROUP) {
>>> +        rdt_last_cmd_puts("global_assign_ctrl_inherit_mon_per_cpu requires a control group\n");
>>> +        ret = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>> +
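
To illustrate the enum suggestion, here is a minimal userspace sketch, not the
patch's code. The enum, the string table and the helper name are hypothetical
stand-ins that only loosely mirror the quoted patch. The set of supported modes
stays a bitmask, while the selected mode is kept as the enum value, so later
comparisons need no BIT():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mode list; the real enumerators live in the patch series. */
enum kmode_sketch {
	KMODE_INHERIT_CTRL_AND_MON,
	KMODE_GLOBAL_ASSIGN_CTRL_INHERIT_MON,
	KMODE_GLOBAL_ASSIGN_CTRL_ASSIGN_MON,
	KMODE_NUM,
};

#define KMODE_BIT(m)	(1U << (m))

/* Illustrative mode strings, indexed by enum value. */
static const char * const kmode_sketch_str[KMODE_NUM] = {
	[KMODE_INHERIT_CTRL_AND_MON]		= "inherit_ctrl_and_mon",
	[KMODE_GLOBAL_ASSIGN_CTRL_INHERIT_MON]	= "global_assign_ctrl_inherit_mon",
	[KMODE_GLOBAL_ASSIGN_CTRL_ASSIGN_MON]	= "global_assign_ctrl_assign_mon",
};

/*
 * Split "<mode>:group=<spec>" in place and look up the mode.
 * @supported is a bitmask of KMODE_BIT() values.
 * Returns 0 on success, -1 for an unknown mode, -2 for an unsupported one.
 * *group is set to the group spec, or NULL when the suffix is omitted.
 */
static int kmode_sketch_parse(char *buf, unsigned int supported,
			      enum kmode_sketch *mode, char **group)
{
	char *group_str = strstr(buf, ":group=");
	int i;

	if (group_str) {
		*group_str = '\0';
		group_str += strlen(":group=");
	}
	*group = group_str;

	for (i = 0; i < KMODE_NUM; i++)
		if (!strcmp(buf, kmode_sketch_str[i]))
			break;
	if (i == KMODE_NUM)
		return -1;

	/* BIT() is only needed here, when testing the support mask. */
	if (!(supported & KMODE_BIT(i)))
		return -2;

	*mode = (enum kmode_sketch)i;
	return 0;
}
```

With this shape a caller can write "mode == KMODE_GLOBAL_ASSIGN_CTRL_ASSIGN_MON"
directly, and BIT() appears only where the support mask is tested.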

Reinette



^ permalink raw reply

* Re: [PATCH RFC v5 3/6] iio: osf: add protocol decoding
From: Jonathan Cameron @ 2026-06-22 16:43 UTC (permalink / raw)
  To: Jinseob Kim
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, David Lechner,
	Nuno Sá, Andy Shevchenko, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-doc, linux-kernel
In-Reply-To: <20260616072242.3942-4-kimjinseob88@gmail.com>

On Tue, 16 Jun 2026 16:22:39 +0900
Jinseob Kim <kimjinseob88@gmail.com> wrote:

> Add helpers for validating and decoding Open Sensor Fusion frames and the
> message payloads used by the initial receive path.
> 
> Signed-off-by: Jinseob Kim <kimjinseob88@gmail.com>
A few things inline.

> ---
>  drivers/iio/opensensorfusion/osf_protocol.c | 249 ++++++++++++++++++++
>  drivers/iio/opensensorfusion/osf_protocol.h |  97 ++++++++
>  2 files changed, 346 insertions(+)
>  create mode 100644 drivers/iio/opensensorfusion/osf_protocol.c
>  create mode 100644 drivers/iio/opensensorfusion/osf_protocol.h
> 
> diff --git a/drivers/iio/opensensorfusion/osf_protocol.c b/drivers/iio/opensensorfusion/osf_protocol.c
> new file mode 100644
> index 000000000..5bee545f3
> --- /dev/null
> +++ b/drivers/iio/opensensorfusion/osf_protocol.c

> +int osf_protocol_decode_frame(const u8 *buf, size_t len,
> +			      struct osf_frame *frame, size_t *frame_len)
> +{
> +	u32 expected_crc;
> +	u32 actual_crc;
> +	u32 payload_len;
> +	size_t total_len;
> +	u8 major;
> +
> +	if (!buf || !frame || !frame_len)
> +		return -EINVAL;
> +
> +	if (len < OSF_FRAME_MIN_LEN)
> +		return -EMSGSIZE;
> +
> +	if (get_unaligned_le32(buf) != OSF_FRAME_MAGIC)
> +		return -EPROTO;
> +
> +	major = buf[4];
> +	if (major != OSF_PROTOCOL_MAJOR)
> +		return -EPROTO;
> +
> +	if (get_unaligned_le16(buf + 6) != OSF_FRAME_HEADER_LEN)
> +		return -EPROTO;
> +
> +	payload_len = get_unaligned_le32(buf + 10);
> +	if (payload_len > len - OSF_FRAME_MIN_LEN)
> +		return -EMSGSIZE;
> +
> +	if (get_unaligned_le32(buf + 34))
> +		return -EPROTO;
> +
> +	total_len = OSF_FRAME_HEADER_LEN + payload_len + OSF_FRAME_CRC_LEN;
> +	expected_crc = osf_crc32_ieee(buf, OSF_FRAME_HEADER_LEN + payload_len);
> +	actual_crc = get_unaligned_le32(buf + OSF_FRAME_HEADER_LEN + payload_len);
> +
> +	if (actual_crc != expected_crc)
> +		return -EBADMSG;
> +
> +	frame->protocol_minor = buf[5];
> +	frame->message_type = get_unaligned_le16(buf + 8);
> +	frame->payload_len = payload_len;
> +	frame->sequence = get_unaligned_le64(buf + 14);
> +	frame->timestamp_us = get_unaligned_le64(buf + 22);
> +	frame->flags = get_unaligned_le32(buf + 30);
> +	frame->payload = buf + OSF_FRAME_HEADER_LEN;
> +	frame->crc = actual_crc;

Same as below with regard to using designated initializers for these
structure fills.

> +	*frame_len = total_len;
> +
> +	return 0;
> +}
> +
> +int osf_protocol_decode_sensor_sample(const struct osf_frame *frame,
> +				      struct osf_sensor_sample *sample)
> +{
> +	u16 channel_count;
> +	u16 sample_format;
> +	u16 sensor_type;
> +	size_t expected_len;
> +	const u8 *payload;
> +
> +	if (!frame || !sample || !frame->payload)
> +		return -EINVAL;
> +
> +	if (frame->message_type != OSF_MSG_SENSOR_SAMPLE)
> +		return -EPROTO;
> +
> +	if (frame->payload_len < OSF_SENSOR_SAMPLE_BASE_LEN)
> +		return -EMSGSIZE;
> +
> +	payload = frame->payload;
> +	sensor_type = get_unaligned_le16(payload);
> +	channel_count = get_unaligned_le16(payload + 4);
> +	sample_format = get_unaligned_le16(payload + 6);
> +
> +	if (!osf_sensor_type_valid(sensor_type))
> +		return -EPROTO;
> +
> +	if (!channel_count)
> +		return -EPROTO;
> +
> +	if (sample_format != OSF_SAMPLE_FORMAT_S32)
> +		return -EPROTO;
> +
> +	if (get_unaligned_le32(payload + 12))
> +		return -EPROTO;
> +
> +	if (channel_count > (SIZE_MAX - OSF_SENSOR_SAMPLE_BASE_LEN) / sizeof(s32))
> +		return -EOVERFLOW;
> +
> +	expected_len = OSF_SENSOR_SAMPLE_BASE_LEN + channel_count * sizeof(s32);
> +	if (frame->payload_len != expected_len)
> +		return -EMSGSIZE;
> +
> +	sample->sensor_type = sensor_type;
> +	sample->sensor_index = get_unaligned_le16(payload + 2);
> +	sample->channel_count = channel_count;
> +	sample->sample_format = sample_format;
> +	sample->scale_nano = get_unaligned_le32(payload + 8);
> +	sample->samples = payload + OSF_SENSOR_SAMPLE_BASE_LEN;
See below. Designated initializer would help readability a little here.

> +
> +	return 0;
> +}
> +
> +int osf_protocol_sensor_sample_value(const struct osf_sensor_sample *sample,
> +				     unsigned int index, s32 *value)

Given channel count is a u16 and we can't be equal or bigger than it, perhaps
use a u16 for index as well.

> +{
> +	if (!sample || !sample->samples || !value)
> +		return -EINVAL;
> +
> +	if (index >= sample->channel_count)
> +		return -ERANGE;
> +
> +	/* Samples are little-endian two's-complement signed values. */
> +	*value = (s32)get_unaligned_le32(sample->samples + index * sizeof(s32));

sizeof(__le32) slightly more appropriate given that is what you are treating it
as.

> +
> +	return 0;
> +}
> +
> +int osf_protocol_decode_device_status(const struct osf_frame *frame,
> +				      struct osf_device_status *status)
> +{
> +	const u8 *payload;
> +
> +	if (!frame || !status || !frame->payload)
> +		return -EINVAL;
> +
> +	if (frame->message_type != OSF_MSG_DEVICE_STATUS)
> +		return -EPROTO;
> +
> +	if (frame->payload_len != OSF_DEVICE_STATUS_LEN)
> +		return -EMSGSIZE;
> +
> +	payload = frame->payload;
> +	if (get_unaligned_le32(payload + 16))
> +		return -EPROTO;
> +
> +	status->uptime_s = get_unaligned_le32(payload);
> +	status->status_flags = get_unaligned_le32(payload + 4);
> +	status->error_flags = get_unaligned_le32(payload + 8);
> +	status->dropped_frames = get_unaligned_le32(payload + 12);
Similar to below. I'd use a designated initializer for status as it
is all written in one place.

> +
> +	return 0;
> +}

> +
> +int osf_protocol_decode_capability_entry(const struct osf_capability_report *report,
> +					 unsigned int index,
> +					 struct osf_capability_entry *entry)
> +{
> +	u16 sample_format;
> +	u16 sensor_type;
> +	u32 flags;
> +	const u8 *payload;
> +
> +	if (!report || !report->entries || !entry)
> +		return -EINVAL;
> +
> +	if (index >= report->capability_count)

Neater to size index to match capability_count.  Not that
important though.

> +		return -ERANGE;
> +
> +	payload = report->entries + index * OSF_CAP_SENSOR_ENTRY_LEN;
> +	sensor_type = get_unaligned_le16(payload);
> +	sample_format = get_unaligned_le16(payload + 6);
> +	flags = get_unaligned_le32(payload + 12);
> +
> +	if (!osf_sensor_type_valid(sensor_type))
> +		return -EPROTO;
> +
> +	if (sample_format != OSF_SAMPLE_FORMAT_S32)
> +		return -EPROTO;
> +
> +	if (flags & ~OSF_CAPABILITY_FLAGS_MASK)
> +		return -EPROTO;
> +
> +	if (get_unaligned_le32(payload + 16))
> +		return -EPROTO;
> +
> +	entry->sensor_type = sensor_type;
> +	entry->sensor_index = get_unaligned_le16(payload + 2);
> +	entry->channel_count = get_unaligned_le16(payload + 4);
> +	entry->sample_format = sample_format;
> +	entry->scale_nano = get_unaligned_le32(payload + 8);
> +	entry->flags = flags;
neater as designated initializer I think.

	*entry = (struct osf_capability_entry) {
		.sensor_type = sensor_type,
		.sensor_index = get_unaligned_le16(payload + 2),
etc
	};
> +
> +	return 0;
> +}
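
For reference, a self-contained userspace sketch of the compound-literal form
suggested above. The struct and the little-endian helpers are hypothetical
stand-ins for struct osf_capability_entry and get_unaligned_le16() /
get_unaligned_le32(); the field offsets are the ones used in the quoted code:

```c
#include <stdint.h>

/* Hypothetical stand-in for struct osf_capability_entry from the patch. */
struct cap_entry_sketch {
	uint16_t sensor_type;
	uint16_t sensor_index;
	uint16_t channel_count;
	uint16_t sample_format;
	uint32_t scale_nano;
	uint32_t flags;
};

/* Userspace stand-in for get_unaligned_le16(). */
static uint16_t sketch_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Userspace stand-in for get_unaligned_le32(). */
static uint32_t sketch_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill the whole entry in one assignment using designated initializers. */
static void cap_entry_sketch_fill(struct cap_entry_sketch *entry,
				  const uint8_t *payload)
{
	*entry = (struct cap_entry_sketch) {
		.sensor_type = sketch_le16(payload),
		.sensor_index = sketch_le16(payload + 2),
		.channel_count = sketch_le16(payload + 4),
		.sample_format = sketch_le16(payload + 6),
		.scale_nano = sketch_le32(payload + 8),
		.flags = sketch_le32(payload + 12),
	};
}
```

One difference from field-by-field assignment: any member not named in the
initializer is zeroed, so the caller never sees stale data in the output struct.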

^ permalink raw reply

* Re: [PATCH] Docs/admin-guide/cgroup-v2: fix memory.stat doc details
From: Nhat Pham @ 2026-06-22 16:41 UTC (permalink / raw)
  To: Doehyun Baek
  Cc: Tejun Heo, Jonathan Corbet, Johannes Weiner, Michal Koutný,
	Andrew Morton, Shakeel Butt, Roman Gushchin, Yosry Ahmed, cgroups,
	linux-doc, linux-kernel
In-Reply-To: <20260620122751.388770-1-doehyunbaek@gmail.com>

On Sat, Jun 20, 2026 at 5:28 AM Doehyun Baek <doehyunbaek@gmail.com> wrote:
>
> Fix minor cgroup v2 memory.stat documentation issues.  Correct the
> vmalloc per-node marker now that vmalloc uses the native NR_VMALLOC node
> stat, and document zswap_incomp as a byte-valued memory amount instead
> of as a page counter.
>
> Fixes: c466412c73c3 ("mm: memcontrol: switch to native NR_VMALLOC vmstat counter")
> Fixes: 5ad41a38c364 ("mm: zswap: add per-memcg stat for incompressible pages")
> Signed-off-by: Doehyun Baek <doehyunbaek@gmail.com>
> -               Number of incompressible pages currently stored in zswap
> +               Amount of memory used by incompressible pages currently stored in zswap
>                 without compression. These pages could not be compressed to
>                 a size smaller than PAGE_SIZE, so they are stored as-is.
>

Good catch :)

Reviewed-by: Nhat Pham <nphamcs@gmail.com>

^ permalink raw reply

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Randy Dunlap @ 2026-06-22 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
	dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
	linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260622083440.GX49951@noisy.programming.kicks-ass.net>



On 6/22/26 1:34 AM, Peter Zijlstra wrote:
> On Sun, Jun 21, 2026 at 05:34:30AM -0400, Steven Rostedt wrote:
>> There's been complaints about trace_printk() being defined in kernel.h as it
>> can increase the compilation time. As it is only used by some developers for
>> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
>> cycles for those that do not ever care about it.
>>
>> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
>> use it can set and not have to always remember to add #include <linux/trace_printk.h>
>> to the files they add trace_printk() while debugging. It also means that
>> those that do not have that config set will not have to worry about wasted
>> CPU cycles as it is only included in the CFLAGS when the option is set, and
>> it's completely ignored otherwise.
> 
> Did you forget your C 101 class? If you use a function, you gotta
> include the relevant header.

Also item #1 in Documentation/process/submit-checklist.rst.

> You don't see userspace saying: 'Hey, you know what, perhaps we should
> add stdio.h to every other header, just in case someone wants to
> printf()' either.
> 
> I really don't understand your argument. Yes, maybe someone will forget
> and then either their editor (if they have a halfway modern setup with
> LSP enabled) or their build will complain, but so what? This is all
> trivial stuff, surely we have more pressing matters to concern ourselves
> with?



-- 
~Randy


^ permalink raw reply

* Re: [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init
From: Babu Moger @ 2026-06-22 16:38 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <737a4228-52fb-4583-ac64-8efe79c107e6@intel.com>

Hi Reinette,

On 6/22/26 11:21, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/18/26 10:14 AM, Babu Moger wrote:
>> On 6/16/26 18:36, Reinette Chatre wrote:
>>> On 4/30/26 4:24 PM, Babu Moger wrote:
>   
>>>
>>>>     - calls resctrl_arch_get_kmode_support() so each architecture ORs
>>>>       BIT(<mode>) into kmode for the policies its hardware supports
>>>>       (on x86, AMD PLZA contributes the two global-assign modes).
>>>>
>>>> resctrl_kmode_init() runs from resctrl_init() once the default group
>>>
>>> resctrl_kmode_init() can be dropped after changes described in response
>>> to previous patch. Apart from no longer being necessary I also find that
>>> having the kernel mode fully initialized *before* the hotplug handlers run
>>> to be simpler.
>>
>> That means resctrl_set_kmode_support() will be called from the architecture layer, likely from core.c within get_rdt_alloc_resources().
>>
>> The resctrl_set_kmode_support() handler would need to initialize both the default mode and all supported modes.
> 
> I see this differently. Since resctrl_set_kmode_support() is optional for an architecture
> resctrl fs can just statically initialize the defaults. resctrl_set_kmode_support() would
> expand the defaults to also accommodate what the architecture supports.
>
Yes. We can do that.
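
A rough userspace sketch of that arrangement, as I understand it. All names
below are hypothetical, and the choice of default mode is an assumption for
illustration only. The filesystem side initializes the defaults statically, and
the optional architecture call only widens the supported set:

```c
/* Hypothetical mode identifiers, for illustration only. */
enum kcfg_mode_sketch {
	KCFG_MODE_INHERIT_CTRL_AND_MON,
	KCFG_MODE_GLOBAL_ASSIGN_CTRL_INHERIT_MON,
	KCFG_MODE_GLOBAL_ASSIGN_CTRL_ASSIGN_MON,
};

#define KCFG_MODE_BIT(m)	(1U << (m))

struct kcfg_sketch {
	unsigned int kmode;			/* bitmask of supported modes */
	enum kcfg_mode_sketch kmode_cur;	/* currently active mode */
};

/* Defaults are in place before any architecture or hotplug code runs. */
static struct kcfg_sketch kcfg_sketch = {
	.kmode = KCFG_MODE_BIT(KCFG_MODE_INHERIT_CTRL_AND_MON),
	.kmode_cur = KCFG_MODE_INHERIT_CTRL_AND_MON,
};

/*
 * Optional architecture hook: OR in the extra modes the hardware supports.
 * An architecture that never calls this keeps the static defaults.
 */
static void kcfg_sketch_set_support(unsigned int arch_modes)
{
	kcfg_sketch.kmode |= arch_modes;
}
```

The active mode is untouched by the hook, so no separate init function is
needed to establish it.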

thanks
Babu

^ permalink raw reply

* Re: [PATCH 1/2] dt-bindings: hwmon: chipcap2: Add label property
From: Javier Carrasco @ 2026-06-22 16:29 UTC (permalink / raw)
  To: Flaviu Nistor, Guenter Roeck, Javier Carrasco, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, linux-kernel, devicetree, linux-doc
In-Reply-To: <20260622122200.14245-1-flaviu.nistor@gmail.com>

On Mon Jun 22, 2026 at 2:21 PM CEST, Flaviu Nistor wrote:
> Add support for an optional label property similar to other hwmon devices.
> This allows, in case of boards with multiple CHIPCAP2 sensors, to assign
> distinct names to each instance.
>
> Signed-off-by: Flaviu Nistor <flaviu.nistor@gmail.com>
> ---
>  .../devicetree/bindings/hwmon/amphenol,chipcap2.yaml         | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
> index 17351fdbefce..f00b5a4b14dd 100644
> --- a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
> +++ b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
> @@ -33,6 +33,10 @@ properties:
>    reg:
>      maxItems: 1
>
> +  label:
> +    description:
> +      A descriptive name for this channel, like "ambient" or "psu".
> +
>    interrupts:
>      items:
>        - description: measurement ready indicator
> @@ -72,6 +76,7 @@ examples:
>                           <5 IRQ_TYPE_EDGE_RISING>,
>                           <6 IRQ_TYPE_EDGE_RISING>;
>              interrupt-names = "ready", "low", "high";
> +            label = "somelabel";
>              vdd-supply = <&reg_vdd>;
>          };
>      };

Hello Flaviu, thank you for your patch.

Should we not add a reference to hwmon-common.yaml (with
unevaluatedProperties instead of additionalProperties), as label is
defined there? I believe that Krzysztof Kozlowski did something similar
for the shunt-resistor-micro-ohms property. Could we follow suit here?

I am also not a big fan of a name like "somelabel", and a more
meaningful name from a "real" example would look better. I know that
some examples have already used "somelabel" as an example, but others
have used more meaningful names too.

Best regards,
Javier Carrasco

^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-22 16:29 UTC (permalink / raw)
  To: Nuno Sá
  Cc: Rodrigo Alencar, Conor Dooley, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <ajkMBh-R_7pYaoAn@nsa>

On Mon, 22 Jun 2026 11:29:38 +0100
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Mon, Jun 22, 2026 at 10:24:05AM +0100, Rodrigo Alencar wrote:
> > On 21/06/26 15:33, Jonathan Cameron wrote:  
> > > On Fri, 19 Jun 2026 16:54:11 +0100
> > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > >   
> > > > On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:  
> > > > > On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:    
> > > > > > On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:    
> > > > > > > On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:    
> > > > > > > > On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:    
> > > > > > > > > 
> > > > > > > > > On 6/14/26 21:44, Jonathan Cameron wrote:    
> > > > > > > > > > On Tue, 9 Jun 2026 16:47:23 +0200
> > > > > > > > > > Janani Sunil <jan.sun97@gmail.com> wrote:
> > > > > > > > > >     
> > > > > > > > > > > On 5/26/26 15:11, Rodrigo Alencar wrote:    
> > > > > > > > > > > > On 26/05/19 05:42PM, Janani Sunil wrote:    
> > > > > > > > > > > > > Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> > > > > > > > > > > > > buffered voltage output digital-to-analog converter (DAC) with an
> > > > > > > > > > > > > integrated precision reference.    
> > > > > > > > > > > > ...
> > > > > > > > > > > > Probably others may comment on that, but...
> > > > > > > > > > > > 
> > > > > > > > > > > > This parent node may support device addressing for multi-device support through
> > > > > > > > > > > > those ID pins. I suppose that each device may have its own power supplies or
> > > > > > > > > > > > other resources like the toggle pins or reset and enable.
> > > > > > > > > > > > 
> > > > > > > > > > > > That way I suppose that an example would look like...    
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +patternProperties:
> > > > > > > > > > > > > +  "^channel@([0-9]|1[0-5])$":
> > > > > > > > > > > > > +    type: object
> > > > > > > > > > > > > +    description: Child nodes for individual channel configuration
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    properties:
> > > > > > > > > > > > > +      reg:
> > > > > > > > > > > > > +        description: Channel number.
> > > > > > > > > > > > > +        minimum: 0
> > > > > > > > > > > > > +        maximum: 15
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      adi,output-range-microvolt:
> > > > > > > > > > > > > +        description: |
> > > > > > > > > > > > > +          Output voltage range for this channel as [min, max] in microvolts.
> > > > > > > > > > > > > +          If not specified, defaults to 0V to 5V range.
> > > > > > > > > > > > > +        oneOf:
> > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > +              - const: 0
> > > > > > > > > > > > > +              - enum: [5000000, 10000000, 20000000, 40000000]
> > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > +              - const: -5000000
> > > > > > > > > > > > > +              - const: 5000000
> > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > +              - const: -10000000
> > > > > > > > > > > > > +              - const: 10000000
> > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > +              - const: -15000000
> > > > > > > > > > > > > +              - const: 15000000
> > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > +              - const: -20000000
> > > > > > > > > > > > > +              - const: 20000000
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    required:
> > > > > > > > > > > > > +      - reg
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    additionalProperties: false
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +required:
> > > > > > > > > > > > > +  - compatible
> > > > > > > > > > > > > +  - reg
> > > > > > > > > > > > > +  - vdd-supply
> > > > > > > > > > > > > +  - avdd-supply
> > > > > > > > > > > > > +  - hvdd-supply
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +dependencies:
> > > > > > > > > > > > > +  spi-cpha: [ spi-cpol ]
> > > > > > > > > > > > > +  spi-cpol: [ spi-cpha ]
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +allOf:
> > > > > > > > > > > > > +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +unevaluatedProperties: false
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +examples:
> > > > > > > > > > > > > +  - |
> > > > > > > > > > > > > +    #include <dt-bindings/gpio/gpio.h>
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    spi {
> > > > > > > > > > > > > +        #address-cells = <1>;
> > > > > > > > > > > > > +        #size-cells = <0>;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +        dac@0 {
> > > > > > > > > > > > > +            compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > +            reg = <0>;
> > > > > > > > > > > > > +            spi-max-frequency = <25000000>;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +            vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > +            avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > +            hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > +            hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +            #address-cells = <1>;
> > > > > > > > > > > > > +            #size-cells = <0>;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +            channel@0 {
> > > > > > > > > > > > > +                reg = <0>;
> > > > > > > > > > > > > +                adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > +            };
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +            channel@1 {
> > > > > > > > > > > > > +                reg = <1>;
> > > > > > > > > > > > > +                adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > +            };
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +            channel@2 {
> > > > > > > > > > > > > +                reg = <2>;
> > > > > > > > > > > > > +                adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > +            };
> > > > > > > > > > > > > +        };
> > > > > > > > > > > > > +    };    
> > > > > > > > > > > > ...
> > > > > > > > > > > > 
> > > > > > > > > > > > 	spi {
> > > > > > > > > > > > 		#address-cells = <1>;
> > > > > > > > > > > > 		#size-cells = <0>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 		multi-dac@0 {
> > > > > > > > > > > > 			compatible = "adi,ad5529r-16";
> > > > > > > > > > > > 			reg = <0>;
> > > > > > > > > > > > 			spi-max-frequency = <25000000>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 			#address-cells = <1>;
> > > > > > > > > > > > 			#size-cells = <0>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 			dac@0 {
> > > > > > > > > > > > 				reg = <0>;
> > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > 				};
> > > > > > > > > > > > 
> > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > 				};
> > > > > > > > > > > > 
> > > > > > > > > > > > 				channel@2 {
> > > > > > > > > > > > 					reg = <2>;
> > > > > > > > > > > > 					adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > 				};
> > > > > > > > > > > > 			}
> > > > > > > > > > > > 
> > > > > > > > > > > > 			dac@1 {
> > > > > > > > > > > > 				reg = <1>;
> > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > 
> > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > 				};
> > > > > > > > > > > > 
> > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > 				};
> > > > > > > > > > > > 			}
> > > > > > > > > > > > 		};
> > > > > > > > > > > > 	};
> > > > > > > > > > > > 
> > > > > > > > > > > > then you might need something like:
> > > > > > > > > > > > 
> > > > > > > > > > > > 	patternProperties:
> > > > > > > > > > > > 		"^dac@[0-3]$":
> > > > > > > > > > > > 
> > > > > > > > > > > > and put most of the things under this node pattern.
> > > > > > > > > > > > 
> > > > > > > > > > > > So the main driver that you're putting together might need to handle up to four instances.
> > > > > > > > > > > > > Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> > > > > > > > > > > > 
> > > > > > > > > > > > Need to double check if each dac node needs a separate compatible, so you would maybe populate
> > > > > > > > > > > > a platform data to be shared with the child nodes, which would be a separate driver.
> > > > > > > > > > > > (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).    
> > > > > > > > > > > Hi Rodrigo,
> > > > > > > > > > > 
> > > > > > > > > > > Thank you for looking at this.
> > > > > > > > > > > 
> > > > > > > > > > > For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> > > > > > > > > > > hardware/use case we have only needs one device node and the driver is written around that model as well.
> > > > > > > > > > > While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> > > > > > > > > > > that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> > > > > > > > > > > speculatively without a validating use case.    
> > > > > > > > > > Interesting feature - kind of similar to address control on a typical i2c bus device, or
> > > > > > > > > > looking at it another way a kind of distributed SPI mux.
> > > > > > > > > > 
> > > > > > > > > > Challenge of a binding is we need to anticipate the future.  So I think we do need something
> > > > > > > > > > like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> > > > > > > > > > That would leave the path open to supporting the addressing at a later date.
> > > > > > > > > > An alternative might be to look at it like a chained device setup. In those we pretend there
> > > > > > > > > > is just one device with a lot of channels etc.  The snag is that here things are more loosely
> > > > > > > > > > coupled whereas for those devices it tends to be you have to read / write the same register
> > > > > > > > > > in all devices in the chain as one big SPI message.
> > > > > > > > > > 
> > > > > > > > > > > +CC Mark Brown as he may know of some precedent for this feature. For his reference..
> > > > > > > > > > > - Each of these devices has 2 ID pins.  The SPI transfers have to contain the 2 bit
> > > > > > > > > > value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> > > > > > > > > > be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> > > > > > > > > > longer term how to support it cleanly in SPI.    
> > > > > > > > 
> > > > > > > > I'd swear I have seen this before, from some Microchip devices. Let me
> > > > > > > > see if I can find what I am thinking of...    
> > > > > > > 
> > > > > > > 
> > > > > > > microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> > > > > > > slightly different properties.
> > > > > > > 
> > > > > > >   microchip,device-addr:
> > > > > > >     description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > >     enum: [0, 1, 2, 3]
> > > > > > >     default: 0
> > > > > > > 
> > > > > > > and
> > > > > > > 
> > > > > > > 
> > > > > > >   microchip,hw-device-address:
> > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > >     minimum: 0
> > > > > > >     maximum: 3
> > > > > > >     description:
> > > > > > >       The address is set on a per-device basis by fuses in the factory,
> > > > > > >       configured on request. If not requested, the fuses are set for 0x1.
> > > > > > >       The device address is part of the device markings to avoid
> > > > > > >       potential confusion. This address is coded on two bits, so four possible
> > > > > > >       addresses are available when multiple devices are present on the same
> > > > > > >       SPI bus with only one Chip Select line for all devices.
> > > > > > >       Each device communication starts by a CS falling edge, followed by the
> > > > > > >       clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> > > > > > >       which is first one on the wire).
> > > > > > > 
> > > > > > > This sounds exactly like the sort of feature that you're dealing with
> > > > > > > here?
> > > > > > >     
> > > > > > 
> > > > > > The core idea yes but for this chip, things are a bit more annoying (but
> > > > > > Janani can correct me if I'm wrong). Here, each device can, in theory,
> > > > > > > have its own supplies, pins and at the very least, channels with maybe
> > > > > > different scales. That is why Janani is proposing dac nodes. Given I
> > > > > > honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> > > > > > about solving this at the spi level.
> > > > > > 
> > > > > > Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> > > > > > together in the same bus.    
> > > > > 
> > > > > I'm definitely missing something, because that property for the
> > > > > > microchip devices is not impacted by what else is on the bus. AFAICT, you
> > > > > could have an mcp3911 and an mcp3564 on the same bus even though both
> > > > > are completely different devices with different drivers. They have
> > > > > individual device nodes and their own supplies etc etc. These aren't
> > > > > per-channel properties on an adc or dac, they're per child device on a
> > > > > spi bus.    
> > > > 
> > > > Maybe I'm the one missing something :). IIRC, spi would not allow two
> > > > devices on the same CS right? Because for this chip we would need
> > > > something like:
> > > > 
> > > > spi {
> > > > 	dac@0 {
> > > > 		reg = <0>;
> > > > 		adi,pin-id = <0>;
> > > > 	};
> > > > 
> > > > 	dac@1 {
> > > > 		reg = <0>; // which seems already problematic?
> > > > 		adi,pin-id <1>;
> > > > 	};
> > > > 
> > > > 	...
> > > > 
> > > > 	//up to 4
> > > > };  
> > > Yeah. It's not clear to me how that works for the microchip devices
> > > (I suspect it doesn't!)
> > > 
> > > Just thinking as I type, but could we do something a bit nasty with
> > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > shared?  Given this is all tied to the spi bus that should all happen
> > > under serializing locks. 
> > > 
> > > Agreed though that this would be nicer as an SPI thing that let
> > > us specify that a single CS is shared by multiple devices and there
> > > is some other signal acting to select which one we are talking to.
> > >   
> > 
> > If the device-addressing on the same chip-select is to be handled
> > by the spi framework, wouldn't we lose device-specific features?
> > 
> > I understand that this multi-device feature is there mostly to extend the
> > channel count from 16 to 32, 48 or 64. I suppose the command:
> > 
> > 	"MULTI DEVICE SW LDAC MODE"
> > 
> > exists so that software can update channel values across multiple devices.  
> 
> Right! You do have a point! I agree the main driver for a feature like
> this is likely to extend the channel count and effectively "aggregate"
> devices.
> 
> But I would say that even with the spi solution the MULTI DEVICE stuff
> should be doable (as we still need a sort of adi,pin-id property). 
> 
> But yes, I do feel that the whole feature is for aggregation so seeing
> one device with 32 channels is the expectation here? Rather than seeing
> two devices with 16 channels.

Agreed - if we have messages that address both devices at once, that needs
to be a unified driver, and given they are about triggering a simultaneous
update of all channels, it needs to look like one big device.
This ends up similar to how we handle daisy-chained devices.
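
A minimal sketch of what "one big device" could mean for channel numbering,
assuming the aggregated channel index is simply device-major (device 0 owns
channels 0..15, device 1 owns 16..31, and so on). The names CHANS_PER_DEV,
MAX_DEVS, struct chan_loc and agg_chan_to_loc() are hypothetical and are not
taken from the proposed driver or binding.

```c
#include <assert.h>
#include <stdint.h>

#define CHANS_PER_DEV 16u /* 16 channels per AD5529R, per the binding text */
#define MAX_DEVS       4u /* two ID pins, so up to four devices on one CS */

/* Hypothetical location of one channel inside the aggregated device. */
struct chan_loc {
	uint8_t dev;  /* value strapped on the ID pins, 0..3 */
	uint8_t chan; /* channel within that device, 0..15 */
};

/* Map an aggregated channel index (0..63) to (device, local channel). */
static struct chan_loc agg_chan_to_loc(unsigned int agg_chan)
{
	struct chan_loc loc;

	assert(agg_chan < CHANS_PER_DEV * MAX_DEVS);
	loc.dev = (uint8_t)(agg_chan / CHANS_PER_DEV);
	loc.chan = (uint8_t)(agg_chan % CHANS_PER_DEV);
	return loc;
}
```

With this numbering, aggregated channel 17 lands on device 1, local channel 1.
Whether a real binding would number channels this way is an open question in
the thread.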

The question of what to do on devices that don't have this feature
is rather different. Good thing you read the datasheet :)

Jonathan

> 
> - Nuno Sá
> 
> > 
> > -- 
> > Kind regards,
> > 
> > Rodrigo Alencar  



* Re: [PATCH] Docs/admin-guide/cgroup-v2: fix memory.stat doc details
From: Michal Koutný @ 2026-06-22 16:25 UTC (permalink / raw)
  To: Doehyun Baek
  Cc: Tejun Heo, Jonathan Corbet, Johannes Weiner, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Yosry Ahmed, Nhat Pham, cgroups,
	linux-doc, linux-kernel
In-Reply-To: <CAN-j9Upy=thswORWaU+QxuO2i8uJKrZxcLpt5umP5QGRhpwqaQ@mail.gmail.com>


On Mon, Jun 22, 2026 at 05:26:53PM +0200, Doehyun Baek <doehyunbaek@gmail.com> wrote:
> However, both zswapped and zswap_incomp are memory_stats[] entries, so
> memory.stat prints them through memcg_page_state_output(). Since
> MEMCG_ZSWAP_INCOMP is not special-cased as a raw count, the stored page
> count is multiplied by the default PAGE_SIZE unit and exported as bytes.
> 
>     unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item)
>     {
>         return memcg_page_state(memcg, item) *
>         memcg_page_state_output_unit(item);
>     }

Ah, I messed up how memcg_page_state_output_unit() is used. The printed
values are amounts (in bytes).
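
A minimal sketch of the arithmetic behind that, assuming a 4096-byte page
(the page size of the QEMU guest is not stated in this thread).
pages_to_bytes() is a hypothetical userspace helper that mirrors the
multiplication in memcg_page_state_output(); it is not kernel code.

```c
#include <stdint.h>

/* Convert a stored page count to the byte amount printed in memory.stat. */
static uint64_t pages_to_bytes(uint64_t nr_pages, uint64_t page_size)
{
	return nr_pages * page_size;
}
```

Under that assumption, the 87822336 reported for zswap_incomp corresponds to
21441 pages, since 21441 * 4096 = 87822336.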

> Separately, this matches the existing documentation style for zswapped,
> whose exported value is described as a memory amount:
> 
>     zswapped
>         Amount of application memory swapped out to zswap.
> 
> Since zswap_incomp follows the same memory.stat output path, I think its
> documentation should describe the exported value as a memory amount too.
> 
> I also boot-tested this in QEMU with the current tree and zswap enabled.
> With incompressible pages pushed into zswap, memory.stat showed:
> 
>     zswap 87822336
>     zswapped 87822336
>     zswap_incomp 87822336

Thanks for the test and for the fix!

> 
> The zswap_incomp value there is byte-valued; it is not a plain page
> count. It also matches zswapped in this all-incompressible case, which
> is consistent with both being exported as memory amounts.

Acked-by: Michal Koutný <mkoutny@suse.com>



* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-22 16:24 UTC (permalink / raw)
  To: Nuno Sá
  Cc: Conor Dooley, Janani Sunil, Rodrigo Alencar, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <ajkILRPq_g24g4dH@nsa>

On Mon, 22 Jun 2026 11:17:56 +0100
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Mon, Jun 22, 2026 at 10:27:22AM +0100, Jonathan Cameron wrote:
> > On Mon, 22 Jun 2026 10:07:01 +0100
> > Nuno Sá <noname.nuno@gmail.com> wrote:
> >   
> > > On Sun, Jun 21, 2026 at 07:35:42PM +0100, Conor Dooley wrote:  
> > > > On Sun, Jun 21, 2026 at 03:33:40PM +0100, Jonathan Cameron wrote:    
> > > > > On Fri, 19 Jun 2026 16:54:11 +0100
> > > > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > > > >     
> > > > > > On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:    
> > > > > > > On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:      
> > > > > > > > On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:      
> > > > > > > > > On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:      
> > > > > > > > > > On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:      
> > > > > > > > > > > 
> > > > > > > > > > > On 6/14/26 21:44, Jonathan Cameron wrote:      
> > > > > > > > > > > > On Tue, 9 Jun 2026 16:47:23 +0200
> > > > > > > > > > > > Janani Sunil <jan.sun97@gmail.com> wrote:
> > > > > > > > > > > >       
> > > > > > > > > > > > > On 5/26/26 15:11, Rodrigo Alencar wrote:      
> > > > > > > > > > > > > > On 26/05/19 05:42PM, Janani Sunil wrote:      
> > > > > > > > > > > > > > > Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> > > > > > > > > > > > > > > buffered voltage output digital-to-analog converter (DAC) with an
> > > > > > > > > > > > > > > integrated precision reference.      
> > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > Probably others may comment on that, but...
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This parent node may support device addressing for multi-device support through
> > > > > > > > > > > > > > those ID pins. I suppose that each device may have its own power supplies or
> > > > > > > > > > > > > > other resources like the toggle pins or reset and enable.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > That way I suppose that an example would look like...      
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +patternProperties:
> > > > > > > > > > > > > > > +  "^channel@([0-9]|1[0-5])$":
> > > > > > > > > > > > > > > +    type: object
> > > > > > > > > > > > > > > +    description: Child nodes for individual channel configuration
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +    properties:
> > > > > > > > > > > > > > > +      reg:
> > > > > > > > > > > > > > > +        description: Channel number.
> > > > > > > > > > > > > > > +        minimum: 0
> > > > > > > > > > > > > > > +        maximum: 15
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +      adi,output-range-microvolt:
> > > > > > > > > > > > > > > +        description: |
> > > > > > > > > > > > > > > +          Output voltage range for this channel as [min, max] in microvolts.
> > > > > > > > > > > > > > > +          If not specified, defaults to 0V to 5V range.
> > > > > > > > > > > > > > > +        oneOf:
> > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > +              - const: 0
> > > > > > > > > > > > > > > +              - enum: [5000000, 10000000, 20000000, 40000000]
> > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > +              - const: -5000000
> > > > > > > > > > > > > > > +              - const: 5000000
> > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > +              - const: -10000000
> > > > > > > > > > > > > > > +              - const: 10000000
> > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > +              - const: -15000000
> > > > > > > > > > > > > > > +              - const: 15000000
> > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > +              - const: -20000000
> > > > > > > > > > > > > > > +              - const: 20000000
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +    required:
> > > > > > > > > > > > > > > +      - reg
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +    additionalProperties: false
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +required:
> > > > > > > > > > > > > > > +  - compatible
> > > > > > > > > > > > > > > +  - reg
> > > > > > > > > > > > > > > +  - vdd-supply
> > > > > > > > > > > > > > > +  - avdd-supply
> > > > > > > > > > > > > > > +  - hvdd-supply
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +dependencies:
> > > > > > > > > > > > > > > +  spi-cpha: [ spi-cpol ]
> > > > > > > > > > > > > > > +  spi-cpol: [ spi-cpha ]
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +allOf:
> > > > > > > > > > > > > > > +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +unevaluatedProperties: false
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +examples:
> > > > > > > > > > > > > > > +  - |
> > > > > > > > > > > > > > > +    #include <dt-bindings/gpio/gpio.h>
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +    spi {
> > > > > > > > > > > > > > > +        #address-cells = <1>;
> > > > > > > > > > > > > > > +        #size-cells = <0>;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +        dac@0 {
> > > > > > > > > > > > > > > +            compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > > > +            reg = <0>;
> > > > > > > > > > > > > > > +            spi-max-frequency = <25000000>;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +            vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > +            avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > +            hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > +            hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +            #address-cells = <1>;
> > > > > > > > > > > > > > > +            #size-cells = <0>;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +            channel@0 {
> > > > > > > > > > > > > > > +                reg = <0>;
> > > > > > > > > > > > > > > +                adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +            channel@1 {
> > > > > > > > > > > > > > > +                reg = <1>;
> > > > > > > > > > > > > > > +                adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +            channel@2 {
> > > > > > > > > > > > > > > +                reg = <2>;
> > > > > > > > > > > > > > > +                adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > +        };
> > > > > > > > > > > > > > > +    };      
> > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 	spi {
> > > > > > > > > > > > > > 		#address-cells = <1>;
> > > > > > > > > > > > > > 		#size-cells = <0>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 		multi-dac@0 {
> > > > > > > > > > > > > > 			compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > > 			reg = <0>;
> > > > > > > > > > > > > > 			spi-max-frequency = <25000000>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 			#address-cells = <1>;
> > > > > > > > > > > > > > 			#size-cells = <0>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 			dac@0 {
> > > > > > > > > > > > > > 				reg = <0>;
> > > > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				channel@2 {
> > > > > > > > > > > > > > 					reg = <2>;
> > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > 			}
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 			dac@1 {
> > > > > > > > > > > > > > 				reg = <1>;
> > > > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > 			}
> > > > > > > > > > > > > > 		};
> > > > > > > > > > > > > > 	};
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > then you might need something like:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 	patternProperties:
> > > > > > > > > > > > > > 		"^dac@[0-3]$":
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > and put most of the things under this node pattern.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > So the main driver that you're putting together might need to handle up to four instances.
> > > > > > > > > > > > > > > > Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Need to double check if each dac node needs a separate compatible, so you would maybe populate
> > > > > > > > > > > > > > a platform data to be shared with the child nodes, which would be a separate driver.
> > > > > > > > > > > > > > (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).      
> > > > > > > > > > > > > Hi Rodrigo,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thank you for looking at this.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> > > > > > > > > > > > > hardware/use case we have only needs one device node and the driver is written around that model as well.
> > > > > > > > > > > > > While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> > > > > > > > > > > > > that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> > > > > > > > > > > > > speculatively without a validating use case.      
> > > > > > > > > > > > Interesting feature - kind of similar to address control on a typical i2c bus device, or
> > > > > > > > > > > > looking at it another way a kind of distributed SPI mux.
> > > > > > > > > > > > 
> > > > > > > > > > > > Challenge of a binding is we need to anticipate the future.  So I think we do need something
> > > > > > > > > > > > like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> > > > > > > > > > > > That would leave the path open to supporting the addressing at a later date.
> > > > > > > > > > > > An alternative might be to look at it like a chained device setup. In those we pretend there
> > > > > > > > > > > > is just one device with a lot of channels etc.  The snag is that here things are more loosely
> > > > > > > > > > > > coupled whereas for those devices it tends to be you have to read / write the same register
> > > > > > > > > > > > in all devices in the chain as one big SPI message.
> > > > > > > > > > > > 
> > > > > > > > > > > > > +CC Mark Brown as he may know of some precedent for this feature. For his reference..
> > > > > > > > > > > > > - Each of these devices has 2 ID pins.  The SPI transfers have to contain the 2 bit
> > > > > > > > > > > > value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> > > > > > > > > > > > be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> > > > > > > > > > > > longer term how to support it cleanly in SPI.      
> > > > > > > > > > 
> > > > > > > > > > I'd swear I have seen this before, from some Microchip devices. Let me
> > > > > > > > > > see if I can find what I am thinking of...      
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> > > > > > > > > slightly different properties.
> > > > > > > > > 
> > > > > > > > >   microchip,device-addr:
> > > > > > > > >     description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> > > > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > > > >     enum: [0, 1, 2, 3]
> > > > > > > > >     default: 0
> > > > > > > > > 
> > > > > > > > > and
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >   microchip,hw-device-address:
> > > > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > > > >     minimum: 0
> > > > > > > > >     maximum: 3
> > > > > > > > >     description:
> > > > > > > > >       The address is set on a per-device basis by fuses in the factory,
> > > > > > > > >       configured on request. If not requested, the fuses are set for 0x1.
> > > > > > > > >       The device address is part of the device markings to avoid
> > > > > > > > >       potential confusion. This address is coded on two bits, so four possible
> > > > > > > > >       addresses are available when multiple devices are present on the same
> > > > > > > > >       SPI bus with only one Chip Select line for all devices.
> > > > > > > > >       Each device communication starts by a CS falling edge, followed by the
> > > > > > > > >       clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> > > > > > > > >       which is first one on the wire).
> > > > > > > > > 
> > > > > > > > > This sounds exactly like the sort of feature that you're dealing with
> > > > > > > > > here?
> > > > > > > > >       
> > > > > > > > 
> > > > > > > > The core idea yes but for this chip, things are a bit more annoying (but
> > > > > > > > Janani can correct me if I'm wrong). Here, each device can, in theory,
> > > > > > > > > have its own supplies, pins and at the very least, channels with maybe
> > > > > > > > different scales. That is why Janani is proposing dac nodes. Given I
> > > > > > > > honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> > > > > > > > about solving this at the spi level.
> > > > > > > > 
> > > > > > > > Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> > > > > > > > together in the same bus.      
> > > > > > > 
> > > > > > > I'm definitely missing something, because that property for the
> > > > > > > > microchip devices is not impacted by what else is on the bus. AFAICT, you
> > > > > > > could have an mcp3911 and an mcp3564 on the same bus even though both
> > > > > > > are completely different devices with different drivers. They have
> > > > > > > individual device nodes and their own supplies etc etc. These aren't
> > > > > > > per-channel properties on an adc or dac, they're per child device on a
> > > > > > > spi bus.      
> > > > > > 
> > > > > > Maybe I'm the one missing something :). IIRC, spi would not allow two
> > > > > > devices on the same CS right? Because for this chip we would need
> > > > > > something like:
> > > > > > 
> > > > > > spi {
> > > > > > 	dac@0 {
> > > > > > 		reg = <0>;
> > > > > > 		adi,pin-id = <0>;
> > > > > > 	};
> > > > > > 
> > > > > > 	dac@1 {
> > > > > > 		reg = <0>; // which seems already problematic?
> > > > > > 		adi,pin-id = <1>;
> > > > > > 	};
> > > > > > 
> > > > > > 	...
> > > > > > 
> > > > > > 	//up to 4
> > > > > > };    
> > > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > > (I suspect it doesn't!)
> > > > > 
> > > > > Just thinking as I type, but could we do something a bit nasty with
> > > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > > under serializing locks. 
> > > > > 
> > > > > Agreed though that this would be nicer as an SPI thing that let
> > > > > us specify that a single CS is shared by multiple devices and there
> > > > > is some other signal acting to select which one we are talking to.    
> > > > 
> > > > Whether it works or not, I think it is the more correct approach. Messing
> > > > with gpio muxes seems completely wrong, given the chip select may not be
> > > > a gpio at all.
> > > > 
> > > > Why do you think the microchip devices won't work? Does the spi core
> > > > reject multiple devices with the same chip select being registered or
> > > > something like that?    
> > > 
> > > Not sure how things work atm. But I'm fairly sure it used to be like
> > > that. SPI would reject devices on the same controller and CS. Now that
> > > we support more than one CS per controller, not sure how things work.   
> > We always supported more than one CS per controller. I guess you mean
> > per device.  
> 
> Obviously :)
> >   
> > > 
> > > Janani, maybe you can give it a try?  
> > 
> > I think we'd need to get it to work with shared gpio proxy which maybe
> > will just get set up under the hood.  This used to be opt in, but seems
> > that changed fairly recently so maybe some of us are working with out
> > of date knowledge!  I haven't played with it yet, so might not be
> > that simple.
> >   
> 
> What I meant for Janani was basically testing two devices on the same CS
> as in my pseudo DT. For the GPIO, you mean having a way to select
> between devices on the same CS?

Nope. It is what you suggest - the implementation in the gpio layer
is to detect the reuse of the same GPIO and insert a proxy layer that
allows multiple consumers.  I think that will provide different gpio
numbers (well descs really) to each of them but I haven't checked the details
that closely.

> 
> For these devices the pin ID number gets set up as part of the SPI message
> so my assumption is that all of them will receive the message but only one acks it.

Yup. As much as we have an ack on SPI.  So with a write only message you'd never
know if anyone got it.
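
A minimal sketch of that addressing scheme, assuming the layout described in
the quoted microchip binding text (2-bit device address in bits [7:6] of the
first command byte). The AD5529R frame format is not given in this thread, so
the bit positions, the 6-bit command field and the names cmd_with_addr() and
dev_accepts() are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEV_ADDR_SHIFT 6
#define DEV_ADDR_MASK  (0x3u << DEV_ADDR_SHIFT)

/* Host side: put the 2-bit device address into bits [7:6] of the command. */
static uint8_t cmd_with_addr(uint8_t dev_addr, uint8_t cmd)
{
	return (uint8_t)(((dev_addr & 0x3u) << DEV_ADDR_SHIFT) | (cmd & 0x3fu));
}

/*
 * Device side: every chip on the shared CS sees the byte, but only the one
 * whose ID pins match the address field acts on it.
 */
static bool dev_accepts(uint8_t cmd_byte, uint8_t pin_id)
{
	return ((cmd_byte & DEV_ADDR_MASK) >> DEV_ADDR_SHIFT) == (pin_id & 0x3u);
}
```

For a write-only message nothing comes back, so the host cannot tell from the
bus alone whether any device matched.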

Jonathan


> 
> - Nuno Sá
> 
> > Jonathan
> >   
> > > 
> > > - Nuno Sá
> > >   
> >   
> 

