Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* [PATCH v2 4/4] rv/rtapp: Add wakeup monitor
From: Nam Cao @ 2026-06-19  7:21 UTC (permalink / raw)
  To: Gabriele Monaco, Steven Rostedt, linux-trace-kernel, linux-doc,
	linux-kernel
  Cc: Nam Cao
In-Reply-To: <cover.1781852967.git.namcao@linutronix.de>

Add a wakeup monitor to detect a lower-priority task waking up a
higher-priority task.

The rtapp/sleep monitor already detects this. However, that monitor
triggers an error in the context of the wakee task and user only gets
the stacktrace of that task. It is also extremely useful to get the
stacktrace of the waker task, which this monitor offers. In other
words, this monitor complements the rtapp/sleep monitor.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 Documentation/trace/rv/monitor_rtapp.rst      |  20 +++
 kernel/trace/rv/Kconfig                       |   1 +
 kernel/trace/rv/Makefile                      |   1 +
 kernel/trace/rv/monitors/rtapp/Kconfig        |   2 +-
 kernel/trace/rv/monitors/wakeup/Kconfig       |  16 ++
 kernel/trace/rv/monitors/wakeup/wakeup.c      | 153 ++++++++++++++++++
 kernel/trace/rv/monitors/wakeup/wakeup.h      |  92 +++++++++++
 .../trace/rv/monitors/wakeup/wakeup_trace.h   |  14 ++
 kernel/trace/rv/rv_trace.h                    |   1 +
 tools/verification/models/rtapp/wakeup.ltl    |   5 +
 10 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 kernel/trace/rv/monitors/wakeup/Kconfig
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup.c
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup.h
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup_trace.h
 create mode 100644 tools/verification/models/rtapp/wakeup.ltl

diff --git a/Documentation/trace/rv/monitor_rtapp.rst b/Documentation/trace/rv/monitor_rtapp.rst
index 502d3ea412eb..238b59395ff5 100644
--- a/Documentation/trace/rv/monitor_rtapp.rst
+++ b/Documentation/trace/rv/monitor_rtapp.rst
@@ -124,3 +124,23 @@ to handle some special cases:
     real-time-safe because preemption is disabled for the duration.
   - `FUTEX_LOCK_PI` is included in the allowlist for the same reason as
     `BLOCK_ON_RT_MUTEX`.
+
+Monitor wakeup
+++++++++++++++
+
+The `wakeup` monitor reports real-time threads being woken by lower-priority threads,
+which is a hint of priority inversion. Its specification is::
+
+  RULE = always (((RT and USER_THREAD) imply
+                (not (WOKEN_BY_LOWER_PRIO or WOKEN_BY_SOFTIRQ)) or ALLOWLIST))
+
+  ALLOWLIST = BLOCK_ON_RT_MUTEX
+           or FUTEX_LOCK_PI
+
+The `sleep` monitor already reports this type of problem. The difference is the
+context in which the problem is reported. While the `sleep` monitor reports the problem
+in the context of the wakee, this `wakeup` monitor reports the problem in the context of
+the waker. This monitor complement the `sleep` monitor, giving user better
+understanding of the issue. For instance, to debug a lower-priority task waking a
+higher-priority task scenario, user can enable both `wakeup` monitor and `sleep`
+monitor to get the stack traces of both tasks.
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 3884b14df375..4d3a14a0bac2 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -76,6 +76,7 @@ source "kernel/trace/rv/monitors/opid/Kconfig"
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
 source "kernel/trace/rv/monitors/pagefault/Kconfig"
 source "kernel/trace/rv/monitors/sleep/Kconfig"
+source "kernel/trace/rv/monitors/wakeup/Kconfig"
 # Add new rtapp monitors here
 
 source "kernel/trace/rv/monitors/stall/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 94498da35b37..c2c0e4142eb4 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
 obj-$(CONFIG_RV_MON_STALL) += monitors/stall/stall.o
 obj-$(CONFIG_RV_MON_DEADLINE) += monitors/deadline/deadline.o
 obj-$(CONFIG_RV_MON_NOMISS) += monitors/nomiss/nomiss.o
+obj-$(CONFIG_RV_MON_WAKEUP) += monitors/wakeup/wakeup.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/rtapp/Kconfig b/kernel/trace/rv/monitors/rtapp/Kconfig
index 1ce9370a9ba8..1fcd7a400ded 100644
--- a/kernel/trace/rv/monitors/rtapp/Kconfig
+++ b/kernel/trace/rv/monitors/rtapp/Kconfig
@@ -1,6 +1,6 @@
 config RV_MON_RTAPP
 	depends on RV
-	depends on RV_PER_TASK_MONITORS >= 2
+	depends on RV_PER_TASK_MONITORS >= 3
 	bool "rtapp monitor"
 	help
 	  Collection of monitors to check for common problems with real-time
diff --git a/kernel/trace/rv/monitors/wakeup/Kconfig b/kernel/trace/rv/monitors/wakeup/Kconfig
new file mode 100644
index 000000000000..ec3a5c06a8c4
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_WAKEUP
+	depends on RV
+	depends on RV_MON_RTAPP
+	depends on HAVE_SYSCALL_TRACEPOINTS
+	default y
+	select LTL_MON_EVENTS_ID
+	bool "wakeup monitor"
+	help
+	  This monitor detects a lower-priority task waking up a
+	  higher-priority task. The RV_MON_SLEEP monitor already
+	  detects this case, but this monitor detects in the context
+	  of the waker task instead. This and RV_MON_SLEEP can be
+	  enabled together to get the stacktrace of both the waker
+	  task and the wakee task.
diff --git a/kernel/trace/rv/monitors/wakeup/wakeup.c b/kernel/trace/rv/monitors/wakeup/wakeup.c
new file mode 100644
index 000000000000..01b47416f24e
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/wakeup.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+
+#define MODULE_NAME "wakeup"
+
+#include <trace/events/syscalls.h>
+#include <trace/events/sched.h>
+#include <trace/events/lock.h>
+#include <uapi/linux/futex.h>
+
+#include <rv_trace.h>
+#include <monitors/rtapp/rtapp.h>
+
+
+#ifndef __NR_futex
+#define __NR_futex (-__COUNTER__)
+#endif
+#ifndef __NR_futex_time64
+#define __NR_futex_time64 (-__COUNTER__)
+#endif
+
+#include "wakeup.h"
+#include <rv/ltl_monitor.h>
+
+static void ltl_atoms_fetch(struct task_struct *task, struct ltl_monitor *mon)
+{
+	/*
+	 * This includes "actual" real-time tasks and also PI-boosted
+	 * tasks. A task being PI-boosted means it is blocking an "actual"
+	 * real-task, therefore it should also obey the monitor's rule,
+	 * otherwise the "actual" real-task may be delayed.
+	 */
+	ltl_atom_set(mon, LTL_RT, rt_or_dl_task(task));
+}
+
+static void ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation)
+{
+	ltl_atom_set(mon, LTL_WOKEN_BY_LOWER_PRIO, false);
+	ltl_atom_set(mon, LTL_WOKEN_BY_SOFTIRQ, false);
+
+	if (task_creation) {
+		ltl_atom_set(mon, LTL_BLOCK_ON_RT_MUTEX, false);
+		ltl_atom_set(mon, LTL_FUTEX_LOCK_PI, false);
+	}
+
+	ltl_atom_set(mon, LTL_USER_THREAD, !(task->flags & PF_KTHREAD));
+}
+
+static void handle_sched_waking(void *data, struct task_struct *task)
+{
+	if (in_task()) {
+		if (current->prio > task->prio)
+			ltl_atom_pulse(task, LTL_WOKEN_BY_LOWER_PRIO, true);
+	} else if (in_serving_softirq()) {
+		ltl_atom_pulse(task, LTL_WOKEN_BY_SOFTIRQ, true);
+	}
+}
+
+static void handle_contention_begin(void *data, void *lock, unsigned int flags)
+{
+	if (flags & LCB_F_RT)
+		ltl_atom_update(current, LTL_BLOCK_ON_RT_MUTEX, true);
+}
+
+static void handle_contention_end(void *data, void *lock, int ret)
+{
+	ltl_atom_update(current, LTL_BLOCK_ON_RT_MUTEX, false);
+}
+
+static void handle_sys_enter(void *data, struct pt_regs *regs, long id)
+{
+	unsigned long args[6];
+	int op, cmd;
+
+	switch (id) {
+	case __NR_futex:
+	case __NR_futex_time64:
+		syscall_get_arguments(current, regs, args);
+		op = args[1];
+		cmd = op & FUTEX_CMD_MASK;
+
+		switch (cmd) {
+		case FUTEX_LOCK_PI:
+		case FUTEX_LOCK_PI2:
+			ltl_atom_update(current, LTL_FUTEX_LOCK_PI, true);
+			break;
+		}
+		break;
+	}
+}
+
+static void handle_sys_exit(void *data, struct pt_regs *regs, long ret)
+{
+	ltl_atom_update(current, LTL_FUTEX_LOCK_PI, false);
+}
+
+static int enable_wakeup(void)
+{
+	int retval;
+
+	retval = ltl_monitor_init();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("rtapp_wakeup", sched_waking, handle_sched_waking);
+	rv_attach_trace_probe("rtapp_wakeup", contention_begin, handle_contention_begin);
+	rv_attach_trace_probe("rtapp_wakeup", contention_end, handle_contention_end);
+	rv_attach_trace_probe("rtapp_wakeup", sys_enter, handle_sys_enter);
+	rv_attach_trace_probe("rtapp_wakeup", sys_exit, handle_sys_exit);
+
+	return 0;
+}
+
+static void disable_wakeup(void)
+{
+	rv_detach_trace_probe("rtapp_wakeup", sched_waking, handle_sched_waking);
+	rv_detach_trace_probe("rtapp_wakeup", contention_begin, handle_contention_begin);
+	rv_detach_trace_probe("rtapp_wakeup", contention_end, handle_contention_end);
+	rv_detach_trace_probe("rtapp_wakeup", sys_enter, handle_sys_enter);
+	rv_detach_trace_probe("rtapp_wakeup", sys_exit, handle_sys_exit);
+
+	ltl_monitor_destroy();
+}
+
+static struct rv_monitor rv_wakeup = {
+	.name = "wakeup",
+	.description = "Monitor that real-time tasks are not woken by lower-priority tasks",
+	.enable = enable_wakeup,
+	.disable = disable_wakeup,
+};
+
+static int __init register_wakeup(void)
+{
+	return rv_register_monitor(&rv_wakeup, &rv_rtapp);
+}
+
+static void __exit unregister_wakeup(void)
+{
+	rv_unregister_monitor(&rv_wakeup);
+}
+
+module_init(register_wakeup);
+module_exit(unregister_wakeup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nam Cao <namcao@linutronix.de>");
+MODULE_DESCRIPTION("Monitor that real-time tasks are not woken by lower-priority tasks");
diff --git a/kernel/trace/rv/monitors/wakeup/wakeup.h b/kernel/trace/rv/monitors/wakeup/wakeup.h
new file mode 100644
index 000000000000..6f80da64e0e1
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/wakeup.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * C implementation of Buchi automaton, automatically generated by
+ * tools/verification/rvgen from the linear temporal logic specification.
+ * For further information, see kernel documentation:
+ *   Documentation/trace/rv/linear_temporal_logic.rst
+ */
+
+#include <linux/rv.h>
+
+#define MONITOR_NAME wakeup
+
+enum ltl_atom {
+	LTL_BLOCK_ON_RT_MUTEX,
+	LTL_FUTEX_LOCK_PI,
+	LTL_RT,
+	LTL_USER_THREAD,
+	LTL_WOKEN_BY_LOWER_PRIO,
+	LTL_WOKEN_BY_SOFTIRQ,
+	LTL_NUM_ATOM
+};
+static_assert(LTL_NUM_ATOM <= RV_MAX_LTL_ATOM);
+
+static const char *ltl_atom_str(enum ltl_atom atom)
+{
+	static const char *const names[] = {
+		"bl_on_rt_mu",
+		"fu_lo_pi",
+		"rt",
+		"us_th",
+		"wo_lo_pr",
+		"wo_so",
+	};
+
+	return names[atom];
+}
+
+enum ltl_buchi_state {
+	S0,
+	RV_NUM_BA_STATES
+};
+static_assert(RV_NUM_BA_STATES <= RV_MAX_BA_STATES);
+
+static void ltl_start(struct task_struct *task, struct ltl_monitor *mon)
+{
+	bool woken_by_softirq = test_bit(LTL_WOKEN_BY_SOFTIRQ, mon->atoms);
+	bool woken_by_lower_prio = test_bit(LTL_WOKEN_BY_LOWER_PRIO, mon->atoms);
+	bool user_thread = test_bit(LTL_USER_THREAD, mon->atoms);
+	bool rt = test_bit(LTL_RT, mon->atoms);
+	bool futex_lock_pi = test_bit(LTL_FUTEX_LOCK_PI, mon->atoms);
+	bool block_on_rt_mutex = test_bit(LTL_BLOCK_ON_RT_MUTEX, mon->atoms);
+	bool val9 = block_on_rt_mutex || futex_lock_pi;
+	bool val6 = !woken_by_softirq;
+	bool val5 = !woken_by_lower_prio;
+	bool val8 = val5 && val6;
+	bool val10 = val8 || val9;
+	bool val3 = !user_thread;
+	bool val2 = !rt;
+	bool val4 = val2 || val3;
+	bool val11 = val4 || val10;
+
+	if (val11)
+		__set_bit(S0, mon->states);
+}
+
+static void
+ltl_possible_next_states(struct ltl_monitor *mon, unsigned int state, unsigned long *next)
+{
+	bool woken_by_softirq = test_bit(LTL_WOKEN_BY_SOFTIRQ, mon->atoms);
+	bool woken_by_lower_prio = test_bit(LTL_WOKEN_BY_LOWER_PRIO, mon->atoms);
+	bool user_thread = test_bit(LTL_USER_THREAD, mon->atoms);
+	bool rt = test_bit(LTL_RT, mon->atoms);
+	bool futex_lock_pi = test_bit(LTL_FUTEX_LOCK_PI, mon->atoms);
+	bool block_on_rt_mutex = test_bit(LTL_BLOCK_ON_RT_MUTEX, mon->atoms);
+	bool val9 = block_on_rt_mutex || futex_lock_pi;
+	bool val6 = !woken_by_softirq;
+	bool val5 = !woken_by_lower_prio;
+	bool val8 = val5 && val6;
+	bool val10 = val8 || val9;
+	bool val3 = !user_thread;
+	bool val2 = !rt;
+	bool val4 = val2 || val3;
+	bool val11 = val4 || val10;
+
+	switch (state) {
+	case S0:
+		if (val11)
+			__set_bit(S0, next);
+		break;
+	}
+}
diff --git a/kernel/trace/rv/monitors/wakeup/wakeup_trace.h b/kernel/trace/rv/monitors/wakeup/wakeup_trace.h
new file mode 100644
index 000000000000..7e056183f920
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/wakeup_trace.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_WAKEUP
+DEFINE_EVENT(event_ltl_monitor_id, event_wakeup,
+	     TP_PROTO(struct task_struct *task, char *states, char *atoms, char *next),
+	     TP_ARGS(task, states, atoms, next));
+DEFINE_EVENT(error_ltl_monitor_id, error_wakeup,
+	     TP_PROTO(struct task_struct *task),
+	     TP_ARGS(task));
+#endif /* CONFIG_RV_MON_WAKEUP */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 9622c269789c..2f8a932432c9 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -241,6 +241,7 @@ DECLARE_EVENT_CLASS(error_ltl_monitor_id,
 );
 #include <monitors/pagefault/pagefault_trace.h>
 #include <monitors/sleep/sleep_trace.h>
+#include <monitors/wakeup/wakeup_trace.h>
 // Add new monitors based on CONFIG_LTL_MON_EVENTS_ID here
 #endif /* CONFIG_LTL_MON_EVENTS_ID */
 
diff --git a/tools/verification/models/rtapp/wakeup.ltl b/tools/verification/models/rtapp/wakeup.ltl
new file mode 100644
index 000000000000..a5d63ca0811a
--- /dev/null
+++ b/tools/verification/models/rtapp/wakeup.ltl
@@ -0,0 +1,5 @@
+RULE = always (((RT and USER_THREAD) imply
+		(not (WOKEN_BY_LOWER_PRIO or WOKEN_BY_SOFTIRQ)) or ALLOWLIST))
+
+ALLOWLIST = BLOCK_ON_RT_MUTEX
+         or FUTEX_LOCK_PI
-- 
2.47.3


^ permalink raw reply related

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Rodrigo Alencar @ 2026-06-19  7:43 UTC (permalink / raw)
  To: Andy Shevchenko, Rodrigo Alencar
  Cc: Nuno Sá, rodrigo.alencar, linux-iio, devicetree,
	linux-kernel, linux-doc, linux-hardening, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, David Lechner,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
	Gustavo A. R. Silva
In-Reply-To: <ajQ1bZSNHQ96pyJx@ashevche-desk.local>

On 18/06/26 21:14, Andy Shevchenko wrote:
> On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> > On 18/06/26 16:06, Nuno Sá wrote:
> > > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> 
> ...
> 
> > > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > > +	if (!dev_attr->attr.name)
> > > >  		return -ENOMEM;
> > > 
> > > I don't oppose the change. Looks like a nice cleanup.
> 
> May I oppose it? I found use scnprintf() is harder to follow in comparison to
> nice kasprintf() that takes care for the dynamically allocated buffer.

In the next patch the function is reused in a sysfs attribute read handler,
a context wich would not be nice to have dynamic allocation. vscnprintf() is
the main building block of sysfs_emit() which limits the buffer length to
a page size, so I used scnprintf() trying not to deviate much from that. 

kasprintf() it is still used in the caller, where the logic was a bit confusing
as it tried to avoid multiple allocations.
 
> Also there is a chance to get a name silently cut due to insufficient space.
> Besides that this function can't be used (again due to 'c') in kasprintf()-like
> wrapper. I do not consider this as a good approach. Have you looked at seq_buf
> instead?

NAME_MAX is not the maximum length a filename can have? I suppose there should be
enough space for the channel-prefix. Indeed, seq_buf can be used and it cleans up
things a bit as it tracks the the position in the buffer.

> 
> > > But bear in mind this very sensible as any subtle mistake means ABI breakage.
> 
> Which immediately raises a question of test coverage. Do we have one? If not,
> this code must be accompanied with one.

Agreed. Will see to have tests for v7.

> > Yes! I tried to be careful... this is dangerous stuff!

-- 
Kind regards,

Rodrigo Alencar

^ permalink raw reply

* Re: [PATCH v4 3/5] rpmsg: virtio_rpmsg_bus: get buffer size from config space
From: Arnaud POULIQUEN @ 2026-06-19  7:45 UTC (permalink / raw)
  To: tanmay.shah, andersson, mathieu.poirier, corbet, skhan
  Cc: linux-remoteproc, linux-doc, linux-kernel
In-Reply-To: <1251e3e1-80fe-4146-a1f8-5eb251a323be@amd.com>



On 6/18/26 18:31, Shah, Tanmay wrote:
> 
> 
> On 6/18/2026 3:32 AM, Arnaud POULIQUEN wrote:
>>
>>
>> On 6/17/26 19:41, Shah, Tanmay wrote:
>>>
>>>
>>> On 6/17/2026 4:15 AM, Arnaud POULIQUEN wrote:
>>>> Hi Tanmay,
>>>>
>>>> On 6/15/26 22:20, Tanmay Shah wrote:
>>>>> 512 bytes isn't always suitable for all case, let firmware
>>>>> maker decide the best value from resource table.
>>>>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit.
>>>>>
>>>>> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
>>>>> ---
>>>>>
>>>>> Changes in v4: squash to virtio rpmsg config patch
>>>>>      - Introduce new patch to modify rpmsg.rst documentation
>>>>>      - check version is always 1.
>>>>>      - check size field is same as size of struct virtio_rpmsg_config
>>>>>      - introduce alignment field
>>>>>      - check alignment field is power of 2
>>>>>      - check tx and rx buf size is aligned with alignment passed in the
>>>>>        structure
>>>>>
>>>>> Changes in v3:
>>>>>      - change version field from u16 to u8
>>>>>      - introduce size field in the rpmsg_virtio_config structure
>>>>>      - check version field is set to any non-zero value.
>>>>>      - check size field is not 0.
>>>>>      - Remove field for private config, as not needed for now.
>>>>>      - add documentation of rpmsg_virtio_config structure
>>>>>
>>>>>     drivers/rpmsg/virtio_rpmsg_bus.c   | 129 +++++++++++++++++++++++
>>>>> +-----
>>>>>     include/linux/rpmsg/virtio_rpmsg.h |  50 +++++++++++
>>>>>     2 files changed, 160 insertions(+), 19 deletions(-)
>>>>>     create mode 100644 include/linux/rpmsg/virtio_rpmsg.h
>>>>>
>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/
>>>>> virtio_rpmsg_bus.c
>>>>> index 99df1ae07055..a59925f870a4 100644
>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>> @@ -15,11 +15,13 @@
>>>>>     #include <linux/idr.h>
>>>>>     #include <linux/jiffies.h>
>>>>>     #include <linux/kernel.h>
>>>>> +#include <linux/log2.h>
>>>>>     #include <linux/module.h>
>>>>>     #include <linux/mutex.h>
>>>>>     #include <linux/rpmsg.h>
>>>>>     #include <linux/rpmsg/byteorder.h>
>>>>>     #include <linux/rpmsg/ns.h>
>>>>> +#include <linux/rpmsg/virtio_rpmsg.h>
>>>>>     #include <linux/scatterlist.h>
>>>>>     #include <linux/slab.h>
>>>>>     #include <linux/sched.h>
>>>>> @@ -39,7 +41,8 @@
>>>>>      * @tx_bufs:    kernel address of tx buffers
>>>>>      * @num_rx_buf: total number of rx buffers
>>>>>      * @num_tx_buf: total number of tx buffers
>>>>> - * @buf_size:   size of one rx or tx buffer
>>>>> + * @rx_buf_size: size of one rx buffer
>>>>> + * @tx_buf_size: size of one tx buffer
>>>>>      * @last_tx_buf: index of last tx buffer used
>>>>>      * @bufs_dma:    dma base addr of the buffers
>>>>>      * @tx_lock:    protects svq and tx_bufs, to allow concurrent
>>>>> senders.
>>>>> @@ -59,7 +62,8 @@ struct virtproc_info {
>>>>>         void *rx_bufs, *tx_bufs;
>>>>>         unsigned int num_rx_buf;
>>>>>         unsigned int num_tx_buf;
>>>>> -    unsigned int buf_size;
>>>>> +    unsigned int rx_buf_size;
>>>>> +    unsigned int tx_buf_size;
>>>>>         int last_tx_buf;
>>>>>         dma_addr_t bufs_dma;
>>>>>         struct mutex tx_lock;
>>>>> @@ -68,9 +72,6 @@ struct virtproc_info {
>>>>>         wait_queue_head_t sendq;
>>>>>     };
>>>>>     -/* The feature bitmap for virtio rpmsg */
>>>>> -#define VIRTIO_RPMSG_F_NS    0 /* RP supports name service
>>>>> notifications */
>>>>> -
>>>>>     /**
>>>>>      * struct rpmsg_hdr - common header for all rpmsg messages
>>>>>      * @src: source address
>>>>> @@ -128,7 +129,7 @@ struct virtio_rpmsg_channel {
>>>>>      * processor.
>>>>>      */
>>>>>     #define MAX_RPMSG_NUM_BUFS    (256)
>>>>> -#define MAX_RPMSG_BUF_SIZE    (512)
>>>>> +#define DEFAULT_RPMSG_BUF_SIZE    (512)
>>>>>       /*
>>>>>      * Local addresses are dynamically allocated on-demand.
>>>>> @@ -444,7 +445,7 @@ static void *get_a_tx_buf(struct virtproc_info
>>>>> *vrp)
>>>>>           /* either pick the next unused tx buffer */
>>>>>         if (vrp->last_tx_buf < vrp->num_tx_buf)
>>>>> -        ret = vrp->tx_bufs + vrp->buf_size * vrp->last_tx_buf++;
>>>>> +        ret = vrp->tx_bufs + vrp->tx_buf_size * vrp->last_tx_buf++;
>>>>>         /* or recycle a used one */
>>>>>         else
>>>>>             ret = virtqueue_get_buf(vrp->svq, &len);
>>>>> @@ -514,7 +515,7 @@ static int rpmsg_send_offchannel_raw(struct
>>>>> rpmsg_device *rpdev,
>>>>>          * messaging), or to improve the buffer allocator, to support
>>>>>          * variable-length buffer sizes.
>>>>>          */
>>>>> -    if (len > vrp->buf_size - sizeof(struct rpmsg_hdr)) {
>>>>> +    if (len > vrp->tx_buf_size - sizeof(struct rpmsg_hdr)) {
>>>>>             dev_err(dev, "message is too big (%d)\n", len);
>>>>>             return -EMSGSIZE;
>>>>>         }
>>>>> @@ -647,7 +648,7 @@ static ssize_t virtio_rpmsg_get_mtu(struct
>>>>> rpmsg_endpoint *ept)
>>>>>         struct rpmsg_device *rpdev = ept->rpdev;
>>>>>         struct virtio_rpmsg_channel *vch =
>>>>> to_virtio_rpmsg_channel(rpdev);
>>>>>     -    return vch->vrp->buf_size - sizeof(struct rpmsg_hdr);
>>>>> +    return vch->vrp->tx_buf_size - sizeof(struct rpmsg_hdr);
>>>>>     }
>>>>>       static int rpmsg_recv_single(struct virtproc_info *vrp, struct
>>>>> device *dev,
>>>>> @@ -673,7 +674,7 @@ static int rpmsg_recv_single(struct virtproc_info
>>>>> *vrp, struct device *dev,
>>>>>          * We currently use fixed-sized buffers, so trivially sanitize
>>>>>          * the reported payload length.
>>>>>          */
>>>>> -    if (len > vrp->buf_size ||
>>>>> +    if (len > vrp->rx_buf_size ||
>>>>>             msg_len > (len - sizeof(struct rpmsg_hdr))) {
>>>>>             dev_warn(dev, "inbound msg too big: (%d, %d)\n", len,
>>>>> msg_len);
>>>>>             return -EINVAL;
>>>>> @@ -706,7 +707,7 @@ static int rpmsg_recv_single(struct virtproc_info
>>>>> *vrp, struct device *dev,
>>>>>             dev_warn_ratelimited(dev, "msg received with no
>>>>> recipient\n");
>>>>>           /* publish the real size of the buffer */
>>>>> -    rpmsg_sg_init(&sg, msg, vrp->buf_size);
>>>>> +    rpmsg_sg_init(&sg, msg, vrp->rx_buf_size);
>>>>>           /* add the buffer back to the remote processor's virtqueue */
>>>>>         err = virtqueue_add_inbuf(vrp->rvq, &sg, 1, msg, GFP_KERNEL);
>>>>> @@ -820,10 +821,13 @@ static int rpmsg_probe(struct virtio_device
>>>>> *vdev)
>>>>>         struct virtproc_info *vrp;
>>>>>         struct virtio_rpmsg_channel *vch = NULL;
>>>>>         struct rpmsg_device *rpdev_ns, *rpdev_ctrl;
>>>>> +    u16 rpmsg_buf_align = 0;
>>>>>         void *bufs_va;
>>>>>         int err = 0, i;
>>>>>         size_t total_buf_space;
>>>>>         bool notify;
>>>>> +    u8 version;
>>>>> +    u16 size;
>>>>>           vrp = kzalloc_obj(*vrp);
>>>>>         if (!vrp)
>>>>> @@ -855,9 +859,90 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>         else
>>>>>             vrp->num_tx_buf = MAX_RPMSG_NUM_BUFS;
>>>>>     -    vrp->buf_size = MAX_RPMSG_BUF_SIZE;
>>>>> +    /*
>>>>> +     * If VIRTIO_RPMSG_F_BUFSZ feature is supported, then configure
>>>>> buf
>>>>> +     * size from virtio device config space from the resource table.
>>>>> +     * If the feature is not supported, then assign default buf size.
>>>>> +     */
>>>>> +    if (virtio_has_feature(vdev, VIRTIO_RPMSG_F_BUFSZ)) {
>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>> +                 version, &version);
>>>>> +
>>>>> +        /* for now we support only v1 */
>>>>> +        if (version != RPMSG_VDEV_CONFIG_V1) {
>>>>> +            dev_err(&vdev->dev,
>>>>> +                "unsupported vdev config version %u\n", version);
>>>>> +            err = -EINVAL;
>>>>> +            goto vqs_del;
>>>>> +        }
>>>>> +
>>>>> +        /* size of the config space must match */
>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>> +                 size, &size);
>>>>> +        if (size != sizeof(struct virtio_rpmsg_config)) {
>>>>> +            dev_err(&vdev->dev, "invalid size of vdev config %u\n",
>>>>> +                size);
>>>>> +            err = -EINVAL;
>>>>> +            goto vqs_del;
>>>>> +        }
>>>>>     -    total_buf_space = (vrp->num_rx_buf + vrp->num_tx_buf) * vrp-
>>>>>> buf_size;
>>>>> +        /*
>>>>> +         * Optional alignment applied to each buffer size and to
>>>>> the TX
>>>>> +         * buffer base address (e.g. to align buffers on a cache
>>>>> line).
>>>>> +         * It must be a power of two; zero means no extra alignment.
>>>>> +         */
>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>> +                 rpmsg_buf_align, &rpmsg_buf_align);
>>>>> +        if (rpmsg_buf_align && !is_power_of_2(rpmsg_buf_align)) {
>>>>> +            dev_err(&vdev->dev,
>>>>> +                "bad vdev config: rpmsg_buf_align %u is not a power
>>>>> of two\n",
>>>>> +                rpmsg_buf_align);
>>>>> +            err = -EINVAL;
>>>>> +            goto vqs_del;
>>>>> +        }
>>>>> +
>>>>> +        /* note: tx and rx are defined from remote view */
>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>> +                 txbuf_size, &vrp->rx_buf_size);
>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>> +                 rxbuf_size, &vrp->tx_buf_size);
>>>>> +
>>>>> +        /* The buffers must hold at least the rpmsg header */
>>>>> +        if (vrp->rx_buf_size < sizeof(struct rpmsg_hdr) ||
>>>>> +            vrp->tx_buf_size < sizeof(struct rpmsg_hdr)) {
>>>>> +            dev_err(&vdev->dev,
>>>>> +                "bad vdev config: rx buf sz = %u, tx buf sz = %u\n",
>>>>> +                vrp->rx_buf_size, vrp->tx_buf_size);
>>>>> +            err = -EINVAL;
>>>>> +            goto vqs_del;
>>>>> +        }
>>>>> +
>>>>> +        /*
>>>>> +         * The buffer size must be aligned to the provided
>>>>> alignment for
>>>>> +         * so that the start address of tx bufs can be aligned.
>>>>> +         */
>>>>
>>>> 'tx' to remove as  it also concerns Rx buffers
>>>>
>>>
>>> Ack.
>>>
>>>>
>>>> What about removing this check to manage alignment during buffer
>>>> allocation?
>>>>
>>>> For example, if the alignment is on a 64-bit address and the tx_buffer
>>>> and rx_buffer sizes are 40 bytes, 48 bytes can be allocated in memory
>>>> for each buffer, and the virtio descriptor can be filled with aligned
>>>> addresses.
>>>>
>>>> In other words, the rpmsg_buf_align field contains the alignment
>>>> constraint from the remote processor. If the Linux kernel wants to
>>>> impose another alignment constraint, it must test or update
>>>> rpmsg_buf_align, but it must not impose alignment on the buffer size.
>>>>
>>>>
>>>
>>> This part I don't understand. `rpmsg_buf_align` is alignment for only
>>> single buffer size. The linux kernel is checking that single rx buf size
>>> and tx buf size is aligned with `rpmsg_buf_align` as firmware has
>>> claimed.
>>>
>>> For reference the openamp-system-reference PR:
>>> https://github.com/OpenAMP/openamp-system-reference/pull/106/changes
>>>
>>>      .vdev_config = {
>>>          .version = 1,
>>>          .reserved = 0,
>>>          .size = (uint16_t)(sizeof(struct rpmsg_virtio_config) -
>>> sizeof(bool)),
>>>          .alignment = RPMSG_BUF_ALIGN,
>>>          .reserved1 = 0,
>>>          /* Tx for host */
>>>          .h2r_buf_size = metal_align_up(4096, RPMSG_BUF_ALIGN),
>>>          /* Rx for host */
>>>          .r2h_buf_size = metal_align_up(4096, RPMSG_BUF_ALIGN),
>>>      },
>>>
>>> IIUC, The linux kernel is not really supposed to modify
>>> `rpmsg_buf_align`. It only uses it to check that firmware has assigned
>>> correct size of single rx and tx buffer.
>>>
>>>
>>> When the linux kernel uses dma_alloc_coherent() API it aligns total
>>> buffer size with page size. That is different than single tx buf size
>>> and single rx buf size. The total buf size alignment to page size is
>>> irrelevant to `rpmsg_buf_align` field.
>>>
>>> Please let me know if I am missing something or didn't understand your
>>> comment. I prefer that `rpmsg_buf_align` should be only modified by the
>>> firmware and not the linux kernel.
>>
>>
>> Sorry it was unclear, let try to reexplain my suggestion:
>>
>> Two alignment constraints can apply:
>> - The remote processor can require an alignment through
>>    vdev_config::alignment.
>> - The main processor, which runs Linux or another operating system (OS),
>>    can require a different alignment, for example, for cache alignment.
>> In current Linux implementation no constraint in Linux.
>> nevertheless  I would be in favor of taking into account such future
>> constraint without imposing constraint on the buffer sizes.
> 
> Is this ever going to be ture? Is it ever possible that Linux and remote
> has different cache alignment? IIUC, both will be using same cache and
> so same alignment will be applicable. That is why only signle alignment
> is required.

Some remote processors, for example, some Arm Cortex-M33, do not 
integrate cache. Even if cache exists, cache can be enabled on one 
processor, but not on the other.

> 
>> Based on that in short term the local 'rpmsg_buf_align' would still
>> computed
>> only from vdev_config::alignment (not update of vdev_config::alignment).
>>
>> virtio_cread(vdev, struct virtio_rpmsg_config,
>>                   rpmsg_buf_align, &rpmsg_buf_align);
>>
>> Then you could use use ALIGN() helper:
>>
>> unsigned int rx_buf_align_size = ALIGN(vrp->rx_buf_size,
>>                         rpmsg_buf_align);
>> unsigned int tx_buf_align_size = ALIGN(vrp->tx_buf_size,
>>                         rpmsg_buf_align);
>>
> 
> This is where I have different opinion. Instead of Linux using ALIGN()
> macro, can we expect that firmware must assign the aligned buffer size
> with vdev_config::rpmsg_buf_align? And so Linux will fail if the buffer
> size is not aligned already from the firmware side. That is why I had
> introduced checks instead of doing alignment by linux.
> 
>> total_buf_space = (vrp->num_rx_buf * rx_buf_align_size) +
>>            (vrp->num_tx_buf * tx_buf_align_size);
>>
>> vrp->tx_bufs = bufs_va + vrp->num_rx_buf * rx_buf_align_size;
>>
>> Apply the same rule to cpu_addr in the vring descriptor:
>>
>> void *cpu_addr = vrp->rx_bufs + i * rx_buf_align_size;
>>
>> rpmsg_sg_init(&sg, cpu_addr, vrp->rx_buf_size);
>>
>> With this approach, the buffer addresses remain aligned
>> independently of vdev_config::Rxbuf_size and vdev_config::txbuf_size.
>> Don't hesitate if it is still not clear!
> 
> How they remain aligned independent of tx/rx_buf_size? tx_bufs address
> is still calculated based on rx_buf_align_size, so its alignment still
> depends on rx_buf_align_size which is derived using
> vdev_config::rpmsg_buf_align.>
> I think we are trying to achive the same thing, but implementation is
> differnt. We just need to decide where the alignment should be done?
> 
> Either on the linux side? Or in the firmware resource table?
> 
> I prefer that the firmware should already provide aligned buffer size,
> and Linux should only check it. If alignment is not done, then simply
> fail with error. That way, firmware also knows the correct size of the
> buffer. If Linux does the alignment, then the firmware is not aware of
> the correct size that is used by the linux.
> 
> I am open to move the alignment operation to the linux side with the
> reasonable justification.

That remains a suggestion. My main concern with the implementation is
that RPMsg size should depend only on the max playlod size needed, not
also on the memory alignment.

If this constraint is kept, it must be imposed on all other non-Linux
solutions. Otherwise, the remote implementation depends on the main
processor implementation.

 From my POV, It would be preferable not to impose such constraint when
possible.

Thanks,
Arnaud

> 
> Thank You,
> Tanmay
> 
>>>
>>>
>>>>> +        if (rpmsg_buf_align &&
>>>>> +            (!IS_ALIGNED(vrp->rx_buf_size, rpmsg_buf_align) ||
>>>>> +             !IS_ALIGNED(vrp->tx_buf_size, rpmsg_buf_align))) {
>>>>> +            dev_err(&vdev->dev,
>>>>> +                "bad vdev config: buf sizes (rx %u, tx %u) not
>>>>> aligned to %u\n",
>>>>> +                vrp->rx_buf_size, vrp->tx_buf_size,
>>>>> +                rpmsg_buf_align);
>>>>> +            err = -EINVAL;
>>>>> +            goto vqs_del;
>>>>> +        }
>>>>> +
>>>>> +        dev_dbg(&vdev->dev,
>>>>> +            "vdev config: ver=%u, align=0x%x, rx sz = 0x%x, tx sz =
>>>>> 0x%x\n",
>>>>> +            version, rpmsg_buf_align, vrp->rx_buf_size,
>>>>> +            vrp->tx_buf_size);
>>>>> +    } else {
>>>>> +        vrp->rx_buf_size = DEFAULT_RPMSG_BUF_SIZE;
>>>>> +        vrp->tx_buf_size = DEFAULT_RPMSG_BUF_SIZE;
>>>>> +    }
>>>>> +
>>>>> +    total_buf_space = (vrp->num_rx_buf * vrp->rx_buf_size) +
>>>>> +              (vrp->num_tx_buf * vrp->tx_buf_size);
>>>>>           /* allocate coherent memory for the buffers */
>>>>>         bufs_va = dma_alloc_coherent(vdev->dev.parent,
>>>>> @@ -874,15 +959,20 @@ static int rpmsg_probe(struct virtio_device
>>>>> *vdev)
>>>>>         /* first part of the buffers is dedicated for RX */
>>>>>         vrp->rx_bufs = bufs_va;
>>>>>     -    /* and second part is dedicated for TX */
>>>>> -    vrp->tx_bufs = bufs_va + vrp->num_rx_buf * vrp->buf_size;
>>>>> +    /*
>>>>> +     * Here buf_va is aligned to a page. Also rx buf size is aligned
>>>>> with
>>>>> +     * cache line alignment provided by the firmware, so tx buf's
>>>>> start
>>>>> +     * address is guranteed to be aligned with the alignment
>>>>> provided by
>>>>> +     * the firmware.
>>>>> +     */
>>>>> +    vrp->tx_bufs = bufs_va + (vrp->num_rx_buf * vrp->rx_buf_size);
>>>>>           /* set up the receive buffers */
>>>>>         for (i = 0; i < vrp->num_rx_buf; i++) {
>>>>>             struct scatterlist sg;
>>>>> -        void *cpu_addr = vrp->rx_bufs + i * vrp->buf_size;
>>>>> +        void *cpu_addr = vrp->rx_bufs + i * vrp->rx_buf_size;
>>>>>     -        rpmsg_sg_init(&sg, cpu_addr, vrp->buf_size);
>>>>> +        rpmsg_sg_init(&sg, cpu_addr, vrp->rx_buf_size);
>>>>>               err = virtqueue_add_inbuf(vrp->rvq, &sg, 1, cpu_addr,
>>>>>                           GFP_KERNEL);
>>>>> @@ -965,8 +1055,8 @@ static int rpmsg_remove_device(struct device
>>>>> *dev, void *data)
>>>>>     static void rpmsg_remove(struct virtio_device *vdev)
>>>>>     {
>>>>>         struct virtproc_info *vrp = vdev->priv;
>>>>> -    unsigned int num_bufs = vrp->num_rx_buf + vrp->num_tx_buf;
>>>>> -    size_t total_buf_space = num_bufs * vrp->buf_size;
>>>>> +    size_t total_buf_space = (vrp->num_rx_buf * vrp->rx_buf_size) +
>>>>> +                 (vrp->num_tx_buf * vrp->tx_buf_size);
>>>>>         int ret;
>>>>>           virtio_reset_device(vdev);
>>>>> @@ -992,6 +1082,7 @@ static struct virtio_device_id id_table[] = {
>>>>>       static unsigned int features[] = {
>>>>>         VIRTIO_RPMSG_F_NS,
>>>>> +    VIRTIO_RPMSG_F_BUFSZ,
>>>>>     };
>>>>>       static struct virtio_driver virtio_ipc_driver = {
>>>>> diff --git a/include/linux/rpmsg/virtio_rpmsg.h b/include/linux/rpmsg/
>>>>> virtio_rpmsg.h
>>>>> new file mode 100644
>>>>> index 000000000000..7e14da68fd17
>>>>> --- /dev/null
>>>>> +++ b/include/linux/rpmsg/virtio_rpmsg.h
>>>>> @@ -0,0 +1,50 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>>> +/*
>>>>> + * Copyright (C) Pinecone Inc. 2019
>>>>> + * Copyright (C) Xiang Xiao <xiaoxiang@pinecone.net>
>>>>> + * Copyright (C) Advanced Micro Devices, Inc. 2026
>>>>> + */
>>>>> +
>>>>> +#ifndef _LINUX_VIRTIO_RPMSG_H
>>>>> +#define _LINUX_VIRTIO_RPMSG_H
>>>>> +
>>>>> +#include <linux/types.h>
>>>>> +#include <linux/virtio_types.h>
>>>>> +
>>>>> +/* The feature bitmap for virtio rpmsg */
>>>>> +#define VIRTIO_RPMSG_F_NS    0 /* RP supports name service
>>>>> notifications */
>>>>> +#define VIRTIO_RPMSG_F_BUFSZ    1 /* RP get buffer size from config
>>>>> space */
>>>>> +
>>>>> +/* Version of struct virtio_rpmsg_config understood by this driver */
>>>>> +#define RPMSG_VDEV_CONFIG_V1    1
>>>>> +
>>>>> +/**
>>>>> + * struct virtio_rpmsg_config - config space for rpmsg virtio device
>>>>> + *
>>>>> + * @version:    version of this structure, currently
>>>>> %RPMSG_VDEV_CONFIG_V1.
>>>>> + * @reserved:    reserved for padding, must be zero.
>>>>> + * @size:    size of this structure in bytes.
>>>>> + * @rpmsg_buf_align:    required alignment in bytes for each buffer.
>>>>> Must be a
>>>>> + *        power of two so that both the buffer sizes and the TX buffer
>>>>> + *        base address can be aligned (e.g. to a cache line).
>>>>> + * @reserved1:    reserved for padding, must be zero. Keeps the
>>>>> following 32-bit
>>>>> + *        fields naturally aligned.
>>>>> + * @txbuf_size:    Tx buf size from remote's view. For Linux this is
>>>>> rx buf size.
>>>>> + * @rxbuf_size:    Rx buf size from remote's view. For Linux this is
>>>>> tx buf size.
>>>>> + *
>>>>> + * This is the configuration structure shared by the device and the
>>>>> driver,
>>>>> + * read when %VIRTIO_RPMSG_F_BUFSZ is negotiated. The fields are laid
>>>>> out so
>>>>> + * the structure is naturally 32-bit aligned.
>>>>> + */
>>>>> +struct virtio_rpmsg_config {
>>>>> +    u8 version;
>>>>> +    u8 reserved;
>>>>
>>>> Why about defining the version type to u16 to avoid the reserved field?
>>>>
>>>>> +    __virtio16 size;
>>>>> +    __virtio16 rpmsg_buf_align;
>>>>> +    __virtio16 reserved1;
>>>>
>>>> Seems useless if __packed prevents the compiler from inserting extra
>>>> padding
>>>> bytes between fields,
>>>>
>>>>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>>>>> +    __virtio32 txbuf_size;
>>>>> +    __virtio32 rxbuf_size;
>>>>> +} __packed;
>>>>
>>>> proposal
>>>>
>>>> +struct virtio_rpmsg_config {
>>>> +    __virtio16 version;
>>>> +    __virtio16 size;
>>>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>>>> +    __virtio32 txbuf_size;
>>>> +    __virtio32 rxbuf_size;
>>>> +    __virtio16 rpmsg_buf_align;
>>>> +} __packed;
>>>> +
>>>>
>>>
>>> I am okay with the above proposal with minor difference:
>>>
>>> My proposal:
>>>
>>> +struct virtio_rpmsg_config {
>>> +    u8 version;
>>> +    __virtio16 size;
>>> +    __virtio16 rpmsg_buf_align;
>>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>>> +    __virtio32 txbuf_size;
>>> +    __virtio32 rxbuf_size;
>>> +} __packed;
>>>
>>> I just want to keep version field 8-bit, as we will probably never use
>>> upper byte of that field if we use 16-bit. Rest is okay. If the
>>> strucutre is packed then reserved bytes are not needed.
>>>
>>> Please let me know your view.
>>
>> No strong opinion on that. In the end, this structure is read only one
>> time.
>> If it is acceptable to Mathieu, it is acceptable to me.
>>
>> Thanks,
>> Arnaud
>>
>>>
>>> Thanks,
>>> Tanmay
>>>
>>>
>>>> Regards,
>>>> Arnaud
>>>>
>>>>> +
>>>>> +#endif /* _LINUX_VIRTIO_RPMSG_H */
>>>>
>>>
>>
> 


^ permalink raw reply

* Re: [PATCH v8 04/46] KVM: Decouple kvm_has_arch_private_mem from CONFIG_KVM_VM_MEMORY_ATTRIBUTES
From: Fuad Tabba @ 2026-06-19  8:10 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-4-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> When memory attributes become trackable in guest_memfd, the concept of
> having private memory is no longer dependent on
> CONFIG_KVM_VM_MEMORY_ATTRIBUTES.
>
> With this, on x86, kvm_arch_has_private_mem() is defined if some CoCo
> platform support (or the testing CONFIG_KVM_SW_PROTECTED_VM) is compiled
> in.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  arch/x86/include/asm/kvm_host.h | 4 +++-
>  include/linux/kvm_host.h        | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8e8eb8a5e8a6b..1bde67cf6eb0e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2394,7 +2394,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>                        int tdp_max_root_level, int tdp_huge_page_level);
>
>
> -#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +#if defined(CONFIG_KVM_SW_PROTECTED_VM) ||     \
> +       defined(CONFIG_KVM_INTEL_TDX) ||        \
> +       defined(CONFIG_KVM_AMD_SEV)
>  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
>  #endif
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 201d0f2143976..d370e834d619e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -#ifndef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +#ifndef kvm_arch_has_private_mem
>  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  {
>         return false;
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 05/46] KVM: Make CONFIG_KVM_VM_MEMORY_ATTRIBUTES selectable
From: Fuad Tabba @ 2026-06-19  8:12 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-5-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Make CONFIG_KVM_VM_MEMORY_ATTRIBUTES selectable, only for (CoCo) VM types
> that might use vm_memory_attributes.
>
> Also document CONFIG_KVM_VM_MEMORY_ATTRIBUTES to specifically be about the
> private/shared attribute.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

You're missing a SoB, but with that fixed:

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/x86/kvm/Kconfig | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 24f96396cfa1c..c28393dc664eb 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -81,13 +81,16 @@ config KVM_WERROR
>           If in doubt, say "N".
>
>  config KVM_VM_MEMORY_ATTRIBUTES
> -       bool
> +       depends on KVM_SW_PROTECTED_VM || KVM_INTEL_TDX || KVM_AMD_SEV
> +       bool "Enable per-VM PRIVATE vs. SHARED attributes (for CoCo VMs)"
> +       help
> +         Enable support for tracking PRIVATE vs. SHARED memory using per-VM
> +         memory attributes.
>
>  config KVM_SW_PROTECTED_VM
>         bool "Enable support for KVM software-protected VMs"
>         depends on EXPERT
>         depends on KVM_X86 && X86_64
> -       select KVM_VM_MEMORY_ATTRIBUTES
>         help
>           Enable support for KVM software-protected VMs.  Currently, software-
>           protected VMs are purely a development and testing vehicle for
> @@ -138,7 +141,6 @@ config KVM_INTEL_TDX
>         bool "Intel Trust Domain Extensions (TDX) support"
>         default y
>         depends on INTEL_TDX_HOST
> -       select KVM_VM_MEMORY_ATTRIBUTES
>         select HAVE_KVM_ARCH_GMEM_POPULATE
>         help
>           Provides support for launching Intel Trust Domain Extensions (TDX)
> @@ -162,7 +164,6 @@ config KVM_AMD_SEV
>         depends on KVM_AMD && X86_64
>         depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
>         select ARCH_HAS_CC_PLATFORM
> -       select KVM_VM_MEMORY_ATTRIBUTES
>         select HAVE_KVM_ARCH_GMEM_PREPARE
>         select HAVE_KVM_ARCH_GMEM_INVALIDATE
>         select HAVE_KVM_ARCH_GMEM_POPULATE
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 07/46] KVM: Rename memory attribute APIs to prepare for in-place gmem conversion
From: Fuad Tabba @ 2026-06-19  8:16 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-7-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Rename memory attribute APIs to add a "vm_" in the name in anticipation of
> moving PRIVATE tracking into guest_memfd, to allow in-place conversion
> between SHARED and PRIVATE.  At that point, there will effectively be two
> (potential) sources of memory attributes: the VM and guest_memfd.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Missing SoB (other patches as well, I won't mention it again). But for
this (and other patches I review with a missing SoB fixed):

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  arch/x86/kvm/mmu/mmu.c   |  6 +++---
>  include/linux/kvm_host.h | 15 +++++++++++----
>  virt/kvm/guest_memfd.c   |  6 +++---
>  virt/kvm/kvm_main.c      | 16 ++++++++--------
>  4 files changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e0005a21b6e22..cbc50aef801fb 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -8087,11 +8087,11 @@ static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
>         const unsigned long end = start + KVM_PAGES_PER_HPAGE(level);
>
>         if (level == PG_LEVEL_2M)
> -               return kvm_range_has_memory_attributes(kvm, start, end, ~0, attrs);
> +               return kvm_range_has_vm_memory_attributes(kvm, start, end, ~0, attrs);
>
>         for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
>                 if (hugepage_test_mixed(slot, gfn, level - 1) ||
> -                   attrs != kvm_get_memory_attributes(kvm, gfn))
> +                   attrs != kvm_get_vm_memory_attributes(kvm, gfn))
>                         return false;
>         }
>         return true;
> @@ -8191,7 +8191,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
>                  * be manually checked as the attributes may already be mixed.
>                  */
>                 for (gfn = start; gfn < end; gfn += nr_pages) {
> -                       unsigned long attrs = kvm_get_memory_attributes(kvm, gfn);
> +                       unsigned long attrs = kvm_get_vm_memory_attributes(kvm, gfn);
>
>                         if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
>                                 hugepage_clear_mixed(slot, gfn, level);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d370e834d619e..eb26d4ea8945a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2534,13 +2534,13 @@ static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
>  }
>
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> -static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> +static inline unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
>  {
>         return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
>  }
>
> -bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> -                                    unsigned long mask, unsigned long attrs);
> +bool kvm_range_has_vm_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> +                                       unsigned long mask, unsigned long attrs);
>  bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
>                                         struct kvm_gfn_range *range);
>  bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> @@ -2548,7 +2548,14 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
> -       return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +       return kvm_get_vm_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +}
> +static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
> +                                           gfn_t end)
> +{
> +       return kvm_range_has_vm_memory_attributes(kvm, start, end,
> +                                                 KVM_MEMORY_ATTRIBUTE_PRIVATE,
> +                                                 KVM_MEMORY_ATTRIBUTE_PRIVATE);
>  }
>  #else
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b4c24fdf159f6..8101f64e0366f 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -915,9 +915,9 @@ static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
>
>         folio_unlock(folio);
>
> -       if (!kvm_range_has_memory_attributes(kvm, gfn, gfn + 1,
> -                                            KVM_MEMORY_ATTRIBUTE_PRIVATE,
> -                                            KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> +       if (!kvm_range_has_vm_memory_attributes(kvm, gfn, gfn + 1,
> +                                               KVM_MEMORY_ATTRIBUTE_PRIVATE,
> +                                               KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
>                 ret = -EINVAL;
>                 goto out_put_folio;
>         }
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7b989b659cf82..6669f1477013c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2419,7 +2419,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> -static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> +static u64 kvm_supported_vm_mem_attributes(struct kvm *kvm)
>  {
>  #ifdef kvm_arch_has_private_mem
>         if (!kvm || kvm_arch_has_private_mem(kvm))
> @@ -2433,19 +2433,19 @@ static u64 kvm_supported_mem_attributes(struct kvm *kvm)
>   * Returns true if _all_ gfns in the range [@start, @end) have attributes
>   * such that the bits in @mask match @attrs.
>   */
> -bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> -                                    unsigned long mask, unsigned long attrs)
> +bool kvm_range_has_vm_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> +                                       unsigned long mask, unsigned long attrs)
>  {
>         XA_STATE(xas, &kvm->mem_attr_array, start);
>         unsigned long index;
>         void *entry;
>
> -       mask &= kvm_supported_mem_attributes(kvm);
> +       mask &= kvm_supported_vm_mem_attributes(kvm);
>         if (attrs & ~mask)
>                 return false;
>
>         if (end == start + 1)
> -               return (kvm_get_memory_attributes(kvm, start) & mask) == attrs;
> +               return (kvm_get_vm_memory_attributes(kvm, start) & mask) == attrs;
>
>         guard(rcu)();
>         if (!attrs)
> @@ -2567,7 +2567,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
>         mutex_lock(&kvm->slots_lock);
>
>         /* Nothing to do if the entire range has the desired attributes. */
> -       if (kvm_range_has_memory_attributes(kvm, start, end, ~0, attributes))
> +       if (kvm_range_has_vm_memory_attributes(kvm, start, end, ~0, attributes))
>                 goto out_unlock;
>
>         /*
> @@ -2606,7 +2606,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>         /* flags is currently not used. */
>         if (attrs->flags)
>                 return -EINVAL;
> -       if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
> +       if (attrs->attributes & ~kvm_supported_vm_mem_attributes(kvm))
>                 return -EINVAL;
>         if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
>                 return -EINVAL;
> @@ -4926,7 +4926,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>                 return 1;
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>         case KVM_CAP_MEMORY_ATTRIBUTES:
> -               return kvm_supported_mem_attributes(kvm);
> +               return kvm_supported_vm_mem_attributes(kvm);
>  #endif
>  #ifdef CONFIG_KVM_GUEST_MEMFD
>         case KVM_CAP_GUEST_MEMFD:
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 08/46] KVM: Provide generic interface for checking memory private/shared status
From: Fuad Tabba @ 2026-06-19  8:19 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-8-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Introduce a generic kvm_mem_is_private() interface using a static call to
> determine if a GFN is private. This allows the implementation for checking
> a GFN's private/shared status to be set at runtime.
>
> In preparation for choosing implementations between a guest_memfd lookup
> and the existing VM attribute lookup, rename the existing
> VM-attribute-based check to kvm_vm_mem_is_private to emphasize that it
> looks up VM attributes.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

(SoB fix plz)

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  include/linux/kvm_host.h | 12 +++++++++++-
>  virt/kvm/kvm_main.c      | 15 +++++++++++++++
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index eb26d4ea8945a..3915da2a61778 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2546,7 +2546,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
>  bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>                                          struct kvm_gfn_range *range);
>
> -static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> +static inline bool kvm_vm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return kvm_get_vm_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>  }
> @@ -2557,6 +2557,16 @@ static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
>                                                   KVM_MEMORY_ATTRIBUTE_PRIVATE,
>                                                   KVM_MEMORY_ATTRIBUTE_PRIVATE);
>  }
> +#endif  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
> +
> +#ifdef kvm_arch_has_private_mem
> +typedef bool (kvm_mem_is_private_t)(struct kvm *kvm, gfn_t gfn);
> +DECLARE_STATIC_CALL(__kvm_mem_is_private, kvm_mem_is_private_t);
> +
> +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> +{
> +       return static_call(__kvm_mem_is_private)(kvm, gfn);
> +}
>  #else
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6669f1477013c..8b238e461b854 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2627,6 +2627,20 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
> +#ifdef kvm_arch_has_private_mem
> +DEFINE_STATIC_CALL_RET0(__kvm_mem_is_private, kvm_mem_is_private_t);
> +EXPORT_STATIC_CALL_GPL(__kvm_mem_is_private);
> +
> +static void kvm_init_memory_attributes(void)
> +{
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +       static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
> +#endif
> +}
> +#else
> +static void kvm_init_memory_attributes(void) { }
> +#endif
> +
>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
>  {
>         return __gfn_to_memslot(kvm_memslots(kvm), gfn);
> @@ -6528,6 +6542,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
>         kvm_preempt_ops.sched_in = kvm_sched_in;
>         kvm_preempt_ops.sched_out = kvm_sched_out;
>
> +       kvm_init_memory_attributes();
>         kvm_init_debug();
>
>         r = kvm_vfio_ops_init();
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 08/46] KVM: Provide generic interface for checking memory private/shared status
From: Fuad Tabba @ 2026-06-19  8:21 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <CA+EHjTw6x-mxDnJjnhE-6SV73tMrb0paKDTtOC2j6zJ1fXZDLA@mail.gmail.com>

On Fri, 19 Jun 2026 at 09:19, Fuad Tabba <tabba@google.com> wrote:
>
> On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Sean Christopherson <seanjc@google.com>
> >
> > Introduce a generic kvm_mem_is_private() interface using a static call to
> > determine if a GFN is private. This allows the implementation for checking
> > a GFN's private/shared status to be set at runtime.
> >
> > In preparation for choosing implementations between a guest_memfd lookup
> > and the existing VM attribute lookup, rename the existing
> > VM-attribute-based check to kvm_vm_mem_is_private to emphasize that it
> > looks up VM attributes.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> (SoB fix plz)
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
>
> Cheers,
> /fuad
> > ---
> >  include/linux/kvm_host.h | 12 +++++++++++-
> >  virt/kvm/kvm_main.c      | 15 +++++++++++++++
> >  2 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index eb26d4ea8945a..3915da2a61778 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2546,7 +2546,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> >  bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> >                                          struct kvm_gfn_range *range);
> >
> > -static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > +static inline bool kvm_vm_mem_is_private(struct kvm *kvm, gfn_t gfn)

Should have read the Sashiko review first, but where is this used?
It's not used at all in this series...

/fuad

> >  {
> >         return kvm_get_vm_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >  }
> > @@ -2557,6 +2557,16 @@ static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
> >                                                   KVM_MEMORY_ATTRIBUTE_PRIVATE,
> >                                                   KVM_MEMORY_ATTRIBUTE_PRIVATE);
> >  }
> > +#endif  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
> > +
> > +#ifdef kvm_arch_has_private_mem
> > +typedef bool (kvm_mem_is_private_t)(struct kvm *kvm, gfn_t gfn);
> > +DECLARE_STATIC_CALL(__kvm_mem_is_private, kvm_mem_is_private_t);
> > +
> > +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > +{
> > +       return static_call(__kvm_mem_is_private)(kvm, gfn);
> > +}
> >  #else
> >  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 6669f1477013c..8b238e461b854 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2627,6 +2627,20 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> >  }
> >  #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
> >
> > +#ifdef kvm_arch_has_private_mem
> > +DEFINE_STATIC_CALL_RET0(__kvm_mem_is_private, kvm_mem_is_private_t);
> > +EXPORT_STATIC_CALL_GPL(__kvm_mem_is_private);
> > +
> > +static void kvm_init_memory_attributes(void)
> > +{
> > +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> > +       static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
> > +#endif
> > +}
> > +#else
> > +static void kvm_init_memory_attributes(void) { }
> > +#endif
> > +
> >  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
> >  {
> >         return __gfn_to_memslot(kvm_memslots(kvm), gfn);
> > @@ -6528,6 +6542,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> >         kvm_preempt_ops.sched_in = kvm_sched_in;
> >         kvm_preempt_ops.sched_out = kvm_sched_out;
> >
> > +       kvm_init_memory_attributes();
> >         kvm_init_debug();
> >
> >         r = kvm_vfio_ops_init();
> >
> > --
> > 2.55.0.rc0.738.g0c8ab3ebcc-goog
> >
> >

^ permalink raw reply

* Re: [PATCH v8 09/46] KVM: guest_memfd: Introduce function to check GFN private/shared status
From: Fuad Tabba @ 2026-06-19  8:25 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-9-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce function for KVM to check the private/shared status of guest
> memory at a given GFN.
>
> This will be used in a later patch.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  include/linux/kvm_host.h |  2 ++
>  virt/kvm/guest_memfd.c   | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3915da2a61778..27687fb9d5201 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2575,6 +2575,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  #ifdef CONFIG_KVM_GUEST_MEMFD
> +bool kvm_gmem_is_private(struct kvm *kvm, gfn_t gfn);
> +
>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                      gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>                      int *max_order);
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 8101f64e0366f..bca912db5be6e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -510,6 +510,37 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>         return 0;
>  }
>
> +bool kvm_gmem_is_private(struct kvm *kvm, gfn_t gfn)
> +{
> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> +       struct inode *inode;
> +
> +       /*
> +        * If this gfn has no associated memslot, there's no chance of the gfn
> +        * being backed by private memory, since guest_memfd must be used for
> +        * private memory, and guest_memfd must be associated with some memslot.
> +        */
> +       if (!slot)
> +               return 0;
> +
> +       CLASS(gmem_get_file, file)(slot);
> +       if (!file)
> +               return 0;
> +
> +       inode = file_inode(file);
> +
> +       /*
> +        * Rely on the maple tree's internal RCU lock to ensure a
> +        * stable result. This result can become stale as soon as the
> +        * lock is dropped, so the caller _must_ still protect
> +        * consumption of private vs. shared by checking
> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
> +        * against ongoing attribute updates.
> +        */
> +       return kvm_gmem_is_private_mem(inode, kvm_gmem_get_index(slot, gfn));
> +}
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_is_private);
> +
>  static struct file_operations kvm_gmem_fops = {
>         .mmap           = kvm_gmem_mmap,
>         .open           = generic_file_open,
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v5 1/7] tracing/events: Fix to check the simple_tsk_fn creation
From: Masami Hiramatsu @ 2026-06-19  8:33 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178165817322.269421.3992299509400184196.stgit@devnote2>

Let me pick this fix to probes/core.

Thanks,

On Wed, 17 Jun 2026 10:02:53 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Sashiko pointed that this sample code does not correctly handle the
> failure of thread creation because kthread_run() can return -errno.
> 
> Check the simple_tsk_fn is correctly initialized (created) or not.
> 
> Link: https://sashiko.dev/#/patchset/178092865666.163648.10457567771536160909.stgit%40devnote2
> 
> Fixes: 9cfe06f8cd5c ("tracing/events: add trace-events-sample")
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  Changes in v4:
>    - Fix to remove decrementing counter in error path, since foo_bar_reg() always returns 0.
>    - Add a newline to error message.
>  Changes in v3:
>    - Recover the usage counter.
> ---
>  samples/trace_events/trace-events-sample.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
> index ecc7db237f2e..0b7a6efdb247 100644
> --- a/samples/trace_events/trace-events-sample.c
> +++ b/samples/trace_events/trace-events-sample.c
> @@ -107,6 +107,10 @@ int foo_bar_reg(void)
>  	 * for consistency sake, we still take the thread_mutex.
>  	 */
>  	simple_tsk_fn = kthread_run(simple_thread_fn, NULL, "event-sample-fn");
> +	if (IS_ERR_OR_NULL(simple_tsk_fn)) {
> +		pr_err("Failed to create simple_thread_fn\n");
> +		simple_tsk_fn = NULL;
> +	}
>   out:
>  	mutex_unlock(&thread_mutex);
>  	return 0;
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v8 10/46] KVM: guest_memfd: Wire up core private/shared attribute interfaces
From: Fuad Tabba @ 2026-06-19  8:34 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-10-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> With in-place conversion, guest_memfd is able to track the private/shared
> status of memory. Use a global flag to toggle between tracking
> private/shared status per-vm or within guest_memfd.
>
> When queried for supported vm memory attributes, return 0 if attributes are
> tracked in guest_memfd.
>
> When querying for memory attributes over a range, look up memory attributes
> based on the flag's state at query time.
>
> For per-GFN memory attribute queries, choosing an implementation (VM or
> guest_memfd lookup) at KVM load time.
>
> The flag is always false for now and will be made toggle-able after all
> in-place conversion features are added in subsequent patches.
>
> If/since the flag is false, if CONFIG_KVM_VM_MEMORY_ATTRIBUTES is also not
> selected, the per-GFN memory attribute query defaults to returning
> 0 (false/not private).
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  include/linux/kvm_host.h |  4 ++++
>  virt/kvm/guest_memfd.c   | 22 +++++++++++++++++++---
>  virt/kvm/kvm_main.c      | 12 +++++++++++-
>  3 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27687fb9d5201..acb552745b428 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2560,6 +2560,8 @@ static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
>  #endif  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  #ifdef kvm_arch_has_private_mem
> +extern bool gmem_in_place_conversion;
> +
>  typedef bool (kvm_mem_is_private_t)(struct kvm *kvm, gfn_t gfn);
>  DECLARE_STATIC_CALL(__kvm_mem_is_private, kvm_mem_is_private_t);
>
> @@ -2568,6 +2570,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>         return static_call(__kvm_mem_is_private)(kvm, gfn);
>  }
>  #else
> +#define gmem_in_place_conversion false
> +
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return false;
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index bca912db5be6e..e0e544ef47d69 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -926,6 +926,24 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
>
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
> +static bool kvm_gmem_range_is_private(struct file *file, pgoff_t index,
> +                                     size_t nr_pages, struct kvm *kvm, gfn_t gfn)
> +{
> +       struct maple_tree *mt = &GMEM_I(file_inode(file))->attributes;
> +       pgoff_t end = index + nr_pages - 1;
> +       void *entry;
> +
> +       if (!gmem_in_place_conversion)
> +               return kvm_range_has_vm_memory_attributes(kvm, gfn, gfn + nr_pages,
> +                                                         KVM_MEMORY_ATTRIBUTE_PRIVATE,
> +                                                         KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +
> +       mt_for_each(mt, entry, index, end) {
> +               if (xa_to_value(entry) != KVM_MEMORY_ATTRIBUTE_PRIVATE)
> +                       return false;
> +       }
> +       return true;
> +}
>
>  static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
>                                 struct file *file, gfn_t gfn, struct page *src_page,
> @@ -946,9 +964,7 @@ static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
>
>         folio_unlock(folio);
>
> -       if (!kvm_range_has_vm_memory_attributes(kvm, gfn, gfn + 1,
> -                                               KVM_MEMORY_ATTRIBUTE_PRIVATE,
> -                                               KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> +       if (!kvm_gmem_range_is_private(file, index, 1, kvm, gfn)) {
>                 ret = -EINVAL;
>                 goto out_put_folio;
>         }
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b238e461b854..01761f6e25d25 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -101,6 +101,10 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
>  static bool __ro_after_init allow_unsafe_mappings;
>  module_param(allow_unsafe_mappings, bool, 0444);
>
> +#ifdef kvm_arch_has_private_mem
> +bool __ro_after_init gmem_in_place_conversion = false;
> +#endif
> +
>  /*
>   * Ordering of locks:
>   *
> @@ -2422,6 +2426,9 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  static u64 kvm_supported_vm_mem_attributes(struct kvm *kvm)
>  {
>  #ifdef kvm_arch_has_private_mem
> +       if (gmem_in_place_conversion)
> +               return 0;
> +
>         if (!kvm || kvm_arch_has_private_mem(kvm))
>                 return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>  #endif
> @@ -2633,8 +2640,11 @@ EXPORT_STATIC_CALL_GPL(__kvm_mem_is_private);
>
>  static void kvm_init_memory_attributes(void)
>  {
> +       if (gmem_in_place_conversion)
> +               static_call_update(__kvm_mem_is_private, kvm_gmem_is_private);
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> -       static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
> +       else
> +               static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
>  #endif
>  }
>  #else
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Nuno Sá @ 2026-06-19  9:16 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: rodrigo.alencar, linux-iio, devicetree, linux-kernel, linux-doc,
	linux-hardening, Lars-Peter Clausen, Michael Hennerich,
	Jonathan Cameron, David Lechner, Andy Shevchenko, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Philipp Zabel, Jonathan Corbet,
	Shuah Khan, Kees Cook, Gustavo A. R. Silva
In-Reply-To: <x3aijvc4buo7aqbchikuoyyrgiq3afidtkla37h2rg4tvfdbc3@h42qp3estg2s>

On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> On 18/06/26 16:06, Nuno Sá wrote:
> > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> > > From: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > > 
> > > Move logic to create a channel prefix for naming attribute files into a
> > > separate __iio_chan_prefix_emit() function for reuse.
> 
> ...
> 
> > > +static int __iio_chan_prefix_emit(const struct iio_chan_spec *chan,
> > > +				  enum iio_shared_by shared_by,
> > > +				  char *buf, size_t len)
> > > +{
> > > +	const char *dir = iio_direction[chan->output];
> > > +	const char *type = iio_chan_type_name_spec[chan->type];
> > > +	int n = 0;
> > > +
> > > +	switch (shared_by) {
> > > +	case IIO_SHARED_BY_ALL:
> > > +		buf[0] = '\0'; /* empty channel prefix */
> > > +		break;
> > > +	case IIO_SHARED_BY_DIR:
> > > +		n = scnprintf(buf, len, "%s", dir);
> > > +		break;
> > > +	case IIO_SHARED_BY_TYPE:
> > > +		n = scnprintf(buf, len, "%s_%s", dir, type);
> > > +		if (chan->differential)
> > > +			n += scnprintf(buf + n, len - n, "-%s", type);
> > > +		break;
> > > +	case IIO_SEPARATE:
> > > +		if (chan->indexed) {
> > > +			n = scnprintf(buf, len, "%s_%s%d", dir, type,
> > > +				      chan->channel);
> > > +			if (chan->differential)
> > > +				n += scnprintf(buf + n, len - n, "-%s%d", type,
> > > +					       chan->channel2);
> > > +		} else {
> > > +			if (chan->differential) {
> > > +				WARN(1, "Differential channels must be indexed\n");
> > > +				return -EINVAL;
> > > +			}
> > > +			n = scnprintf(buf, len, "%s_%s", dir, type);
> > > +		}
> > > +
> > > +		if (chan->modified) {
> > > +			if (chan->differential) {
> > > +				WARN(1, "Differential channels can not have modifier\n");
> > > +				return -EINVAL;
> > 
> > WARN() looks too much to me. dev_error() as we're treating it as such. I
> > guess you don't want to pass struct device but not really an issue IMHO.
> 
> __iio_device_attr_init() also used WARN(), probably because it didnt have
> access to a dev pointer. It would not be a problem to add an extra param.

Hmm, fair enough. Maybe a chance to change it. Not sure how others feel
about it.

>  
> > 
> > > +			}
> > > +			n += scnprintf(buf + n, len - n, "_%s",
> > > +				       iio_modifier_names[chan->channel2]);
> > > +		}
> > > +
> > > +		if (chan->extend_name)
> > > +			n += scnprintf(buf + n, len - n, "_%s", chan->extend_name);
> > > +		break;
> > > +	}
> > > +
> > > +	if (n > 0 && n < len - 1) { /* prefix termination if not empty */
> > > +		buf[n++] = '_';
> > > +		buf[n] = '\0';
> > > +	}
> > > +
> > 
> > Can't we handle the above in the caller on kasprintf()? Then we could
> > simplify and return in place.
> 
> I felt like doing this here would get a cleaner logic in the caller, which
> would have to add the '_' conditionally.
> 

I think it makes things more clear in the caller given we return n
anyways but I don't feel strong about it.

- Nuno Sá

> > 
> > > +	return n;
> > > +}
> > > +
> > >  /**
> > >   * iio_device_id() - query the unique ID for the device
> > >   * @indio_dev:		Device structure whose ID is being queried
> > > @@ -1100,106 +1159,19 @@ int __iio_device_attr_init(struct device_attribute *dev_attr,
> > >  						size_t len),
> > >  			   enum iio_shared_by shared_by)
> > >  {
> > > -	int ret = 0;
> > > -	char *name = NULL;
> > > -	char *full_postfix;
> > > +	char prefix[NAME_MAX + 1];
> > > +	int ret;
> > >  
> > >  	sysfs_attr_init(&dev_attr->attr);
> > >  
> > > -	/* Build up postfix of <extend_name>_<modifier>_postfix */
> > > -	if (chan->modified && (shared_by == IIO_SEPARATE)) {
> > > -		if (chan->extend_name)
> > > -			full_postfix = kasprintf(GFP_KERNEL, "%s_%s_%s",
> > > -						 iio_modifier_names[chan->channel2],
> > > -						 chan->extend_name,
> > > -						 postfix);
> > > -		else
> > > -			full_postfix = kasprintf(GFP_KERNEL, "%s_%s",
> > > -						 iio_modifier_names[chan->channel2],
> > > -						 postfix);
> > > -	} else {
> > > -		if (chan->extend_name == NULL || shared_by != IIO_SEPARATE)
> > > -			full_postfix = kstrdup(postfix, GFP_KERNEL);
> > > -		else
> > > -			full_postfix = kasprintf(GFP_KERNEL,
> > > -						 "%s_%s",
> > > -						 chan->extend_name,
> > > -						 postfix);
> > > -	}
> > > -	if (full_postfix == NULL)
> > > +	ret = __iio_chan_prefix_emit(chan, shared_by, prefix, sizeof(prefix));
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > +	if (!dev_attr->attr.name)
> > >  		return -ENOMEM;
> > 
> > I don't oppose the change. Looks like a nice cleanup. But bear in mind
> > this very sensible as any subtle mistake means ABI breakage.
> 
> Yes! I tried to be careful... this is dangerous stuff!
> 
> -- 
> Kind regards,
> 
> Rodrigo Alencar

^ permalink raw reply

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Nuno Sá @ 2026-06-19  9:19 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Rodrigo Alencar, rodrigo.alencar, linux-iio, devicetree,
	linux-kernel, linux-doc, linux-hardening, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, David Lechner,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
	Gustavo A. R. Silva
In-Reply-To: <ajQ1bZSNHQ96pyJx@ashevche-desk.local>

On Thu, Jun 18, 2026 at 09:14:05PM +0300, Andy Shevchenko wrote:
> On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> > On 18/06/26 16:06, Nuno Sá wrote:
> > > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> 
> ...
> 
> > > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > > +	if (!dev_attr->attr.name)
> > > >  		return -ENOMEM;
> > > 
> > > I don't oppose the change. Looks like a nice cleanup.
> 
> May I oppose it? I found use scnprintf() is harder to follow in comparison to
> nice kasprintf() that takes care for the dynamically allocated buffer.

Tend to agree a bit given I was used to the older code. So matching the
old logic with the new one is an exercise, yes.

> 
> Also there is a chance to get a name silently cut due to insufficient space.
> Besides that this function can't be used (again due to 'c') in kasprintf()-like
> wrapper. I do not consider this as a good approach. Have you looked at seq_buf
> instead?

Not so sure the above bothers me that much.

> 
> > > But bear in mind this very sensible as any subtle mistake means ABI breakage.
> 
> Which immediately raises a question of test coverage. Do we have one? If not,
> this code must be accompanied with one.

The above is the more concerning part to me.

- Nuno Sá

> 
> > Yes! I tried to be careful... this is dangerous stuff!
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Nuno Sá @ 2026-06-19  9:20 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: Andy Shevchenko, rodrigo.alencar, linux-iio, devicetree,
	linux-kernel, linux-doc, linux-hardening, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, David Lechner,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
	Gustavo A. R. Silva
In-Reply-To: <dlisetsssjoyodmv5ubl4rzhxtla3g46mrrzv2f65nqecel5fu@dqiqsayr4aip>

On Fri, Jun 19, 2026 at 08:43:24AM +0100, Rodrigo Alencar wrote:
> On 18/06/26 21:14, Andy Shevchenko wrote:
> > On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> > > On 18/06/26 16:06, Nuno Sá wrote:
> > > > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> > 
> > ...
> > 
> > > > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > > > +	if (!dev_attr->attr.name)
> > > > >  		return -ENOMEM;
> > > > 
> > > > I don't oppose the change. Looks like a nice cleanup.
> > 
> > May I oppose it? I found use scnprintf() is harder to follow in comparison to
> > nice kasprintf() that takes care for the dynamically allocated buffer.
> 
> In the next patch the function is reused in a sysfs attribute read handler,
> a context wich would not be nice to have dynamic allocation. vscnprintf() is
> the main building block of sysfs_emit() which limits the buffer length to
> a page size, so I used scnprintf() trying not to deviate much from that. 
> 
> kasprintf() it is still used in the caller, where the logic was a bit confusing
> as it tried to avoid multiple allocations.
>  
> > Also there is a chance to get a name silently cut due to insufficient space.
> > Besides that this function can't be used (again due to 'c') in kasprintf()-like
> > wrapper. I do not consider this as a good approach. Have you looked at seq_buf
> > instead?
> 
> NAME_MAX is not the maximum length a filename can have? I suppose there should be
> enough space for the channel-prefix. Indeed, seq_buf can be used and it cleans up
> things a bit as it tracks the the position in the buffer.
> 
> > 
> > > > But bear in mind this very sensible as any subtle mistake means ABI breakage.
> > 
> > Which immediately raises a question of test coverage. Do we have one? If not,
> > this code must be accompanied with one.
> 
> Agreed. Will see to have tests for v7.

libiio now has an emulator backend. Maybe something that can be used to
test this. But ideally we can have some kunit for validation.

- Nuno Sá

> 
> > > Yes! I tried to be careful... this is dangerous stuff!
> 
> -- 
> Kind regards,
> 
> Rodrigo Alencar

^ permalink raw reply

* Re: [PATCH v8 13/46] KVM: guest_memfd: Add base support for KVM_SET_MEMORY_ATTRIBUTES2
From: Fuad Tabba @ 2026-06-19  9:25 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-13-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce base support for KVM_SET_MEMORY_ATTRIBUTES2 in guest_memfd, which
> just updates attributes tracked by guest_memfd.
>
> Validate input fields in general. Guard usage of KVM_SET_MEMORY_ATTRIBUTES2
> by making sure requested attributes are supported for this instance of kvm.
>
> A new KVM_SET_MEMORY_ATTRIBUTES2 is defined to support writes (unlike
> KVM_SET_MEMORY_ATTRIBUTES) in addition to reads so it can provide error
> details to userspace. This will be used in a later patch.
>
> The two ioctls use their corresponding structs with no overlap, but
> backward compatibility is baked in for future support of
> KVM_SET_MEMORY_ATTRIBUTES2 and struct kvm_memory_attributes2 in the VM
> ioctl.
>
> The process of setting memory attributes is set up such that the later half
> will not fail due to allocation. Any necessary checks are performed before
> the point of no return.
>
> Co-developed-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> Co-developed-by: Sean Christoperson <seanjc@google.com>
> Signed-off-by: Sean Christoperson <seanjc@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Note sure if it's user error on my part, if I'm applying this to the
wrong base, but I found a build break here on patch 13:
kvm_gmem_invalidate_start() doesn't exist in the base tree. The
function is kvm_gmem_invalidate_begin() here. The rename
(190cc5370a8b6) landed via a different merge path and isn't an
ancestor of the stated base.

Patches 19 and 20 have the same mismatch. Fix for all three is
s/kvm_gmem_invalidate_start/kvm_gmem_invalidate_begin/.

Cheers,
/fuad

> ---
>  include/uapi/linux/kvm.h |  13 ++++++
>  virt/kvm/Kconfig         |   1 +
>  virt/kvm/guest_memfd.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      |  12 +++++
>  4 files changed, 142 insertions(+)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 419011097fa8e..956877a6aab05 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1649,6 +1649,19 @@ struct kvm_memory_attributes {
>         __u64 flags;
>  };
>
> +#define KVM_SET_MEMORY_ATTRIBUTES2              _IOWR(KVMIO,  0xd2, struct kvm_memory_attributes2)
> +
> +struct kvm_memory_attributes2 {
> +       union {
> +               __u64 address;
> +               __u64 offset;
> +       };
> +       __u64 size;
> +       __u64 attributes;
> +       __u64 flags;
> +       __u64 reserved[12];
> +};
> +
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>
>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 297e4399fbd49..cfa2c78ba5fb9 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -102,6 +102,7 @@ config KVM_MMU_LOCKLESS_AGING
>
>  config KVM_GUEST_MEMFD
>         select XARRAY_MULTI
> +       select KVM_MEMORY_ATTRIBUTES
>         bool
>
>  config HAVE_KVM_ARCH_GMEM_PREPARE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 65ce795c090d9..0d14548c1ed22 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -541,11 +541,127 @@ bool kvm_gmem_is_private(struct kvm *kvm, gfn_t gfn)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_is_private);
>
> +/*
> + * Preallocate memory for attributes to be stored on a maple tree, pointed to
> + * by mas.  Adjacent ranges with attributes identical to the new attributes
> + * will be merged.  Also sets mas's bounds up for storing attributes.
> + *
> + * This maintains the invariant that ranges with the same attributes will
> + * always be merged.
> + */
> +static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> +                                   pgoff_t start, size_t nr_pages)
> +{
> +       pgoff_t end = start + nr_pages;
> +       pgoff_t last = end - 1;
> +       void *entry;
> +
> +       /* Try extending range. entry is NULL on overflow/wrap-around. */
> +       mas_set(mas, end);
> +       entry = mas_find(mas, end);
> +       if (entry && xa_to_value(entry) == attributes)
> +               last = mas->last;
> +
> +       if (start > 0) {
> +               mas_set(mas, start - 1);
> +               entry = mas_find(mas, start - 1);
> +               if (entry && xa_to_value(entry) == attributes)
> +                       start = mas->index;
> +       }
> +
> +       mas_set_range(mas, start, last);
> +       return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> +}
> +
> +static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> +                                    size_t nr_pages, uint64_t attrs)
> +{
> +       struct address_space *mapping = inode->i_mapping;
> +       struct gmem_inode *gi = GMEM_I(inode);
> +       pgoff_t end = start + nr_pages;
> +       struct maple_tree *mt;
> +       struct ma_state mas;
> +       int r;
> +
> +       mt = &gi->attributes;
> +
> +       filemap_invalidate_lock(mapping);
> +
> +       mas_init(&mas, mt, start);
> +       r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> +       if (r)
> +               goto out;
> +
> +       /*
> +        * From this point on guest_memfd has performed necessary
> +        * checks and can proceed to do guest-breaking changes.
> +        */
> +
> +       kvm_gmem_invalidate_start(inode, start, end);
> +       mas_store_prealloc(&mas, xa_mk_value(attrs));
> +       kvm_gmem_invalidate_end(inode, start, end);
> +out:
> +       filemap_invalidate_unlock(mapping);
> +       return r;
> +}
> +
> +static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> +{
> +       struct gmem_file *f = file->private_data;
> +       struct inode *inode = file_inode(file);
> +       struct kvm_memory_attributes2 attrs;
> +       size_t nr_pages;
> +       pgoff_t index;
> +       int i;
> +
> +       if (copy_from_user(&attrs, argp, sizeof(attrs)))
> +               return -EFAULT;
> +
> +       if (attrs.flags)
> +               return -EINVAL;
> +       for (i = 0; i < ARRAY_SIZE(attrs.reserved); i++) {
> +               if (attrs.reserved[i])
> +                       return -EINVAL;
> +       }
> +       if (!kvm_arch_has_private_mem(f->kvm))
> +               return -EINVAL;
> +       if (attrs.attributes & ~KVM_MEMORY_ATTRIBUTE_PRIVATE)
> +               return -EINVAL;
> +       if (attrs.size == 0 || attrs.offset + attrs.size < attrs.offset)
> +               return -EINVAL;
> +       if (!PAGE_ALIGNED(attrs.offset) || !PAGE_ALIGNED(attrs.size))
> +               return -EINVAL;
> +
> +       if (attrs.offset >= i_size_read(inode) ||
> +           attrs.offset + attrs.size > i_size_read(inode))
> +               return -EINVAL;
> +
> +       nr_pages = attrs.size >> PAGE_SHIFT;
> +       index = attrs.offset >> PAGE_SHIFT;
> +       return __kvm_gmem_set_attributes(inode, index, nr_pages,
> +                                        attrs.attributes);
> +}
> +
> +static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
> +                          unsigned long arg)
> +{
> +       switch (ioctl) {
> +       case KVM_SET_MEMORY_ATTRIBUTES2:
> +               if (!gmem_in_place_conversion)
> +                       return -ENOTTY;
> +
> +               return kvm_gmem_set_attributes(file, (void __user *)arg);
> +       default:
> +               return -ENOTTY;
> +       }
> +}
> +
>  static struct file_operations kvm_gmem_fops = {
>         .mmap           = kvm_gmem_mmap,
>         .open           = generic_file_open,
>         .release        = kvm_gmem_release,
>         .fallocate      = kvm_gmem_fallocate,
> +       .unlocked_ioctl = kvm_gmem_ioctl,
>  };
>
>  static int kvm_gmem_migrate_folio(struct address_space *mapping,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 01761f6e25d25..a08b518cdb175 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -105,6 +105,18 @@ module_param(allow_unsafe_mappings, bool, 0444);
>  bool __ro_after_init gmem_in_place_conversion = false;
>  #endif
>
> +#define MEMORY_ATTRIBUTES_MATCH(one, two)                              \
> +       static_assert(offsetof(struct kvm_memory_attributes, one) ==    \
> +                     offsetof(struct kvm_memory_attributes2, two));    \
> +       static_assert(sizeof_field(struct kvm_memory_attributes, one) ==\
> +                     sizeof_field(struct kvm_memory_attributes2, two))
> +
> +/* Ensure the common parts of the two structs are identical. */
> +MEMORY_ATTRIBUTES_MATCH(address, address);
> +MEMORY_ATTRIBUTES_MATCH(size, size);
> +MEMORY_ATTRIBUTES_MATCH(attributes, attributes);
> +MEMORY_ATTRIBUTES_MATCH(flags, flags);
> +
>  /*
>   * Ordering of locks:
>   *
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v4 0/2] cpufreq: CPPC: add autonomous mode boot parameter support
From: Pierre Gondois @ 2026-06-19  9:29 UTC (permalink / raw)
  To: Viresh Kumar, Sumit Gupta
  Cc: rafael, ionela.voinescu, zhenglifeng1, zhanjie9, corbet, skhan,
	rdunlap, mario.limonciello, linux-pm, linux-doc, linux-kernel,
	linux-tegra, treding, jonathanh, vsethi, ksitaraman, sanjayc,
	mochs, bbasu
In-Reply-To: <oxw5k2wad4vorehgmrduoxblequy3ynqufwy4sruclnh5d5wrb@awzmfafoucnn>


On 6/18/26 07:28, Viresh Kumar wrote:
> On 16-06-26, 18:22, Sumit Gupta wrote:
>> The dependency it was waiting on, the "cpufreq: Set policy->min and
>> max as real QoS constraints" series, is now in linux-pm (linux-next).
>> I rebased on top and verified autonomous mode works as expected, and
>> it applies cleanly on the current linux-next.
>>
>> The [1] reference in patch 2/2 points to v2 of that series; the merged
>> version is v3 [2].
>>
>> If there are no further comments, please consider acking and queuing
>> this for the next cycle.
> I was waiting for CPPC reviewers to provide some feedback.i
>
> Jie / Lifeng / Pierre ?
>
I think the patchset has the same issue described at:

https://lore.kernel.org/all/86780f97-29ee-4a72-b311-38c89434b707@arm.com/

I don't know if this is important to other persons,
but IMO it would be preferable to have a solution to this issue
before adding more functionalities relying on registers that are left
in an unknown state.

If there are any other opinion ?


^ permalink raw reply

* [PATCH 0/5] hwmon: add Altera Stratix 10 SoC FPGA hardware  monitor support
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc

From: Tze Yee Ng <tze.yee.ng@altera.com>

This series adds hardware monitor support for Altera Stratix 10 SoC FPGA
devices. Temperature and voltage sensors are accessed through the 
Stratix 10 service layer and Secure Device Manager.

Patch 1 adds the device tree binding for the hwmon node and sensor layout.
Patch 2 extends the Stratix 10 service layer binding with an optional
hwmon child node. Patch 3 adds async HWMON read commands to the service
firmware driver. Patch 4 adds the hwmon driver, using the async service
interface when available and falling back to synchronous reads otherwise.
Patch 5 enables the hwmon node on the Stratix 10 SoCDK.

Tze Yee Ng (5):
  dt-bindings: hwmon: add Altera Stratix 10 hardware monitor binding
  dt-bindings: firmware: svc: add hwmon property
  firmware: stratix10-svc: add async HWMON read commands
  hwmon: add Stratix 10 SoC FPGA hardware monitor driver
  arm64: dts: socfpga: stratix10: add hwmon node

 .../firmware/intel,stratix10-svc.yaml         |   4 +
 .../bindings/hwmon/altr,stratix10-hwmon.yaml  | 164 +++++
 Documentation/hwmon/index.rst                 |   1 +
 Documentation/hwmon/stratix10-hwmon.rst       |  31 +
 MAINTAINERS                                   |   9 +
 .../boot/dts/altera/socfpga_stratix10.dtsi    |   5 +
 .../dts/altera/socfpga_stratix10_socdk.dts    |  33 +
 drivers/firmware/stratix10-svc.c              |  12 +
 drivers/hwmon/Kconfig                         |  10 +
 drivers/hwmon/Makefile                        |   1 +
 drivers/hwmon/stratix10-hwmon.c               | 575 ++++++++++++++++++
 include/linux/firmware/intel/stratix10-smc.h  |  38 ++
 12 files changed, 883 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
 create mode 100644 Documentation/hwmon/stratix10-hwmon.rst
 create mode 100644 drivers/hwmon/stratix10-hwmon.c

-- 
2.43.7


^ permalink raw reply

* [PATCH 1/5] dt-bindings: hwmon: add Altera Stratix 10 hardware monitor binding
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Document the device tree binding for the Altera Stratix 10 SoC FPGA
hardware monitor, including temperature and voltage sensor nodes.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 .../bindings/hwmon/altr,stratix10-hwmon.yaml  | 164 ++++++++++++++++++
 MAINTAINERS                                   |   7 +
 2 files changed, 171 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml

diff --git a/Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml b/Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
new file mode 100644
index 000000000000..5bd98660ee7b
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
@@ -0,0 +1,164 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/hwmon/altr,stratix10-hwmon.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Altera Hardware Monitor for Stratix 10 SoC FPGA
+
+maintainers:
+  - Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+  - Tze Yee Ng <tze.yee.ng@altera.com>
+
+description: |
+  The Altera Stratix 10 SoC FPGA hardware monitor unit provides on-chip
+  voltage and temperature sensors. These sensors can be used to monitor
+  external voltages and on-chip operating conditions such as internal
+  power rails and on-chip junction temperatures.
+
+  The specific sensor configuration varies by device. Check the device
+  documentation to verify which sensors are available.
+
+  Stratix 10 voltage sensors:
+
+    page 0, channel 2 = 0.8V VCC
+    page 0, channel 3 = 1.8V VCCIO_SDM
+    page 0, channel 6 = 0.9V VCCERAM
+
+  Stratix 10 temperature sensors:
+
+    page 0, channel 0 = main die
+    page 0, channel 1 = tile bottom left
+    page 0, channel 2 = tile middle left
+    page 0, channel 3 = tile top left
+    page 0, channel 4 = tile bottom right
+    page 0, channel 5 = tile middle right
+    page 0, channel 6 = tile top right
+    page 0, channel 7 = hbm2 bottom
+    page 0, channel 8 = hbm2 top
+
+properties:
+  compatible:
+    const: altr,stratix10-hwmon
+
+  temperature:
+    description:
+      The temperature node specifies mappings of temperature sensor diodes on
+      the Stratix 10 SoC FPGA main die and tile die.
+    type: object
+    properties:
+      '#address-cells':
+        const: 1
+      '#size-cells':
+        const: 0
+    patternProperties:
+      "^input(@[0-9a-f]+)?$":
+        description:
+          The input node specifies each individual temperature sensor.
+        type: object
+        properties:
+          reg:
+            description:
+              Sensor channel index in the lower 16-bits (0-15). For temperature
+              sensors, the page number is encoded in the upper 16-bits.
+              The driver encodes the SMC request argument as a channel
+              bitmask (1 << channel) in bits 0..15, with the page number
+              placed in bits 16..31. Channel values >= 16 are rejected to
+              avoid overlap with the page field. For example, reg = <2>
+              selects channel 2 and the driver passes 0x4 to the service layer.
+          label:
+            description:
+              A descriptive name for this channel (e.g. "Main Die" or
+              "Tile Bottom Left").
+        required:
+          - reg
+        additionalProperties: false
+    required:
+      - '#address-cells'
+      - '#size-cells'
+    additionalProperties: false
+
+  voltage:
+    description:
+      The voltage node specifies mappings of voltage sensors on the Stratix 10
+      SoC FPGA analog to digital converter of the Secure Device Manager (SDM).
+    type: object
+    properties:
+      '#address-cells':
+        const: 1
+      '#size-cells':
+        const: 0
+    patternProperties:
+      "^input(@[0-9a-f]+)?$":
+        description:
+          The input node specifies each individual voltage sensor.
+        type: object
+        properties:
+          reg:
+            description:
+              Sensor channel index in the lower 16-bits (0-15). The driver
+              encodes the SMC request argument as a channel bitmask
+              (1 << channel). For example, reg = <2> selects channel 2 and
+              the driver passes 0x4 to the service layer.
+          label:
+            description:
+              A descriptive name for this channel (e.g. "0.8V VCC" or
+              "1.8V VCCIO_SDM").
+        required:
+          - reg
+        additionalProperties: false
+    required:
+      - '#address-cells'
+      - '#size-cells'
+    additionalProperties: false
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+    hwmon {
+      compatible = "altr,stratix10-hwmon";
+
+      voltage {
+        #address-cells = <1>;
+        #size-cells = <0>;
+
+        input@2 {
+          label = "0.8V VCC";
+          reg = <2>;
+        };
+
+        input@3 {
+          label = "1.8V VCCIO_SDM";
+          reg = <3>;
+        };
+
+        input@6 {
+          label = "0.9V VCCERAM";
+          reg = <6>;
+        };
+      };
+
+      temperature {
+        #address-cells = <1>;
+        #size-cells = <0>;
+
+        input@0 {
+          label = "Main Die";
+          reg = <0>;
+        };
+
+        input@1 {
+          label = "Tile Bottom Left";
+          reg = <1>;
+        };
+
+        input@2 {
+          label = "Tile Middle Left";
+          reg = <2>;
+        };
+      };
+    };
diff --git a/MAINTAINERS b/MAINTAINERS
index 6aa3fe2ee1bb..678f6c429627 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -937,6 +937,13 @@ ALPS PS/2 TOUCHPAD DRIVER
 R:	Pali Rohár <pali@kernel.org>
 F:	drivers/input/mouse/alps.*
 
+ALTERA STRATIX 10 SoC FPGA HWMON DRIVER
+M:	Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+M:	Tze Yee Ng <tze.yee.ng@altera.com>
+L:	linux-hwmon@vger.kernel.org
+S:	Maintained
+F:	Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
+
 ALTERA MAILBOX DRIVER
 M:	Tien Sung Ang <tiensung.ang@altera.com>
 S:	Maintained
-- 
2.43.7


^ permalink raw reply related

* [PATCH 2/5] dt-bindings: firmware: svc: add hwmon property
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Altera Stratix 10 SoCFPGA supports hardware monitor access through the
service layer mailbox. Add an optional hwmon child node to the service
layer binding so device trees can describe the hardware monitor.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 .../devicetree/bindings/firmware/intel,stratix10-svc.yaml     | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml b/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml
index b42cfa78b28b..86ffdb10132f 100644
--- a/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml
+++ b/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml
@@ -62,6 +62,10 @@ properties:
     $ref: /schemas/fpga/intel,stratix10-soc-fpga-mgr.yaml
     description: Optional child node for fpga manager to perform fabric configuration.
 
+  hwmon:
+    $ref: /schemas/hwmon/altr,stratix10-hwmon.yaml
+    description: Optional child node for Stratix 10 hardware monitor.
+
 required:
   - compatible
   - method
-- 
2.43.7


^ permalink raw reply related

* [PATCH 3/5] firmware: stratix10-svc: add async HWMON read commands
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Add asynchronous Stratix 10 service layer support for hardware monitor
temperature and voltage read commands in stratix10_svc_async_send() and
stratix10_svc_async_prepare_response().

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 drivers/firmware/stratix10-svc.c             | 12 +++++++
 include/linux/firmware/intel/stratix10-smc.h | 38 ++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/firmware/stratix10-svc.c b/drivers/firmware/stratix10-svc.c
index e9e35d67ef96..2cfdac31402c 100644
--- a/drivers/firmware/stratix10-svc.c
+++ b/drivers/firmware/stratix10-svc.c
@@ -1311,6 +1311,14 @@ int stratix10_svc_async_send(struct stratix10_svc_chan *chan, void *msg,
 		args.a0 = INTEL_SIP_SMC_ASYNC_RSU_NOTIFY;
 		args.a2 = p_msg->arg[0];
 		break;
+	case COMMAND_HWMON_READTEMP:
+		args.a0 = INTEL_SIP_SMC_ASYNC_HWMON_READTEMP;
+		args.a2 = p_msg->arg[0];
+		break;
+	case COMMAND_HWMON_READVOLT:
+		args.a0 = INTEL_SIP_SMC_ASYNC_HWMON_READVOLT;
+		args.a2 = p_msg->arg[0];
+		break;
 	default:
 		dev_err(ctrl->dev, "Invalid command ,%d\n", p_msg->command);
 		ret = -EINVAL;
@@ -1404,6 +1412,10 @@ static int stratix10_svc_async_prepare_response(struct stratix10_svc_chan *chan,
 		 */
 		data->kaddr1 = (void *)&handle->res;
 		break;
+	case COMMAND_HWMON_READTEMP:
+	case COMMAND_HWMON_READVOLT:
+		data->kaddr1 = (void *)&handle->res.a2;
+		break;
 
 	default:
 		dev_alert(ctrl->dev, "Invalid command\n ,%d", p_msg->command);
diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h
index 935dba3633b5..4eb3a6e9659d 100644
--- a/include/linux/firmware/intel/stratix10-smc.h
+++ b/include/linux/firmware/intel/stratix10-smc.h
@@ -680,6 +680,44 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE)
 #define INTEL_SIP_SMC_ASYNC_POLL \
 	INTEL_SIP_SMC_ASYNC_VAL(INTEL_SIP_SMC_ASYNC_FUNC_ID_POLL)
 
+/**
+ * Request INTEL_SIP_SMC_ASYNC_HWMON_READTEMP
+ * Async call to request temperature
+ *
+ * Call register usage:
+ * a0 INTEL_SIP_SMC_ASYNC_HWMON_READTEMP
+ * a1 transaction job id
+ * a2 Temperature Channel
+ * a3-a17 not used
+ *
+ * Return status
+ * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_REJECTED
+ * or INTEL_SIP_SMC_STATUS_BUSY
+ * a1-a17 not used
+ */
+#define INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READTEMP	0xE8
+#define INTEL_SIP_SMC_ASYNC_HWMON_READTEMP \
+	INTEL_SIP_SMC_ASYNC_VAL(INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READTEMP)
+
+/**
+ * Request INTEL_SIP_SMC_ASYNC_HWMON_READVOLT
+ * Async call to request voltage
+ *
+ * Call register usage:
+ * a0 INTEL_SIP_SMC_ASYNC_HWMON_READVOLT
+ * a1 transaction job id
+ * a2 Voltage Channel
+ * a3-a17 not used
+ *
+ * Return status
+ * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_REJECTED
+ * or INTEL_SIP_SMC_STATUS_BUSY
+ * a1-a17 not used
+ */
+#define INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READVOLT	0xE9
+#define INTEL_SIP_SMC_ASYNC_HWMON_READVOLT \
+	INTEL_SIP_SMC_ASYNC_VAL(INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READVOLT)
+
 /**
  * Request INTEL_SIP_SMC_ASYNC_RSU_GET_SPT
  * Async call to get RSU SPT from SDM.
-- 
2.43.7


^ permalink raw reply related

* [PATCH 4/5] hwmon: add Stratix 10 SoC FPGA hardware monitor driver
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Add a hardware monitoring driver for Altera Stratix 10 SoC FPGA devices
that reads temperature and voltage sensors through the Stratix 10 service
layer. Use the asynchronous service layer interface when available, with
a synchronous fallback.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 Documentation/hwmon/index.rst           |   1 +
 Documentation/hwmon/stratix10-hwmon.rst |  31 ++
 MAINTAINERS                             |   2 +
 drivers/hwmon/Kconfig                   |  10 +
 drivers/hwmon/Makefile                  |   1 +
 drivers/hwmon/stratix10-hwmon.c         | 575 ++++++++++++++++++++++++
 6 files changed, 620 insertions(+)
 create mode 100644 Documentation/hwmon/stratix10-hwmon.rst
 create mode 100644 drivers/hwmon/stratix10-hwmon.c

diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 8b655e5d6b68..30f533301903 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -244,6 +244,7 @@ Hardware Monitoring Kernel Drivers
    sparx5-temp
    spd5118
    stpddc60
+   stratix10-hwmon
    surface_fan
    sy7636a-hwmon
    tc654
diff --git a/Documentation/hwmon/stratix10-hwmon.rst b/Documentation/hwmon/stratix10-hwmon.rst
new file mode 100644
index 000000000000..61b682fe177a
--- /dev/null
+++ b/Documentation/hwmon/stratix10-hwmon.rst
@@ -0,0 +1,31 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver stratix10-hwmon
+=============================
+
+Supported chips:
+
+ * Altera Stratix 10 SoC FPGA
+
+Authors:
+      - Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+      - Tze Yee Ng <tze.yee.ng@altera.com>
+
+Description
+-----------
+
+This driver supports hardware monitoring for Altera Stratix 10 SoC FPGA
+devices through the Secure Device Manager and Stratix 10 service layer.
+
+The following sensor types are supported:
+
+  * temperature
+  * voltage
+
+Usage Notes
+-----------
+
+The driver relies on a device tree node to enumerate sensors present on the
+specific device. See
+Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml for details
+of the device-tree node.
diff --git a/MAINTAINERS b/MAINTAINERS
index 678f6c429627..5afdf286f8f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -943,6 +943,8 @@ M:	Tze Yee Ng <tze.yee.ng@altera.com>
 L:	linux-hwmon@vger.kernel.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
+F:	Documentation/hwmon/stratix10-hwmon.rst
+F:	drivers/hwmon/stratix10-hwmon.c
 
 ALTERA MAILBOX DRIVER
 M:	Tien Sung Ang <tiensung.ang@altera.com>
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 14e4cea48acc..8eff1c71a226 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -2112,6 +2112,16 @@ config SENSORS_SMSC47M192
 	  This driver can also be built as a module. If so, the module
 	  will be called smsc47m192.
 
+config SENSORS_ALTERA_SOCFPGA_STRATIX10
+	tristate "Altera SoC FPGA Stratix 10 hardware monitoring features"
+	depends on INTEL_STRATIX10_SERVICE
+	help
+	  If you say yes here you get support for the temperature and
+	  voltage sensors of Altera SoC FPGA Stratix 10 devices.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called stratix10-hwmon.
+
 config SENSORS_SMSC47B397
 	tristate "SMSC LPC47B397-NC"
 	depends on HAS_IOPORT
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 982ee2c6f9de..7e643de0e7d4 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -217,6 +217,7 @@ obj-$(CONFIG_SENSORS_SMPRO)	+= smpro-hwmon.o
 obj-$(CONFIG_SENSORS_SMSC47B397)+= smsc47b397.o
 obj-$(CONFIG_SENSORS_SMSC47M1)	+= smsc47m1.o
 obj-$(CONFIG_SENSORS_SMSC47M192)+= smsc47m192.o
+obj-$(CONFIG_SENSORS_ALTERA_SOCFPGA_STRATIX10)	+= stratix10-hwmon.o
 obj-$(CONFIG_SENSORS_SPARX5)	+= sparx5-temp.o
 obj-$(CONFIG_SENSORS_SPD5118)	+= spd5118.o
 obj-$(CONFIG_SENSORS_STTS751)	+= stts751.o
diff --git a/drivers/hwmon/stratix10-hwmon.c b/drivers/hwmon/stratix10-hwmon.c
new file mode 100644
index 000000000000..7ed1116e57b8
--- /dev/null
+++ b/drivers/hwmon/stratix10-hwmon.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Altera Stratix 10 SoC FPGA hardware monitoring driver
+ *
+ * Copyright (c) 2026 Altera Corporation
+ *
+ * Authors:
+ *	Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+ *	Tze Yee Ng <tze.yee.ng@altera.com>
+ */
+
+#include <linux/bitops.h>
+#include <linux/cleanup.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/firmware/intel/stratix10-svc-client.h>
+#include <linux/hwmon.h>
+#include <linux/hwmon-sysfs.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+
+#define HWMON_TIMEOUT			msecs_to_jiffies(SVC_HWMON_REQUEST_TIMEOUT_MS)
+#define HWMON_RETRY_SLEEP_MS		1U
+#define HWMON_ASYNC_MSG_RETRY		3U
+#define STRATIX10_HWMON_MAXSENSORS	16
+#define STRATIX10_HWMON_TEMPERATURE	"temperature"
+#define STRATIX10_HWMON_VOLTAGE		"voltage"
+#define STRATIX10_HWMON_CHANNEL_MASK	GENMASK(15, 0)
+#define STRATIX10_HWMON_PAGE_SHIFT	16
+#define STRATIX10_HWMON_ATTR_VISIBLE	0444
+/* Temperature from SDM is signed Q8.8 millidegrees Celsius (8 fractional bits). */
+#define STRATIX10_HWMON_TEMP_FRAC_BITS	8
+#define STRATIX10_HWMON_TEMP_FRAC_DIV	BIT(STRATIX10_HWMON_TEMP_FRAC_BITS)
+/* Voltage from SDM is unsigned Q16 (millivolts, 16 fractional bits). */
+#define STRATIX10_HWMON_VOLT_FRAC_BITS	16
+#define STRATIX10_HWMON_VOLT_FRAC_DIV	BIT(STRATIX10_HWMON_VOLT_FRAC_BITS)
+
+#define ETEMP_INACTIVE			0x80000000U
+#define ETEMP_TOO_OLD			0x80000001U
+#define ETEMP_NOT_PRESENT		0x80000002U
+#define ETEMP_TIMEOUT			0x80000003U
+#define ETEMP_CORRUPT			0x80000004U
+#define ETEMP_BUSY			0x80000005U
+#define ETEMP_NOT_INITIALIZED		0x800000FFU
+
+struct stratix10_hwmon_priv {
+	struct stratix10_svc_chan *chan;
+	struct stratix10_svc_client client;
+	struct completion completion;
+	struct mutex lock;	/* protect SVC calls */
+	bool async;
+	u32 temperature;
+	u32 voltage;
+	int temperature_channels;
+	int voltage_channels;
+	const char *temp_chan_names[STRATIX10_HWMON_MAXSENSORS];
+	const char *volt_chan_names[STRATIX10_HWMON_MAXSENSORS];
+	u32 temp_chan[STRATIX10_HWMON_MAXSENSORS];
+	u32 volt_chan[STRATIX10_HWMON_MAXSENSORS];
+};
+
+static umode_t stratix10_hwmon_is_visible(const void *dev,
+					  enum hwmon_sensor_types type,
+					 u32 attr, int chan)
+{
+	const struct stratix10_hwmon_priv *priv = dev;
+
+	switch (type) {
+	case hwmon_temp:
+		if (chan < priv->temperature_channels)
+			return STRATIX10_HWMON_ATTR_VISIBLE;
+		return 0;
+	case hwmon_in:
+		if (chan < priv->voltage_channels)
+			return STRATIX10_HWMON_ATTR_VISIBLE;
+		return 0;
+	default:
+		return 0;
+	}
+}
+
+static void stratix10_hwmon_readtemp_cb(struct stratix10_svc_client *client,
+					struct stratix10_svc_cb_data *data)
+{
+	struct stratix10_hwmon_priv *priv = client->priv;
+
+	if (data->status == BIT(SVC_STATUS_OK)) {
+		priv->temperature = (u32)*(unsigned long *)data->kaddr1;
+	} else if (data->kaddr1) {
+		dev_err(client->dev, "%s failed with status 0x%x, value 0x%lx\n",
+			__func__, data->status,
+			*(unsigned long *)data->kaddr1);
+	} else {
+		dev_err(client->dev, "%s failed with status 0x%x\n",
+			__func__, data->status);
+	}
+
+	complete(&priv->completion);
+}
+
+static void stratix10_hwmon_readvolt_cb(struct stratix10_svc_client *client,
+					struct stratix10_svc_cb_data *data)
+{
+	struct stratix10_hwmon_priv *priv = client->priv;
+
+	if (data->status == BIT(SVC_STATUS_OK)) {
+		priv->voltage = (u32)*(unsigned long *)data->kaddr1;
+	} else if (data->kaddr1) {
+		dev_err(client->dev, "%s failed with status 0x%x, value 0x%lx\n",
+			__func__, data->status,
+			*(unsigned long *)data->kaddr1);
+	} else {
+		dev_err(client->dev, "%s failed with status 0x%x\n",
+			__func__, data->status);
+	}
+
+	complete(&priv->completion);
+}
+
+static void stratix10_hwmon_async_callback(void *ptr)
+{
+	if (ptr)
+		complete(ptr);
+}
+
+static int stratix10_hwmon_parse_temp(long *val, u32 temperature)
+{
+	switch (temperature) {
+	case ETEMP_INACTIVE:
+	case ETEMP_NOT_PRESENT:
+	case ETEMP_CORRUPT:
+	case ETEMP_NOT_INITIALIZED:
+		return -EOPNOTSUPP;
+	case ETEMP_TIMEOUT:
+	case ETEMP_BUSY:
+	case ETEMP_TOO_OLD:
+		return -EAGAIN;
+	default:
+		/* Convert Q8.8 millidegrees Celsius to millidegrees for hwmon. */
+		*val = (long)(s32)temperature / STRATIX10_HWMON_TEMP_FRAC_DIV;
+		return 0;
+	}
+}
+
+static int stratix10_hwmon_encode_temp_arg(u32 reg, u64 *arg)
+{
+	u32 page = (reg >> STRATIX10_HWMON_PAGE_SHIFT) & STRATIX10_HWMON_CHANNEL_MASK;
+	u32 channel = reg & STRATIX10_HWMON_CHANNEL_MASK;
+
+	if (channel >= STRATIX10_HWMON_MAXSENSORS)
+		return -EINVAL;
+
+	*arg = (1ULL << channel) | ((u64)page << STRATIX10_HWMON_PAGE_SHIFT);
+	return 0;
+}
+
+static int stratix10_hwmon_encode_volt_arg(u32 reg, u64 *arg)
+{
+	u32 channel = reg & STRATIX10_HWMON_CHANNEL_MASK;
+
+	if (channel >= STRATIX10_HWMON_MAXSENSORS)
+		return -EINVAL;
+
+	*arg = 1ULL << channel;
+	return 0;
+}
+
+static int stratix10_hwmon_async_read(struct device *dev,
+				      enum hwmon_sensor_types type,
+				     struct stratix10_svc_client_msg *msg)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+	struct stratix10_svc_cb_data data = {};
+	struct completion completion;
+	unsigned long wait_ret;
+	void *handle = NULL;
+	int status, index, ret;
+
+	init_completion(&completion);
+
+	for (index = 0; index < HWMON_ASYNC_MSG_RETRY; index++) {
+		status = stratix10_svc_async_send(priv->chan, msg, &handle,
+						  stratix10_hwmon_async_callback,
+						  &completion);
+		if (status == 0)
+			break;
+		dev_warn(dev, "Failed to send async message\n");
+		msleep(HWMON_RETRY_SLEEP_MS);
+	}
+
+	if (status && !handle)
+		return status;
+
+	wait_ret = wait_for_completion_io_timeout(&completion, HWMON_TIMEOUT);
+	if (wait_ret > 0)
+		dev_dbg(dev, "Received async interrupt\n");
+	else if (wait_ret == 0)
+		dev_dbg(dev, "Timeout occurred, trying to poll the response\n");
+
+	ret = -ETIMEDOUT;
+	for (index = 0; index < HWMON_ASYNC_MSG_RETRY; index++) {
+		status = stratix10_svc_async_poll(priv->chan, handle, &data);
+		if (status == -EAGAIN) {
+			dev_dbg(dev, "Async message is still in progress\n");
+		} else if (status < 0) {
+			dev_alert(dev, "Failed to poll async message: %d\n", status);
+			ret = status;
+			break;
+		} else if (status == 0) {
+			ret = 0;
+			break;
+		}
+		msleep(HWMON_RETRY_SLEEP_MS);
+	}
+
+	if (ret) {
+		dev_err(dev, "Failed to get async response\n");
+		goto done;
+	}
+
+	if (data.status) {
+		dev_err(dev, "%s returned 0x%x from SDM\n", __func__,
+			data.status);
+		ret = -EFAULT;
+		goto done;
+	}
+
+	if (type == hwmon_temp)
+		priv->temperature = (u32)*(unsigned long *)data.kaddr1;
+	else
+		priv->voltage = (u32)*(unsigned long *)data.kaddr1;
+
+	ret = 0;
+
+done:
+	stratix10_svc_async_done(priv->chan, handle);
+	return ret;
+}
+
+static int stratix10_hwmon_sync_read(struct device *dev,
+				     enum hwmon_sensor_types type,
+				    struct stratix10_svc_client_msg *msg)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+	int ret;
+
+	reinit_completion(&priv->completion);
+
+	if (type == hwmon_temp)
+		priv->client.receive_cb = stratix10_hwmon_readtemp_cb;
+	else
+		priv->client.receive_cb = stratix10_hwmon_readvolt_cb;
+
+	ret = stratix10_svc_send(priv->chan, msg);
+	if (ret < 0)
+		goto status_done;
+
+	ret = wait_for_completion_interruptible_timeout(&priv->completion,
+							HWMON_TIMEOUT);
+	if (!ret) {
+		dev_err(priv->client.dev, "timeout waiting for SMC call\n");
+		ret = -ETIMEDOUT;
+		goto status_done;
+	}
+	if (ret < 0) {
+		dev_err(priv->client.dev, "error %d waiting for SMC call\n", ret);
+		goto status_done;
+	}
+
+	ret = 0;
+
+status_done:
+	stratix10_svc_done(priv->chan);
+	return ret;
+}
+
+static int stratix10_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+				u32 attr, int chan, long *val)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+	struct stratix10_svc_client_msg msg = {0};
+	int ret;
+
+	if (chan >= STRATIX10_HWMON_MAXSENSORS)
+		return -EOPNOTSUPP;
+
+	switch (type) {
+	case hwmon_temp:
+		ret = stratix10_hwmon_encode_temp_arg(priv->temp_chan[chan],
+						      &msg.arg[0]);
+		if (ret)
+			return ret;
+		msg.command = COMMAND_HWMON_READTEMP;
+		break;
+	case hwmon_in:
+		ret = stratix10_hwmon_encode_volt_arg(priv->volt_chan[chan],
+						      &msg.arg[0]);
+		if (ret)
+			return ret;
+		msg.command = COMMAND_HWMON_READVOLT;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	guard(mutex)(&priv->lock);
+	if (priv->async)
+		ret = stratix10_hwmon_async_read(dev, type, &msg);
+	else
+		ret = stratix10_hwmon_sync_read(dev, type, &msg);
+	if (ret)
+		return ret;
+
+	if (type == hwmon_temp)
+		ret = stratix10_hwmon_parse_temp(val, priv->temperature);
+	else
+		/* Convert Q16 millivolts to millivolts for hwmon. */
+		*val = (long)priv->voltage / STRATIX10_HWMON_VOLT_FRAC_DIV;
+	return ret;
+}
+
+static int stratix10_hwmon_read_string(struct device *dev,
+				       enum hwmon_sensor_types type, u32 attr,
+				      int chan, const char **str)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+
+	switch (type) {
+	case hwmon_in:
+		*str = priv->volt_chan_names[chan];
+		return 0;
+	case hwmon_temp:
+		*str = priv->temp_chan_names[chan];
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static const struct hwmon_ops stratix10_hwmon_ops = {
+	.is_visible = stratix10_hwmon_is_visible,
+	.read = stratix10_hwmon_read,
+	.read_string = stratix10_hwmon_read_string,
+};
+
+static const struct hwmon_channel_info *stratix10_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(temp,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL),
+	HWMON_CHANNEL_INFO(in,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL),
+	NULL
+};
+
+static const struct hwmon_chip_info stratix10_hwmon_chip_info = {
+	.ops = &stratix10_hwmon_ops,
+	.info = stratix10_hwmon_info,
+};
+
+static int stratix10_hwmon_add_channel(struct device *dev, const char *type,
+				       u32 val, const char *label,
+				      struct stratix10_hwmon_priv *priv)
+{
+	if (!strcmp(type, STRATIX10_HWMON_TEMPERATURE)) {
+		if (priv->temperature_channels >= STRATIX10_HWMON_MAXSENSORS) {
+			dev_warn(dev, "Can't add temp node %s, too many channels\n",
+				 label);
+			return 0;
+		}
+
+		priv->temp_chan_names[priv->temperature_channels] = label;
+		priv->temp_chan[priv->temperature_channels] = val;
+		priv->temperature_channels++;
+		return 0;
+	}
+
+	if (!strcmp(type, STRATIX10_HWMON_VOLTAGE)) {
+		if (priv->voltage_channels >= STRATIX10_HWMON_MAXSENSORS) {
+			dev_warn(dev, "Can't add voltage node %s, too many channels\n",
+				 label);
+			return 0;
+		}
+
+		priv->volt_chan_names[priv->voltage_channels] = label;
+		priv->volt_chan[priv->voltage_channels] = val;
+		priv->voltage_channels++;
+		return 0;
+	}
+
+	dev_warn(dev, "unsupported sensor type %s\n", type);
+	return 0;
+}
+
+static int stratix10_hwmon_probe_child_from_dt(struct device *dev,
+					       struct device_node *child,
+					      struct stratix10_hwmon_priv *priv)
+{
+	struct device_node *grandchild;
+	const char *label;
+	u32 val;
+	int ret;
+
+	for_each_child_of_node(child, grandchild) {
+		ret = of_property_read_u32(grandchild, "reg", &val);
+		if (ret) {
+			dev_err(dev, "missing reg property of %pOFn\n",
+				grandchild);
+			of_node_put(grandchild);
+			return ret;
+		}
+
+		ret = of_property_read_string(grandchild, "label", &label);
+		if (ret)
+			label = grandchild->name;
+
+		stratix10_hwmon_add_channel(dev, child->name, val, label, priv);
+	}
+
+	return 0;
+}
+
+static int stratix10_hwmon_probe_from_dt(struct device *dev,
+					 struct stratix10_hwmon_priv *priv)
+{
+	struct device_node *child;
+	int ret;
+
+	if (!dev->of_node)
+		return 0;
+
+	for_each_child_of_node(dev->of_node, child) {
+		ret = stratix10_hwmon_probe_child_from_dt(dev, child, priv);
+		if (ret) {
+			of_node_put(child);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int stratix10_hwmon_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct stratix10_hwmon_priv *priv;
+	struct device *hwmon_dev;
+	int ret;
+
+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->client.dev = dev;
+	priv->client.priv = priv;
+	init_completion(&priv->completion);
+	mutex_init(&priv->lock);
+
+	ret = stratix10_hwmon_probe_from_dt(dev, priv);
+	if (ret) {
+		dev_err(dev, "Unable to probe from device tree\n");
+		return ret;
+	}
+
+	if (!priv->temperature_channels && !priv->voltage_channels) {
+		dev_err(dev, "no temperature or voltage channels in device tree\n");
+		return -ENODEV;
+	}
+
+	priv->chan = stratix10_svc_request_channel_byname(&priv->client,
+							  SVC_CLIENT_HWMON);
+	if (IS_ERR(priv->chan)) {
+		ret = PTR_ERR(priv->chan);
+		if (ret == -EPROBE_DEFER)
+			dev_dbg(dev, "service channel %s not ready, deferring probe\n",
+				SVC_CLIENT_HWMON);
+		else
+			dev_err(dev, "couldn't get service channel %s: %d\n",
+				SVC_CLIENT_HWMON, ret);
+		return ret;
+	}
+
+	ret = stratix10_svc_add_async_client(priv->chan, false);
+	switch (ret) {
+	case 0:
+		priv->async = true;
+		break;
+	case -EINVAL:
+		dev_dbg(dev, "async operations not supported, using sync mode\n");
+		priv->async = false;
+		break;
+	default:
+		dev_err(dev, "failed to add async client: %d\n", ret);
+		stratix10_svc_free_channel(priv->chan);
+		return ret;
+	}
+
+	dev_info(dev, "Initialized %d temperature and %d voltage channels\n",
+		 priv->temperature_channels, priv->voltage_channels);
+
+	hwmon_dev = devm_hwmon_device_register_with_info(dev, "stratix10_hwmon",
+							 priv,
+							 &stratix10_hwmon_chip_info,
+							 NULL);
+	if (IS_ERR(hwmon_dev)) {
+		if (priv->async)
+			stratix10_svc_remove_async_client(priv->chan);
+		stratix10_svc_free_channel(priv->chan);
+		return PTR_ERR(hwmon_dev);
+	}
+
+	platform_set_drvdata(pdev, priv);
+	return 0;
+}
+
+static void stratix10_hwmon_remove(struct platform_device *pdev)
+{
+	struct stratix10_hwmon_priv *priv = platform_get_drvdata(pdev);
+
+	if (priv->async)
+		stratix10_svc_remove_async_client(priv->chan);
+	stratix10_svc_free_channel(priv->chan);
+}
+
+static const struct of_device_id stratix10_hwmon_of_match[] = {
+	{ .compatible = "altr,stratix10-hwmon" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, stratix10_hwmon_of_match);
+
+static struct platform_driver stratix10_hwmon_driver = {
+	.driver = {
+		.name = "stratix10-hwmon",
+		.of_match_table = stratix10_hwmon_of_match,
+	},
+	.probe = stratix10_hwmon_probe,
+	.remove = stratix10_hwmon_remove,
+};
+module_platform_driver(stratix10_hwmon_driver);
+
+MODULE_AUTHOR("Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>");
+MODULE_AUTHOR("Tze Yee Ng <tze.yee.ng@altera.com>");
+MODULE_DESCRIPTION("Altera Stratix 10 SoC FPGA hardware monitoring driver");
+MODULE_LICENSE("GPL");
-- 
2.43.7


^ permalink raw reply related

* [PATCH 5/5] arm64: dts: socfpga: stratix10: add hwmon node
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Add an hwmon child node under the Stratix 10 service layer and describe
the SoCDK voltage and temperature sensors using the altr,stratix10-hwmon
compatible.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 .../boot/dts/altera/socfpga_stratix10.dtsi    |  5 +++
 .../dts/altera/socfpga_stratix10_socdk.dts    | 33 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 0d9cad0c0351..afb11e6f6813 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -78,6 +78,11 @@ svc {
 			fpga_mgr: fpga-mgr {
 				compatible = "intel,stratix10-soc-fpga-mgr";
 			};
+
+			temp_volt: hwmon {
+				compatible = "altr,stratix10-hwmon";
+				status = "disabled";
+			};
 		};
 	};
 
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts
index e2a1cea7f3da..01a8ffe430ed 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts
@@ -134,3 +134,36 @@ root: partition@4200000 {
 		};
 	};
 };
+
+&temp_volt {
+	status = "okay";
+
+	voltage {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		input@2 {
+			label = "0.8V VCC";
+			reg = <2>;
+		};
+
+		input@3 {
+			label = "1.8V VCCIO_SDM";
+			reg = <3>;
+		};
+
+		input@6 {
+			label = "0.9V VCCERAM";
+			reg = <6>;
+		};
+	};
+
+	temperature {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		input@0 {
+			label = "Main Die SDM";
+			reg = <0x0>;
+		};
+	};
+};
-- 
2.43.7


^ permalink raw reply related

* Re: [PATCH v8 08/46] KVM: Provide generic interface for checking memory private/shared status
From: Suzuki K Poulose @ 2026-06-19  9:57 UTC (permalink / raw)
  To: Fuad Tabba, ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Baoquan He, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTxu32nQ+vPV7Zmcw76_-V4g2_g=P_UzRnnO2dP1PFO2ww@mail.gmail.com>

On 19/06/2026 09:21, Fuad Tabba wrote:
> On Fri, 19 Jun 2026 at 09:19, Fuad Tabba <tabba@google.com> wrote:
>>
>> On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
>> <devnull+ackerleytng.google.com@kernel.org> wrote:
>>>
>>> From: Sean Christopherson <seanjc@google.com>
>>>
>>> Introduce a generic kvm_mem_is_private() interface using a static call to
>>> determine if a GFN is private. This allows the implementation for checking
>>> a GFN's private/shared status to be set at runtime.
>>>
>>> In preparation for choosing implementations between a guest_memfd lookup
>>> and the existing VM attribute lookup, rename the existing
>>> VM-attribute-based check to kvm_vm_mem_is_private to emphasize that it
>>> looks up VM attributes.
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>
>> (SoB fix plz)
>>
>> Reviewed-by: Fuad Tabba <tabba@google.com>
>>
>> Cheers,
>> /fuad
>>> ---
>>>   include/linux/kvm_host.h | 12 +++++++++++-
>>>   virt/kvm/kvm_main.c      | 15 +++++++++++++++
>>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index eb26d4ea8945a..3915da2a61778 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -2546,7 +2546,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
>>>   bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>>>                                           struct kvm_gfn_range *range);
>>>
>>> -static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> +static inline bool kvm_vm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> 
> Should have read the Sashiko review first, but where is this used?
> It's not used at all in this series...

See below:

> 
> /fuad
> 
>>>   {
>>>          return kvm_get_vm_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>   }
>>> @@ -2557,6 +2557,16 @@ static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
>>>                                                    KVM_MEMORY_ATTRIBUTE_PRIVATE,
>>>                                                    KVM_MEMORY_ATTRIBUTE_PRIVATE);
>>>   }
>>> +#endif  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>>> +
>>> +#ifdef kvm_arch_has_private_mem
>>> +typedef bool (kvm_mem_is_private_t)(struct kvm *kvm, gfn_t gfn);
>>> +DECLARE_STATIC_CALL(__kvm_mem_is_private, kvm_mem_is_private_t);
>>> +
>>> +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +       return static_call(__kvm_mem_is_private)(kvm, gfn);
>>> +}
>>>   #else
>>>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>   {
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 6669f1477013c..8b238e461b854 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -2627,6 +2627,20 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>>>   }
>>>   #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>>>
>>> +#ifdef kvm_arch_has_private_mem
>>> +DEFINE_STATIC_CALL_RET0(__kvm_mem_is_private, kvm_mem_is_private_t);
>>> +EXPORT_STATIC_CALL_GPL(__kvm_mem_is_private);
>>> +
>>> +static void kvm_init_memory_attributes(void)
>>> +{
>>> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>>> +       static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
>>> +#endif
>>> +}


Here ^^ as the static call update ?


Suzuki

^ permalink raw reply

* Re: [PATCH v8 15/46] KVM: guest_memfd: Call arch invalidate hooks on conversion
From: Fuad Tabba @ 2026-06-19 10:09 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-15-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> When memory in guest_memfd is converted from private to shared, the
> platform-specific state associated with the guest-private pages must be
> invalidated or cleaned up.
>
> Iterate over the folios in the affected range and call the
> kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> architectures to perform necessary teardown, such as updating hardware
> metadata or encryption states, before the pages are transitioned to the
> shared state.
>
> Invoke this helper after indicating to KVM's mmu code that an invalidation
> is in progress to stop in-flight page faults from succeeding.
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Coming back to this after working through the arm64/pKVM side. My
Reviewed-by here is from the previous round and the patch hasn't
changed, but I missed an implication for arm64.

kvm_arch_gmem_invalidate() is now called from two paths with the same
(start, end) signature: folio teardown (kvm_gmem_free_folio) and
private->shared conversion (here). For SNP/TDX that's fine, conversion is
destructive anyway. For pKVM the two need opposite content semantics:
conversion must preserve the page in place (same physical page, the point
of in-place conversion without encryption), while teardown must scrub it
before returning it to the host.

The hook gets only a pfn range with no indication of which caller it's
serving, so arm64 can't give the two paths the behaviour they need. It
would help to signal intent on the conversion path: a reason/flag, a
separate hook, or not routing non-destructive conversion through the
teardown hook.

arm64 isn't here yet, so this isn't urgent, but the hook is gaining a
second caller now, and it's cheaper to leave room for the distinction
than to change a generic contract other arches depend on later.

Cheers,
/fuad


> ---
>  virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 433f79047b9d1..3c94442bc8131 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
>         return safe;
>  }
>
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +       struct folio_batch fbatch;
> +       pgoff_t next = start;
> +       int i;
> +
> +       folio_batch_init(&fbatch);
> +       while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) {
> +               for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> +                       struct folio *folio = fbatch.folios[i];
> +                       pgoff_t start_index, end_index;
> +                       kvm_pfn_t start_pfn, end_pfn;
> +
> +                       start_index = max(start, folio->index);
> +                       end_index = min(end, folio_next_index(folio));
> +                       /*
> +                        * end_index is either in folio or points to
> +                        * the first page of the next folio. Hence,
> +                        * all pages in range [start_index, end_index)
> +                        * are contiguous.
> +                        */
> +                       start_pfn = folio_file_pfn(folio, start_index);
> +                       end_pfn = start_pfn + end_index - start_index;
> +
> +                       kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> +               }
> +
> +               folio_batch_release(&fbatch);
> +               cond_resched();
> +       }
> +}
> +#else
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> +#endif
> +
>  static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
>                                      size_t nr_pages, uint64_t attrs,
>                                      pgoff_t *err_index)
> @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
>          */
>
>         kvm_gmem_invalidate_start(inode, start, end);
> +
> +       if (!to_private)
> +               kvm_gmem_invalidate(inode, start, end);
> +
>         mas_store_prealloc(&mas, xa_mk_value(attrs));
> +
>         kvm_gmem_invalidate_end(inode, start, end);
>  out:
>         filemap_invalidate_unlock(mapping);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v4 00/31] Introduce SCMI Telemetry FS support
From: David Hildenbrand (Arm) @ 2026-06-19 10:16 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Christian Brauner, linux-kernel, linux-arm-kernel, arm-scmi,
	linux-fsdevel, linux-doc, sudeep.holla, james.quinlan, f.fainelli,
	vincent.guittot, etienne.carriere, peng.fan, michal.simek, d-gole,
	jic23, elif.topuz, lukasz.luba, philip.radford,
	souvik.chakravarty, leitao, kas, puranjay, usama.arif,
	kernel-team
In-Reply-To: <ajR_FBWOoXJKSeoH@pluto>


>> Is the configuration aspect limited to enabling selected events, or is there
>> more that can be configured?
>>
> 
> The needed configuration is:
> 
>  - global Telemetry enable (tlm_enable)
>  - global common update_interval (current_update_interval)

Okay, so simple global properties.

>  - per-DE enable/disable (des/0x<NNNN>/enable)
>  - per-DE timestamping enable/disable (des/0x<NNNN>/tstamp_enable)
> 
>  ... then there are a couple of handy catch-all entries:
> 	all_des_enable, all_des_tstamp_enable

Okay, so fairly trivial configs.
> 
> Note that all the existent DEs are discovered at runtime dynamically via
> SCMI in the background at init/probe and then never change: i.e.
> the tree is statically created upon discovery, user cannot
> create/destroy or symlink files at will, nor the backend platform FW
> running the SCMI server can pop-up new DataEvents after the initial
> enumeration.

That makes sense.

> 
> All the above configs can also be pre-defined in the FW (at built time)
> as being default boot-on with predefined values, like a specific
> boot-on update interval, so that you could have a system in which really
> you dont need to configure anything...everything is on and you just
> read data. (unless you want to change config of course...)

Okay, so the initial value of some parameters might not be "disabled" etc.

I guess, from a user space perspective, reading should be allowed by everyone
but writing should be limited to root?

> 
> There is more stuff that indeed is configurable per the SCMI spec
> but these additional params are hidden into the SCMI Telemetry protocol
> layer (the initial patches in this series) and NOT made available to
> the driver/users of the protocol (like the SCMI FS driver that sits on
> top)

Do you assume that there will get significantly more config options added in the
future for user space to configure?

> 
> IOW, this humonguos series (~8k lines) is only partially composed by
> the Filesystem driver (~3k): the bulk of the Telemetry logic and SCMI
> message exchanges are contained in the SCMI Protocol stack which has
> been extended to support the Telemerty protocol at first
> (the 'firmware: arm_scmi:' initial patches).
> 
> This latter common support is exposed by the SCMI stack for the SCMI
> drivers to use via custom per-protocol operations (not an orginal name :P)
> exposed in include/linux/scmi_protocol.h
> 
> So when you write into FS to configure smth, you end up calling an internal
> tlm_proto_ops that in turn will cause an SCMI message to be sent
> (in some cases say to enable a DE or set the update interval)

Makes sense.

> 
> When you read something, you end up calling another Telemetry operation
> that in turn returns you the DataEvent value you were looking for...how
> this is retrieved via SCMI in the background is transparent to the
> FS driver because, again, these details are buried into the protocol
> layer. Talking about reads, you can:
> 
>  - read a single value from des/0x<NNNN>/value
>  - read ALL the currently enabled DE in a bulk read via des_bulk_read
> 
> ...most of the other entries in the tree are simply RO properties of the DEs
> that have been discovered at enumeration time.

Is this bulk-reading relevant for performance or just a "nice to have" ?


> 
> Given that walking a FS tree and issuing configuration as writes is NOT
> performant really (nor handy if you are not a human), currently, even
> in this FS-based series you can really perform all of the discovery AND
> the configuration tasks WITHOUT walking the filesystem tree, but instead
> issuing a bunch of IOCTLs issued on a special 'control' file that I
> embedded in the FS. Such UAPI IOCTLs described at:

Makes sense.

> 
> https://lore.kernel.org/arm-scmi/20260612223802.1337232-6-cristian.marussi@arm.com/T/#u
>  
> So my plan of action in order to get rid of the FS in-kenel implementation
> would be to drop this Filesystem in favour of simple character devices
> and move the existent IOCTLs interface (revisited where needed) on top of
> these devices: that way you will be able to use IOCTLs to enumerate the
> Telemetry sources and then configure them.
> 
> Read will then happen (probably) leveraging a number of chardev fops like:
> IOCTLs, .read and .mmap...up to the tool decide what to use.
> 
> After this porting to chardev is done, I would start optionally exposing
> again all of this in a human-readable alternative way by adding a layer
> of FUSE on top of this chardev interface.

Yes. How high-priority is the fs side? Or would a tool using a library to access
this information also work in the first step?

> 
> Basically my aim is to drop the FS implementation from the kernel, as
> advised, while trying to optionally make it still available via a userspace
> FUSE implementation...IOW the intention would be for the next V5 to expose
> the same interfaces as V4 but with the help of a tool instead that builds,
> if wanted, a FUSE mount built on top of the chardev interface.
> 
> So basically 'floating up' the current FS-like interface into userspace.

Yes.

> 
>>
>> You mention json here ... but I assume the data we are getting fed by the
>> protocol is not in some default format? (e.g., json)
> 
> The data format is defined by the SCMI spec and it is buried in the SCMI
> layer, there are a number of collection method and a number of formats: this
> is NOT exposed from the SCMI core BUT handled transparently.
> 
> The raw spec format basically defines how DE ID, Tstamps, values are represented
> in memory and how their consistency can be assured despite the fact that
> platform could update the same entries that a user is concurrently reading...
> 
> JSON definitions only assign a semantic to the DEs (in theory...): e.g. on this
> specific platform...wth is 0x1234 ? ..also note that JSON defs are NOT part of
> the spec....they do NOT really exist for the Kernel: they are parsed and
> interpreted by more complex user space tools that are supposed to leverage some
> of these interfaces to retrieve data and carry-on analysis.

What I thought, thanks.

> 
>>
>>
>> Maybe you have it in some of the patches here, but what does the typical
>> directory + file structure look like in the current implementation?
>>
>> Do you have an example?
>>
>> Also, is everything in that filesystem read-only, or are there some writable
>> file (IOW, how is stuff configured?).
> 
> See above for config/write entry ... and I think you found the FS layout in the
> doc already...
> 
>>
>>
>> Okay, so you really only feed this data to user space, exposing all the data you
>> have easily available as part of the protocol.
> 
> Yes, no interpetation nor filtering: I expose all that have enumerated and/
> discovered by the protocol, allowing for configurations while hiding the inner
> SCMI Telemetry mechanism...
> 
>>
>>
>> It's a good question how that could be done, if you need more information about
>> these events from user space.
> 
> I have NOT really delved into that, so as of know we do NOT fed any data
> to existing Kernel subsystems, not there is any available in-kernel
> interface to consume DE data (nobody asked), but, I can imagine 2 solution:
> 
>  - our beloved architects decide to 'architect' more DataEvents in the
>    next version of the spec.. i.e. they reserve some specific DE IDs to
>    represent some well defined entity (like it is done already in the spec
>    for a dozen IDs)...this avoids the needs of any new interface all
>    together

That would be the cleanest solution :)

> 
> OR
> 
> - we open some sort of user-->kernel ABI channel 'somewhere' where the
>   userspace tool, interpreting the JSON description, can communicate something
>   like " on this platform ID 1,2,3,4 should be fed to the IIO sensors frmwk
>   too, while ID 39,8,76 can be fed to HWMON..." etc
> 
>>
>> [...]
>>
>>
>> That sounds reasonable.
>>
>> [...]
>>
>>> ...I would not say that this was the kind of feedback I was hoping for,
>>> but I am NOT gonna argue, given that you shot down already what I thought
>>> were all my best selling points :P
>>>
>>> At this point my understanding is that the way forward must be to use
>>> a custom tool to configure/extract/translate the raw Telemetry data and
>>> move up into userspace the whole human readable FS layer via FUSE, if
>>> really needed.
>>>
>>> I suppose that the new kernel/user interface has to be some dedicated char
>>> device implementing proper fops. (like I did previously in early versions
>>> of this series and then abandoned...)
>>>
>>> Is this you have in mind ? Dedicated character device(s) with enough fops
>>> to be able to configure/extract Telemetry data with a custom tool ?
>>
>> I cannot speak for Christian, but I guess you could have some kind of libscmi in
>> user space that can obtain the information (as you say, probably char device,
>> not sure which alternatives we have), to expose the data through a nice ABI, to
>> then either make tools build upon that directly, or have a fuse server in user
>> space that mimics what you currently do with the file system.
> 
> My aim would be at first a simple tool that can exercise the chardev interface to
> discover configure and read back data, and then a FUSE server on top of this to
> optionally expose the human readable FS....I suppose our internal and external
> customers can use the FS interface to validate/test/script on one side, OR
> simply code their own tools/libs to use directly the bare chardev inteface...
> 
> ...we do have a tools team already working on a library to ease all of this
> SCMI Telemtry collection and analysis...it is just a matter to re-target the
> kind of lower level interfaces that they are using in the near future
> probably (they were already planning indeed AFAIK to use more performant
> interface that FS...)

Good.

> 
>>
>> One thing that is not clear to me yet is how stuff would be configured, and how
>> possibly multiple users of libscmi would possibly interact.
>>
> 
> Configuration/discovery will happen via IOCTls, while consuming the Data
> can happen:
> 
>  - all together in bulk via a device read fops
>  - a single DE via a targeted IOCTL
>  - direct access to the raw SCMI data via dev/mmap of the underlying SCMI
>    areas (that means the tool has to parse the SCMI format defined by the
>    spec on its own, without the currently provided Kernel mediation...)
> 
> Regarding the user concurrency, I have already explicitly pushed back on
> this, our own tools team: any concurrent read or configuration write is
> allowed and properly handled in a consistent way, BUT on the configuration
> side the last write/ioctl wins: there is NO in-kernel OR userspace
> co-ordination provided out of the box: IOW if you use multiple tools
> concurrently to apply conflicting configurations, it is none of our problem

Would concurrent reading work? I assume so, right?

> 
> ...similarly as if you have an actively running network configuration daemon
> and you try to set your IP manually...nobody will prevent you from doing this,
> the same netlink will be used freely by you on the shell and the daemon (if you
> have enough privilege), but you will gonna have unexpected result...
> 
> I dont either see the case to enforce exclusive access for Telemetry resources:
> co-ordination is up to the user in my view...I mean if you have 2 tools
> configuring concurrently SCMI telemetry in a conflicting way something has been
> misconfigured somewhere
> 
> .....having said that, I understand that the concurrency co-ordination
> issue can be particularly tricky to spot and solve in userspace, so I DO
> expose a generation counter entry that is updated on any configuration
> change, so that a userspace app using Telemetry can monitor (poll) this
> counter to spot if someone else on the system is quietly suddenly applying
> configuration changes...

Okay, so a single writer (admin) changing stuff could get picked up my possibly
many concurrent readers?

> 
>>>
>>> Should/could such a tool live in the kernel tree (tools/) at least for
>>> ease of development/deployment ?
>>
>> I think OOT.
>>
> 
> Ok.
> 
> Sorry for the long email..I hope I have clarified the situation, anyway
> I am already moving to get rid of the in-kernel interface as advised in
> favour of a chardev kernel interface and an optional FUSE based FS...

Yes, thank you a lot, I hope it also helps Christian to help push this into the
right direction!

-- 
Cheers,

David

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox