Linux Documentation
 help / color / mirror / Atom feed
* Re: [PATCH v5 1/7] tracing/events: Fix to check the simple_tsk_fn creation
From: Masami Hiramatsu @ 2026-06-19  8:33 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178165817322.269421.3992299509400184196.stgit@devnote2>

Let me pick this fix to probes/core.

Thanks,

On Wed, 17 Jun 2026 10:02:53 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Sashiko pointed that this sample code does not correctly handle the
> failure of thread creation because kthread_run() can return -errno.
> 
> Check the simple_tsk_fn is correctly initialized (created) or not.
> 
> Link: https://sashiko.dev/#/patchset/178092865666.163648.10457567771536160909.stgit%40devnote2
> 
> Fixes: 9cfe06f8cd5c ("tracing/events: add trace-events-sample")
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  Changes in v4:
>    - Fix to remove decrementing counter in error path, since foo_bar_reg() always returns 0.
>    - Add a newline to error message.
>  Changes in v3:
>    - Recover the usage counter.
> ---
>  samples/trace_events/trace-events-sample.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
> index ecc7db237f2e..0b7a6efdb247 100644
> --- a/samples/trace_events/trace-events-sample.c
> +++ b/samples/trace_events/trace-events-sample.c
> @@ -107,6 +107,10 @@ int foo_bar_reg(void)
>  	 * for consistency sake, we still take the thread_mutex.
>  	 */
>  	simple_tsk_fn = kthread_run(simple_thread_fn, NULL, "event-sample-fn");
> +	if (IS_ERR_OR_NULL(simple_tsk_fn)) {
> +		pr_err("Failed to create simple_thread_fn\n");
> +		simple_tsk_fn = NULL;
> +	}
>   out:
>  	mutex_unlock(&thread_mutex);
>  	return 0;
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v8 10/46] KVM: guest_memfd: Wire up core private/shared attribute interfaces
From: Fuad Tabba @ 2026-06-19  8:34 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-10-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> With in-place conversion, guest_memfd is able to track the private/shared
> status of memory. Use a global flag to toggle between tracking
> private/shared status per-vm or within guest_memfd.
>
> When queried for supported vm memory attributes, return 0 if attributes are
> tracked in guest_memfd.
>
> When querying for memory attributes over a range, look up memory attributes
> based on the flag's state at query time.
>
> For per-GFN memory attribute queries, choosing an implementation (VM or
> guest_memfd lookup) at KVM load time.
>
> The flag is always false for now and will be made toggle-able after all
> in-place conversion features are added in subsequent patches.
>
> If/since the flag is false, if CONFIG_KVM_VM_MEMORY_ATTRIBUTES is also not
> selected, the per-GFN memory attribute query defaults to returning
> 0 (false/not private).
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  include/linux/kvm_host.h |  4 ++++
>  virt/kvm/guest_memfd.c   | 22 +++++++++++++++++++---
>  virt/kvm/kvm_main.c      | 12 +++++++++++-
>  3 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27687fb9d5201..acb552745b428 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2560,6 +2560,8 @@ static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
>  #endif  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>
>  #ifdef kvm_arch_has_private_mem
> +extern bool gmem_in_place_conversion;
> +
>  typedef bool (kvm_mem_is_private_t)(struct kvm *kvm, gfn_t gfn);
>  DECLARE_STATIC_CALL(__kvm_mem_is_private, kvm_mem_is_private_t);
>
> @@ -2568,6 +2570,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>         return static_call(__kvm_mem_is_private)(kvm, gfn);
>  }
>  #else
> +#define gmem_in_place_conversion false
> +
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return false;
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index bca912db5be6e..e0e544ef47d69 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -926,6 +926,24 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
>
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
> +static bool kvm_gmem_range_is_private(struct file *file, pgoff_t index,
> +                                     size_t nr_pages, struct kvm *kvm, gfn_t gfn)
> +{
> +       struct maple_tree *mt = &GMEM_I(file_inode(file))->attributes;
> +       pgoff_t end = index + nr_pages - 1;
> +       void *entry;
> +
> +       if (!gmem_in_place_conversion)
> +               return kvm_range_has_vm_memory_attributes(kvm, gfn, gfn + nr_pages,
> +                                                         KVM_MEMORY_ATTRIBUTE_PRIVATE,
> +                                                         KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +
> +       mt_for_each(mt, entry, index, end) {
> +               if (xa_to_value(entry) != KVM_MEMORY_ATTRIBUTE_PRIVATE)
> +                       return false;
> +       }
> +       return true;
> +}
>
>  static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
>                                 struct file *file, gfn_t gfn, struct page *src_page,
> @@ -946,9 +964,7 @@ static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
>
>         folio_unlock(folio);
>
> -       if (!kvm_range_has_vm_memory_attributes(kvm, gfn, gfn + 1,
> -                                               KVM_MEMORY_ATTRIBUTE_PRIVATE,
> -                                               KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> +       if (!kvm_gmem_range_is_private(file, index, 1, kvm, gfn)) {
>                 ret = -EINVAL;
>                 goto out_put_folio;
>         }
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b238e461b854..01761f6e25d25 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -101,6 +101,10 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
>  static bool __ro_after_init allow_unsafe_mappings;
>  module_param(allow_unsafe_mappings, bool, 0444);
>
> +#ifdef kvm_arch_has_private_mem
> +bool __ro_after_init gmem_in_place_conversion = false;
> +#endif
> +
>  /*
>   * Ordering of locks:
>   *
> @@ -2422,6 +2426,9 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  static u64 kvm_supported_vm_mem_attributes(struct kvm *kvm)
>  {
>  #ifdef kvm_arch_has_private_mem
> +       if (gmem_in_place_conversion)
> +               return 0;
> +
>         if (!kvm || kvm_arch_has_private_mem(kvm))
>                 return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>  #endif
> @@ -2633,8 +2640,11 @@ EXPORT_STATIC_CALL_GPL(__kvm_mem_is_private);
>
>  static void kvm_init_memory_attributes(void)
>  {
> +       if (gmem_in_place_conversion)
> +               static_call_update(__kvm_mem_is_private, kvm_gmem_is_private);
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> -       static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
> +       else
> +               static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
>  #endif
>  }
>  #else
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Nuno Sá @ 2026-06-19  9:16 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: rodrigo.alencar, linux-iio, devicetree, linux-kernel, linux-doc,
	linux-hardening, Lars-Peter Clausen, Michael Hennerich,
	Jonathan Cameron, David Lechner, Andy Shevchenko, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Philipp Zabel, Jonathan Corbet,
	Shuah Khan, Kees Cook, Gustavo A. R. Silva
In-Reply-To: <x3aijvc4buo7aqbchikuoyyrgiq3afidtkla37h2rg4tvfdbc3@h42qp3estg2s>

On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> On 18/06/26 16:06, Nuno Sá wrote:
> > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> > > From: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > > 
> > > Move logic to create a channel prefix for naming attribute files into a
> > > separate __iio_chan_prefix_emit() function for reuse.
> 
> ...
> 
> > > +static int __iio_chan_prefix_emit(const struct iio_chan_spec *chan,
> > > +				  enum iio_shared_by shared_by,
> > > +				  char *buf, size_t len)
> > > +{
> > > +	const char *dir = iio_direction[chan->output];
> > > +	const char *type = iio_chan_type_name_spec[chan->type];
> > > +	int n = 0;
> > > +
> > > +	switch (shared_by) {
> > > +	case IIO_SHARED_BY_ALL:
> > > +		buf[0] = '\0'; /* empty channel prefix */
> > > +		break;
> > > +	case IIO_SHARED_BY_DIR:
> > > +		n = scnprintf(buf, len, "%s", dir);
> > > +		break;
> > > +	case IIO_SHARED_BY_TYPE:
> > > +		n = scnprintf(buf, len, "%s_%s", dir, type);
> > > +		if (chan->differential)
> > > +			n += scnprintf(buf + n, len - n, "-%s", type);
> > > +		break;
> > > +	case IIO_SEPARATE:
> > > +		if (chan->indexed) {
> > > +			n = scnprintf(buf, len, "%s_%s%d", dir, type,
> > > +				      chan->channel);
> > > +			if (chan->differential)
> > > +				n += scnprintf(buf + n, len - n, "-%s%d", type,
> > > +					       chan->channel2);
> > > +		} else {
> > > +			if (chan->differential) {
> > > +				WARN(1, "Differential channels must be indexed\n");
> > > +				return -EINVAL;
> > > +			}
> > > +			n = scnprintf(buf, len, "%s_%s", dir, type);
> > > +		}
> > > +
> > > +		if (chan->modified) {
> > > +			if (chan->differential) {
> > > +				WARN(1, "Differential channels can not have modifier\n");
> > > +				return -EINVAL;
> > 
> > WARN() looks too much to me. dev_error() as we're treating it as such. I
> > guess you don't want to pass struct device but not really an issue IMHO.
> 
> __iio_device_attr_init() also used WARN(), probably because it didnt have
> access to a dev pointer. It would not be a problem to add an extra param.

Hmm, fair enough. Maybe a chance to change it. Not sure how others feel
about it.

>  
> > 
> > > +			}
> > > +			n += scnprintf(buf + n, len - n, "_%s",
> > > +				       iio_modifier_names[chan->channel2]);
> > > +		}
> > > +
> > > +		if (chan->extend_name)
> > > +			n += scnprintf(buf + n, len - n, "_%s", chan->extend_name);
> > > +		break;
> > > +	}
> > > +
> > > +	if (n > 0 && n < len - 1) { /* prefix termination if not empty */
> > > +		buf[n++] = '_';
> > > +		buf[n] = '\0';
> > > +	}
> > > +
> > 
> > Can't we handle the above in the caller on kasprintf()? Then we could
> > simplify and return in place.
> 
> I felt like doing this here would get a cleaner logic in the caller, which
> would have to add the '_' conditionally.
> 

I think it makes things more clear in the caller given we return n
anyways but I don't feel strong about it.

- Nuno Sá

> > 
> > > +	return n;
> > > +}
> > > +
> > >  /**
> > >   * iio_device_id() - query the unique ID for the device
> > >   * @indio_dev:		Device structure whose ID is being queried
> > > @@ -1100,106 +1159,19 @@ int __iio_device_attr_init(struct device_attribute *dev_attr,
> > >  						size_t len),
> > >  			   enum iio_shared_by shared_by)
> > >  {
> > > -	int ret = 0;
> > > -	char *name = NULL;
> > > -	char *full_postfix;
> > > +	char prefix[NAME_MAX + 1];
> > > +	int ret;
> > >  
> > >  	sysfs_attr_init(&dev_attr->attr);
> > >  
> > > -	/* Build up postfix of <extend_name>_<modifier>_postfix */
> > > -	if (chan->modified && (shared_by == IIO_SEPARATE)) {
> > > -		if (chan->extend_name)
> > > -			full_postfix = kasprintf(GFP_KERNEL, "%s_%s_%s",
> > > -						 iio_modifier_names[chan->channel2],
> > > -						 chan->extend_name,
> > > -						 postfix);
> > > -		else
> > > -			full_postfix = kasprintf(GFP_KERNEL, "%s_%s",
> > > -						 iio_modifier_names[chan->channel2],
> > > -						 postfix);
> > > -	} else {
> > > -		if (chan->extend_name == NULL || shared_by != IIO_SEPARATE)
> > > -			full_postfix = kstrdup(postfix, GFP_KERNEL);
> > > -		else
> > > -			full_postfix = kasprintf(GFP_KERNEL,
> > > -						 "%s_%s",
> > > -						 chan->extend_name,
> > > -						 postfix);
> > > -	}
> > > -	if (full_postfix == NULL)
> > > +	ret = __iio_chan_prefix_emit(chan, shared_by, prefix, sizeof(prefix));
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > +	if (!dev_attr->attr.name)
> > >  		return -ENOMEM;
> > 
> > I don't oppose the change. Looks like a nice cleanup. But bear in mind
> > this very sensible as any subtle mistake means ABI breakage.
> 
> Yes! I tried to be careful... this is dangerous stuff!
> 
> -- 
> Kind regards,
> 
> Rodrigo Alencar

^ permalink raw reply

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Nuno Sá @ 2026-06-19  9:19 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Rodrigo Alencar, rodrigo.alencar, linux-iio, devicetree,
	linux-kernel, linux-doc, linux-hardening, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, David Lechner,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
	Gustavo A. R. Silva
In-Reply-To: <ajQ1bZSNHQ96pyJx@ashevche-desk.local>

On Thu, Jun 18, 2026 at 09:14:05PM +0300, Andy Shevchenko wrote:
> On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> > On 18/06/26 16:06, Nuno Sá wrote:
> > > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> 
> ...
> 
> > > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > > +	if (!dev_attr->attr.name)
> > > >  		return -ENOMEM;
> > > 
> > > I don't oppose the change. Looks like a nice cleanup.
> 
> May I oppose it? I found use scnprintf() is harder to follow in comparison to
> nice kasprintf() that takes care for the dynamically allocated buffer.

Tend to agree a bit given I was used to the older code. So matching the
old logic with the new one is an exercise, yes.

> 
> Also there is a chance to get a name silently cut due to insufficient space.
> Besides that this function can't be used (again due to 'c') in kasprintf()-like
> wrapper. I do not consider this as a good approach. Have you looked at seq_buf
> instead?

Not so sure the above bothers me that much.

> 
> > > But bear in mind this very sensible as any subtle mistake means ABI breakage.
> 
> Which immediately raises a question of test coverage. Do we have one? If not,
> this code must be accompanied with one.

The above is the more concerning part to me.

- Nuno Sá

> 
> > Yes! I tried to be careful... this is dangerous stuff!
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply

* Re: [PATCH v6 06/16] iio: core: create local __iio_chan_prefix_emit() for reuse
From: Nuno Sá @ 2026-06-19  9:20 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: Andy Shevchenko, rodrigo.alencar, linux-iio, devicetree,
	linux-kernel, linux-doc, linux-hardening, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, David Lechner,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
	Gustavo A. R. Silva
In-Reply-To: <dlisetsssjoyodmv5ubl4rzhxtla3g46mrrzv2f65nqecel5fu@dqiqsayr4aip>

On Fri, Jun 19, 2026 at 08:43:24AM +0100, Rodrigo Alencar wrote:
> On 18/06/26 21:14, Andy Shevchenko wrote:
> > On Thu, Jun 18, 2026 at 05:14:19PM +0100, Rodrigo Alencar wrote:
> > > On 18/06/26 16:06, Nuno Sá wrote:
> > > > On Thu, Jun 18, 2026 at 02:27:22PM +0100, Rodrigo Alencar via B4 Relay wrote:
> > 
> > ...
> > 
> > > > > +	dev_attr->attr.name = kasprintf(GFP_KERNEL, "%s%s", prefix, postfix);
> > > > > +	if (!dev_attr->attr.name)
> > > > >  		return -ENOMEM;
> > > > 
> > > > I don't oppose the change. Looks like a nice cleanup.
> > 
> > May I oppose it? I found use scnprintf() is harder to follow in comparison to
> > nice kasprintf() that takes care for the dynamically allocated buffer.
> 
> In the next patch the function is reused in a sysfs attribute read handler,
> a context wich would not be nice to have dynamic allocation. vscnprintf() is
> the main building block of sysfs_emit() which limits the buffer length to
> a page size, so I used scnprintf() trying not to deviate much from that. 
> 
> kasprintf() it is still used in the caller, where the logic was a bit confusing
> as it tried to avoid multiple allocations.
>  
> > Also there is a chance to get a name silently cut due to insufficient space.
> > Besides that this function can't be used (again due to 'c') in kasprintf()-like
> > wrapper. I do not consider this as a good approach. Have you looked at seq_buf
> > instead?
> 
> NAME_MAX is not the maximum length a filename can have? I suppose there should be
> enough space for the channel-prefix. Indeed, seq_buf can be used and it cleans up
> things a bit as it tracks the the position in the buffer.
> 
> > 
> > > > But bear in mind this very sensible as any subtle mistake means ABI breakage.
> > 
> > Which immediately raises a question of test coverage. Do we have one? If not,
> > this code must be accompanied with one.
> 
> Agreed. Will see to have tests for v7.

libiio now has an emulator backend. Maybe something that can be used to
test this. But ideally we can have some kunit for validation.

- Nuno Sá

> 
> > > Yes! I tried to be careful... this is dangerous stuff!
> 
> -- 
> Kind regards,
> 
> Rodrigo Alencar

^ permalink raw reply

* Re: [PATCH v8 13/46] KVM: guest_memfd: Add base support for KVM_SET_MEMORY_ATTRIBUTES2
From: Fuad Tabba @ 2026-06-19  9:25 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-13-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce base support for KVM_SET_MEMORY_ATTRIBUTES2 in guest_memfd, which
> just updates attributes tracked by guest_memfd.
>
> Validate input fields in general. Guard usage of KVM_SET_MEMORY_ATTRIBUTES2
> by making sure requested attributes are supported for this instance of kvm.
>
> A new KVM_SET_MEMORY_ATTRIBUTES2 is defined to support writes (unlike
> KVM_SET_MEMORY_ATTRIBUTES) in addition to reads so it can provide error
> details to userspace. This will be used in a later patch.
>
> The two ioctls use their corresponding structs with no overlap, but
> backward compatibility is baked in for future support of
> KVM_SET_MEMORY_ATTRIBUTES2 and struct kvm_memory_attributes2 in the VM
> ioctl.
>
> The process of setting memory attributes is set up such that the later half
> will not fail due to allocation. Any necessary checks are performed before
> the point of no return.
>
> Co-developed-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> Co-developed-by: Sean Christoperson <seanjc@google.com>
> Signed-off-by: Sean Christoperson <seanjc@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Note sure if it's user error on my part, if I'm applying this to the
wrong base, but I found a build break here on patch 13:
kvm_gmem_invalidate_start() doesn't exist in the base tree. The
function is kvm_gmem_invalidate_begin() here. The rename
(190cc5370a8b6) landed via a different merge path and isn't an
ancestor of the stated base.

Patches 19 and 20 have the same mismatch. Fix for all three is
s/kvm_gmem_invalidate_start/kvm_gmem_invalidate_begin/.

Cheers,
/fuad

> ---
>  include/uapi/linux/kvm.h |  13 ++++++
>  virt/kvm/Kconfig         |   1 +
>  virt/kvm/guest_memfd.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      |  12 +++++
>  4 files changed, 142 insertions(+)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 419011097fa8e..956877a6aab05 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1649,6 +1649,19 @@ struct kvm_memory_attributes {
>         __u64 flags;
>  };
>
> +#define KVM_SET_MEMORY_ATTRIBUTES2              _IOWR(KVMIO,  0xd2, struct kvm_memory_attributes2)
> +
> +struct kvm_memory_attributes2 {
> +       union {
> +               __u64 address;
> +               __u64 offset;
> +       };
> +       __u64 size;
> +       __u64 attributes;
> +       __u64 flags;
> +       __u64 reserved[12];
> +};
> +
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>
>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 297e4399fbd49..cfa2c78ba5fb9 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -102,6 +102,7 @@ config KVM_MMU_LOCKLESS_AGING
>
>  config KVM_GUEST_MEMFD
>         select XARRAY_MULTI
> +       select KVM_MEMORY_ATTRIBUTES
>         bool
>
>  config HAVE_KVM_ARCH_GMEM_PREPARE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 65ce795c090d9..0d14548c1ed22 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -541,11 +541,127 @@ bool kvm_gmem_is_private(struct kvm *kvm, gfn_t gfn)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_is_private);
>
> +/*
> + * Preallocate memory for attributes to be stored on a maple tree, pointed to
> + * by mas.  Adjacent ranges with attributes identical to the new attributes
> + * will be merged.  Also sets mas's bounds up for storing attributes.
> + *
> + * This maintains the invariant that ranges with the same attributes will
> + * always be merged.
> + */
> +static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> +                                   pgoff_t start, size_t nr_pages)
> +{
> +       pgoff_t end = start + nr_pages;
> +       pgoff_t last = end - 1;
> +       void *entry;
> +
> +       /* Try extending range. entry is NULL on overflow/wrap-around. */
> +       mas_set(mas, end);
> +       entry = mas_find(mas, end);
> +       if (entry && xa_to_value(entry) == attributes)
> +               last = mas->last;
> +
> +       if (start > 0) {
> +               mas_set(mas, start - 1);
> +               entry = mas_find(mas, start - 1);
> +               if (entry && xa_to_value(entry) == attributes)
> +                       start = mas->index;
> +       }
> +
> +       mas_set_range(mas, start, last);
> +       return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> +}
> +
> +static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> +                                    size_t nr_pages, uint64_t attrs)
> +{
> +       struct address_space *mapping = inode->i_mapping;
> +       struct gmem_inode *gi = GMEM_I(inode);
> +       pgoff_t end = start + nr_pages;
> +       struct maple_tree *mt;
> +       struct ma_state mas;
> +       int r;
> +
> +       mt = &gi->attributes;
> +
> +       filemap_invalidate_lock(mapping);
> +
> +       mas_init(&mas, mt, start);
> +       r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> +       if (r)
> +               goto out;
> +
> +       /*
> +        * From this point on guest_memfd has performed necessary
> +        * checks and can proceed to do guest-breaking changes.
> +        */
> +
> +       kvm_gmem_invalidate_start(inode, start, end);
> +       mas_store_prealloc(&mas, xa_mk_value(attrs));
> +       kvm_gmem_invalidate_end(inode, start, end);
> +out:
> +       filemap_invalidate_unlock(mapping);
> +       return r;
> +}
> +
> +static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> +{
> +       struct gmem_file *f = file->private_data;
> +       struct inode *inode = file_inode(file);
> +       struct kvm_memory_attributes2 attrs;
> +       size_t nr_pages;
> +       pgoff_t index;
> +       int i;
> +
> +       if (copy_from_user(&attrs, argp, sizeof(attrs)))
> +               return -EFAULT;
> +
> +       if (attrs.flags)
> +               return -EINVAL;
> +       for (i = 0; i < ARRAY_SIZE(attrs.reserved); i++) {
> +               if (attrs.reserved[i])
> +                       return -EINVAL;
> +       }
> +       if (!kvm_arch_has_private_mem(f->kvm))
> +               return -EINVAL;
> +       if (attrs.attributes & ~KVM_MEMORY_ATTRIBUTE_PRIVATE)
> +               return -EINVAL;
> +       if (attrs.size == 0 || attrs.offset + attrs.size < attrs.offset)
> +               return -EINVAL;
> +       if (!PAGE_ALIGNED(attrs.offset) || !PAGE_ALIGNED(attrs.size))
> +               return -EINVAL;
> +
> +       if (attrs.offset >= i_size_read(inode) ||
> +           attrs.offset + attrs.size > i_size_read(inode))
> +               return -EINVAL;
> +
> +       nr_pages = attrs.size >> PAGE_SHIFT;
> +       index = attrs.offset >> PAGE_SHIFT;
> +       return __kvm_gmem_set_attributes(inode, index, nr_pages,
> +                                        attrs.attributes);
> +}
> +
> +static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
> +                          unsigned long arg)
> +{
> +       switch (ioctl) {
> +       case KVM_SET_MEMORY_ATTRIBUTES2:
> +               if (!gmem_in_place_conversion)
> +                       return -ENOTTY;
> +
> +               return kvm_gmem_set_attributes(file, (void __user *)arg);
> +       default:
> +               return -ENOTTY;
> +       }
> +}
> +
>  static struct file_operations kvm_gmem_fops = {
>         .mmap           = kvm_gmem_mmap,
>         .open           = generic_file_open,
>         .release        = kvm_gmem_release,
>         .fallocate      = kvm_gmem_fallocate,
> +       .unlocked_ioctl = kvm_gmem_ioctl,
>  };
>
>  static int kvm_gmem_migrate_folio(struct address_space *mapping,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 01761f6e25d25..a08b518cdb175 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -105,6 +105,18 @@ module_param(allow_unsafe_mappings, bool, 0444);
>  bool __ro_after_init gmem_in_place_conversion = false;
>  #endif
>
> +#define MEMORY_ATTRIBUTES_MATCH(one, two)                              \
> +       static_assert(offsetof(struct kvm_memory_attributes, one) ==    \
> +                     offsetof(struct kvm_memory_attributes2, two));    \
> +       static_assert(sizeof_field(struct kvm_memory_attributes, one) ==\
> +                     sizeof_field(struct kvm_memory_attributes2, two))
> +
> +/* Ensure the common parts of the two structs are identical. */
> +MEMORY_ATTRIBUTES_MATCH(address, address);
> +MEMORY_ATTRIBUTES_MATCH(size, size);
> +MEMORY_ATTRIBUTES_MATCH(attributes, attributes);
> +MEMORY_ATTRIBUTES_MATCH(flags, flags);
> +
>  /*
>   * Ordering of locks:
>   *
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v4 0/2] cpufreq: CPPC: add autonomous mode boot parameter support
From: Pierre Gondois @ 2026-06-19  9:29 UTC (permalink / raw)
  To: Viresh Kumar, Sumit Gupta
  Cc: rafael, ionela.voinescu, zhenglifeng1, zhanjie9, corbet, skhan,
	rdunlap, mario.limonciello, linux-pm, linux-doc, linux-kernel,
	linux-tegra, treding, jonathanh, vsethi, ksitaraman, sanjayc,
	mochs, bbasu
In-Reply-To: <oxw5k2wad4vorehgmrduoxblequy3ynqufwy4sruclnh5d5wrb@awzmfafoucnn>


On 6/18/26 07:28, Viresh Kumar wrote:
> On 16-06-26, 18:22, Sumit Gupta wrote:
>> The dependency it was waiting on, the "cpufreq: Set policy->min and
>> max as real QoS constraints" series, is now in linux-pm (linux-next).
>> I rebased on top and verified autonomous mode works as expected, and
>> it applies cleanly on the current linux-next.
>>
>> The [1] reference in patch 2/2 points to v2 of that series; the merged
>> version is v3 [2].
>>
>> If there are no further comments, please consider acking and queuing
>> this for the next cycle.
> I was waiting for CPPC reviewers to provide some feedback.i
>
> Jie / Lifeng / Pierre ?
>
I think the patchset has the same issue described at:

https://lore.kernel.org/all/86780f97-29ee-4a72-b311-38c89434b707@arm.com/

I don't know if this is important to other persons,
but IMO it would be preferable to have a solution to this issue
before adding more functionalities relying on registers that are left
in an unknown state.

If there are any other opinion ?


^ permalink raw reply

* [PATCH 0/5] hwmon: add Altera Stratix 10 SoC FPGA hardware  monitor support
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc

From: Tze Yee Ng <tze.yee.ng@altera.com>

This series adds hardware monitor support for Altera Stratix 10 SoC FPGA
devices. Temperature and voltage sensors are accessed through the 
Stratix 10 service layer and Secure Device Manager.

Patch 1 adds the device tree binding for the hwmon node and sensor layout.
Patch 2 extends the Stratix 10 service layer binding with an optional
hwmon child node. Patch 3 adds async HWMON read commands to the service
firmware driver. Patch 4 adds the hwmon driver, using the async service
interface when available and falling back to synchronous reads otherwise.
Patch 5 enables the hwmon node on the Stratix 10 SoCDK.

Tze Yee Ng (5):
  dt-bindings: hwmon: add Altera Stratix 10 hardware monitor binding
  dt-bindings: firmware: svc: add hwmon property
  firmware: stratix10-svc: add async HWMON read commands
  hwmon: add Stratix 10 SoC FPGA hardware monitor driver
  arm64: dts: socfpga: stratix10: add hwmon node

 .../firmware/intel,stratix10-svc.yaml         |   4 +
 .../bindings/hwmon/altr,stratix10-hwmon.yaml  | 164 +++++
 Documentation/hwmon/index.rst                 |   1 +
 Documentation/hwmon/stratix10-hwmon.rst       |  31 +
 MAINTAINERS                                   |   9 +
 .../boot/dts/altera/socfpga_stratix10.dtsi    |   5 +
 .../dts/altera/socfpga_stratix10_socdk.dts    |  33 +
 drivers/firmware/stratix10-svc.c              |  12 +
 drivers/hwmon/Kconfig                         |  10 +
 drivers/hwmon/Makefile                        |   1 +
 drivers/hwmon/stratix10-hwmon.c               | 575 ++++++++++++++++++
 include/linux/firmware/intel/stratix10-smc.h  |  38 ++
 12 files changed, 883 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
 create mode 100644 Documentation/hwmon/stratix10-hwmon.rst
 create mode 100644 drivers/hwmon/stratix10-hwmon.c

-- 
2.43.7


^ permalink raw reply

* [PATCH 1/5] dt-bindings: hwmon: add Altera Stratix 10 hardware monitor binding
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Document the device tree binding for the Altera Stratix 10 SoC FPGA
hardware monitor, including temperature and voltage sensor nodes.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 .../bindings/hwmon/altr,stratix10-hwmon.yaml  | 164 ++++++++++++++++++
 MAINTAINERS                                   |   7 +
 2 files changed, 171 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml

diff --git a/Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml b/Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
new file mode 100644
index 000000000000..5bd98660ee7b
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
@@ -0,0 +1,164 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/hwmon/altr,stratix10-hwmon.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Altera Hardware Monitor for Stratix 10 SoC FPGA
+
+maintainers:
+  - Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+  - Tze Yee Ng <tze.yee.ng@altera.com>
+
+description: |
+  The Altera Stratix 10 SoC FPGA hardware monitor unit provides on-chip
+  voltage and temperature sensors. These sensors can be used to monitor
+  external voltages and on-chip operating conditions such as internal
+  power rails and on-chip junction temperatures.
+
+  The specific sensor configuration varies by device. Check the device
+  documentation to verify which sensors are available.
+
+  Stratix 10 voltage sensors:
+
+    page 0, channel 2 = 0.8V VCC
+    page 0, channel 3 = 1.8V VCCIO_SDM
+    page 0, channel 6 = 0.9V VCCERAM
+
+  Stratix 10 temperature sensors:
+
+    page 0, channel 0 = main die
+    page 0, channel 1 = tile bottom left
+    page 0, channel 2 = tile middle left
+    page 0, channel 3 = tile top left
+    page 0, channel 4 = tile bottom right
+    page 0, channel 5 = tile middle right
+    page 0, channel 6 = tile top right
+    page 0, channel 7 = hbm2 bottom
+    page 0, channel 8 = hbm2 top
+
+properties:
+  compatible:
+    const: altr,stratix10-hwmon
+
+  temperature:
+    description:
+      The temperature node specifies mappings of temperature sensor diodes on
+      the Stratix 10 SoC FPGA main die and tile die.
+    type: object
+    properties:
+      '#address-cells':
+        const: 1
+      '#size-cells':
+        const: 0
+    patternProperties:
+      "^input(@[0-9a-f]+)?$":
+        description:
+          The input node specifies each individual temperature sensor.
+        type: object
+        properties:
+          reg:
+            description:
+              Sensor channel index in the lower 16-bits (0-15). For temperature
+              sensors, the page number is encoded in the upper 16-bits.
+              The driver encodes the SMC request argument as a channel
+              bitmask (1 << channel) in bits 0..15, with the page number
+              placed in bits 16..31. Channel values >= 16 are rejected to
+              avoid overlap with the page field. For example, reg = <2>
+              selects channel 2 and the driver passes 0x4 to the service layer.
+          label:
+            description:
+              A descriptive name for this channel (e.g. "Main Die" or
+              "Tile Bottom Left").
+        required:
+          - reg
+        additionalProperties: false
+    required:
+      - '#address-cells'
+      - '#size-cells'
+    additionalProperties: false
+
+  voltage:
+    description:
+      The voltage node specifies mappings of voltage sensors on the Stratix 10
+      SoC FPGA analog to digital converter of the Secure Device Manager (SDM).
+    type: object
+    properties:
+      '#address-cells':
+        const: 1
+      '#size-cells':
+        const: 0
+    patternProperties:
+      "^input(@[0-9a-f]+)?$":
+        description:
+          The input node specifies each individual voltage sensor.
+        type: object
+        properties:
+          reg:
+            description:
+              Sensor channel index in the lower 16-bits (0-15). The driver
+              encodes the SMC request argument as a channel bitmask
+              (1 << channel). For example, reg = <2> selects channel 2 and
+              the driver passes 0x4 to the service layer.
+          label:
+            description:
+              A descriptive name for this channel (e.g. "0.8V VCC" or
+              "1.8V VCCIO_SDM").
+        required:
+          - reg
+        additionalProperties: false
+    required:
+      - '#address-cells'
+      - '#size-cells'
+    additionalProperties: false
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+    hwmon {
+      compatible = "altr,stratix10-hwmon";
+
+      voltage {
+        #address-cells = <1>;
+        #size-cells = <0>;
+
+        input@2 {
+          label = "0.8V VCC";
+          reg = <2>;
+        };
+
+        input@3 {
+          label = "1.8V VCCIO_SDM";
+          reg = <3>;
+        };
+
+        input@6 {
+          label = "0.9V VCCERAM";
+          reg = <6>;
+        };
+      };
+
+      temperature {
+        #address-cells = <1>;
+        #size-cells = <0>;
+
+        input@0 {
+          label = "Main Die";
+          reg = <0>;
+        };
+
+        input@1 {
+          label = "Tile Bottom Left";
+          reg = <1>;
+        };
+
+        input@2 {
+          label = "Tile Middle Left";
+          reg = <2>;
+        };
+      };
+    };
diff --git a/MAINTAINERS b/MAINTAINERS
index 6aa3fe2ee1bb..678f6c429627 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -937,6 +937,13 @@ ALPS PS/2 TOUCHPAD DRIVER
 R:	Pali Rohár <pali@kernel.org>
 F:	drivers/input/mouse/alps.*
 
+ALTERA STRATIX 10 SoC FPGA HWMON DRIVER
+M:	Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+M:	Tze Yee Ng <tze.yee.ng@altera.com>
+L:	linux-hwmon@vger.kernel.org
+S:	Maintained
+F:	Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
+
 ALTERA MAILBOX DRIVER
 M:	Tien Sung Ang <tiensung.ang@altera.com>
 S:	Maintained
-- 
2.43.7


^ permalink raw reply related

* [PATCH 2/5] dt-bindings: firmware: svc: add hwmon property
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Altera Stratix 10 SoCFPGA supports hardware monitor access through the
service layer mailbox. Add an optional hwmon child node to the service
layer binding so device trees can describe the hardware monitor.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 .../devicetree/bindings/firmware/intel,stratix10-svc.yaml     | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml b/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml
index b42cfa78b28b..86ffdb10132f 100644
--- a/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml
+++ b/Documentation/devicetree/bindings/firmware/intel,stratix10-svc.yaml
@@ -62,6 +62,10 @@ properties:
     $ref: /schemas/fpga/intel,stratix10-soc-fpga-mgr.yaml
     description: Optional child node for fpga manager to perform fabric configuration.
 
+  hwmon:
+    $ref: /schemas/hwmon/altr,stratix10-hwmon.yaml
+    description: Optional child node for Stratix 10 hardware monitor.
+
 required:
   - compatible
   - method
-- 
2.43.7


^ permalink raw reply related

* [PATCH 3/5] firmware: stratix10-svc: add async HWMON read commands
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Add asynchronous Stratix 10 service layer support for hardware monitor
temperature and voltage read commands in stratix10_svc_async_send() and
stratix10_svc_async_prepare_response().

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 drivers/firmware/stratix10-svc.c             | 12 +++++++
 include/linux/firmware/intel/stratix10-smc.h | 38 ++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/firmware/stratix10-svc.c b/drivers/firmware/stratix10-svc.c
index e9e35d67ef96..2cfdac31402c 100644
--- a/drivers/firmware/stratix10-svc.c
+++ b/drivers/firmware/stratix10-svc.c
@@ -1311,6 +1311,14 @@ int stratix10_svc_async_send(struct stratix10_svc_chan *chan, void *msg,
 		args.a0 = INTEL_SIP_SMC_ASYNC_RSU_NOTIFY;
 		args.a2 = p_msg->arg[0];
 		break;
+	case COMMAND_HWMON_READTEMP:
+		args.a0 = INTEL_SIP_SMC_ASYNC_HWMON_READTEMP;
+		args.a2 = p_msg->arg[0];
+		break;
+	case COMMAND_HWMON_READVOLT:
+		args.a0 = INTEL_SIP_SMC_ASYNC_HWMON_READVOLT;
+		args.a2 = p_msg->arg[0];
+		break;
 	default:
 		dev_err(ctrl->dev, "Invalid command ,%d\n", p_msg->command);
 		ret = -EINVAL;
@@ -1404,6 +1412,10 @@ static int stratix10_svc_async_prepare_response(struct stratix10_svc_chan *chan,
 		 */
 		data->kaddr1 = (void *)&handle->res;
 		break;
+	case COMMAND_HWMON_READTEMP:
+	case COMMAND_HWMON_READVOLT:
+		data->kaddr1 = (void *)&handle->res.a2;
+		break;
 
 	default:
 		dev_alert(ctrl->dev, "Invalid command\n ,%d", p_msg->command);
diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h
index 935dba3633b5..4eb3a6e9659d 100644
--- a/include/linux/firmware/intel/stratix10-smc.h
+++ b/include/linux/firmware/intel/stratix10-smc.h
@@ -680,6 +680,44 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE)
 #define INTEL_SIP_SMC_ASYNC_POLL \
 	INTEL_SIP_SMC_ASYNC_VAL(INTEL_SIP_SMC_ASYNC_FUNC_ID_POLL)
 
+/**
+ * Request INTEL_SIP_SMC_ASYNC_HWMON_READTEMP
+ * Async call to request temperature
+ *
+ * Call register usage:
+ * a0 INTEL_SIP_SMC_ASYNC_HWMON_READTEMP
+ * a1 transaction job id
+ * a2 Temperature Channel
+ * a3-a17 not used
+ *
+ * Return status
+ * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_REJECTED
+ * or INTEL_SIP_SMC_STATUS_BUSY
+ * a1-a17 not used
+ */
+#define INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READTEMP	0xE8
+#define INTEL_SIP_SMC_ASYNC_HWMON_READTEMP \
+	INTEL_SIP_SMC_ASYNC_VAL(INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READTEMP)
+
+/**
+ * Request INTEL_SIP_SMC_ASYNC_HWMON_READVOLT
+ * Async call to request voltage
+ *
+ * Call register usage:
+ * a0 INTEL_SIP_SMC_ASYNC_HWMON_READVOLT
+ * a1 transaction job id
+ * a2 Voltage Channel
+ * a3-a17 not used
+ *
+ * Return status
+ * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_REJECTED
+ * or INTEL_SIP_SMC_STATUS_BUSY
+ * a1-a17 not used
+ */
+#define INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READVOLT	0xE9
+#define INTEL_SIP_SMC_ASYNC_HWMON_READVOLT \
+	INTEL_SIP_SMC_ASYNC_VAL(INTEL_SIP_SMC_ASYNC_FUNC_ID_HWMON_READVOLT)
+
 /**
  * Request INTEL_SIP_SMC_ASYNC_RSU_GET_SPT
  * Async call to get RSU SPT from SDM.
-- 
2.43.7


^ permalink raw reply related

* [PATCH 4/5] hwmon: add Stratix 10 SoC FPGA hardware monitor driver
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Add a hardware monitoring driver for Altera Stratix 10 SoC FPGA devices
that reads temperature and voltage sensors through the Stratix 10 service
layer. Use the asynchronous service layer interface when available, with
a synchronous fallback.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 Documentation/hwmon/index.rst           |   1 +
 Documentation/hwmon/stratix10-hwmon.rst |  31 ++
 MAINTAINERS                             |   2 +
 drivers/hwmon/Kconfig                   |  10 +
 drivers/hwmon/Makefile                  |   1 +
 drivers/hwmon/stratix10-hwmon.c         | 575 ++++++++++++++++++++++++
 6 files changed, 620 insertions(+)
 create mode 100644 Documentation/hwmon/stratix10-hwmon.rst
 create mode 100644 drivers/hwmon/stratix10-hwmon.c

diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 8b655e5d6b68..30f533301903 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -244,6 +244,7 @@ Hardware Monitoring Kernel Drivers
    sparx5-temp
    spd5118
    stpddc60
+   stratix10-hwmon
    surface_fan
    sy7636a-hwmon
    tc654
diff --git a/Documentation/hwmon/stratix10-hwmon.rst b/Documentation/hwmon/stratix10-hwmon.rst
new file mode 100644
index 000000000000..61b682fe177a
--- /dev/null
+++ b/Documentation/hwmon/stratix10-hwmon.rst
@@ -0,0 +1,31 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver stratix10-hwmon
+=============================
+
+Supported chips:
+
+ * Altera Stratix 10 SoC FPGA
+
+Authors:
+      - Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+      - Tze Yee Ng <tze.yee.ng@altera.com>
+
+Description
+-----------
+
+This driver supports hardware monitoring for Altera Stratix 10 SoC FPGA
+devices through the Secure Device Manager and Stratix 10 service layer.
+
+The following sensor types are supported:
+
+  * temperature
+  * voltage
+
+Usage Notes
+-----------
+
+The driver relies on a device tree node to enumerate sensors present on the
+specific device. See
+Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml for details
+of the device-tree node.
diff --git a/MAINTAINERS b/MAINTAINERS
index 678f6c429627..5afdf286f8f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -943,6 +943,8 @@ M:	Tze Yee Ng <tze.yee.ng@altera.com>
 L:	linux-hwmon@vger.kernel.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/hwmon/altr,stratix10-hwmon.yaml
+F:	Documentation/hwmon/stratix10-hwmon.rst
+F:	drivers/hwmon/stratix10-hwmon.c
 
 ALTERA MAILBOX DRIVER
 M:	Tien Sung Ang <tiensung.ang@altera.com>
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 14e4cea48acc..8eff1c71a226 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -2112,6 +2112,16 @@ config SENSORS_SMSC47M192
 	  This driver can also be built as a module. If so, the module
 	  will be called smsc47m192.
 
+config SENSORS_ALTERA_SOCFPGA_STRATIX10
+	tristate "Altera SoC FPGA Stratix 10 hardware monitoring features"
+	depends on INTEL_STRATIX10_SERVICE
+	help
+	  If you say yes here you get support for the temperature and
+	  voltage sensors of Altera SoC FPGA Stratix 10 devices.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called stratix10-hwmon.
+
 config SENSORS_SMSC47B397
 	tristate "SMSC LPC47B397-NC"
 	depends on HAS_IOPORT
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 982ee2c6f9de..7e643de0e7d4 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -217,6 +217,7 @@ obj-$(CONFIG_SENSORS_SMPRO)	+= smpro-hwmon.o
 obj-$(CONFIG_SENSORS_SMSC47B397)+= smsc47b397.o
 obj-$(CONFIG_SENSORS_SMSC47M1)	+= smsc47m1.o
 obj-$(CONFIG_SENSORS_SMSC47M192)+= smsc47m192.o
+obj-$(CONFIG_SENSORS_ALTERA_SOCFPGA_STRATIX10)	+= stratix10-hwmon.o
 obj-$(CONFIG_SENSORS_SPARX5)	+= sparx5-temp.o
 obj-$(CONFIG_SENSORS_SPD5118)	+= spd5118.o
 obj-$(CONFIG_SENSORS_STTS751)	+= stts751.o
diff --git a/drivers/hwmon/stratix10-hwmon.c b/drivers/hwmon/stratix10-hwmon.c
new file mode 100644
index 000000000000..7ed1116e57b8
--- /dev/null
+++ b/drivers/hwmon/stratix10-hwmon.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Altera Stratix 10 SoC FPGA hardware monitoring driver
+ *
+ * Copyright (c) 2026 Altera Corporation
+ *
+ * Authors:
+ *	Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
+ *	Tze Yee Ng <tze.yee.ng@altera.com>
+ */
+
+#include <linux/bitops.h>
+#include <linux/cleanup.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/firmware/intel/stratix10-svc-client.h>
+#include <linux/hwmon.h>
+#include <linux/hwmon-sysfs.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+
+#define HWMON_TIMEOUT			msecs_to_jiffies(SVC_HWMON_REQUEST_TIMEOUT_MS)
+#define HWMON_RETRY_SLEEP_MS		1U
+#define HWMON_ASYNC_MSG_RETRY		3U
+#define STRATIX10_HWMON_MAXSENSORS	16
+#define STRATIX10_HWMON_TEMPERATURE	"temperature"
+#define STRATIX10_HWMON_VOLTAGE		"voltage"
+#define STRATIX10_HWMON_CHANNEL_MASK	GENMASK(15, 0)
+#define STRATIX10_HWMON_PAGE_SHIFT	16
+#define STRATIX10_HWMON_ATTR_VISIBLE	0444
+/* Temperature from SDM is signed Q8.8 millidegrees Celsius (8 fractional bits). */
+#define STRATIX10_HWMON_TEMP_FRAC_BITS	8
+#define STRATIX10_HWMON_TEMP_FRAC_DIV	BIT(STRATIX10_HWMON_TEMP_FRAC_BITS)
+/* Voltage from SDM is unsigned Q16 (millivolts, 16 fractional bits). */
+#define STRATIX10_HWMON_VOLT_FRAC_BITS	16
+#define STRATIX10_HWMON_VOLT_FRAC_DIV	BIT(STRATIX10_HWMON_VOLT_FRAC_BITS)
+
+#define ETEMP_INACTIVE			0x80000000U
+#define ETEMP_TOO_OLD			0x80000001U
+#define ETEMP_NOT_PRESENT		0x80000002U
+#define ETEMP_TIMEOUT			0x80000003U
+#define ETEMP_CORRUPT			0x80000004U
+#define ETEMP_BUSY			0x80000005U
+#define ETEMP_NOT_INITIALIZED		0x800000FFU
+
+struct stratix10_hwmon_priv {
+	struct stratix10_svc_chan *chan;
+	struct stratix10_svc_client client;
+	struct completion completion;
+	struct mutex lock;	/* protect SVC calls */
+	bool async;
+	u32 temperature;
+	u32 voltage;
+	int temperature_channels;
+	int voltage_channels;
+	const char *temp_chan_names[STRATIX10_HWMON_MAXSENSORS];
+	const char *volt_chan_names[STRATIX10_HWMON_MAXSENSORS];
+	u32 temp_chan[STRATIX10_HWMON_MAXSENSORS];
+	u32 volt_chan[STRATIX10_HWMON_MAXSENSORS];
+};
+
+static umode_t stratix10_hwmon_is_visible(const void *dev,
+					  enum hwmon_sensor_types type,
+					 u32 attr, int chan)
+{
+	const struct stratix10_hwmon_priv *priv = dev;
+
+	switch (type) {
+	case hwmon_temp:
+		if (chan < priv->temperature_channels)
+			return STRATIX10_HWMON_ATTR_VISIBLE;
+		return 0;
+	case hwmon_in:
+		if (chan < priv->voltage_channels)
+			return STRATIX10_HWMON_ATTR_VISIBLE;
+		return 0;
+	default:
+		return 0;
+	}
+}
+
+static void stratix10_hwmon_readtemp_cb(struct stratix10_svc_client *client,
+					struct stratix10_svc_cb_data *data)
+{
+	struct stratix10_hwmon_priv *priv = client->priv;
+
+	if (data->status == BIT(SVC_STATUS_OK)) {
+		priv->temperature = (u32)*(unsigned long *)data->kaddr1;
+	} else if (data->kaddr1) {
+		dev_err(client->dev, "%s failed with status 0x%x, value 0x%lx\n",
+			__func__, data->status,
+			*(unsigned long *)data->kaddr1);
+	} else {
+		dev_err(client->dev, "%s failed with status 0x%x\n",
+			__func__, data->status);
+	}
+
+	complete(&priv->completion);
+}
+
+static void stratix10_hwmon_readvolt_cb(struct stratix10_svc_client *client,
+					struct stratix10_svc_cb_data *data)
+{
+	struct stratix10_hwmon_priv *priv = client->priv;
+
+	if (data->status == BIT(SVC_STATUS_OK)) {
+		priv->voltage = (u32)*(unsigned long *)data->kaddr1;
+	} else if (data->kaddr1) {
+		dev_err(client->dev, "%s failed with status 0x%x, value 0x%lx\n",
+			__func__, data->status,
+			*(unsigned long *)data->kaddr1);
+	} else {
+		dev_err(client->dev, "%s failed with status 0x%x\n",
+			__func__, data->status);
+	}
+
+	complete(&priv->completion);
+}
+
+static void stratix10_hwmon_async_callback(void *ptr)
+{
+	if (ptr)
+		complete(ptr);
+}
+
+static int stratix10_hwmon_parse_temp(long *val, u32 temperature)
+{
+	switch (temperature) {
+	case ETEMP_INACTIVE:
+	case ETEMP_NOT_PRESENT:
+	case ETEMP_CORRUPT:
+	case ETEMP_NOT_INITIALIZED:
+		return -EOPNOTSUPP;
+	case ETEMP_TIMEOUT:
+	case ETEMP_BUSY:
+	case ETEMP_TOO_OLD:
+		return -EAGAIN;
+	default:
+		/* Convert Q8.8 millidegrees Celsius to millidegrees for hwmon. */
+		*val = (long)(s32)temperature / STRATIX10_HWMON_TEMP_FRAC_DIV;
+		return 0;
+	}
+}
+
+static int stratix10_hwmon_encode_temp_arg(u32 reg, u64 *arg)
+{
+	u32 page = (reg >> STRATIX10_HWMON_PAGE_SHIFT) & STRATIX10_HWMON_CHANNEL_MASK;
+	u32 channel = reg & STRATIX10_HWMON_CHANNEL_MASK;
+
+	if (channel >= STRATIX10_HWMON_MAXSENSORS)
+		return -EINVAL;
+
+	*arg = (1ULL << channel) | ((u64)page << STRATIX10_HWMON_PAGE_SHIFT);
+	return 0;
+}
+
+static int stratix10_hwmon_encode_volt_arg(u32 reg, u64 *arg)
+{
+	u32 channel = reg & STRATIX10_HWMON_CHANNEL_MASK;
+
+	if (channel >= STRATIX10_HWMON_MAXSENSORS)
+		return -EINVAL;
+
+	*arg = 1ULL << channel;
+	return 0;
+}
+
+static int stratix10_hwmon_async_read(struct device *dev,
+				      enum hwmon_sensor_types type,
+				     struct stratix10_svc_client_msg *msg)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+	struct stratix10_svc_cb_data data = {};
+	struct completion completion;
+	unsigned long wait_ret;
+	void *handle = NULL;
+	int status, index, ret;
+
+	init_completion(&completion);
+
+	for (index = 0; index < HWMON_ASYNC_MSG_RETRY; index++) {
+		status = stratix10_svc_async_send(priv->chan, msg, &handle,
+						  stratix10_hwmon_async_callback,
+						  &completion);
+		if (status == 0)
+			break;
+		dev_warn(dev, "Failed to send async message\n");
+		msleep(HWMON_RETRY_SLEEP_MS);
+	}
+
+	if (status && !handle)
+		return status;
+
+	wait_ret = wait_for_completion_io_timeout(&completion, HWMON_TIMEOUT);
+	if (wait_ret > 0)
+		dev_dbg(dev, "Received async interrupt\n");
+	else if (wait_ret == 0)
+		dev_dbg(dev, "Timeout occurred, trying to poll the response\n");
+
+	ret = -ETIMEDOUT;
+	for (index = 0; index < HWMON_ASYNC_MSG_RETRY; index++) {
+		status = stratix10_svc_async_poll(priv->chan, handle, &data);
+		if (status == -EAGAIN) {
+			dev_dbg(dev, "Async message is still in progress\n");
+		} else if (status < 0) {
+			dev_alert(dev, "Failed to poll async message: %d\n", status);
+			ret = status;
+			break;
+		} else if (status == 0) {
+			ret = 0;
+			break;
+		}
+		msleep(HWMON_RETRY_SLEEP_MS);
+	}
+
+	if (ret) {
+		dev_err(dev, "Failed to get async response\n");
+		goto done;
+	}
+
+	if (data.status) {
+		dev_err(dev, "%s returned 0x%x from SDM\n", __func__,
+			data.status);
+		ret = -EFAULT;
+		goto done;
+	}
+
+	if (type == hwmon_temp)
+		priv->temperature = (u32)*(unsigned long *)data.kaddr1;
+	else
+		priv->voltage = (u32)*(unsigned long *)data.kaddr1;
+
+	ret = 0;
+
+done:
+	stratix10_svc_async_done(priv->chan, handle);
+	return ret;
+}
+
+static int stratix10_hwmon_sync_read(struct device *dev,
+				     enum hwmon_sensor_types type,
+				    struct stratix10_svc_client_msg *msg)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+	int ret;
+
+	reinit_completion(&priv->completion);
+
+	if (type == hwmon_temp)
+		priv->client.receive_cb = stratix10_hwmon_readtemp_cb;
+	else
+		priv->client.receive_cb = stratix10_hwmon_readvolt_cb;
+
+	ret = stratix10_svc_send(priv->chan, msg);
+	if (ret < 0)
+		goto status_done;
+
+	ret = wait_for_completion_interruptible_timeout(&priv->completion,
+							HWMON_TIMEOUT);
+	if (!ret) {
+		dev_err(priv->client.dev, "timeout waiting for SMC call\n");
+		ret = -ETIMEDOUT;
+		goto status_done;
+	}
+	if (ret < 0) {
+		dev_err(priv->client.dev, "error %d waiting for SMC call\n", ret);
+		goto status_done;
+	}
+
+	ret = 0;
+
+status_done:
+	stratix10_svc_done(priv->chan);
+	return ret;
+}
+
+static int stratix10_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+				u32 attr, int chan, long *val)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+	struct stratix10_svc_client_msg msg = {0};
+	int ret;
+
+	if (chan >= STRATIX10_HWMON_MAXSENSORS)
+		return -EOPNOTSUPP;
+
+	switch (type) {
+	case hwmon_temp:
+		ret = stratix10_hwmon_encode_temp_arg(priv->temp_chan[chan],
+						      &msg.arg[0]);
+		if (ret)
+			return ret;
+		msg.command = COMMAND_HWMON_READTEMP;
+		break;
+	case hwmon_in:
+		ret = stratix10_hwmon_encode_volt_arg(priv->volt_chan[chan],
+						      &msg.arg[0]);
+		if (ret)
+			return ret;
+		msg.command = COMMAND_HWMON_READVOLT;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	guard(mutex)(&priv->lock);
+	if (priv->async)
+		ret = stratix10_hwmon_async_read(dev, type, &msg);
+	else
+		ret = stratix10_hwmon_sync_read(dev, type, &msg);
+	if (ret)
+		return ret;
+
+	if (type == hwmon_temp)
+		ret = stratix10_hwmon_parse_temp(val, priv->temperature);
+	else
+		/* Convert Q16 millivolts to millivolts for hwmon. */
+		*val = (long)priv->voltage / STRATIX10_HWMON_VOLT_FRAC_DIV;
+	return ret;
+}
+
+static int stratix10_hwmon_read_string(struct device *dev,
+				       enum hwmon_sensor_types type, u32 attr,
+				      int chan, const char **str)
+{
+	struct stratix10_hwmon_priv *priv = dev_get_drvdata(dev);
+
+	switch (type) {
+	case hwmon_in:
+		*str = priv->volt_chan_names[chan];
+		return 0;
+	case hwmon_temp:
+		*str = priv->temp_chan_names[chan];
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static const struct hwmon_ops stratix10_hwmon_ops = {
+	.is_visible = stratix10_hwmon_is_visible,
+	.read = stratix10_hwmon_read,
+	.read_string = stratix10_hwmon_read_string,
+};
+
+static const struct hwmon_channel_info *stratix10_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(temp,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL),
+	HWMON_CHANNEL_INFO(in,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL,
+			   HWMON_I_INPUT | HWMON_I_LABEL),
+	NULL
+};
+
+static const struct hwmon_chip_info stratix10_hwmon_chip_info = {
+	.ops = &stratix10_hwmon_ops,
+	.info = stratix10_hwmon_info,
+};
+
+static int stratix10_hwmon_add_channel(struct device *dev, const char *type,
+				       u32 val, const char *label,
+				      struct stratix10_hwmon_priv *priv)
+{
+	if (!strcmp(type, STRATIX10_HWMON_TEMPERATURE)) {
+		if (priv->temperature_channels >= STRATIX10_HWMON_MAXSENSORS) {
+			dev_warn(dev, "Can't add temp node %s, too many channels\n",
+				 label);
+			return 0;
+		}
+
+		priv->temp_chan_names[priv->temperature_channels] = label;
+		priv->temp_chan[priv->temperature_channels] = val;
+		priv->temperature_channels++;
+		return 0;
+	}
+
+	if (!strcmp(type, STRATIX10_HWMON_VOLTAGE)) {
+		if (priv->voltage_channels >= STRATIX10_HWMON_MAXSENSORS) {
+			dev_warn(dev, "Can't add voltage node %s, too many channels\n",
+				 label);
+			return 0;
+		}
+
+		priv->volt_chan_names[priv->voltage_channels] = label;
+		priv->volt_chan[priv->voltage_channels] = val;
+		priv->voltage_channels++;
+		return 0;
+	}
+
+	dev_warn(dev, "unsupported sensor type %s\n", type);
+	return 0;
+}
+
+static int stratix10_hwmon_probe_child_from_dt(struct device *dev,
+					       struct device_node *child,
+					      struct stratix10_hwmon_priv *priv)
+{
+	struct device_node *grandchild;
+	const char *label;
+	u32 val;
+	int ret;
+
+	for_each_child_of_node(child, grandchild) {
+		ret = of_property_read_u32(grandchild, "reg", &val);
+		if (ret) {
+			dev_err(dev, "missing reg property of %pOFn\n",
+				grandchild);
+			of_node_put(grandchild);
+			return ret;
+		}
+
+		ret = of_property_read_string(grandchild, "label", &label);
+		if (ret)
+			label = grandchild->name;
+
+		stratix10_hwmon_add_channel(dev, child->name, val, label, priv);
+	}
+
+	return 0;
+}
+
+static int stratix10_hwmon_probe_from_dt(struct device *dev,
+					 struct stratix10_hwmon_priv *priv)
+{
+	struct device_node *child;
+	int ret;
+
+	if (!dev->of_node)
+		return 0;
+
+	for_each_child_of_node(dev->of_node, child) {
+		ret = stratix10_hwmon_probe_child_from_dt(dev, child, priv);
+		if (ret) {
+			of_node_put(child);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int stratix10_hwmon_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct stratix10_hwmon_priv *priv;
+	struct device *hwmon_dev;
+	int ret;
+
+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->client.dev = dev;
+	priv->client.priv = priv;
+	init_completion(&priv->completion);
+	mutex_init(&priv->lock);
+
+	ret = stratix10_hwmon_probe_from_dt(dev, priv);
+	if (ret) {
+		dev_err(dev, "Unable to probe from device tree\n");
+		return ret;
+	}
+
+	if (!priv->temperature_channels && !priv->voltage_channels) {
+		dev_err(dev, "no temperature or voltage channels in device tree\n");
+		return -ENODEV;
+	}
+
+	priv->chan = stratix10_svc_request_channel_byname(&priv->client,
+							  SVC_CLIENT_HWMON);
+	if (IS_ERR(priv->chan)) {
+		ret = PTR_ERR(priv->chan);
+		if (ret == -EPROBE_DEFER)
+			dev_dbg(dev, "service channel %s not ready, deferring probe\n",
+				SVC_CLIENT_HWMON);
+		else
+			dev_err(dev, "couldn't get service channel %s: %d\n",
+				SVC_CLIENT_HWMON, ret);
+		return ret;
+	}
+
+	ret = stratix10_svc_add_async_client(priv->chan, false);
+	switch (ret) {
+	case 0:
+		priv->async = true;
+		break;
+	case -EINVAL:
+		dev_dbg(dev, "async operations not supported, using sync mode\n");
+		priv->async = false;
+		break;
+	default:
+		dev_err(dev, "failed to add async client: %d\n", ret);
+		stratix10_svc_free_channel(priv->chan);
+		return ret;
+	}
+
+	dev_info(dev, "Initialized %d temperature and %d voltage channels\n",
+		 priv->temperature_channels, priv->voltage_channels);
+
+	hwmon_dev = devm_hwmon_device_register_with_info(dev, "stratix10_hwmon",
+							 priv,
+							 &stratix10_hwmon_chip_info,
+							 NULL);
+	if (IS_ERR(hwmon_dev)) {
+		if (priv->async)
+			stratix10_svc_remove_async_client(priv->chan);
+		stratix10_svc_free_channel(priv->chan);
+		return PTR_ERR(hwmon_dev);
+	}
+
+	platform_set_drvdata(pdev, priv);
+	return 0;
+}
+
+static void stratix10_hwmon_remove(struct platform_device *pdev)
+{
+	struct stratix10_hwmon_priv *priv = platform_get_drvdata(pdev);
+
+	if (priv->async)
+		stratix10_svc_remove_async_client(priv->chan);
+	stratix10_svc_free_channel(priv->chan);
+}
+
+static const struct of_device_id stratix10_hwmon_of_match[] = {
+	{ .compatible = "altr,stratix10-hwmon" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, stratix10_hwmon_of_match);
+
+static struct platform_driver stratix10_hwmon_driver = {
+	.driver = {
+		.name = "stratix10-hwmon",
+		.of_match_table = stratix10_hwmon_of_match,
+	},
+	.probe = stratix10_hwmon_probe,
+	.remove = stratix10_hwmon_remove,
+};
+module_platform_driver(stratix10_hwmon_driver);
+
+MODULE_AUTHOR("Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>");
+MODULE_AUTHOR("Tze Yee Ng <tze.yee.ng@altera.com>");
+MODULE_DESCRIPTION("Altera Stratix 10 SoC FPGA hardware monitoring driver");
+MODULE_LICENSE("GPL");
-- 
2.43.7


^ permalink raw reply related

* [PATCH 5/5] arm64: dts: socfpga: stratix10: add hwmon node
From: tze.yee.ng @ 2026-06-19  9:38 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-hwmon, devicetree, linux-kernel, Dinh Nguyen, Mahesh Rao,
	Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <cover.1781861409.git.tze.yee.ng@altera.com>

From: Tze Yee Ng <tze.yee.ng@altera.com>

Add an hwmon child node under the Stratix 10 service layer and describe
the SoCDK voltage and temperature sensors using the altr,stratix10-hwmon
compatible.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 .../boot/dts/altera/socfpga_stratix10.dtsi    |  5 +++
 .../dts/altera/socfpga_stratix10_socdk.dts    | 33 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 0d9cad0c0351..afb11e6f6813 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -78,6 +78,11 @@ svc {
 			fpga_mgr: fpga-mgr {
 				compatible = "intel,stratix10-soc-fpga-mgr";
 			};
+
+			temp_volt: hwmon {
+				compatible = "altr,stratix10-hwmon";
+				status = "disabled";
+			};
 		};
 	};
 
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts
index e2a1cea7f3da..01a8ffe430ed 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts
@@ -134,3 +134,36 @@ root: partition@4200000 {
 		};
 	};
 };
+
+&temp_volt {
+	status = "okay";
+
+	voltage {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		input@2 {
+			label = "0.8V VCC";
+			reg = <2>;
+		};
+
+		input@3 {
+			label = "1.8V VCCIO_SDM";
+			reg = <3>;
+		};
+
+		input@6 {
+			label = "0.9V VCCERAM";
+			reg = <6>;
+		};
+	};
+
+	temperature {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		input@0 {
+			label = "Main Die SDM";
+			reg = <0x0>;
+		};
+	};
+};
-- 
2.43.7


^ permalink raw reply related

* Re: [PATCH v8 08/46] KVM: Provide generic interface for checking memory private/shared status
From: Suzuki K Poulose @ 2026-06-19  9:57 UTC (permalink / raw)
  To: Fuad Tabba, ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Baoquan He, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTxu32nQ+vPV7Zmcw76_-V4g2_g=P_UzRnnO2dP1PFO2ww@mail.gmail.com>

On 19/06/2026 09:21, Fuad Tabba wrote:
> On Fri, 19 Jun 2026 at 09:19, Fuad Tabba <tabba@google.com> wrote:
>>
>> On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
>> <devnull+ackerleytng.google.com@kernel.org> wrote:
>>>
>>> From: Sean Christopherson <seanjc@google.com>
>>>
>>> Introduce a generic kvm_mem_is_private() interface using a static call to
>>> determine if a GFN is private. This allows the implementation for checking
>>> a GFN's private/shared status to be set at runtime.
>>>
>>> In preparation for choosing implementations between a guest_memfd lookup
>>> and the existing VM attribute lookup, rename the existing
>>> VM-attribute-based check to kvm_vm_mem_is_private to emphasize that it
>>> looks up VM attributes.
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>
>> (SoB fix plz)
>>
>> Reviewed-by: Fuad Tabba <tabba@google.com>
>>
>> Cheers,
>> /fuad
>>> ---
>>>   include/linux/kvm_host.h | 12 +++++++++++-
>>>   virt/kvm/kvm_main.c      | 15 +++++++++++++++
>>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index eb26d4ea8945a..3915da2a61778 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -2546,7 +2546,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
>>>   bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>>>                                           struct kvm_gfn_range *range);
>>>
>>> -static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> +static inline bool kvm_vm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> 
> Should have read the Sashiko review first, but where is this used?
> It's not used at all in this series...

See below:

> 
> /fuad
> 
>>>   {
>>>          return kvm_get_vm_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>   }
>>> @@ -2557,6 +2557,16 @@ static inline bool kvm_mem_range_is_private(struct kvm *kvm, gfn_t start,
>>>                                                    KVM_MEMORY_ATTRIBUTE_PRIVATE,
>>>                                                    KVM_MEMORY_ATTRIBUTE_PRIVATE);
>>>   }
>>> +#endif  /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>>> +
>>> +#ifdef kvm_arch_has_private_mem
>>> +typedef bool (kvm_mem_is_private_t)(struct kvm *kvm, gfn_t gfn);
>>> +DECLARE_STATIC_CALL(__kvm_mem_is_private, kvm_mem_is_private_t);
>>> +
>>> +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +       return static_call(__kvm_mem_is_private)(kvm, gfn);
>>> +}
>>>   #else
>>>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>   {
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 6669f1477013c..8b238e461b854 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -2627,6 +2627,20 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>>>   }
>>>   #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>>>
>>> +#ifdef kvm_arch_has_private_mem
>>> +DEFINE_STATIC_CALL_RET0(__kvm_mem_is_private, kvm_mem_is_private_t);
>>> +EXPORT_STATIC_CALL_GPL(__kvm_mem_is_private);
>>> +
>>> +static void kvm_init_memory_attributes(void)
>>> +{
>>> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>>> +       static_call_update(__kvm_mem_is_private, kvm_vm_mem_is_private);
>>> +#endif
>>> +}


Here ^^ as the static call update ?


Suzuki

^ permalink raw reply

* Re: [PATCH v8 15/46] KVM: guest_memfd: Call arch invalidate hooks on conversion
From: Fuad Tabba @ 2026-06-19 10:09 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-15-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> When memory in guest_memfd is converted from private to shared, the
> platform-specific state associated with the guest-private pages must be
> invalidated or cleaned up.
>
> Iterate over the folios in the affected range and call the
> kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> architectures to perform necessary teardown, such as updating hardware
> metadata or encryption states, before the pages are transitioned to the
> shared state.
>
> Invoke this helper after indicating to KVM's mmu code that an invalidation
> is in progress to stop in-flight page faults from succeeding.
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Coming back to this after working through the arm64/pKVM side. My
Reviewed-by here is from the previous round and the patch hasn't
changed, but I missed an implication for arm64.

kvm_arch_gmem_invalidate() is now called from two paths with the same
(start, end) signature: folio teardown (kvm_gmem_free_folio) and
private->shared conversion (here). For SNP/TDX that's fine, conversion is
destructive anyway. For pKVM the two need opposite content semantics:
conversion must preserve the page in place (same physical page, the point
of in-place conversion without encryption), while teardown must scrub it
before returning it to the host.

The hook gets only a pfn range with no indication of which caller it's
serving, so arm64 can't give the two paths the behaviour they need. It
would help to signal intent on the conversion path: a reason/flag, a
separate hook, or not routing non-destructive conversion through the
teardown hook.

arm64 isn't here yet, so this isn't urgent, but the hook is gaining a
second caller now, and it's cheaper to leave room for the distinction
than to change a generic contract other arches depend on later.

Cheers,
/fuad


> ---
>  virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 433f79047b9d1..3c94442bc8131 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
>         return safe;
>  }
>
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +       struct folio_batch fbatch;
> +       pgoff_t next = start;
> +       int i;
> +
> +       folio_batch_init(&fbatch);
> +       while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) {
> +               for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> +                       struct folio *folio = fbatch.folios[i];
> +                       pgoff_t start_index, end_index;
> +                       kvm_pfn_t start_pfn, end_pfn;
> +
> +                       start_index = max(start, folio->index);
> +                       end_index = min(end, folio_next_index(folio));
> +                       /*
> +                        * end_index is either in folio or points to
> +                        * the first page of the next folio. Hence,
> +                        * all pages in range [start_index, end_index)
> +                        * are contiguous.
> +                        */
> +                       start_pfn = folio_file_pfn(folio, start_index);
> +                       end_pfn = start_pfn + end_index - start_index;
> +
> +                       kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> +               }
> +
> +               folio_batch_release(&fbatch);
> +               cond_resched();
> +       }
> +}
> +#else
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> +#endif
> +
>  static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
>                                      size_t nr_pages, uint64_t attrs,
>                                      pgoff_t *err_index)
> @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
>          */
>
>         kvm_gmem_invalidate_start(inode, start, end);
> +
> +       if (!to_private)
> +               kvm_gmem_invalidate(inode, start, end);
> +
>         mas_store_prealloc(&mas, xa_mk_value(attrs));
> +
>         kvm_gmem_invalidate_end(inode, start, end);
>  out:
>         filemap_invalidate_unlock(mapping);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v4 00/31] Introduce SCMI Telemetry FS support
From: David Hildenbrand (Arm) @ 2026-06-19 10:16 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Christian Brauner, linux-kernel, linux-arm-kernel, arm-scmi,
	linux-fsdevel, linux-doc, sudeep.holla, james.quinlan, f.fainelli,
	vincent.guittot, etienne.carriere, peng.fan, michal.simek, d-gole,
	jic23, elif.topuz, lukasz.luba, philip.radford,
	souvik.chakravarty, leitao, kas, puranjay, usama.arif,
	kernel-team
In-Reply-To: <ajR_FBWOoXJKSeoH@pluto>


>> Is the configuration aspect limited to enabling selected events, or is there
>> more that can be configured?
>>
> 
> The needed configuration is:
> 
>  - global Telemetry enable (tlm_enable)
>  - global common update_interval (current_update_interval)

Okay, so simple global properties.

>  - per-DE enable/disable (des/0x<NNNN>/enable)
>  - per-DE timestamping enable/disable (des/0x<NNNN>/tstamp_enable)
> 
>  ... then there are a couple of handy catch-all entries:
> 	all_des_enable, all_des_tstamp_enable

Okay, so fairly trivial configs.
> 
> Note that all the existent DEs are discovered at runtime dynamically via
> SCMI in the background at init/probe and then never change: i.e.
> the tree is statically created upon discovery, user cannot
> create/destroy or symlink files at will, nor the backend platform FW
> running the SCMI server can pop-up new DataEvents after the initial
> enumeration.

That makes sense.

> 
> All the above configs can also be pre-defined in the FW (at built time)
> as being default boot-on with predefined values, like a specific
> boot-on update interval, so that you could have a system in which really
> you dont need to configure anything...everything is on and you just
> read data. (unless you want to change config of course...)

Okay, so the initial value of some parameters might not be "disabled" etc.

I guess, from a user space perspective, reading should be allowed by everyone
but writing should be limited to root?

> 
> There is more stuff that indeed is configurable per the SCMI spec
> but these additional params are hidden into the SCMI Telemetry protocol
> layer (the initial patches in this series) and NOT made available to
> the driver/users of the protocol (like the SCMI FS driver that sits on
> top)

Do you assume that there will get significantly more config options added in the
future for user space to configure?

> 
> IOW, this humonguos series (~8k lines) is only partially composed by
> the Filesystem driver (~3k): the bulk of the Telemetry logic and SCMI
> message exchanges are contained in the SCMI Protocol stack which has
> been extended to support the Telemerty protocol at first
> (the 'firmware: arm_scmi:' initial patches).
> 
> This latter common support is exposed by the SCMI stack for the SCMI
> drivers to use via custom per-protocol operations (not an orginal name :P)
> exposed in include/linux/scmi_protocol.h
> 
> So when you write into FS to configure smth, you end up calling an internal
> tlm_proto_ops that in turn will cause an SCMI message to be sent
> (in some cases say to enable a DE or set the update interval)

Makes sense.

> 
> When you read something, you end up calling another Telemetry operation
> that in turn returns you the DataEvent value you were looking for...how
> this is retrieved via SCMI in the background is transparent to the
> FS driver because, again, these details are buried into the protocol
> layer. Talking about reads, you can:
> 
>  - read a single value from des/0x<NNNN>/value
>  - read ALL the currently enabled DE in a bulk read via des_bulk_read
> 
> ...most of the other entries in the tree are simply RO properties of the DEs
> that have been discovered at enumeration time.

Is this bulk-reading relevant for performance or just a "nice to have" ?


> 
> Given that walking a FS tree and issuing configuration as writes is NOT
> performant really (nor handy if you are not a human), currently, even
> in this FS-based series you can really perform all of the discovery AND
> the configuration tasks WITHOUT walking the filesystem tree, but instead
> issuing a bunch of IOCTLs issued on a special 'control' file that I
> embedded in the FS. Such UAPI IOCTLs described at:

Makes sense.

> 
> https://lore.kernel.org/arm-scmi/20260612223802.1337232-6-cristian.marussi@arm.com/T/#u
>  
> So my plan of action in order to get rid of the FS in-kenel implementation
> would be to drop this Filesystem in favour of simple character devices
> and move the existent IOCTLs interface (revisited where needed) on top of
> these devices: that way you will be able to use IOCTLs to enumerate the
> Telemetry sources and then configure them.
> 
> Read will then happen (probably) leveraging a number of chardev fops like:
> IOCTLs, .read and .mmap...up to the tool decide what to use.
> 
> After this porting to chardev is done, I would start optionally exposing
> again all of this in a human-readable alternative way by adding a layer
> of FUSE on top of this chardev interface.

Yes. How high-priority is the fs side? Or would a tool using a library to access
this information also work in the first step?

> 
> Basically my aim is to drop the FS implementation from the kernel, as
> advised, while trying to optionally make it still available via a userspace
> FUSE implementation...IOW the intention would be for the next V5 to expose
> the same interfaces as V4 but with the help of a tool instead that builds,
> if wanted, a FUSE mount built on top of the chardev interface.
> 
> So basically 'floating up' the current FS-like interface into userspace.

Yes.

> 
>>
>> You mention json here ... but I assume the data we are getting fed by the
>> protocol is not in some default format? (e.g., json)
> 
> The data format is defined by the SCMI spec and it is buried in the SCMI
> layer, there are a number of collection method and a number of formats: this
> is NOT exposed from the SCMI core BUT handled transparently.
> 
> The raw spec format basically defines how DE ID, Tstamps, values are represented
> in memory and how their consistency can be assured despite the fact that
> platform could update the same entries that a user is concurrently reading...
> 
> JSON definitions only assign a semantic to the DEs (in theory...): e.g. on this
> specific platform...wth is 0x1234 ? ..also note that JSON defs are NOT part of
> the spec....they do NOT really exist for the Kernel: they are parsed and
> interpreted by more complex user space tools that are supposed to leverage some
> of these interfaces to retrieve data and carry-on analysis.

What I thought, thanks.

> 
>>
>>
>> Maybe you have it in some of the patches here, but what does the typical
>> directory + file structure look like in the current implementation?
>>
>> Do you have an example?
>>
>> Also, is everything in that filesystem read-only, or are there some writable
>> file (IOW, how is stuff configured?).
> 
> See above for config/write entry ... and I think you found the FS layout in the
> doc already...
> 
>>
>>
>> Okay, so you really only feed this data to user space, exposing all the data you
>> have easily available as part of the protocol.
> 
> Yes, no interpetation nor filtering: I expose all that have enumerated and/
> discovered by the protocol, allowing for configurations while hiding the inner
> SCMI Telemetry mechanism...
> 
>>
>>
>> It's a good question how that could be done, if you need more information about
>> these events from user space.
> 
> I have NOT really delved into that, so as of know we do NOT fed any data
> to existing Kernel subsystems, not there is any available in-kernel
> interface to consume DE data (nobody asked), but, I can imagine 2 solution:
> 
>  - our beloved architects decide to 'architect' more DataEvents in the
>    next version of the spec.. i.e. they reserve some specific DE IDs to
>    represent some well defined entity (like it is done already in the spec
>    for a dozen IDs)...this avoids the needs of any new interface all
>    together

That would be the cleanest solution :)

> 
> OR
> 
> - we open some sort of user-->kernel ABI channel 'somewhere' where the
>   userspace tool, interpreting the JSON description, can communicate something
>   like " on this platform ID 1,2,3,4 should be fed to the IIO sensors frmwk
>   too, while ID 39,8,76 can be fed to HWMON..." etc
> 
>>
>> [...]
>>
>>
>> That sounds reasonable.
>>
>> [...]
>>
>>> ...I would not say that this was the kind of feedback I was hoping for,
>>> but I am NOT gonna argue, given that you shot down already what I thought
>>> were all my best selling points :P
>>>
>>> At this point my understanding is that the way forward must be to use
>>> a custom tool to configure/extract/translate the raw Telemetry data and
>>> move up into userspace the whole human readable FS layer via FUSE, if
>>> really needed.
>>>
>>> I suppose that the new kernel/user interface has to be some dedicated char
>>> device implementing proper fops. (like I did previously in early versions
>>> of this series and then abandoned...)
>>>
>>> Is this you have in mind ? Dedicated character device(s) with enough fops
>>> to be able to configure/extract Telemetry data with a custom tool ?
>>
>> I cannot speak for Christian, but I guess you could have some kind of libscmi in
>> user space that can obtain the information (as you say, probably char device,
>> not sure which alternatives we have), to expose the data through a nice ABI, to
>> then either make tools build upon that directly, or have a fuse server in user
>> space that mimics what you currently do with the file system.
> 
> My aim would be at first a simple tool that can exercise the chardev interface to
> discover configure and read back data, and then a FUSE server on top of this to
> optionally expose the human readable FS....I suppose our internal and external
> customers can use the FS interface to validate/test/script on one side, OR
> simply code their own tools/libs to use directly the bare chardev inteface...
> 
> ...we do have a tools team already working on a library to ease all of this
> SCMI Telemtry collection and analysis...it is just a matter to re-target the
> kind of lower level interfaces that they are using in the near future
> probably (they were already planning indeed AFAIK to use more performant
> interface that FS...)

Good.

> 
>>
>> One thing that is not clear to me yet is how stuff would be configured, and how
>> possibly multiple users of libscmi would possibly interact.
>>
> 
> Configuration/discovery will happen via IOCTls, while consuming the Data
> can happen:
> 
>  - all together in bulk via a device read fops
>  - a single DE via a targeted IOCTL
>  - direct access to the raw SCMI data via dev/mmap of the underlying SCMI
>    areas (that means the tool has to parse the SCMI format defined by the
>    spec on its own, without the currently provided Kernel mediation...)
> 
> Regarding the user concurrency, I have already explicitly pushed back on
> this, our own tools team: any concurrent read or configuration write is
> allowed and properly handled in a consistent way, BUT on the configuration
> side the last write/ioctl wins: there is NO in-kernel OR userspace
> co-ordination provided out of the box: IOW if you use multiple tools
> concurrently to apply conflicting configurations, it is none of our problem

Would concurrent reading work? I assume so, right?

> 
> ...similarly as if you have an actively running network configuration daemon
> and you try to set your IP manually...nobody will prevent you from doing this,
> the same netlink will be used freely by you on the shell and the daemon (if you
> have enough privilege), but you will gonna have unexpected result...
> 
> I dont either see the case to enforce exclusive access for Telemetry resources:
> co-ordination is up to the user in my view...I mean if you have 2 tools
> configuring concurrently SCMI telemetry in a conflicting way something has been
> misconfigured somewhere
> 
> .....having said that, I understand that the concurrency co-ordination
> issue can be particularly tricky to spot and solve in userspace, so I DO
> expose a generation counter entry that is updated on any configuration
> change, so that a userspace app using Telemetry can monitor (poll) this
> counter to spot if someone else on the system is quietly suddenly applying
> configuration changes...

Okay, so a single writer (admin) changing stuff could get picked up my possibly
many concurrent readers?

> 
>>>
>>> Should/could such a tool live in the kernel tree (tools/) at least for
>>> ease of development/deployment ?
>>
>> I think OOT.
>>
> 
> Ok.
> 
> Sorry for the long email..I hope I have clarified the situation, anyway
> I am already moving to get rid of the in-kernel interface as advised in
> favour of a chardev kernel interface and an optional FUSE based FS...

Yes, thank you a lot, I hope it also helps Christian to help push this into the
right direction!

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Janani Sunil @ 2026-06-19 10:33 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rodrigo Alencar, Janani Sunil, Lars-Peter Clausen,
	Michael Hennerich, David Lechner, Nuno Sá, Andy Shevchenko,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Philipp Zabel,
	Jonathan Corbet, Shuah Khan, linux-iio, devicetree, linux-kernel,
	linux-doc, Mark Brown
In-Reply-To: <20260614204455.408c4d40@jic23-huawei>


On 6/14/26 21:44, Jonathan Cameron wrote:
> On Tue, 9 Jun 2026 16:47:23 +0200
> Janani Sunil <jan.sun97@gmail.com> wrote:
>
>> On 5/26/26 15:11, Rodrigo Alencar wrote:
>>> On 26/05/19 05:42PM, Janani Sunil wrote:
>>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
>>>> buffered voltage output digital-to-analog converter (DAC) with an
>>>> integrated precision reference.
>>> ...
>>> Probably others may comment on that, but...
>>>
>>> This parent node may support device addressing for multi-device support through
>>> those ID pins. I suppose that each device may have its own power supplies or
>>> other resources like the toggle pins or reset and enable.
>>>
>>> That way I suppose that an example would look like...
>>>   
>>>> +
>>>> +patternProperties:
>>>> +  "^channel@([0-9]|1[0-5])$":
>>>> +    type: object
>>>> +    description: Child nodes for individual channel configuration
>>>> +
>>>> +    properties:
>>>> +      reg:
>>>> +        description: Channel number.
>>>> +        minimum: 0
>>>> +        maximum: 15
>>>> +
>>>> +      adi,output-range-microvolt:
>>>> +        description: |
>>>> +          Output voltage range for this channel as [min, max] in microvolts.
>>>> +          If not specified, defaults to 0V to 5V range.
>>>> +        oneOf:
>>>> +          - items:
>>>> +              - const: 0
>>>> +              - enum: [5000000, 10000000, 20000000, 40000000]
>>>> +          - items:
>>>> +              - const: -5000000
>>>> +              - const: 5000000
>>>> +          - items:
>>>> +              - const: -10000000
>>>> +              - const: 10000000
>>>> +          - items:
>>>> +              - const: -15000000
>>>> +              - const: 15000000
>>>> +          - items:
>>>> +              - const: -20000000
>>>> +              - const: 20000000
>>>> +
>>>> +    required:
>>>> +      - reg
>>>> +
>>>> +    additionalProperties: false
>>>> +
>>>> +required:
>>>> +  - compatible
>>>> +  - reg
>>>> +  - vdd-supply
>>>> +  - avdd-supply
>>>> +  - hvdd-supply
>>>> +
>>>> +dependencies:
>>>> +  spi-cpha: [ spi-cpol ]
>>>> +  spi-cpol: [ spi-cpha ]
>>>> +
>>>> +allOf:
>>>> +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
>>>> +
>>>> +unevaluatedProperties: false
>>>> +
>>>> +examples:
>>>> +  - |
>>>> +    #include <dt-bindings/gpio/gpio.h>
>>>> +
>>>> +    spi {
>>>> +        #address-cells = <1>;
>>>> +        #size-cells = <0>;
>>>> +
>>>> +        dac@0 {
>>>> +            compatible = "adi,ad5529r-16";
>>>> +            reg = <0>;
>>>> +            spi-max-frequency = <25000000>;
>>>> +
>>>> +            vdd-supply = <&vdd_regulator>;
>>>> +            avdd-supply = <&avdd_regulator>;
>>>> +            hvdd-supply = <&hvdd_regulator>;
>>>> +            hvss-supply = <&hvss_regulator>;
>>>> +
>>>> +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
>>>> +
>>>> +            #address-cells = <1>;
>>>> +            #size-cells = <0>;
>>>> +
>>>> +            channel@0 {
>>>> +                reg = <0>;
>>>> +                adi,output-range-microvolt = <0 5000000>;
>>>> +            };
>>>> +
>>>> +            channel@1 {
>>>> +                reg = <1>;
>>>> +                adi,output-range-microvolt = <(-10000000) 10000000>;
>>>> +            };
>>>> +
>>>> +            channel@2 {
>>>> +                reg = <2>;
>>>> +                adi,output-range-microvolt = <0 40000000>;
>>>> +            };
>>>> +        };
>>>> +    };
>>> ...
>>>
>>> 	spi {
>>> 		#address-cells = <1>;
>>> 		#size-cells = <0>;
>>>
>>> 		multi-dac@0 {
>>> 			compatible = "adi,ad5529r-16";
>>> 			reg = <0>;
>>> 			spi-max-frequency = <25000000>;
>>>
>>> 			#address-cells = <1>;
>>> 			#size-cells = <0>;
>>>
>>> 			dac@0 {
>>> 				reg = <0>;
>>> 				vdd-supply = <&vdd_regulator>;
>>> 				avdd-supply = <&avdd_regulator>;
>>> 				hvdd-supply = <&hvdd_regulator>;
>>> 				hvss-supply = <&hvss_regulator>;
>>>
>>> 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
>>>
>>> 				#address-cells = <1>;
>>> 				#size-cells = <0>;
>>>
>>> 				channel@0 {
>>> 					reg = <0>;
>>> 					adi,output-range-microvolt = <0 5000000>;
>>> 				};
>>>
>>> 				channel@1 {
>>> 					reg = <1>;
>>> 					adi,output-range-microvolt = <(-10000000) 10000000>;
>>> 				};
>>>
>>> 				channel@2 {
>>> 					reg = <2>;
>>> 					adi,output-range-microvolt = <0 40000000>;
>>> 				};
>>> 			}
>>>
>>> 			dac@1 {
>>> 				reg = <1>;
>>> 				vdd-supply = <&vdd_regulator>;
>>> 				avdd-supply = <&avdd_regulator>;
>>> 				hvdd-supply = <&hvdd_regulator>;
>>> 				hvss-supply = <&hvss_regulator>;
>>>
>>> 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
>>>
>>> 				#address-cells = <1>;
>>> 				#size-cells = <0>;
>>>
>>> 				channel@0 {
>>> 					reg = <0>;
>>> 					adi,output-range-microvolt = <0 5000000>;
>>> 				};
>>>
>>> 				channel@1 {
>>> 					reg = <1>;
>>> 					adi,output-range-microvolt = <(-10000000) 10000000>;
>>> 				};
>>> 			}
>>> 		};
>>> 	};
>>>
>>> then you might need something like:
>>>
>>> 	patternProperties:
>>> 		"^dac@[0-3]$":
>>>
>>> and put most of the things under this node pattern.
>>>
>>> So the main driver that you're putting together might need to handle up to four instances.
>>> Even if your current driver cannot handle this, the dt-bindings might need cover that.
>>>
>>> Need to double check if each dac node needs a separate compatible, so you would maybe populate
>>> a platform data to be shared with the child nodes, which would be a separate driver.
>>> (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).
>> Hi Rodrigo,
>>
>> Thank you for looking at this.
>>
>> For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
>> hardware/use case we have only needs one device node and the driver is written around that model as well.
>> While the device addressing pins could allow multi-device topology, we do not have an actual platform using
>> that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
>> speculatively without a validating use case.
> Interesting feature - kind of similar to address control on a typical i2c bus device, or
> looking at it another way a kind of distributed SPI mux.
>
> Challenge of a binding is we need to anticipate the future.  So I think we do need something
> like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> That would leave the path open to supporting the addressing at a later date.
> An alternative might be to look at it like a chained device setup. In those we pretend there
> is just one device with a lot of channels etc.  The snag is that here things are more loosely
> coupled whereas for those devices it tends to be you have to read / write the same register
> in all devices in the chain as one big SPI message.
>
> +CC Mark Brown as he may know of some precedence for this feature. For his reference..
> - Each of these device has 2 ID pins.  The SPI transfers have to contain the 2 bit
> value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> longer term how to support it cleanly in SPI.
>
> Jonathan

Hi Jonathan, Rob, Krzysztof, Conor,

One possible model that would also allow mixing the 12-bit and 16-bit variants would be to treat the parent node
as the shared SPI transport only, and let each dac@N child carry its own compatible.

Rob, Krzysztof, Conor — wanted to get your input on whether this is an acceptable binding pattern.

properties:
   compatible:
     const: adi,ad5529r-bus

patternProperties:
   "^dac@[0-3]$":
     type: object
     properties:
       compatible:
         enum:
           - adi,ad5529r-16
           - adi,ad5529r-12
       reg:
         minimum: 0
         maximum: 3

With a DT example such as:

ad5529r@0 {
         compatible = "adi,ad5529r-bus";
         reg = <0>;

         dac@0 {
                 compatible = "adi,ad5529r-16";
                 reg = <0>;
         };

         dac@1 {
                 compatible = "adi,ad5529r-12";
                 reg = <1>;
         };
};

The downside is that it introduces adi,ad5529r-bus as a compatible that does not correspond to an actual
standalone device variant - it would require a parent driver to manage the shared SPI transport and enumerate the
child devices. The actual DAC functionality is handled by the matching per-child compatibles(12 or 16 bit).
Is this an acceptable pattern, or is there a preferred way to model this type of addressing scheme?

Regards,
Janani Sunil


^ permalink raw reply

* Re: [PATCH v8 17/46] KVM: guest_memfd: Advertise KVM_SET_MEMORY_ATTRIBUTES2 ioctl
From: Fuad Tabba @ 2026-06-19 10:35 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-17-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES to advertise the
> availability of the KVM_SET_MEMORY_ATTRIBUTES2 ioctl.
>
> KVM_SET_MEMORY_ATTRIBUTES2 is a guest_memfd-scoped version of the existing
> KVM_SET_MEMORY_ATTRIBUTES VM ioctl. It allows userspace to manage memory
> attributes, such as KVM_MEMORY_ATTRIBUTE_PRIVATE, directly on a guest_memfd
> file descriptor.
>
> This new version uses struct kvm_memory_attributes2, which adds an
> error_offset field to the output. This allows KVM to return the specific
> offset that triggered an error, which is especially useful for handling
> EAGAIN results caused by transient page reference counts during attribute
> conversions.
>
> Update the KVM API documentation to define the new ioctl and its behavior,
> and add the necessary UAPI definitions and capability checks.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  Documentation/virt/kvm/api.rst | 78 +++++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/kvm.h       |  2 ++
>  virt/kvm/kvm_main.c            | 23 +++++++++----
>  3 files changed, 95 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a833d90845b95..73878f34f6d2e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -117,7 +117,7 @@ description:
>        x86 includes both i386 and x86_64.
>
>    Type:
> -      system, vm, or vcpu.
> +      system, vm, vcpu or guest_memfd.
>
>    Parameters:
>        what parameters are accepted by the ioctl.
> @@ -6373,6 +6373,8 @@ S390:
>  Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
>  Returns -EINVAL if called on a protected VM.
>
> +.. _KVM_SET_MEMORY_ATTRIBUTES:
> +
>  4.141 KVM_SET_MEMORY_ATTRIBUTES
>  -------------------------------
>
> @@ -6566,6 +6568,80 @@ KVM_S390_KEYOP_SSKE
>    Sets the storage key for the guest address ``guest_addr`` to the key
>    specified in ``key``, returning the previous value in ``key``.
>
> +4.145 KVM_SET_MEMORY_ATTRIBUTES2
> +---------------------------------
> +
> +:Capability: KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES
> +:Architectures: all
> +:Type: guest_memfd ioctl
> +:Parameters: struct kvm_memory_attributes2 (in/out)
> +:Returns: 0 on success, <0 on error
> +
> +Errors:
> +
> +  ========== ===============================================================
> +  EINVAL     The specified `offset` or `size` were invalid (e.g. not
> +             page aligned, causes an overflow, or size is zero).
> +  EFAULT     The parameter address was invalid.
> +  EAGAIN     Some page within requested range had unexpected refcounts. The
> +             offset of the page will be returned in `error_offset`.
> +  ENOMEM     Ran out of memory trying to track private/shared state
> +  ========== ===============================================================
> +
> +KVM_SET_MEMORY_ATTRIBUTES2 is an extension to
> +KVM_SET_MEMORY_ATTRIBUTES that supports returning (writing) values to
> +userspace.  The original (pre-extension) fields are shared with
> +KVM_SET_MEMORY_ATTRIBUTES identically.
> +
> +Attribute values are shared with KVM_SET_MEMORY_ATTRIBUTES.
> +
> +::
> +
> +  struct kvm_memory_attributes2 {
> +       /* in */
> +       union {
> +               __u64 address;
> +               __u64 offset;
> +       };
> +       __u64 size;
> +       __u64 attributes;
> +       __u64 flags;
> +       /* out */
> +       __u64 error_offset;
> +       __u64 reserved[11];
> +  };
> +
> +  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> +
> +Set attributes for a range of offsets within a guest_memfd to
> +KVM_MEMORY_ATTRIBUTE_PRIVATE to limit the specified guest_memfd backed
> +memory range for guest_use. Even if KVM_CAP_GUEST_MEMFD_MMAP is
> +supported, after a successful call to set
> +KVM_MEMORY_ATTRIBUTE_PRIVATE, the requested range will not be mappable
> +into host userspace and will only be mappable by the guest.
> +
> +To allow the range to be mappable into host userspace again, call
> +KVM_SET_MEMORY_ATTRIBUTES2 on the guest_memfd again with
> +KVM_MEMORY_ATTRIBUTE_PRIVATE unset.
> +
> +KVM does not directly manipulate the memory contents of pages during
> +attribute updates. However, the process of setting these attributes,
> +which includes operations such as unmapping pages from the host or
> +stage-2 page tables, may result in side effects on memory contents
> +that vary across different trusted firmware implementations.
> +
> +If this ioctl returns -EAGAIN, the offset of the page with unexpected
> +refcounts will be returned in `error_offset`. This can occur if there
> +are transient refcounts on the pages, taken by other parts of the
> +kernel.
> +
> +Userspace is expected to figure out how to remove all known refcounts
> +on the shared pages, such as refcounts taken by get_user_pages(), and
> +try the ioctl again. A possible source of these long term refcounts is
> +if the guest_memfd memory was pinned in IOMMU page tables.
> +
> +See also: :ref: `KVM_SET_MEMORY_ATTRIBUTES`.
> +
>  .. _kvm_run:
>
>  5. The kvm_run structure
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 876c0429f9d4e..129d6f6303251 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -997,6 +997,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_S390_KEYOP 247
>  #define KVM_CAP_S390_VSIE_ESAMODE 248
>  #define KVM_CAP_S390_HPAGE_2G 249
> +#define KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES 250
>
>  struct kvm_irq_routing_irqchip {
>         __u32 irqchip;
> @@ -1649,6 +1650,7 @@ struct kvm_memory_attributes {
>         __u64 flags;
>  };
>
> +/* Available with KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES */
>  #define KVM_SET_MEMORY_ATTRIBUTES2              _IOWR(KVMIO,  0xd2, struct kvm_memory_attributes2)
>
>  struct kvm_memory_attributes2 {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a08b518cdb175..044486f128c37 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2434,18 +2434,22 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> +#ifdef kvm_arch_has_private_mem
> +static u64 kvm_supports_private_mem(struct kvm *kvm)
> +{
> +       return !kvm || kvm_arch_has_private_mem(kvm);
> +}
> +#else
> +#define kvm_supports_private_mem(kvm) false
> +#endif
> +
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static u64 kvm_supported_vm_mem_attributes(struct kvm *kvm)
>  {
> -#ifdef kvm_arch_has_private_mem
> -       if (gmem_in_place_conversion)
> +       if (gmem_in_place_conversion || !kvm_supports_private_mem(kvm))
>                 return 0;
>
> -       if (!kvm || kvm_arch_has_private_mem(kvm))
> -               return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> -#endif
> -
> -       return 0;
> +       return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>  }
>
>  /*
> @@ -4969,6 +4973,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>                 return 1;
>         case KVM_CAP_GUEST_MEMFD_FLAGS:
>                 return kvm_gmem_get_supported_flags(kvm);
> +       case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> +               if (!gmem_in_place_conversion || !kvm_supports_private_mem(kvm))
> +                       return 0;
> +
> +               return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>  #endif
>         default:
>                 break;
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 19/46] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release()
From: Fuad Tabba @ 2026-06-19 10:46 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-19-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> __kvm_gmem_invalidate_begin() and __kvm_gmem_invalidate_end() actually do
> not specially handle -1ul. -1ul is used as a huge number, which legal
> indices do not exceed, and hence the invalidation works as expected.
>
> Since a later patch is going to make use of the exact range, calculate the
> size of the guest_memfd inode and use it as the end range for invalidating
> SPTEs.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

>  virt/kvm/guest_memfd.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d163559da0235..d72ecbfcc3144 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -366,6 +366,7 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
>
>  static int kvm_gmem_release(struct inode *inode, struct file *file)
>  {
> +       pgoff_t end = i_size_read(inode) >> PAGE_SHIFT;
>         struct gmem_file *f = file->private_data;
>         struct kvm_memory_slot *slot;
>         struct kvm *kvm = f->kvm;
> @@ -396,9 +397,9 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
>          * Zap all SPTEs pointed at by this file.  Do not free the backing
>          * memory, as its lifetime is associated with the inode, not the file.
>          */
> -       __kvm_gmem_invalidate_start(f, 0, -1ul,
> +       __kvm_gmem_invalidate_start(f, 0, end,
>                                     kvm_gmem_get_invalidate_filter(inode));
> -       __kvm_gmem_invalidate_end(f, 0, -1ul);
> +       __kvm_gmem_invalidate_end(f, 0, end);
>
>         list_del(&f->entry);
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 21/46] KVM: guest_memfd: Zero page while getting pfn
From: Fuad Tabba @ 2026-06-19 10:51 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-21-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Move the folio initialization logic from kvm_gmem_get_pfn() into
> __kvm_gmem_get_pfn() to also zero pages if the page is to be used in
> kvm_gmem_populate().
>
> With in-place conversion, the existing data in a guest_memfd page can be
> populated into guest memory through platform-specific ioctls.
>
> Without first zeroing the page obtained using __kvm_gmem_get_pfn(), it
> might contain uninitialized host memory, which would leak to the guest if
> the populate completes.
>
> guest_memfd pages are zeroed at most once in the page's entire lifetime
> with guest_memfd, and that is tracked using the uptodate flag.
>
> Zeroing the page in __kvm_gmem_get_pfn() is chosen over zeroing in
> kvm_gmem_get_folio() since other flows, such as a future write() syscall,
> can get a page, write to the page and then set page uptodate without
> zeroing.
>
> This aligns with the concept of zeroing before first use - the other place
> where zeroing happens is in kvm_gmem_fault_user_mapping().
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  virt/kvm/guest_memfd.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 90bc1a26512b6..86c9f5b0863cb 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -1137,6 +1137,11 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
>                 return ERR_PTR(-EHWPOISON);
>         }
>
> +       if (!folio_test_uptodate(folio)) {
> +               clear_highpage(folio_page(folio, 0));
> +               folio_mark_uptodate(folio);
> +       }
> +
>         *pfn = folio_file_pfn(folio, index);
>         if (max_order)
>                 *max_order = 0;
> @@ -1166,11 +1171,6 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                 goto out;
>         }
>
> -       if (!folio_test_uptodate(folio)) {
> -               clear_highpage(folio_page(folio, 0));
> -               folio_mark_uptodate(folio);
> -       }
> -
>         if (kvm_gmem_is_private_mem(inode, index))
>                 r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 22/46] KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE
From: Fuad Tabba @ 2026-06-19 11:01 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-22-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Michael Roth <michael.roth@amd.com>
>
> Make the source page for populating an SNP guest_memfd instance optional
> if in-place conversion/population is enabled.  If KVM can convert the page
> in-place, then it's possible for guest memory to be initialized directly
> from userspace by mmap()'ing the guest_memfd and writing to it while the
> corresponding GPA ranges are in a 'shared' state, before converting them
> to the 'private' state expected by KVM_SEV_SNP_LAUNCH_UPDATE.
>
> Update the handling/documentation for KVM_SEV_SNP_LAUNCH_UPDATE to allow
> for 'uaddr' to be set to NULL when in-place conversion is enabled, which
> SNP_LAUNCH_UPDATE will then use to determine when it should/shouldn't
> copy in data from a separate memory location. Continue to enforce
> non-NULL when PRIVATE is tracked per-VM, not per-guest_memfd.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> [Added src_page check in error handling path when the firmware command fails]
> [Dropped ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES]
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> [sean: drop explicit vm_memory_attributes references]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/x86/amd-memory-encryption.rst | 13 +++++++++----
>  arch/x86/kvm/svm/sev.c                               | 16 +++++++++++-----
>  virt/kvm/kvm_main.c                                  |  1 +
>  3 files changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index bd04a908a8dbd..29409297f1ef0 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -503,7 +503,8 @@ secrets.
>
>  It is required that the GPA ranges initialized by this command have had the
>  KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
> -for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
> +for KVM_SET_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES2 for more details on
> +this aspect.
>
>  Upon success, this command is not guaranteed to have processed the entire
>  range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
> @@ -511,9 +512,13 @@ range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
>  remaining range that has yet to be processed. The caller should continue
>  calling this command until those fields indicate the entire range has been
>  processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the
> -range plus 1, and ``uaddr`` is the last byte of the userspace-provided source
> -buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO,
> -``uaddr`` will be ignored completely.
> +range plus 1, and ``uaddr`` (if specified) is the last byte of the
> +userspace-provided source buffer address plus 1.
> +
> +In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO, ``uaddr`` will be
> +ignored completely. For all other page types, ``uaddr`` is optional if in-place
> +conversion is enable, i.e. when the destination can also be the source, and is

Typo: "is enable" -> "is enabled".

"when the destination can also be the source" is hard to parse without
context. Maybe: "i.e. when the data has been written directly to
guest_memfd while the range was in the shared state".

Also, how does userspace discover whether in-place conversion is
enabled? A cross-reference to KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES
would help here.

Cheers,
/fuad

> +required if in-place conversion is disabled.
>
>  Parameters (in): struct  kvm_sev_snp_launch_update
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 74fb15551e83f..2b7569b6a8609 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2330,7 +2330,13 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>         int level;
>         int ret;
>
> -       if (WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src_page))
> +       /*
> +        * A source page is required if in-place conversion isn't enabled, as
> +        * the data needs to come from a separate physical page.  Zero pages
> +        * are exempt as they don't consume a source page.
> +        */
> +       if (!gmem_in_place_conversion &&
> +           sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src_page)
>                 return -EINVAL;
>
>         ret = snp_lookup_rmpentry((u64)pfn, &assigned, &level);
> @@ -2377,7 +2383,7 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>          */
>         if (ret && !snp_page_reclaim(kvm, pfn) &&
>             sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
> -           sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
> +           sev_populate_args->fw_error == SEV_RET_INVALID_PARAM && src_page) {
>                 void *src_vaddr = kmap_local_page(src_page);
>                 void *dst_vaddr = kmap_local_pfn(pfn);
>
> @@ -2410,8 +2416,8 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
>                 return -EFAULT;
>
> -       pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__,
> -                params.gfn_start, params.len, params.type, params.flags);
> +       pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d src %llx\n", __func__,
> +                params.gfn_start, params.len, params.type, params.flags, params.uaddr);
>
>         if (!params.len || !PAGE_ALIGNED(params.len) || params.flags ||
>             (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
> @@ -2468,7 +2474,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>         params.gfn_start += count;
>         params.len -= count * PAGE_SIZE;
> -       if (params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
> +       if (src && params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
>                 params.uaddr += count * PAGE_SIZE;
>
>         if (copy_to_user(u64_to_user_ptr(argp->data), &params, sizeof(params)))
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 044486f128c37..dd1d18a1d2f68 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -103,6 +103,7 @@ module_param(allow_unsafe_mappings, bool, 0444);
>
>  #ifdef kvm_arch_has_private_mem
>  bool __ro_after_init gmem_in_place_conversion = false;
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(gmem_in_place_conversion);
>  #endif
>
>  #define MEMORY_ATTRIBUTES_MATCH(one, two)                              \
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 11/46] KVM: Consolidate private memory and guest_memfd ifdeffery in kvm_host.h
From: Fuad Tabba @ 2026-06-19 11:02 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-11-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Move the kvm_arch_has_private_mem() stub and a few guest_memfd function
> definitions/declarations "down" in kvm_host.h to utilize existing #ifdefs,
> and so that related code is clustered together.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

SoB fix please. With that...

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  include/linux/kvm_host.h | 37 ++++++++++++++++---------------------
>  1 file changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index acb552745b428..9c1cf1a6559e3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,27 +722,6 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -#ifndef kvm_arch_has_private_mem
> -static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> -{
> -       return false;
> -}
> -#endif
> -
> -#ifdef CONFIG_KVM_GUEST_MEMFD
> -bool kvm_arch_supports_gmem_init_shared(struct kvm *kvm);
> -
> -static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
> -{
> -       u64 flags = GUEST_MEMFD_FLAG_MMAP;
> -
> -       if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
> -               flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
> -
> -       return flags;
> -}
> -#endif
> -
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> @@ -2572,6 +2551,11 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  #else
>  #define gmem_in_place_conversion false
>
> +static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> +{
> +       return false;
> +}
> +
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
>         return false;
> @@ -2580,6 +2564,17 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>
>  #ifdef CONFIG_KVM_GUEST_MEMFD
>  bool kvm_gmem_is_private(struct kvm *kvm, gfn_t gfn);
> +bool kvm_arch_supports_gmem_init_shared(struct kvm *kvm);
> +
> +static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
> +{
> +       u64 flags = GUEST_MEMFD_FLAG_MMAP;
> +
> +       if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
> +               flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
> +
> +       return flags;
> +}
>
>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                      gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v8 23/46] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
From: Fuad Tabba @ 2026-06-19 11:09 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
	yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
	liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-23-9d2959357853@google.com>

On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Update tdx_gmem_post_populate() to handle cases where a source page is
> not explicitly provided. Instead of returning -EOPNOTSUPP when src_page
> is NULL, default to using the page associated with the destination PFN.
>
> This change allows for in-place memory conversion where the data is
> already present in the target PFN, ensuring the TDX module has a valid
> source page reference for the TDH.MEM.PAGE.ADD operation.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---

Sashiko flagged that when src_page = pfn_to_page(pfn),
tdh_mem_page_add gets identical physical addresses for r8
(destination) and r9 (source), reading with host KeyID and writing
with TD KeyID on the same address. I don't know enough about the TDX
module's operand constraints to confirm whether it allows overlapping
source and destination, but the concern looks legitimate.

nit: why does it have Sean's SoB?

Cheers,
/fuad


>  Documentation/virt/kvm/x86/intel-tdx.rst |  4 ++++
>  arch/x86/kvm/vmx/tdx.c                   | 11 ++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst
> index 6a222e9d09541..74357fe87f9ec 100644
> --- a/Documentation/virt/kvm/x86/intel-tdx.rst
> +++ b/Documentation/virt/kvm/x86/intel-tdx.rst
> @@ -158,6 +158,10 @@ KVM_TDX_INIT_MEM_REGION
>  Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
>  provided data from @source_addr. @source_addr must be PAGE_SIZE-aligned.
>
> +If guest_memfd in-place conversion is enabled, pass NULL for @source_addr to
> +initialize the memory region using memory contents already populated in
> +guest_memfd memory.
> +
>  Note, before calling this sub command, memory attribute of the range
>  [gpa, gpa + nr_pages] needs to be private.  Userspace can use
>  KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index ffe9d0db58c59..56d10333c61a7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -3198,8 +3198,12 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>         if (KVM_BUG_ON(kvm_tdx->page_add_src, kvm))
>                 return -EIO;
>
> -       if (!src_page)
> -               return -EOPNOTSUPP;
> +       if (!src_page) {
> +               if (!gmem_in_place_conversion)
> +                       return -EOPNOTSUPP;
> +
> +               src_page = pfn_to_page(pfn);
> +       }
>
>         kvm_tdx->page_add_src = src_page;
>         ret = kvm_tdp_mmu_map_private_pfn(arg->vcpu, gfn, pfn);
> @@ -3278,7 +3282,8 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *c
>                         break;
>                 }
>
> -               region.source_addr += PAGE_SIZE;
> +               if (region.source_addr)
> +                       region.source_addr += PAGE_SIZE;
>                 region.gpa += PAGE_SIZE;
>                 region.nr_pages--;
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>

^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Nuno Sá @ 2026-06-19 11:31 UTC (permalink / raw)
  To: Janani Sunil
  Cc: Jonathan Cameron, Rodrigo Alencar, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <076d7d2d-81a0-49c2-af94-bd65ead66c09@gmail.com>

On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:
> 
> On 6/14/26 21:44, Jonathan Cameron wrote:
> > On Tue, 9 Jun 2026 16:47:23 +0200
> > Janani Sunil <jan.sun97@gmail.com> wrote:
> > 
> 
> Hi Jonathan, Rob, Krzysztof, Conor,
> 
> One possible model that would also allow mixing the 12-bit and 16-bit variants would be to treat the parent node
> as the shared SPI transport only, and let each dac@N child carry its own compatible.
> 
> Rob, Krzysztof, Conor — wanted to get your input on whether this is an acceptable binding pattern.
> 
> properties:
>   compatible:
>     const: adi,ad5529r-bus
> 
> patternProperties:
>   "^dac@[0-3]$":
>     type: object
>     properties:
>       compatible:
>         enum:
>           - adi,ad5529r-16
>           - adi,ad5529r-12
>       reg:
>         minimum: 0
>         maximum: 3
> 
> With a DT example such as:
> 
> ad5529r@0 {
>         compatible = "adi,ad5529r-bus";
>         reg = <0>;
> 
>         dac@0 {
>                 compatible = "adi,ad5529r-16";
>                 reg = <0>;
>         };
> 
>         dac@1 {
>                 compatible = "adi,ad5529r-12";
>                 reg = <1>;
>         };
> };
> 
> The downside is that it introduces adi,ad5529r-bus as a compatible that does not correspond to an actual
> standalone device variant - it would require a parent driver to manage the shared SPI transport and enumerate the
> child devices. The actual DAC functionality is handled by the matching per-child compatibles(12 or 16 bit).
> Is this an acceptable pattern, or is there a preferred way to model this type of addressing scheme?
> 

At some point, I wondered if we can't just have this at spi level? Like
(in the simplest terms) a new spi-peripheral property that would allow
devices to share the same CS. Then we would need an adi,pin-id kind of
property for this device but the bindings would be pretty much as if we
only supported one device.

I see Mark is already in the loop, maybe he has seen this kind of things
before.

- Nuno Sá


^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Conor Dooley @ 2026-06-19 11:36 UTC (permalink / raw)
  To: Janani Sunil
  Cc: Jonathan Cameron, Rodrigo Alencar, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <076d7d2d-81a0-49c2-af94-bd65ead66c09@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8557 bytes --]

On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:
> 
> On 6/14/26 21:44, Jonathan Cameron wrote:
> > On Tue, 9 Jun 2026 16:47:23 +0200
> > Janani Sunil <jan.sun97@gmail.com> wrote:
> > 
> > > On 5/26/26 15:11, Rodrigo Alencar wrote:
> > > > On 26/05/19 05:42PM, Janani Sunil wrote:
> > > > > Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> > > > > buffered voltage output digital-to-analog converter (DAC) with an
> > > > > integrated precision reference.
> > > > ...
> > > > Probably others may comment on that, but...
> > > > 
> > > > This parent node may support device addressing for multi-device support through
> > > > those ID pins. I suppose that each device may have its own power supplies or
> > > > other resources like the toggle pins or reset and enable.
> > > > 
> > > > That way I suppose that an example would look like...
> > > > > +
> > > > > +patternProperties:
> > > > > +  "^channel@([0-9]|1[0-5])$":
> > > > > +    type: object
> > > > > +    description: Child nodes for individual channel configuration
> > > > > +
> > > > > +    properties:
> > > > > +      reg:
> > > > > +        description: Channel number.
> > > > > +        minimum: 0
> > > > > +        maximum: 15
> > > > > +
> > > > > +      adi,output-range-microvolt:
> > > > > +        description: |
> > > > > +          Output voltage range for this channel as [min, max] in microvolts.
> > > > > +          If not specified, defaults to 0V to 5V range.
> > > > > +        oneOf:
> > > > > +          - items:
> > > > > +              - const: 0
> > > > > +              - enum: [5000000, 10000000, 20000000, 40000000]
> > > > > +          - items:
> > > > > +              - const: -5000000
> > > > > +              - const: 5000000
> > > > > +          - items:
> > > > > +              - const: -10000000
> > > > > +              - const: 10000000
> > > > > +          - items:
> > > > > +              - const: -15000000
> > > > > +              - const: 15000000
> > > > > +          - items:
> > > > > +              - const: -20000000
> > > > > +              - const: 20000000
> > > > > +
> > > > > +    required:
> > > > > +      - reg
> > > > > +
> > > > > +    additionalProperties: false
> > > > > +
> > > > > +required:
> > > > > +  - compatible
> > > > > +  - reg
> > > > > +  - vdd-supply
> > > > > +  - avdd-supply
> > > > > +  - hvdd-supply
> > > > > +
> > > > > +dependencies:
> > > > > +  spi-cpha: [ spi-cpol ]
> > > > > +  spi-cpol: [ spi-cpha ]
> > > > > +
> > > > > +allOf:
> > > > > +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> > > > > +
> > > > > +unevaluatedProperties: false
> > > > > +
> > > > > +examples:
> > > > > +  - |
> > > > > +    #include <dt-bindings/gpio/gpio.h>
> > > > > +
> > > > > +    spi {
> > > > > +        #address-cells = <1>;
> > > > > +        #size-cells = <0>;
> > > > > +
> > > > > +        dac@0 {
> > > > > +            compatible = "adi,ad5529r-16";
> > > > > +            reg = <0>;
> > > > > +            spi-max-frequency = <25000000>;
> > > > > +
> > > > > +            vdd-supply = <&vdd_regulator>;
> > > > > +            avdd-supply = <&avdd_regulator>;
> > > > > +            hvdd-supply = <&hvdd_regulator>;
> > > > > +            hvss-supply = <&hvss_regulator>;
> > > > > +
> > > > > +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > +
> > > > > +            #address-cells = <1>;
> > > > > +            #size-cells = <0>;
> > > > > +
> > > > > +            channel@0 {
> > > > > +                reg = <0>;
> > > > > +                adi,output-range-microvolt = <0 5000000>;
> > > > > +            };
> > > > > +
> > > > > +            channel@1 {
> > > > > +                reg = <1>;
> > > > > +                adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > +            };
> > > > > +
> > > > > +            channel@2 {
> > > > > +                reg = <2>;
> > > > > +                adi,output-range-microvolt = <0 40000000>;
> > > > > +            };
> > > > > +        };
> > > > > +    };
> > > > ...
> > > > 
> > > > 	spi {
> > > > 		#address-cells = <1>;
> > > > 		#size-cells = <0>;
> > > > 
> > > > 		multi-dac@0 {
> > > > 			compatible = "adi,ad5529r-16";
> > > > 			reg = <0>;
> > > > 			spi-max-frequency = <25000000>;
> > > > 
> > > > 			#address-cells = <1>;
> > > > 			#size-cells = <0>;
> > > > 
> > > > 			dac@0 {
> > > > 				reg = <0>;
> > > > 				vdd-supply = <&vdd_regulator>;
> > > > 				avdd-supply = <&avdd_regulator>;
> > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > 				hvss-supply = <&hvss_regulator>;
> > > > 
> > > > 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > 
> > > > 				#address-cells = <1>;
> > > > 				#size-cells = <0>;
> > > > 
> > > > 				channel@0 {
> > > > 					reg = <0>;
> > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > 				};
> > > > 
> > > > 				channel@1 {
> > > > 					reg = <1>;
> > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > 				};
> > > > 
> > > > 				channel@2 {
> > > > 					reg = <2>;
> > > > 					adi,output-range-microvolt = <0 40000000>;
> > > > 				};
> > > > 			}
> > > > 
> > > > 			dac@1 {
> > > > 				reg = <1>;
> > > > 				vdd-supply = <&vdd_regulator>;
> > > > 				avdd-supply = <&avdd_regulator>;
> > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > 				hvss-supply = <&hvss_regulator>;
> > > > 
> > > > 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> > > > 
> > > > 				#address-cells = <1>;
> > > > 				#size-cells = <0>;
> > > > 
> > > > 				channel@0 {
> > > > 					reg = <0>;
> > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > 				};
> > > > 
> > > > 				channel@1 {
> > > > 					reg = <1>;
> > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > 				};
> > > > 			}
> > > > 		};
> > > > 	};
> > > > 
> > > > then you might need something like:
> > > > 
> > > > 	patternProperties:
> > > > 		"^dac@[0-3]$":
> > > > 
> > > > and put most of the things under this node pattern.
> > > > 
> > > > So the main driver that you're putting together might need to handle up to four instances.
> > > > Even if your current driver cannot handle this, the dt-bindings might need cover that.
> > > > 
> > > > Need to double check if each dac node needs a separate compatible, so you would maybe populate
> > > > a platform data to be shared with the child nodes, which would be a separate driver.
> > > > (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).
> > > Hi Rodrigo,
> > > 
> > > Thank you for looking at this.
> > > 
> > > For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> > > hardware/use case we have only needs one device node and the driver is written around that model as well.
> > > While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> > > that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> > > speculatively without a validating use case.
> > Interesting feature - kind of similar to address control on a typical i2c bus device, or
> > looking at it another way a kind of distributed SPI mux.
> > 
> > Challenge of a binding is we need to anticipate the future.  So I think we do need something
> > like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> > That would leave the path open to supporting the addressing at a later date.
> > An alternative might be to look at it like a chained device setup. In those we pretend there
> > is just one device with a lot of channels etc.  The snag is that here things are more loosely
> > coupled whereas for those devices it tends to be you have to read / write the same register
> > in all devices in the chain as one big SPI message.
> > 
> > +CC Mark Brown as he may know of some precedence for this feature. For his reference..
> > - Each of these device has 2 ID pins.  The SPI transfers have to contain the 2 bit
> > value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> > be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> > longer term how to support it cleanly in SPI.

I'd swear I have seen this before, from some Microchip devices. Let me
see if I can find what I am thinking of...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox