* Re: [PATCH v6 12/43] KVM: guest_memfd: Call arch invalidate hooks on conversion
From: Fuad Tabba @ 2026-05-20 14:30 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-12-91ab5a8b19a4@google.com>
On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> When memory in guest_memfd is converted from private to shared, the
> platform-specific state associated with the guest-private pages must be
> invalidated or cleaned up.
>
> Iterate over the folios in the affected range and call the
> kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> architectures to perform necessary teardown, such as updating hardware
> metadata or encryption states, before the pages are transitioned to the
> shared state.
>
> Invoke this helper after indicating to KVM's mmu code that an invalidation
> is in progress to stop in-flight page faults from succeeding.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Minor nit below, but lgtm.
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9d82642a025e9..baf4b88dead1f 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -603,6 +603,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> return safe;
> }
>
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> + struct folio_batch fbatch;
> + pgoff_t next = start;
> + int i;
> +
> + folio_batch_init(&fbatch);
> + while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) {
> + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> + struct folio *folio = fbatch.folios[i];
> + pgoff_t start_index, end_index;
> + kvm_pfn_t start_pfn, end_pfn;
> +
> + start_index = max(start, folio->index);
> + end_index = min(end, folio_next_index(folio));
> + /*
> + * end_index is either in folio or points to
> + * the first page of the next folio. Hence,
> + * all pages in range [start_index, end_index)
> + * are contiguous.
> + */
> + start_pfn = folio_file_pfn(folio, start_index);
> + end_pfn = start_pfn + end_index - start_index;
> +
> + kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> + }
> +
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +}
> +#else
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> +#endif
> +
> static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> size_t nr_pages, uint64_t attrs,
> pgoff_t *err_index)
> @@ -643,7 +679,12 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> */
>
> kvm_gmem_invalidate_begin(inode, start, end);
> +
> + if (!to_private)
> + kvm_gmem_invalidate(inode, start, end);
> +
> mas_store_prealloc(&mas, xa_mk_value(attrs));
> +
Why the unrelated extra space?
> kvm_gmem_invalidate_end(inode, start, end);
> out:
> filemap_invalidate_unlock(mapping);
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply
* Re: [PATCH 08/15] accel/qda: Add QUERY IOCTL and QDA UAPI header
From: Dmitry Baryshkov @ 2026-05-20 14:29 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-8-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:58AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Introduce the DRM_IOCTL_QDA_QUERY IOCTL, which allows user-space to
> identify which DSP domain a given /dev/accel/accel* node represents
> (e.g. "cdsp", "adsp").
>
> include/uapi/drm/qda_accel.h
> Defines the QDA IOCTL command numbers and the associated data
> structures. The header follows the standard DRM UAPI conventions:
> __u8/__u32 types, a C++ extern "C" guard, and GPL-2.0-only WITH
> Linux-syscall-note licensing.
>
> drivers/accel/qda/qda_ioctl.c / qda_ioctl.h
> Implements qda_ioctl_query(), which copies the DSP domain name
> stored in qda_dev.dsp_name into the user-supplied drm_qda_query
> buffer using strscpy().
>
> drivers/accel/qda/qda_drv.c
> Registers the qda_ioctls[] table with the drm_driver so that the
> DRM core dispatches DRM_IOCTL_QDA_QUERY to qda_ioctl_query().
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/qda/Makefile | 1 +
> drivers/accel/qda/qda_drv.c | 8 +++++++
> drivers/accel/qda/qda_ioctl.c | 26 +++++++++++++++++++++++
> drivers/accel/qda/qda_ioctl.h | 13 ++++++++++++
> include/uapi/drm/qda_accel.h | 49 +++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 97 insertions(+)
>
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> index 701fad5ffb50..b658dad35fee 100644
> --- a/drivers/accel/qda/Makefile
> +++ b/drivers/accel/qda/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_DRM_ACCEL_QDA) := qda.o
> qda-y := \
> qda_cb.o \
> qda_drv.o \
> + qda_ioctl.o \
> qda_memory_manager.o \
> qda_rpmsg.o
>
> diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
> index 0ad5d9873d7e..becd831d10be 100644
> --- a/drivers/accel/qda/qda_drv.c
> +++ b/drivers/accel/qda/qda_drv.c
> @@ -8,8 +8,10 @@
> #include <drm/drm_gem.h>
> #include <drm/drm_ioctl.h>
> #include <drm/drm_print.h>
> +#include <drm/qda_accel.h>
>
> #include "qda_drv.h"
> +#include "qda_ioctl.h"
> #include "qda_rpmsg.h"
>
> static int qda_open(struct drm_device *dev, struct drm_file *file)
> @@ -36,11 +38,17 @@ static void qda_postclose(struct drm_device *dev, struct drm_file *file)
>
> DEFINE_DRM_ACCEL_FOPS(qda_accel_fops);
>
> +static const struct drm_ioctl_desc qda_ioctls[] = {
> + DRM_IOCTL_DEF_DRV(QDA_QUERY, qda_ioctl_query, 0),
> +};
> +
> static const struct drm_driver qda_drm_driver = {
> .driver_features = DRIVER_COMPUTE_ACCEL,
> .fops = &qda_accel_fops,
> .open = qda_open,
> .postclose = qda_postclose,
> + .ioctls = qda_ioctls,
> + .num_ioctls = ARRAY_SIZE(qda_ioctls),
> .name = QDA_DRIVER_NAME,
> .desc = "Qualcomm DSP Accelerator Driver",
> };
> diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
> new file mode 100644
> index 000000000000..761d3567c33f
> --- /dev/null
> +++ b/drivers/accel/qda/qda_ioctl.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <drm/drm_ioctl.h>
> +#include <drm/qda_accel.h>
> +#include "qda_drv.h"
> +#include "qda_ioctl.h"
> +
> +/**
> + * qda_ioctl_query() - Query DSP device information
> + * @dev: DRM device structure
> + * @data: User-space data (struct drm_qda_query)
> + * @file_priv: DRM file private data
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_ioctl_query(struct drm_device *dev, void *data, struct drm_file *file_priv)
> +{
> + struct drm_qda_query *args = data;
> + struct qda_dev *qdev;
> +
> + qdev = qda_dev_from_drm(dev);
> +
> + strscpy(args->dsp_name, qdev->dsp_name, sizeof(args->dsp_name));
> +
> + return 0;
> +}
> diff --git a/drivers/accel/qda/qda_ioctl.h b/drivers/accel/qda/qda_ioctl.h
> new file mode 100644
> index 000000000000..b8fd536a111f
> --- /dev/null
> +++ b/drivers/accel/qda/qda_ioctl.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_IOCTL_H__
> +#define __QDA_IOCTL_H__
> +
> +#include "qda_drv.h"
> +
> +int qda_ioctl_query(struct drm_device *dev, void *data, struct drm_file *file_priv);
> +
> +#endif /* __QDA_IOCTL_H__ */
> diff --git a/include/uapi/drm/qda_accel.h b/include/uapi/drm/qda_accel.h
> new file mode 100644
> index 000000000000..1971a4263065
> --- /dev/null
> +++ b/include/uapi/drm/qda_accel.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_ACCEL_H__
> +#define __QDA_ACCEL_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +/*
> + * QDA IOCTL command numbers
> + *
> + * These define the command numbers for QDA-specific IOCTLs.
> + * They are used with DRM_COMMAND_BASE to create the full IOCTL numbers.
> + */
> +#define DRM_QDA_QUERY 0x00
> +
> +/*
> + * QDA IOCTL definitions
> + *
> + * These macros define the actual IOCTL numbers used by userspace applications.
> + * They combine the command numbers with DRM_COMMAND_BASE and specify the
> + * data structure and direction (read/write) for each IOCTL.
> + */
> +#define DRM_IOCTL_QDA_QUERY DRM_IOR(DRM_COMMAND_BASE + DRM_QDA_QUERY, \
> + struct drm_qda_query)
> +
> +/**
> + * struct drm_qda_query - Device information query structure
> + * @dsp_name: Name of DSP (e.g., "adsp", "cdsp", "cdsp1", "gdsp0", "gdsp1")
> + *
> + * This structure is used with DRM_IOCTL_QDA_QUERY to query device type,
> + * allowing userspace to identify which DSP a device node represents. The
> + * kernel provides the DSP name directly as a null-terminated string.
> + */
> +struct drm_qda_query {
> + __u8 dsp_name[16];
Are you sure that you want to query only the name? No extra options, no
attributes, no hardware capabilities?
> +};
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif /* __QDA_ACCEL_H__ */
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH v6 11/43] KVM: guest_memfd: Ensure pages are not in use before conversion
From: Fuad Tabba @ 2026-05-20 14:28 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-11-91ab5a8b19a4@google.com>
On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> When converting memory to private in guest_memfd, it is necessary to ensure
> that the pages are not currently being accessed by any other part of the
> kernel or userspace to avoid any current user writing to guest private
> memory.
>
> guest_memfd checks for unexpected refcounts to determine whether a page is
> still in use. The only expected refcounts after unmapping the range
> requested for conversion are those that are held by guest_memfd itself.
>
> Update the kvm_memory_attributes2 structure to include an error_offset
> field. This allows KVM to report the exact offset where a conversion
> failed to userspace. If the safety check fails, return -EAGAIN and copy
> the error_offset back to userspace so that it can potentially retry the
> operation or handle the failure gracefully.
>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> include/uapi/linux/kvm.h | 3 ++-
> virt/kvm/guest_memfd.c | 65 ++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 62 insertions(+), 6 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e6bbf68a83813..0b55258573d3d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1658,7 +1658,8 @@ struct kvm_memory_attributes2 {
> __u64 size;
> __u64 attributes;
> __u64 flags;
> - __u64 reserved[12];
> + __u64 error_offset;
> + __u64 reserved[11];
> };
>
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 91e89b188f583..9d82642a025e9 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -572,9 +572,42 @@ static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> }
>
> +static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> + size_t nr_pages, pgoff_t *err_index)
> +{
> + struct address_space *mapping = inode->i_mapping;
> + const int filemap_get_folios_refcount = 1;
> + pgoff_t last = start + nr_pages - 1;
> + struct folio_batch fbatch;
> + bool safe = true;
> + int i;
> +
> + folio_batch_init(&fbatch);
> + while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
> +
> + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> + struct folio *folio = fbatch.folios[i];
> +
> + if (folio_ref_count(folio) !=
> + folio_nr_pages(folio) + filemap_get_folios_refcount) {
> + safe = false;
> + *err_index = folio->index;
> + break;
> + }
> + }
> +
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +
> + return safe;
> +}
> +
> static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> - size_t nr_pages, uint64_t attrs)
> + size_t nr_pages, uint64_t attrs,
> + pgoff_t *err_index)
> {
> + bool to_private = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> struct address_space *mapping = inode->i_mapping;
> struct gmem_inode *gi = GMEM_I(inode);
> pgoff_t end = start + nr_pages;
> @@ -588,8 +621,21 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
>
> mas_init(&mas, mt, start);
> r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> - if (r)
> + if (r) {
> + *err_index = start;
> goto out;
> + }
> +
> + if (to_private) {
> + unmap_mapping_pages(mapping, start, nr_pages, false);
> +
> + if (!kvm_gmem_is_safe_for_conversion(inode, start, nr_pages,
> + err_index)) {
> + mas_destroy(&mas);
> + r = -EAGAIN;
> + goto out;
> + }
> + }
>
> /*
> * From this point on guest_memfd has performed necessary
> @@ -609,9 +655,10 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> struct gmem_file *f = file->private_data;
> struct inode *inode = file_inode(file);
> struct kvm_memory_attributes2 attrs;
> + pgoff_t err_index;
> size_t nr_pages;
> pgoff_t index;
> - int i;
> + int i, r;
>
> if (copy_from_user(&attrs, argp, sizeof(attrs)))
> return -EFAULT;
> @@ -635,8 +682,16 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
>
> nr_pages = attrs.size >> PAGE_SHIFT;
> index = attrs.offset >> PAGE_SHIFT;
> - return __kvm_gmem_set_attributes(inode, index, nr_pages,
> - attrs.attributes);
> + r = __kvm_gmem_set_attributes(inode, index, nr_pages, attrs.attributes,
> + &err_index);
> + if (r) {
> + attrs.error_offset = ((uint64_t)err_index) << PAGE_SHIFT;
> +
> + if (copy_to_user(argp, &attrs, sizeof(attrs)))
> + return -EFAULT;
> + }
> +
> + return r;
> }
>
> static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply
* Re: [PATCH 07/15] accel/qda: Add memory manager for CB devices
From: Dmitry Baryshkov @ 2026-05-20 14:27 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-7-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:57AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Introduce the QDA memory manager (qda_memory_manager) to track and
> manage the IOMMU devices that back each compute context bank (CB).
>
> Each CB device registered on the qda-compute-cb bus is assigned a
> unique ID via an XArray and wrapped in a qda_iommu_device descriptor
> that records the device pointer and its stream ID. This registry
> allows the driver to look up the correct IOMMU domain for a given
> session when mapping DSP buffers.
>
> The memory manager is initialised in qda_init_device() before CB
> devices are populated and torn down in qda_deinit_device() after they
> are destroyed, ensuring no dangling references remain in the XArray.
>
> qda_cb.c is extended with qda_cb_setup_device(), which is called
> immediately after a CB device is registered on the bus. It allocates
> a qda_iommu_device, registers it with the memory manager, and stores
> it as the CB device's driver data so that qda_destroy_cb_device() can
> retrieve and unregister it during teardown.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/qda/Makefile | 1 +
> drivers/accel/qda/qda_cb.c | 47 ++++++++++++++
> drivers/accel/qda/qda_drv.c | 34 ++++++++++
> drivers/accel/qda/qda_drv.h | 5 ++
> drivers/accel/qda/qda_memory_manager.c | 111 +++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_memory_manager.h | 49 +++++++++++++++
> drivers/accel/qda/qda_rpmsg.c | 7 +++
> 7 files changed, 254 insertions(+)
>
> @@ -61,14 +62,20 @@ static int qda_rpmsg_probe(struct rpmsg_device *rpdev)
> }
> qdev->dsp_name = label;
>
> + ret = qda_init_device(qdev);
> + if (ret)
> + return ret;
> +
> ret = qda_cb_populate(qdev, rpdev->dev.of_node);
> if (ret) {
> dev_err(qdev->dev, "Failed to populate child devices: %d\n", ret);
> + qda_deinit_device(qdev);
> return ret;
> }
>
> ret = qda_register_device(qdev);
> if (ret) {
> + qda_deinit_device(qdev);
> qda_cb_unpopulate(qdev);
No, this is not how you unwind in the error case in the kernel. Follow
the established patterns.
> return ret;
> }
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH 07/15] accel/qda: Add memory manager for CB devices
From: Dmitry Baryshkov @ 2026-05-20 14:26 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-7-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:57AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Introduce the QDA memory manager (qda_memory_manager) to track and
> manage the IOMMU devices that back each compute context bank (CB).
>
> Each CB device registered on the qda-compute-cb bus is assigned a
> unique ID via an XArray and wrapped in a qda_iommu_device descriptor
Why do you need an XArray? The number of devices is (more or less)
fixed. You can use a normal array, allocated in the probe function after
counting OF children nodes.
> that records the device pointer and its stream ID. This registry
> allows the driver to look up the correct IOMMU domain for a given
> session when mapping DSP buffers.
>
> The memory manager is initialised in qda_init_device() before CB
> devices are populated and torn down in qda_deinit_device() after they
> are destroyed, ensuring no dangling references remain in the XArray.
>
> qda_cb.c is extended with qda_cb_setup_device(), which is called
> immediately after a CB device is registered on the bus. It allocates
> a qda_iommu_device, registers it with the memory manager, and stores
> it as the CB device's driver data so that qda_destroy_cb_device() can
> retrieve and unregister it during teardown.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/qda/Makefile | 1 +
> drivers/accel/qda/qda_cb.c | 47 ++++++++++++++
> drivers/accel/qda/qda_drv.c | 34 ++++++++++
> drivers/accel/qda/qda_drv.h | 5 ++
> drivers/accel/qda/qda_memory_manager.c | 111 +++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_memory_manager.h | 49 +++++++++++++++
> drivers/accel/qda/qda_rpmsg.c | 7 +++
> 7 files changed, 254 insertions(+)
>
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> index 143c9e4e789e..701fad5ffb50 100644
> --- a/drivers/accel/qda/Makefile
> +++ b/drivers/accel/qda/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_DRM_ACCEL_QDA) := qda.o
> qda-y := \
> qda_cb.o \
> qda_drv.o \
> + qda_memory_manager.o \
> qda_rpmsg.o
>
> obj-$(CONFIG_DRM_ACCEL_QDA_COMPUTE_BUS) += qda_compute_bus.o
> diff --git a/drivers/accel/qda/qda_cb.c b/drivers/accel/qda/qda_cb.c
> index 77caf8438c67..6d540bb0ec7b 100644
> --- a/drivers/accel/qda/qda_cb.c
> +++ b/drivers/accel/qda/qda_cb.c
> @@ -8,11 +8,42 @@
> #include <linux/slab.h>
> #include <drm/drm_print.h>
> #include "qda_drv.h"
> +#include "qda_memory_manager.h"
> #include "qda_cb.h"
>
> +static int qda_cb_setup_device(struct qda_dev *qdev, struct device *cb_dev, u32 sid)
> +{
> + struct qda_iommu_device *iommu_dev;
> + int rc;
> +
> + drm_dbg_driver(&qdev->drm_dev, "Setting up CB device %s\n", dev_name(cb_dev));
> +
> + iommu_dev = kzalloc_obj(*iommu_dev);
> + if (!iommu_dev)
> + return -ENOMEM;
> +
> + iommu_dev->dev = cb_dev;
> + iommu_dev->qdev = qdev;
> + iommu_dev->sid = sid;
> +
> + rc = qda_memory_manager_register_device(qdev->iommu_mgr, iommu_dev);
> + if (rc) {
> + drm_err(&qdev->drm_dev, "Failed to register IOMMU device: %d\n", rc);
> + kfree(iommu_dev);
> + return rc;
> + }
> +
> + dev_set_drvdata(cb_dev, iommu_dev);
> +
> + drm_dbg_driver(&qdev->drm_dev, "CB device setup complete - SID: %u\n", sid);
> +
> + return 0;
> +}
> +
> int qda_create_cb_device(struct qda_dev *qdev, struct device_node *cb_node)
> {
> struct device *cb_dev;
> + int ret;
> u32 sid = 0;
> char name[64];
> struct qda_cb_dev *entry;
> @@ -30,6 +61,13 @@ int qda_create_cb_device(struct qda_dev *qdev, struct device_node *cb_node)
> return PTR_ERR(cb_dev);
> }
>
> + ret = qda_cb_setup_device(qdev, cb_dev, sid);
> + if (ret) {
> + drm_err(&qdev->drm_dev, "CB device setup failed: %d\n", ret);
> + device_unregister(cb_dev);
> + return ret;
> + }
> +
> entry = kzalloc_obj(*entry);
> if (!entry) {
> device_unregister(cb_dev);
> @@ -80,6 +118,7 @@ int qda_cb_populate(struct qda_dev *qdev, struct device_node *parent_node)
> void qda_destroy_cb_device(struct device *cb_dev)
> {
> struct iommu_group *group;
> + struct qda_iommu_device *iommu_dev;
>
> if (!cb_dev) {
> pr_debug("qda: NULL CB device passed to destroy\n");
> @@ -88,6 +127,14 @@ void qda_destroy_cb_device(struct device *cb_dev)
>
> dev_dbg(cb_dev, "Destroying CB device %s\n", dev_name(cb_dev));
>
> + iommu_dev = dev_get_drvdata(cb_dev);
> + if (iommu_dev && iommu_dev->qdev && iommu_dev->qdev->iommu_mgr) {
> + dev_dbg(cb_dev, "Unregistering IOMMU device for %s\n",
> + dev_name(cb_dev));
> + qda_memory_manager_unregister_device(iommu_dev->qdev->iommu_mgr,
> + iommu_dev);
> + }
> +
> group = iommu_group_get(cb_dev);
> if (group) {
> dev_dbg(cb_dev, "Removing %s from IOMMU group\n", dev_name(cb_dev));
> diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
> index 6c20d6a2fc47..0ad5d9873d7e 100644
> --- a/drivers/accel/qda/qda_drv.c
> +++ b/drivers/accel/qda/qda_drv.c
> @@ -57,6 +57,40 @@ struct qda_dev *qda_alloc_device(struct device *dev)
> return qdev;
> }
>
> +static void cleanup_memory_manager(struct qda_dev *qdev)
Prefixes...
> +{
> + if (qdev->iommu_mgr) {
> + qda_memory_manager_exit(qdev->iommu_mgr);
> + kfree(qdev->iommu_mgr);
> + qdev->iommu_mgr = NULL;
> + }
> +}
> +
> +static int init_memory_manager(struct qda_dev *qdev)
> +{
> + qdev->iommu_mgr = kzalloc_obj(*qdev->iommu_mgr);
> + if (!qdev->iommu_mgr)
> + return -ENOMEM;
> +
> + return qda_memory_manager_init(qdev->iommu_mgr);
> +}
> +
> +void qda_deinit_device(struct qda_dev *qdev)
> +{
> + cleanup_memory_manager(qdev);
Ugh, inline all your one-line wrappers.
> +}
> +
> +int qda_init_device(struct qda_dev *qdev)
> +{
> + int ret;
> +
> + ret = init_memory_manager(qdev);
> + if (ret)
> + drm_err(&qdev->drm_dev, "Failed to initialize memory manager: %d\n", ret);
> +
> + return ret;
> +}
> +
> void qda_unregister_device(struct qda_dev *qdev)
> {
> drm_dev_unregister(&qdev->drm_dev);
> diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
> index 2715f378775d..eb089e586b17 100644
> --- a/drivers/accel/qda/qda_drv.h
> +++ b/drivers/accel/qda/qda_drv.h
> @@ -13,6 +13,7 @@
> #include <drm/drm_device.h>
> #include <drm/drm_drv.h>
> #include <drm/drm_file.h>
> +#include "qda_memory_manager.h"
>
> /* Driver identification */
> #define QDA_DRIVER_NAME "qda"
> @@ -40,6 +41,8 @@ struct qda_dev {
> struct device *dev;
> /** @cb_devs: Compute context-bank (CB) child devices */
> struct list_head cb_devs;
> + /** @iommu_mgr: IOMMU/memory manager instance */
> + struct qda_memory_manager *iommu_mgr;
> /** @dsp_name: Name of the DSP domain (e.g. "cdsp", "adsp") */
> const char *dsp_name;
> };
> @@ -59,6 +62,8 @@ static inline struct qda_dev *qda_dev_from_drm(struct drm_device *dev)
> struct qda_dev *qda_alloc_device(struct device *dev);
>
> /* Core device lifecycle */
> +int qda_init_device(struct qda_dev *qdev);
> +void qda_deinit_device(struct qda_dev *qdev);
> int qda_register_device(struct qda_dev *qdev);
> void qda_unregister_device(struct qda_dev *qdev);
>
> diff --git a/drivers/accel/qda/qda_memory_manager.c b/drivers/accel/qda/qda_memory_manager.c
> new file mode 100644
> index 000000000000..00a9c0ae4224
> --- /dev/null
> +++ b/drivers/accel/qda/qda_memory_manager.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +
> +#include <linux/refcount.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/xarray.h>
> +#include <drm/drm_file.h>
> +#include "qda_drv.h"
> +#include "qda_memory_manager.h"
> +
> +static void cleanup_all_memory_devices(struct qda_memory_manager *mem_mgr)
> +{
> + unsigned long index;
> + void *entry;
> +
> + pr_debug("qda: Starting cleanup of all memory devices\n");
pr_debug is a third way to debug. Stop it, please.
> +
> + xa_for_each(&mem_mgr->device_xa, index, entry) {
> + struct qda_iommu_device *iommu_dev = entry;
> +
> + pr_debug("qda: Cleaning up device id=%lu\n", index);
> +
> + xa_erase(&mem_mgr->device_xa, index);
> + kfree(iommu_dev);
> + }
> +
> + pr_debug("qda: Completed cleanup of all memory devices\n");
> +}
> +
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH 06/15] accel/qda: Create compute context bank devices on QDA compute bus
From: Dmitry Baryshkov @ 2026-05-20 14:23 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-6-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:56AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Introduce the CB (compute context bank) device management layer for the
> QDA driver. Each DSP domain node in the device tree may contain child
> nodes with compatible "qcom,fastrpc-compute-cb", each representing one
> IOMMU context bank. The driver enumerates those child nodes during
> RPMsg probe and creates a corresponding device on the qda-compute-cb
> bus for each one.
>
> The CB devices are created via create_qda_cb_device(), which registers
> them on the qda-compute-cb bus so that the IOMMU subsystem assigns each
> device its own IOMMU domain, enabling per-session address space
> isolation for DSP buffer mapping.
>
> The new qda_cb.c file provides two functions:
>
> qda_create_cb_device()
> Reads the "reg" property from the DT child node to obtain the
> stream ID, constructs a unique device name of the form
> "qda-cb-<dsp>-<sid>", and registers the device on the compute bus.
> A qda_cb_dev entry is allocated and appended to qdev->cb_devs so
> that the list can be walked during teardown.
>
> qda_destroy_cb_device()
> Removes the device from its IOMMU group before calling
> device_unregister(), ensuring the IOMMU domain is released cleanly.
>
> CB devices are populated before the DRM device is registered and
> destroyed before it is unplugged, so no DRM operation can race with
> CB teardown. On probe failure after population, qda_cb_unpopulate()
> is called to clean up any CBs that were successfully created before
> the error.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/qda/Makefile | 1 +
> drivers/accel/qda/qda_cb.c | 99 +++++++++++++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_cb.h | 32 ++++++++++++++
> drivers/accel/qda/qda_drv.c | 1 +
> drivers/accel/qda/qda_drv.h | 3 ++
> drivers/accel/qda/qda_rpmsg.c | 12 +++++-
> 6 files changed, 147 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> index 424176f652a5..143c9e4e789e 100644
> --- a/drivers/accel/qda/Makefile
> +++ b/drivers/accel/qda/Makefile
> @@ -6,6 +6,7 @@
> obj-$(CONFIG_DRM_ACCEL_QDA) := qda.o
>
> qda-y := \
> + qda_cb.o \
> qda_drv.o \
> qda_rpmsg.o
>
> diff --git a/drivers/accel/qda/qda_cb.c b/drivers/accel/qda/qda_cb.c
> new file mode 100644
> index 000000000000..77caf8438c67
> --- /dev/null
> +++ b/drivers/accel/qda/qda_cb.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <linux/dma-mapping.h>
> +#include <linux/device.h>
> +#include <linux/of.h>
> +#include <linux/iommu.h>
> +#include <linux/qda_compute_bus.h>
> +#include <linux/slab.h>
> +#include <drm/drm_print.h>
> +#include "qda_drv.h"
> +#include "qda_cb.h"
> +
> +int qda_create_cb_device(struct qda_dev *qdev, struct device_node *cb_node)
> +{
> + struct device *cb_dev;
> + u32 sid = 0;
> + char name[64];
> + struct qda_cb_dev *entry;
> +
> + drm_dbg_driver(&qdev->drm_dev, "Creating CB device for node: %s\n", cb_node->name);
> +
> + of_property_read_u32(cb_node, "reg", &sid);
> +
> + snprintf(name, sizeof(name), "qda-cb-%s-%u", qdev->dsp_name, sid);
> +
> + cb_dev = create_qda_cb_device(qdev->dev, name, DMA_BIT_MASK(32), cb_node);
Wrong prefix. Pass the name format and the params to this function. Use
kasprintf in it.
> + if (IS_ERR(cb_dev)) {
> + drm_err(&qdev->drm_dev, "Failed to create CB device for SID %u: %ld\n",
> + sid, PTR_ERR(cb_dev));
> + return PTR_ERR(cb_dev);
> + }
> +
> + entry = kzalloc_obj(*entry);
> + if (!entry) {
> + device_unregister(cb_dev);
> + return -ENOMEM;
> + }
> +
> + entry->dev = cb_dev;
> + list_add_tail(&entry->node, &qdev->cb_devs);
> +
> + drm_dbg_driver(&qdev->drm_dev, "Successfully created CB device for SID %u\n", sid);
> + return 0;
> +}
> +
> +void qda_cb_unpopulate(struct qda_dev *qdev)
> +{
> + struct qda_cb_dev *entry, *tmp;
> +
> + list_for_each_entry_safe(entry, tmp, &qdev->cb_devs, node) {
> + list_del(&entry->node);
> + qda_destroy_cb_device(entry->dev);
> + kfree(entry);
> + }
> +}
> +
> +int qda_cb_populate(struct qda_dev *qdev, struct device_node *parent_node)
> +{
> + struct device_node *child;
> + int count = 0, success = 0;
> +
> + for_each_child_of_node(parent_node, child) {
> + if (of_device_is_compatible(child, "qcom,fastrpc-compute-cb")) {
> + count++;
> + if (qda_create_cb_device(qdev, child) == 0) {
> + success++;
> + dev_dbg(qdev->dev, "Created CB device for node: %s\n",
> + child->name);
Stop counting successes.
> + } else {
> + dev_err(qdev->dev, "Failed to create CB device for: %s\n",
> + child->name);
Unwind, return error.
> + }
> + }
> + }
> + if (count == 0)
> + return 0;
> + return success > 0 ? 0 : -ENODEV;
> +}
> +
> +void qda_destroy_cb_device(struct device *cb_dev)
> +{
> + struct iommu_group *group;
> +
> + if (!cb_dev) {
How can it be?
> + pr_debug("qda: NULL CB device passed to destroy\n");
> + return;
> + }
> +
> + dev_dbg(cb_dev, "Destroying CB device %s\n", dev_name(cb_dev));
> +
> + group = iommu_group_get(cb_dev);
> + if (group) {
> + dev_dbg(cb_dev, "Removing %s from IOMMU group\n", dev_name(cb_dev));
Be uniform. It's either drm_dbg_foo() or dev_dbg() all over the place.
Don't mix them.
> + iommu_group_remove_device(cb_dev);
> + iommu_group_put(group);
> + }
> +
> + device_unregister(cb_dev);
> +}
> @@ -59,9 +61,17 @@ static int qda_rpmsg_probe(struct rpmsg_device *rpdev)
> }
> qdev->dsp_name = label;
>
> + ret = qda_cb_populate(qdev, rpdev->dev.of_node);
> + if (ret) {
> + dev_err(qdev->dev, "Failed to populate child devices: %d\n", ret);
> + return ret;
> + }
> +
> ret = qda_register_device(qdev);
> - if (ret)
> + if (ret) {
> + qda_cb_unpopulate(qdev);
> return ret;
Unwinding registration?
> + }
>
> drm_info(&qdev->drm_dev, "QDA RPMsg probe complete for %s\n", qdev->dsp_name);
> return 0;
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH v6 06/43] KVM: x86/mmu: Bug the VM if gmem attributes are queried to determine max mapping level
From: Sean Christopherson @ 2026-05-20 14:21 UTC (permalink / raw)
To: Fuad Tabba
Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <CA+EHjTxvLU4XDPXDXYXXWJES1OFQgN8VTRLMgCCNMwBE6Hk8tQ@mail.gmail.com>
On Wed, May 20, 2026, Fuad Tabba wrote:
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > When the maximum mapping level is queried, KVM's MMU lock is held, and
> > while the MMU lock is held, guest_memfd cannot take the
> > filemap_invalidate_lock() to look up the current shared/private state of
> > the gfn, for these reasons:
> >
> > + The MMU lock is a spinlock or rwlock and cannot be held while taking a
> > lock that can sleep.
> > + In guest_memfd's code paths (such as truncate), the
> > filemap_invalidate_lock() is held while taking the MMU lock, and taking
> > the locks in reverse order would introduce a AB-BA deadlock.
> >
> > Currently, the maximum mapping level is only queried from guest_memfd in
> > the process of recovering huge pages, if dirty logging is disabled on a
> > memslot. Dirty logging is not currently supported for guest_memfd, and
> > guest_memfd memslots also cannot be updated.
> >
> > For now, bug the VM if guest_memfd needs to be queried to determine the
> > maximum mapping level. This guard can be removed if/when support is added.
> >
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a80a876ab4ad6..153bcc5369985 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3357,6 +3357,15 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> > max_level = fault->max_level;
> > is_private = fault->is_private;
> > } else {
> > + /*
> > + * Memory attributes cannot be obtained from guest_memfd while
> > + * the MMU lock is held.
> > + */
> > + if (KVM_BUG_ON(static_call_query(__kvm_get_memory_attributes) ==
> > + kvm_gmem_get_memory_attributes, kvm)) {
> > + return 0;
> > + }
> > +
>
> This directly takes the address of kvm_gmem_get_memory_attributes,
> which is only compiled if CONFIG_KVM_GUEST_MEMFD=y. This breaks
> ARCH=i386.
And this bleeds guest_memfd implementation details into places they don't belong.
The right way to deal with this is to use lockdep_assert_not_held() in whatever
code mustn't run with mmu_lock held. E.g.
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index c9f155c2dc5c..3bea9c1137ef 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -547,6 +547,9 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
struct inode *inode;
+ /* Comment goes here. */
+ lockdep_assert_not_held(&kvm->mmu_lock);
+
/*
* If this gfn has no associated memslot, there's no chance of the gfn
* being backed by private memory, since guest_memfd must be used for
But I'm confused, because kvm_gmem_get_memory_attributes() doesn't actually take
filemap_invalidate_lock(), so what exactly is the problem?
> > max_level = PG_LEVEL_NUM;
> > is_private = kvm_mem_is_private(kvm, gfn);
> > }
> >
> > --
> > 2.54.0.563.g4f69b47b94-goog
> >
> >
^ permalink raw reply related
* Re: [PATCH 05/15] iommu: Add QDA compute context bank bus to iommu_buses
From: Dmitry Baryshkov @ 2026-05-20 14:19 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-5-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:55AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Register the QDA compute context bank bus (qda-compute-cb) with the
> IOMMU subsystem by adding it to the iommu_buses[] array.
>
> The QDA driver creates synthetic devices on this bus to represent
> IOMMU context banks (CBs). Each CB device needs its own IOMMU domain
> so that the DSP memory manager can enforce per-session address space
> isolation. Without this registration, the IOMMU subsystem does not
> probe CB devices for IOMMU groups and of_dma_configure() in the bus
> dma_configure callback has no IOMMU domain to attach to.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/iommu/iommu.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH 04/15] accel/qda: Add compute bus for QDA context banks
From: Dmitry Baryshkov @ 2026-05-20 14:19 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-4-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:54AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Introduce a custom virtual bus (qda-compute-cb) for managing IOMMU
> context bank (CB) devices used by the QDA driver.
>
> IOMMU context banks are synthetic constructs — they are not real
> platform devices and do not appear as children of a platform bus node
> in the device tree. Using a platform driver to represent them was
> therefore incorrect and introduced a probe-ordering race: device nodes
> were created before the RPMsg channel resources were fully initialized,
> and because probe runs asynchronously, user-space could open a CB
> device and attempt to start a session before the underlying transport
> was ready.
>
> The qda-compute-cb bus solves this by allowing the main QDA driver to
> create CB devices explicitly and under its own control, making their
> lifetime strictly subordinate to the parent qda_dev. The bus provides
> a dma_configure callback that calls of_dma_configure() so that each CB
> device gets its own IOMMU domain derived from its device-tree node,
> enabling per-session memory isolation.
>
> The bus type and the CB device constructor (create_qda_cb_device) are
> exported for use by the QDA memory manager.
>
> A hidden Kconfig symbol (DRM_ACCEL_QDA_COMPUTE_BUS) is introduced and
> automatically selected by DRM_ACCEL_QDA so that the bus initialisation
> runs via postcore_initcall before any QDA device probes.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/Makefile | 1 +
> drivers/accel/qda/Kconfig | 4 +++
> drivers/accel/qda/Makefile | 2 ++
> drivers/accel/qda/qda_compute_bus.c | 68 +++++++++++++++++++++++++++++++++++++
> include/linux/qda_compute_bus.h | 32 +++++++++++++++++
> 5 files changed, 107 insertions(+)
>
> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> index 58c08dd5f389..9ed843cd293f 100644
> --- a/drivers/accel/Makefile
> +++ b/drivers/accel/Makefile
> @@ -6,4 +6,5 @@ obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
> obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
> obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
> obj-$(CONFIG_DRM_ACCEL_QDA) += qda/
> +obj-$(CONFIG_DRM_ACCEL_QDA_COMPUTE_BUS) += qda/
Ugh. The previous line should be enough (but don't trust me).
> obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
> \ No newline at end of file
> diff --git a/drivers/accel/qda/Kconfig b/drivers/accel/qda/Kconfig
> index 484d21ff1b55..2a61a4dda054 100644
> --- a/drivers/accel/qda/Kconfig
> +++ b/drivers/accel/qda/Kconfig
> @@ -3,11 +3,15 @@
> # Qualcomm DSP accelerator driver
> #
>
> +config DRM_ACCEL_QDA_COMPUTE_BUS
> + bool
> +
> config DRM_ACCEL_QDA
> tristate "Qualcomm DSP accelerator"
> depends on DRM_ACCEL
> depends on ARCH_QCOM || COMPILE_TEST
> depends on RPMSG
> + select DRM_ACCEL_QDA_COMPUTE_BUS
> help
> Enables the DRM-based accelerator driver for Qualcomm's Hexagon DSPs.
> This driver provides a standardized interface for offloading computational
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> index dbe809067a8b..424176f652a5 100644
> --- a/drivers/accel/qda/Makefile
> +++ b/drivers/accel/qda/Makefile
> @@ -8,3 +8,5 @@ obj-$(CONFIG_DRM_ACCEL_QDA) := qda.o
> qda-y := \
> qda_drv.o \
> qda_rpmsg.o
> +
> +obj-$(CONFIG_DRM_ACCEL_QDA_COMPUTE_BUS) += qda_compute_bus.o
> diff --git a/drivers/accel/qda/qda_compute_bus.c b/drivers/accel/qda/qda_compute_bus.c
> new file mode 100644
> index 000000000000..c59d977e924d
> --- /dev/null
> +++ b/drivers/accel/qda/qda_compute_bus.c
> @@ -0,0 +1,68 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <linux/device.h>
> +#include <linux/init.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/qda_compute_bus.h>
> +#include <linux/slab.h>
> +
> +static int qda_cb_bus_dma_configure(struct device *dev)
> +{
> + return of_dma_configure(dev, dev->of_node, true);
> +}
> +
> +const struct bus_type qda_cb_bus_type = {
> + .name = "qda-compute-cb",
> + .dma_configure = qda_cb_bus_dma_configure,
> +};
> +EXPORT_SYMBOL_GPL(qda_cb_bus_type);
> +
> +static void release_qda_cb_device(struct device *dev)
> +{
> + of_node_put(dev->of_node);
> + kfree(dev);
> +}
> +
> +struct device *create_qda_cb_device(struct device *parent_device, const char *name,
> + u64 dma_mask, struct device_node *of_node)
> +{
> + struct device *dev;
> + int ret;
> +
> + dev = kzalloc_obj(*dev);
> + if (!dev)
> + return ERR_PTR(-ENOMEM);
> +
> + dev->release = release_qda_cb_device;
> + dev->bus = &qda_cb_bus_type;
> + dev->parent = parent_device;
> + dev->coherent_dma_mask = dma_mask;
> + dev->dma_mask = &dev->coherent_dma_mask;
> + dev->of_node = of_node_get(of_node);
> +
> + dev_set_name(dev, "%s", name);
> +
> + ret = device_register(dev);
> + if (ret) {
> + put_device(dev);
> + return ERR_PTR(ret);
> + }
> +
> + return dev;
> +}
> +EXPORT_SYMBOL_GPL(create_qda_cb_device);
> +
> +static int __init qda_cb_bus_init(void)
> +{
> + int err;
> +
> + err = bus_register(&qda_cb_bus_type);
> + if (err < 0) {
> + pr_err("qda-compute-cb bus registration failed: %d\n", err);
> + return err;
> + }
> + return 0;
> +}
> +
> +postcore_initcall(qda_cb_bus_init);
> diff --git a/include/linux/qda_compute_bus.h b/include/linux/qda_compute_bus.h
> new file mode 100644
> index 000000000000..90bf248c7285
> --- /dev/null
> +++ b/include/linux/qda_compute_bus.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_COMPUTE_BUS_H__
> +#define __QDA_COMPUTE_BUS_H__
> +
> +#include <linux/device.h>
> +
> +/*
> + * Custom bus type for QDA compute context bank (CB) devices
> + *
> + * This bus type is used for manually created CB devices that represent
> + * IOMMU context banks. The custom bus allows proper IOMMU configuration
> + * and device management for these virtual devices.
> + */
> +#ifdef CONFIG_DRM_ACCEL_QDA_COMPUTE_BUS
> +extern const struct bus_type qda_cb_bus_type;
> +
> +struct device *create_qda_cb_device(struct device *parent_device, const char *name,
> + u64 dma_mask, struct device_node *of_node);
> +#else
> +static inline struct device *create_qda_cb_device(struct device *parent_device,
> + const char *name, u64 dma_mask,
> + struct device_node *of_node)
> +{
> + return ERR_PTR(-ENODEV);
> +}
> +#endif
> +
> +#endif /* __QDA_COMPUTE_BUS_H__ */
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH v4 1/3] drm/fdinfo: Add "evicted" memory accounting
From: Tvrtko Ursulin @ 2026-05-20 14:19 UTC (permalink / raw)
To: Nicolas Frattaroli, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Boris Brezillon,
Steven Price, Liviu Dudau, Jonathan Corbet, Shuah Khan
Cc: dri-devel, linux-kernel, kernel, linux-doc
In-Reply-To: <20260520-panthor-bo-reclaim-observability-v4-1-a47ab61cb80d@collabora.com>
On 20/05/2026 14:04, Nicolas Frattaroli wrote:
> Currently, there's no way to know for certain how much GPU memory was
> swapped out. The difference between total and resident memory would
> include newly allocated pages, which are not resident, but also aren't
> swapped out.
>
> Add a new drm_gem_object_status so drivers can signal when an object has
> been evicted to swap, and add a new "evicted" counter to
> drm_memory_stats.
>
> Due to how the supported_flags bitmask is determined, the "evicted"
> count won't be printed to fdinfo if there's no swapped out pages.
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> Documentation/gpu/drm-usage-stats.rst | 6 ++++++
> drivers/gpu/drm/drm_file.c | 8 ++++++++
> include/drm/drm_file.h | 2 ++
> include/drm/drm_gem.h | 2 ++
> 4 files changed, 18 insertions(+)
>
> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> index 70b7cfcc194f..ac1dbf52d96d 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -202,6 +202,12 @@ One practical example of this could be the presence of unsignaled fences in a
> GEM buffer reservation object. Therefore, the active category is a subset of the
> resident category.
>
> +- drm-evicted-<region>: <uint> [KiB|MiB]
> +
> +The total size of buffers that have been evicted and are no longer pinned by the
> +device. Only present if there are buffers that are currently evicted, and if the
> +driver implements reporting of this type of memory.
The semantics as tricky to make work in an obvious way.
On one hand the text above is almost exactly the semantics of 'total' -
'resident'. Almost meaning it was resident at some point, but isn't any
more. Whereas raw 'total' - 'resident' can also mean it never has been
instantiated.
You could even have a "workaround" where you report a 'swap' memory
region and then don't need to add anything new to the spec.
Next problem - on paper evicted could be useful to replace driver legacy
keys such as 'amd-evicted-ram'. But that "evicted" is defined as "not in
a the preferred placement". While your evicted is more like "no current
placement" (as in, no GPU accessible backing storage).
Is it possible to find a definition of this new category which makes
sense for different GPUs/drivers, be it integrated or discrete.
Or would simply going for 'drm-total-swap:' (or resident?) work for
panthor? Advantage being it would also work unambiguously for discrete
drivers.
Like the ones which support multiple TTM placements, for example VRAM +
SYSTEM and then next step is swapping out so an extreme example on a
16GiB GPU + 16GiB RAM machine with a 32GiB gfx workload could be like:
drm-total-vram: 32GiB
drm-resident-vram: 16GiB
drm-resident-system: 15GiB
drm-total-swap: 1GiB
Does this look clear enough? Whereas with the "evicted" category it
would be:
drm-total-vram: 32GiB
drm-resident-vram: 16GiB
drm-evicted-vram: 16GiB # portion which got demoted to system RAM
drm-resident-system: 15GiB
drm-evicted-system: 1GiB # portion which got demoted to swap
Where drm-evicted-vram is redundant to "total - resident". And it is
overloaded semantics as it where does evicted go depending on the
GPU/driver/region.
Thoughts, opinions?
Regards,
Tvrtko
> +
> Implementation Details
> ======================
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index ec820686b302..5078172976c0 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -868,6 +868,7 @@ int drm_memory_stats_is_zero(const struct drm_memory_stats *stats)
> stats->private == 0 &&
> stats->resident == 0 &&
> stats->purgeable == 0 &&
> + stats->evicted == 0 &&
> stats->active == 0);
> }
> EXPORT_SYMBOL(drm_memory_stats_is_zero);
> @@ -901,6 +902,10 @@ void drm_print_memory_stats(struct drm_printer *p,
> if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
> drm_fdinfo_print_size(p, prefix, "purgeable", region,
> stats->purgeable);
> +
> + if (supported_status & DRM_GEM_OBJECT_EVICTED)
> + drm_fdinfo_print_size(p, prefix, "evicted", region,
> + stats->evicted);
> }
> EXPORT_SYMBOL(drm_print_memory_stats);
>
> @@ -954,6 +959,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>
> if (s & DRM_GEM_OBJECT_PURGEABLE)
> status.purgeable += add_size;
> +
> + if (s & DRM_GEM_OBJECT_EVICTED)
> + status.evicted += add_size;
> }
> spin_unlock(&file->table_lock);
>
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 6ee70ad65e1f..7e4cb45a52c3 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -500,6 +500,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev,
> * @resident: Total size of GEM objects backing pages
> * @purgeable: Total size of GEM objects that can be purged (resident and not active)
> * @active: Total size of GEM objects active on one or more engines
> + * @evicted: Total size of GEM objects that have been evicted
> *
> * Used by drm_print_memory_stats()
> */
> @@ -509,6 +510,7 @@ struct drm_memory_stats {
> u64 resident;
> u64 purgeable;
> u64 active;
> + u64 evicted;
> };
>
> enum drm_gem_object_status;
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 86f5846154f7..799588a2762a 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -53,6 +53,7 @@ struct drm_gem_object;
> * @DRM_GEM_OBJECT_RESIDENT: object is resident in memory (ie. not unpinned)
> * @DRM_GEM_OBJECT_PURGEABLE: object marked as purgeable by userspace
> * @DRM_GEM_OBJECT_ACTIVE: object is currently used by an active submission
> + * @DRM_GEM_OBJECT_EVICTED: object is evicted and no longer pinned by driver
> *
> * Bitmask of status used for fdinfo memory stats, see &drm_gem_object_funcs.status
> * and drm_show_fdinfo(). Note that an object can report DRM_GEM_OBJECT_PURGEABLE
> @@ -67,6 +68,7 @@ enum drm_gem_object_status {
> DRM_GEM_OBJECT_RESIDENT = BIT(0),
> DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> DRM_GEM_OBJECT_ACTIVE = BIT(2),
> + DRM_GEM_OBJECT_EVICTED = BIT(3),
> };
>
> /**
>
^ permalink raw reply
* Re: [PATCH v7 1/2] usb: xhci-pci: add AMD Promontory 21 PCI glue
From: Guenter Roeck @ 2026-05-20 14:18 UTC (permalink / raw)
To: Jihong Min
Cc: Greg Kroah-Hartman, Mathias Nyman, Jonathan Corbet, Shuah Khan,
Mario Limonciello, Basavaraj Natikar, Michal Pecio,
Mario Limonciello, Yaroslav Isakov, linux-usb, linux-hwmon,
linux-doc, linux-pci, linux-kernel
In-Reply-To: <20260519000732.2334711-2-hurryman2212@gmail.com>
On Tue, May 19, 2026 at 09:07:31AM +0900, Jihong Min wrote:
> AMD Promontory 21 (PROM21) xHCI PCI functions use the common xhci-pci
> core for USB operation, but also expose controller-specific sensor data.
> Add a small PROM21 PCI glue driver for AMD 1022:43fc and 1022:43fd
> controllers.
>
> The glue delegates USB host operation to the common xhci-pci core and
> publishes a "hwmon" auxiliary device with parent-provided MMIO data.
> Auxiliary device creation failure is logged but does not fail the xHCI
> probe.
>
> Make the PROM21 glue a hidden Kconfig tristate driven by the user-visible
> SENSORS_PROM21_XHCI option. If sensor support is disabled, generic
> xhci-pci binds PROM21 controllers normally. If sensor support is enabled,
> the glue follows USB_XHCI_PCI.
>
> This keeps the auxiliary device available for a modular sensor driver while
> avoiding a built-in xhci-pci core handing PROM21 controllers to a glue
> driver that is only available as a module during initramfs.
>
> Assisted-by: Codex:gpt-5.5
> Signed-off-by: Jihong Min <hurryman2212@gmail.com>
> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> Tested-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
The two patches should be applied together. For now I will assume that
they will both be applied through a usb tree since this patch touches
common usb code.
Thanks,
Guenter
^ permalink raw reply
* Re: [PATCH 03/15] accel/qda: Add initial QDA DRM accelerator driver
From: Dmitry Baryshkov @ 2026-05-20 14:18 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-3-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:53AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Add the foundational driver files for the Qualcomm DSP Accelerator
> (QDA), a DRM accel driver for Qualcomm DSPs. The driver integrates
> with the DRM accel subsystem (drivers/accel/) and provides:
>
> - A standard /dev/accel/accel* character device node via DRM.
> - GEM-based buffer management with DMA-BUF import/export (PRIME).
> - IOMMU context bank management for per-session memory isolation.
> - Standard DRM IOCTLs for device management and job submission.
>
> qda_drv.c / qda_drv.h: Core DRM driver registration. Defines the
> drm_driver ops table, per-file private state (qda_file_priv), and the
> main device structure (qda_dev) which embeds drm_device.
>
> qda_rpmsg.c / qda_rpmsg.h: RPMsg transport layer. Registers an
> rpmsg_driver matching the "qcom,fastrpc" compatible string. On probe
> it allocates a qda_dev, reads the DSP domain name from the "label" DT
> property, and registers the DRM device.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/Kconfig | 1 +
> drivers/accel/Makefile | 1 +
> drivers/accel/qda/Kconfig | 30 +++++++++++++
> drivers/accel/qda/Makefile | 10 +++++
> drivers/accel/qda/qda_drv.c | 97 ++++++++++++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_drv.h | 62 +++++++++++++++++++++++++++
> drivers/accel/qda/qda_rpmsg.c | 99 +++++++++++++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_rpmsg.h | 13 ++++++
> 8 files changed, 313 insertions(+)
>
> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> index bdf48ccafcf2..74ac0f71bc9d 100644
> --- a/drivers/accel/Kconfig
> +++ b/drivers/accel/Kconfig
> @@ -29,6 +29,7 @@ source "drivers/accel/ethosu/Kconfig"
> source "drivers/accel/habanalabs/Kconfig"
> source "drivers/accel/ivpu/Kconfig"
> source "drivers/accel/qaic/Kconfig"
> +source "drivers/accel/qda/Kconfig"
> source "drivers/accel/rocket/Kconfig"
>
> endif
> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> index 1d3a7251b950..58c08dd5f389 100644
> --- a/drivers/accel/Makefile
> +++ b/drivers/accel/Makefile
> @@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU) += ethosu/
> obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
> obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
> obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
> +obj-$(CONFIG_DRM_ACCEL_QDA) += qda/
> obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
> \ No newline at end of file
> diff --git a/drivers/accel/qda/Kconfig b/drivers/accel/qda/Kconfig
> new file mode 100644
> index 000000000000..484d21ff1b55
> --- /dev/null
> +++ b/drivers/accel/qda/Kconfig
> @@ -0,0 +1,30 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Qualcomm DSP accelerator driver
> +#
> +
> +config DRM_ACCEL_QDA
> + tristate "Qualcomm DSP accelerator"
> + depends on DRM_ACCEL
> + depends on ARCH_QCOM || COMPILE_TEST
> + depends on RPMSG
> + help
> + Enables the DRM-based accelerator driver for Qualcomm's Hexagon DSPs.
> + This driver provides a standardized interface for offloading computational
> + tasks to the DSP, including audio processing, sensor offload, computer
> + vision, and AI inference workloads.
> +
> + The driver supports all DSP domains (ADSP, CDSP, SDSP, GDSP) and
> + implements the FastRPC protocol for communication between the application
> + processor and DSP. It integrates with the Linux kernel's Compute
> + Accelerators subsystem (drivers/accel/) and provides a modern alternative
> + to the legacy FastRPC driver found in drivers/misc/.
> +
> + Key features include DMA-BUF interoperability for seamless buffer sharing
Key features of what? Consider distro maintainers reading your help text
in order to identify whether to enable it or not.
> + with other multimedia subsystems, IOMMU-based memory isolation, and
> + standard DRM IOCTLs for device management and job submission.
> +
> + If unsure, say N.
> +
> + To compile this driver as a module, choose M here: the
> + module will be called qda.
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> new file mode 100644
> index 000000000000..dbe809067a8b
> --- /dev/null
> +++ b/drivers/accel/qda/Makefile
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for Qualcomm DSP accelerator driver
> +#
> +
> +obj-$(CONFIG_DRM_ACCEL_QDA) := qda.o
> +
> +qda-y := \
> + qda_drv.o \
> + qda_rpmsg.o
> diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
> new file mode 100644
> index 000000000000..1c1bab68d445
> --- /dev/null
> +++ b/drivers/accel/qda/qda_drv.c
> @@ -0,0 +1,97 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <drm/drm_accel.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
> +#include <drm/drm_ioctl.h>
> +#include <drm/drm_print.h>
> +
> +#include "qda_drv.h"
> +#include "qda_rpmsg.h"
> +
> +static int qda_open(struct drm_device *dev, struct drm_file *file)
> +{
> + struct qda_file_priv *qda_file_priv;
> +
> + qda_file_priv = kzalloc_obj(*qda_file_priv);
> + if (!qda_file_priv)
> + return -ENOMEM;
> +
> + qda_file_priv->qda_dev = qda_dev_from_drm(dev);
> + file->driver_priv = qda_file_priv;
> +
> + return 0;
> +}
> +
> +static void qda_postclose(struct drm_device *dev, struct drm_file *file)
> +{
> + struct qda_file_priv *qda_file_priv = file->driver_priv;
> +
> + kfree(qda_file_priv);
> + file->driver_priv = NULL;
> +}
> +
> +DEFINE_DRM_ACCEL_FOPS(qda_accel_fops);
> +
> +static const struct drm_driver qda_drm_driver = {
> + .driver_features = DRIVER_COMPUTE_ACCEL,
> + .fops = &qda_accel_fops,
> + .open = qda_open,
> + .postclose = qda_postclose,
> + .name = QDA_DRIVER_NAME,
> + .desc = "Qualcomm DSP Accelerator Driver",
> +};
> +
> +struct qda_dev *qda_alloc_device(struct device *dev)
> +{
> + struct qda_dev *qdev;
> +
> + qdev = devm_drm_dev_alloc(dev, &qda_drm_driver, struct qda_dev, drm_dev);
> + if (IS_ERR(qdev))
> + return ERR_CAST(qdev);
> +
> + return qdev;
> +}
> +
> +void qda_unregister_device(struct qda_dev *qdev)
> +{
> + drm_dev_unregister(&qdev->drm_dev);
> +}
> +
> +int qda_register_device(struct qda_dev *qdev)
> +{
> + int ret;
> +
> + ret = drm_dev_register(&qdev->drm_dev, 0);
> + if (ret)
> + drm_err(&qdev->drm_dev, "Failed to register DRM device: %d\n", ret);
> +
> + return ret;
> +}
> +
> +static int __init qda_core_init(void)
> +{
> + int ret;
> +
> + ret = qda_rpmsg_register();
> + if (ret)
> + return ret;
> +
> + pr_info("qda: QDA driver initialization complete\n");
> + return 0;
> +}
> +
> +static void __exit qda_core_exit(void)
> +{
> + qda_rpmsg_unregister();
> +}
> +
> +module_init(qda_core_init);
> +module_exit(qda_core_exit);
> +
> +MODULE_AUTHOR("Qualcomm AI Infra Team");
> +MODULE_DESCRIPTION("Qualcomm DSP Accelerator Driver");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
> new file mode 100644
> index 000000000000..7ba2ef19a411
> --- /dev/null
> +++ b/drivers/accel/qda/qda_drv.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_DRV_H__
> +#define __QDA_DRV_H__
> +
> +#include <linux/device.h>
> +#include <linux/rpmsg.h>
> +#include <linux/types.h>
> +#include <drm/drm_device.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +
> +/* Driver identification */
> +#define QDA_DRIVER_NAME "qda"
> +
> +/**
> + * struct qda_file_priv - Per-process private data for DRM file
> + */
> +struct qda_file_priv {
> + /** @qda_dev: Back-pointer to device structure */
> + struct qda_dev *qda_dev;
> +};
> +
> +/**
> + * struct qda_dev - Main device structure for QDA driver
> + *
> + * The DRM device is embedded as the first member so that container_of()
> + * can recover the qda_dev from any drm_device pointer.
> + */
> +struct qda_dev {
> + /** @drm_dev: Embedded DRM device; recover via qda_dev_from_drm() */
> + struct drm_device drm_dev;
> + /** @rpdev: RPMsg device for communication with the remote processor */
> + struct rpmsg_device *rpdev;
> + /** @dev: Underlying Linux device */
> + struct device *dev;
> + /** @dsp_name: Name of the DSP domain (e.g. "cdsp", "adsp") */
> + const char *dsp_name;
> +};
> +
> +/**
> + * qda_dev_from_drm - Recover qda_dev from an embedded drm_device pointer
> + * @dev: Pointer to the embedded drm_device
> + *
> + * Return: Pointer to the enclosing qda_dev.
> + */
> +static inline struct qda_dev *qda_dev_from_drm(struct drm_device *dev)
> +{
> + return container_of(dev, struct qda_dev, drm_dev);
> +}
> +
> +/* Device allocation (uses devm_drm_dev_alloc internally) */
> +struct qda_dev *qda_alloc_device(struct device *dev);
> +
> +/* Core device lifecycle */
> +int qda_register_device(struct qda_dev *qdev);
> +void qda_unregister_device(struct qda_dev *qdev);
> +
> +#endif /* __QDA_DRV_H__ */
> diff --git a/drivers/accel/qda/qda_rpmsg.c b/drivers/accel/qda/qda_rpmsg.c
> new file mode 100644
> index 000000000000..6eaf1b145f8a
> --- /dev/null
> +++ b/drivers/accel/qda/qda_rpmsg.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/rpmsg.h>
> +#include <drm/drm_print.h>
> +
> +#include "qda_drv.h"
> +#include "qda_rpmsg.h"
> +
> +static struct qda_dev *alloc_and_init_qdev(struct rpmsg_device *rpdev)
Use the prefix uniformly.
> +{
> + struct qda_dev *qdev;
> +
> + qdev = qda_alloc_device(&rpdev->dev);
> + if (IS_ERR(qdev))
> + return qdev;
> +
> + qdev->dev = &rpdev->dev;
> + qdev->rpdev = rpdev;
> + dev_set_drvdata(&rpdev->dev, qdev);
> +
> + return qdev;
> +}
> +
> +static int qda_rpmsg_cb(struct rpmsg_device *rpdev, void *data, int len,
> + void *priv, u32 src)
> +{
> + /* Placeholder: responses will be dispatched here */
> + return 0;
> +}
> +
> +static void qda_rpmsg_remove(struct rpmsg_device *rpdev)
> +{
> + struct qda_dev *qdev = dev_get_drvdata(&rpdev->dev);
> +
> + drm_dev_unplug(&qdev->drm_dev);
> + qdev->rpdev = NULL;
> + qda_unregister_device(qdev);
> + dev_info(qdev->dev, "RPMsg device removed\n");
Drop the spamming. And useless (where it is useless) drm_dbg() / dev_dbg() spamming too.
> +}
> +
> +static int qda_rpmsg_probe(struct rpmsg_device *rpdev)
> +{
> + struct qda_dev *qdev;
> + const char *label;
> + int ret;
> +
> + dev_dbg(&rpdev->dev, "QDA RPMsg probe starting\n");
> +
> + qdev = alloc_and_init_qdev(rpdev);
> + if (IS_ERR(qdev))
> + return PTR_ERR(qdev);
> +
> + ret = of_property_read_string(rpdev->dev.of_node, "label", &label);
> + if (ret) {
> + dev_err(qdev->dev, "Missing 'label' property in DT node: %d\n", ret);
> + return ret;
> + }
> + qdev->dsp_name = label;
Why not just of_property_read_string(...., &qdev->dsp_name)?
> +
> + ret = qda_register_device(qdev);
return qda_register_device();
> + if (ret)
> + return ret;
> +
> + drm_info(&qdev->drm_dev, "QDA RPMsg probe complete for %s\n", qdev->dsp_name);
> + return 0;
> +}
> +
> +static const struct of_device_id qda_rpmsg_id_table[] = {
> + { .compatible = "qcom,fastrpc" },
> + {},
> +};
> +MODULE_DEVICE_TABLE(of, qda_rpmsg_id_table);
> +
> +static struct rpmsg_driver qda_rpmsg_driver = {
> + .probe = qda_rpmsg_probe,
> + .remove = qda_rpmsg_remove,
> + .callback = qda_rpmsg_cb,
> + .drv = {
> + .name = "qcom,fastrpc",
> + .of_match_table = qda_rpmsg_id_table,
> + },
> +};
> +
> +int qda_rpmsg_register(void)
> +{
> + int ret = register_rpmsg_driver(&qda_rpmsg_driver);
> +
> + if (ret)
> + pr_err("qda: Failed to register RPMsg driver: %d\n", ret);
> +
> + return ret;
> +}
> +
> +void qda_rpmsg_unregister(void)
> +{
> + unregister_rpmsg_driver(&qda_rpmsg_driver);
> +}
Just use module_rpmsg_driver(), drop all the wrappers and module_init()
/ exit().
> diff --git a/drivers/accel/qda/qda_rpmsg.h b/drivers/accel/qda/qda_rpmsg.h
> new file mode 100644
> index 000000000000..5229d834b34b
> --- /dev/null
> +++ b/drivers/accel/qda/qda_rpmsg.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_RPMSG_H__
> +#define __QDA_RPMSG_H__
> +
> +/* RPMsg transport layer registration */
> +int qda_rpmsg_register(void);
> +void qda_rpmsg_unregister(void);
> +
> +#endif /* __QDA_RPMSG_H__ */
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH 02/15] accel/qda: Add QDA driver documentation
From: Dmitry Baryshkov @ 2026-05-20 14:12 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-2-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Add documentation for the Qualcomm DSP Accelerator (QDA) driver under
> Documentation/accel/qda/. The documentation covers the driver
> architecture, GEM-based buffer management, IOMMU context bank
> isolation, and the RPMsg transport layer.
>
> The user-space API section describes the DRM IOCTLs for session
> management, GEM buffer allocation, and remote procedure invocation via
> the FastRPC protocol, along with a typical application lifecycle
> example. Sections for dynamic debug and basic testing are also
> included.
>
> Wire the new documentation into the Compute Accelerators index at
> Documentation/accel/index.rst.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> Documentation/accel/index.rst | 1 +
> Documentation/accel/qda/index.rst | 13 ++++
> Documentation/accel/qda/qda.rst | 146 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 160 insertions(+)
>
> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> index cbc7d4c3876a..5901ea7f784c 100644
> --- a/Documentation/accel/index.rst
> +++ b/Documentation/accel/index.rst
> @@ -10,4 +10,5 @@ Compute Accelerators
> introduction
> amdxdna/index
> qaic/index
> + qda/index
> rocket/index
> diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
> new file mode 100644
> index 000000000000..013400cf9c25
> --- /dev/null
> +++ b/Documentation/accel/qda/index.rst
> @@ -0,0 +1,13 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +==================================
> +accel/qda Qualcomm DSP Accelerator
> +==================================
> +
> +The QDA driver provides a DRM accel based interface for Qualcomm DSP offload.
> +It uses the FastRPC protocol and integrates with DRM and GEM infrastructure
> +for device and buffer management.
> +
> +.. toctree::
> +
> + qda
> diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
> new file mode 100644
> index 000000000000..9f49af6e6acc
> --- /dev/null
> +++ b/Documentation/accel/qda/qda.rst
> @@ -0,0 +1,146 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=====================================
> +Qualcomm DSP Accelerator (QDA) Driver
> +=====================================
> +
> +Introduction
> +============
> +
> +The QDA driver is a DRM accel driver for Qualcomm's DSPs. It provides a
> +DRM accel based interface for Qualcomm DSP offload, supporting workloads
> +such as AI inference, computer vision, audio processing, and sensor offload
> +on Qualcomm SoCs. It uses the FastRPC protocol and integrates with DRM and
> +GEM infrastructure for device and buffer management.
> +
> +Key Features
> +============
> +
> +* **DRM accel Interface**: Exposes a standard character device node
> + (e.g., ``/dev/accel/accel0``) via the DRM accel subsystem.
> +* **FastRPC Protocol**: Implements the FastRPC protocol for communication
> + between the application processor and the DSP.
> +* **GEM Buffer Management**: Uses the DRM GEM interface for buffer
> + allocation, lifecycle management, and DMA-BUF import/export.
> +* **IOMMU Isolation**: Uses IOMMU context banks to enforce memory isolation
> + between different DSP user sessions.
> +* **Modular Design**: Clean separation between the core DRM logic, the
> + memory manager, and the RPMsg-based transport layer.
> +
> +Architecture
> +============
> +
> +The QDA driver consists of several functional blocks:
> +
> +1. **Core Driver (``qda_drv``)**: Manages device registration, file operations,
> + and DRM accel integration.
> +2. **Memory Manager (``qda_memory_manager``)**: A flexible memory management
> + layer that handles IOMMU context banks. It supports pluggable backends
> + (such as DMA-coherent) to adapt to different SoC memory architectures.
> +3. **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
> +
> + * **``qda_gem``**: Core GEM object management, including allocation, mmap
> + operations, and buffer lifecycle management.
> + * **``qda_prime``**: PRIME import functionality for DMA-BUF interoperability
> + with other kernel subsystems.
> +
> +4. **Transport Layer (``qda_rpmsg``)**: Abstraction over the RPMsg framework
> + to handle low-level message passing with the DSP firmware.
> +5. **Compute Bus (``qda_compute_bus``)**: A custom virtual bus used to
> + enumerate and manage the specific compute context banks defined in the
> + device tree. The bus was introduced because IOMMU context banks (CBs) are
> + synthetic constructs — not real platform devices — making a platform driver
> + an incorrect abstraction for them. The earlier platform-driver approach also
> + had a race condition: device nodes were created before the RPMsg channel
> + resources were fully initialized, and because ``probe`` runs asynchronously,
> + applications could open a CB device and attempt to start a session before
> + the underlying transport was ready. The compute bus makes CB lifetime
> + explicitly subordinate to the parent QDA device, closing that window.
> +6. **FastRPC Core (``qda_fastrpc``)**: Implements the protocol logic for
> + marshalling arguments and handling remote invocations.
> +
> +User-Space API
> +==============
> +
> +The driver exposes a set of DRM-compliant IOCTLs:
> +
> +* ``DRM_IOCTL_QDA_QUERY``: Query DSP type (e.g., "cdsp", "adsp")
> + and capabilities.
> +* ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE``: Initialize a new process context
> + on the DSP.
> +* ``DRM_IOCTL_QDA_REMOTE_INVOKE``: Submit a remote method invocation (the
> + primary execution unit).
> +* ``DRM_IOCTL_QDA_GEM_CREATE``: Allocate a GEM buffer object for DSP usage.
> +* ``DRM_IOCTL_QDA_GEM_MMAP_OFFSET``: Retrieve mmap offsets for memory mapping.
> +* ``DRM_IOCTL_QDA_REMOTE_MAP`` / ``DRM_IOCTL_QDA_REMOTE_MUNMAP``: Map or unmap
> + buffers into the DSP's virtual address space. Each accepts a ``request``
> + field selecting between a legacy operation (``QDA_MAP_REQUEST_LEGACY`` /
> + ``QDA_MUNMAP_REQUEST_LEGACY``) and an attribute-based operation
> + (``QDA_MAP_REQUEST_ATTR`` / ``QDA_MUNMAP_REQUEST_ATTR``).
Explain, what happens in the users don't map the buffers into the DSP
space. Will DRM_IOCTL_QDA_REMOTE_INVOKE handle the mapping or not? What
is the difference between those two modes?
Would the driver benefit from using GPUVM?
> +
> +Usage Example
> +=============
> +
> +A typical lifecycle for a user-space application:
> +
> +1. **Discovery**: Open ``/dev/accel/accel*`` and use
> + ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
> + device node.
> +2. **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
> + establish a session and create a process context on the DSP.
> +3. **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
> + DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
> +4. **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
> + execute functions on the DSP.
> +5. **Cleanup**: Close file descriptors to automatically release resources and
> + detach the session.
I'd have expected the description of the actual example. I.e. clone the
app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo),
run make, run the app, check the results. I'd remind that DRM Accel has
a very specific requirement of having the working toolhain in the
open-source.
> +
> +Internal Implementation
> +=======================
> +
> +Memory Management
> +-----------------
> +The driver's memory manager creates virtual "IOMMU devices" that map to
> +hardware context banks. This allows the driver to manage multiple isolated
> +address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
> +between the CPU and DSP without manual cache maintenance in most cases.
GEM usage?
> +
> +Debugging
> +=========
> +The driver includes extensive dynamic debug support. Enable it via the
> +kernel's dynamic debug control:
> +
> +.. code-block:: bash
> +
> + echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
> +
> +Testing
> +=======
> +The QDA driver can be exercised using the ``fastrpc_test`` utility from the
> +FastRPC userspace library. Run the test application:
pointer
> +
> +.. code-block:: bash
> +
> + fastrpc_test -d 3 -U 1 -t linux -a v68
> +
> +**Options**
> +
> +``-d domain``
> + Select the DSP domain to run on:
> +
> + * ``0`` — ADSP
> + * ``1`` — MDSP
> + * ``2`` — SDSP
> + * ``3`` — CDSP *(default on targets with CDSP)*
> +
> +``-U unsigned_PD``
> + Select signed or unsigned protection domain:
> +
> + * ``0`` — signed PD
> + * ``1`` — unsigned PD *(default)*
> +
> +``-t target``
> + Target platform: ``android`` or ``linux`` *(default: linux)*
> +
> +``-a arch_version``
> + DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH v2 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings
From: Stanislav Kinsburskii @ 2026-05-20 14:12 UTC (permalink / raw)
To: Andrew Morton
Cc: Stanislav Kinsburskii, kys, Liam.Howlett, david, jgg, corbet,
leon, ljs, mhocko, rppt, shuah, skhan, surenb, vbabka, linux-doc,
linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <20260518104808.ba773348f8564303c4330a33@linux-foundation.org>
On Mon, May 18, 2026 at 10:48:08AM -0700, Andrew Morton wrote:
> On Wed, 13 May 2026 02:40:11 +0000 Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> wrote:
>
> > This series extends the HMM framework to support userfaultfd-backed memory
> > by allowing the mmap read lock to be dropped during hmm_range_fault().
> >
> > Some page fault handlers — most notably userfaultfd — require the mmap lock
> > to be released so that userspace can resolve the fault. The current HMM
> > interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
> > in pages from userfaultfd-registered regions.
> >
> > This series follows the established int *locked pattern from
> > get_user_pages_remote() in mm/gup.c. A new entry point,
> > hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
> > mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
> > VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
> > the caller to restart its walk. The existing hmm_range_fault() is
> > refactored into a thin wrapper that passes NULL, preserving current
> > behavior for all existing callers.
> >
> > Faulting hugetlb pages on the unlockable path is not supported because
> > walk_hugetlb_range() unconditionally holds and releases
> > hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
> > inside the callback, the VMA may be freed before the walk framework's
> > unlock. Hugetlb pages already present in page tables are handled normally.
> > Possible approaches to lift this limitation are documented in
> > Documentation/mm/hmm.rst.
>
> Thanks. AI review asked some questions:
> https://sashiko.dev/#/patchset/177863991557.82528.15288076059759579141.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net
>
> I'd ignore the fist one: don't write buggy fault handlers!
>
>
Thank you for the review.
I addressed the issues found in the new self test in v3 of this series.
Please, take a look.
Thanks,
Stanislav
^ permalink raw reply
* [PATCH v3 3/3] selftests/mm: add userfaultfd test for HMM unlockable path
From: Stanislav Kinsburskii @ 2026-05-20 14:09 UTC (permalink / raw)
To: Liam.Howlett, akpm, akpm, david, jgg, corbet, leon, ljs, mhocko,
rppt, shuah, skhan, surenb, vbabka, skinsburskii
Cc: linux-doc, linux-kernel, linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <177928604779.589431.14703161356676674288.stgit@skinsburskii>
From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Add a selftest that exercises hmm_range_fault_unlockable() with a
userfaultfd-backed mapping. The test:
1. Creates an anonymous mmap region
2. Registers it with userfaultfd (UFFDIO_REGISTER_MODE_MISSING)
3. Spawns a handler thread that responds to page faults by filling
pages with a known pattern (0xAB) via UFFDIO_COPY
4. Issues HMM_DMIRROR_READ_UNLOCKABLE to the test_hmm driver, which
calls hmm_range_fault_unlockable() internally
5. Verifies the device read back the data provided by the userfaultfd
handler
This requires changes to the test_hmm kernel module:
- New dmirror_range_fault_unlockable() that uses the new HMM API
- New dmirror_fault_unlockable() and dmirror_read_unlockable() wrappers
- New HMM_DMIRROR_READ_UNLOCKABLE ioctl (0x09)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
lib/test_hmm.c | 122 ++++++++++++++++++++++++++
lib/test_hmm_uapi.h | 1
tools/testing/selftests/mm/hmm-tests.c | 149 ++++++++++++++++++++++++++++++++
3 files changed, 272 insertions(+)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 38996c4baa40..7cc7320c9494 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -389,6 +389,84 @@ static int dmirror_range_fault(struct dmirror *dmirror,
return ret;
}
+static int dmirror_range_fault_unlockable(struct dmirror *dmirror,
+ struct hmm_range *range)
+{
+ struct mm_struct *mm = dmirror->notifier.mm;
+ unsigned long timeout =
+ jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+ int locked;
+ int ret;
+
+ while (true) {
+ if (time_after(jiffies, timeout)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ range->notifier_seq = mmu_interval_read_begin(range->notifier);
+ locked = 1;
+ mmap_read_lock(mm);
+ ret = hmm_range_fault_unlockable(range, &locked);
+ if (locked)
+ mmap_read_unlock(mm);
+ if (ret) {
+ if (ret == -EBUSY)
+ continue;
+ goto out;
+ }
+ if (!locked)
+ continue;
+
+ mutex_lock(&dmirror->mutex);
+ if (mmu_interval_read_retry(range->notifier,
+ range->notifier_seq)) {
+ mutex_unlock(&dmirror->mutex);
+ continue;
+ }
+ break;
+ }
+
+ ret = dmirror_do_fault(dmirror, range);
+
+ mutex_unlock(&dmirror->mutex);
+out:
+ return ret;
+}
+
+static int dmirror_fault_unlockable(struct dmirror *dmirror,
+ unsigned long start,
+ unsigned long end, bool write)
+{
+ struct mm_struct *mm = dmirror->notifier.mm;
+ unsigned long addr;
+ unsigned long pfns[32];
+ struct hmm_range range = {
+ .notifier = &dmirror->notifier,
+ .hmm_pfns = pfns,
+ .pfn_flags_mask = 0,
+ .default_flags =
+ HMM_PFN_REQ_FAULT | (write ? HMM_PFN_REQ_WRITE : 0),
+ .dev_private_owner = dmirror->mdevice,
+ };
+ int ret = 0;
+
+ if (!mmget_not_zero(mm))
+ return -EFAULT;
+
+ for (addr = start; addr < end; addr = range.end) {
+ range.start = addr;
+ range.end = min(addr + (ARRAY_SIZE(pfns) << PAGE_SHIFT), end);
+
+ ret = dmirror_range_fault_unlockable(dmirror, &range);
+ if (ret)
+ break;
+ }
+
+ mmput(mm);
+ return ret;
+}
+
static int dmirror_fault(struct dmirror *dmirror, unsigned long start,
unsigned long end, bool write)
{
@@ -488,6 +566,47 @@ static int dmirror_read(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
return ret;
}
+static int dmirror_read_unlockable(struct dmirror *dmirror,
+ struct hmm_dmirror_cmd *cmd)
+{
+ struct dmirror_bounce bounce;
+ unsigned long start, end;
+ unsigned long size = cmd->npages << PAGE_SHIFT;
+ int ret;
+
+ start = cmd->addr;
+ end = start + size;
+ if (end < start)
+ return -EINVAL;
+
+ ret = dmirror_bounce_init(&bounce, start, size);
+ if (ret)
+ return ret;
+
+ while (1) {
+ mutex_lock(&dmirror->mutex);
+ ret = dmirror_do_read(dmirror, start, end, &bounce);
+ mutex_unlock(&dmirror->mutex);
+ if (ret != -ENOENT)
+ break;
+
+ start = cmd->addr + (bounce.cpages << PAGE_SHIFT);
+ ret = dmirror_fault_unlockable(dmirror, start, end, false);
+ if (ret)
+ break;
+ cmd->faults++;
+ }
+
+ if (ret == 0) {
+ if (copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr,
+ bounce.size))
+ ret = -EFAULT;
+ }
+ cmd->cpages = bounce.cpages;
+ dmirror_bounce_fini(&bounce);
+ return ret;
+}
+
static int dmirror_do_write(struct dmirror *dmirror, unsigned long start,
unsigned long end, struct dmirror_bounce *bounce)
{
@@ -1549,6 +1668,9 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
dmirror->flags = cmd.npages;
ret = 0;
break;
+ case HMM_DMIRROR_READ_UNLOCKABLE:
+ ret = dmirror_read_unlockable(dmirror, &cmd);
+ break;
default:
return -EINVAL;
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index f94c6d457338..076df6df9227 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -38,6 +38,7 @@ struct hmm_dmirror_cmd {
#define HMM_DMIRROR_CHECK_EXCLUSIVE _IOWR('H', 0x06, struct hmm_dmirror_cmd)
#define HMM_DMIRROR_RELEASE _IOWR('H', 0x07, struct hmm_dmirror_cmd)
#define HMM_DMIRROR_FLAGS _IOWR('H', 0x08, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_READ_UNLOCKABLE _IOWR('H', 0x09, struct hmm_dmirror_cmd)
#define HMM_DMIRROR_FLAG_FAIL_ALLOC (1ULL << 0)
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index e1c8a679a4cf..cc5525be74b0 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -27,6 +27,10 @@
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/time.h>
+#include <sys/syscall.h>
+#include <sys/eventfd.h>
+#include <linux/userfaultfd.h>
+#include <poll.h>
/*
* This is a private UAPI to the kernel test module so it isn't exported
@@ -2853,4 +2857,149 @@ TEST_F_TIMEOUT(hmm, benchmark_thp_migration, 120)
&thp_results, ®ular_results);
}
}
+/*
+ * Test that HMM can fault in pages backed by userfaultfd using the
+ * hmm_range_fault_unlockable() path. This exercises the lock-drop retry
+ * logic in the HMM framework.
+ */
+struct uffd_thread_args {
+ int uffd;
+ int stop_fd;
+ void *page_buffer;
+ unsigned long page_size;
+};
+
+static void *uffd_handler_thread(void *arg)
+{
+ struct uffd_thread_args *args = arg;
+ struct uffd_msg msg;
+ struct uffdio_copy copy;
+ struct pollfd pollfd[2];
+ int ret;
+
+ pollfd[0].fd = args->uffd;
+ pollfd[0].events = POLLIN;
+ pollfd[1].fd = args->stop_fd;
+ pollfd[1].events = POLLIN;
+
+ while (1) {
+ ret = poll(pollfd, 2, -1);
+ if (ret <= 0)
+ break;
+ if (pollfd[1].revents)
+ break;
+ if (!(pollfd[0].revents & POLLIN))
+ break;
+
+ ret = read(args->uffd, &msg, sizeof(msg));
+ if (ret != sizeof(msg))
+ break;
+
+ if (msg.event != UFFD_EVENT_PAGEFAULT)
+ break;
+
+ /* Fill the page with a known pattern */
+ memset(args->page_buffer, 0xAB, args->page_size);
+
+ copy.dst = msg.arg.pagefault.address & ~(args->page_size - 1);
+ copy.src = (unsigned long)args->page_buffer;
+ copy.len = args->page_size;
+ copy.mode = 0;
+ copy.copy = 0;
+
+ ret = ioctl(args->uffd, UFFDIO_COPY, ©);
+ if (ret < 0)
+ break;
+ }
+
+ return NULL;
+}
+
+TEST_F(hmm, userfaultfd_read)
+{
+ struct hmm_buffer *buffer;
+ struct uffd_thread_args uffd_args;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ unsigned char *ptr;
+ pthread_t thread;
+ int uffd;
+ int stop_fd;
+ int ret;
+ struct uffdio_api api;
+ struct uffdio_register reg;
+ uint64_t stop = 1;
+ ssize_t nwrite;
+
+ npages = 4;
+ size = npages << self->page_shift;
+
+ /* Create userfaultfd */
+ uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+ if (uffd < 0)
+ SKIP(return, "userfaultfd not available");
+
+ api.api = UFFD_API;
+ api.features = 0;
+ ret = ioctl(uffd, UFFDIO_API, &api);
+ ASSERT_EQ(ret, 0);
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+
+ /* Create anonymous mapping */
+ buffer->ptr = mmap(NULL, size,
+ PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS,
+ -1, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ /* Register the region with userfaultfd */
+ reg.range.start = (unsigned long)buffer->ptr;
+ reg.range.len = size;
+ reg.mode = UFFDIO_REGISTER_MODE_MISSING;
+ ret = ioctl(uffd, UFFDIO_REGISTER, ®);
+ ASSERT_EQ(ret, 0);
+
+ /* Set up the handler thread */
+ uffd_args.uffd = uffd;
+ stop_fd = eventfd(0, EFD_CLOEXEC);
+ ASSERT_GE(stop_fd, 0);
+ uffd_args.stop_fd = stop_fd;
+ uffd_args.page_buffer = malloc(self->page_size);
+ ASSERT_NE(uffd_args.page_buffer, NULL);
+ uffd_args.page_size = self->page_size;
+
+ ret = pthread_create(&thread, NULL, uffd_handler_thread, &uffd_args);
+ ASSERT_EQ(ret, 0);
+
+ /*
+ * Use the unlockable read path which allows the mmap lock to be
+ * dropped during the fault, enabling userfaultfd resolution.
+ */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_READ_UNLOCKABLE,
+ buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Verify the device read the data filled by the uffd handler */
+ ptr = buffer->mirror;
+ for (i = 0; i < size; ++i)
+ ASSERT_EQ(ptr[i], (unsigned char)0xAB);
+
+ nwrite = write(stop_fd, &stop, sizeof(stop));
+ ASSERT_EQ(nwrite, sizeof(stop));
+ pthread_join(thread, NULL);
+ close(stop_fd);
+ free(uffd_args.page_buffer);
+ close(uffd);
+ hmm_buffer_free(buffer);
+}
+
TEST_HARNESS_MAIN
^ permalink raw reply related
* [PATCH v3 2/3] mm/hmm: add hmm_range_fault_unlockable() for mmap lock-drop support
From: Stanislav Kinsburskii @ 2026-05-20 14:09 UTC (permalink / raw)
To: Liam.Howlett, akpm, akpm, david, jgg, corbet, leon, ljs, mhocko,
rppt, shuah, skhan, surenb, vbabka, skinsburskii
Cc: linux-doc, linux-kernel, linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <177928604779.589431.14703161356676674288.stgit@skinsburskii>
From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
hmm_range_fault() holds the mmap read lock for the duration of the call.
This is incompatible with mappings whose fault handler may release the mmap
lock - notably userfaultfd-managed regions, where handle_mm_fault() returns
VM_FAULT_RETRY or VM_FAULT_COMPLETED after dropping the lock. Drivers that
need to populate device page tables for such mappings have no way to do so
today.
Add hmm_range_fault_unlockable(), modelled on the int *locked pattern from
get_user_pages_remote() in mm/gup.c. Callers set *locked = 1 and pass
&locked; the function may set *locked = 0 to report that handle_mm_fault()
dropped the mmap lock during a page fault, in which case the caller must
reacquire it and restart the walk with a fresh mmu_interval_read_begin()
sequence.
The implementation is local to hmm_do_fault() and the outer loop in
hmm_range_fault_unlockable(). hmm_do_fault() conditionally sets
FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE when locked is non-NULL and
translates VM_FAULT_RETRY / VM_FAULT_COMPLETED into *locked = 0 plus a
private return code consumed by the outer loop, which in turn returns 0 (or
-EINTR on fatal signal) to the caller.
The previous refactor that moved page fault handling out of the page-table
walk callbacks is what makes this change small. Faults now run after
walk_page_range() has unwound, with only the mmap lock held, so dropping it
does not interact with the walker's pte spinlock or hugetlb_vma_lock.
Hugetlb regions therefore participate in the unlockable path uniformly with
PTE- and PMD-level mappings; no special case is required.
hmm_range_fault() becomes a thin wrapper, preserving exact behaviour for
all existing callers. No EXPORT_SYMBOL behaviour change for
hmm_range_fault.
Documentation/mm/hmm.rst is updated with a description of the new API and
the recommended caller pattern.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
Documentation/mm/hmm.rst | 62 +++++++++++++++++++++++++++++++++++++
include/linux/hmm.h | 1 +
mm/hmm.c | 77 +++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 135 insertions(+), 5 deletions(-)
diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst
index 7d61b7a8b65b..a9309023ec23 100644
--- a/Documentation/mm/hmm.rst
+++ b/Documentation/mm/hmm.rst
@@ -208,6 +208,68 @@ invalidate() callback. That lock must be held before calling
mmu_interval_read_retry() to avoid any race with a concurrent CPU page table
update.
+Dropping the mmap lock during page faults
+=========================================
+
+Some VMAs have fault handlers that need to release the mmap lock while
+servicing a fault (for example, regions managed by ``userfaultfd``).
+``hmm_range_fault()`` cannot be used on such mappings because it must hold the
+mmap lock for the duration of the call. Drivers that need to support them
+should call::
+
+ int hmm_range_fault_unlockable(struct hmm_range *range, int *locked);
+
+The caller sets ``*locked = 1`` and holds ``mmap_read_lock`` before the call.
+If the mmap lock is dropped inside ``handle_mm_fault()``, the function sets
+``*locked = 0`` and returns ``0``; the caller is responsible for reacquiring
+the lock and restarting the walk from ``range->start`` with a fresh notifier
+sequence. When ``locked`` is ``NULL`` the function keeps the lock held for the
+duration of the call, identical to ``hmm_range_fault()``.
+
+A typical caller looks like this::
+
+ int driver_populate_range_unlockable(...)
+ {
+ struct hmm_range range;
+ int locked;
+ ...
+
+ range.notifier = &interval_sub;
+ range.start = ...;
+ range.end = ...;
+ range.hmm_pfns = ...;
+
+ if (!mmget_not_zero(interval_sub.mm))
+ return -EFAULT;
+
+ again:
+ range.notifier_seq = mmu_interval_read_begin(&interval_sub);
+ locked = 1;
+ mmap_read_lock(mm);
+ ret = hmm_range_fault_unlockable(&range, &locked);
+ if (locked)
+ mmap_read_unlock(mm);
+ if (ret) {
+ if (ret == -EBUSY)
+ goto again;
+ return ret;
+ }
+ if (!locked)
+ goto again;
+
+ take_lock(driver->update);
+ if (mmu_interval_read_retry(&interval_sub, range.notifier_seq)) {
+ release_lock(driver->update);
+ goto again;
+ }
+
+ /* Use pfns array content to update device page table,
+ * under the update lock */
+
+ release_lock(driver->update);
+ return 0;
+ }
+
Leverage default_flags and pfn_flags_mask
=========================================
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index db75ffc949a7..46e581865c48 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -123,6 +123,7 @@ struct hmm_range {
* Please see Documentation/mm/hmm.rst for how to use the range API.
*/
int hmm_range_fault(struct hmm_range *range);
+int hmm_range_fault_unlockable(struct hmm_range *range, int *locked);
/*
* HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range
diff --git a/mm/hmm.c b/mm/hmm.c
index 446dd0c39b3a..b9ced5003e16 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -32,6 +32,7 @@
struct hmm_vma_walk {
struct hmm_range *range;
+ int *locked;
unsigned long last;
unsigned long end;
unsigned int required_fault;
@@ -44,6 +45,13 @@ struct hmm_vma_walk {
*/
#define HMM_FAULT_PENDING -EAGAIN
+/*
+ * Internal sentinel returned by hmm_do_fault() when handle_mm_fault() drops
+ * the mmap lock during a page fault. hmm_do_fault() sets *locked = 0; the
+ * outer loop consumes the sentinel and never propagates it to the caller.
+ */
+#define HMM_FAULT_UNLOCKED -ENOLCK
+
enum {
HMM_NEED_FAULT = 1 << 0,
HMM_NEED_WRITE_FAULT = 1 << 1,
@@ -639,6 +647,7 @@ static int hmm_do_fault(struct mm_struct *mm,
unsigned long end = hmm_vma_walk->end;
unsigned int required_fault = hmm_vma_walk->required_fault;
unsigned int fault_flags = FAULT_FLAG_REMOTE;
+ int *locked = hmm_vma_walk->locked;
struct vm_area_struct *vma;
vma = vma_lookup(mm, addr);
@@ -651,10 +660,20 @@ static int hmm_do_fault(struct mm_struct *mm,
fault_flags |= FAULT_FLAG_WRITE;
}
- for (; addr < end; addr += PAGE_SIZE)
- if (handle_mm_fault(vma, addr, fault_flags, NULL) &
- VM_FAULT_ERROR)
+ if (locked)
+ fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+
+ for (; addr < end; addr += PAGE_SIZE) {
+ vm_fault_t ret;
+
+ ret = handle_mm_fault(vma, addr, fault_flags, NULL);
+ if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
+ *locked = 0;
+ return HMM_FAULT_UNLOCKED;
+ }
+ if (ret & VM_FAULT_ERROR)
return -EFAULT;
+ }
return -EBUSY;
}
@@ -677,11 +696,53 @@ static int hmm_do_fault(struct mm_struct *mm,
*
* This is similar to get_user_pages(), except that it can read the page tables
* without mutating them (ie causing faults).
+ *
+ * The mmap lock must be held by the caller and will remain held on return.
+ * For a variant that allows the mmap lock to be dropped during faults (e.g.,
+ * for userfaultfd support), see hmm_range_fault_unlockable().
*/
int hmm_range_fault(struct hmm_range *range)
+{
+ return hmm_range_fault_unlockable(range, NULL);
+}
+EXPORT_SYMBOL(hmm_range_fault);
+
+/**
+ * hmm_range_fault_unlockable - fault in a range, possibly dropping the mmap lock
+ * @range: argument structure
+ * @locked: pointer to caller's lock state, or %NULL
+ *
+ * Behaves like hmm_range_fault(), but allows handle_mm_fault() to drop the
+ * mmap read lock during a fault. This makes the function usable on mappings
+ * whose fault path may release the lock (for example, userfaultfd-managed
+ * regions).
+ *
+ * If @locked is %NULL the mmap lock is never released and the function
+ * behaves exactly like hmm_range_fault().
+ *
+ * If @locked is non-%NULL the caller must hold mmap_read_lock and set
+ * *@locked = 1 before the call. On return:
+ *
+ * *@locked == 1: the mmap lock is still held. The return value has the
+ * same meaning as hmm_range_fault() (0 on success, or one
+ * of the error codes documented there).
+ *
+ * *@locked == 0: the mmap lock was dropped during a page fault. No PFNs
+ * collected so far are guaranteed to be valid because the
+ * address space may have changed under us. The return
+ * value is either 0 (caller must reacquire the lock and
+ * restart with a fresh mmu_interval_read_begin()) or
+ * -EINTR (a fatal signal is pending; abort).
+ *
+ * The caller is responsible for reacquiring mmap_read_lock and restarting
+ * the operation from range->start. See Documentation/mm/hmm.rst for the
+ * full usage pattern.
+ */
+int hmm_range_fault_unlockable(struct hmm_range *range, int *locked)
{
struct hmm_vma_walk hmm_vma_walk = {
.range = range,
+ .locked = locked,
.last = range->start,
};
struct mm_struct *mm = range->notifier->mm;
@@ -704,8 +765,14 @@ int hmm_range_fault(struct hmm_range *range)
* returns -EBUSY so the loop re-walks and picks up the
* now-present entries.
*/
- if (ret == HMM_FAULT_PENDING)
+ if (ret == HMM_FAULT_PENDING) {
ret = hmm_do_fault(mm, &hmm_vma_walk);
+ if (ret == HMM_FAULT_UNLOCKED) {
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ return 0; /* caller must restart */
+ }
+ }
/*
* When -EBUSY is returned the loop restarts with
* hmm_vma_walk.last set to an address that has not been stored
@@ -715,7 +782,7 @@ int hmm_range_fault(struct hmm_range *range)
} while (ret == -EBUSY);
return ret;
}
-EXPORT_SYMBOL(hmm_range_fault);
+EXPORT_SYMBOL(hmm_range_fault_unlockable);
/**
* hmm_dma_map_alloc - Allocate HMM map structure
^ permalink raw reply related
* [PATCH v3 1/3] mm/hmm: move page fault handling out of walk callbacks
From: Stanislav Kinsburskii @ 2026-05-20 14:09 UTC (permalink / raw)
To: Liam.Howlett, akpm, akpm, david, jgg, corbet, leon, ljs, mhocko,
rppt, shuah, skhan, surenb, vbabka, skinsburskii
Cc: linux-doc, linux-kernel, linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <177928604779.589431.14703161356676674288.stgit@skinsburskii>
From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
hmm_range_fault() currently triggers page faults from inside the page-table
walk callbacks: hmm_vma_walk_pmd(), hmm_vma_walk_pud(),
hmm_vma_walk_hugetlb_entry() and the pte-level helper all call
hmm_vma_fault(), which in turn calls handle_mm_fault() while the walker
still holds nested locks. The pte spinlock is dropped explicitly by each
caller, and the hugetlb path manually drops and retakes
hugetlb_vma_lock_read around the fault to dodge a deadlock against the walk
framework's unconditional unlock.
This layering does not extend cleanly to fault handlers that may release
mmap_lock (VM_FAULT_RETRY, VM_FAULT_COMPLETED). If the lock is dropped
while walk_page_range() is mid-traversal, the VMA can be freed before the
walk framework's matching hugetlb_vma_unlock_read(), turning that unlock
into a use-after-free.
Split the responsibilities the way get_user_pages() does. Walk callbacks
become inspect-only: when they detect a range that needs to be faulted in,
they record it in struct hmm_vma_walk and return a private sentinel
(HMM_FAULT_PENDING). The outer loop in hmm_range_fault() then drops out of
walk_page_range(), invokes a new helper hmm_do_fault() that calls
handle_mm_fault() with only mmap_lock held, and restarts the walk so the
now-present entries are collected into hmm_pfns.
No functional change for existing callers. As a side effect the hugetlb
callback no longer needs the hugetlb_vma_{un}lock_read dance, and every
fault-path exit from the callbacks now releases the pte spinlock on a
single, common path. This refactor is also a precursor for adding an
unlockable variant of hmm_range_fault() in a follow-up patch.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
mm/hmm.c | 118 +++++++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 75 insertions(+), 43 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index c72c9ddfdb2f..446dd0c39b3a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -33,8 +33,17 @@
struct hmm_vma_walk {
struct hmm_range *range;
unsigned long last;
+ unsigned long end;
+ unsigned int required_fault;
};
+/*
+ * Internal sentinel returned by walk callbacks when they need a page fault.
+ * The callback stores end/required_fault in hmm_vma_walk; the outer loop
+ * consumes the sentinel and never propagates it to the caller.
+ */
+#define HMM_FAULT_PENDING -EAGAIN
+
enum {
HMM_NEED_FAULT = 1 << 0,
HMM_NEED_WRITE_FAULT = 1 << 1,
@@ -60,37 +69,25 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
}
/*
- * hmm_vma_fault() - fault in a range lacking valid pmd or pte(s)
- * @addr: range virtual start address (inclusive)
- * @end: range virtual end address (exclusive)
- * @required_fault: HMM_NEED_* flags
- * @walk: mm_walk structure
- * Return: -EBUSY after page fault, or page fault error
+ * hmm_record_fault() - record a range that needs to be faulted in
*
- * This function will be called whenever pmd_none() or pte_none() returns true,
- * or whenever there is no page directory covering the virtual address range.
+ * Called by the walk callbacks when they discover that part of the range
+ * needs a page fault. The callback records what to fault and returns
+ * HMM_FAULT_PENDING; the outer loop in hmm_range_fault() drops back out of
+ * walk_page_range() and invokes handle_mm_fault() from a context where no
+ * page-table or hugetlb_vma_lock is held.
*/
-static int hmm_vma_fault(unsigned long addr, unsigned long end,
- unsigned int required_fault, struct mm_walk *walk)
+static int hmm_record_fault(unsigned long addr, unsigned long end,
+ unsigned int required_fault,
+ struct mm_walk *walk)
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
- struct vm_area_struct *vma = walk->vma;
- unsigned int fault_flags = FAULT_FLAG_REMOTE;
WARN_ON_ONCE(!required_fault);
hmm_vma_walk->last = addr;
-
- if (required_fault & HMM_NEED_WRITE_FAULT) {
- if (!(vma->vm_flags & VM_WRITE))
- return -EPERM;
- fault_flags |= FAULT_FLAG_WRITE;
- }
-
- for (; addr < end; addr += PAGE_SIZE)
- if (handle_mm_fault(vma, addr, fault_flags, NULL) &
- VM_FAULT_ERROR)
- return -EFAULT;
- return -EBUSY;
+ hmm_vma_walk->end = end;
+ hmm_vma_walk->required_fault = required_fault;
+ return HMM_FAULT_PENDING;
}
static unsigned int hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
@@ -174,7 +171,7 @@ static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
return hmm_pfns_fill(addr, end, range, HMM_PFN_ERROR);
}
if (required_fault)
- return hmm_vma_fault(addr, end, required_fault, walk);
+ return hmm_record_fault(addr, end, required_fault, walk);
return hmm_pfns_fill(addr, end, range, 0);
}
@@ -209,7 +206,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
required_fault =
hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, cpu_flags);
if (required_fault)
- return hmm_vma_fault(addr, end, required_fault, walk);
+ return hmm_record_fault(addr, end, required_fault, walk);
pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
@@ -328,7 +325,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
fault:
pte_unmap(ptep);
/* Fault any virtual address we were asked to fault */
- return hmm_vma_fault(addr, end, required_fault, walk);
+ return hmm_record_fault(addr, end, required_fault, walk);
}
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -371,7 +368,7 @@ static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long start,
npages, 0);
if (required_fault) {
if (softleaf_is_device_private(entry))
- return hmm_vma_fault(addr, end, required_fault, walk);
+ return hmm_record_fault(addr, end, required_fault, walk);
else
return -EFAULT;
}
@@ -517,7 +514,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
npages, cpu_flags);
if (required_fault) {
spin_unlock(ptl);
- return hmm_vma_fault(addr, end, required_fault, walk);
+ return hmm_record_fault(addr, end, required_fault, walk);
}
pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
@@ -564,21 +561,8 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
required_fault =
hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
if (required_fault) {
- int ret;
-
spin_unlock(ptl);
- hugetlb_vma_unlock_read(vma);
- /*
- * Avoid deadlock: drop the vma lock before calling
- * hmm_vma_fault(), which will itself potentially take and
- * drop the vma lock. This is also correct from a
- * protection point of view, because there is no further
- * use here of either pte or ptl after dropping the vma
- * lock.
- */
- ret = hmm_vma_fault(addr, end, required_fault, walk);
- hugetlb_vma_lock_read(vma);
- return ret;
+ return hmm_record_fault(addr, end, required_fault, walk);
}
pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
@@ -637,6 +621,44 @@ static const struct mm_walk_ops hmm_walk_ops = {
.walk_lock = PGWALK_RDLOCK,
};
+/*
+ * hmm_do_fault - fault in a range recorded by a walk callback
+ *
+ * Called from the outer loop in hmm_range_fault() after a callback
+ * returned HMM_FAULT_PENDING. At this point we hold only mmap_lock;
+ * the page-table spinlock and any hugetlb_vma_lock acquired by the walk
+ * framework have already been released by the unwind.
+ *
+ * Returns -EBUSY on success (all pages faulted, caller should re-walk).
+ * Returns a negative errno on failure.
+ */
+static int hmm_do_fault(struct mm_struct *mm,
+ struct hmm_vma_walk *hmm_vma_walk)
+{
+ unsigned long addr = hmm_vma_walk->last;
+ unsigned long end = hmm_vma_walk->end;
+ unsigned int required_fault = hmm_vma_walk->required_fault;
+ unsigned int fault_flags = FAULT_FLAG_REMOTE;
+ struct vm_area_struct *vma;
+
+ vma = vma_lookup(mm, addr);
+ if (!vma)
+ return -EFAULT;
+
+ if (required_fault & HMM_NEED_WRITE_FAULT) {
+ if (!(vma->vm_flags & VM_WRITE))
+ return -EPERM;
+ fault_flags |= FAULT_FLAG_WRITE;
+ }
+
+ for (; addr < end; addr += PAGE_SIZE)
+ if (handle_mm_fault(vma, addr, fault_flags, NULL) &
+ VM_FAULT_ERROR)
+ return -EFAULT;
+
+ return -EBUSY;
+}
+
/**
* hmm_range_fault - try to fault some address in a virtual address range
* @range: argument structure
@@ -674,6 +696,16 @@ int hmm_range_fault(struct hmm_range *range)
return -EBUSY;
ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
&hmm_walk_ops, &hmm_vma_walk);
+ /*
+ * When HMM_FAULT_PENDING is returned a walk callback
+ * recorded a range that needs handle_mm_fault();
+ * hmm_do_fault() runs the fault outside walk_page_range()
+ * (so no page-table or hugetlb_vma_lock is held) and
+ * returns -EBUSY so the loop re-walks and picks up the
+ * now-present entries.
+ */
+ if (ret == HMM_FAULT_PENDING)
+ ret = hmm_do_fault(mm, &hmm_vma_walk);
/*
* When -EBUSY is returned the loop restarts with
* hmm_vma_walk.last set to an address that has not been stored
^ permalink raw reply related
* [PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings
From: Stanislav Kinsburskii @ 2026-05-20 14:09 UTC (permalink / raw)
To: Liam.Howlett, akpm, akpm, david, jgg, corbet, leon, ljs, mhocko,
rppt, shuah, skhan, surenb, vbabka, skinsburskii
Cc: linux-doc, linux-kernel, linux-kernel, linux-kselftest, linux-mm
This series extends the HMM framework to support userfaultfd-backed memory
by allowing the mmap read lock to be dropped during hmm_range_fault().
Some page fault handlers — most notably userfaultfd — require the mmap lock
to be released so that userspace can resolve the fault. The current HMM
interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
in pages from userfaultfd-registered regions.
This series follows the established int *locked pattern from
get_user_pages_remote() in mm/gup.c. A new entry point,
hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
the caller to restart its walk. The existing hmm_range_fault() is
refactored into a thin wrapper that passes NULL, preserving current
behavior for all existing callers.
Faulting hugetlb pages on the unlockable path is not supported because
walk_hugetlb_range() unconditionally holds and releases
hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
inside the callback, the VMA may be freed before the walk framework's
unlock. Hugetlb pages already present in page tables are handled normally.
Possible approaches to lift this limitation are documented in
Documentation/mm/hmm.rst.
Changes in v3:
- Return -EFAULT from dmirror_fault_unlockable() when the mirrored mm can no longer be pinned.
- Add an eventfd stop signal for the userfaultfd handler thread to avoid waiting for the poll timeout on successful test completion.
Changes in v2:
- Split into a preparatory refactor (new patch 1) that moves
handle_mm_fault() out of the walk callbacks, plus a smaller feature
patch on top. Suggested by David Hildenbrand.
- Hugetlb regions are now supported on the unlockable path; the v1
-EFAULT short-circuit and the hugetlb_vma_lock_read drop/retake
dance are gone.
- Distinct internal sentinels for "needs fault" (HMM_FAULT_PENDING)
and "lock dropped" (HMM_FAULT_UNLOCKED).
- Outer loop now re-walks after a successful internal fault so the
faulted pfns end up in range->hmm_pfns.
- Kernel-doc on hmm_range_fault_unlockable() and the
Documentation/mm/hmm.rst example match the implementation.
- Dropped the mshv driver conversion (v1 patch 2); will post
separately.
- Selftest converted to drive the path through test_hmm with a
userfaultfd handler (new HMM_DMIRROR_READ_UNLOCKABLE ioctl).
---
Stanislav Kinsburskii (3):
mm/hmm: move page fault handling out of walk callbacks
mm/hmm: add hmm_range_fault_unlockable() for mmap lock-drop support
selftests/mm: add userfaultfd test for HMM unlockable path
Documentation/mm/hmm.rst | 62 +++++++++++
include/linux/hmm.h | 1
lib/test_hmm.c | 122 +++++++++++++++++++++
lib/test_hmm_uapi.h | 1
mm/hmm.c | 187 ++++++++++++++++++++++++--------
tools/testing/selftests/mm/hmm-tests.c | 149 +++++++++++++++++++++++++
6 files changed, 478 insertions(+), 44 deletions(-)
^ permalink raw reply
* Re: [PATCH v4 04/30] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
From: David Woodhouse @ 2026-05-20 14:08 UTC (permalink / raw)
To: Dongli Zhang, kvm
Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
Sean Christopherson, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
Dave Hansen, Vitaly Kuznetsov, x86, Marc Zyngier, Juergen Gross,
Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
Jack Allister, Joey Gouly, joe.jin, linux-doc, linux-kernel,
xen-devel, linux-kselftest
In-Reply-To: <08a64760-a431-4d0a-9480-562f8f38c908@oracle.com>
[-- Attachment #1: Type: text/plain, Size: 844 bytes --]
On Tue, 2026-05-19 at 16:34 -0700, Dongli Zhang wrote:
>
> I would really appreciate it if this document could be revived. I don't see it
> in your most recent v4 PATCH 7. It is very helpful as a guideline for how
> userspace VMMs should take advantage of these APIs.
In the kvmclock5 branch I'm revising for the next round.
https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=ae872d9b75
Now has a selftest which serves as an example of the process that was
documented. And an extra fix right before it in the tree, to update the
master clock on vCPU creation because otherwise otherwise KVM_GET_CLOCK
wasn't returning the host_tsc.
I think I need to change that to only do so if ka->use_master_clock
isn't *already* set though, or we risk introducing discrepancies on
hotplug again? Will tweak that...
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH 00/12] misc/syncobj: add /dev/syncobj device
From: Christian König @ 2026-05-20 14:06 UTC (permalink / raw)
To: Xaver Hugl
Cc: Julian Orth, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Sumit Semwal, Jonathan Corbet,
Shuah Khan, Arnd Bergmann, Greg Kroah-Hartman, dri-devel,
linux-kernel, linux-media, linaro-mm-sig, linux-doc,
wayland-devel, Michel Dänzer
In-Reply-To: <CAFZQkGxpPm081Fz8UtDuBA1PKD42+9YDA+cc6fbSpfawXwu9+g@mail.gmail.com>
On 5/20/26 14:33, Xaver Hugl wrote:
> Am Mi., 20. Mai 2026 um 10:08 Uhr schrieb Christian König
> <christian.koenig@amd.com>:
>> Well I would say the other way around is a pretty common use case.
>>
>> In other words the compositors uses the internal GPU for composing and displaying the picture. And the client uses the external GPU for fast rendering.
> Sure, but that's not what I'm talking about.
Yeah sorry for that, I wasn't sure if I misunderstood your use case because it's usually the other way around.
>>> - the buffers from the client stay valid
>>
>> Buffers from the hot plugged GPU don't stay valid. Accessing CPU mappings either result in a SIGBUS or are redirected to a dummy page.
> Again, not what I wrote about. The buffers are on the integrated GPU.
General rule of thumb is that as long as the exporter stays around the buffers stay around as well.
>>> - the syncobj stays valid on the client side
>>> - the syncobj becomes invalid on the compositor side
>>
>> Nope that's not correct. The syncobj itself stays valid even if you completely hot plug the device.
>>
>> It can just be that the fences inside the syncobj are terminated with an error.
> What about eventfd created for a point on the syncobj?
The eventfd unfortunately doesn't has error handling as far as I know, so when a fence signals with an error condition then the eventfd you only sees that it is signaled.
> Another (future) problem with hotplugs will be if the sync file hasn't
> materialized for the timeline point when the device is hotunplugged,
> since there can't be an error on the fence if there isn't one. Or
> could userspace somehow set an 'artificial' fence with an error in
> that case?
In general the answer is yes, userspace needs to take care of inserting fences when wait before signal is used and the work can not be submitted to the HW for some reason.
Currently we only have an IOCTL to insert the signaled dummy fence at some timeline sequence, but it should be trivial as well to insert a signaled fence with an error code.
But the compositor needs to be able to handle that case anyway, because it can be that a malicious or just buggy client just never inserts the fence.
So that a device is hot plugged is not different to just a client not inserting the fence in the first place.
>>> "invalid" there means either
>>> - the acquire point of the client is marked as signaled, before
>>> rendering on the client side is completed
>>> - the acquire point of the client is never signaled. Since the
>>> compositor waits for the acquire point, the Wayland surface is stuck
>>> forever
>>
>> Both of those would be a *massive* violation of documented kernel rules for hot-plugging which could lead to random data corruption and/or deadlocks.
>>
>> If you see any HW driver showing behavior like that please open up a bug report and ping the relevant maintainers immediately.
> If there are no error codes with syncobj yet, then to userspace, the
> latter behavior is exactly what we get, isn't it?
No, from userspace side you just see a signaled fence. It's just that you need to export the timeline point of the syncobj to a syncfile and then you can call the QUERY IOCTL on the syncfile to see the error code.
>> When a hotplug happens all operations of the device should return an -ENODEV error, even when exposed to other devices/application through syncobj or syncfile.
> Okay, that at least gives us a way to fail imports somewhat
> gracefully. Normally, failing to import a syncobj is a fatal error in
> the Wayland protocol.
So the task at hand would be to avoid importing the syncobj into a driver. That should be relatively trivial.
The only real problem I see is if you want to create a syncobj without having any device whatsoever.
>> One problem is that only syncfile allows for querying such error codes at the moment, we have patches pending to add that to syncobj as well but we lack a compositor with support for that as userspace client.
> As long as the error case can be detected with an eventfd,
Yeah that's the problem. The eventfd only tells you if the operation is completed (or at least has materialized).
To query the error you would need to ask the underlying syncobj or syncfile directly.
> implementing that in KWin shouldn't be a challenge.
>
>> Well the question here is if the device the compositor is using or the client is using is gone?
>>
>> If the client device is hot removed the compositor should be perfectly capable to import the syncobj.
>>
>> If the compositor device is gone then you don't have a device to display anything any more, so generating the next frame doesn't seem to make sense either.
>>
>> What could be is that you want the compositor to be kept alive even when the display device is gone to switch over to vkms or whatever so that a VNC session or other remote desktop still works.
> There are two GPUs in the example I gave. The compositor can use both
> for rendering (in cosmic-comp's case) or switch between them (what I'm
> trying to do with KWin), or use one device for rendering, and another
> for importing the syncobj.
Ah! I think I got the problem now. You basically want to avoid importing the syncobj because when the wrong device goes away you are busted.
The reason we didn't considered having the IOCTLs on the FD is because if you don't import them and instead keep them around you can run out file descriptors quite quickly.
When you have an use case where you receive an FD from the client and do a one shot conversion to an eventfd that will probably work, but for keeping them in the long run you need some kind of container for the syncobjs, don't you?
>>>>>>> 3. It removes the need to translate between syncobjs fds and handles.
>>>>>>
>>>>>> That's a pretty big no-go as well. The differentiation between FDs and handles is completely intentional.
>>>>> Could you expand on why it's needed? For compositors, the handle is
>>>>> just an intermediary thing when translating between file descriptors.
>>>>
>>>> Well what we could do is to add an IOCTL to directly attach an syncobj file descriptor to an eventfd.
>>> That would be nice.
>>
>> Take a look at drm_syncobj_file_fops and how drm_syncobj_add_eventfd() is used. Adding that functionality shouldn't be more than a typing exercise.
> Yeah, this patchset already adds that functionality (on the new device).
>
>> Do I see it right that this would already solve most problems in the compositor side?
> Skipping the syncobj handle step would only reduce the amounts of
> ioctls the compositor does, but afaict it wouldn't solve any
> compositor problems. At least not as long as it's still tied to a drm
> device.
Yeah, you need something like a syncobj container or dummy DRM device.
> For device hotplugs, the only new thing we need for correctly handling
> syncobj is a way to receive errors on the eventfd.
I need to look into the eventfd code, could be that this is somehow possible but it's clearly not something I used before.
> A device-independent way to create and use syncobj would still be
> useful to us though, both to simplify the compositor and to improve
> the software rendering use cases.
Yeah not sure how to cleanly do that. We could have a dummy /dev/dri/rendersync or something like that, but that would be quite a hack.
At least I understand the requirement now.
Thanks,
Christian.
>
> - Xaver
^ permalink raw reply
* Re: [PATCH v6 10/43] KVM: guest_memfd: Add base support for KVM_SET_MEMORY_ATTRIBUTES2
From: Fuad Tabba @ 2026-05-20 14:00 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-10-91ab5a8b19a4@google.com>
On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce base support for KVM_SET_MEMORY_ATTRIBUTES2 in guest_memfd, which
> just updates attributes tracked by guest_memfd.
>
> Validate input fields in general. Guard usage of KVM_SET_MEMORY_ATTRIBUTES2
> by making sure requested attributes are supported for this instance of kvm.
>
> A new KVM_SET_MEMORY_ATTRIBUTES2 is defined to support writes (unlike
> KVM_SET_MEMORY_ATTRIBUTES) in addition to reads so it can provide error
> details to userspace. This will be used in a later patch.
>
> The two ioctls use their corresponding structs with no overlap, but
> backward compatibility is baked in for future support of
> KVM_SET_MEMORY_ATTRIBUTES2 and struct kvm_memory_attributes2 in the VM
> ioctl.
>
> The process of setting memory attributes is set up such that the later half
> will not fail due to allocation. Any necessary checks are performed before
> the point of no return.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> Co-developed-by: Sean Christoperson <seanjc@google.com>
> Signed-off-by: Sean Christoperson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
/fuad
> ---
> include/uapi/linux/kvm.h | 13 ++++++
> virt/kvm/Kconfig | 1 +
> virt/kvm/guest_memfd.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 12 +++++
> 4 files changed, 140 insertions(+)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6c8afa2047bf3..e6bbf68a83813 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1648,6 +1648,19 @@ struct kvm_memory_attributes {
> __u64 flags;
> };
>
> +#define KVM_SET_MEMORY_ATTRIBUTES2 _IOWR(KVMIO, 0xd2, struct kvm_memory_attributes2)
> +
> +struct kvm_memory_attributes2 {
> + union {
> + __u64 address;
> + __u64 offset;
> + };
> + __u64 size;
> + __u64 attributes;
> + __u64 flags;
> + __u64 reserved[12];
> +};
> +
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>
> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 3fea89c45cfb4..e371e079e2c50 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -109,6 +109,7 @@ config KVM_VM_MEMORY_ATTRIBUTES
>
> config KVM_GUEST_MEMFD
> select XARRAY_MULTI
> + select KVM_MEMORY_ATTRIBUTES
> bool
>
> config HAVE_KVM_ARCH_GMEM_PREPARE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 4f7c4824c3a45..91e89b188f583 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -540,11 +540,125 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> }
> EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_memory_attributes);
>
> +/*
> + * Preallocate memory for attributes to be stored on a maple tree, pointed to
> + * by mas. Adjacent ranges with attributes identical to the new attributes
> + * will be merged. Also sets mas's bounds up for storing attributes.
> + *
> + * This maintains the invariant that ranges with the same attributes will
> + * always be merged.
> + */
> +static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> + pgoff_t start, size_t nr_pages)
> +{
> + pgoff_t end = start + nr_pages;
> + pgoff_t last = end - 1;
> + void *entry;
> +
> + /* Try extending range. entry is NULL on overflow/wrap-around. */
> + mas_set(mas, end);
> + entry = mas_find(mas, end);
> + if (entry && xa_to_value(entry) == attributes)
> + last = mas->last;
> +
> + if (start > 0) {
> + mas_set(mas, start - 1);
> + entry = mas_find(mas, start - 1);
> + if (entry && xa_to_value(entry) == attributes)
> + start = mas->index;
> + }
> +
> + mas_set_range(mas, start, last);
> + return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> +}
> +
> +static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> + size_t nr_pages, uint64_t attrs)
> +{
> + struct address_space *mapping = inode->i_mapping;
> + struct gmem_inode *gi = GMEM_I(inode);
> + pgoff_t end = start + nr_pages;
> + struct maple_tree *mt;
> + struct ma_state mas;
> + int r;
> +
> + mt = &gi->attributes;
> +
> + filemap_invalidate_lock(mapping);
> +
> + mas_init(&mas, mt, start);
> + r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> + if (r)
> + goto out;
> +
> + /*
> + * From this point on guest_memfd has performed necessary
> + * checks and can proceed to do guest-breaking changes.
> + */
> +
> + kvm_gmem_invalidate_begin(inode, start, end);
> + mas_store_prealloc(&mas, xa_mk_value(attrs));
> + kvm_gmem_invalidate_end(inode, start, end);
> +out:
> + filemap_invalidate_unlock(mapping);
> + return r;
> +}
> +
> +static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> +{
> + struct gmem_file *f = file->private_data;
> + struct inode *inode = file_inode(file);
> + struct kvm_memory_attributes2 attrs;
> + size_t nr_pages;
> + pgoff_t index;
> + int i;
> +
> + if (copy_from_user(&attrs, argp, sizeof(attrs)))
> + return -EFAULT;
> +
> + if (attrs.flags)
> + return -EINVAL;
> + for (i = 0; i < ARRAY_SIZE(attrs.reserved); i++) {
> + if (attrs.reserved[i])
> + return -EINVAL;
> + }
> + if (attrs.attributes & ~kvm_supported_mem_attributes(f->kvm))
> + return -EINVAL;
> + if (attrs.size == 0 || attrs.offset + attrs.size < attrs.offset)
> + return -EINVAL;
> + if (!PAGE_ALIGNED(attrs.offset) || !PAGE_ALIGNED(attrs.size))
> + return -EINVAL;
> +
> + if (attrs.offset >= i_size_read(inode) ||
> + attrs.offset + attrs.size > i_size_read(inode))
> + return -EINVAL;
> +
> + nr_pages = attrs.size >> PAGE_SHIFT;
> + index = attrs.offset >> PAGE_SHIFT;
> + return __kvm_gmem_set_attributes(inode, index, nr_pages,
> + attrs.attributes);
> +}
> +
> +static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
> + unsigned long arg)
> +{
> + switch (ioctl) {
> + case KVM_SET_MEMORY_ATTRIBUTES2:
> + if (vm_memory_attributes)
> + return -ENOTTY;
> +
> + return kvm_gmem_set_attributes(file, (void __user *)arg);
> + default:
> + return -ENOTTY;
> + }
> +}
> +
> static struct file_operations kvm_gmem_fops = {
> .mmap = kvm_gmem_mmap,
> .open = generic_file_open,
> .release = kvm_gmem_release,
> .fallocate = kvm_gmem_fallocate,
> + .unlocked_ioctl = kvm_gmem_ioctl,
> };
>
> static int kvm_gmem_migrate_folio(struct address_space *mapping,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff20e63143642..4d7bf52b7b717 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -110,6 +110,18 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(STATIC_CALL_KEY(__kvm_get_memory_attributes));
> EXPORT_SYMBOL_FOR_KVM_INTERNAL(STATIC_CALL_TRAMP(__kvm_get_memory_attributes));
> #endif
>
> +#define MEMORY_ATTRIBUTES_MATCH(one, two) \
> + static_assert(offsetof(struct kvm_memory_attributes, one) == \
> + offsetof(struct kvm_memory_attributes2, two)); \
> + static_assert(sizeof_field(struct kvm_memory_attributes, one) ==\
> + sizeof_field(struct kvm_memory_attributes2, two))
> +
> +/* Ensure the common parts of the two structs are identical. */
> +MEMORY_ATTRIBUTES_MATCH(address, address);
> +MEMORY_ATTRIBUTES_MATCH(size, size);
> +MEMORY_ATTRIBUTES_MATCH(attributes, attributes);
> +MEMORY_ATTRIBUTES_MATCH(flags, flags);
> +
> /*
> * Ordering of locks:
> *
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply
* Re: [PATCH 13/15] accel/qda: Add DSP process creation and release
From: Dmitry Baryshkov @ 2026-05-20 14:00 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-13-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:46:03AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Implement the REMOTE_SESSION_CREATE and INIT_RELEASE FastRPC
> operations, which establish and tear down a user process on the
> DSP.
>
> DRM_IOCTL_QDA_REMOTE_SESSION_CREATE (drm_qda_init_create)
> Creates a new process on the DSP by sending an INIT_CREATE message
> via the FastRPC INIT_HANDLE. The caller provides an ELF file (via
> DMA-BUF fd or direct pointer) and optional process attributes. A
> 4 MB GEM buffer is allocated per session to hold the DSP process
> image; this buffer is stored in qda_file_priv and reused for the
> lifetime of the session.
>
> If attrs is non-zero, INIT_CREATE_ATTR is used instead of
> INIT_CREATE to pass the extended attribute and signature fields.
What is the difference?
>
> INIT_RELEASE
> Sends a release message to the DSP when the DRM file is closed
> (qda_postclose via qda_release_dsp_process), freeing the remote
> process and its resources. The release is skipped if the device
> has already been unplugged.
>
> qda_fastrpc.c
> fastrpc_prepare_args_init_create() marshals the six-argument
> create-process payload: the inbuf descriptor, process name,
> ELF file, physical pages, attrs, and siglen.
> fastrpc_prepare_args_release_process() marshals the single-
> argument release payload (remote_session_id).
>
> qda_drv.c
> qda_postclose() is extended to call qda_release_dsp_process()
> under drm_dev_enter() so the release message is only sent while
> the device is still accessible.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/qda/qda_drv.c | 8 +++
> drivers/accel/qda/qda_drv.h | 5 ++
> drivers/accel/qda/qda_fastrpc.c | 140 ++++++++++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_fastrpc.h | 39 +++++++++--
> drivers/accel/qda/qda_ioctl.c | 52 +++++++++++++++
> drivers/accel/qda/qda_ioctl.h | 1 +
> include/uapi/drm/qda_accel.h | 32 ++++++++-
> 7 files changed, 270 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
> index 704c7d3127d2..4eaba9b050c0 100644
> --- a/drivers/accel/qda/qda_drv.c
> +++ b/drivers/accel/qda/qda_drv.c
> @@ -36,6 +36,13 @@ static int qda_open(struct drm_device *dev, struct drm_file *file)
> static void qda_postclose(struct drm_device *dev, struct drm_file *file)
> {
> struct qda_file_priv *qda_file_priv = file->driver_priv;
> + int idx;
> +
> + /* Only send the DSP release message while the device is accessible */
> + if (drm_dev_enter(dev, &idx)) {
> + qda_release_dsp_process(qda_file_priv->qda_dev, file);
> + drm_dev_exit(idx);
> + }
>
> if (qda_file_priv->assigned_iommu_dev) {
> struct qda_iommu_device *iommu_dev = qda_file_priv->assigned_iommu_dev;
> @@ -59,6 +66,7 @@ static const struct drm_ioctl_desc qda_ioctls[] = {
> DRM_IOCTL_DEF_DRV(QDA_QUERY, qda_ioctl_query, 0),
> DRM_IOCTL_DEF_DRV(QDA_GEM_CREATE, qda_ioctl_gem_create, 0),
> DRM_IOCTL_DEF_DRV(QDA_GEM_MMAP_OFFSET, qda_ioctl_gem_mmap_offset, 0),
> + DRM_IOCTL_DEF_DRV(QDA_REMOTE_SESSION_CREATE, qda_ioctl_init_create, 0),
Why is it being added in the middle?
> DRM_IOCTL_DEF_DRV(QDA_REMOTE_INVOKE, qda_ioctl_invoke, 0),
> };
>
> diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
> index 420cccff42bf..4b4639961d95 100644
> --- a/drivers/accel/qda/qda_drv.h
> +++ b/drivers/accel/qda/qda_drv.h
> @@ -28,6 +28,8 @@ struct qda_file_priv {
> struct qda_dev *qda_dev;
> /** @assigned_iommu_dev: IOMMU device assigned to this process */
> struct qda_iommu_device *assigned_iommu_dev;
> + /** @init_mem_gem_obj: GEM object for PD initialization memory */
> + struct qda_gem_obj *init_mem_gem_obj;
> /** @pid: Process ID for tracking */
> pid_t pid;
> /** @remote_session_id: Unique session identifier */
> @@ -83,4 +85,7 @@ void qda_deinit_device(struct qda_dev *qdev);
> int qda_register_device(struct qda_dev *qdev);
> void qda_unregister_device(struct qda_dev *qdev);
>
> +/* DSP process / protection domain management */
> +int qda_release_dsp_process(struct qda_dev *qdev, struct drm_file *file_priv);
> +
> #endif /* __QDA_DRV_H__ */
> diff --git a/drivers/accel/qda/qda_fastrpc.c b/drivers/accel/qda/qda_fastrpc.c
> index 0ec37175a098..305915022b91 100644
> --- a/drivers/accel/qda/qda_fastrpc.c
> +++ b/drivers/accel/qda/qda_fastrpc.c
> @@ -524,6 +524,138 @@ int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx,
> return err;
> }
>
> +static void setup_create_process_args(struct drm_qda_fastrpc_invoke_args *args,
> + struct fastrpc_create_process_inbuf *inbuf,
> + struct drm_qda_init_create *init,
> + struct fastrpc_phy_page *pages)
> +{
> + args[0].ptr = (u64)(uintptr_t)inbuf;
> + args[0].length = sizeof(*inbuf);
> + args[0].fd = -1;
> +
> + args[1].ptr = (u64)(uintptr_t)current->comm;
> + args[1].length = inbuf->namelen;
> + args[1].fd = -1;
> +
> + args[2].ptr = (u64)init->file;
> + args[2].length = inbuf->filelen;
> + args[2].fd = init->filefd; /* DMA-BUF fd forwarded to DSP */
> +
> + args[3].ptr = (u64)(uintptr_t)pages;
> + args[3].length = 1 * sizeof(*pages);
> + args[3].fd = -1;
> +
> + args[4].ptr = (u64)(uintptr_t)&inbuf->attrs;
> + args[4].length = sizeof(inbuf->attrs);
> + args[4].fd = -1;
> +
> + args[5].ptr = (u64)(uintptr_t)&inbuf->siglen;
> + args[5].length = sizeof(inbuf->siglen);
> + args[5].fd = -1;
> +}
> +
> +static void setup_single_arg(struct drm_qda_fastrpc_invoke_args *args, const void *ptr, size_t size)
> +{
> + args[0].ptr = (u64)(uintptr_t)ptr;
> + args[0].length = size;
> + args[0].fd = -1;
> +}
> +
> +static int fastrpc_prepare_args_release_process(struct fastrpc_invoke_context *ctx)
> +{
> + struct drm_qda_fastrpc_invoke_args *args;
> +
> + args = kzalloc_obj(*args);
> + if (!args)
> + return -ENOMEM;
> +
> + setup_single_arg(args, &ctx->remote_session_id, sizeof(ctx->remote_session_id));
> + ctx->sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_RELEASE, 1, 0);
> + ctx->args = args;
> + ctx->handle = FASTRPC_INIT_HANDLE;
> +
> + return 0;
> +}
> +
> +static int fastrpc_prepare_args_init_create(struct fastrpc_invoke_context *ctx,
> + char __user *argp)
> +{
> + struct drm_qda_init_create init;
> + struct drm_qda_fastrpc_invoke_args *args;
> + struct fastrpc_create_process_inbuf *inbuf;
> + int err;
> + u32 sc;
> +
> + args = kcalloc(FASTRPC_CREATE_PROCESS_NARGS, sizeof(*args), GFP_KERNEL);
> + if (!args)
> + return -ENOMEM;
> +
> + ctx->input_pages = kcalloc(1, sizeof(*ctx->input_pages), GFP_KERNEL);
> + if (!ctx->input_pages) {
> + err = -ENOMEM;
> + goto err_free_args;
> + }
> +
> + ctx->inbuf = kcalloc(1, sizeof(*inbuf), GFP_KERNEL);
> + if (!ctx->inbuf) {
> + err = -ENOMEM;
> + goto err_free_input_pages;
> + }
> + inbuf = ctx->inbuf;
> +
> + memcpy(&init, argp, sizeof(init));
> +
> + if (init.filelen > FASTRPC_INIT_FILELEN_MAX) {
> + err = -EINVAL;
> + goto err_free_inbuf;
> + }
> +
> + /*
> + * Validate that the DMA-BUF fd is importable. The fd itself is kept
> + * in init.filefd and forwarded to the DSP via setup_create_process_args().
> + */
> + if (init.filelen && init.filefd > 0) {
> + struct drm_gem_object *file_gem_obj;
> +
> + err = get_gem_obj_from_dmabuf_fd(ctx, init.filefd, &file_gem_obj);
> + if (err) {
> + err = -EINVAL;
> + goto err_free_inbuf;
> + }
> + drm_gem_object_put(file_gem_obj);
> + }
> +
> + inbuf->remote_session_id = ctx->remote_session_id;
> + inbuf->namelen = strlen(current->comm) + 1;
> + inbuf->filelen = init.filelen;
> + inbuf->pageslen = 1;
> + inbuf->attrs = init.attrs;
> + inbuf->siglen = init.siglen;
> +
> + setup_pages_from_gem_obj(ctx->init_mem_gem_obj, &ctx->input_pages[0]);
> +
> + setup_create_process_args(args, inbuf, &init, ctx->input_pages);
> +
> + sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE, 4, 0);
> + if (init.attrs)
> + sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_ATTR, 4, 0);
> + ctx->sc = sc;
> + ctx->args = args;
> + ctx->handle = FASTRPC_INIT_HANDLE;
> +
> + return 0;
> +
> +err_free_inbuf:
> + kfree(ctx->inbuf);
> + ctx->inbuf = NULL;
> +err_free_input_pages:
> + kfree(ctx->input_pages);
> + ctx->input_pages = NULL;
> +err_free_args:
> + kfree(args);
> + return err;
> +}
> +
> static int fastrpc_prepare_args_invoke(struct fastrpc_invoke_context *ctx, char __user *argp)
> {
> struct drm_qda_invoke_args invoke_args;
> @@ -568,6 +700,14 @@ int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *ar
> int err;
>
> switch (ctx->type) {
> + case FASTRPC_RMID_INIT_RELEASE:
> + err = fastrpc_prepare_args_release_process(ctx);
> + break;
> + case FASTRPC_RMID_INIT_CREATE:
> + case FASTRPC_RMID_INIT_CREATE_ATTR:
> + ctx->pd = QDA_USER_PD;
> + err = fastrpc_prepare_args_init_create(ctx, argp);
> + break;
> case FASTRPC_RMID_INVOKE_DYNAMIC:
> err = fastrpc_prepare_args_invoke(ctx, argp);
> break;
> diff --git a/drivers/accel/qda/qda_fastrpc.h b/drivers/accel/qda/qda_fastrpc.h
> index ce77baeccfba..1c1236f9525e 100644
> --- a/drivers/accel/qda/qda_fastrpc.h
> +++ b/drivers/accel/qda/qda_fastrpc.h
> @@ -127,6 +127,27 @@ struct fastrpc_invoke_buf {
> u32 pgidx;
> };
>
> +/**
> + * struct fastrpc_create_process_inbuf - Input buffer for process creation
> + *
> + * This structure defines the input buffer format for creating a new
> + * process on the remote DSP.
> + */
> +struct fastrpc_create_process_inbuf {
> + /** @remote_session_id: Client identifier for the session */
> + int remote_session_id;
> + /** @namelen: Length of the process name string including NUL terminator */
> + u32 namelen;
> + /** @filelen: Length of the ELF shell file in bytes */
> + u32 filelen;
> + /** @pageslen: Number of physical page descriptors */
> + u32 pageslen;
> + /** @attrs: Process attribute flags */
> + u32 attrs;
> + /** @siglen: Length of the signature data in bytes */
> + u32 siglen;
> +};
> +
> /**
> * struct fastrpc_msg - FastRPC wire message for remote invocations
> *
> @@ -153,10 +174,6 @@ struct fastrpc_msg {
>
> /**
> * struct qda_msg - FastRPC message with kernel-internal bookkeeping
> - *
> - * The wire-format portion is kept in the embedded @fastrpc member (must
> - * be first) so that &qda_msg->fastrpc can be passed directly to
> - * rpmsg_send() without a copy.
> */
> struct qda_msg {
> /**
> @@ -245,7 +262,7 @@ struct fastrpc_invoke_context {
> struct qda_gem_obj *msg_gem_obj;
> /** @file_priv: DRM file private data */
> struct drm_file *file_priv;
> - /** @init_mem_gem_obj: GEM object for protection domain init memory */
> + /** @init_mem_gem_obj: GEM object for PD initialization memory */
> struct qda_gem_obj *init_mem_gem_obj;
> /** @req: Pointer to kernel-internal request buffer */
> void *req;
> @@ -256,11 +273,23 @@ struct fastrpc_invoke_context {
> };
>
> /* Remote Method ID table - identifies initialization and control operations */
> +#define FASTRPC_RMID_INIT_RELEASE 1 /* Release DSP process */
> +#define FASTRPC_RMID_INIT_CREATE 6 /* Create DSP process */
> +#define FASTRPC_RMID_INIT_CREATE_ATTR 7 /* Create DSP process with attributes */
> #define FASTRPC_RMID_INVOKE_DYNAMIC 0xFFFFFFFF /* Dynamic method invocation */
>
> /* Common handle for initialization operations */
> #define FASTRPC_INIT_HANDLE 0x1
>
> +/* Protection Domain (PD) identifiers */
> +#define QDA_ROOT_PD (0)
> +#define QDA_USER_PD (1)
> +
> +/* Number of arguments for process creation */
> +#define FASTRPC_CREATE_PROCESS_NARGS 6
> +/* Maximum initialization file size (4 MB) */
> +#define FASTRPC_INIT_FILELEN_MAX (4 * 1024 * 1024)
> +
> void qda_fastrpc_context_free(struct kref *ref);
> struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void);
> int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp);
> diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
> index c81268c20b04..33f0a798ad13 100644
> --- a/drivers/accel/qda/qda_ioctl.c
> +++ b/drivers/accel/qda/qda_ioctl.c
> @@ -109,6 +109,7 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
> struct drm_gem_object *gem_obj;
> int err;
> size_t hdr_size;
> + size_t initmem_size = FASTRPC_INIT_FILELEN_MAX;
>
> ctx = qda_fastrpc_context_alloc();
> if (IS_ERR(ctx))
> @@ -124,6 +125,27 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
> ctx->file_priv = file_priv;
> ctx->remote_session_id = qda_file_priv->remote_session_id;
>
> + if (type == FASTRPC_RMID_INIT_CREATE) {
> + struct drm_gem_object *initmem_gem_obj;
> +
> + if (qda_file_priv->init_mem_gem_obj) {
Why is it non-NULL here?
> + drm_gem_object_put(&qda_file_priv->init_mem_gem_obj->base);
> + qda_file_priv->init_mem_gem_obj = NULL;
> + }
> +
> + initmem_gem_obj = qda_gem_create_object(dev, qdev->iommu_mgr,
> + initmem_size, file_priv);
> + if (IS_ERR(initmem_gem_obj)) {
> + err = PTR_ERR(initmem_gem_obj);
> + goto err_context_free;
> + }
> +
> + ctx->init_mem_gem_obj = to_qda_gem_obj(initmem_gem_obj);
> + qda_file_priv->init_mem_gem_obj = ctx->init_mem_gem_obj;
> + } else if (type == FASTRPC_RMID_INIT_RELEASE) {
> + ctx->init_mem_gem_obj = qda_file_priv->init_mem_gem_obj;
> + }
> +
> err = qda_fastrpc_prepare_args(ctx, (char __user *)data);
> if (err)
> goto err_context_free;
> @@ -161,11 +183,41 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
> return 0;
>
> err_context_free:
> + if (type == FASTRPC_RMID_INIT_RELEASE && !err && qda_file_priv->init_mem_gem_obj) {
> + drm_gem_object_put(&qda_file_priv->init_mem_gem_obj->base);
> + qda_file_priv->init_mem_gem_obj = NULL;
> + }
> +
> fastrpc_context_put_id(ctx, qdev);
> kref_put(&ctx->refcount, qda_fastrpc_context_free);
> return err;
> }
>
> +/**
> + * qda_ioctl_init_create() - Create a DSP process
> + * @dev: DRM device structure
> + * @data: User-space data (struct drm_qda_init_create)
> + * @file_priv: DRM file private data
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_ioctl_init_create(struct drm_device *dev, void *data, struct drm_file *file_priv)
> +{
> + return fastrpc_invoke(FASTRPC_RMID_INIT_CREATE, dev, data, file_priv);
Where is INIT_CREATE_ATTR, which you described earlier?
> +}
> +
> +/**
> + * qda_release_dsp_process() - Release DSP process resources for a file
> + * @qdev: QDA device structure
> + * @file_priv: DRM file private data
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_release_dsp_process(struct qda_dev *qdev, struct drm_file *file_priv)
> +{
> + return fastrpc_invoke(FASTRPC_RMID_INIT_RELEASE, &qdev->drm_dev, NULL, file_priv);
> +}
> +
> /**
> * qda_ioctl_invoke() - Perform a dynamic FastRPC method invocation
> * @dev: DRM device structure
> diff --git a/drivers/accel/qda/qda_ioctl.h b/drivers/accel/qda/qda_ioctl.h
> index 3bb9cfd98370..192565434363 100644
> --- a/drivers/accel/qda/qda_ioctl.h
> +++ b/drivers/accel/qda/qda_ioctl.h
> @@ -9,6 +9,7 @@
> #include "qda_drv.h"
>
> int qda_ioctl_query(struct drm_device *dev, void *data, struct drm_file *file_priv);
> +int qda_ioctl_init_create(struct drm_device *dev, void *data, struct drm_file *file_priv);
> int qda_ioctl_gem_create(struct drm_device *dev, void *data, struct drm_file *file_priv);
> int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_file *file_priv);
> int qda_ioctl_invoke(struct drm_device *dev, void *data, struct drm_file *file_priv);
> diff --git a/include/uapi/drm/qda_accel.h b/include/uapi/drm/qda_accel.h
> index 72512213741f..711e2523a570 100644
> --- a/include/uapi/drm/qda_accel.h
> +++ b/include/uapi/drm/qda_accel.h
> @@ -21,8 +21,9 @@ extern "C" {
> #define DRM_QDA_QUERY 0x00
> #define DRM_QDA_GEM_CREATE 0x01
> #define DRM_QDA_GEM_MMAP_OFFSET 0x02
> -/* Command numbers 0x03-0x06 reserved for INIT_ATTACH, INIT_CREATE, MAP, MUNMAP */
> -#define DRM_QDA_REMOTE_INVOKE 0x07
> +/* Command number 0x03 reserved for INIT_ATTACH; 0x05-0x06 reserved for MAP, MUNMAP */
> +#define DRM_QDA_REMOTE_SESSION_CREATE 0x04
> +#define DRM_QDA_REMOTE_INVOKE 0x07
>
> /*
> * QDA IOCTL definitions
> @@ -37,6 +38,9 @@ extern "C" {
> struct drm_qda_gem_create)
> #define DRM_IOCTL_QDA_GEM_MMAP_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_GEM_MMAP_OFFSET, \
> struct drm_qda_gem_mmap_offset)
> +#define DRM_IOCTL_QDA_REMOTE_SESSION_CREATE \
> + DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_SESSION_CREATE, \
> + struct drm_qda_init_create)
> #define DRM_IOCTL_QDA_REMOTE_INVOKE DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_INVOKE, \
> struct drm_qda_invoke_args)
>
> @@ -99,6 +103,30 @@ struct drm_qda_fastrpc_invoke_args {
> __u32 attr;
> };
>
> +/**
> + * struct drm_qda_init_create - Accelerator process initialization parameters
> + * @filelen: Length of the ELF file in bytes
> + * @filefd: DMA-BUF file descriptor containing the ELF file
> + * @attrs: Process attributes flags
> + * @siglen: Length of signature data in bytes
> + * @file: Pointer to ELF file data if not using filefd
> + *
> + * This structure is used with DRM_IOCTL_QDA_INIT_CREATE to initialize
> + * a new process on the accelerator. The process code is provided either
> + * via a file descriptor (filefd, typically a GEM object) or a direct
> + * pointer (file). Set file to 0 if using filefd.
> + *
> + * The attrs field contains bit flags for debug mode, privileged execution,
> + * and other process attributes.
> + */
> +struct drm_qda_init_create {
> + __u32 filelen;
> + __s32 filefd;
> + __u32 attrs;
> + __u32 siglen;
> + __u64 file;
> +};
> +
> /**
> * struct drm_qda_invoke_args - Dynamic FastRPC invocation parameters
> * @handle: Remote handle to invoke on the DSP
>
> --
> 2.34.1
>
>
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH 12/15] accel/qda: Add FastRPC invocation support
From: Dmitry Baryshkov @ 2026-05-20 13:56 UTC (permalink / raw)
To: ekansh.gupta
Cc: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Bharath Kumar, Chenna Kesava Raju, srini,
andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-12-b2d984c297f8@oss.qualcomm.com>
On Tue, May 19, 2026 at 11:46:02AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
>
> Implement the FastRPC remote procedure call path, allowing user-space
> to invoke methods on the DSP via DRM_IOCTL_QDA_REMOTE_INVOKE.
>
> qda_fastrpc.c / qda_fastrpc.h
> Implements the FastRPC protocol layer: argument marshalling
> (qda_fastrpc_invoke_pack), response unmarshalling
> (qda_fastrpc_invoke_unpack), and invocation context lifecycle
> management. Each invocation allocates a fastrpc_invoke_context
> which tracks buffer descriptors, GEM objects, and the completion
> used to synchronise with the DSP response.
>
> Buffer arguments are handled in three ways:
> - DMA-BUF fd: imported via PRIME, IOMMU-mapped dma_addr used
> - Direct (inline): copied into the GEM-backed message buffer
> - DMA handle: fd forwarded to DSP, physical page descriptor computed
No. This needs to go away. The QDA should support only one way to pass
data - via the GEM buffers. Everything else should be handled by the
shim layer, etc.
>
> qda_rpmsg.c
> Implements qda_rpmsg_send_msg() which sends the wire-format
> fastrpc_msg (embedded as the first member of qda_msg) directly
> via rpmsg_send(), and qda_rpmsg_wait_for_rsp() which blocks on
> the context completion. The RPMsg callback dispatches responses
> to waiting contexts via the ctx_xa XArray.
>
> qda_ioctl.c
> qda_ioctl_invoke() drives the full invocation lifecycle:
> allocate context → assign XArray ID → prepare args → allocate
> GEM message buffer → pack → send → wait → unpack → free.
>
> qda_drv.h / qda_drv.c
> qda_dev gains ctx_xa (XArray for in-flight context lookup) and
> remote_session_id_counter (atomic counter for session IDs).
> qda_file_priv gains remote_session_id for per-session tracking.
>
> include/uapi/drm/qda_accel.h
> Adds DRM_IOCTL_QDA_REMOTE_INVOKE (command 0x07; command numbers
> 0x03–0x06 are reserved) and the associated drm_qda_invoke_args
> and drm_qda_fastrpc_invoke_args structures.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> drivers/accel/qda/Makefile | 1 +
> drivers/accel/qda/qda_drv.c | 17 ++
> drivers/accel/qda/qda_drv.h | 8 +
> drivers/accel/qda/qda_fastrpc.c | 597 ++++++++++++++++++++++++++++++++++++++++
> drivers/accel/qda/qda_fastrpc.h | 271 ++++++++++++++++++
> drivers/accel/qda/qda_ioctl.c | 104 +++++++
> drivers/accel/qda/qda_ioctl.h | 1 +
> drivers/accel/qda/qda_rpmsg.c | 136 ++++++++-
> drivers/accel/qda/qda_rpmsg.h | 17 ++
> include/uapi/drm/qda_accel.h | 39 +++
> 10 files changed, 1189 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> index fb092e56d7f3..2d10420cd1ec 100644
> --- a/drivers/accel/qda/Makefile
> +++ b/drivers/accel/qda/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_DRM_ACCEL_QDA) := qda.o
> qda-y := \
> qda_cb.o \
> qda_drv.o \
> + qda_fastrpc.o \
> qda_gem.o \
> qda_ioctl.o \
> qda_memory_dma.o \
> diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
> index ef8bd573b836..704c7d3127d2 100644
> --- a/drivers/accel/qda/qda_drv.c
> +++ b/drivers/accel/qda/qda_drv.c
> @@ -26,6 +26,8 @@ static int qda_open(struct drm_device *dev, struct drm_file *file)
>
> qda_file_priv->pid = current->pid;
> qda_file_priv->qda_dev = qda_dev_from_drm(dev);
> + qda_file_priv->remote_session_id =
> + atomic_inc_return(&qda_file_priv->qda_dev->remote_session_id_counter);
> file->driver_priv = qda_file_priv;
>
> return 0;
> @@ -57,6 +59,7 @@ static const struct drm_ioctl_desc qda_ioctls[] = {
> DRM_IOCTL_DEF_DRV(QDA_QUERY, qda_ioctl_query, 0),
> DRM_IOCTL_DEF_DRV(QDA_GEM_CREATE, qda_ioctl_gem_create, 0),
> DRM_IOCTL_DEF_DRV(QDA_GEM_MMAP_OFFSET, qda_ioctl_gem_mmap_offset, 0),
> + DRM_IOCTL_DEF_DRV(QDA_REMOTE_INVOKE, qda_ioctl_invoke, 0),
> };
>
> static const struct drm_driver qda_drm_driver = {
> @@ -93,6 +96,17 @@ static void cleanup_memory_manager(struct qda_dev *qdev)
> }
> }
>
> +static void cleanup_device_resources(struct qda_dev *qdev)
> +{
> + xa_destroy(&qdev->ctx_xa);
> +}
> +
> +static void init_device_resources(struct qda_dev *qdev)
> +{
> + atomic_set(&qdev->remote_session_id_counter, 0);
> + xa_init_flags(&qdev->ctx_xa, XA_FLAGS_ALLOC1);
> +}
> +
> static int init_memory_manager(struct qda_dev *qdev)
> {
> qdev->iommu_mgr = kzalloc_obj(*qdev->iommu_mgr);
> @@ -106,6 +120,7 @@ void qda_deinit_device(struct qda_dev *qdev)
> {
> mutex_destroy(&qdev->import_lock);
> cleanup_memory_manager(qdev);
> + cleanup_device_resources(qdev);
> }
>
> int qda_init_device(struct qda_dev *qdev)
> @@ -114,10 +129,12 @@ int qda_init_device(struct qda_dev *qdev)
>
> mutex_init(&qdev->import_lock);
> qdev->current_import_file_priv = NULL;
> + init_device_resources(qdev);
>
> ret = init_memory_manager(qdev);
> if (ret) {
> drm_err(&qdev->drm_dev, "Failed to initialize memory manager: %d\n", ret);
> + cleanup_device_resources(qdev);
> mutex_destroy(&qdev->import_lock);
> }
>
> diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
> index 96ce4135e2d9..420cccff42bf 100644
> --- a/drivers/accel/qda/qda_drv.h
> +++ b/drivers/accel/qda/qda_drv.h
> @@ -6,10 +6,12 @@
> #ifndef __QDA_DRV_H__
> #define __QDA_DRV_H__
>
> +#include <linux/atomic.h>
> #include <linux/device.h>
> #include <linux/list.h>
> #include <linux/rpmsg.h>
> #include <linux/types.h>
> +#include <linux/xarray.h>
> #include <drm/drm_device.h>
> #include <drm/drm_drv.h>
> #include <drm/drm_file.h>
> @@ -28,6 +30,8 @@ struct qda_file_priv {
> struct qda_iommu_device *assigned_iommu_dev;
> /** @pid: Process ID for tracking */
> pid_t pid;
> + /** @remote_session_id: Unique session identifier */
> + u32 remote_session_id;
> };
>
> /**
> @@ -51,8 +55,12 @@ struct qda_dev {
> struct mutex import_lock;
> /** @current_import_file_priv: Current file_priv during prime import */
> struct drm_file *current_import_file_priv;
> + /** @ctx_xa: XArray for FastRPC context management */
> + struct xarray ctx_xa;
> /** @dsp_name: Name of the DSP domain (e.g. "cdsp", "adsp") */
> const char *dsp_name;
> + /** @remote_session_id_counter: Atomic counter for unique session IDs */
> + atomic_t remote_session_id_counter;
> };
>
> /**
> diff --git a/drivers/accel/qda/qda_fastrpc.c b/drivers/accel/qda/qda_fastrpc.c
> new file mode 100644
> index 000000000000..0ec37175a098
> --- /dev/null
> +++ b/drivers/accel/qda/qda_fastrpc.c
> @@ -0,0 +1,597 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/sort.h>
> +#include <linux/completion.h>
> +#include <linux/dma-buf.h>
> +#include <drm/drm_gem.h>
> +#include "qda_fastrpc.h"
> +#include "qda_drv.h"
> +#include "qda_gem.h"
> +#include "qda_memory_manager.h"
> +#include "qda_prime.h"
> +
> +/**
> + * get_gem_obj_from_dmabuf_fd() - Import a DMA-BUF fd and return the GEM object
> + * @ctx: FastRPC invocation context
> + * @dmabuf_fd: DMA-BUF file descriptor supplied by user space
> + * @gem_obj: Output GEM object (caller must call drm_gem_object_put() when done)
> + *
> + * Imports the DMA-BUF fd into the QDA device via qda_prime_fd_to_handle()
> + * (which performs IOMMU device assignment for newly imported buffers) and
> + * then looks up the resulting GEM object. The caller is responsible for
> + * calling drm_gem_object_put() on the returned object.
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +static int get_gem_obj_from_dmabuf_fd(struct fastrpc_invoke_context *ctx,
> + int dmabuf_fd,
> + struct drm_gem_object **gem_obj)
> +{
> + struct drm_device *dev = ctx->file_priv->minor->dev;
> + u32 handle;
> + int ret;
> +
> + ret = qda_prime_fd_to_handle(dev, ctx->file_priv, dmabuf_fd, &handle);
> + if (ret)
> + return ret;
> +
> + *gem_obj = drm_gem_object_lookup(ctx->file_priv, handle);
> + if (!*gem_obj)
> + return -ENOENT;
> +
> + return 0;
> +}
> +
> +static void setup_pages_from_gem_obj(struct qda_gem_obj *qda_gem_obj,
> + struct fastrpc_phy_page *pages)
> +{
> + pages->addr = qda_gem_obj->dma_addr;
> + pages->size = qda_gem_obj->size;
> +}
> +
> +static u64 calculate_vma_offset(u64 user_ptr)
> +{
> + struct vm_area_struct *vma;
> + u64 user_ptr_page_mask = user_ptr & PAGE_MASK;
> + u64 vma_offset = 0;
> +
> + mmap_read_lock(current->mm);
> + vma = find_vma(current->mm, user_ptr);
> + if (vma)
> + vma_offset = user_ptr_page_mask - vma->vm_start;
> + mmap_read_unlock(current->mm);
> +
> + return vma_offset;
> +}
> +
> +static u64 calculate_page_aligned_size(u64 ptr, u64 len)
> +{
> + u64 pg_start = (ptr & PAGE_MASK) >> PAGE_SHIFT;
> + u64 pg_end = ((ptr + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
> + u64 aligned_size = (pg_end - pg_start + 1) * PAGE_SIZE;
> +
> + return aligned_size;
> +}
> +
> +static struct fastrpc_invoke_buf *fastrpc_invoke_buf_start(union fastrpc_remote_arg *pra, int len)
> +{
> + return (struct fastrpc_invoke_buf *)(&pra[len]);
> +}
> +
> +static struct fastrpc_phy_page *fastrpc_phy_page_start(struct fastrpc_invoke_buf *buf, int len)
> +{
> + return (struct fastrpc_phy_page *)(&buf[len]);
> +}
> +
> +static int fastrpc_get_meta_size(struct fastrpc_invoke_context *ctx)
> +{
> + int size = 0;
> +
> + size = (sizeof(struct fastrpc_remote_buf) +
> + sizeof(struct fastrpc_invoke_buf) +
> + sizeof(struct fastrpc_phy_page)) * ctx->nscalars +
> + sizeof(u64) * FASTRPC_MAX_FDLIST +
> + sizeof(u32) * FASTRPC_MAX_CRCLIST;
> +
> + return size;
> +}
> +
> +static u64 fastrpc_get_payload_size(struct fastrpc_invoke_context *ctx, int metalen)
> +{
> + u64 size = 0;
> + int oix;
> +
> + size = ALIGN(metalen, FASTRPC_ALIGN);
> +
> + for (oix = 0; oix < ctx->nbufs; oix++) {
> + int i = ctx->olaps[oix].raix;
> +
> + if (ctx->args[i].fd == 0 || ctx->args[i].fd == -1) {
> + if (ctx->olaps[oix].offset == 0)
> + size = ALIGN(size, FASTRPC_ALIGN);
> +
> + size += (ctx->olaps[oix].mend - ctx->olaps[oix].mstart);
> + }
> + }
> +
> + return size;
> +}
> +
> +/**
> + * qda_fastrpc_context_free() - Free an invocation context
> + * @ref: Reference counter embedded in the context
> + *
> + * Called when the reference count reaches zero; releases all resources
> + * associated with the invocation context.
> + */
> +void qda_fastrpc_context_free(struct kref *ref)
> +{
> + struct fastrpc_invoke_context *ctx;
> + int i;
> +
> + ctx = container_of(ref, struct fastrpc_invoke_context, refcount);
> + if (ctx->gem_objs) {
> + for (i = 0; i < ctx->nscalars; ++i) {
> + if (ctx->gem_objs[i])
> + drm_gem_object_put(ctx->gem_objs[i]);
> + }
> + kfree(ctx->gem_objs);
> + }
> +
> + if (ctx->msg_gem_obj)
> + drm_gem_object_put(&ctx->msg_gem_obj->base);
> +
> + kfree(ctx->olaps);
> +
> + kfree(ctx->args);
> + kfree(ctx->req);
> + kfree(ctx->rsp);
> + kfree(ctx->input_pages);
> + kfree(ctx->inbuf);
> +
> + kfree(ctx);
> +}
> +
> +#define CMP(aa, bb) ((aa) == (bb) ? 0 : (aa) < (bb) ? -1 : 1)
> +
> +static int olaps_cmp(const void *a, const void *b)
> +{
> + struct fastrpc_buf_overlap *pa = (struct fastrpc_buf_overlap *)a;
> + struct fastrpc_buf_overlap *pb = (struct fastrpc_buf_overlap *)b;
> + /* sort with lowest starting buffer first */
> + int st = CMP(pa->start, pb->start);
> + /* sort with highest ending buffer first */
> + int ed = CMP(pb->end, pa->end);
> +
> + return st == 0 ? ed : st;
> +}
> +
> +static void fastrpc_get_buff_overlaps(struct fastrpc_invoke_context *ctx)
> +{
> + u64 max_end = 0;
> + int i;
> +
> + for (i = 0; i < ctx->nbufs; ++i) {
> + ctx->olaps[i].start = ctx->args[i].ptr;
> + ctx->olaps[i].end = ctx->olaps[i].start + ctx->args[i].length;
> + ctx->olaps[i].raix = i;
> + }
> +
> + sort(ctx->olaps, ctx->nbufs, sizeof(*ctx->olaps), olaps_cmp, NULL);
> +
> + for (i = 0; i < ctx->nbufs; ++i) {
> + if (ctx->olaps[i].start < max_end) {
> + ctx->olaps[i].mstart = max_end;
> + ctx->olaps[i].mend = ctx->olaps[i].end;
> + ctx->olaps[i].offset = max_end - ctx->olaps[i].start;
> +
> + if (ctx->olaps[i].end > max_end) {
> + max_end = ctx->olaps[i].end;
> + } else {
> + ctx->olaps[i].mend = 0;
> + ctx->olaps[i].mstart = 0;
> + }
> + } else {
> + ctx->olaps[i].mend = ctx->olaps[i].end;
> + ctx->olaps[i].mstart = ctx->olaps[i].start;
> + ctx->olaps[i].offset = 0;
> + max_end = ctx->olaps[i].end;
> + }
> + }
> +}
> +
> +/**
> + * qda_fastrpc_context_alloc() - Allocate a new FastRPC invocation context
> + *
> + * Return: Pointer to allocated context, or ERR_PTR on failure
> + */
> +struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void)
> +{
> + struct fastrpc_invoke_context *ctx = NULL;
> +
> + ctx = kzalloc_obj(*ctx);
> + if (!ctx)
> + return ERR_PTR(-ENOMEM);
> +
> + INIT_LIST_HEAD(&ctx->node);
> +
> + ctx->retval = -1;
> + ctx->pid = current->pid;
> + init_completion(&ctx->work);
> + ctx->msg_gem_obj = NULL;
> + kref_init(&ctx->refcount);
> +
> + return ctx;
> +}
> +
> +/*
> + * process_fd_buffer() - Handle an in/out buffer argument backed by a DMA-BUF fd
> + *
> + * args[i].fd is a DMA-BUF fd. We import it to obtain the GEM object and its
> + * IOMMU-mapped dma_addr for the physical page descriptor. The DSP uses the
> + * physical address directly for this buffer type; the fd is not forwarded.
> + */
> +static int process_fd_buffer(struct fastrpc_invoke_context *ctx, int i,
> + union fastrpc_remote_arg *rpra, struct fastrpc_phy_page *pages)
> +{
> + struct drm_gem_object *gem_obj;
> + struct qda_gem_obj *qda_gem_obj;
> + int err;
> + u64 len = ctx->args[i].length;
> + u64 vma_offset;
> +
> + err = get_gem_obj_from_dmabuf_fd(ctx, ctx->args[i].fd, &gem_obj);
> + if (err)
> + return err;
> +
> + ctx->gem_objs[i] = gem_obj;
> + qda_gem_obj = to_qda_gem_obj(gem_obj);
> +
> + rpra[i].buf.pv = (u64)ctx->args[i].ptr;
> +
> + pages[i].addr = qda_gem_obj->dma_addr;
> +
> + vma_offset = calculate_vma_offset(ctx->args[i].ptr);
> + pages[i].addr += vma_offset;
> + pages[i].size = calculate_page_aligned_size(ctx->args[i].ptr, len);
> +
> + return 0;
> +}
> +
> +static int process_direct_buffer(struct fastrpc_invoke_context *ctx, int i, int oix,
> + union fastrpc_remote_arg *rpra, struct fastrpc_phy_page *pages,
> + uintptr_t *args, u64 *rlen, u64 pkt_size)
> +{
> + int mlen;
> + u64 len = ctx->args[i].length;
> + int inbufs = ctx->inbufs;
> +
> + if (ctx->olaps[oix].offset == 0) {
> + *rlen -= ALIGN(*args, FASTRPC_ALIGN) - *args;
> + *args = ALIGN(*args, FASTRPC_ALIGN);
> + }
> +
> + mlen = ctx->olaps[oix].mend - ctx->olaps[oix].mstart;
> +
> + if (*rlen < mlen)
> + return -ENOSPC;
> +
> + rpra[i].buf.pv = *args - ctx->olaps[oix].offset;
> +
> + pages[i].addr = ctx->msg->phys - ctx->olaps[oix].offset + (pkt_size - *rlen);
> + pages[i].addr = pages[i].addr & PAGE_MASK;
> + pages[i].size = calculate_page_aligned_size(rpra[i].buf.pv, len);
> +
> + *args = *args + mlen;
> + *rlen -= mlen;
> +
> + if (i < inbufs) {
> + void *dst = (void *)(uintptr_t)rpra[i].buf.pv;
> + void *src = (void *)(uintptr_t)ctx->args[i].ptr;
> +
> + /*
> + * For user-space invocations (INVOKE_DYNAMIC), ptr is a user
> + * virtual address and must be copied safely. For all other
> + * (kernel-internal) invocations, ptr is a kernel address set
> + * by the driver itself and can be copied directly.
> + */
> + if (ctx->type == FASTRPC_RMID_INVOKE_DYNAMIC) {
> + if (copy_from_user(dst, (void __user *)src, len))
> + return -EFAULT;
> + } else {
> + memcpy(dst, src, len);
> + }
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * process_dma_handle() - Handle a DMA-handle scalar argument
> + *
> + * args[i].fd is a DMA-BUF fd. We import it to get the physical page
> + * descriptor for the kernel, but forward the original DMA-BUF fd to the
> + * DSP in rpra[i].dma.fd so the DSP can identify the buffer by its fd.
> + */
> +static int process_dma_handle(struct fastrpc_invoke_context *ctx, int i,
> + union fastrpc_remote_arg *rpra, struct fastrpc_phy_page *pages)
> +{
> + if (ctx->args[i].fd > 0) {
> + struct drm_gem_object *gem_obj;
> + struct qda_gem_obj *qda_gem_obj;
> + int err;
> +
> + err = get_gem_obj_from_dmabuf_fd(ctx, ctx->args[i].fd, &gem_obj);
> + if (err)
> + return err;
> +
> + ctx->gem_objs[i] = gem_obj;
> + qda_gem_obj = to_qda_gem_obj(gem_obj);
> +
> + setup_pages_from_gem_obj(qda_gem_obj, &pages[i]);
> +
> + /* Forward the original DMA-BUF fd to the DSP */
> + rpra[i].dma.fd = ctx->args[i].fd;
> + rpra[i].dma.len = ctx->args[i].length;
> + rpra[i].dma.offset = (u64)ctx->args[i].ptr;
> + } else {
> + rpra[i].buf.pv = ctx->args[i].ptr;
> + rpra[i].buf.len = ctx->args[i].length;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * qda_fastrpc_get_header_size() - Compute the FastRPC message header size
> + * @ctx: FastRPC invocation context
> + * @out_size: Pointer to store the aligned packet size in bytes
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_fastrpc_get_header_size(struct fastrpc_invoke_context *ctx, size_t *out_size)
> +{
> + ctx->inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
> + ctx->metalen = fastrpc_get_meta_size(ctx);
> + ctx->pkt_size = fastrpc_get_payload_size(ctx, ctx->metalen);
> +
> + ctx->aligned_pkt_size = PAGE_ALIGN(ctx->pkt_size);
> + if (ctx->aligned_pkt_size == 0)
> + return -EINVAL;
> +
> + *out_size = ctx->aligned_pkt_size;
> + return 0;
> +}
> +
> +static int fastrpc_get_args(struct fastrpc_invoke_context *ctx)
> +{
> + union fastrpc_remote_arg *rpra;
> + struct fastrpc_invoke_buf *list;
> + struct fastrpc_phy_page *pages;
> + int i, oix, err = 0;
> + u64 rlen;
> + uintptr_t args;
> + size_t hdr_size;
> +
> + ctx->inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
> + err = qda_fastrpc_get_header_size(ctx, &hdr_size);
> + if (err)
> + return err;
> +
> + ctx->msg->buf = ctx->msg_gem_obj->virt;
> + ctx->msg->phys = ctx->msg_gem_obj->dma_addr;
> +
> + memset(ctx->msg->buf, 0, ctx->aligned_pkt_size);
> +
> + rpra = (union fastrpc_remote_arg *)ctx->msg->buf;
> + ctx->list = fastrpc_invoke_buf_start(rpra, ctx->nscalars);
> + ctx->pages = fastrpc_phy_page_start(ctx->list, ctx->nscalars);
> + list = ctx->list;
> + pages = ctx->pages;
> + args = (uintptr_t)ctx->msg->buf + ctx->metalen;
> + rlen = ctx->pkt_size - ctx->metalen;
> + ctx->rpra = rpra;
> +
> + for (oix = 0; oix < ctx->nbufs; ++oix) {
> + i = ctx->olaps[oix].raix;
> +
> + rpra[i].buf.pv = 0;
> + rpra[i].buf.len = ctx->args[i].length;
> + list[i].num = ctx->args[i].length ? 1 : 0;
> + list[i].pgidx = i;
> +
> + if (!ctx->args[i].length)
> + continue;
> +
> + if (ctx->args[i].fd > 0)
> + err = process_fd_buffer(ctx, i, rpra, pages);
> + else
> + err = process_direct_buffer(ctx, i, oix, rpra, pages, &args, &rlen,
> + ctx->pkt_size);
> +
> + if (err)
> + goto bail_gem;
> + }
> +
> + for (i = ctx->nbufs; i < ctx->nscalars; ++i) {
> + list[i].num = ctx->args[i].length ? 1 : 0;
> + list[i].pgidx = i;
> +
> + err = process_dma_handle(ctx, i, rpra, pages);
> + if (err)
> + goto bail_gem;
> + }
> +
> + return 0;
> +
> +bail_gem:
> + if (ctx->msg_gem_obj) {
> + drm_gem_object_put(&ctx->msg_gem_obj->base);
> + ctx->msg_gem_obj = NULL;
> + }
> +
> + return err;
> +}
> +
> +static int fastrpc_put_args(struct fastrpc_invoke_context *ctx, struct qda_msg *msg)
> +{
> + union fastrpc_remote_arg *rpra;
> + int i, err = 0;
> +
> + if (!ctx)
> + return -EINVAL;
> +
> + rpra = ctx->rpra;
> + if (!rpra)
> + return -EINVAL;
> +
> + for (i = ctx->inbufs; i < ctx->nbufs; ++i) {
> + if (ctx->args[i].fd <= 0) {
> + void *src = (void *)(uintptr_t)rpra[i].buf.pv;
> + void *dst = (void *)(uintptr_t)ctx->args[i].ptr;
> + u64 len = rpra[i].buf.len;
> +
> + if (ctx->type == FASTRPC_RMID_INVOKE_DYNAMIC)
> + err = copy_to_user((void __user *)dst, src, len) ? -EFAULT : 0;
> + else
> + memcpy(dst, src, len);
> + if (err)
> + break;
> + }
> + }
> +
> + return err;
> +}
> +
> +/**
> + * qda_fastrpc_invoke_pack() - Pack an invocation context into a QDA message
> + * @ctx: FastRPC invocation context
> + * @msg: QDA message structure to pack into
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_fastrpc_invoke_pack(struct fastrpc_invoke_context *ctx,
> + struct qda_msg *msg)
> +{
> + int err = 0;
> +
> + if (ctx->handle == FASTRPC_INIT_HANDLE)
> + msg->fastrpc.remote_session_id = 0;
> + else
> + msg->fastrpc.remote_session_id = ctx->remote_session_id;
> +
> + ctx->msg = msg;
> +
> + err = fastrpc_get_args(ctx);
> + if (err)
> + return err;
> +
> + dma_wmb();
> +
> + msg->fastrpc.tid = ctx->pid;
> + msg->fastrpc.ctx = ctx->ctxid | ctx->pd;
> + msg->fastrpc.handle = ctx->handle;
> + msg->fastrpc.sc = ctx->sc;
> + msg->fastrpc.addr = ctx->msg->phys;
> + msg->fastrpc.size = roundup(ctx->pkt_size, PAGE_SIZE);
> + msg->fastrpc_ctx = ctx;
> + msg->file_priv = ctx->file_priv;
> +
> + return 0;
> +}
> +
> +/**
> + * qda_fastrpc_invoke_unpack() - Unpack a response message into an invocation context
> + * @ctx: FastRPC invocation context
> + * @msg: QDA message structure to unpack from
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx,
> + struct qda_msg *msg)
> +{
> + int err;
> +
> + dma_rmb();
> +
> + err = fastrpc_put_args(ctx, msg);
> + if (err)
> + return err;
> +
> + err = ctx->retval;
> + return err;
> +}
> +
> +static int fastrpc_prepare_args_invoke(struct fastrpc_invoke_context *ctx, char __user *argp)
> +{
> + struct drm_qda_invoke_args invoke_args;
> + struct drm_qda_fastrpc_invoke_args *args = NULL;
> + u32 nscalars;
> +
> + /* argp is DRM ioctl data (kernel pointer); args pointer within it is user-space */
> + memcpy(&invoke_args, argp, sizeof(invoke_args));
> +
> + ctx->handle = invoke_args.handle;
> + ctx->sc = invoke_args.sc;
> +
> + nscalars = REMOTE_SCALARS_LENGTH(ctx->sc);
> + if (!nscalars) {
> + ctx->args = NULL;
> + return 0;
> + }
> +
> + args = kcalloc(nscalars, sizeof(*args), GFP_KERNEL);
> + if (!args)
> + return -ENOMEM;
> +
> + if (copy_from_user(args, u64_to_user_ptr(invoke_args.args),
> + nscalars * sizeof(*args))) {
> + kfree(args);
> + return -EFAULT;
> + }
> +
> + ctx->args = args;
> + return 0;
> +}
> +
> +/**
> + * qda_fastrpc_prepare_args() - Prepare arguments for a FastRPC invocation
> + * @ctx: FastRPC invocation context
> + * @argp: User-space pointer to invocation arguments
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp)
> +{
> + int err;
> +
> + switch (ctx->type) {
> + case FASTRPC_RMID_INVOKE_DYNAMIC:
> + err = fastrpc_prepare_args_invoke(ctx, argp);
> + break;
> + default:
> + return -EINVAL;
> + }
> + if (err)
> + return err;
> +
> + ctx->nscalars = REMOTE_SCALARS_LENGTH(ctx->sc);
> + ctx->nbufs = REMOTE_SCALARS_INBUFS(ctx->sc) + REMOTE_SCALARS_OUTBUFS(ctx->sc);
> +
> + if (ctx->nscalars) {
> + ctx->gem_objs = kcalloc(ctx->nscalars, sizeof(*ctx->gem_objs), GFP_KERNEL);
> + if (!ctx->gem_objs)
> + return -ENOMEM;
> + ctx->olaps = kcalloc(ctx->nscalars, sizeof(*ctx->olaps), GFP_KERNEL);
> + if (!ctx->olaps) {
> + kfree(ctx->gem_objs);
> + ctx->gem_objs = NULL;
> + return -ENOMEM;
> + }
> + fastrpc_get_buff_overlaps(ctx);
> + }
> +
> + return err;
> +}
> diff --git a/drivers/accel/qda/qda_fastrpc.h b/drivers/accel/qda/qda_fastrpc.h
> new file mode 100644
> index 000000000000..ce77baeccfba
> --- /dev/null
> +++ b/drivers/accel/qda/qda_fastrpc.h
> @@ -0,0 +1,271 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_FASTRPC_H__
> +#define __QDA_FASTRPC_H__
> +
> +#include <linux/completion.h>
> +#include <linux/kref.h>
> +#include <linux/list.h>
> +#include <linux/types.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +#include <drm/qda_accel.h>
> +
> +/* Forward declarations */
> +struct qda_gem_obj;
> +
> +/*
> + * FastRPC scalar extraction macros
> + *
> + * These macros extract different fields from the scalar value that describes
> + * the arguments passed in a FastRPC invocation.
> + */
> +#define REMOTE_SCALARS_INBUFS(sc) (((sc) >> 16) & 0x0ff)
> +#define REMOTE_SCALARS_OUTBUFS(sc) (((sc) >> 8) & 0x0ff)
> +#define REMOTE_SCALARS_INHANDLES(sc) (((sc) >> 4) & 0x0f)
> +#define REMOTE_SCALARS_OUTHANDLES(sc) ((sc) & 0x0f)
> +#define REMOTE_SCALARS_LENGTH(sc) (REMOTE_SCALARS_INBUFS(sc) + \
> + REMOTE_SCALARS_OUTBUFS(sc) + \
> + REMOTE_SCALARS_INHANDLES(sc) + \
> + REMOTE_SCALARS_OUTHANDLES(sc))
> +
> +/* FastRPC configuration constants */
> +#define FASTRPC_ALIGN 128 /* Alignment requirement */
> +#define FASTRPC_MAX_FDLIST 16 /* Maximum file descriptors */
> +#define FASTRPC_MAX_CRCLIST 64 /* Maximum CRC list entries */
> +
> +/*
> + * FastRPC scalar construction macros
> + *
> + * These macros build the scalar value that describes the arguments
> + * for a FastRPC invocation.
> + */
> +#define FASTRPC_BUILD_SCALARS(attr, method, in, out, oin, oout) \
> + (((attr & 0x07) << 29) | \
> + ((method & 0x1f) << 24) | \
> + ((in & 0xff) << 16) | \
> + ((out & 0xff) << 8) | \
> + ((oin & 0x0f) << 4) | \
> + (oout & 0x0f))
> +
> +#define FASTRPC_SCALARS(method, in, out) \
> + FASTRPC_BUILD_SCALARS(0, method, in, out, 0, 0)
> +
> +/**
> + * struct fastrpc_buf_overlap - Buffer overlap tracking structure
> + *
> + * Tracks overlapping buffer regions to optimise memory mapping and avoid
> + * redundant mappings of the same physical memory.
WHat for? Even if this is a valid optimization, implement it as a
subsequent patch. The first goal should be very simple - get GEM buffers
from the app, pass them to the DSP, read the results.
> + */
> +struct fastrpc_buf_overlap {
Stop clashing the names with the existing fastrpc driver.
> + /** @start: Start address of the buffer in user virtual address space */
> + u64 start;
> + /** @end: End address of the buffer in user virtual address space */
> + u64 end;
> + /** @raix: Remote argument index associated with this overlap */
> + int raix;
> + /** @mstart: Start address of the mapped region */
> + u64 mstart;
> + /** @mend: End address of the mapped region */
> + u64 mend;
> + /** @offset: Offset within the mapped region */
> + u64 offset;
> +};
> +
> +/**
> + * struct fastrpc_remote_dmahandle - Remote DMA handle descriptor
> + */
> +struct fastrpc_remote_dmahandle {
> + /** @fd: DMA-BUF file descriptor */
> + s32 fd;
> + /** @offset: Byte offset within the DMA-BUF */
> + u32 offset;
> + /** @len: Length of the region in bytes */
> + u32 len;
> +};
> +
> +/**
> + * struct fastrpc_remote_buf - Remote buffer descriptor
> + */
> +struct fastrpc_remote_buf {
> + /** @pv: Buffer pointer (user virtual address) */
> + u64 pv;
> + /** @len: Length of the buffer in bytes */
> + u64 len;
> +};
> +
> +/**
> + * union fastrpc_remote_arg - Remote argument (buffer or DMA handle)
> + */
> +union fastrpc_remote_arg {
> + /** @buf: Inline buffer descriptor */
> + struct fastrpc_remote_buf buf;
> + /** @dma: DMA-BUF handle descriptor */
> + struct fastrpc_remote_dmahandle dma;
> +};
> +
> +/**
> + * struct fastrpc_phy_page - Physical page descriptor
> + */
> +struct fastrpc_phy_page {
> + /** @addr: Physical (IOMMU) address of the page */
> + u64 addr;
> + /** @size: Size of the contiguous region in bytes */
> + u64 size;
> +};
> +
> +/**
> + * struct fastrpc_invoke_buf - Invoke buffer descriptor
> + */
> +struct fastrpc_invoke_buf {
> + /** @num: Number of contiguous physical regions */
> + u32 num;
> + /** @pgidx: Index into the physical page array */
> + u32 pgidx;
> +};
> +
> +/**
> + * struct fastrpc_msg - FastRPC wire message for remote invocations
> + *
> + * Sent to the remote processor via RPMsg. This is the exact layout
> + * the DSP expects; do not reorder or add fields without DSP firmware
> + * coordination.
> + */
> +struct fastrpc_msg {
> + /** @remote_session_id: Session identifier on the remote processor */
> + int remote_session_id;
> + /** @tid: Thread ID of the invoking thread */
> + int tid;
> + /** @ctx: Context identifier for matching request/response */
> + u64 ctx;
> + /** @handle: Handle of the remote method to invoke */
> + u32 handle;
> + /** @sc: Scalars value encoding in/out buffer counts */
> + u32 sc;
> + /** @addr: Physical address of the message payload buffer */
> + u64 addr;
> + /** @size: Size of the message payload in bytes */
> + u64 size;
> +};
> +
> +/**
> + * struct qda_msg - FastRPC message with kernel-internal bookkeeping
> + *
> + * The wire-format portion is kept in the embedded @fastrpc member (must
> + * be first) so that &qda_msg->fastrpc can be passed directly to
> + * rpmsg_send() without a copy.
> + */
> +struct qda_msg {
> + /**
> + * @fastrpc: Wire-format message sent to the DSP via RPMsg.
> + * Must be the first member.
> + */
> + struct fastrpc_msg fastrpc;
> + /** @buf: Kernel virtual address of the payload buffer */
> + void *buf;
> + /** @phys: Physical/DMA address of the payload buffer */
> + u64 phys;
> + /** @ret: Return value from the remote processor */
> + int ret;
> + /** @fastrpc_ctx: Back-pointer to the owning invocation context */
> + struct fastrpc_invoke_context *fastrpc_ctx;
> + /** @file_priv: DRM file private data for GEM object lookup */
> + struct drm_file *file_priv;
> +};
> +
> +/**
> + * struct fastrpc_invoke_context - Remote procedure call invocation context
> + *
> + * Maintains all state for a single remote procedure call, including buffer
> + * management, synchronisation, and result handling.
> + */
> +struct fastrpc_invoke_context {
> + /** @node: List node for linking contexts in a queue */
> + struct list_head node;
> + /** @ctxid: Unique context identifier (XArray key shifted left by 4) */
> + u64 ctxid;
> + /** @inbufs: Number of input buffers */
> + int inbufs;
> + /** @outbufs: Number of output buffers */
> + int outbufs;
> + /** @handles: Number of DMA-BUF handle arguments */
> + int handles;
> + /** @nscalars: Total number of scalar arguments */
> + int nscalars;
> + /** @nbufs: Total number of buffer arguments (inbufs + outbufs) */
> + int nbufs;
If it is inbufs + outbufs, why do you need it here?
> + /** @pid: Process ID of the calling process */
> + int pid;
> + /** @retval: Return value from the remote invocation */
> + int retval;
> + /** @metalen: Length of the FastRPC metadata header in bytes */
> + int metalen;
size_t, also why do you need it?
> + /** @remote_session_id: Session identifier on the remote processor */
> + int remote_session_id;
> + /** @pd: Protection domain identifier encoded into the context ID */
> + int pd;
> + /** @type: Invocation type (e.g. FASTRPC_RMID_INVOKE_DYNAMIC) */
> + int type;
> + /** @sc: Scalars value encoding in/out buffer counts */
> + u32 sc;
How is this different from the counts above?
> + /** @handle: Handle of the remote method being invoked */
> + u32 handle;
> + /** @crc: Pointer to CRC values for data integrity checking */
> + u32 *crc;
Add it later. It's unused. Drop all unused fields.
> + /** @fdlist: Pointer to array of DMA-BUF file descriptors */
> + u64 *fdlist;
Why do you need DMA-BUFs in the invocation context? They all should be
GEM buffers.
> + /** @pkt_size: Total payload size in bytes */
> + u64 pkt_size;
> + /** @aligned_pkt_size: Page-aligned payload size for GEM allocation */
> + u64 aligned_pkt_size;
> + /** @list: Array of invoke buffer descriptors */
> + struct fastrpc_invoke_buf *list;
> + /** @pages: Array of physical page descriptors for all arguments */
> + struct fastrpc_phy_page *pages;
> + /** @input_pages: Array of physical page descriptors for input buffers */
> + struct fastrpc_phy_page *input_pages;
I think you are trying to bring all the complexity from the old driver
with no added benefit. Please don't. Use the existing memory manager.
Let it handle all the gory details. If someting is not there, we should
consider extending GEM instead.
> + /** @work: Completion used to synchronise with the DSP response */
> + struct completion work;
> + /** @msg: Pointer to the QDA message structure for this invocation */
> + struct qda_msg *msg;
> + /** @rpra: Array of remote procedure arguments */
> + union fastrpc_remote_arg *rpra;
> + /** @gem_objs: Array of GEM objects imported for argument buffers */
> + struct drm_gem_object **gem_objs;
> + /** @args: User-space invoke argument descriptors */
> + struct drm_qda_fastrpc_invoke_args *args;
> + /** @olaps: Array of buffer overlap descriptors for deduplication */
> + struct fastrpc_buf_overlap *olaps;
> + /** @refcount: Reference counter for context lifetime management */
> + struct kref refcount;
> + /** @msg_gem_obj: GEM object backing the message payload buffer */
> + struct qda_gem_obj *msg_gem_obj;
> + /** @file_priv: DRM file private data */
> + struct drm_file *file_priv;
> + /** @init_mem_gem_obj: GEM object for protection domain init memory */
> + struct qda_gem_obj *init_mem_gem_obj;
> + /** @req: Pointer to kernel-internal request buffer */
> + void *req;
> + /** @rsp: Pointer to kernel-internal response buffer */
> + void *rsp;
> + /** @inbuf: Pointer to kernel-internal input buffer */
> + void *inbuf;
> +};
> +
> +/* Remote Method ID table - identifies initialization and control operations */
> +#define FASTRPC_RMID_INVOKE_DYNAMIC 0xFFFFFFFF /* Dynamic method invocation */
> +
> +/* Common handle for initialization operations */
> +#define FASTRPC_INIT_HANDLE 0x1
> +
> +void qda_fastrpc_context_free(struct kref *ref);
> +struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void);
> +int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp);
> +int qda_fastrpc_get_header_size(struct fastrpc_invoke_context *ctx, size_t *out_size);
> +int qda_fastrpc_invoke_pack(struct fastrpc_invoke_context *ctx, struct qda_msg *msg);
> +int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx, struct qda_msg *msg);
> +
> +#endif /* __QDA_FASTRPC_H__ */
> diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
> index 1769c85a3e98..c81268c20b04 100644
> --- a/drivers/accel/qda/qda_ioctl.c
> +++ b/drivers/accel/qda/qda_ioctl.c
> @@ -3,8 +3,10 @@
> #include <drm/drm_ioctl.h>
> #include <drm/qda_accel.h>
> #include "qda_drv.h"
> +#include "qda_fastrpc.h"
> #include "qda_gem.h"
> #include "qda_ioctl.h"
> +#include "qda_rpmsg.h"
>
> /**
> * qda_ioctl_query() - Query DSP device information
> @@ -74,3 +76,105 @@ int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_fil
>
> return drm_gem_dumb_map_offset(file_priv, dev, args->handle, &args->offset);
> }
> +
> +static int fastrpc_context_get_id(struct fastrpc_invoke_context *ctx, struct qda_dev *qdev)
> +{
> + int ret;
> + u32 id;
> +
> + if (!qdev)
> + return -EINVAL;
> +
> + ret = xa_alloc(&qdev->ctx_xa, &id, ctx, xa_limit_32b, GFP_KERNEL);
> + if (ret)
> + return ret;
> +
> + ctx->ctxid = id << 4;
Why is it being shifted?
> + return 0;
> +}
> +
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH v6 09/43] KVM: Move kvm_supported_mem_attributes() to kvm_host.h
From: Fuad Tabba @ 2026-05-20 13:53 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-9-91ab5a8b19a4@google.com>
On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Move kvm_supported_mem_attributes() from kvm_main.c to kvm_host.h and
> make it a static inline function. This allows the helper to be used in
> other parts of the KVM subsystem outside of kvm_main.c. This helper will be
> used later by guest_memfd.
>
> No functional change intended.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> include/linux/kvm_host.h | 10 ++++++++++
> virt/kvm/kvm_main.c | 10 ----------
> 2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1deab76dc0a2c..f9ea95e33d050 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2529,6 +2529,16 @@ static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
> }
>
> #ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> +static inline u64 kvm_supported_mem_attributes(struct kvm *kvm)
> +{
> +#ifdef kvm_arch_has_private_mem
> + if (!kvm || kvm_arch_has_private_mem(kvm))
> + return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +#endif
> +
> + return 0;
> +}
> +
> typedef unsigned long (kvm_get_memory_attributes_t)(struct kvm *kvm, gfn_t gfn);
> DECLARE_STATIC_CALL(__kvm_get_memory_attributes, kvm_get_memory_attributes_t);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0a4024948711a..ff20e63143642 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2428,16 +2428,6 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
> #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> #ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> -static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> -{
> -#ifdef kvm_arch_has_private_mem
> - if (!kvm || kvm_arch_has_private_mem(kvm))
> - return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> -#endif
> -
> - return 0;
> -}
> -
> #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> {
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply
* Re: [PATCH v6 08/43] KVM: guest_memfd: Only prepare folios for private pages
From: Fuad Tabba @ 2026-05-20 13:51 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-8-91ab5a8b19a4@google.com>
On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> All-shared guest_memfd used to be only supported for non-CoCo VMs where
> preparation doesn't apply. INIT_SHARED is about to be supported for
> non-CoCo VMs in a later patch in this series.
>
> In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
> guest_memfd in a later patch in this series.
>
> This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
> shared folio for a CoCo VM where preparation applies.
>
> Add a check to make sure that preparation is only performed for private
> folios.
>
> Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
> conversion to shared.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> virt/kvm/guest_memfd.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9d025f518c025..4f7c4824c3a45 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -888,6 +888,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> int *max_order)
> {
> pgoff_t index = kvm_gmem_get_index(slot, gfn);
> + struct inode *inode;
> struct folio *folio;
> int r = 0;
>
> @@ -895,7 +896,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> if (!file)
> return -EFAULT;
>
> - filemap_invalidate_lock_shared(file_inode(file)->i_mapping);
> + inode = file_inode(file);
> + filemap_invalidate_lock_shared(inode->i_mapping);
>
> folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
> if (IS_ERR(folio)) {
> @@ -908,7 +910,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> folio_mark_uptodate(folio);
> }
>
> - r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
> + if (kvm_gmem_is_private_mem(inode, index))
> + r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>
> folio_unlock(folio);
>
> @@ -918,7 +921,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> folio_put(folio);
>
> out:
> - filemap_invalidate_unlock_shared(file_inode(file)->i_mapping);
> + filemap_invalidate_unlock_shared(inode->i_mapping);
> return r;
> }
> EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox