* [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
@ 2024-12-17 10:07 Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 1/6] tee: add restricted memory allocation Jens Wiklander
From: Jens Wiklander @ 2024-12-17 10:07 UTC
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
Hi,
This patch set adds support for allocating restricted DMA-bufs via the TEE
subsystem. The TEE subsystem handles the DMA-buf allocations since it is
the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) that sets up
the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
how to allocate the restricted physical memory.
In addition to size and flags parameters, TEE_IOC_RSTMEM_ALLOC also takes a
use-case parameter. The backend TEE driver uses this to decide on the
allocation policy and which devices should be able to access the memory.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
Recording) have been identified so far to serve as examples of what can be
expected. More use-cases can be added in the userspace ABI, but it's up to
the backend TEE drivers to provide the implementation.
Each use-case has its own restricted memory pool since different use-cases
require isolation from different parts of the system. A restricted memory
pool can be based on a static carveout instantiated while probing the TEE
backend driver, or dynamically allocated from CMA and made restricted as
needed by the TEE.
This can be tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v4
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
without FF-A, using the original SMC ABI instead. Please remember to do
% rm -rf ../trusted-firmware-a/build/qemu
so that TF-A is rebuilt properly with the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html
lists the dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory. There are also some
negative tests for out-of-bounds buffers, etc.
Thanks,
Jens
Changes since V3:
* Make the use_case and flags field in struct tee_shm u32's instead of
u16's
* Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
* Import namespace DMA_BUF in module tee, reported by lkp@intel.com
* Added a note in the commit message for "optee: account for direction
while converting parameters" why it's needed
* Factor out dynamic restricted memory allocation from
"optee: support restricted memory allocation" into two new commits
"optee: FF-A: dynamic restricted memory allocation" and
"optee: smc abi: dynamic restricted memory allocation"
* Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
restricted memory allocation if CMA isn't configured
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UI memory with a u32 field with
a unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Jens Wiklander (6):
tee: add restricted memory allocation
optee: account for direction while converting parameters
optee: sync secure world ABI headers
optee: support restricted memory allocation
optee: FF-A: dynamic restricted memory allocation
optee: smc abi: dynamic restricted memory allocation
drivers/tee/Makefile | 1 +
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/call.c | 10 +-
drivers/tee/optee/core.c | 1 +
drivers/tee/optee/ffa_abi.c | 178 +++++++++++++-
drivers/tee/optee/optee_ffa.h | 27 ++-
drivers/tee/optee/optee_msg.h | 65 ++++-
drivers/tee/optee/optee_private.h | 75 ++++--
drivers/tee/optee/optee_smc.h | 71 +++++-
drivers/tee/optee/rpc.c | 31 ++-
drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++
drivers/tee/optee/smc_abi.c | 213 ++++++++++++++--
drivers/tee/tee_core.c | 38 ++-
drivers/tee/tee_private.h | 2 +
drivers/tee/tee_rstmem.c | 201 ++++++++++++++++
drivers/tee/tee_shm.c | 2 +
drivers/tee/tee_shm_pool.c | 69 +++++-
include/linux/tee_core.h | 15 ++
include/linux/tee_drv.h | 2 +
include/uapi/linux/tee.h | 44 +++-
20 files changed, 1358 insertions(+), 76 deletions(-)
create mode 100644 drivers/tee/optee/rstmem.c
create mode 100644 drivers/tee/tee_rstmem.c
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
--
2.43.0
* [PATCH v4 1/6] tee: add restricted memory allocation
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
@ 2024-12-17 10:07 ` Jens Wiklander
2025-01-08 16:54 ` Simona Vetter
2025-02-13 8:44 ` Boris Brezillon
2024-12-17 10:07 ` [PATCH v4 2/6] optee: account for direction while converting parameters Jens Wiklander
From: Jens Wiklander @ 2024-12-17 10:07 UTC
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
Add restricted memory allocation to the TEE subsystem.
Restricted memory refers to memory buffers behind a hardware-enforced
firewall. It is not accessible to the kernel under normal circumstances,
but rather only accessible to certain hardware IPs or CPUs executing in a
higher privileged mode than the kernel itself. This interface allows
allocating and managing such restricted memory buffers via interaction
with a TEE implementation.
A new ioctl TEE_IOC_RSTMEM_ALLOC is added to allocate these restricted
memory buffers.
The restricted memory is allocated for a specific use-case, like Secure
Video Playback, Trusted UI, or Secure Video Recording where certain
hardware devices can access the memory.
More use-cases can be added in the userspace ABI, but it's up to the
backend drivers to provide the implementation.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/Makefile | 1 +
drivers/tee/tee_core.c | 38 ++++++-
drivers/tee/tee_private.h | 2 +
drivers/tee/tee_rstmem.c | 201 +++++++++++++++++++++++++++++++++++++
drivers/tee/tee_shm.c | 2 +
drivers/tee/tee_shm_pool.c | 69 ++++++++++++-
include/linux/tee_core.h | 15 +++
include/linux/tee_drv.h | 2 +
include/uapi/linux/tee.h | 44 +++++++-
9 files changed, 370 insertions(+), 4 deletions(-)
create mode 100644 drivers/tee/tee_rstmem.c
diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile
index 5488cba30bd2..a4c6b55444b9 100644
--- a/drivers/tee/Makefile
+++ b/drivers/tee/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_TEE) += tee.o
tee-objs += tee_core.o
tee-objs += tee_shm.o
tee-objs += tee_shm_pool.o
+tee-objs += tee_rstmem.o
obj-$(CONFIG_OPTEE) += optee/
obj-$(CONFIG_AMDTEE) += amdtee/
obj-$(CONFIG_ARM_TSTEE) += tstee/
diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
index d113679b1e2d..f4a45b77753b 100644
--- a/drivers/tee/tee_core.c
+++ b/drivers/tee/tee_core.c
@@ -1,12 +1,13 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * Copyright (c) 2015-2016, Linaro Limited
+ * Copyright (c) 2015-2022, 2024, Linaro Limited
*/
#define pr_fmt(fmt) "%s: " fmt, __func__
#include <linux/cdev.h>
#include <linux/cred.h>
+#include <linux/dma-buf.h>
#include <linux/fs.h>
#include <linux/idr.h>
#include <linux/module.h>
@@ -815,6 +816,38 @@ static int tee_ioctl_supp_send(struct tee_context *ctx,
return rc;
}
+static int
+tee_ioctl_rstmem_alloc(struct tee_context *ctx,
+ struct tee_ioctl_rstmem_alloc_data __user *udata)
+{
+ struct tee_ioctl_rstmem_alloc_data data;
+ struct dma_buf *dmabuf;
+ int id;
+ int fd;
+
+ if (copy_from_user(&data, udata, sizeof(data)))
+ return -EFAULT;
+
+ if (data.use_case == TEE_IOC_UC_RESERVED)
+ return -EINVAL;
+
+ dmabuf = tee_rstmem_alloc(ctx, data.flags, data.use_case, data.size,
+ &id);
+ if (IS_ERR(dmabuf))
+ return PTR_ERR(dmabuf);
+ if (put_user(id, &udata->id)) {
+ fd = -EFAULT;
+ goto err;
+ }
+ fd = dma_buf_fd(dmabuf, O_CLOEXEC);
+ if (fd < 0)
+ goto err;
+ return fd;
+err:
+ dma_buf_put(dmabuf);
+ return fd;
+}
+
static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct tee_context *ctx = filp->private_data;
@@ -839,6 +872,8 @@ static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return tee_ioctl_supp_recv(ctx, uarg);
case TEE_IOC_SUPPL_SEND:
return tee_ioctl_supp_send(ctx, uarg);
+ case TEE_IOC_RSTMEM_ALLOC:
+ return tee_ioctl_rstmem_alloc(ctx, uarg);
default:
return -EINVAL;
}
@@ -1286,3 +1321,4 @@ MODULE_AUTHOR("Linaro");
MODULE_DESCRIPTION("TEE Driver");
MODULE_VERSION("1.0");
MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS("DMA_BUF");
diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
index 9bc50605227c..bf97796909c0 100644
--- a/drivers/tee/tee_private.h
+++ b/drivers/tee/tee_private.h
@@ -23,5 +23,7 @@ void teedev_ctx_put(struct tee_context *ctx);
struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
unsigned long addr, size_t length);
+struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
+ u32 use_case, size_t size, int *shm_id);
#endif /*TEE_PRIVATE_H*/
diff --git a/drivers/tee/tee_rstmem.c b/drivers/tee/tee_rstmem.c
new file mode 100644
index 000000000000..536bca2901e2
--- /dev/null
+++ b/drivers/tee/tee_rstmem.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024 Linaro Limited
+ */
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/genalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/tee_core.h>
+#include "tee_private.h"
+
+struct tee_rstmem_attachment {
+ struct sg_table table;
+ struct device *dev;
+};
+
+static int rstmem_dma_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attachment)
+{
+ struct tee_shm *shm = dmabuf->priv;
+ struct tee_rstmem_attachment *a;
+ int rc;
+
+ a = kzalloc(sizeof(*a), GFP_KERNEL);
+ if (!a)
+ return -ENOMEM;
+
+ if (shm->pages) {
+ rc = sg_alloc_table_from_pages(&a->table, shm->pages,
+ shm->num_pages, 0,
+ shm->num_pages * PAGE_SIZE,
+ GFP_KERNEL);
+ if (rc)
+ goto err;
+ } else {
+ rc = sg_alloc_table(&a->table, 1, GFP_KERNEL);
+ if (rc)
+ goto err;
+ sg_set_page(a->table.sgl, phys_to_page(shm->paddr), shm->size,
+ 0);
+ }
+
+ a->dev = attachment->dev;
+ attachment->priv = a;
+
+ return 0;
+err:
+ kfree(a);
+ return rc;
+}
+
+static void rstmem_dma_detach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attachment)
+{
+ struct tee_rstmem_attachment *a = attachment->priv;
+
+ sg_free_table(&a->table);
+ kfree(a);
+}
+
+static struct sg_table *
+rstmem_dma_map_dma_buf(struct dma_buf_attachment *attachment,
+ enum dma_data_direction direction)
+{
+ struct tee_rstmem_attachment *a = attachment->priv;
+ int ret;
+
+ ret = dma_map_sgtable(attachment->dev, &a->table, direction,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return &a->table;
+}
+
+static void rstmem_dma_unmap_dma_buf(struct dma_buf_attachment *attachment,
+ struct sg_table *table,
+ enum dma_data_direction direction)
+{
+ struct tee_rstmem_attachment *a = attachment->priv;
+
+ WARN_ON(&a->table != table);
+
+ dma_unmap_sgtable(attachment->dev, table, direction,
+ DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+ enum dma_data_direction direction)
+{
+ return -EPERM;
+}
+
+static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
+ enum dma_data_direction direction)
+{
+ return -EPERM;
+}
+
+static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
+ struct vm_area_struct *vma)
+{
+ return -EPERM;
+}
+
+static void rstmem_dma_buf_free(struct dma_buf *dmabuf)
+{
+ struct tee_shm *shm = dmabuf->priv;
+
+ tee_shm_put(shm);
+}
+
+static const struct dma_buf_ops rstmem_generic_buf_ops = {
+ .attach = rstmem_dma_attach,
+ .detach = rstmem_dma_detach,
+ .map_dma_buf = rstmem_dma_map_dma_buf,
+ .unmap_dma_buf = rstmem_dma_unmap_dma_buf,
+ .begin_cpu_access = rstmem_dma_buf_begin_cpu_access,
+ .end_cpu_access = rstmem_dma_buf_end_cpu_access,
+ .mmap = rstmem_dma_buf_mmap,
+ .release = rstmem_dma_buf_free,
+};
+
+struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
+ u32 use_case, size_t size, int *shm_id)
+{
+ struct tee_device *teedev = ctx->teedev;
+ DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+ struct dma_buf *dmabuf;
+ struct tee_shm *shm;
+ void *ret;
+ int rc;
+
+ if (!tee_device_get(teedev))
+ return ERR_PTR(-EINVAL);
+
+ if (!teedev->desc->ops->rstmem_alloc ||
+ !teedev->desc->ops->rstmem_free) {
+ dmabuf = ERR_PTR(-EINVAL);
+ goto err;
+ }
+
+ shm = kzalloc(sizeof(*shm), GFP_KERNEL);
+ if (!shm) {
+ dmabuf = ERR_PTR(-ENOMEM);
+ goto err;
+ }
+
+ refcount_set(&shm->refcount, 1);
+ shm->flags = TEE_SHM_RESTRICTED;
+ shm->use_case = use_case;
+ shm->ctx = ctx;
+
+ mutex_lock(&teedev->mutex);
+ shm->id = idr_alloc(&teedev->idr, NULL, 1, 0, GFP_KERNEL);
+ mutex_unlock(&teedev->mutex);
+ if (shm->id < 0) {
+ dmabuf = ERR_PTR(shm->id);
+ goto err_kfree;
+ }
+
+ rc = teedev->desc->ops->rstmem_alloc(ctx, shm, flags, use_case, size);
+ if (rc) {
+ dmabuf = ERR_PTR(rc);
+ goto err_idr_remove;
+ }
+
+ mutex_lock(&teedev->mutex);
+ ret = idr_replace(&teedev->idr, shm, shm->id);
+ mutex_unlock(&teedev->mutex);
+ if (IS_ERR(ret)) {
+ dmabuf = ret;
+ goto err_rstmem_free;
+ }
+ teedev_ctx_get(ctx);
+
+ exp_info.ops = &rstmem_generic_buf_ops;
+ exp_info.size = shm->size;
+ exp_info.priv = shm;
+ dmabuf = dma_buf_export(&exp_info);
+ if (IS_ERR(dmabuf)) {
+ tee_shm_put(shm);
+ return dmabuf;
+ }
+
+ *shm_id = shm->id;
+ return dmabuf;
+
+err_rstmem_free:
+ teedev->desc->ops->rstmem_free(ctx, shm);
+err_idr_remove:
+ mutex_lock(&teedev->mutex);
+ idr_remove(&teedev->idr, shm->id);
+ mutex_unlock(&teedev->mutex);
+err_kfree:
+ kfree(shm);
+err:
+ tee_device_put(teedev);
+ return dmabuf;
+}
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index daf6e5cfd59a..416f7f25d885 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -55,6 +55,8 @@ static void tee_shm_release(struct tee_device *teedev, struct tee_shm *shm)
"unregister shm %p failed: %d", shm, rc);
release_registered_pages(shm);
+ } else if (shm->flags & TEE_SHM_RESTRICTED) {
+ teedev->desc->ops->rstmem_free(shm->ctx, shm);
}
teedev_ctx_put(shm->ctx);
diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c
index 80004b55628d..ee57ef157a77 100644
--- a/drivers/tee/tee_shm_pool.c
+++ b/drivers/tee/tee_shm_pool.c
@@ -1,9 +1,8 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * Copyright (c) 2015, 2017, 2022 Linaro Limited
+ * Copyright (c) 2015, 2017, 2022, 2024 Linaro Limited
*/
#include <linux/device.h>
-#include <linux/dma-buf.h>
#include <linux/genalloc.h>
#include <linux/slab.h>
#include <linux/tee_core.h>
@@ -90,3 +89,69 @@ struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr,
return ERR_PTR(rc);
}
EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem);
+
+static int rstmem_pool_op_gen_alloc(struct tee_shm_pool *pool,
+ struct tee_shm *shm, size_t size,
+ size_t align)
+{
+ size_t sz = ALIGN(size, PAGE_SIZE);
+ phys_addr_t pa;
+
+ pa = gen_pool_alloc(pool->private_data, sz);
+ if (!pa)
+ return -ENOMEM;
+
+ shm->size = sz;
+ shm->paddr = pa;
+
+ return 0;
+}
+
+static void rstmem_pool_op_gen_free(struct tee_shm_pool *pool,
+ struct tee_shm *shm)
+{
+ gen_pool_free(pool->private_data, shm->paddr, shm->size);
+ shm->paddr = 0;
+}
+
+static struct tee_shm_pool_ops rstmem_pool_ops_generic = {
+ .alloc = rstmem_pool_op_gen_alloc,
+ .free = rstmem_pool_op_gen_free,
+ .destroy_pool = pool_op_gen_destroy_pool,
+};
+
+struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size)
+{
+ const size_t page_mask = PAGE_SIZE - 1;
+ struct tee_shm_pool *pool;
+ int rc;
+
+ /* Check it's page aligned */
+ if ((paddr | size) & page_mask)
+ return ERR_PTR(-EINVAL);
+
+ pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+ if (!pool)
+ return ERR_PTR(-ENOMEM);
+
+ pool->private_data = gen_pool_create(PAGE_SHIFT, -1);
+ if (!pool->private_data) {
+ rc = -ENOMEM;
+ goto err_free;
+ }
+
+ rc = gen_pool_add(pool->private_data, paddr, size, -1);
+ if (rc)
+ goto err_free_pool;
+
+ pool->ops = &rstmem_pool_ops_generic;
+ return pool;
+
+err_free_pool:
+ gen_pool_destroy(pool->private_data);
+err_free:
+ kfree(pool);
+
+ return ERR_PTR(rc);
+}
+EXPORT_SYMBOL_GPL(tee_rstmem_gen_pool_alloc);
diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
index a38494d6b5f4..608302f494fe 100644
--- a/include/linux/tee_core.h
+++ b/include/linux/tee_core.h
@@ -26,6 +26,7 @@
#define TEE_SHM_USER_MAPPED BIT(1) /* Memory mapped in user space */
#define TEE_SHM_POOL BIT(2) /* Memory allocated from pool */
#define TEE_SHM_PRIV BIT(3) /* Memory private to TEE driver */
+#define TEE_SHM_RESTRICTED BIT(4) /* Restricted memory */
#define TEE_DEVICE_FLAG_REGISTERED 0x1
#define TEE_MAX_DEV_NAME_LEN 32
@@ -76,6 +77,8 @@ struct tee_device {
* @supp_send: called for supplicant to send a response
* @shm_register: register shared memory buffer in TEE
* @shm_unregister: unregister shared memory buffer in TEE
+ * @rstmem_alloc: allocate restricted memory
+ * @rstmem_free: free restricted memory
*/
struct tee_driver_ops {
void (*get_version)(struct tee_device *teedev,
@@ -99,6 +102,9 @@ struct tee_driver_ops {
struct page **pages, size_t num_pages,
unsigned long start);
int (*shm_unregister)(struct tee_context *ctx, struct tee_shm *shm);
+ int (*rstmem_alloc)(struct tee_context *ctx, struct tee_shm *shm,
+ u32 flags, u32 use_case, size_t size);
+ void (*rstmem_free)(struct tee_context *ctx, struct tee_shm *shm);
};
/**
@@ -229,6 +235,15 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool)
pool->ops->destroy_pool(pool);
}
+/**
+ * tee_rstmem_gen_pool_alloc() - Create a restricted memory manager
+ * @paddr: Physical address of start of pool
+ * @size: Size in bytes of the pool
+ *
+ * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure.
+ */
+struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size);
+
/**
* tee_get_drvdata() - Return driver_data pointer
* @returns the driver_data pointer supplied to tee_register().
diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
index a54c203000ed..cba067715d14 100644
--- a/include/linux/tee_drv.h
+++ b/include/linux/tee_drv.h
@@ -55,6 +55,7 @@ struct tee_context {
* @pages: locked pages from userspace
* @num_pages: number of locked pages
* @refcount: reference counter
+ * @use_case: defined by TEE_IOC_UC_* in tee.h
* @flags: defined by TEE_SHM_* in tee_core.h
* @id: unique id of a shared memory object on this device, shared
* with user space
@@ -71,6 +72,7 @@ struct tee_shm {
struct page **pages;
size_t num_pages;
refcount_t refcount;
+ u32 use_case;
u32 flags;
int id;
u64 sec_world_id;
diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
index d0430bee8292..88834448debb 100644
--- a/include/uapi/linux/tee.h
+++ b/include/uapi/linux/tee.h
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2015-2016, Linaro Limited
+ * Copyright (c) 2015-2017, 2020, 2024, Linaro Limited
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -48,6 +48,7 @@
#define TEE_GEN_CAP_PRIVILEGED (1 << 1)/* Privileged device (for supplicant) */
#define TEE_GEN_CAP_REG_MEM (1 << 2)/* Supports registering shared memory */
#define TEE_GEN_CAP_MEMREF_NULL (1 << 3)/* NULL MemRef support */
+#define TEE_GEN_CAP_RSTMEM (1 << 4)/* Supports restricted memory */
#define TEE_MEMREF_NULL (__u64)(-1) /* NULL MemRef Buffer */
@@ -389,6 +390,47 @@ struct tee_ioctl_shm_register_data {
*/
#define TEE_IOC_SHM_REGISTER _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 9, \
struct tee_ioctl_shm_register_data)
+
+#define TEE_IOC_UC_RESERVED 0
+#define TEE_IOC_UC_SECURE_VIDEO_PLAY 1
+#define TEE_IOC_UC_TRUSTED_UI 2
+#define TEE_IOC_UC_SECURE_VIDEO_RECORD 3
+
+/**
+ * struct tee_ioctl_rstmem_alloc_data - Restricted memory allocate argument
+ * @size: [in/out] Size of restricted memory to allocate
+ * @flags: [in/out] Flags to/from allocate
+ * @use_case: [in] Restricted memory use case, TEE_IOC_UC_*
+ * @id: [out] Identifier of the restricted memory
+ */
+struct tee_ioctl_rstmem_alloc_data {
+ __u64 size;
+ __u32 flags;
+ __u32 use_case;
+ __s32 id;
+};
+
+/**
+ * TEE_IOC_RSTMEM_ALLOC - allocate restricted memory
+ *
+ * Allocates restricted physical memory normally not accessible by the
+ * kernel.
+ *
+ * Restricted memory refers to memory buffers behind a hardware-enforced
+ * firewall. It is not accessible to the kernel under normal circumstances,
+ * but rather only accessible to certain hardware IPs or CPUs executing in
+ * a higher privileged mode than the kernel itself. This interface allows
+ * allocating and managing such restricted memory buffers via interaction
+ * with a TEE implementation.
+ *
+ * Returns a file descriptor on success or < 0 on failure
+ *
+ * The returned file descriptor is a dma-buf that can be attached and
+ * mapped for devices with permission to access the physical memory.
+ */
+#define TEE_IOC_RSTMEM_ALLOC _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 10, \
+ struct tee_ioctl_rstmem_alloc_data)
+
/*
* Five syscalls are used when communicating with the TEE driver.
* open(): opens the device associated with the driver
--
2.43.0
* [PATCH v4 2/6] optee: account for direction while converting parameters
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 1/6] tee: add restricted memory allocation Jens Wiklander
@ 2024-12-17 10:07 ` Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 3/6] optee: sync secure world ABI headers Jens Wiklander
From: Jens Wiklander @ 2024-12-17 10:07 UTC
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
The OP-TEE backend driver has two internal function pointers to convert
between the subsystem type struct tee_param and the OP-TEE type struct
optee_msg_param.
The conversion is done from one of the types to the other, which is then
involved in some operation, and finally converted back to the original
type. When converting to prepare the parameters for the operation, all
fields must be taken into account, but when converting back, it's enough
to update only out-values and out-sizes. So an update_out parameter is
added to the conversion functions to tell whether all fields or only some
must be copied.
This is needed in a later patch where it might get confusing when
converting back in the from_msg_param() callback, since an allocated
restricted SHM can be using the sec_world_id of the restricted memory
pool it came from, and that doesn't translate back well.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/optee/call.c | 10 ++--
drivers/tee/optee/ffa_abi.c | 43 +++++++++++++----
drivers/tee/optee/optee_private.h | 42 +++++++++++------
drivers/tee/optee/rpc.c | 31 +++++++++----
drivers/tee/optee/smc_abi.c | 76 +++++++++++++++++++++++--------
5 files changed, 144 insertions(+), 58 deletions(-)
diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
index 16eb953e14bb..f1533b894726 100644
--- a/drivers/tee/optee/call.c
+++ b/drivers/tee/optee/call.c
@@ -400,7 +400,8 @@ int optee_open_session(struct tee_context *ctx,
export_uuid(msg_arg->params[1].u.octets, &client_uuid);
rc = optee->ops->to_msg_param(optee, msg_arg->params + 2,
- arg->num_params, param);
+ arg->num_params, param,
+ false /*!update_out*/);
if (rc)
goto out;
@@ -427,7 +428,8 @@ int optee_open_session(struct tee_context *ctx,
}
if (optee->ops->from_msg_param(optee, param, arg->num_params,
- msg_arg->params + 2)) {
+ msg_arg->params + 2,
+ true /*update_out*/)) {
arg->ret = TEEC_ERROR_COMMUNICATION;
arg->ret_origin = TEEC_ORIGIN_COMMS;
/* Close session again to avoid leakage */
@@ -541,7 +543,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
msg_arg->cancel_id = arg->cancel_id;
rc = optee->ops->to_msg_param(optee, msg_arg->params, arg->num_params,
- param);
+ param, false /*!update_out*/);
if (rc)
goto out;
@@ -551,7 +553,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
}
if (optee->ops->from_msg_param(optee, param, arg->num_params,
- msg_arg->params)) {
+ msg_arg->params, true /*update_out*/)) {
msg_arg->ret = TEEC_ERROR_COMMUNICATION;
msg_arg->ret_origin = TEEC_ORIGIN_COMMS;
}
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index f3af5666bb11..02e6175ac5f0 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -122,15 +122,21 @@ static int optee_shm_rem_ffa_handle(struct optee *optee, u64 global_id)
*/
static void from_msg_param_ffa_mem(struct optee *optee, struct tee_param *p,
- u32 attr, const struct optee_msg_param *mp)
+ u32 attr, const struct optee_msg_param *mp,
+ bool update_out)
{
struct tee_shm *shm = NULL;
u64 offs_high = 0;
u64 offs_low = 0;
+ if (update_out) {
+ if (attr == OPTEE_MSG_ATTR_TYPE_FMEM_INPUT)
+ return;
+ goto out;
+ }
+
p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT +
attr - OPTEE_MSG_ATTR_TYPE_FMEM_INPUT;
- p->u.memref.size = mp->u.fmem.size;
if (mp->u.fmem.global_id != OPTEE_MSG_FMEM_INVALID_GLOBAL_ID)
shm = optee_shm_from_ffa_handle(optee, mp->u.fmem.global_id);
@@ -141,6 +147,8 @@ static void from_msg_param_ffa_mem(struct optee *optee, struct tee_param *p,
offs_high = mp->u.fmem.offs_high;
}
p->u.memref.shm_offs = offs_low | offs_high << 32;
+out:
+ p->u.memref.size = mp->u.fmem.size;
}
/**
@@ -150,12 +158,14 @@ static void from_msg_param_ffa_mem(struct optee *optee, struct tee_param *p,
* @params: subsystem internal parameter representation
* @num_params: number of elements in the parameter arrays
* @msg_params: OPTEE_MSG parameters
+ * @update_out: update parameter for output only
*
* Returns 0 on success or <0 on failure
*/
static int optee_ffa_from_msg_param(struct optee *optee,
struct tee_param *params, size_t num_params,
- const struct optee_msg_param *msg_params)
+ const struct optee_msg_param *msg_params,
+ bool update_out)
{
size_t n;
@@ -166,18 +176,20 @@ static int optee_ffa_from_msg_param(struct optee *optee,
switch (attr) {
case OPTEE_MSG_ATTR_TYPE_NONE:
+ if (update_out)
+ break;
p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE;
memset(&p->u, 0, sizeof(p->u));
break;
case OPTEE_MSG_ATTR_TYPE_VALUE_INPUT:
case OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT:
case OPTEE_MSG_ATTR_TYPE_VALUE_INOUT:
- optee_from_msg_param_value(p, attr, mp);
+ optee_from_msg_param_value(p, attr, mp, update_out);
break;
case OPTEE_MSG_ATTR_TYPE_FMEM_INPUT:
case OPTEE_MSG_ATTR_TYPE_FMEM_OUTPUT:
case OPTEE_MSG_ATTR_TYPE_FMEM_INOUT:
- from_msg_param_ffa_mem(optee, p, attr, mp);
+ from_msg_param_ffa_mem(optee, p, attr, mp, update_out);
break;
default:
return -EINVAL;
@@ -188,10 +200,16 @@ static int optee_ffa_from_msg_param(struct optee *optee,
}
static int to_msg_param_ffa_mem(struct optee_msg_param *mp,
- const struct tee_param *p)
+ const struct tee_param *p, bool update_out)
{
struct tee_shm *shm = p->u.memref.shm;
+ if (update_out) {
+ if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT)
+ return 0;
+ goto out;
+ }
+
mp->attr = OPTEE_MSG_ATTR_TYPE_FMEM_INPUT + p->attr -
TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT;
@@ -211,6 +229,7 @@ static int to_msg_param_ffa_mem(struct optee_msg_param *mp,
memset(&mp->u, 0, sizeof(mp->u));
mp->u.fmem.global_id = OPTEE_MSG_FMEM_INVALID_GLOBAL_ID;
}
+out:
mp->u.fmem.size = p->u.memref.size;
return 0;
@@ -222,13 +241,15 @@ static int to_msg_param_ffa_mem(struct optee_msg_param *mp,
* @optee: main service struct
* @msg_params: OPTEE_MSG parameters
* @num_params: number of elements in the parameter arrays
- * @params: subsystem itnernal parameter representation
+ * @params: subsystem internal parameter representation
+ * @update_out: update parameter for output only
* Returns 0 on success or <0 on failure
*/
static int optee_ffa_to_msg_param(struct optee *optee,
struct optee_msg_param *msg_params,
size_t num_params,
- const struct tee_param *params)
+ const struct tee_param *params,
+ bool update_out)
{
size_t n;
@@ -238,18 +259,20 @@ static int optee_ffa_to_msg_param(struct optee *optee,
switch (p->attr) {
case TEE_IOCTL_PARAM_ATTR_TYPE_NONE:
+ if (update_out)
+ break;
mp->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE;
memset(&mp->u, 0, sizeof(mp->u));
break;
case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT:
- optee_to_msg_param_value(mp, p);
+ optee_to_msg_param_value(mp, p, update_out);
break;
case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
- if (to_msg_param_ffa_mem(mp, p))
+ if (to_msg_param_ffa_mem(mp, p, update_out))
return -EINVAL;
break;
default:
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index dc0f355ef72a..20eda508dbac 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -185,10 +185,12 @@ struct optee_ops {
bool system_thread);
int (*to_msg_param)(struct optee *optee,
struct optee_msg_param *msg_params,
- size_t num_params, const struct tee_param *params);
+ size_t num_params, const struct tee_param *params,
+ bool update_out);
int (*from_msg_param)(struct optee *optee, struct tee_param *params,
size_t num_params,
- const struct optee_msg_param *msg_params);
+ const struct optee_msg_param *msg_params,
+ bool update_out);
};
/**
@@ -316,23 +318,35 @@ void optee_release(struct tee_context *ctx);
void optee_release_supp(struct tee_context *ctx);
static inline void optee_from_msg_param_value(struct tee_param *p, u32 attr,
- const struct optee_msg_param *mp)
+ const struct optee_msg_param *mp,
+ bool update_out)
{
- p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT +
- attr - OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
- p->u.value.a = mp->u.value.a;
- p->u.value.b = mp->u.value.b;
- p->u.value.c = mp->u.value.c;
+ if (!update_out)
+ p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT +
+ attr - OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
+
+ if (attr == OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT ||
+ attr == OPTEE_MSG_ATTR_TYPE_VALUE_INOUT || !update_out) {
+ p->u.value.a = mp->u.value.a;
+ p->u.value.b = mp->u.value.b;
+ p->u.value.c = mp->u.value.c;
+ }
}
static inline void optee_to_msg_param_value(struct optee_msg_param *mp,
- const struct tee_param *p)
+ const struct tee_param *p,
+ bool update_out)
{
- mp->attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT + p->attr -
- TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT;
- mp->u.value.a = p->u.value.a;
- mp->u.value.b = p->u.value.b;
- mp->u.value.c = p->u.value.c;
+ if (!update_out)
+ mp->attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT + p->attr -
+ TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT;
+
+ if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT ||
+ p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT || !update_out) {
+ mp->u.value.a = p->u.value.a;
+ mp->u.value.b = p->u.value.b;
+ mp->u.value.c = p->u.value.c;
+ }
}
void optee_cq_init(struct optee_call_queue *cq, int thread_count);
diff --git a/drivers/tee/optee/rpc.c b/drivers/tee/optee/rpc.c
index ebbbd42b0e3e..580e6b9b0606 100644
--- a/drivers/tee/optee/rpc.c
+++ b/drivers/tee/optee/rpc.c
@@ -63,7 +63,7 @@ static void handle_rpc_func_cmd_i2c_transfer(struct tee_context *ctx,
}
if (optee->ops->from_msg_param(optee, params, arg->num_params,
- arg->params))
+ arg->params, false /*!update_out*/))
goto bad;
for (i = 0; i < arg->num_params; i++) {
@@ -107,7 +107,8 @@ static void handle_rpc_func_cmd_i2c_transfer(struct tee_context *ctx,
} else {
params[3].u.value.a = msg.len;
if (optee->ops->to_msg_param(optee, arg->params,
- arg->num_params, params))
+ arg->num_params, params,
+ true /*update_out*/))
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
else
arg->ret = TEEC_SUCCESS;
@@ -188,6 +189,7 @@ static void handle_rpc_func_cmd_wait(struct optee_msg_arg *arg)
static void handle_rpc_supp_cmd(struct tee_context *ctx, struct optee *optee,
struct optee_msg_arg *arg)
{
+ bool update_out = false;
struct tee_param *params;
arg->ret_origin = TEEC_ORIGIN_COMMS;
@@ -200,15 +202,21 @@ static void handle_rpc_supp_cmd(struct tee_context *ctx, struct optee *optee,
}
if (optee->ops->from_msg_param(optee, params, arg->num_params,
- arg->params)) {
+ arg->params, update_out)) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
goto out;
}
arg->ret = optee_supp_thrd_req(ctx, arg->cmd, arg->num_params, params);
+ /*
+ * Special treatment for OPTEE_RPC_CMD_SHM_ALLOC since input is a
+ * value type, but the output is a memref type.
+ */
+ if (arg->cmd != OPTEE_RPC_CMD_SHM_ALLOC)
+ update_out = true;
if (optee->ops->to_msg_param(optee, arg->params, arg->num_params,
- params))
+ params, update_out))
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
out:
kfree(params);
@@ -270,7 +278,7 @@ static void handle_rpc_func_rpmb_probe_reset(struct tee_context *ctx,
if (arg->num_params != ARRAY_SIZE(params) ||
optee->ops->from_msg_param(optee, params, arg->num_params,
- arg->params) ||
+ arg->params, false /*!update_out*/) ||
params[0].attr != TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
return;
@@ -280,7 +288,8 @@ static void handle_rpc_func_rpmb_probe_reset(struct tee_context *ctx,
params[0].u.value.b = 0;
params[0].u.value.c = 0;
if (optee->ops->to_msg_param(optee, arg->params,
- arg->num_params, params)) {
+ arg->num_params, params,
+ true /*update_out*/)) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
return;
}
@@ -324,7 +333,7 @@ static void handle_rpc_func_rpmb_probe_next(struct tee_context *ctx,
if (arg->num_params != ARRAY_SIZE(params) ||
optee->ops->from_msg_param(optee, params, arg->num_params,
- arg->params) ||
+ arg->params, false /*!update_out*/) ||
params[0].attr != TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT ||
params[1].attr != TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
@@ -358,7 +367,8 @@ static void handle_rpc_func_rpmb_probe_next(struct tee_context *ctx,
params[0].u.value.b = rdev->descr.capacity;
params[0].u.value.c = rdev->descr.reliable_wr_count;
if (optee->ops->to_msg_param(optee, arg->params,
- arg->num_params, params)) {
+ arg->num_params, params,
+ true /*update_out*/)) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
return;
}
@@ -384,7 +394,7 @@ static void handle_rpc_func_rpmb_frames(struct tee_context *ctx,
if (arg->num_params != ARRAY_SIZE(params) ||
optee->ops->from_msg_param(optee, params, arg->num_params,
- arg->params) ||
+ arg->params, false /*!update_out*/) ||
params[0].attr != TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT ||
params[1].attr != TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
@@ -401,7 +411,8 @@ static void handle_rpc_func_rpmb_frames(struct tee_context *ctx,
goto out;
}
if (optee->ops->to_msg_param(optee, arg->params,
- arg->num_params, params)) {
+ arg->num_params, params,
+ true /*update_out*/)) {
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
goto out;
}
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index 3e33cf2af73b..788919a473d6 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -81,20 +81,26 @@ static int optee_cpuhp_disable_pcpu_irq(unsigned int cpu)
*/
static int from_msg_param_tmp_mem(struct tee_param *p, u32 attr,
- const struct optee_msg_param *mp)
+ const struct optee_msg_param *mp,
+ bool update_out)
{
struct tee_shm *shm;
phys_addr_t pa;
int rc;
+ if (update_out) {
+ if (attr == OPTEE_MSG_ATTR_TYPE_TMEM_INPUT)
+ return 0;
+ goto out;
+ }
+
p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT +
attr - OPTEE_MSG_ATTR_TYPE_TMEM_INPUT;
- p->u.memref.size = mp->u.tmem.size;
shm = (struct tee_shm *)(unsigned long)mp->u.tmem.shm_ref;
if (!shm) {
p->u.memref.shm_offs = 0;
p->u.memref.shm = NULL;
- return 0;
+ goto out;
}
rc = tee_shm_get_pa(shm, 0, &pa);
@@ -103,18 +109,25 @@ static int from_msg_param_tmp_mem(struct tee_param *p, u32 attr,
p->u.memref.shm_offs = mp->u.tmem.buf_ptr - pa;
p->u.memref.shm = shm;
-
+out:
+ p->u.memref.size = mp->u.tmem.size;
return 0;
}
static void from_msg_param_reg_mem(struct tee_param *p, u32 attr,
- const struct optee_msg_param *mp)
+ const struct optee_msg_param *mp,
+ bool update_out)
{
struct tee_shm *shm;
+ if (update_out) {
+ if (attr == OPTEE_MSG_ATTR_TYPE_RMEM_INPUT)
+ return;
+ goto out;
+ }
+
p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT +
attr - OPTEE_MSG_ATTR_TYPE_RMEM_INPUT;
- p->u.memref.size = mp->u.rmem.size;
shm = (struct tee_shm *)(unsigned long)mp->u.rmem.shm_ref;
if (shm) {
@@ -124,6 +137,8 @@ static void from_msg_param_reg_mem(struct tee_param *p, u32 attr,
p->u.memref.shm_offs = 0;
p->u.memref.shm = NULL;
}
+out:
+ p->u.memref.size = mp->u.rmem.size;
}
/**
@@ -133,11 +148,13 @@ static void from_msg_param_reg_mem(struct tee_param *p, u32 attr,
* @params: subsystem internal parameter representation
* @num_params: number of elements in the parameter arrays
* @msg_params: OPTEE_MSG parameters
+ * @update_out: only update output parameters
* Returns 0 on success or <0 on failure
*/
static int optee_from_msg_param(struct optee *optee, struct tee_param *params,
size_t num_params,
- const struct optee_msg_param *msg_params)
+ const struct optee_msg_param *msg_params,
+ bool update_out)
{
int rc;
size_t n;
@@ -149,25 +166,27 @@ static int optee_from_msg_param(struct optee *optee, struct tee_param *params,
switch (attr) {
case OPTEE_MSG_ATTR_TYPE_NONE:
+ if (update_out)
+ break;
p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE;
memset(&p->u, 0, sizeof(p->u));
break;
case OPTEE_MSG_ATTR_TYPE_VALUE_INPUT:
case OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT:
case OPTEE_MSG_ATTR_TYPE_VALUE_INOUT:
- optee_from_msg_param_value(p, attr, mp);
+ optee_from_msg_param_value(p, attr, mp, update_out);
break;
case OPTEE_MSG_ATTR_TYPE_TMEM_INPUT:
case OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT:
case OPTEE_MSG_ATTR_TYPE_TMEM_INOUT:
- rc = from_msg_param_tmp_mem(p, attr, mp);
+ rc = from_msg_param_tmp_mem(p, attr, mp, update_out);
if (rc)
return rc;
break;
case OPTEE_MSG_ATTR_TYPE_RMEM_INPUT:
case OPTEE_MSG_ATTR_TYPE_RMEM_OUTPUT:
case OPTEE_MSG_ATTR_TYPE_RMEM_INOUT:
- from_msg_param_reg_mem(p, attr, mp);
+ from_msg_param_reg_mem(p, attr, mp, update_out);
break;
default:
@@ -178,20 +197,25 @@ static int optee_from_msg_param(struct optee *optee, struct tee_param *params,
}
static int to_msg_param_tmp_mem(struct optee_msg_param *mp,
- const struct tee_param *p)
+ const struct tee_param *p, bool update_out)
{
int rc;
phys_addr_t pa;
+ if (update_out) {
+ if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT)
+ return 0;
+ goto out;
+ }
+
mp->attr = OPTEE_MSG_ATTR_TYPE_TMEM_INPUT + p->attr -
TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT;
mp->u.tmem.shm_ref = (unsigned long)p->u.memref.shm;
- mp->u.tmem.size = p->u.memref.size;
if (!p->u.memref.shm) {
mp->u.tmem.buf_ptr = 0;
- return 0;
+ goto out;
}
rc = tee_shm_get_pa(p->u.memref.shm, p->u.memref.shm_offs, &pa);
@@ -201,19 +225,27 @@ static int to_msg_param_tmp_mem(struct optee_msg_param *mp,
mp->u.tmem.buf_ptr = pa;
mp->attr |= OPTEE_MSG_ATTR_CACHE_PREDEFINED <<
OPTEE_MSG_ATTR_CACHE_SHIFT;
-
+out:
+ mp->u.tmem.size = p->u.memref.size;
return 0;
}
static int to_msg_param_reg_mem(struct optee_msg_param *mp,
- const struct tee_param *p)
+ const struct tee_param *p, bool update_out)
{
+ if (update_out) {
+ if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT)
+ return 0;
+ goto out;
+ }
+
mp->attr = OPTEE_MSG_ATTR_TYPE_RMEM_INPUT + p->attr -
TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT;
mp->u.rmem.shm_ref = (unsigned long)p->u.memref.shm;
- mp->u.rmem.size = p->u.memref.size;
mp->u.rmem.offs = p->u.memref.shm_offs;
+out:
+ mp->u.rmem.size = p->u.memref.size;
return 0;
}
@@ -223,11 +255,13 @@ static int to_msg_param_reg_mem(struct optee_msg_param *mp,
* @msg_params: OPTEE_MSG parameters
* @num_params: number of elements in the parameter arrays
 * @params: subsystem internal parameter representation
+ * @update_out: only update output parameters
* Returns 0 on success or <0 on failure
*/
static int optee_to_msg_param(struct optee *optee,
struct optee_msg_param *msg_params,
- size_t num_params, const struct tee_param *params)
+ size_t num_params, const struct tee_param *params,
+ bool update_out)
{
int rc;
size_t n;
@@ -238,21 +272,23 @@ static int optee_to_msg_param(struct optee *optee,
switch (p->attr) {
case TEE_IOCTL_PARAM_ATTR_TYPE_NONE:
+ if (update_out)
+ break;
mp->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE;
memset(&mp->u, 0, sizeof(mp->u));
break;
case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT:
- optee_to_msg_param_value(mp, p);
+ optee_to_msg_param_value(mp, p, update_out);
break;
case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
if (tee_shm_is_dynamic(p->u.memref.shm))
- rc = to_msg_param_reg_mem(mp, p);
+ rc = to_msg_param_reg_mem(mp, p, update_out);
else
- rc = to_msg_param_tmp_mem(mp, p);
+ rc = to_msg_param_tmp_mem(mp, p, update_out);
if (rc)
return rc;
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 3/6] optee: sync secure world ABI headers
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 1/6] tee: add restricted memory allocation Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 2/6] optee: account for direction while converting parameters Jens Wiklander
@ 2024-12-17 10:07 ` Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 4/6] optee: support restricted memory allocation Jens Wiklander
` (4 subsequent siblings)
7 siblings, 0 replies; 40+ messages in thread
From: Jens Wiklander @ 2024-12-17 10:07 UTC (permalink / raw)
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
Update the header files describing the secure world ABI, both with and
without FF-A. The ABI is extended to deal with restricted memory but, as
usual, remains backward compatible.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/optee/optee_ffa.h | 27 ++++++++++---
drivers/tee/optee/optee_msg.h | 65 ++++++++++++++++++++++++++++++--
drivers/tee/optee/optee_smc.h | 71 ++++++++++++++++++++++++++++++++++-
3 files changed, 154 insertions(+), 9 deletions(-)
diff --git a/drivers/tee/optee/optee_ffa.h b/drivers/tee/optee/optee_ffa.h
index 257735ae5b56..7bd037200343 100644
--- a/drivers/tee/optee/optee_ffa.h
+++ b/drivers/tee/optee/optee_ffa.h
@@ -81,7 +81,7 @@
* as the second MSG arg struct for
* OPTEE_FFA_YIELDING_CALL_WITH_ARG.
* Bit[31:8]: Reserved (MBZ)
- * w5: Bitfield of secure world capabilities OPTEE_FFA_SEC_CAP_* below,
+ * w5: Bitfield of OP-TEE capabilities OPTEE_FFA_SEC_CAP_*
* w6: The maximum secure world notification number
* w7: Not used (MBZ)
*/
@@ -94,6 +94,8 @@
#define OPTEE_FFA_SEC_CAP_ASYNC_NOTIF BIT(1)
/* OP-TEE supports probing for RPMB device if needed */
#define OPTEE_FFA_SEC_CAP_RPMB_PROBE BIT(2)
+/* OP-TEE supports Restricted Memory for secure data path */
+#define OPTEE_FFA_SEC_CAP_RSTMEM BIT(3)
#define OPTEE_FFA_EXCHANGE_CAPABILITIES OPTEE_FFA_BLOCKING_CALL(2)
@@ -108,7 +110,7 @@
*
* Return register usage:
* w3: Error code, 0 on success
- * w4-w7: Note used (MBZ)
+ * w4-w7: Not used (MBZ)
*/
#define OPTEE_FFA_UNREGISTER_SHM OPTEE_FFA_BLOCKING_CALL(3)
@@ -119,16 +121,31 @@
* Call register usage:
* w3: Service ID, OPTEE_FFA_ENABLE_ASYNC_NOTIF
* w4: Notification value to request bottom half processing, should be
- * less than OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE.
+ * less than OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE
* w5-w7: Not used (MBZ)
*
* Return register usage:
* w3: Error code, 0 on success
- * w4-w7: Note used (MBZ)
+ * w4-w7: Not used (MBZ)
*/
#define OPTEE_FFA_ENABLE_ASYNC_NOTIF OPTEE_FFA_BLOCKING_CALL(5)
-#define OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE 64
+#define OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE 64
+
+/*
+ * Release restricted memory
+ *
+ * Call register usage:
+ * w3: Service ID, OPTEE_FFA_RECLAIM_RSTMEM
+ * w4: Shared memory handle, lower bits
+ * w5: Shared memory handle, higher bits
+ * w6-w7: Not used (MBZ)
+ *
+ * Return register usage:
+ * w3: Error code, 0 on success
+ * w4-w7: Not used (MBZ)
+ */
+#define OPTEE_FFA_RELEASE_RSTMEM OPTEE_FFA_BLOCKING_CALL(8)
/*
* Call with struct optee_msg_arg as argument in the supplied shared memory
diff --git a/drivers/tee/optee/optee_msg.h b/drivers/tee/optee/optee_msg.h
index e8840a82b983..1b558526e7d9 100644
--- a/drivers/tee/optee/optee_msg.h
+++ b/drivers/tee/optee/optee_msg.h
@@ -133,13 +133,13 @@ struct optee_msg_param_rmem {
};
/**
- * struct optee_msg_param_fmem - ffa memory reference parameter
+ * struct optee_msg_param_fmem - FF-A memory reference parameter
* @offs_lower: Lower bits of offset into shared memory reference
* @offs_upper: Upper bits of offset into shared memory reference
* @internal_offs: Internal offset into the first page of shared memory
* reference
* @size: Size of the buffer
- * @global_id: Global identifier of Shared memory
+ * @global_id: Global identifier of the shared memory
*/
struct optee_msg_param_fmem {
u32 offs_low;
@@ -165,7 +165,7 @@ struct optee_msg_param_value {
* @attr: attributes
* @tmem: parameter by temporary memory reference
* @rmem: parameter by registered memory reference
- * @fmem: parameter by ffa registered memory reference
+ * @fmem: parameter by FF-A registered memory reference
* @value: parameter by opaque value
* @octets: parameter by octet string
*
@@ -296,6 +296,18 @@ struct optee_msg_arg {
*/
#define OPTEE_MSG_FUNCID_GET_OS_REVISION 0x0001
+/*
+ * Values used in OPTEE_MSG_CMD_LEND_RSTMEM below
+ * OPTEE_MSG_RSTMEM_RESERVED Reserved
+ * OPTEE_MSG_RSTMEM_SECURE_VIDEO_PLAY Secure Video Playback
+ * OPTEE_MSG_RSTMEM_TRUSTED_UI Trusted UI
+ * OPTEE_MSG_RSTMEM_SECURE_VIDEO_RECORD Secure Video Recording
+ */
+#define OPTEE_MSG_RSTMEM_RESERVED 0
+#define OPTEE_MSG_RSTMEM_SECURE_VIDEO_PLAY 1
+#define OPTEE_MSG_RSTMEM_TRUSTED_UI 2
+#define OPTEE_MSG_RSTMEM_SECURE_VIDEO_RECORD 3
+
/*
* Do a secure call with struct optee_msg_arg as argument
* The OPTEE_MSG_CMD_* below defines what goes in struct optee_msg_arg::cmd
@@ -337,6 +349,49 @@ struct optee_msg_arg {
* OPTEE_MSG_CMD_STOP_ASYNC_NOTIF informs secure world that from now is
* normal world unable to process asynchronous notifications. Typically
* used when the driver is shut down.
+ *
+ * OPTEE_MSG_CMD_LEND_RSTMEM lends restricted memory. The passed normal
+ * physical memory is restricted from normal world access. The memory
+ * should be unmapped prior to this call since it becomes inaccessible
+ * during the request.
+ * Parameters are passed as:
+ * [in] param[0].attr OPTEE_MSG_ATTR_TYPE_VALUE_INPUT
+ * [in] param[0].u.value.a OPTEE_MSG_RSTMEM_* defined above
+ * [in] param[1].attr OPTEE_MSG_ATTR_TYPE_TMEM_INPUT
+ * [in] param[1].u.tmem.buf_ptr physical address
+ * [in] param[1].u.tmem.size size
+ * [in] param[1].u.tmem.shm_ref holds restricted memory reference
+ *
+ * OPTEE_MSG_CMD_RECLAIM_RSTMEM reclaims a previously lent restricted
+ * memory reference. The physical memory is accessible by the normal world
+ * after this function has returned and can be mapped again. The information
+ * is passed as:
+ * [in] param[0].attr OPTEE_MSG_ATTR_TYPE_VALUE_INPUT
+ * [in] param[0].u.value.a holds restricted memory cookie
+ *
+ * OPTEE_MSG_CMD_GET_RSTMEM_CONFIG gets the configuration for a specific
+ * restricted memory use case. Parameters are passed as:
+ * [in] param[0].attr OPTEE_MSG_ATTR_TYPE_VALUE_INOUT
+ * [in] param[0].value.a OPTEE_MSG_RSTMEM_*
+ * [in] param[1].attr OPTEE_MSG_ATTR_TYPE_{R,F}MEM_OUTPUT
+ * [in] param[1].u.{r,f}mem Buffer or NULL
+ * [in] param[1].u.{r,f}mem.size Provided size of buffer or 0 for query
+ * Output for the restricted use case:
+ * [out] param[0].value.a Minimal size of SDP memory
+ * [out] param[0].value.b Required alignment of size and start of
+ * restricted memory
+ * [out] param[1].{r,f}mem.size Size of output data
+ * [out] param[1].{r,f}mem If non-NULL, contains an array of
+ * uint16_t holding endpoints that
+ * must be included when lending
+ * memory for this use case
+ *
+ * OPTEE_MSG_CMD_ASSIGN_RSTMEM assigns a use-case to restricted memory
+ * previously lent using the FFA_LEND framework ABI. Parameters are passed
+ * as:
+ * [in] param[0].attr OPTEE_MSG_ATTR_TYPE_VALUE_INPUT
+ * [in] param[0].u.value.a holds restricted memory cookie
+ * [in] param[0].u.value.b OPTEE_MSG_RSTMEM_* defined above
*/
#define OPTEE_MSG_CMD_OPEN_SESSION 0
#define OPTEE_MSG_CMD_INVOKE_COMMAND 1
@@ -346,6 +401,10 @@ struct optee_msg_arg {
#define OPTEE_MSG_CMD_UNREGISTER_SHM 5
#define OPTEE_MSG_CMD_DO_BOTTOM_HALF 6
#define OPTEE_MSG_CMD_STOP_ASYNC_NOTIF 7
+#define OPTEE_MSG_CMD_LEND_RSTMEM 8
+#define OPTEE_MSG_CMD_RECLAIM_RSTMEM 9
+#define OPTEE_MSG_CMD_GET_RSTMEM_CONFIG 10
+#define OPTEE_MSG_CMD_ASSIGN_RSTMEM 11
#define OPTEE_MSG_FUNCID_CALL_WITH_ARG 0x0004
#endif /* _OPTEE_MSG_H */
diff --git a/drivers/tee/optee/optee_smc.h b/drivers/tee/optee/optee_smc.h
index 879426300821..abc379ce190c 100644
--- a/drivers/tee/optee/optee_smc.h
+++ b/drivers/tee/optee/optee_smc.h
@@ -264,7 +264,6 @@ struct optee_smc_get_shm_config_result {
#define OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM BIT(0)
/* Secure world can communicate via previously unregistered shared memory */
#define OPTEE_SMC_SEC_CAP_UNREGISTERED_SHM BIT(1)
-
/*
* Secure world supports commands "register/unregister shared memory",
* secure world accepts command buffers located in any parts of non-secure RAM
@@ -280,6 +279,10 @@ struct optee_smc_get_shm_config_result {
#define OPTEE_SMC_SEC_CAP_RPC_ARG BIT(6)
/* Secure world supports probing for RPMB device if needed */
#define OPTEE_SMC_SEC_CAP_RPMB_PROBE BIT(7)
+/* Secure world supports Secure Data Path */
+#define OPTEE_SMC_SEC_CAP_SDP BIT(8)
+/* Secure world supports dynamic restricted memory */
+#define OPTEE_SMC_SEC_CAP_DYNAMIC_RSTMEM BIT(9)
#define OPTEE_SMC_FUNCID_EXCHANGE_CAPABILITIES 9
#define OPTEE_SMC_EXCHANGE_CAPABILITIES \
@@ -451,6 +454,72 @@ struct optee_smc_disable_shm_cache_result {
/* See OPTEE_SMC_CALL_WITH_REGD_ARG above */
#define OPTEE_SMC_FUNCID_CALL_WITH_REGD_ARG 19
+/*
+ * Get Secure Data Path memory config
+ *
+ * Returns the Secure Data Path memory config.
+ *
+ * Call register usage:
+ * a0 SMC Function ID, OPTEE_SMC_GET_SDP_CONFIG
+ * a2-6 Not used, must be zero
+ * a7 Hypervisor Client ID register
+ *
+ * Have config return register usage:
+ * a0 OPTEE_SMC_RETURN_OK
+ * a1 Physical address of start of SDP memory
+ * a2 Size of SDP memory
+ * a3 Not used
+ * a4-7 Preserved
+ *
+ * Not available register usage:
+ * a0 OPTEE_SMC_RETURN_ENOTAVAIL
+ * a1-3 Not used
+ * a4-7 Preserved
+ */
+#define OPTEE_SMC_FUNCID_GET_SDP_CONFIG 20
+#define OPTEE_SMC_GET_SDP_CONFIG \
+ OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_SDP_CONFIG)
+
+struct optee_smc_get_sdp_config_result {
+ unsigned long status;
+ unsigned long start;
+ unsigned long size;
+ unsigned long flags;
+};
+
+/*
+ * Get Secure Data Path dynamic memory config
+ *
+ * Returns the Secure Data Path dynamic memory config.
+ *
+ * Call register usage:
+ * a0 SMC Function ID, OPTEE_SMC_GET_DYN_SDP_CONFIG
+ * a2-6 Not used, must be zero
+ * a7 Hypervisor Client ID register
+ *
+ * Have config return register usage:
+ * a0 OPTEE_SMC_RETURN_OK
+ * a1 Minimal size of SDP memory
+ * a2 Required alignment of size and start of registered SDP memory
+ * a3 Not used
+ * a4-7 Preserved
+ *
+ * Not available register usage:
+ * a0 OPTEE_SMC_RETURN_ENOTAVAIL
+ * a1-3 Not used
+ * a4-7 Preserved
+ */
+
+#define OPTEE_SMC_FUNCID_GET_DYN_SDP_CONFIG 21
+#define OPTEE_SMC_GET_DYN_SDP_CONFIG \
+ OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_DYN_SDP_CONFIG)
+
+struct optee_smc_get_dyn_sdp_config_result {
+ unsigned long status;
+ unsigned long size;
+ unsigned long align;
+ unsigned long flags;
+};
/*
* Resume from RPC (for example after processing a foreign interrupt)
--
2.43.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 4/6] optee: support restricted memory allocation
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
` (2 preceding siblings ...)
2024-12-17 10:07 ` [PATCH v4 3/6] optee: sync secure world ABI headers Jens Wiklander
@ 2024-12-17 10:07 ` Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 5/6] optee: FF-A: dynamic " Jens Wiklander
` (3 subsequent siblings)
7 siblings, 0 replies; 40+ messages in thread
From: Jens Wiklander @ 2024-12-17 10:07 UTC (permalink / raw)
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
Add support in the OP-TEE backend driver for restricted memory
allocation. The support is limited to the SMC ABI and to secure video
buffers.
OP-TEE is probed for the range of restricted physical memory, and a
memory pool allocator is initialized if OP-TEE has support for such
memory.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/core.c | 1 +
drivers/tee/optee/optee_private.h | 23 ++++++++++
drivers/tee/optee/rstmem.c | 76 +++++++++++++++++++++++++++++++
drivers/tee/optee/smc_abi.c | 69 ++++++++++++++++++++++++++--
5 files changed, 167 insertions(+), 3 deletions(-)
create mode 100644 drivers/tee/optee/rstmem.c
diff --git a/drivers/tee/optee/Makefile b/drivers/tee/optee/Makefile
index a6eff388d300..498969fb8e40 100644
--- a/drivers/tee/optee/Makefile
+++ b/drivers/tee/optee/Makefile
@@ -4,6 +4,7 @@ optee-objs += core.o
optee-objs += call.o
optee-objs += notif.o
optee-objs += rpc.o
+optee-objs += rstmem.o
optee-objs += supp.o
optee-objs += device.o
optee-objs += smc_abi.o
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index c75fddc83576..f4fa494789a4 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -182,6 +182,7 @@ void optee_remove_common(struct optee *optee)
tee_device_unregister(optee->teedev);
tee_shm_pool_free(optee->pool);
+ optee_rstmem_pools_uninit(optee);
optee_supp_uninit(&optee->supp);
mutex_destroy(&optee->call_queue.mutex);
rpmb_dev_put(optee->rpmb_dev);
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index 20eda508dbac..0491889e5b0e 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -193,6 +193,20 @@ struct optee_ops {
bool update_out);
};
+/**
+ * struct optee_rstmem_pools - restricted memory pools
+ * @mutex: serializes write access to @xa when adding a new pool.
+ * @xa: XArray of struct tee_shm_pool where the index is the
+ * use case ID TEE_IOC_UC_* supplied for TEE_IOC_RSTMEM_ALLOC.
+ */
+struct optee_rstmem_pools {
+ /*
+ * Serializes write access to @xa when adding a new pool.
+ */
+ struct mutex mutex;
+ struct xarray xa;
+};
+
/**
* struct optee - main service struct
* @supp_teedev: supplicant device
@@ -206,6 +220,7 @@ struct optee_ops {
* @notif: notification synchronization struct
* @supp: supplicant synchronization struct for RPC to supplicant
* @pool: shared memory pool
+ * @rstmem_pool: restricted memory pool for secure data path
* @mutex: mutex protecting @rpmb_dev
* @rpmb_dev: current RPMB device or NULL
 * @rpmb_scan_bus_done flag if device registration of RPMB dependent devices
@@ -230,6 +245,7 @@ struct optee {
struct optee_notif notif;
struct optee_supp supp;
struct tee_shm_pool *pool;
+ struct optee_rstmem_pools *rstmem_pools;
/* Protects rpmb_dev pointer */
struct mutex rpmb_dev_mutex;
struct rpmb_dev *rpmb_dev;
@@ -286,6 +302,9 @@ void optee_supp_init(struct optee_supp *supp);
void optee_supp_uninit(struct optee_supp *supp);
void optee_supp_release(struct optee_supp *supp);
+int optee_rstmem_pools_init(struct optee *optee);
+void optee_rstmem_pools_uninit(struct optee *optee);
+
int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
struct tee_param *param);
int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
@@ -378,6 +397,10 @@ void optee_rpc_cmd(struct tee_context *ctx, struct optee *optee,
int optee_do_bottom_half(struct tee_context *ctx);
int optee_stop_async_notif(struct tee_context *ctx);
+int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
+ u32 flags, u32 use_case, size_t size);
+void optee_rstmem_free(struct tee_context *ctx, struct tee_shm *shm);
+
/*
* Small helpers
*/
diff --git a/drivers/tee/optee/rstmem.c b/drivers/tee/optee/rstmem.c
new file mode 100644
index 000000000000..01456bc3e2f6
--- /dev/null
+++ b/drivers/tee/optee/rstmem.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, Linaro Limited
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/dma-map-ops.h>
+#include <linux/errno.h>
+#include <linux/genalloc.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/tee_core.h>
+#include <linux/types.h>
+#include "optee_private.h"
+
+int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
+ u32 flags, u32 use_case, size_t size)
+{
+ struct optee *optee = tee_get_drvdata(ctx->teedev);
+ struct tee_shm_pool *pool;
+
+ if (!optee->rstmem_pools)
+ return -EINVAL;
+ if (flags)
+ return -EINVAL;
+
+ pool = xa_load(&optee->rstmem_pools->xa, use_case);
+ if (!pool)
+ return -EINVAL;
+
+ return pool->ops->alloc(pool, shm, size, 0);
+}
+
+void optee_rstmem_free(struct tee_context *ctx, struct tee_shm *shm)
+{
+ struct optee *optee = tee_get_drvdata(ctx->teedev);
+ struct tee_shm_pool *pool;
+
+ pool = xa_load(&optee->rstmem_pools->xa, shm->use_case);
+ if (pool)
+ pool->ops->free(pool, shm);
+ else
+ pr_err("Can't find pool for use_case %u\n", shm->use_case);
+}
+
+int optee_rstmem_pools_init(struct optee *optee)
+{
+ struct optee_rstmem_pools *pools;
+
+ pools = kmalloc(sizeof(*pools), GFP_KERNEL);
+ if (!pools)
+ return -ENOMEM;
+
+ mutex_init(&pools->mutex);
+ xa_init(&pools->xa);
+ optee->rstmem_pools = pools;
+ return 0;
+}
+
+void optee_rstmem_pools_uninit(struct optee *optee)
+{
+ if (optee->rstmem_pools) {
+ struct tee_shm_pool *pool;
+ u_long idx;
+
+ xa_for_each(&optee->rstmem_pools->xa, idx, pool) {
+ xa_erase(&optee->rstmem_pools->xa, idx);
+ pool->ops->destroy_pool(pool);
+ }
+
+ xa_destroy(&optee->rstmem_pools->xa);
+ mutex_destroy(&optee->rstmem_pools->mutex);
+ kfree(optee->rstmem_pools);
+ optee->rstmem_pools = NULL;
+ }
+}
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index 788919a473d6..f5fd5f1d9a6b 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1201,6 +1201,8 @@ static void optee_get_version(struct tee_device *teedev,
v.gen_caps |= TEE_GEN_CAP_REG_MEM;
if (optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_MEMREF_NULL)
v.gen_caps |= TEE_GEN_CAP_MEMREF_NULL;
+ if (optee->rstmem_pools)
+ v.gen_caps |= TEE_GEN_CAP_RSTMEM;
*vers = v;
}
@@ -1223,6 +1225,8 @@ static const struct tee_driver_ops optee_clnt_ops = {
.cancel_req = optee_cancel_req,
.shm_register = optee_shm_register,
.shm_unregister = optee_shm_unregister,
+ .rstmem_alloc = optee_rstmem_alloc,
+ .rstmem_free = optee_rstmem_free,
};
static const struct tee_desc optee_clnt_desc = {
@@ -1239,6 +1243,8 @@ static const struct tee_driver_ops optee_supp_ops = {
.supp_send = optee_supp_send,
.shm_register = optee_shm_register_supp,
.shm_unregister = optee_shm_unregister_supp,
+ .rstmem_alloc = optee_rstmem_alloc,
+ .rstmem_free = optee_rstmem_free,
};
static const struct tee_desc optee_supp_desc = {
@@ -1619,6 +1625,57 @@ static inline int optee_load_fw(struct platform_device *pdev,
}
#endif
+static int optee_sdp_pool_init(struct optee *optee)
+{
+ bool sdp = optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_SDP;
+ struct tee_shm_pool *pool;
+ int rc;
+
+ /*
+ * optee_rstmem_pools_init() must be called if the secure world has any
+ * SDP capability. If the static carveout is available, initialize
+ * and add a pool for it.
+ */
+ if (!sdp)
+ return 0;
+
+ rc = optee_rstmem_pools_init(optee);
+ if (rc)
+ return rc;
+
+ if (optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_SDP) {
+ union {
+ struct arm_smccc_res smccc;
+ struct optee_smc_get_sdp_config_result result;
+ } res;
+
+ optee->smc.invoke_fn(OPTEE_SMC_GET_SDP_CONFIG, 0, 0, 0, 0, 0, 0,
+ 0, &res.smccc);
+ if (res.result.status != OPTEE_SMC_RETURN_OK) {
+ pr_err("Secure Data Path service not available\n");
+ goto err;
+ }
+
+ pool = tee_rstmem_gen_pool_alloc(res.result.start,
+ res.result.size);
+ if (IS_ERR(pool)) {
+ rc = PTR_ERR(pool);
+ goto err;
+ }
+ rc = xa_insert(&optee->rstmem_pools->xa,
+ TEE_IOC_UC_SECURE_VIDEO_PLAY, pool, GFP_KERNEL);
+ if (rc) {
+ pool->ops->destroy_pool(pool);
+ goto err;
+ }
+ }
+
+ return 0;
+err:
+ optee_rstmem_pools_uninit(optee);
+ return rc;
+}
+
static int optee_probe(struct platform_device *pdev)
{
optee_invoke_fn *invoke_fn;
@@ -1714,7 +1771,7 @@ static int optee_probe(struct platform_device *pdev)
optee = kzalloc(sizeof(*optee), GFP_KERNEL);
if (!optee) {
rc = -ENOMEM;
- goto err_free_pool;
+ goto err_free_shm_pool;
}
optee->ops = &optee_ops;
@@ -1726,10 +1783,14 @@ static int optee_probe(struct platform_device *pdev)
(sec_caps & OPTEE_SMC_SEC_CAP_RPMB_PROBE))
optee->in_kernel_rpmb_routing = true;
+ rc = optee_sdp_pool_init(optee);
+ if (rc)
+ goto err_free_optee;
+
teedev = tee_device_alloc(&optee_clnt_desc, NULL, pool, optee);
if (IS_ERR(teedev)) {
rc = PTR_ERR(teedev);
- goto err_free_optee;
+ goto err_rstmem_pools_uninit;
}
optee->teedev = teedev;
@@ -1836,9 +1897,11 @@ static int optee_probe(struct platform_device *pdev)
tee_device_unregister(optee->supp_teedev);
err_unreg_teedev:
tee_device_unregister(optee->teedev);
+err_rstmem_pools_uninit:
+ optee_rstmem_pools_uninit(optee);
err_free_optee:
kfree(optee);
-err_free_pool:
+err_free_shm_pool:
tee_shm_pool_free(pool);
if (memremaped_shm)
memunmap(memremaped_shm);
--
2.43.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
` (3 preceding siblings ...)
2024-12-17 10:07 ` [PATCH v4 4/6] optee: support restricted memory allocation Jens Wiklander
@ 2024-12-17 10:07 ` Jens Wiklander
2024-12-20 14:38 ` kernel test robot
2024-12-21 9:40 ` kernel test robot
2024-12-17 10:07 ` [PATCH v4 6/6] optee: smc abi: " Jens Wiklander
` (2 subsequent siblings)
7 siblings, 2 replies; 40+ messages in thread
From: Jens Wiklander @ 2024-12-17 10:07 UTC (permalink / raw)
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
Add support in the OP-TEE backend driver for dynamic restricted memory
allocation with FF-A.
The restricted memory pools for dynamically allocated restricted memory
are instantiated when requested by user-space. This instantiation can
fail if OP-TEE doesn't support the requested use-case of restricted
memory.
Restricted memory pools based on a static carveout or dynamic allocation
can coexist for different use-cases. We use only dynamic allocation with
FF-A.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/optee/ffa_abi.c | 135 ++++++++++++-
drivers/tee/optee/optee_private.h | 10 +-
drivers/tee/optee/rstmem.c | 316 +++++++++++++++++++++++++++++-
3 files changed, 457 insertions(+), 4 deletions(-)
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index 02e6175ac5f0..f500cf101c8d 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -672,6 +672,122 @@ static int optee_ffa_do_call_with_arg(struct tee_context *ctx,
return optee_ffa_yielding_call(ctx, &data, rpc_arg, system_thread);
}
+static int do_call_lend_rstmem(struct optee *optee, u64 cookie, u32 use_case)
+{
+ struct optee_shm_arg_entry *entry;
+ struct optee_msg_arg *msg_arg;
+ struct tee_shm *shm;
+ u_int offs;
+ int rc;
+
+ msg_arg = optee_get_msg_arg(optee->ctx, 1, &entry, &shm, &offs);
+ if (IS_ERR(msg_arg))
+ return PTR_ERR(msg_arg);
+
+ msg_arg->cmd = OPTEE_MSG_CMD_ASSIGN_RSTMEM;
+ msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
+ msg_arg->params[0].u.value.a = cookie;
+ msg_arg->params[0].u.value.b = use_case;
+
+ rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+ if (rc)
+ goto out;
+ if (msg_arg->ret != TEEC_SUCCESS) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+out:
+ optee_free_msg_arg(optee->ctx, entry, offs);
+ return rc;
+}
+
+static int optee_ffa_lend_rstmem(struct optee *optee, struct tee_shm *rstmem,
+ u16 *end_points, unsigned int ep_count)
+{
+ struct ffa_device *ffa_dev = optee->ffa.ffa_dev;
+ const struct ffa_mem_ops *mem_ops = ffa_dev->ops->mem_ops;
+ const struct ffa_msg_ops *msg_ops = ffa_dev->ops->msg_ops;
+ struct ffa_send_direct_data data;
+ struct ffa_mem_region_attributes *mem_attr;
+ struct ffa_mem_ops_args args = {
+ .use_txbuf = true,
+ .tag = rstmem->use_case,
+ };
+ struct page *page;
+ struct scatterlist sgl;
+ unsigned int n;
+ int rc;
+
+ mem_attr = kcalloc(ep_count, sizeof(*mem_attr), GFP_KERNEL);
+ if (!mem_attr)
+ return -ENOMEM;
+ for (n = 0; n < ep_count; n++) {
+ mem_attr[n].receiver = end_points[n];
+ mem_attr[n].attrs = FFA_MEM_RW;
+ }
+ args.attrs = mem_attr;
+ args.nattrs = ep_count;
+
+ page = phys_to_page(rstmem->paddr);
+ sg_init_table(&sgl, 1);
+ sg_set_page(&sgl, page, rstmem->size, 0);
+
+ args.sg = &sgl;
+ rc = mem_ops->memory_lend(&args);
+ kfree(mem_attr);
+ if (rc)
+ return rc;
+
+ rc = do_call_lend_rstmem(optee, args.g_handle, rstmem->use_case);
+ if (rc)
+ goto err_reclaim;
+
+ rc = optee_shm_add_ffa_handle(optee, rstmem, args.g_handle);
+ if (rc)
+ goto err_unreg;
+
+ rstmem->sec_world_id = args.g_handle;
+
+ return 0;
+
+err_unreg:
+ data = (struct ffa_send_direct_data){
+ .data0 = OPTEE_FFA_RELEASE_RSTMEM,
+ .data1 = (u32)args.g_handle,
+ .data2 = (u32)(args.g_handle >> 32),
+ };
+ msg_ops->sync_send_receive(ffa_dev, &data);
+err_reclaim:
+ mem_ops->memory_reclaim(args.g_handle, 0);
+ return rc;
+}
+
+static int optee_ffa_reclaim_rstmem(struct optee *optee, struct tee_shm *rstmem)
+{
+ struct ffa_device *ffa_dev = optee->ffa.ffa_dev;
+ const struct ffa_msg_ops *msg_ops = ffa_dev->ops->msg_ops;
+ const struct ffa_mem_ops *mem_ops = ffa_dev->ops->mem_ops;
+ u64 global_handle = rstmem->sec_world_id;
+ struct ffa_send_direct_data data = {
+ .data0 = OPTEE_FFA_RELEASE_RSTMEM,
+ .data1 = (u32)global_handle,
+ .data2 = (u32)(global_handle >> 32)
+ };
+ int rc;
+
+ optee_shm_rem_ffa_handle(optee, global_handle);
+ rstmem->sec_world_id = 0;
+
+ rc = msg_ops->sync_send_receive(ffa_dev, &data);
+ if (rc)
+ pr_err("Release SHM id 0x%llx rc %d\n", global_handle, rc);
+
+ rc = mem_ops->memory_reclaim(global_handle, 0);
+ if (rc)
+ pr_err("mem_reclaim: 0x%llx %d\n", global_handle, rc);
+
+ return rc;
+}
+
/*
* 6. Driver initialization
*
@@ -785,7 +901,10 @@ static void optee_ffa_get_version(struct tee_device *teedev,
.gen_caps = TEE_GEN_CAP_GP | TEE_GEN_CAP_REG_MEM |
TEE_GEN_CAP_MEMREF_NULL,
};
+ struct optee *optee = tee_get_drvdata(teedev);
+ if (optee->rstmem_pools)
+ v.gen_caps |= TEE_GEN_CAP_RSTMEM;
*vers = v;
}
@@ -804,6 +923,8 @@ static const struct tee_driver_ops optee_ffa_clnt_ops = {
.cancel_req = optee_cancel_req,
.shm_register = optee_ffa_shm_register,
.shm_unregister = optee_ffa_shm_unregister,
+ .rstmem_alloc = optee_rstmem_alloc,
+ .rstmem_free = optee_rstmem_free,
};
static const struct tee_desc optee_ffa_clnt_desc = {
@@ -820,6 +941,8 @@ static const struct tee_driver_ops optee_ffa_supp_ops = {
.supp_send = optee_supp_send,
.shm_register = optee_ffa_shm_register, /* same as for clnt ops */
.shm_unregister = optee_ffa_shm_unregister_supp,
+ .rstmem_alloc = optee_rstmem_alloc,
+ .rstmem_free = optee_rstmem_free,
};
static const struct tee_desc optee_ffa_supp_desc = {
@@ -833,6 +956,8 @@ static const struct optee_ops optee_ffa_ops = {
.do_call_with_arg = optee_ffa_do_call_with_arg,
.to_msg_param = optee_ffa_to_msg_param,
.from_msg_param = optee_ffa_from_msg_param,
+ .lend_rstmem = optee_ffa_lend_rstmem,
+ .reclaim_rstmem = optee_ffa_reclaim_rstmem,
};
static void optee_ffa_remove(struct ffa_device *ffa_dev)
@@ -937,11 +1062,17 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
(sec_caps & OPTEE_FFA_SEC_CAP_RPMB_PROBE))
optee->in_kernel_rpmb_routing = true;
+ if (sec_caps & OPTEE_FFA_SEC_CAP_RSTMEM) {
+ rc = optee_rstmem_pools_init(optee);
+ if (rc)
+ goto err_free_pool;
+ }
+
teedev = tee_device_alloc(&optee_ffa_clnt_desc, NULL, optee->pool,
optee);
if (IS_ERR(teedev)) {
rc = PTR_ERR(teedev);
- goto err_free_pool;
+ goto err_free_rstmem_pools;
}
optee->teedev = teedev;
@@ -1020,6 +1151,8 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
tee_device_unregister(optee->teedev);
+err_free_rstmem_pools:
+ optee_rstmem_pools_uninit(optee);
err_free_pool:
tee_shm_pool_free(pool);
err_free_optee:
kfree(optee);
return rc;
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index 0491889e5b0e..dd2a3a2224bc 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -174,9 +174,14 @@ struct optee;
* @do_call_with_arg: enters OP-TEE in secure world
* @to_msg_param: converts from struct tee_param to OPTEE_MSG parameters
* @from_msg_param: converts from OPTEE_MSG parameters to struct tee_param
+ * @lend_rstmem: lends physically contiguous memory as restricted
+ * memory, inaccessible by the kernel
+ * @reclaim_rstmem: reclaims restricted memory previously lent with
+ * @lend_rstmem() and makes it accessible by the
+ * kernel again
*
* These OPs are only supposed to be used internally in the OP-TEE driver
- * as a way of abstracting the different methogs of entering OP-TEE in
+ * as a way of abstracting the different methods of entering OP-TEE in
* secure world.
*/
struct optee_ops {
@@ -191,6 +196,9 @@ struct optee_ops {
size_t num_params,
const struct optee_msg_param *msg_params,
bool update_out);
+ int (*lend_rstmem)(struct optee *optee, struct tee_shm *rstmem,
+ u16 *end_points, unsigned int ep_count);
+ int (*reclaim_rstmem)(struct optee *optee, struct tee_shm *rstmem);
};
/**
diff --git a/drivers/tee/optee/rstmem.c b/drivers/tee/optee/rstmem.c
index 01456bc3e2f6..8c561d638733 100644
--- a/drivers/tee/optee/rstmem.c
+++ b/drivers/tee/optee/rstmem.c
@@ -4,6 +4,7 @@
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/cma.h>
#include <linux/dma-map-ops.h>
#include <linux/errno.h>
#include <linux/genalloc.h>
@@ -13,11 +14,313 @@
#include <linux/types.h>
#include "optee_private.h"
+#ifdef CONFIG_CMA
+struct optee_rstmem_cma_pool {
+ struct tee_shm_pool pool;
+ struct page *page;
+ struct optee *optee;
+ size_t page_count;
+ u16 *end_points;
+ u_int end_point_count;
+ u_int align;
+ refcount_t refcount;
+ struct tee_shm rstmem;
+ /* Protects when initializing and tearing down this struct */
+ struct mutex mutex;
+};
+
+static struct optee_rstmem_cma_pool *
+to_rstmem_cma_pool(struct tee_shm_pool *pool)
+{
+ return container_of(pool, struct optee_rstmem_cma_pool, pool);
+}
+
+static int init_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+ struct cma *cma = dev_get_cma_area(&rp->optee->teedev->dev);
+ int rc;
+
+ rp->page = cma_alloc(cma, rp->page_count, rp->align, true/*no_warn*/);
+ if (!rp->page)
+ return -ENOMEM;
+
+ /*
+ * TODO unmap the memory range since the physical memory will
+ * become inaccessible after the lend_rstmem() call.
+ */
+
+ rp->rstmem.paddr = page_to_phys(rp->page);
+ rp->rstmem.size = rp->page_count * PAGE_SIZE;
+ rc = rp->optee->ops->lend_rstmem(rp->optee, &rp->rstmem,
+ rp->end_points, rp->end_point_count);
+ if (rc)
+ goto err_release;
+
+ rp->pool.private_data = gen_pool_create(PAGE_SHIFT, -1);
+ if (!rp->pool.private_data) {
+ rc = -ENOMEM;
+ goto err_reclaim;
+ }
+
+ rc = gen_pool_add(rp->pool.private_data, rp->rstmem.paddr,
+ rp->rstmem.size, -1);
+ if (rc)
+ goto err_free_pool;
+
+ refcount_set(&rp->refcount, 1);
+ return 0;
+
+err_free_pool:
+ gen_pool_destroy(rp->pool.private_data);
+err_reclaim:
+ rp->optee->ops->reclaim_rstmem(rp->optee, &rp->rstmem);
+err_release:
+ cma_release(cma, rp->page, rp->page_count);
+ rp->rstmem.paddr = 0;
+ rp->rstmem.size = 0;
+ rp->rstmem.sec_world_id = 0;
+ return rc;
+}
+
+static int get_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+ int rc = 0;
+
+ if (!refcount_inc_not_zero(&rp->refcount)) {
+ mutex_lock(&rp->mutex);
+ if (rp->pool.private_data) {
+ /*
+ * Another thread has already initialized the pool
+ * before us, or the pool was just about to be torn
+ * down. Either way we only need to increase the
+ * refcount and we're done.
+ */
+ refcount_inc(&rp->refcount);
+ } else {
+ rc = init_cma_rstmem(rp);
+ }
+ mutex_unlock(&rp->mutex);
+ }
+
+ return rc;
+}
+
+static void release_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+ gen_pool_destroy(rp->pool.private_data);
+ rp->optee->ops->reclaim_rstmem(rp->optee, &rp->rstmem);
+ cma_release(dev_get_cma_area(&rp->optee->teedev->dev), rp->page,
+ rp->page_count);
+
+ rp->pool.private_data = NULL;
+ rp->page = NULL;
+ rp->rstmem.paddr = 0;
+ rp->rstmem.size = 0;
+ rp->rstmem.sec_world_id = 0;
+}
+
+static void put_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+ if (refcount_dec_and_test(&rp->refcount)) {
+ mutex_lock(&rp->mutex);
+ if (rp->pool.private_data)
+ release_cma_rstmem(rp);
+ mutex_unlock(&rp->mutex);
+ }
+}
+
+static int rstmem_pool_op_cma_alloc(struct tee_shm_pool *pool,
+ struct tee_shm *shm, size_t size,
+ size_t align)
+{
+ struct optee_rstmem_cma_pool *rp = to_rstmem_cma_pool(pool);
+ size_t sz = ALIGN(size, PAGE_SIZE);
+ phys_addr_t pa;
+ int rc;
+
+ rc = get_cma_rstmem(rp);
+ if (rc)
+ return rc;
+
+ pa = gen_pool_alloc(rp->pool.private_data, sz);
+ if (!pa) {
+ put_cma_rstmem(rp);
+ return -ENOMEM;
+ }
+
+ shm->size = sz;
+ shm->paddr = pa;
+ shm->offset = pa - page_to_phys(rp->page);
+ shm->sec_world_id = rp->rstmem.sec_world_id;
+
+ return 0;
+}
+
+static void rstmem_pool_op_cma_free(struct tee_shm_pool *pool,
+ struct tee_shm *shm)
+{
+ struct optee_rstmem_cma_pool *rp = to_rstmem_cma_pool(pool);
+
+ gen_pool_free(rp->pool.private_data, shm->paddr, shm->size);
+ shm->size = 0;
+ shm->paddr = 0;
+ shm->offset = 0;
+ shm->sec_world_id = 0;
+ put_cma_rstmem(rp);
+}
+
+static void pool_op_cma_destroy_pool(struct tee_shm_pool *pool)
+{
+ struct optee_rstmem_cma_pool *rp = to_rstmem_cma_pool(pool);
+
+ mutex_destroy(&rp->mutex);
+ kfree(rp);
+}
+
+static struct tee_shm_pool_ops rstmem_pool_ops_cma = {
+ .alloc = rstmem_pool_op_cma_alloc,
+ .free = rstmem_pool_op_cma_free,
+ .destroy_pool = pool_op_cma_destroy_pool,
+};
+
+static int get_rstmem_config(struct optee *optee, u32 use_case,
+ size_t *min_size, u_int *min_align,
+ u16 *end_points, u_int *ep_count)
+{
+ struct tee_param params[2] = {
+ [0] = {
+ .attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT,
+ .u.value.a = use_case,
+ },
+ [1] = {
+ .attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT,
+ },
+ };
+ struct optee_shm_arg_entry *entry;
+ struct tee_shm *shm_param = NULL;
+ struct optee_msg_arg *msg_arg;
+ struct tee_shm *shm;
+ u_int offs;
+ int rc;
+
+ if (end_points && *ep_count) {
+ params[1].u.memref.size = *ep_count * sizeof(*end_points);
+ shm_param = tee_shm_alloc_priv_buf(optee->ctx,
+ params[1].u.memref.size);
+ if (IS_ERR(shm_param))
+ return PTR_ERR(shm_param);
+ params[1].u.memref.shm = shm_param;
+ }
+
+ msg_arg = optee_get_msg_arg(optee->ctx, ARRAY_SIZE(params), &entry,
+ &shm, &offs);
+ if (IS_ERR(msg_arg)) {
+ rc = PTR_ERR(msg_arg);
+ goto out_free_shm;
+ }
+ msg_arg->cmd = OPTEE_MSG_CMD_GET_RSTMEM_CONFIG;
+
+ rc = optee->ops->to_msg_param(optee, msg_arg->params,
+ ARRAY_SIZE(params), params,
+ false /*!update_out*/);
+ if (rc)
+ goto out_free_msg;
+
+ rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+ if (rc)
+ goto out_free_msg;
+ if (msg_arg->ret && msg_arg->ret != TEEC_ERROR_SHORT_BUFFER) {
+ rc = -EINVAL;
+ goto out_free_msg;
+ }
+
+ rc = optee->ops->from_msg_param(optee, params, ARRAY_SIZE(params),
+ msg_arg->params, true /*update_out*/);
+ if (rc)
+ goto out_free_msg;
+
+ if (!msg_arg->ret && end_points &&
+ *ep_count < params[1].u.memref.size / sizeof(u16)) {
+ rc = -EINVAL;
+ goto out_free_msg;
+ }
+
+ *min_size = params[0].u.value.a;
+ *min_align = params[0].u.value.b;
+ *ep_count = params[1].u.memref.size / sizeof(u16);
+
+ if (msg_arg->ret == TEEC_ERROR_SHORT_BUFFER) {
+ rc = -ENOSPC;
+ goto out_free_msg;
+ }
+
+ if (end_points) {
+ void *va = tee_shm_get_va(shm_param, 0);
+
+ if (IS_ERR(va)) {
+ rc = PTR_ERR(va);
+ goto out_free_msg;
+ }
+ memcpy(end_points, va, params[1].u.memref.size);
+ }
+
+out_free_msg:
+ optee_free_msg_arg(optee->ctx, entry, offs);
+out_free_shm:
+ if (shm_param)
+ tee_shm_free(shm_param);
+ return rc;
+}
+
+static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee, u32 use_case)
+{
+ struct optee_rstmem_cma_pool *rp;
+ size_t min_size;
+ int rc;
+
+ rp = kzalloc(sizeof(*rp), GFP_KERNEL);
+ if (!rp)
+ return ERR_PTR(-ENOMEM);
+ rp->rstmem.use_case = use_case;
+
+ rc = get_rstmem_config(optee, use_case, &min_size, &rp->align, NULL,
+ &rp->end_point_count);
+ if (rc) {
+ if (rc != -ENOSPC)
+ goto err;
+ rp->end_points = kcalloc(rp->end_point_count,
+ sizeof(*rp->end_points), GFP_KERNEL);
+ if (!rp->end_points) {
+ rc = -ENOMEM;
+ goto err;
+ }
+ rc = get_rstmem_config(optee, use_case, &min_size, &rp->align,
+ rp->end_points, &rp->end_point_count);
+ if (rc)
+ goto err_kfree_eps;
+ }
+
+ rp->pool.ops = &rstmem_pool_ops_cma;
+ rp->optee = optee;
+ rp->page_count = min_size / PAGE_SIZE;
+ mutex_init(&rp->mutex);
+
+ return &rp->pool;
+
+err_kfree_eps:
+ kfree(rp->end_points);
+err:
+ kfree(rp);
+ return ERR_PTR(rc);
+}
+#else /*CONFIG_CMA*/
+static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __always_unused,
+ u32 use_case __always_unused)
+{
+ return ERR_PTR(-EINVAL);
+}
+#endif /*CONFIG_CMA*/
+
int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
u32 flags, u32 use_case, size_t size)
{
struct optee *optee = tee_get_drvdata(ctx->teedev);
struct tee_shm_pool *pool;
+ int rc;
if (!optee->rstmem_pools)
return -EINVAL;
@@ -25,8 +328,17 @@ int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
return -EINVAL;
pool = xa_load(&optee->rstmem_pools->xa, use_case);
- if (!pool)
- return -EINVAL;
+ if (!pool) {
+ pool = alloc_rstmem_pool(optee, use_case);
+ if (IS_ERR(pool))
+ return PTR_ERR(pool);
+ rc = xa_insert(&optee->rstmem_pools->xa, use_case, pool,
+ GFP_KERNEL);
+ if (rc) {
+ pool->ops->destroy_pool(pool);
+ return rc;
+ }
+ }
return pool->ops->alloc(pool, shm, size, 0);
}
--
2.43.0
* [PATCH v4 6/6] optee: smc abi: dynamic restricted memory allocation
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
` (4 preceding siblings ...)
2024-12-17 10:07 ` [PATCH v4 5/6] optee: FF-A: dynamic " Jens Wiklander
@ 2024-12-17 10:07 ` Jens Wiklander
2024-12-18 11:06 ` [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Simona Vetter
2025-02-12 19:56 ` Boris Brezillon
7 siblings, 0 replies; 40+ messages in thread
From: Jens Wiklander @ 2024-12-17 10:07 UTC (permalink / raw)
To: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel
Cc: Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Sumit Garg, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Jens Wiklander
Add support in the OP-TEE backend driver for dynamic restricted memory
allocation using the SMC ABI.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/optee/smc_abi.c | 74 +++++++++++++++++++++++++++++++++++--
1 file changed, 71 insertions(+), 3 deletions(-)
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index f5fd5f1d9a6b..25b02dbbc76e 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1001,6 +1001,67 @@ static int optee_smc_do_call_with_arg(struct tee_context *ctx,
return rc;
}
+static int optee_smc_lend_rstmem(struct optee *optee, struct tee_shm *rstmem,
+ u16 *end_points, unsigned int ep_count)
+{
+ struct optee_shm_arg_entry *entry;
+ struct optee_msg_arg *msg_arg;
+ struct tee_shm *shm;
+ u_int offs;
+ int rc;
+
+ msg_arg = optee_get_msg_arg(optee->ctx, 2, &entry, &shm, &offs);
+ if (IS_ERR(msg_arg))
+ return PTR_ERR(msg_arg);
+
+ msg_arg->cmd = OPTEE_MSG_CMD_LEND_RSTMEM;
+ msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
+ msg_arg->params[0].u.value.a = OPTEE_MSG_RSTMEM_SECURE_VIDEO_PLAY;
+ msg_arg->params[1].attr = OPTEE_MSG_ATTR_TYPE_TMEM_INPUT;
+ msg_arg->params[1].u.tmem.buf_ptr = rstmem->paddr;
+ msg_arg->params[1].u.tmem.size = rstmem->size;
+ msg_arg->params[1].u.tmem.shm_ref = (u_long)rstmem;
+
+ rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+ if (rc)
+ goto out;
+ if (msg_arg->ret != TEEC_SUCCESS) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+out:
+ optee_free_msg_arg(optee->ctx, entry, offs);
+ return rc;
+}
+
+static int optee_smc_reclaim_rstmem(struct optee *optee, struct tee_shm *rstmem)
+{
+ struct optee_shm_arg_entry *entry;
+ struct optee_msg_arg *msg_arg;
+ struct tee_shm *shm;
+ u_int offs;
+ int rc;
+
+ msg_arg = optee_get_msg_arg(optee->ctx, 1, &entry, &shm, &offs);
+ if (IS_ERR(msg_arg))
+ return PTR_ERR(msg_arg);
+
+ msg_arg->cmd = OPTEE_MSG_CMD_RECLAIM_RSTMEM;
+ msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_RMEM_INPUT;
+ msg_arg->params[0].u.rmem.shm_ref = (u_long)rstmem;
+
+ rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+ if (rc)
+ goto out;
+ if (msg_arg->ret != TEEC_SUCCESS)
+ rc = -EINVAL;
+
+out:
+ optee_free_msg_arg(optee->ctx, entry, offs);
+ return rc;
+}
+
/*
* 5. Asynchronous notification
*/
@@ -1258,6 +1319,8 @@ static const struct optee_ops optee_ops = {
.do_call_with_arg = optee_smc_do_call_with_arg,
.to_msg_param = optee_to_msg_param,
.from_msg_param = optee_from_msg_param,
+ .lend_rstmem = optee_smc_lend_rstmem,
+ .reclaim_rstmem = optee_smc_reclaim_rstmem,
};
static int enable_async_notif(optee_invoke_fn *invoke_fn)
@@ -1627,6 +1690,7 @@ static inline int optee_load_fw(struct platform_device *pdev,
static int optee_sdp_pool_init(struct optee *optee)
{
+ bool dyn_sdp = optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_RSTMEM;
bool sdp = optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_SDP;
struct tee_shm_pool *pool;
int rc;
@@ -1634,9 +1698,11 @@ static int optee_sdp_pool_init(struct optee *optee)
/*
* optee_rstmem_pools_init() must be called if secure world has any
* SDP capability. If the static carveout is available, initialize
- * and add a pool for that.
+ * and add a pool for that. If there's an error from secure world
+ * we complain but don't call optee_rstmem_pools_uninit() unless we
+ * know that there is no SDP capability left.
*/
- if (!sdp)
+ if (!dyn_sdp && !sdp)
return 0;
rc = optee_rstmem_pools_init(optee);
@@ -1653,7 +1719,9 @@ static int optee_sdp_pool_init(struct optee *optee)
0, &res.smccc);
if (res.result.status != OPTEE_SMC_RETURN_OK) {
pr_err("Secure Data Path service not available\n");
- goto err;
+ if (!dyn_sdp)
+ goto err;
+ return 0;
}
pool = tee_rstmem_gen_pool_alloc(res.result.start,
--
2.43.0
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
` (5 preceding siblings ...)
2024-12-17 10:07 ` [PATCH v4 6/6] optee: smc abi: " Jens Wiklander
@ 2024-12-18 11:06 ` Simona Vetter
2024-12-24 6:35 ` Sumit Garg
2025-02-12 19:56 ` Boris Brezillon
7 siblings, 1 reply; 40+ messages in thread
From: Simona Vetter @ 2024-12-18 11:06 UTC (permalink / raw)
To: Jens Wiklander
Cc: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Sumit Garg,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
> Hi,
>
> This patch set allocates the restricted DMA-bufs via the TEE subsystem.
>
> The TEE subsystem handles the DMA-buf allocations since it is the TEE
> (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> restrictions for the memory used for the DMA-bufs.
>
> I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> how to allocate the restricted physical memory.
>
> TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also
> a use-case parameter. This is used by the backend TEE driver to decide on
> allocation policy and which devices should be able to access the memory.
>
> Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
> Recording) have been identified so far to serve as examples of what can be
> expected. More use-cases can be added in the userspace ABI, but it's up to the
> backend TEE drivers to provide the implementation.
>
> Each use-case has its own restricted memory pool since different use-cases
> require isolation from different parts of the system. A restricted memory
> pool can be based on a static carveout instantiated while probing the TEE
> backend driver, or dynamically allocated from CMA and made restricted as
> needed by the TEE.
>
> This can be tested on QEMU with the following steps:
> repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
> -b prototype/sdp-v4
> repo sync -j8
> cd build
> make toolchains -j$(nproc)
> make SPMC_AT_EL=1 all -j$(nproc)
> make SPMC_AT_EL=1 run-only
> # login and at the prompt:
> xtest --sdp-basic
>
> The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
> S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
> without FF-A using the original SMC ABI instead. Please remember to do
> %rm -rf ../trusted-firmware-a/build/qemu
> for TF-A to be rebuilt properly using the new configuration.
>
> https://optee.readthedocs.io/en/latest/building/prerequisites.html
> lists the dependencies needed to build the above.
>
> The tests are pretty basic, mostly checking that a Trusted Application in
> the secure world can access and manipulate the memory. There are also some
> negative tests for out of bounds buffers etc.
I think I've dropped this on earlier encrypted dma-buf discussions for
TEE, but can't find one right now ...
Do we have some open source userspace for this? To my knowledge we have
two implementations of encrypted/content protected dma-buf in upstream
right now in the amd and intel gpu drivers, and unless I'm mistaken they
both have some minimal userspace supporting EXT_protected_textures:
https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT_protected_textures.txt
It's not great, but it does just barely clear the bar in my opinion. I
guess something in gstreamer or similar video pipeline framework would
also do the job.
Especially with the context of the uapi discussion in the v1/RFC thread I
think we need more than a bare-bones testcase to make sure this works in
actual use.
Cheers, Sima
>
> Thanks,
> Jens
>
> Changes since V3:
> * Make the use_case and flags field in struct tee_shm u32's instead of
> u16's
> * Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
> * Import namespace DMA_BUF in module tee, reported by lkp@intel.com
> * Added a note in the commit message for "optee: account for direction
> while converting parameters" why it's needed
> * Factor out dynamic restricted memory allocation from
> "optee: support restricted memory allocation" into two new commits
> "optee: FF-A: dynamic restricted memory allocation" and
> "optee: smc abi: dynamic restricted memory allocation"
> * Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
> restricted memory allocate if CMA isn't configured
>
> Changes since the V2 RFC:
> * Based on v6.12
> * Replaced the flags for SVP and Trusted UID memory with a u32 field with
> unique id for each use case
> * Added dynamic allocation of restricted memory pools
> * Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
> * Added support for FF-A with FFA_LEND
>
> Changes since the V1 RFC:
> * Based on v6.11
> * Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
>
> Changes since Olivier's post [2]:
> * Based on Yong Wu's post [1] where much of dma-buf handling is done in
> the generic restricted heap
> * Simplifications and cleanup
> * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
> support"
> * Replaced the word "secure" with "restricted" where applicable
>
> Jens Wiklander (6):
> tee: add restricted memory allocation
> optee: account for direction while converting parameters
> optee: sync secure world ABI headers
> optee: support restricted memory allocation
> optee: FF-A: dynamic restricted memory allocation
> optee: smc abi: dynamic restricted memory allocation
>
> drivers/tee/Makefile | 1 +
> drivers/tee/optee/Makefile | 1 +
> drivers/tee/optee/call.c | 10 +-
> drivers/tee/optee/core.c | 1 +
> drivers/tee/optee/ffa_abi.c | 178 +++++++++++++-
> drivers/tee/optee/optee_ffa.h | 27 ++-
> drivers/tee/optee/optee_msg.h | 65 ++++-
> drivers/tee/optee/optee_private.h | 75 ++++--
> drivers/tee/optee/optee_smc.h | 71 +++++-
> drivers/tee/optee/rpc.c | 31 ++-
> drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++
> drivers/tee/optee/smc_abi.c | 213 ++++++++++++++--
> drivers/tee/tee_core.c | 38 ++-
> drivers/tee/tee_private.h | 2 +
> drivers/tee/tee_rstmem.c | 201 ++++++++++++++++
> drivers/tee/tee_shm.c | 2 +
> drivers/tee/tee_shm_pool.c | 69 +++++-
> include/linux/tee_core.h | 15 ++
> include/linux/tee_drv.h | 2 +
> include/uapi/linux/tee.h | 44 +++-
> 20 files changed, 1358 insertions(+), 76 deletions(-)
> create mode 100644 drivers/tee/optee/rstmem.c
> create mode 100644 drivers/tee/tee_rstmem.c
>
>
> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> --
> 2.43.0
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
2024-12-17 10:07 ` [PATCH v4 5/6] optee: FF-A: dynamic " Jens Wiklander
@ 2024-12-20 14:38 ` kernel test robot
2024-12-21 9:40 ` kernel test robot
1 sibling, 0 replies; 40+ messages in thread
From: kernel test robot @ 2024-12-20 14:38 UTC (permalink / raw)
To: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel
Cc: llvm, oe-kbuild-all, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Sumit Garg,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Jens Wiklander
Hi Jens,
kernel test robot noticed the following build errors:
[auto build test ERROR on fac04efc5c793dccbd07e2d59af9f90b7fc0dca4]
url: https://github.com/intel-lab-lkp/linux/commits/Jens-Wiklander/tee-add-restricted-memory-allocation/20241217-181101
base: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
patch link: https://lore.kernel.org/r/20241217100809.3962439-6-jens.wiklander%40linaro.org
patch subject: [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
config: arm64-randconfig-002-20241220 (https://download.01.org/0day-ci/archive/20241220/202412202214.OdgzpOke-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241220/202412202214.OdgzpOke-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412202214.OdgzpOke-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/tee/optee/rstmem.c:8:
In file included from include/linux/dma-map-ops.h:9:
In file included from include/linux/dma-mapping.h:8:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:2223:
include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
>> drivers/tee/optee/rstmem.c:311:67: error: expected ')'
311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
| ^
drivers/tee/optee/rstmem.c:311:46: note: to match this '('
311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
| ^
>> drivers/tee/optee/rstmem.c:332:35: error: too many arguments to function call, expected single argument 'optee', have 2 arguments
332 | pool = alloc_rstmem_pool(optee, use_case);
| ~~~~~~~~~~~~~~~~~ ^~~~~~~~
drivers/tee/optee/rstmem.c:311:29: note: 'alloc_rstmem_pool' declared here
311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
| ^ ~~~~~~~~~~~~~~~~~~~
1 warning and 2 errors generated.
vim +311 drivers/tee/optee/rstmem.c
268
269 static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee, u32 use_case)
270 {
271 struct optee_rstmem_cma_pool *rp;
272 size_t min_size;
273 int rc;
274
275 rp = kzalloc(sizeof(*rp), GFP_KERNEL);
276 if (!rp)
277 return ERR_PTR(-ENOMEM);
278 rp->rstmem.use_case = use_case;
279
280 rc = get_rstmem_config(optee, use_case, &min_size, &rp->align, NULL,
281 &rp->end_point_count);
282 if (rc) {
283 if (rc != -ENOSPC)
284 goto err;
285 rp->end_points = kcalloc(rp->end_point_count,
286 sizeof(*rp->end_points), GFP_KERNEL);
287 if (!rp->end_points) {
288 rc = -ENOMEM;
289 goto err;
290 }
291 rc = get_rstmem_config(optee, use_case, &min_size, &rp->align,
292 rp->end_points, &rp->end_point_count);
293 if (rc)
294 goto err_kfree_eps;
295 }
296
297 rp->pool.ops = &rstmem_pool_ops_cma;
298 rp->optee = optee;
299 rp->page_count = min_size / PAGE_SIZE;
300 mutex_init(&rp->mutex);
301
302 return &rp->pool;
303
304 err_kfree_eps:
305 kfree(rp->end_points);
306 err:
307 kfree(rp);
308 return ERR_PTR(rc);
309 }
310 #else /*CONFIG_CMA*/
> 311 static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
312 u32 use_case __unused)
313 {
314 return ERR_PTR(-EINVAL);
315 }
316 #endif /*CONFIG_CMA*/
317
318 int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
319 u32 flags, u32 use_case, size_t size)
320 {
321 struct optee *optee = tee_get_drvdata(ctx->teedev);
322 struct tee_shm_pool *pool;
323 int rc;
324
325 if (!optee->rstmem_pools)
326 return -EINVAL;
327 if (flags)
328 return -EINVAL;
329
330 pool = xa_load(&optee->rstmem_pools->xa, use_case);
331 if (!pool) {
> 332 pool = alloc_rstmem_pool(optee, use_case);
333 if (IS_ERR(pool))
334 return PTR_ERR(pool);
335 rc = xa_insert(&optee->rstmem_pools->xa, use_case, pool,
336 GFP_KERNEL);
337 if (rc) {
338 pool->ops->destroy_pool(pool);
339 return rc;
340 }
341 }
342
343 return pool->ops->alloc(pool, shm, size, 0);
344 }
345
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
2024-12-17 10:07 ` [PATCH v4 5/6] optee: FF-A: dynamic " Jens Wiklander
2024-12-20 14:38 ` kernel test robot
@ 2024-12-21 9:40 ` kernel test robot
1 sibling, 0 replies; 40+ messages in thread
From: kernel test robot @ 2024-12-21 9:40 UTC (permalink / raw)
To: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel
Cc: oe-kbuild-all, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Sumit Garg,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Jens Wiklander
Hi Jens,
kernel test robot noticed the following build errors:
[auto build test ERROR on fac04efc5c793dccbd07e2d59af9f90b7fc0dca4]
url: https://github.com/intel-lab-lkp/linux/commits/Jens-Wiklander/tee-add-restricted-memory-allocation/20241217-181101
base: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
patch link: https://lore.kernel.org/r/20241217100809.3962439-6-jens.wiklander%40linaro.org
patch subject: [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
config: arm64-randconfig-r112-20241221 (https://download.01.org/0day-ci/archive/20241221/202412211952.LOCC8Lbo-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 14.2.0
reproduce: (https://download.01.org/0day-ci/archive/20241221/202412211952.LOCC8Lbo-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412211952.LOCC8Lbo-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/tee/optee/rstmem.c:311:67: error: expected ';', ',' or ')' before '__unused'
311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
| ^~~~~~~~
drivers/tee/optee/rstmem.c: In function 'optee_rstmem_alloc':
>> drivers/tee/optee/rstmem.c:332:24: error: implicit declaration of function 'alloc_rstmem_pool' [-Wimplicit-function-declaration]
332 | pool = alloc_rstmem_pool(optee, use_case);
| ^~~~~~~~~~~~~~~~~
>> drivers/tee/optee/rstmem.c:332:22: error: assignment to 'struct tee_shm_pool *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
332 | pool = alloc_rstmem_pool(optee, use_case);
| ^
vim +311 drivers/tee/optee/rstmem.c
268
269 static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee, u32 use_case)
270 {
271 struct optee_rstmem_cma_pool *rp;
272 size_t min_size;
273 int rc;
274
275 rp = kzalloc(sizeof(*rp), GFP_KERNEL);
276 if (!rp)
277 return ERR_PTR(-ENOMEM);
278 rp->rstmem.use_case = use_case;
279
280 rc = get_rstmem_config(optee, use_case, &min_size, &rp->align, NULL,
281 &rp->end_point_count);
282 if (rc) {
283 if (rc != -ENOSPC)
284 goto err;
285 rp->end_points = kcalloc(rp->end_point_count,
286 sizeof(*rp->end_points), GFP_KERNEL);
287 if (!rp->end_points) {
288 rc = -ENOMEM;
289 goto err;
290 }
291 rc = get_rstmem_config(optee, use_case, &min_size, &rp->align,
292 rp->end_points, &rp->end_point_count);
293 if (rc)
294 goto err_kfree_eps;
295 }
296
297 rp->pool.ops = &rstmem_pool_ops_cma;
298 rp->optee = optee;
299 rp->page_count = min_size / PAGE_SIZE;
300 mutex_init(&rp->mutex);
301
302 return &rp->pool;
303
304 err_kfree_eps:
305 kfree(rp->end_points);
306 err:
307 kfree(rp);
308 return ERR_PTR(rc);
309 }
310 #else /*CONFIG_CMA*/
> 311 static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
312 u32 use_case __unused)
313 {
314 return ERR_PTR(-EINVAL);
315 }
316 #endif /*CONFIG_CMA*/
317
318 int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
319 u32 flags, u32 use_case, size_t size)
320 {
321 struct optee *optee = tee_get_drvdata(ctx->teedev);
322 struct tee_shm_pool *pool;
323 int rc;
324
325 if (!optee->rstmem_pools)
326 return -EINVAL;
327 if (flags)
328 return -EINVAL;
329
330 pool = xa_load(&optee->rstmem_pools->xa, use_case);
331 if (!pool) {
> 332 pool = alloc_rstmem_pool(optee, use_case);
333 if (IS_ERR(pool))
334 return PTR_ERR(pool);
335 rc = xa_insert(&optee->rstmem_pools->xa, use_case, pool,
336 GFP_KERNEL);
337 if (rc) {
338 pool->ops->destroy_pool(pool);
339 return rc;
340 }
341 }
342
343 return pool->ops->alloc(pool, shm, size, 0);
344 }
345
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2024-12-18 11:06 ` [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Simona Vetter
@ 2024-12-24 6:35 ` Sumit Garg
[not found] ` <Z2p-v-xjhzhPso6u@wunner.de>
2025-01-08 16:57 ` Simona Vetter
0 siblings, 2 replies; 40+ messages in thread
From: Sumit Garg @ 2024-12-24 6:35 UTC (permalink / raw)
To: simona.vetter
Cc: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
Hi Simona,
On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter@ffwll.ch> wrote:
>
> On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
> > Hi,
> >
> > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> >
> > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > restrictions for the memory used for the DMA-bufs.
> >
> > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > how to allocate the restricted physical memory.
> >
> > In addition to size and flags parameters, TEE_IOC_RSTMEM_ALLOC also takes
> > a use-case parameter. This is used by the backend TEE driver to decide on
> > allocation policy and which devices should be able to access the memory.
> >
> > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
> > Recording) have been identified so far to serve as examples of what can be
> > expected. More use-cases can be added in the userspace ABI, but it's up to
> > the backend TEE drivers to provide the implementation.
> >
> > Each use-case has its own restricted memory pool since different use-cases
> > require isolation from different parts of the system. A restricted memory
> > pool can be based on a static carveout instantiated while probing the TEE
> > backend driver, or dynamically allocated from CMA and made restricted as
> > needed by the TEE.
> >
> > This can be tested on QEMU with the following steps:
> > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
> > -b prototype/sdp-v4
> > repo sync -j8
> > cd build
> > make toolchains -j$(nproc)
> > make SPMC_AT_EL=1 all -j$(nproc)
> > make SPMC_AT_EL=1 run-only
> > # login and at the prompt:
> > xtest --sdp-basic
> >
> > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
> > S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
> > without FF-A using the original SMC ABI instead. Please remember to do
> > %rm -rf ../trusted-firmware-a/build/qemu
> > for TF-A to be rebuilt properly using the new configuration.
> >
> > https://optee.readthedocs.io/en/latest/building/prerequisites.html
> > list dependencies needed to build the above.
> >
> > The tests are pretty basic, mostly checking that a Trusted Application in
> > the secure world can access and manipulate the memory. There are also some
> > negative tests for out of bounds buffers etc.
>
> I think I've dropped this on earlier encrypted dma-buf discussions for
> TEE, but can't find one right now ...
Thanks for raising this query.
>
> Do we have some open source userspace for this? To my knowledge we have
> two implementations of encrypted/content protected dma-buf in upstream
> right now in the amd and intel gpu drivers, and unless I'm mistaken they
> both have some minimal userspace supporting EXT_protected_textures:
First of all, to clarify: the support Jens is adding here for restricted
shared memory allocation in the TEE subsystem is meant to be generic and
not specific to the secure media pipeline use-case. We not only have open
source test applications but also open source firmware (OP-TEE as a
Trusted OS) [1] supporting this as a core feature, with a stable and
extensible ABI maintained between the kernel and the OP-TEE core.
Restricted memory is a feature enforced by hardware-specific firewalls,
where a particular TEE implementation governs which block of memory is
accessible to a particular peripheral or to a CPU running in a more
privileged mode than the Linux kernel. There can be numerous use-cases
built on top of that, for example:
- Secure media pipeline, where content gets decrypted into a restricted
buffer that is then accessible only to the media display pipeline
peripherals.
- Trusted user interface, where a peripheral takes input from the user
and stores it in a restricted buffer that is accessible to the TEE
implementation only.
- The TEE implementation storing key material in a restricted buffer
that is only accessible to the hardware crypto accelerator.
I am sure there will be more use-cases for this feature, but those will
only become possible once we provide a stable and extensible restricted
memory interface between Linux user-space and secure world user-space
(normally referred to as Trusted Applications).
[1] https://github.com/OP-TEE/optee_os/pull/7159
>
> https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT_protected_textures.txt
>
> It's not great, but it does just barely clear the bar in my opinion. I
> guess something in gstreamer or similar video pipeline framework would
> also do the job.
>
> Especially with the context of the uapi discussion in the v1/RFC thread I
> think we need more than a bare-bones testcase to make sure this works in
> actual use.
The TEE subsystem already supports a stable ABI for the shared memory
allocator between Linux user-space and secure world user-space [2]. The
stable ABI for restricted memory follows the same lines: it is meant to
be a vendor-neutral abstraction for user-space access. The current test
cases not only exercise the interface but also serve as regression
tests.
I am also in favour of end-to-end open source use-cases. But I fear that
without progressing in a step-wise manner, as this proposal does, we
would force developers to upstream all the software pieces in one go,
which is a chicken-and-egg situation. I am sure that once this feature
lands, the Mediatek folks will be interested in porting their secure
video playback patch set [3] on top of it. Similarly, other silicon
vendors like NXP, Qcom etc. will be motivated to do the same.
[2] https://docs.kernel.org/userspace-api/tee.html
[3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
-Sumit
>
> Cheers, Sima
>
> >
> > Thanks,
> > Jens
> >
> > Changes since V3:
> > * Make the use_case and flags field in struct tee_shm u32's instead of
> > u16's
> > * Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
> > * Import namespace DMA_BUF in module tee, reported by lkp@intel.com
> > * Added a note in the commit message for "optee: account for direction
> > while converting parameters" why it's needed
> > * Factor out dynamic restricted memory allocation from
> > "optee: support restricted memory allocation" into two new commits
> > "optee: FF-A: dynamic restricted memory allocation" and
> > "optee: smc abi: dynamic restricted memory allocation"
> > * Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
> > restricted memory allocate if CMA isn't configured
> >
> > Changes since the V2 RFC:
> > * Based on v6.12
> > * Replaced the flags for SVP and Trusted UID memory with a u32 field with
> > unique id for each use case
> > * Added dynamic allocation of restricted memory pools
> > * Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
> > * Added support for FF-A with FFA_LEND
> >
> > Changes since the V1 RFC:
> > * Based on v6.11
> > * Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
> >
> > Changes since Olivier's post [2]:
> > * Based on Yong Wu's post [1] where much of dma-buf handling is done in
> > the generic restricted heap
> > * Simplifications and cleanup
> > * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
> > support"
> > * Replaced the word "secure" with "restricted" where applicable
> >
> > Jens Wiklander (6):
> > tee: add restricted memory allocation
> > optee: account for direction while converting parameters
> > optee: sync secure world ABI headers
> > optee: support restricted memory allocation
> > optee: FF-A: dynamic restricted memory allocation
> > optee: smc abi: dynamic restricted memory allocation
> >
> > drivers/tee/Makefile | 1 +
> > drivers/tee/optee/Makefile | 1 +
> > drivers/tee/optee/call.c | 10 +-
> > drivers/tee/optee/core.c | 1 +
> > drivers/tee/optee/ffa_abi.c | 178 +++++++++++++-
> > drivers/tee/optee/optee_ffa.h | 27 ++-
> > drivers/tee/optee/optee_msg.h | 65 ++++-
> > drivers/tee/optee/optee_private.h | 75 ++++--
> > drivers/tee/optee/optee_smc.h | 71 +++++-
> > drivers/tee/optee/rpc.c | 31 ++-
> > drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++
> > drivers/tee/optee/smc_abi.c | 213 ++++++++++++++--
> > drivers/tee/tee_core.c | 38 ++-
> > drivers/tee/tee_private.h | 2 +
> > drivers/tee/tee_rstmem.c | 201 ++++++++++++++++
> > drivers/tee/tee_shm.c | 2 +
> > drivers/tee/tee_shm_pool.c | 69 +++++-
> > include/linux/tee_core.h | 15 ++
> > include/linux/tee_drv.h | 2 +
> > include/uapi/linux/tee.h | 44 +++-
> > 20 files changed, 1358 insertions(+), 76 deletions(-)
> > create mode 100644 drivers/tee/optee/rstmem.c
> > create mode 100644 drivers/tee/tee_rstmem.c
> >
> >
> > base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> > --
> > 2.43.0
> >
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
[not found] ` <Z2p_ucfNOyKcbWhH@wunner.de>
@ 2024-12-24 10:00 ` Dmitry Baryshkov
0 siblings, 0 replies; 40+ messages in thread
From: Dmitry Baryshkov @ 2024-12-24 10:00 UTC (permalink / raw)
To: Lukas Wunner
Cc: Sumit Garg, simona.vetter, Jens Wiklander, linux-kernel,
linux-media, dri-devel, linaro-mm-sig, op-tee, linux-arm-kernel,
Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi
On Tue, Dec 24, 2024 at 10:32:41AM +0100, Lukas Wunner wrote:
> On Tue, Dec 24, 2024 at 10:28:31AM +0100, Lukas Wunner wrote:
> > I did raise a concern about this to the maintainer, but to no avail:
> > https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
>
> Sorry, wrong link. This is the one I meant to copy-paste... :(
>
> https://lore.kernel.org/r/Z0rPxCGdD7r8HFKb@wunner.de/
Herbert asked a logical question, which got no response from your side.
--
With best wishes
Dmitry
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
[not found] ` <Z2p-v-xjhzhPso6u@wunner.de>
[not found] ` <Z2p_ucfNOyKcbWhH@wunner.de>
@ 2024-12-26 5:59 ` Sumit Garg
[not found] ` <Z209ZegsmgN1xlNG@wunner.de>
1 sibling, 1 reply; 40+ messages in thread
From: Sumit Garg @ 2024-12-26 5:59 UTC (permalink / raw)
To: Lukas Wunner
Cc: simona.vetter, Jens Wiklander, linux-kernel, linux-media,
dri-devel, linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
Hi Lukas,
On Tue, 24 Dec 2024 at 14:58, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> > Restricted memory is a feature enforced by hardware specific firewalls
> > where a particular TEE implementation governs which particular block
> > of memory is accessible to a particular peripheral or a CPU running in
> > a higher privileged mode than the Linux kernel.
> [...]
> > - Another possible use-case can be for the TEE implementation to store
> > key material in a restricted buffer which is only accessible to the
> > hardware crypto accelerator.
>
> Just a heads-up:
>
> For RSA sign/verify operations using rsassa-pkcs1 encoding,
> the message to be signed/verified (which I understand could
> be located in restricted memory) is prepended by a padding.
>
> The crypto subsystem does the prepending of the padding in software.
> The actual signature generation/verification (which is an RSA encrypt
> or decrypt operation) may be performed in hardware by a crypto
> accelerator.
>
> Before commit 8552cb04e083 ("crypto: rsassa-pkcs1 - Copy source
> data for SG list"), the kernel constructed a scatterlist
> consisting of the padding on the one hand, and of the message
> to be signed/verified on the other hand. I believe this worked
> for use cases where the message is located in restricted memory.
>
> However since that commit, the kernel kmalloc's a new buffer and
> copies the message to be signed/verified into it. The argument
> was that although the *kernel* may be able to access the data,
> the crypto accelerator may *not* be able to do so. In particular,
> portions of the padding are located in the kernel's .rodata section
> which is a valid virtual address on x86 but not on arm64 and
> which may be inaccessible to a crypto accelerator.
>
> However in the case of restricted memory, the situation is exactly
> the opposite: The kernel may *not* be able to access the data,
> but the crypto accelerator can access it just fine.
>
> I did raise a concern about this to the maintainer, but to no avail:
> https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
Herbert's point is valid: there isn't any point in mapping restricted
memory into the kernel virtual address space, since any kernel access to
that memory can lead to platform-specific hardware error scenarios. For
that reason we simply disallow dma_buf_mmap() and don't support
dma_buf_vmap() for DMA-bufs holding TEE restricted memory. The only
consumers of those DMA-bufs will be the DMA-capable peripherals granted
access permissions by the TEE implementation. IOW, the kernel's role
here is just to provide the DMA-buf infrastructure for buffers set up by
the TEE, and to set up DMA addresses for the peripherals that access
them. The hardware crypto accelerator can be one such peripheral.
>
> This is the alternative solution I would have preferred:
> https://lore.kernel.org/r/3de5d373c86dcaa5abc36f501c1398c4fbf05f2f.1732865109.git.lukas@wunner.de/
>
> > I am also in favour of end to end open source use-cases. But I fear
> > without progressing in a step wise manner as with this proposal we
> > would rather force developers to upstream all the software pieces in
> > one go which will be kind of a chicken and egg situation. I am sure
> > once this feature lands Mediatek folks will be interested to port
> > their secure video playback patchset [3] on top of it. Similarly other
> > silicon vendors like NXP, Qcom etc. will be motivated to do the same.
>
> The crypto use case may be easier to bring up than the video decoding
> use case because you don't need to implement a huge amount of
> user space code.
Agreed. If you already have such a hardware use-case, please feel free
to build on top of this patch set.
-Sumit
>
> Thanks,
>
> Lukas
* Re: [PATCH v4 1/6] tee: add restricted memory allocation
2024-12-17 10:07 ` [PATCH v4 1/6] tee: add restricted memory allocation Jens Wiklander
@ 2025-01-08 16:54 ` Simona Vetter
2025-01-09 7:17 ` Jens Wiklander
2025-02-13 8:44 ` Boris Brezillon
1 sibling, 1 reply; 40+ messages in thread
From: Simona Vetter @ 2025-01-08 16:54 UTC (permalink / raw)
To: Jens Wiklander
Cc: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Sumit Garg,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
On Tue, Dec 17, 2024 at 11:07:37AM +0100, Jens Wiklander wrote:
> Add restricted memory allocation to the TEE subsystem.
>
> Restricted memory refers to memory buffers behind a hardware enforced
> firewall. It is not accessible to the kernel during normal circumstances
> but rather only accessible to certain hardware IPs or CPUs executing in
> higher privileged mode than the kernel itself. This interface allows to
> allocate and manage such restricted memory buffers via interaction with
> a TEE implementation.
>
> A new ioctl TEE_IOC_RSTMEM_ALLOC is added to allocate these restricted
> memory buffers.
>
> The restricted memory is allocated for a specific use-case, like Secure
> Video Playback, Trusted UI, or Secure Video Recording where certain
> hardware devices can access the memory.
>
> More use-cases can be added in userspace ABI, but it's up to the backend
> drivers to provide the implementation.
>
> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
> ---
> drivers/tee/Makefile | 1 +
> drivers/tee/tee_core.c | 38 ++++++-
> drivers/tee/tee_private.h | 2 +
> drivers/tee/tee_rstmem.c | 201 +++++++++++++++++++++++++++++++++++++
> drivers/tee/tee_shm.c | 2 +
> drivers/tee/tee_shm_pool.c | 69 ++++++++++++-
> include/linux/tee_core.h | 15 +++
> include/linux/tee_drv.h | 2 +
> include/uapi/linux/tee.h | 44 +++++++-
> 9 files changed, 370 insertions(+), 4 deletions(-)
> create mode 100644 drivers/tee/tee_rstmem.c
>
> diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile
> index 5488cba30bd2..a4c6b55444b9 100644
> --- a/drivers/tee/Makefile
> +++ b/drivers/tee/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_TEE) += tee.o
> tee-objs += tee_core.o
> tee-objs += tee_shm.o
> tee-objs += tee_shm_pool.o
> +tee-objs += tee_rstmem.o
> obj-$(CONFIG_OPTEE) += optee/
> obj-$(CONFIG_AMDTEE) += amdtee/
> obj-$(CONFIG_ARM_TSTEE) += tstee/
> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> index d113679b1e2d..f4a45b77753b 100644
> --- a/drivers/tee/tee_core.c
> +++ b/drivers/tee/tee_core.c
> @@ -1,12 +1,13 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /*
> - * Copyright (c) 2015-2016, Linaro Limited
> + * Copyright (c) 2015-2022, 2024, Linaro Limited
> */
>
> #define pr_fmt(fmt) "%s: " fmt, __func__
>
> #include <linux/cdev.h>
> #include <linux/cred.h>
> +#include <linux/dma-buf.h>
> #include <linux/fs.h>
> #include <linux/idr.h>
> #include <linux/module.h>
> @@ -815,6 +816,38 @@ static int tee_ioctl_supp_send(struct tee_context *ctx,
> return rc;
> }
>
> +static int
> +tee_ioctl_rstmem_alloc(struct tee_context *ctx,
> + struct tee_ioctl_rstmem_alloc_data __user *udata)
> +{
> + struct tee_ioctl_rstmem_alloc_data data;
> + struct dma_buf *dmabuf;
> + int id;
> + int fd;
> +
> + if (copy_from_user(&data, udata, sizeof(data)))
> + return -EFAULT;
> +
> + if (data.use_case == TEE_IOC_UC_RESERVED)
> + return -EINVAL;
> +
> + dmabuf = tee_rstmem_alloc(ctx, data.flags, data.use_case, data.size,
> + &id);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> + if (put_user(id, &udata->id)) {
> + fd = -EFAULT;
> + goto err;
> + }
> + fd = dma_buf_fd(dmabuf, O_CLOEXEC);
> + if (fd < 0)
> + goto err;
> + return fd;
> +err:
> + dma_buf_put(dmabuf);
> + return fd;
> +}
> +
> static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> {
> struct tee_context *ctx = filp->private_data;
> @@ -839,6 +872,8 @@ static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return tee_ioctl_supp_recv(ctx, uarg);
> case TEE_IOC_SUPPL_SEND:
> return tee_ioctl_supp_send(ctx, uarg);
> + case TEE_IOC_RSTMEM_ALLOC:
> + return tee_ioctl_rstmem_alloc(ctx, uarg);
> default:
> return -EINVAL;
> }
> @@ -1286,3 +1321,4 @@ MODULE_AUTHOR("Linaro");
> MODULE_DESCRIPTION("TEE Driver");
> MODULE_VERSION("1.0");
> MODULE_LICENSE("GPL v2");
> +MODULE_IMPORT_NS("DMA_BUF");
> diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
> index 9bc50605227c..bf97796909c0 100644
> --- a/drivers/tee/tee_private.h
> +++ b/drivers/tee/tee_private.h
> @@ -23,5 +23,7 @@ void teedev_ctx_put(struct tee_context *ctx);
> struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
> struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
> unsigned long addr, size_t length);
> +struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
> + u32 use_case, size_t size, int *shm_id);
>
> #endif /*TEE_PRIVATE_H*/
> diff --git a/drivers/tee/tee_rstmem.c b/drivers/tee/tee_rstmem.c
> new file mode 100644
> index 000000000000..536bca2901e2
> --- /dev/null
> +++ b/drivers/tee/tee_rstmem.c
> @@ -0,0 +1,201 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024 Linaro Limited
> + */
> +#include <linux/device.h>
> +#include <linux/dma-buf.h>
> +#include <linux/genalloc.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/tee_core.h>
> +#include "tee_private.h"
> +
> +struct tee_rstmem_attachment {
> + struct sg_table table;
> + struct device *dev;
> +};
> +
> +static int rstmem_dma_attach(struct dma_buf *dmabuf,
> + struct dma_buf_attachment *attachment)
> +{
> + struct tee_shm *shm = dmabuf->priv;
> + struct tee_rstmem_attachment *a;
> + int rc;
> +
> + a = kzalloc(sizeof(*a), GFP_KERNEL);
> + if (!a)
> + return -ENOMEM;
> +
> + if (shm->pages) {
> + rc = sg_alloc_table_from_pages(&a->table, shm->pages,
> + shm->num_pages, 0,
> + shm->num_pages * PAGE_SIZE,
> + GFP_KERNEL);
> + if (rc)
> + goto err;
> + } else {
> + rc = sg_alloc_table(&a->table, 1, GFP_KERNEL);
> + if (rc)
> + goto err;
> + sg_set_page(a->table.sgl, phys_to_page(shm->paddr), shm->size,
> + 0);
> + }
> +
> + a->dev = attachment->dev;
> + attachment->priv = a;
> +
> + return 0;
> +err:
> + kfree(a);
> + return rc;
> +}
> +
> +static void rstmem_dma_detach(struct dma_buf *dmabuf,
> + struct dma_buf_attachment *attachment)
> +{
> + struct tee_rstmem_attachment *a = attachment->priv;
> +
> + sg_free_table(&a->table);
> + kfree(a);
> +}
> +
> +static struct sg_table *
> +rstmem_dma_map_dma_buf(struct dma_buf_attachment *attachment,
> + enum dma_data_direction direction)
> +{
> + struct tee_rstmem_attachment *a = attachment->priv;
> + int ret;
> +
> + ret = dma_map_sgtable(attachment->dev, &a->table, direction,
> + DMA_ATTR_SKIP_CPU_SYNC);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + return &a->table;
> +}
> +
> +static void rstmem_dma_unmap_dma_buf(struct dma_buf_attachment *attachment,
> + struct sg_table *table,
> + enum dma_data_direction direction)
> +{
> + struct tee_rstmem_attachment *a = attachment->priv;
> +
> + WARN_ON(&a->table != table);
> +
> + dma_unmap_sgtable(attachment->dev, table, direction,
> + DMA_ATTR_SKIP_CPU_SYNC);
> +}
> +
> +static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> + enum dma_data_direction direction)
> +{
> + return -EPERM;
> +}
> +
> +static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> + enum dma_data_direction direction)
> +{
> + return -EPERM;
> +}
> +
> +static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
> + struct vm_area_struct *vma)
> +{
> + return -EPERM;
Just an aside: you should simply not implement mmap if it's not possible.
Then you also don't need the CPU access functions. Both userspace mmap and
in-kernel vmap support are intentionally optional for dma-buf.
-Sima
> +}
> +
> +static void rstmem_dma_buf_free(struct dma_buf *dmabuf)
> +{
> + struct tee_shm *shm = dmabuf->priv;
> +
> + tee_shm_put(shm);
> +}
> +
> +static const struct dma_buf_ops rstmem_generic_buf_ops = {
> + .attach = rstmem_dma_attach,
> + .detach = rstmem_dma_detach,
> + .map_dma_buf = rstmem_dma_map_dma_buf,
> + .unmap_dma_buf = rstmem_dma_unmap_dma_buf,
> + .begin_cpu_access = rstmem_dma_buf_begin_cpu_access,
> + .end_cpu_access = rstmem_dma_buf_end_cpu_access,
> + .mmap = rstmem_dma_buf_mmap,
> + .release = rstmem_dma_buf_free,
> +};
> +
> +struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
> + u32 use_case, size_t size, int *shm_id)
> +{
> + struct tee_device *teedev = ctx->teedev;
> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> + struct dma_buf *dmabuf;
> + struct tee_shm *shm;
> + void *ret;
> + int rc;
> +
> + if (!tee_device_get(teedev))
> + return ERR_PTR(-EINVAL);
> +
> + if (!teedev->desc->ops->rstmem_alloc ||
> + !teedev->desc->ops->rstmem_free) {
> + dmabuf = ERR_PTR(-EINVAL);
> + goto err;
> + }
> +
> + shm = kzalloc(sizeof(*shm), GFP_KERNEL);
> + if (!shm) {
> + dmabuf = ERR_PTR(-ENOMEM);
> + goto err;
> + }
> +
> + refcount_set(&shm->refcount, 1);
> + shm->flags = TEE_SHM_RESTRICTED;
> + shm->use_case = use_case;
> + shm->ctx = ctx;
> +
> + mutex_lock(&teedev->mutex);
> + shm->id = idr_alloc(&teedev->idr, NULL, 1, 0, GFP_KERNEL);
> + mutex_unlock(&teedev->mutex);
> + if (shm->id < 0) {
> + dmabuf = ERR_PTR(shm->id);
> + goto err_kfree;
> + }
> +
> + rc = teedev->desc->ops->rstmem_alloc(ctx, shm, flags, use_case, size);
> + if (rc) {
> + dmabuf = ERR_PTR(rc);
> + goto err_idr_remove;
> + }
> +
> + mutex_lock(&teedev->mutex);
> + ret = idr_replace(&teedev->idr, shm, shm->id);
> + mutex_unlock(&teedev->mutex);
> + if (IS_ERR(ret)) {
> + dmabuf = ret;
> + goto err_rstmem_free;
> + }
> + teedev_ctx_get(ctx);
> +
> + exp_info.ops = &rstmem_generic_buf_ops;
> + exp_info.size = shm->size;
> + exp_info.priv = shm;
> + dmabuf = dma_buf_export(&exp_info);
> + if (IS_ERR(dmabuf)) {
> + tee_shm_put(shm);
> + return dmabuf;
> + }
> +
> + *shm_id = shm->id;
> + return dmabuf;
> +
> +err_rstmem_free:
> + teedev->desc->ops->rstmem_free(ctx, shm);
> +err_idr_remove:
> + mutex_lock(&teedev->mutex);
> + idr_remove(&teedev->idr, shm->id);
> + mutex_unlock(&teedev->mutex);
> +err_kfree:
> + kfree(shm);
> +err:
> + tee_device_put(teedev);
> + return dmabuf;
> +}
> diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> index daf6e5cfd59a..416f7f25d885 100644
> --- a/drivers/tee/tee_shm.c
> +++ b/drivers/tee/tee_shm.c
> @@ -55,6 +55,8 @@ static void tee_shm_release(struct tee_device *teedev, struct tee_shm *shm)
> "unregister shm %p failed: %d", shm, rc);
>
> release_registered_pages(shm);
> + } else if (shm->flags & TEE_SHM_RESTRICTED) {
> + teedev->desc->ops->rstmem_free(shm->ctx, shm);
> }
>
> teedev_ctx_put(shm->ctx);
> diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c
> index 80004b55628d..ee57ef157a77 100644
> --- a/drivers/tee/tee_shm_pool.c
> +++ b/drivers/tee/tee_shm_pool.c
> @@ -1,9 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /*
> - * Copyright (c) 2015, 2017, 2022 Linaro Limited
> + * Copyright (c) 2015, 2017, 2022, 2024 Linaro Limited
> */
> #include <linux/device.h>
> -#include <linux/dma-buf.h>
> #include <linux/genalloc.h>
> #include <linux/slab.h>
> #include <linux/tee_core.h>
> @@ -90,3 +89,69 @@ struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr,
> return ERR_PTR(rc);
> }
> EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem);
> +
> +static int rstmem_pool_op_gen_alloc(struct tee_shm_pool *pool,
> + struct tee_shm *shm, size_t size,
> + size_t align)
> +{
> + size_t sz = ALIGN(size, PAGE_SIZE);
> + phys_addr_t pa;
> +
> + pa = gen_pool_alloc(pool->private_data, sz);
> + if (!pa)
> + return -ENOMEM;
> +
> + shm->size = sz;
> + shm->paddr = pa;
> +
> + return 0;
> +}
> +
> +static void rstmem_pool_op_gen_free(struct tee_shm_pool *pool,
> + struct tee_shm *shm)
> +{
> + gen_pool_free(pool->private_data, shm->paddr, shm->size);
> + shm->paddr = 0;
> +}
> +
> +static struct tee_shm_pool_ops rstmem_pool_ops_generic = {
> + .alloc = rstmem_pool_op_gen_alloc,
> + .free = rstmem_pool_op_gen_free,
> + .destroy_pool = pool_op_gen_destroy_pool,
> +};
> +
> +struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size)
> +{
> + const size_t page_mask = PAGE_SIZE - 1;
> + struct tee_shm_pool *pool;
> + int rc;
> +
> + /* Check it's page aligned */
> + if ((paddr | size) & page_mask)
> + return ERR_PTR(-EINVAL);
> +
> + pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> + if (!pool)
> + return ERR_PTR(-ENOMEM);
> +
> + pool->private_data = gen_pool_create(PAGE_SHIFT, -1);
> + if (!pool->private_data) {
> + rc = -ENOMEM;
> + goto err_free;
> + }
> +
> + rc = gen_pool_add(pool->private_data, paddr, size, -1);
> + if (rc)
> + goto err_free_pool;
> +
> + pool->ops = &rstmem_pool_ops_generic;
> + return pool;
> +
> +err_free_pool:
> + gen_pool_destroy(pool->private_data);
> +err_free:
> + kfree(pool);
> +
> + return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_GPL(tee_rstmem_gen_pool_alloc);
> diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
> index a38494d6b5f4..608302f494fe 100644
> --- a/include/linux/tee_core.h
> +++ b/include/linux/tee_core.h
> @@ -26,6 +26,7 @@
> #define TEE_SHM_USER_MAPPED BIT(1) /* Memory mapped in user space */
> #define TEE_SHM_POOL BIT(2) /* Memory allocated from pool */
> #define TEE_SHM_PRIV BIT(3) /* Memory private to TEE driver */
> +#define TEE_SHM_RESTRICTED BIT(4) /* Restricted memory */
>
> #define TEE_DEVICE_FLAG_REGISTERED 0x1
> #define TEE_MAX_DEV_NAME_LEN 32
> @@ -76,6 +77,8 @@ struct tee_device {
> * @supp_send: called for supplicant to send a response
> * @shm_register: register shared memory buffer in TEE
> * @shm_unregister: unregister shared memory buffer in TEE
> + * @rstmem_alloc: allocate restricted memory
> + * @rstmem_free: free restricted memory
> */
> struct tee_driver_ops {
> void (*get_version)(struct tee_device *teedev,
> @@ -99,6 +102,9 @@ struct tee_driver_ops {
> struct page **pages, size_t num_pages,
> unsigned long start);
> int (*shm_unregister)(struct tee_context *ctx, struct tee_shm *shm);
> + int (*rstmem_alloc)(struct tee_context *ctx, struct tee_shm *shm,
> + u32 flags, u32 use_case, size_t size);
> + void (*rstmem_free)(struct tee_context *ctx, struct tee_shm *shm);
> };
>
> /**
> @@ -229,6 +235,15 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool)
> pool->ops->destroy_pool(pool);
> }
>
> +/**
> + * tee_rstmem_gen_pool_alloc() - Create a restricted memory manager
> + * @paddr: Physical address of start of pool
> + * @size: Size in bytes of the pool
> + *
> + * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure.
> + */
> +struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size);
> +
> /**
> * tee_get_drvdata() - Return driver_data pointer
> * @returns the driver_data pointer supplied to tee_register().
> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> index a54c203000ed..cba067715d14 100644
> --- a/include/linux/tee_drv.h
> +++ b/include/linux/tee_drv.h
> @@ -55,6 +55,7 @@ struct tee_context {
> * @pages: locked pages from userspace
> * @num_pages: number of locked pages
> * @refcount: reference counter
> + * @use_case: defined by TEE_IOC_UC_* in tee.h
> * @flags: defined by TEE_SHM_* in tee_core.h
> * @id: unique id of a shared memory object on this device, shared
> * with user space
> @@ -71,6 +72,7 @@ struct tee_shm {
> struct page **pages;
> size_t num_pages;
> refcount_t refcount;
> + u32 use_case;
> u32 flags;
> int id;
> u64 sec_world_id;
> diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
> index d0430bee8292..88834448debb 100644
> --- a/include/uapi/linux/tee.h
> +++ b/include/uapi/linux/tee.h
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2015-2016, Linaro Limited
> + * Copyright (c) 2015-2017, 2020, 2024, Linaro Limited
> * All rights reserved.
> *
> * Redistribution and use in source and binary forms, with or without
> @@ -48,6 +48,7 @@
> #define TEE_GEN_CAP_PRIVILEGED (1 << 1)/* Privileged device (for supplicant) */
> #define TEE_GEN_CAP_REG_MEM (1 << 2)/* Supports registering shared memory */
> #define TEE_GEN_CAP_MEMREF_NULL (1 << 3)/* NULL MemRef support */
> +#define TEE_GEN_CAP_RSTMEM (1 << 4)/* Supports restricted memory */
>
> #define TEE_MEMREF_NULL (__u64)(-1) /* NULL MemRef Buffer */
>
> @@ -389,6 +390,47 @@ struct tee_ioctl_shm_register_data {
> */
> #define TEE_IOC_SHM_REGISTER _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 9, \
> struct tee_ioctl_shm_register_data)
> +
> +#define TEE_IOC_UC_RESERVED 0
> +#define TEE_IOC_UC_SECURE_VIDEO_PLAY 1
> +#define TEE_IOC_UC_TRUSTED_UI 2
> +#define TEE_IOC_UC_SECURE_VIDEO_RECORD 3
> +
> +/**
> + * struct tee_ioctl_rstmem_alloc_data - Restricted memory allocate argument
> + * @size: [in/out] Size of restricted memory to allocate
> + * @flags: [in/out] Flags to/from allocate
> + * @use_case: [in] Restricted memory use case, TEE_IOC_UC_*
> + * @id: [out] Identifier of the restricted memory
> + */
> +struct tee_ioctl_rstmem_alloc_data {
> + __u64 size;
> + __u32 flags;
> + __u32 use_case;
> + __s32 id;
> +};
> +
> +/**
> + * TEE_IOC_RSTMEM_ALLOC - allocate restricted memory
> + *
> + * Allocates restricted physical memory that is normally not accessible by
> + * the kernel.
> + *
> + * Restricted memory refers to memory buffers behind a hardware enforced
> + * firewall. It is not accessible to the kernel under normal circumstances
> + * but rather only accessible to certain hardware IPs or CPUs executing in
> + * a higher privileged mode than the kernel itself. This interface allows
> + * allocating and managing such restricted memory buffers via interaction
> + * with a TEE implementation.
> + *
> + * Returns a file descriptor on success or < 0 on failure
> + *
> + * The returned file descriptor is a dma-buf that can be attached and
> + * mapped for devices with permission to access the physical memory.
> + */
> +#define TEE_IOC_RSTMEM_ALLOC _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 10, \
> + struct tee_ioctl_rstmem_alloc_data)
> +
> /*
> * Five syscalls are used when communicating with the TEE driver.
> * open(): opens the device associated with the driver
> --
> 2.43.0
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2024-12-24 6:35 ` Sumit Garg
[not found] ` <Z2p-v-xjhzhPso6u@wunner.de>
@ 2025-01-08 16:57 ` Simona Vetter
2025-01-09 6:08 ` Sumit Garg
1 sibling, 1 reply; 40+ messages in thread
From: Simona Vetter @ 2025-01-08 16:57 UTC (permalink / raw)
To: Sumit Garg
Cc: simona.vetter, Jens Wiklander, linux-kernel, linux-media,
dri-devel, linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> Hi Simona,
>
> On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter@ffwll.ch> wrote:
> >
> > On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
> > > Hi,
> > >
> > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > >
> > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > restrictions for the memory used for the DMA-bufs.
> > >
> > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > how to allocate the restricted physical memory.
> > >
> > > TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also
> > > a use-case parameter. This is used by the backend TEE driver to decide on
> > > allocation policy and which devices should be able to access the memory.
> > >
> > > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
> > > Recording) has been identified so far to serve as examples of what can be
> > > expected. More use-cases can be added in userspace ABI, but it's up to the
> > > backend TEE drivers to provide the implementation.
> > >
> > > Each use-case has its own restricted memory pool since different use-cases
> > > require isolation from different parts of the system. A restricted memory
> > > pool can be based on a static carveout instantiated while probing the TEE
> > > backend driver, or dynamically allocated from CMA and made restricted as
> > > needed by the TEE.
> > >
> > > This can be tested on QEMU with the following steps:
> > > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
> > > -b prototype/sdp-v4
> > > repo sync -j8
> > > cd build
> > > make toolchains -j$(nproc)
> > > make SPMC_AT_EL=1 all -j$(nproc)
> > > make SPMC_AT_EL=1 run-only
> > > # login and at the prompt:
> > > xtest --sdp-basic
> > >
> > > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
> > > S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
> > > without FF-A using the original SMC ABI instead. Please remember to do
> > > %rm -rf ../trusted-firmware-a/build/qemu
> > > for TF-A to be rebuilt properly using the new configuration.
> > >
> > > https://optee.readthedocs.io/en/latest/building/prerequisites.html
> > > lists the dependencies needed to build the above.
> > >
> > > The tests are pretty basic, mostly checking that a Trusted Application in
> > > the secure world can access and manipulate the memory. There are also some
> > > negative tests for out of bounds buffers etc.
> >
> > I think I've dropped this on earlier encrypted dma-buf discussions for
> > TEE, but can't find one right now ...
>
> Thanks for raising this query.
>
> >
> > Do we have some open source userspace for this? To my knowledge we have
> > two implementations of encrypted/content protected dma-buf in upstream
> > right now in the amd and intel gpu drivers, and unless I'm mistaken they
> > both have some minimal userspace supporting EXT_protected_textures:
>
> First of all, to clarify, the support Jens is adding here for restricted
> shared memory allocation in the TEE subsystem is meant to be generic and
> not specific to only the secure media pipeline use-case. Then
> here we not only have open source test applications but rather open
> source firmware too (OP-TEE as a Trusted OS) [1] supporting this as a
> core feature where we maintain a stable and extensible ABI among the
> kernel and the OP-TEE core.
>
> Restricted memory is a feature enforced by hardware specific firewalls
> where a particular TEE implementation governs which particular block
> of memory is accessible to a particular peripheral or a CPU running in
> a higher privileged mode than the Linux kernel. There can be numerous
> use-cases surrounding that as follows:
>
> - Secure media pipeline where the content gets decrypted and stored
> in a restricted buffer which is then accessible only to media display
> pipeline peripherals.
> - Trusted user interface where a peripheral takes input from the user
> and stores it in a restricted buffer which then is accessible to TEE
> implementation only.
> - Another possible use-case can be for the TEE implementation to store
> key material in a restricted buffer which is only accessible to the
> hardware crypto accelerator.
>
> I am sure there will be more use-cases related to this feature but
> those will only be possible once we provide a stable and extensible
> restricted memory interface among the Linux user-space and the secure
> world user-space (normally referred to as Trusted Applications).
>
> [1] https://github.com/OP-TEE/optee_os/pull/7159
>
> >
> > https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT_protected_textures.txt
> >
> > It's not great, but it does just barely clear the bar in my opinion. I
> > guess something in gstreamer or similar video pipeline framework would
> > also do the job.
> >
> > Especially with the context of the uapi discussion in the v1/RFC thread I
> > think we need more than a bare-bones testcase to make sure this works in
> > actual use.
>
> Currently the TEE subsystem already supports a stable ABI for a shared
> memory allocator shared between Linux user-space and secure world
> user-space [2]. And the stable ABI for restricted memory is along the
> same lines, meant to be a vendor-neutral abstraction for user-space
> access. The current test cases not only exercise the interface but also
> serve as regression tests.
>
> I am also in favour of end-to-end open source use-cases. But I fear that
> without progressing in a stepwise manner, as with this proposal, we
> would force developers to upstream all the software pieces in one go,
> which will be kind of a chicken-and-egg situation. I am sure
> once this feature lands Mediatek folks will be interested to port
> their secure video playback patchset [3] on top of it. Similarly other
> silicon vendors like NXP, Qcom etc. will be motivated to do the same.
>
> [2] https://docs.kernel.org/userspace-api/tee.html
> [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
We get entire opengl/vulkan driver stacks ready before we merge new drm
drivers; I really don't think this is too hard from a technical pov. And I
think the mediatek patches had the same issue of lacking userspace for it,
so that's not moving things forward.
-Sima
>
> -Sumit
>
> >
> > Cheers, Sima
> >
> > >
> > > Thanks,
> > > Jens
> > >
> > > Changes since V3:
> > > * Make the use_case and flags field in struct tee_shm u32's instead of
> > > u16's
> > > * Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
> > > * Import namespace DMA_BUF in module tee, reported by lkp@intel.com
> > > * Added a note in the commit message for "optee: account for direction
> > > while converting parameters" why it's needed
> > > * Factor out dynamic restricted memory allocation from
> > > "optee: support restricted memory allocation" into two new commits
> > > "optee: FF-A: dynamic restricted memory allocation" and
> > > "optee: smc abi: dynamic restricted memory allocation"
> > > * Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
> > > restricted memory allocation if CMA isn't configured
> > >
> > > Changes since the V2 RFC:
> > > * Based on v6.12
> > > * Replaced the flags for SVP and Trusted UID memory with a u32 field with
> > > unique id for each use case
> > > * Added dynamic allocation of restricted memory pools
> > > * Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
> > > * Added support for FF-A with FFA_LEND
> > >
> > > Changes since the V1 RFC:
> > > * Based on v6.11
> > > * Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
> > >
> > > Changes since Olivier's post [2]:
> > > * Based on Yong Wu's post [1] where much of dma-buf handling is done in
> > > the generic restricted heap
> > > * Simplifications and cleanup
> > > * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
> > > support"
> > > * Replaced the word "secure" with "restricted" where applicable
> > >
> > > Jens Wiklander (6):
> > > tee: add restricted memory allocation
> > > optee: account for direction while converting parameters
> > > optee: sync secure world ABI headers
> > > optee: support restricted memory allocation
> > > optee: FF-A: dynamic restricted memory allocation
> > > optee: smc abi: dynamic restricted memory allocation
> > >
> > > drivers/tee/Makefile | 1 +
> > > drivers/tee/optee/Makefile | 1 +
> > > drivers/tee/optee/call.c | 10 +-
> > > drivers/tee/optee/core.c | 1 +
> > > drivers/tee/optee/ffa_abi.c | 178 +++++++++++++-
> > > drivers/tee/optee/optee_ffa.h | 27 ++-
> > > drivers/tee/optee/optee_msg.h | 65 ++++-
> > > drivers/tee/optee/optee_private.h | 75 ++++--
> > > drivers/tee/optee/optee_smc.h | 71 +++++-
> > > drivers/tee/optee/rpc.c | 31 ++-
> > > drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++
> > > drivers/tee/optee/smc_abi.c | 213 ++++++++++++++--
> > > drivers/tee/tee_core.c | 38 ++-
> > > drivers/tee/tee_private.h | 2 +
> > > drivers/tee/tee_rstmem.c | 201 ++++++++++++++++
> > > drivers/tee/tee_shm.c | 2 +
> > > drivers/tee/tee_shm_pool.c | 69 +++++-
> > > include/linux/tee_core.h | 15 ++
> > > include/linux/tee_drv.h | 2 +
> > > include/uapi/linux/tee.h | 44 +++-
> > > 20 files changed, 1358 insertions(+), 76 deletions(-)
> > > create mode 100644 drivers/tee/optee/rstmem.c
> > > create mode 100644 drivers/tee/tee_rstmem.c
> > >
> > >
> > > base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> > > --
> > > 2.43.0
> > >
> >
> > --
> > Simona Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
[not found] ` <Z209ZegsmgN1xlNG@wunner.de>
@ 2025-01-08 17:00 ` Simona Vetter
0 siblings, 0 replies; 40+ messages in thread
From: Simona Vetter @ 2025-01-08 17:00 UTC (permalink / raw)
To: Lukas Wunner
Cc: Sumit Garg, simona.vetter, Jens Wiklander, linux-kernel,
linux-media, dri-devel, linaro-mm-sig, op-tee, linux-arm-kernel,
Olivier Masse, Thierry Reding, Yong Wu, Sumit Semwal,
Benjamin Gaignard, Brian Starkey, John Stultz, T . J . Mercier,
Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi
On Thu, Dec 26, 2024 at 12:26:29PM +0100, Lukas Wunner wrote:
> On Thu, Dec 26, 2024 at 11:29:23AM +0530, Sumit Garg wrote:
> > On Tue, 24 Dec 2024 at 14:58, Lukas Wunner <lukas@wunner.de> wrote:
> > > However in the case of restricted memory, the situation is exactly
> > > the opposite: The kernel may *not* be able to access the data,
> > > but the crypto accelerator can access it just fine.
> > >
> > > I did raise a concern about this to the maintainer, but to no avail:
> > > https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
> >
> > Herbert's point is valid that there isn't any point for mapping
> > restricted memory in the kernel virtual address space as any kernel
> > access to that space can lead to platform specific hardware error
> > scenarios. And for that reason we simply disallow dma_buf_mmap() and
> > don't support dma_buf_vmap() for DMA-bufs holding TEE restricted
> > memory.
>
> The API for signature generation/verification (e.g. crypto_sig_sign(),
> crypto_sig_verify()) no longer accepts scatterlists, only buffers in
> virtual address space:
>
> https://lore.kernel.org/all/ZIrnPcPj9Zbq51jK@gondor.apana.org.au/
>
> Hence in order to use buffers in restricted memory for signature
> generation/verification, you'd need to map them into virtual address
> space first.
Nope, you need to get that old api back. Kernel virtual address space
mappings for dma-buf are very intentionally optional.
-Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-01-08 16:57 ` Simona Vetter
@ 2025-01-09 6:08 ` Sumit Garg
0 siblings, 0 replies; 40+ messages in thread
From: Sumit Garg @ 2025-01-09 6:08 UTC (permalink / raw)
To: simona.vetter
Cc: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
On Wed, 8 Jan 2025 at 22:27, Simona Vetter <simona.vetter@ffwll.ch> wrote:
>
> On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> > Hi Simona,
> >
> > On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter@ffwll.ch> wrote:
> > >
> > > On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
> > > > Hi,
> > > >
> > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > > >
> > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > restrictions for the memory used for the DMA-bufs.
> > > >
> > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > how to allocate the restricted physical memory.
> > > >
> > > > TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also
> > > > a use-case parameter. This is used by the backend TEE driver to decide on
> > > > allocation policy and which devices should be able to access the memory.
> > > >
> > > > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
> > > > Recording) have been identified so far to serve as examples of what can be
> > > > expected. More use-cases can be added in userspace ABI, but it's up to the
> > > > backend TEE drivers to provide the implementation.
> > > >
> > > > Each use-case has its own restricted memory pool since different use-cases
> > > > require isolation from different parts of the system. A restricted memory
> > > > pool can be based on a static carveout instantiated while probing the TEE
> > > > backend driver, or dynamically allocated from CMA and made restricted as
> > > > needed by the TEE.
> > > >
> > > > This can be tested on QEMU with the following steps:
> > > > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
> > > > -b prototype/sdp-v4
> > > > repo sync -j8
> > > > cd build
> > > > make toolchains -j$(nproc)
> > > > make SPMC_AT_EL=1 all -j$(nproc)
> > > > make SPMC_AT_EL=1 run-only
> > > > # login and at the prompt:
> > > > xtest --sdp-basic
> > > >
> > > > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
> > > > S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
> > > > without FF-A using the original SMC ABI instead. Please remember to do
> > > > %rm -rf ../trusted-firmware-a/build/qemu
> > > > for TF-A to be rebuilt properly using the new configuration.
> > > >
> > > > https://optee.readthedocs.io/en/latest/building/prerequisites.html
> > > > lists the dependencies needed to build the above.
> > > >
> > > > The tests are pretty basic, mostly checking that a Trusted Application in
> > > > the secure world can access and manipulate the memory. There are also some
> > > > negative tests for out of bounds buffers etc.
> > >
> > > I think I've dropped this on earlier encrypted dma-buf discussions for
> > > TEE, but can't find one right now ...
> >
> > Thanks for raising this query.
> >
> > >
> > > Do we have some open source userspace for this? To my knowledge we have
> > > two implementations of encrypted/content protected dma-buf in upstream
> > > right now in the amd and intel gpu drivers, and unless I'm mistaken they
> > > both have some minimal userspace supporting EXT_protected_textures:
> >
> > First of all, to clarify, the support Jens is adding here for restricted
> > shared memory allocation in the TEE subsystem is meant to be generic
> > and not specific to only the secure media pipeline use-case. Then
> > here we not only have open source test applications but rather open
> > source firmware too (OP-TEE as a Trusted OS) [1] supporting this as a
> > core feature where we maintain a stable and extensible ABI among the
> > kernel and the OP-TEE core.
> >
> > Restricted memory is a feature enforced by hardware specific firewalls
> > where a particular TEE implementation governs which particular block
> > of memory is accessible to a particular peripheral or a CPU running in
> > a higher privileged mode than the Linux kernel. There can be numerous
> > use-cases surrounding that as follows:
> >
> > - Secure media pipeline where the content gets decrypted and stored
> > in a restricted buffer which is then accessible only to media display
> > pipeline peripherals.
> > - Trusted user interface where a peripheral takes input from the user
> > and stores it in a restricted buffer which then is accessible to TEE
> > implementation only.
> > - Another possible use-case can be for the TEE implementation to store
> > key material in a restricted buffer which is only accessible to the
> > hardware crypto accelerator.
> >
> > I am sure there will be more use-cases related to this feature but
> > those will only be possible once we provide a stable and extensible
> > restricted memory interface among the Linux user-space and the secure
> > world user-space (normally referred to as Trusted Applications).
> >
> > [1] https://github.com/OP-TEE/optee_os/pull/7159
> >
> > >
> > > https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT_protected_textures.txt
> > >
> > > It's not great, but it does just barely clear the bar in my opinion. I
> > > guess something in gstreamer or similar video pipeline framework would
> > > also do the job.
> > >
> > > Especially with the context of the uapi discussion in the v1/RFC thread I
> > > think we need more than a bare-bones testcase to make sure this works in
> > > actual use.
> >
> > Currently the TEE subsystem already supports a stable ABI for a shared
> > memory allocator shared between Linux user-space and secure world
> > user-space [2]. And the stable ABI for restricted memory is along the
> > same lines, meant to be a vendor-neutral abstraction for user-space
> > access. The current test cases not only exercise the interface but also
> > serve as regression tests.
> >
> > I am also in favour of end-to-end open source use-cases. But I fear that
> > without progressing in a stepwise manner, as with this proposal, we
> > would force developers to upstream all the software pieces in one go,
> > which will be kind of a chicken-and-egg situation. I am sure
> > once this feature lands Mediatek folks will be interested to port
> > their secure video playback patchset [3] on top of it. Similarly other
> > silicon vendors like NXP, Qcom etc. will be motivated to do the same.
> >
> > [2] https://docs.kernel.org/userspace-api/tee.html
> > [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
>
> We get entire opengl/vulkan driver stacks ready before we merge new drm
> drivers; I really don't think this is too hard from a technical pov. And I
> think the mediatek patches had the same issue of lacking userspace for it,
> so that's not moving things forward.
> -Sima
>
Okay fair enough, I think I get your point. Currently we are missing
at least one peripheral acting as a consumer of these restricted
DMA-bufs. So I discussed with Jens offline that we can try a crypto
peripheral use-case first, which can simply be demonstrated using the
current OP-TEE client user-space.
Also, in the crypto peripheral use-case we can target symmetric crypto
first, which already has a concept of a hardware-backed symmetric
key [1]. IOW, we should be able to come up with a generic symmetric
crypto algorithm which can be supported by different crypto
accelerators using a TEE-backed restricted-key DMA buffer.
[1] https://www.youtube.com/watch?v=GbcpwUBFGDw
-Sumit
* Re: [PATCH v4 1/6] tee: add restricted memory allocation
2025-01-08 16:54 ` Simona Vetter
@ 2025-01-09 7:17 ` Jens Wiklander
0 siblings, 0 replies; 40+ messages in thread
From: Jens Wiklander @ 2025-01-09 7:17 UTC (permalink / raw)
To: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Sumit Garg, Matthias Brugger, AngeloGioacchino Del Regno,
azarrabi
On Wed, Jan 8, 2025 at 5:54 PM Simona Vetter <simona.vetter@ffwll.ch> wrote:
>
> On Tue, Dec 17, 2024 at 11:07:37AM +0100, Jens Wiklander wrote:
> > Add restricted memory allocation to the TEE subsystem.
> >
> > Restricted memory refers to memory buffers behind a hardware enforced
> > firewall. It is not accessible to the kernel under normal circumstances
> > but rather only accessible to certain hardware IPs or CPUs executing in
> > a higher privileged mode than the kernel itself. This interface allows
> > allocating and managing such restricted memory buffers via interaction
> > with a TEE implementation.
> >
> > A new ioctl TEE_IOC_RSTMEM_ALLOC is added to allocate these restricted
> > memory buffers.
> >
> > The restricted memory is allocated for a specific use-case, like Secure
> > Video Playback, Trusted UI, or Secure Video Recording where certain
> > hardware devices can access the memory.
> >
> > More use-cases can be added in userspace ABI, but it's up to the backend
> > drivers to provide the implementation.
> >
> > Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
> > ---
> > drivers/tee/Makefile | 1 +
> > drivers/tee/tee_core.c | 38 ++++++-
> > drivers/tee/tee_private.h | 2 +
> > drivers/tee/tee_rstmem.c | 201 +++++++++++++++++++++++++++++++++++++
> > drivers/tee/tee_shm.c | 2 +
> > drivers/tee/tee_shm_pool.c | 69 ++++++++++++-
> > include/linux/tee_core.h | 15 +++
> > include/linux/tee_drv.h | 2 +
> > include/uapi/linux/tee.h | 44 +++++++-
> > 9 files changed, 370 insertions(+), 4 deletions(-)
> > create mode 100644 drivers/tee/tee_rstmem.c
> >
> > diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile
> > index 5488cba30bd2..a4c6b55444b9 100644
> > --- a/drivers/tee/Makefile
> > +++ b/drivers/tee/Makefile
> > @@ -3,6 +3,7 @@ obj-$(CONFIG_TEE) += tee.o
> > tee-objs += tee_core.o
> > tee-objs += tee_shm.o
> > tee-objs += tee_shm_pool.o
> > +tee-objs += tee_rstmem.o
> > obj-$(CONFIG_OPTEE) += optee/
> > obj-$(CONFIG_AMDTEE) += amdtee/
> > obj-$(CONFIG_ARM_TSTEE) += tstee/
> > diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> > index d113679b1e2d..f4a45b77753b 100644
> > --- a/drivers/tee/tee_core.c
> > +++ b/drivers/tee/tee_core.c
> > @@ -1,12 +1,13 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > /*
> > - * Copyright (c) 2015-2016, Linaro Limited
> > + * Copyright (c) 2015-2022, 2024, Linaro Limited
> > */
> >
> > #define pr_fmt(fmt) "%s: " fmt, __func__
> >
> > #include <linux/cdev.h>
> > #include <linux/cred.h>
> > +#include <linux/dma-buf.h>
> > #include <linux/fs.h>
> > #include <linux/idr.h>
> > #include <linux/module.h>
> > @@ -815,6 +816,38 @@ static int tee_ioctl_supp_send(struct tee_context *ctx,
> > return rc;
> > }
> >
> > +static int
> > +tee_ioctl_rstmem_alloc(struct tee_context *ctx,
> > + struct tee_ioctl_rstmem_alloc_data __user *udata)
> > +{
> > + struct tee_ioctl_rstmem_alloc_data data;
> > + struct dma_buf *dmabuf;
> > + int id;
> > + int fd;
> > +
> > + if (copy_from_user(&data, udata, sizeof(data)))
> > + return -EFAULT;
> > +
> > + if (data.use_case == TEE_IOC_UC_RESERVED)
> > + return -EINVAL;
> > +
> > + dmabuf = tee_rstmem_alloc(ctx, data.flags, data.use_case, data.size,
> > + &id);
> > + if (IS_ERR(dmabuf))
> > + return PTR_ERR(dmabuf);
> > + if (put_user(id, &udata->id)) {
> > + fd = -EFAULT;
> > + goto err;
> > + }
> > + fd = dma_buf_fd(dmabuf, O_CLOEXEC);
> > + if (fd < 0)
> > + goto err;
> > + return fd;
> > +err:
> > + dma_buf_put(dmabuf);
> > + return fd;
> > +}
> > +
> > static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > {
> > struct tee_context *ctx = filp->private_data;
> > @@ -839,6 +872,8 @@ static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > return tee_ioctl_supp_recv(ctx, uarg);
> > case TEE_IOC_SUPPL_SEND:
> > return tee_ioctl_supp_send(ctx, uarg);
> > + case TEE_IOC_RSTMEM_ALLOC:
> > + return tee_ioctl_rstmem_alloc(ctx, uarg);
> > default:
> > return -EINVAL;
> > }
> > @@ -1286,3 +1321,4 @@ MODULE_AUTHOR("Linaro");
> > MODULE_DESCRIPTION("TEE Driver");
> > MODULE_VERSION("1.0");
> > MODULE_LICENSE("GPL v2");
> > +MODULE_IMPORT_NS("DMA_BUF");
> > diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
> > index 9bc50605227c..bf97796909c0 100644
> > --- a/drivers/tee/tee_private.h
> > +++ b/drivers/tee/tee_private.h
> > @@ -23,5 +23,7 @@ void teedev_ctx_put(struct tee_context *ctx);
> > struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
> > struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
> > unsigned long addr, size_t length);
> > +struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
> > + u32 use_case, size_t size, int *shm_id);
> >
> > #endif /*TEE_PRIVATE_H*/
> > diff --git a/drivers/tee/tee_rstmem.c b/drivers/tee/tee_rstmem.c
> > new file mode 100644
> > index 000000000000..536bca2901e2
> > --- /dev/null
> > +++ b/drivers/tee/tee_rstmem.c
> > @@ -0,0 +1,201 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (c) 2024 Linaro Limited
> > + */
> > +#include <linux/device.h>
> > +#include <linux/dma-buf.h>
> > +#include <linux/genalloc.h>
> > +#include <linux/scatterlist.h>
> > +#include <linux/slab.h>
> > +#include <linux/tee_core.h>
> > +#include "tee_private.h"
> > +
> > +struct tee_rstmem_attachment {
> > + struct sg_table table;
> > + struct device *dev;
> > +};
> > +
> > +static int rstmem_dma_attach(struct dma_buf *dmabuf,
> > + struct dma_buf_attachment *attachment)
> > +{
> > + struct tee_shm *shm = dmabuf->priv;
> > + struct tee_rstmem_attachment *a;
> > + int rc;
> > +
> > + a = kzalloc(sizeof(*a), GFP_KERNEL);
> > + if (!a)
> > + return -ENOMEM;
> > +
> > + if (shm->pages) {
> > + rc = sg_alloc_table_from_pages(&a->table, shm->pages,
> > + shm->num_pages, 0,
> > + shm->num_pages * PAGE_SIZE,
> > + GFP_KERNEL);
> > + if (rc)
> > + goto err;
> > + } else {
> > + rc = sg_alloc_table(&a->table, 1, GFP_KERNEL);
> > + if (rc)
> > + goto err;
> > + sg_set_page(a->table.sgl, phys_to_page(shm->paddr), shm->size,
> > + 0);
> > + }
> > +
> > + a->dev = attachment->dev;
> > + attachment->priv = a;
> > +
> > + return 0;
> > +err:
> > + kfree(a);
> > + return rc;
> > +}
> > +
> > +static void rstmem_dma_detach(struct dma_buf *dmabuf,
> > + struct dma_buf_attachment *attachment)
> > +{
> > + struct tee_rstmem_attachment *a = attachment->priv;
> > +
> > + sg_free_table(&a->table);
> > + kfree(a);
> > +}
> > +
> > +static struct sg_table *
> > +rstmem_dma_map_dma_buf(struct dma_buf_attachment *attachment,
> > + enum dma_data_direction direction)
> > +{
> > + struct tee_rstmem_attachment *a = attachment->priv;
> > + int ret;
> > +
> > + ret = dma_map_sgtable(attachment->dev, &a->table, direction,
> > + DMA_ATTR_SKIP_CPU_SYNC);
> > + if (ret)
> > + return ERR_PTR(ret);
> > +
> > + return &a->table;
> > +}
> > +
> > +static void rstmem_dma_unmap_dma_buf(struct dma_buf_attachment *attachment,
> > + struct sg_table *table,
> > + enum dma_data_direction direction)
> > +{
> > + struct tee_rstmem_attachment *a = attachment->priv;
> > +
> > + WARN_ON(&a->table != table);
> > +
> > + dma_unmap_sgtable(attachment->dev, table, direction,
> > + DMA_ATTR_SKIP_CPU_SYNC);
> > +}
> > +
> > +static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> > + enum dma_data_direction direction)
> > +{
> > + return -EPERM;
> > +}
> > +
> > +static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> > + enum dma_data_direction direction)
> > +{
> > + return -EPERM;
> > +}
> > +
> > +static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
> > + struct vm_area_struct *vma)
> > +{
> > + return -EPERM;
>
> Just an aside, you should just not implement mmap if it's not possible.
> Then you also don't need the cpu access functions. Both userspace mmap and
> in-kernel vmap support are intentionally optional for dma-buf.
Got it, thanks. I'll remove those in the next version.
Cheers,
Jens
> -Sima
>
> > +}
> > +
> > +static void rstmem_dma_buf_free(struct dma_buf *dmabuf)
> > +{
> > + struct tee_shm *shm = dmabuf->priv;
> > +
> > + tee_shm_put(shm);
> > +}
> > +
> > +static const struct dma_buf_ops rstmem_generic_buf_ops = {
> > + .attach = rstmem_dma_attach,
> > + .detach = rstmem_dma_detach,
> > + .map_dma_buf = rstmem_dma_map_dma_buf,
> > + .unmap_dma_buf = rstmem_dma_unmap_dma_buf,
> > + .begin_cpu_access = rstmem_dma_buf_begin_cpu_access,
> > + .end_cpu_access = rstmem_dma_buf_end_cpu_access,
> > + .mmap = rstmem_dma_buf_mmap,
> > + .release = rstmem_dma_buf_free,
> > +};
> > +
> > +struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
> > + u32 use_case, size_t size, int *shm_id)
> > +{
> > + struct tee_device *teedev = ctx->teedev;
> > + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> > + struct dma_buf *dmabuf;
> > + struct tee_shm *shm;
> > + void *ret;
> > + int rc;
> > +
> > + if (!tee_device_get(teedev))
> > + return ERR_PTR(-EINVAL);
> > +
> > + if (!teedev->desc->ops->rstmem_alloc ||
> > + !teedev->desc->ops->rstmem_free) {
> > + dmabuf = ERR_PTR(-EINVAL);
> > + goto err;
> > + }
> > +
> > + shm = kzalloc(sizeof(*shm), GFP_KERNEL);
> > + if (!shm) {
> > + dmabuf = ERR_PTR(-ENOMEM);
> > + goto err;
> > + }
> > +
> > + refcount_set(&shm->refcount, 1);
> > + shm->flags = TEE_SHM_RESTRICTED;
> > + shm->use_case = use_case;
> > + shm->ctx = ctx;
> > +
> > + mutex_lock(&teedev->mutex);
> > + shm->id = idr_alloc(&teedev->idr, NULL, 1, 0, GFP_KERNEL);
> > + mutex_unlock(&teedev->mutex);
> > + if (shm->id < 0) {
> > + dmabuf = ERR_PTR(shm->id);
> > + goto err_kfree;
> > + }
> > +
> > + rc = teedev->desc->ops->rstmem_alloc(ctx, shm, flags, use_case, size);
> > + if (rc) {
> > + dmabuf = ERR_PTR(rc);
> > + goto err_idr_remove;
> > + }
> > +
> > + mutex_lock(&teedev->mutex);
> > + ret = idr_replace(&teedev->idr, shm, shm->id);
> > + mutex_unlock(&teedev->mutex);
> > + if (IS_ERR(ret)) {
> > + dmabuf = ret;
> > + goto err_rstmem_free;
> > + }
> > + teedev_ctx_get(ctx);
> > +
> > + exp_info.ops = &rstmem_generic_buf_ops;
> > + exp_info.size = shm->size;
> > + exp_info.priv = shm;
> > + dmabuf = dma_buf_export(&exp_info);
> > + if (IS_ERR(dmabuf)) {
> > + tee_shm_put(shm);
> > + return dmabuf;
> > + }
> > +
> > + *shm_id = shm->id;
> > + return dmabuf;
> > +
> > +err_rstmem_free:
> > + teedev->desc->ops->rstmem_free(ctx, shm);
> > +err_idr_remove:
> > + mutex_lock(&teedev->mutex);
> > + idr_remove(&teedev->idr, shm->id);
> > + mutex_unlock(&teedev->mutex);
> > +err_kfree:
> > + kfree(shm);
> > +err:
> > + tee_device_put(teedev);
> > + return dmabuf;
> > +}
> > diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> > index daf6e5cfd59a..416f7f25d885 100644
> > --- a/drivers/tee/tee_shm.c
> > +++ b/drivers/tee/tee_shm.c
> > @@ -55,6 +55,8 @@ static void tee_shm_release(struct tee_device *teedev, struct tee_shm *shm)
> > "unregister shm %p failed: %d", shm, rc);
> >
> > release_registered_pages(shm);
> > + } else if (shm->flags & TEE_SHM_RESTRICTED) {
> > + teedev->desc->ops->rstmem_free(shm->ctx, shm);
> > }
> >
> > teedev_ctx_put(shm->ctx);
> > diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c
> > index 80004b55628d..ee57ef157a77 100644
> > --- a/drivers/tee/tee_shm_pool.c
> > +++ b/drivers/tee/tee_shm_pool.c
> > @@ -1,9 +1,8 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > /*
> > - * Copyright (c) 2015, 2017, 2022 Linaro Limited
> > + * Copyright (c) 2015, 2017, 2022, 2024 Linaro Limited
> > */
> > #include <linux/device.h>
> > -#include <linux/dma-buf.h>
> > #include <linux/genalloc.h>
> > #include <linux/slab.h>
> > #include <linux/tee_core.h>
> > @@ -90,3 +89,69 @@ struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr,
> > return ERR_PTR(rc);
> > }
> > EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem);
> > +
> > +static int rstmem_pool_op_gen_alloc(struct tee_shm_pool *pool,
> > + struct tee_shm *shm, size_t size,
> > + size_t align)
> > +{
> > + size_t sz = ALIGN(size, PAGE_SIZE);
> > + phys_addr_t pa;
> > +
> > + pa = gen_pool_alloc(pool->private_data, sz);
> > + if (!pa)
> > + return -ENOMEM;
> > +
> > + shm->size = sz;
> > + shm->paddr = pa;
> > +
> > + return 0;
> > +}
> > +
> > +static void rstmem_pool_op_gen_free(struct tee_shm_pool *pool,
> > + struct tee_shm *shm)
> > +{
> > + gen_pool_free(pool->private_data, shm->paddr, shm->size);
> > + shm->paddr = 0;
> > +}
> > +
> > +static struct tee_shm_pool_ops rstmem_pool_ops_generic = {
> > + .alloc = rstmem_pool_op_gen_alloc,
> > + .free = rstmem_pool_op_gen_free,
> > + .destroy_pool = pool_op_gen_destroy_pool,
> > +};
> > +
> > +struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size)
> > +{
> > + const size_t page_mask = PAGE_SIZE - 1;
> > + struct tee_shm_pool *pool;
> > + int rc;
> > +
> > + /* Check it's page aligned */
> > + if ((paddr | size) & page_mask)
> > + return ERR_PTR(-EINVAL);
> > +
> > + pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> > + if (!pool)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + pool->private_data = gen_pool_create(PAGE_SHIFT, -1);
> > + if (!pool->private_data) {
> > + rc = -ENOMEM;
> > + goto err_free;
> > + }
> > +
> > + rc = gen_pool_add(pool->private_data, paddr, size, -1);
> > + if (rc)
> > + goto err_free_pool;
> > +
> > + pool->ops = &rstmem_pool_ops_generic;
> > + return pool;
> > +
> > +err_free_pool:
> > + gen_pool_destroy(pool->private_data);
> > +err_free:
> > + kfree(pool);
> > +
> > + return ERR_PTR(rc);
> > +}
> > +EXPORT_SYMBOL_GPL(tee_rstmem_gen_pool_alloc);
> > diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
> > index a38494d6b5f4..608302f494fe 100644
> > --- a/include/linux/tee_core.h
> > +++ b/include/linux/tee_core.h
> > @@ -26,6 +26,7 @@
> > #define TEE_SHM_USER_MAPPED BIT(1) /* Memory mapped in user space */
> > #define TEE_SHM_POOL BIT(2) /* Memory allocated from pool */
> > #define TEE_SHM_PRIV BIT(3) /* Memory private to TEE driver */
> > +#define TEE_SHM_RESTRICTED BIT(4) /* Restricted memory */
> >
> > #define TEE_DEVICE_FLAG_REGISTERED 0x1
> > #define TEE_MAX_DEV_NAME_LEN 32
> > @@ -76,6 +77,8 @@ struct tee_device {
> > * @supp_send: called for supplicant to send a response
> > * @shm_register: register shared memory buffer in TEE
> > * @shm_unregister: unregister shared memory buffer in TEE
> > + * @rstmem_alloc: allocate restricted memory
> > + * @rstmem_free: free restricted memory
> > */
> > struct tee_driver_ops {
> > void (*get_version)(struct tee_device *teedev,
> > @@ -99,6 +102,9 @@ struct tee_driver_ops {
> > struct page **pages, size_t num_pages,
> > unsigned long start);
> > int (*shm_unregister)(struct tee_context *ctx, struct tee_shm *shm);
> > + int (*rstmem_alloc)(struct tee_context *ctx, struct tee_shm *shm,
> > + u32 flags, u32 use_case, size_t size);
> > + void (*rstmem_free)(struct tee_context *ctx, struct tee_shm *shm);
> > };
> >
> > /**
> > @@ -229,6 +235,15 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool)
> > pool->ops->destroy_pool(pool);
> > }
> >
> > +/**
> > + * tee_rstmem_gen_pool_alloc() - Create a restricted memory manager
> > + * @paddr: Physical address of start of pool
> > + * @size: Size in bytes of the pool
> > + *
> > + * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure.
> > + */
> > +struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size);
> > +
> > /**
> > * tee_get_drvdata() - Return driver_data pointer
> > * @returns the driver_data pointer supplied to tee_register().
> > diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> > index a54c203000ed..cba067715d14 100644
> > --- a/include/linux/tee_drv.h
> > +++ b/include/linux/tee_drv.h
> > @@ -55,6 +55,7 @@ struct tee_context {
> > * @pages: locked pages from userspace
> > * @num_pages: number of locked pages
> > * @refcount: reference counter
> > + * @use_case: defined by TEE_IOC_UC_* in tee.h
> > * @flags: defined by TEE_SHM_* in tee_core.h
> > * @id: unique id of a shared memory object on this device, shared
> > * with user space
> > @@ -71,6 +72,7 @@ struct tee_shm {
> > struct page **pages;
> > size_t num_pages;
> > refcount_t refcount;
> > + u32 use_case;
> > u32 flags;
> > int id;
> > u64 sec_world_id;
> > diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
> > index d0430bee8292..88834448debb 100644
> > --- a/include/uapi/linux/tee.h
> > +++ b/include/uapi/linux/tee.h
> > @@ -1,5 +1,5 @@
> > /*
> > - * Copyright (c) 2015-2016, Linaro Limited
> > + * Copyright (c) 2015-2017, 2020, 2024, Linaro Limited
> > * All rights reserved.
> > *
> > * Redistribution and use in source and binary forms, with or without
> > @@ -48,6 +48,7 @@
> > #define TEE_GEN_CAP_PRIVILEGED (1 << 1)/* Privileged device (for supplicant) */
> > #define TEE_GEN_CAP_REG_MEM (1 << 2)/* Supports registering shared memory */
> > #define TEE_GEN_CAP_MEMREF_NULL (1 << 3)/* NULL MemRef support */
> > +#define TEE_GEN_CAP_RSTMEM (1 << 4)/* Supports restricted memory */
> >
> > #define TEE_MEMREF_NULL (__u64)(-1) /* NULL MemRef Buffer */
> >
> > @@ -389,6 +390,47 @@ struct tee_ioctl_shm_register_data {
> > */
> > #define TEE_IOC_SHM_REGISTER _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 9, \
> > struct tee_ioctl_shm_register_data)
> > +
> > +#define TEE_IOC_UC_RESERVED 0
> > +#define TEE_IOC_UC_SECURE_VIDEO_PLAY 1
> > +#define TEE_IOC_UC_TRUSTED_UI 2
> > +#define TEE_IOC_UC_SECURE_VIDEO_RECORD 3
> > +
> > +/**
> > + * struct tee_ioctl_rstmem_alloc_data - Restricted memory allocate argument
> > + * @size: [in/out] Size of restricted memory to allocate
> > + * @flags: [in/out] Flags to/from allocate
> > + * @use_case: [in] Restricted memory use case, TEE_IOC_UC_*
> > + * @id: [out] Identifier of the restricted memory
> > + */
> > +struct tee_ioctl_rstmem_alloc_data {
> > + __u64 size;
> > + __u32 flags;
> > + __u32 use_case;
> > + __s32 id;
> > +};
> > +
> > +/**
> > + * TEE_IOC_RSTMEM_ALLOC - allocate restricted memory
> > + *
> > + * Allocates restricted physical memory that is normally not accessible
> > + * by the kernel.
> > + *
> > + * Restricted memory refers to memory buffers behind a hardware enforced
> > + * firewall. It is not accessible to the kernel during normal circumstances
> > + * but rather only accessible to certain hardware IPs or CPUs executing in
> > + * higher privileged mode than the kernel itself. This interface allows
> > + * allocating and managing such restricted memory buffers via interaction
> > + * with a TEE implementation.
> > + *
> > + * Returns a file descriptor on success or < 0 on failure
> > + *
> > + * The returned file descriptor is a dma-buf that can be attached and
> > + * mapped for devices with permission to access the physical memory.
> > + */
> > +#define TEE_IOC_RSTMEM_ALLOC _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 10, \
> > + struct tee_ioctl_rstmem_alloc_data)
> > +
> > /*
> > * Five syscalls are used when communicating with the TEE driver.
> > * open(): opens the device associated with the driver
> > --
> > 2.43.0
> >
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
` (6 preceding siblings ...)
2024-12-18 11:06 ` [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Simona Vetter
@ 2025-02-12 19:56 ` Boris Brezillon
2025-02-13 6:41 ` Sumit Garg
7 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-02-12 19:56 UTC (permalink / raw)
To: Jens Wiklander
Cc: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Sumit Garg,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100
Jens Wiklander <jens.wiklander@linaro.org> wrote:
> Hi,
>
> This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it
looks like your series (and the OP-TEE implementation that goes with
it) would allow us to have a fully upstream/open solution for the
protected content use case we're trying to support. I need a bit more
time to play with the implementation but this looks very promising
(especially the lend rstmem feature, which might help us allocate our
FW sections that are supposed to execute code accessing protected
content).
>
> The TEE subsystem handles the DMA-buf allocations since it is the TEE
> (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> restrictions for the memory used for the DMA-bufs.
>
> I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any
particular reason you didn't go for a dma-heap to expose restricted
buffer allocation to userspace? I see you already have a cdev you can
take ioctl()s from, but my understanding was that dma-heap was the
standard solution for these device-agnostic/central allocators.
Regards,
Boris
[1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-12 19:56 ` Boris Brezillon
@ 2025-02-13 6:41 ` Sumit Garg
2025-02-13 8:19 ` Jens Wiklander
2025-02-13 8:35 ` Boris Brezillon
0 siblings, 2 replies; 40+ messages in thread
From: Sumit Garg @ 2025-02-13 6:41 UTC (permalink / raw)
To: Boris Brezillon
Cc: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
Hi Boris,
On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> +Florent, who's working on protected-mode support in Panthor.
>
> Hi Jens,
>
> On Tue, 17 Dec 2024 11:07:36 +0100
> Jens Wiklander <jens.wiklander@linaro.org> wrote:
>
> > Hi,
> >
> > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
>
> We're currently working on protected-mode support for Panthor [1] and it
> looks like your series (and the OP-TEE implementation that goes with
> it) would allow us to have a fully upstream/open solution for the
> protected content use case we're trying to support. I need a bit more
> time to play with the implementation but this looks very promising
> (especially the lend rstmem feature, which might help us allocate our
> FW sections that are supposed to execute code accessing protected
> content).
Glad to hear that! If you can demonstrate an open source use case
based on this series, it will help to land it. We would really love to
see support for restricted DMA-buf consumers, be it GPU, crypto
accelerator, media pipeline, etc.
>
> >
> > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > restrictions for the memory used for the DMA-bufs.
> >
> > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > how to allocate the restricted physical memory.
>
> I'll probably have more questions soon, but here's one to start: any
> particular reason you didn't go for a dma-heap to expose restricted
> buffer allocation to userspace? I see you already have a cdev you can
> take ioctl()s from, but my understanding was that dma-heap was the
> standard solution for these device-agnostic/central allocators.
This series started with the DMA heap approach [1], but later
discussions [2] led us here. To point out specifically:
- DMA heaps require reliance on DT to discover static restricted
region carve-outs, whereas via the TEE implementation driver (eg.
OP-TEE) those can be discovered dynamically.
- Dynamic allocation of buffers and making them restricted requires
vendor-specific driver hooks with DMA heaps, whereas the TEE subsystem
abstracts that out, with the underlying TEE implementation (eg. OP-TEE)
managing the dynamic buffer restriction.
- The TEE subsystem already has a well-defined user-space interface for
managing shared memory buffers with the TEE, and restricted DMA buffers
will be yet another interface managed along similar lines.
[1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqiehaxk@u67fcarhm6ge/T/
[2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjFtNZZA@mail.gmail.com/
-Sumit
>
> Regards,
>
> Boris
>
> [1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 6:41 ` Sumit Garg
@ 2025-02-13 8:19 ` Jens Wiklander
2025-02-13 8:35 ` Boris Brezillon
1 sibling, 0 replies; 40+ messages in thread
From: Jens Wiklander @ 2025-02-13 8:19 UTC (permalink / raw)
To: Sumit Garg
Cc: Boris Brezillon, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
Hi,
On Thu, Feb 13, 2025 at 7:42 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> Hi Boris,
>
> On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > +Florent, who's working on protected-mode support in Panthor.
> >
> > Hi Jens,
> >
> > On Tue, 17 Dec 2024 11:07:36 +0100
> > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> >
> > > Hi,
> > >
> > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> >
> > We're currently working on protected-mode support for Panthor [1] and it
> > looks like your series (and the OP-TEE implementation that goes with
> > it) would allow us to have a fully upstream/open solution for the
> > protected content use case we're trying to support. I need a bit more
> > time to play with the implementation but this looks very promising
> > (especially the lend rstmem feature, which might help us allocate our
> > FW sections that are supposed to execute code accessing protected
> > content).
>
> Glad to hear that, if you can demonstrate an open source use case
> based on this series then it will help to land it. We really would
> love to see support for restricted DMA-buf consumers be it GPU, crypto
> accelerator, media pipeline etc.
I'm preparing a demo based on GStreamer to share. Having more
real-world examples helps to verify that the APIs etc. work.
>
> >
> > >
> > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > restrictions for the memory used for the DMA-bufs.
> > >
> > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > how to allocate the restricted physical memory.
> >
> > I'll probably have more questions soon, but here's one to start: any
> > particular reason you didn't go for a dma-heap to expose restricted
> > buffer allocation to userspace? I see you already have a cdev you can
> > take ioctl()s from, but my understanding was that dma-heap was the
> > standard solution for these device-agnostic/central allocators.
>
> This series started with the DMA heap approach only here [1] but later
> discussions [2] lead us here. To point out specifically:
>
> - DMA heaps require reliance on DT to discover static restricted
> regions carve-outs whereas via the TEE implementation driver (eg.
> OP-TEE) those can be discovered dynamically.
> - Dynamic allocation of buffers and making them restricted requires
> vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> abstracts that out with underlying TEE implementation (eg. OP-TEE)
> managing the dynamic buffer restriction.
> - TEE subsystem already has a well defined user-space interface for
> managing shared memory buffers with TEE and restricted DMA buffers
> will be yet another interface managed along similar lines.
>
> [1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqiehaxk@u67fcarhm6ge/T/
> [2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjFtNZZA@mail.gmail.com/
Thanks for the good summary. :-)
Cheers,
Jens
>
> -Sumit
>
> >
> > Regards,
> >
> > Boris
> >
> > [1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 6:41 ` Sumit Garg
2025-02-13 8:19 ` Jens Wiklander
@ 2025-02-13 8:35 ` Boris Brezillon
2025-02-13 9:16 ` Sumit Garg
1 sibling, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-02-13 8:35 UTC (permalink / raw)
To: Sumit Garg
Cc: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
On Thu, 13 Feb 2025 12:11:52 +0530
Sumit Garg <sumit.garg@linaro.org> wrote:
> Hi Boris,
>
> On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > +Florent, who's working on protected-mode support in Panthor.
> >
> > Hi Jens,
> >
> > On Tue, 17 Dec 2024 11:07:36 +0100
> > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> >
> > > Hi,
> > >
> > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> >
> > We're currently working on protected-mode support for Panthor [1] and it
> > looks like your series (and the OP-TEE implementation that goes with
> > it) would allow us to have a fully upstream/open solution for the
> > protected content use case we're trying to support. I need a bit more
> > time to play with the implementation but this looks very promising
> > (especially the lend rstmem feature, which might help us allocate our
> > FW sections that are supposed to execute code accessing protected
> > content).
>
> Glad to hear that, if you can demonstrate an open source use case
> based on this series then it will help to land it. We really would
> love to see support for restricted DMA-buf consumers be it GPU, crypto
> accelerator, media pipeline etc.
>
> >
> > >
> > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > restrictions for the memory used for the DMA-bufs.
> > >
> > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > how to allocate the restricted physical memory.
> >
> > I'll probably have more questions soon, but here's one to start: any
> > particular reason you didn't go for a dma-heap to expose restricted
> > buffer allocation to userspace? I see you already have a cdev you can
> > take ioctl()s from, but my understanding was that dma-heap was the
> > standard solution for these device-agnostic/central allocators.
>
> This series started with the DMA heap approach only here [1] but later
> discussions [2] lead us here. To point out specifically:
>
> - DMA heaps require reliance on DT to discover static restricted
> regions carve-outs whereas via the TEE implementation driver (eg.
> OP-TEE) those can be discovered dynamically.
Hm, the system heap [1] doesn't rely on any DT information AFAICT.
The dynamic allocation scheme, where the TEE implementation allocates a
chunk of protected memory for us, would have a similar behavior, I guess.
> - Dynamic allocation of buffers and making them restricted requires
> vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> abstracts that out with underlying TEE implementation (eg. OP-TEE)
> managing the dynamic buffer restriction.
Yeah, the lend rstmem feature is clearly something TEE-specific, and I
think it's okay to assume the user knows the protection request
should go through the TEE subsystem in that case.
> - TEE subsystem already has a well defined user-space interface for
> managing shared memory buffers with TEE and restricted DMA buffers
> will be yet another interface managed along similar lines.
Okay, so the very reason I'm asking about the dma-buf heap interface is
because there might be cases where the protected/restricted allocation
doesn't go through the TEE (Mediatek has a TEE-free implementation
for instance, but I realize vendor implementations are probably not the
best selling point :-/). If we expose things as a dma-heap, we have
a solution where integrators can pick the dma-heap they think is
relevant for protected buffer allocations without the various drivers
(GPU, video codec, ...) having to implement a dispatch function for all
possible implementations. The same goes for userspace allocations,
where passing a dma-heap name is simpler than supporting different
ioctl()s based on the allocation backend.
[1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system_heap.c#L424
* Re: [PATCH v4 1/6] tee: add restricted memory allocation
2024-12-17 10:07 ` [PATCH v4 1/6] tee: add restricted memory allocation Jens Wiklander
2025-01-08 16:54 ` Simona Vetter
@ 2025-02-13 8:44 ` Boris Brezillon
1 sibling, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-02-13 8:44 UTC (permalink / raw)
To: Jens Wiklander
Cc: linux-kernel, linux-media, dri-devel, linaro-mm-sig, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Sumit Garg,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi
Hi Jens,
On Tue, 17 Dec 2024 11:07:37 +0100
Jens Wiklander <jens.wiklander@linaro.org> wrote:
> Add restricted memory allocation to the TEE subsystem.
>
> Restricted memory refers to memory buffers behind a hardware-enforced
> firewall. It is not accessible to the kernel under normal circumstances,
> but only to certain hardware IPs or CPUs executing in a higher privileged
> mode than the kernel itself. This interface allows such restricted memory
> buffers to be allocated and managed via interaction with a TEE
> implementation.
>
> A new ioctl TEE_IOC_RSTMEM_ALLOC is added to allocate these restricted
> memory buffers.
>
> The restricted memory is allocated for a specific use-case, like Secure
> Video Playback, Trusted UI, or Secure Video Recording where certain
> hardware devices can access the memory.
>
> More use-cases can be added to the userspace ABI, but it's up to the backend
> drivers to provide the implementation.
>
> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
> ---
> drivers/tee/Makefile | 1 +
> drivers/tee/tee_core.c | 38 ++++++-
> drivers/tee/tee_private.h | 2 +
> drivers/tee/tee_rstmem.c | 201 +++++++++++++++++++++++++++++++++++++
> drivers/tee/tee_shm.c | 2 +
> drivers/tee/tee_shm_pool.c | 69 ++++++++++++-
> include/linux/tee_core.h | 15 +++
> include/linux/tee_drv.h | 2 +
> include/uapi/linux/tee.h | 44 +++++++-
> 9 files changed, 370 insertions(+), 4 deletions(-)
> create mode 100644 drivers/tee/tee_rstmem.c
>
> diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile
> index 5488cba30bd2..a4c6b55444b9 100644
> --- a/drivers/tee/Makefile
> +++ b/drivers/tee/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_TEE) += tee.o
> tee-objs += tee_core.o
> tee-objs += tee_shm.o
> tee-objs += tee_shm_pool.o
> +tee-objs += tee_rstmem.o
> obj-$(CONFIG_OPTEE) += optee/
> obj-$(CONFIG_AMDTEE) += amdtee/
> obj-$(CONFIG_ARM_TSTEE) += tstee/
> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> index d113679b1e2d..f4a45b77753b 100644
> --- a/drivers/tee/tee_core.c
> +++ b/drivers/tee/tee_core.c
> @@ -1,12 +1,13 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /*
> - * Copyright (c) 2015-2016, Linaro Limited
> + * Copyright (c) 2015-2022, 2024, Linaro Limited
> */
>
> #define pr_fmt(fmt) "%s: " fmt, __func__
>
> #include <linux/cdev.h>
> #include <linux/cred.h>
> +#include <linux/dma-buf.h>
> #include <linux/fs.h>
> #include <linux/idr.h>
> #include <linux/module.h>
> @@ -815,6 +816,38 @@ static int tee_ioctl_supp_send(struct tee_context *ctx,
> return rc;
> }
>
> +static int
> +tee_ioctl_rstmem_alloc(struct tee_context *ctx,
> + struct tee_ioctl_rstmem_alloc_data __user *udata)
> +{
> + struct tee_ioctl_rstmem_alloc_data data;
> + struct dma_buf *dmabuf;
> + int id;
> + int fd;
> +
> + if (copy_from_user(&data, udata, sizeof(data)))
> + return -EFAULT;
> +
> + if (data.use_case == TEE_IOC_UC_RESERVED)
> + return -EINVAL;
> +
> + dmabuf = tee_rstmem_alloc(ctx, data.flags, data.use_case, data.size,
> + &id);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> + if (put_user(id, &udata->id)) {
> + fd = -EFAULT;
> + goto err;
> + }
> + fd = dma_buf_fd(dmabuf, O_CLOEXEC);
> + if (fd < 0)
> + goto err;
> + return fd;
> +err:
> + dma_buf_put(dmabuf);
> + return fd;
> +}
> +
> static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> {
> struct tee_context *ctx = filp->private_data;
> @@ -839,6 +872,8 @@ static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return tee_ioctl_supp_recv(ctx, uarg);
> case TEE_IOC_SUPPL_SEND:
> return tee_ioctl_supp_send(ctx, uarg);
> + case TEE_IOC_RSTMEM_ALLOC:
> + return tee_ioctl_rstmem_alloc(ctx, uarg);
> default:
> return -EINVAL;
> }
> @@ -1286,3 +1321,4 @@ MODULE_AUTHOR("Linaro");
> MODULE_DESCRIPTION("TEE Driver");
> MODULE_VERSION("1.0");
> MODULE_LICENSE("GPL v2");
> +MODULE_IMPORT_NS("DMA_BUF");
> diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
> index 9bc50605227c..bf97796909c0 100644
> --- a/drivers/tee/tee_private.h
> +++ b/drivers/tee/tee_private.h
> @@ -23,5 +23,7 @@ void teedev_ctx_put(struct tee_context *ctx);
> struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
> struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
> unsigned long addr, size_t length);
> +struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
> + u32 use_case, size_t size, int *shm_id);
>
> #endif /*TEE_PRIVATE_H*/
> diff --git a/drivers/tee/tee_rstmem.c b/drivers/tee/tee_rstmem.c
> new file mode 100644
> index 000000000000..536bca2901e2
> --- /dev/null
> +++ b/drivers/tee/tee_rstmem.c
> @@ -0,0 +1,201 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024 Linaro Limited
> + */
> +#include <linux/device.h>
> +#include <linux/dma-buf.h>
> +#include <linux/genalloc.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/tee_core.h>
> +#include "tee_private.h"
> +
> +struct tee_rstmem_attachment {
> + struct sg_table table;
> + struct device *dev;
> +};
> +
> +static int rstmem_dma_attach(struct dma_buf *dmabuf,
> + struct dma_buf_attachment *attachment)
> +{
> + struct tee_shm *shm = dmabuf->priv;
> + struct tee_rstmem_attachment *a;
> + int rc;
> +
> + a = kzalloc(sizeof(*a), GFP_KERNEL);
> + if (!a)
> + return -ENOMEM;
> +
> + if (shm->pages) {
> + rc = sg_alloc_table_from_pages(&a->table, shm->pages,
> + shm->num_pages, 0,
> + shm->num_pages * PAGE_SIZE,
> + GFP_KERNEL);
> + if (rc)
> + goto err;
> + } else {
> + rc = sg_alloc_table(&a->table, 1, GFP_KERNEL);
> + if (rc)
> + goto err;
> + sg_set_page(a->table.sgl, phys_to_page(shm->paddr), shm->size,
> + 0);
> + }
> +
> + a->dev = attachment->dev;
> + attachment->priv = a;
> +
> + return 0;
> +err:
> + kfree(a);
> + return rc;
> +}
> +
> +static void rstmem_dma_detach(struct dma_buf *dmabuf,
> + struct dma_buf_attachment *attachment)
> +{
> + struct tee_rstmem_attachment *a = attachment->priv;
> +
> + sg_free_table(&a->table);
> + kfree(a);
> +}
> +
> +static struct sg_table *
> +rstmem_dma_map_dma_buf(struct dma_buf_attachment *attachment,
> + enum dma_data_direction direction)
> +{
> + struct tee_rstmem_attachment *a = attachment->priv;
> + int ret;
> +
> + ret = dma_map_sgtable(attachment->dev, &a->table, direction,
> + DMA_ATTR_SKIP_CPU_SYNC);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + return &a->table;
> +}
> +
> +static void rstmem_dma_unmap_dma_buf(struct dma_buf_attachment *attachment,
> + struct sg_table *table,
> + enum dma_data_direction direction)
> +{
> + struct tee_rstmem_attachment *a = attachment->priv;
> +
> + WARN_ON(&a->table != table);
> +
> + dma_unmap_sgtable(attachment->dev, table, direction,
> + DMA_ATTR_SKIP_CPU_SYNC);
> +}
> +
> +static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> + enum dma_data_direction direction)
> +{
> + return -EPERM;
> +}
> +
> +static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> + enum dma_data_direction direction)
> +{
> + return -EPERM;
> +}
> +
> +static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
> + struct vm_area_struct *vma)
> +{
> + return -EPERM;
> +}
> +
> +static void rstmem_dma_buf_free(struct dma_buf *dmabuf)
> +{
> + struct tee_shm *shm = dmabuf->priv;
> +
> + tee_shm_put(shm);
> +}
> +
> +static const struct dma_buf_ops rstmem_generic_buf_ops = {
> + .attach = rstmem_dma_attach,
> + .detach = rstmem_dma_detach,
> + .map_dma_buf = rstmem_dma_map_dma_buf,
> + .unmap_dma_buf = rstmem_dma_unmap_dma_buf,
> + .begin_cpu_access = rstmem_dma_buf_begin_cpu_access,
> + .end_cpu_access = rstmem_dma_buf_end_cpu_access,
> + .mmap = rstmem_dma_buf_mmap,
> + .release = rstmem_dma_buf_free,
> +};
> +
> +struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
> + u32 use_case, size_t size, int *shm_id)
> +{
> + struct tee_device *teedev = ctx->teedev;
> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> + struct dma_buf *dmabuf;
> + struct tee_shm *shm;
> + void *ret;
> + int rc;
> +
> + if (!tee_device_get(teedev))
> + return ERR_PTR(-EINVAL);
> +
> + if (!teedev->desc->ops->rstmem_alloc ||
> + !teedev->desc->ops->rstmem_free) {
> + dmabuf = ERR_PTR(-EINVAL);
> + goto err;
> + }
> +
> + shm = kzalloc(sizeof(*shm), GFP_KERNEL);
> + if (!shm) {
> + dmabuf = ERR_PTR(-ENOMEM);
> + goto err;
> + }
> +
> + refcount_set(&shm->refcount, 1);
> + shm->flags = TEE_SHM_RESTRICTED;
> + shm->use_case = use_case;
> + shm->ctx = ctx;
> +
> + mutex_lock(&teedev->mutex);
> + shm->id = idr_alloc(&teedev->idr, NULL, 1, 0, GFP_KERNEL);
> + mutex_unlock(&teedev->mutex);
> + if (shm->id < 0) {
> + dmabuf = ERR_PTR(shm->id);
> + goto err_kfree;
> + }
> +
> + rc = teedev->desc->ops->rstmem_alloc(ctx, shm, flags, use_case, size);
> + if (rc) {
> + dmabuf = ERR_PTR(rc);
> + goto err_idr_remove;
> + }
> +
> + mutex_lock(&teedev->mutex);
> + ret = idr_replace(&teedev->idr, shm, shm->id);
> + mutex_unlock(&teedev->mutex);
> + if (IS_ERR(ret)) {
> + dmabuf = ret;
> + goto err_rstmem_free;
> + }
> + teedev_ctx_get(ctx);
> +
> + exp_info.ops = &rstmem_generic_buf_ops;
> + exp_info.size = shm->size;
> + exp_info.priv = shm;
> + dmabuf = dma_buf_export(&exp_info);
> + if (IS_ERR(dmabuf)) {
> + tee_shm_put(shm);
> + return dmabuf;
> + }
> +
> + *shm_id = shm->id;
> + return dmabuf;
> +
> +err_rstmem_free:
> + teedev->desc->ops->rstmem_free(ctx, shm);
> +err_idr_remove:
> + mutex_lock(&teedev->mutex);
> + idr_remove(&teedev->idr, shm->id);
> + mutex_unlock(&teedev->mutex);
> +err_kfree:
> + kfree(shm);
> +err:
> + tee_device_put(teedev);
> + return dmabuf;
> +}
> diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> index daf6e5cfd59a..416f7f25d885 100644
> --- a/drivers/tee/tee_shm.c
> +++ b/drivers/tee/tee_shm.c
> @@ -55,6 +55,8 @@ static void tee_shm_release(struct tee_device *teedev, struct tee_shm *shm)
> "unregister shm %p failed: %d", shm, rc);
>
> release_registered_pages(shm);
> + } else if (shm->flags & TEE_SHM_RESTRICTED) {
> + teedev->desc->ops->rstmem_free(shm->ctx, shm);
> }
>
> teedev_ctx_put(shm->ctx);
> diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c
> index 80004b55628d..ee57ef157a77 100644
> --- a/drivers/tee/tee_shm_pool.c
> +++ b/drivers/tee/tee_shm_pool.c
> @@ -1,9 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /*
> - * Copyright (c) 2015, 2017, 2022 Linaro Limited
> + * Copyright (c) 2015, 2017, 2022, 2024 Linaro Limited
> */
> #include <linux/device.h>
> -#include <linux/dma-buf.h>
> #include <linux/genalloc.h>
> #include <linux/slab.h>
> #include <linux/tee_core.h>
> @@ -90,3 +89,69 @@ struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr,
> return ERR_PTR(rc);
> }
> EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem);
> +
> +static int rstmem_pool_op_gen_alloc(struct tee_shm_pool *pool,
> + struct tee_shm *shm, size_t size,
> + size_t align)
> +{
> + size_t sz = ALIGN(size, PAGE_SIZE);
> + phys_addr_t pa;
> +
> + pa = gen_pool_alloc(pool->private_data, sz);
> + if (!pa)
> + return -ENOMEM;
> +
> + shm->size = sz;
> + shm->paddr = pa;
> +
> + return 0;
> +}
> +
> +static void rstmem_pool_op_gen_free(struct tee_shm_pool *pool,
> + struct tee_shm *shm)
> +{
> + gen_pool_free(pool->private_data, shm->paddr, shm->size);
> + shm->paddr = 0;
> +}
> +
> +static struct tee_shm_pool_ops rstmem_pool_ops_generic = {
> + .alloc = rstmem_pool_op_gen_alloc,
> + .free = rstmem_pool_op_gen_free,
> + .destroy_pool = pool_op_gen_destroy_pool,
> +};
> +
> +struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size)
> +{
> + const size_t page_mask = PAGE_SIZE - 1;
> + struct tee_shm_pool *pool;
> + int rc;
> +
> + /* Check it's page aligned */
> + if ((paddr | size) & page_mask)
> + return ERR_PTR(-EINVAL);
> +
> + pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> + if (!pool)
> + return ERR_PTR(-ENOMEM);
> +
> + pool->private_data = gen_pool_create(PAGE_SHIFT, -1);
> + if (!pool->private_data) {
> + rc = -ENOMEM;
> + goto err_free;
> + }
> +
> + rc = gen_pool_add(pool->private_data, paddr, size, -1);
> + if (rc)
> + goto err_free_pool;
> +
> + pool->ops = &rstmem_pool_ops_generic;
> + return pool;
> +
> +err_free_pool:
> + gen_pool_destroy(pool->private_data);
> +err_free:
> + kfree(pool);
> +
> + return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_GPL(tee_rstmem_gen_pool_alloc);
> diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
> index a38494d6b5f4..608302f494fe 100644
> --- a/include/linux/tee_core.h
> +++ b/include/linux/tee_core.h
> @@ -26,6 +26,7 @@
> #define TEE_SHM_USER_MAPPED BIT(1) /* Memory mapped in user space */
> #define TEE_SHM_POOL BIT(2) /* Memory allocated from pool */
> #define TEE_SHM_PRIV BIT(3) /* Memory private to TEE driver */
> +#define TEE_SHM_RESTRICTED BIT(4) /* Restricted memory */
>
> #define TEE_DEVICE_FLAG_REGISTERED 0x1
> #define TEE_MAX_DEV_NAME_LEN 32
> @@ -76,6 +77,8 @@ struct tee_device {
> * @supp_send: called for supplicant to send a response
> * @shm_register: register shared memory buffer in TEE
> * @shm_unregister: unregister shared memory buffer in TEE
> + * @rstmem_alloc: allocate restricted memory
> + * @rstmem_free: free restricted memory
> */
> struct tee_driver_ops {
> void (*get_version)(struct tee_device *teedev,
> @@ -99,6 +102,9 @@ struct tee_driver_ops {
> struct page **pages, size_t num_pages,
> unsigned long start);
> int (*shm_unregister)(struct tee_context *ctx, struct tee_shm *shm);
> + int (*rstmem_alloc)(struct tee_context *ctx, struct tee_shm *shm,
> + u32 flags, u32 use_case, size_t size);
> + void (*rstmem_free)(struct tee_context *ctx, struct tee_shm *shm);
> };
>
> /**
> @@ -229,6 +235,15 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool)
> pool->ops->destroy_pool(pool);
> }
>
> +/**
> + * tee_rstmem_gen_pool_alloc() - Create a restricted memory manager
> + * @paddr: Physical address of start of pool
> + * @size: Size in bytes of the pool
> + *
> + * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure.
> + */
> +struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size);
> +
> /**
> * tee_get_drvdata() - Return driver_data pointer
> * @returns the driver_data pointer supplied to tee_register().
> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> index a54c203000ed..cba067715d14 100644
> --- a/include/linux/tee_drv.h
> +++ b/include/linux/tee_drv.h
> @@ -55,6 +55,7 @@ struct tee_context {
> * @pages: locked pages from userspace
> * @num_pages: number of locked pages
> * @refcount: reference counter
> + * @use_case: defined by TEE_IOC_UC_* in tee.h
> * @flags: defined by TEE_SHM_* in tee_core.h
> * @id: unique id of a shared memory object on this device, shared
> * with user space
> @@ -71,6 +72,7 @@ struct tee_shm {
> struct page **pages;
> size_t num_pages;
> refcount_t refcount;
> + u32 use_case;
> u32 flags;
> int id;
> u64 sec_world_id;
> diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
> index d0430bee8292..88834448debb 100644
> --- a/include/uapi/linux/tee.h
> +++ b/include/uapi/linux/tee.h
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2015-2016, Linaro Limited
> + * Copyright (c) 2015-2017, 2020, 2024, Linaro Limited
> * All rights reserved.
> *
> * Redistribution and use in source and binary forms, with or without
> @@ -48,6 +48,7 @@
> #define TEE_GEN_CAP_PRIVILEGED (1 << 1)/* Privileged device (for supplicant) */
> #define TEE_GEN_CAP_REG_MEM (1 << 2)/* Supports registering shared memory */
> #define TEE_GEN_CAP_MEMREF_NULL (1 << 3)/* NULL MemRef support */
> +#define TEE_GEN_CAP_RSTMEM (1 << 4)/* Supports restricted memory */
>
> #define TEE_MEMREF_NULL (__u64)(-1) /* NULL MemRef Buffer */
>
> @@ -389,6 +390,47 @@ struct tee_ioctl_shm_register_data {
> */
> #define TEE_IOC_SHM_REGISTER _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 9, \
> struct tee_ioctl_shm_register_data)
> +
> +#define TEE_IOC_UC_RESERVED 0
> +#define TEE_IOC_UC_SECURE_VIDEO_PLAY 1
> +#define TEE_IOC_UC_TRUSTED_UI 2
> +#define TEE_IOC_UC_SECURE_VIDEO_RECORD 3
> +
> +/**
> + * struct tee_ioctl_rstmem_alloc_data - Restricted memory allocate argument
> + * @size: [in/out] Size of restricted memory to allocate
> + * @flags: [in/out] Flags to/from allocate
> + * @use_case: [in] Restricted memory use case, TEE_IOC_UC_*
> + * @id: [out] Identifier of the restricted memory
> + */
> +struct tee_ioctl_rstmem_alloc_data {
> + __u64 size;
> + __u32 flags;
> + __u32 use_case;
> + __s32 id;
> +};
> +
> +/**
> + * TEE_IOC_RSTMEM_ALLOC - allocate restricted memory
> + *
> + * Allocates restricted physical memory that is normally not accessible
> + * to the kernel.
> + *
> + * Restricted memory refers to memory buffers behind a hardware-enforced
> + * firewall. It is not accessible to the kernel under normal circumstances,
> + * but only to certain hardware IPs or CPUs executing in a higher
> + * privileged mode than the kernel itself. This interface allows such
> + * restricted memory buffers to be allocated and managed via interaction
> + * with a TEE implementation.
> + *
> + * Returns a file descriptor on success or < 0 on failure.
> + *
> + * The returned file descriptor is a dma-buf that can be attached and
> + * mapped for devices with permission to access the physical memory.
> + */
> +#define TEE_IOC_RSTMEM_ALLOC _IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 10, \
> + struct tee_ioctl_rstmem_alloc_data)
> +
> /*
> * Five syscalls are used when communicating with the TEE driver.
> * open(): opens the device associated with the driver
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 8:35 ` Boris Brezillon
@ 2025-02-13 9:16 ` Sumit Garg
2025-02-13 12:40 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Sumit Garg @ 2025-02-13 9:16 UTC (permalink / raw)
To: Boris Brezillon
Cc: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
On Thu, 13 Feb 2025 at 14:06, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Thu, 13 Feb 2025 12:11:52 +0530
> Sumit Garg <sumit.garg@linaro.org> wrote:
>
> > Hi Boris,
> >
> > On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > +Florent, who's working on protected-mode support in Panthor.
> > >
> > > Hi Jens,
> > >
> > > On Tue, 17 Dec 2024 11:07:36 +0100
> > > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > >
> > > We're currently working on protected-mode support for Panthor [1] and it
> > > looks like your series (and the OP-TEE implementation that goes with
> > > it) would allow us to have a fully upstream/open solution for the
> > > protected content use case we're trying to support. I need a bit more
> > > time to play with the implementation but this looks very promising
> > > (especially the lend rstmem feature, which might help us allocate our
> > > FW sections that are supposed to execute code accessing protected
> > > content).
> >
> > Glad to hear that, if you can demonstrate an open source use case
> > based on this series then it will help to land it. We really would
> > love to see support for restricted DMA-buf consumers be it GPU, crypto
> > accelerator, media pipeline etc.
> >
> > >
> > > >
> > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > restrictions for the memory used for the DMA-bufs.
> > > >
> > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > how to allocate the restricted physical memory.
> > >
> > > I'll probably have more questions soon, but here's one to start: any
> > > particular reason you didn't go for a dma-heap to expose restricted
> > > buffer allocation to userspace? I see you already have a cdev you can
> > > take ioctl()s from, but my understanding was that dma-heap was the
> > > standard solution for these device-agnostic/central allocators.
> >
> > This series started with the DMA heap approach only here [1] but later
> > discussions [2] lead us here. To point out specifically:
> >
> > - DMA heaps require reliance on DT to discover static restricted
> > regions carve-outs whereas via the TEE implementation driver (eg.
> > OP-TEE) those can be discovered dynamically.
>
> Hm, the system heap [1] doesn't rely on any DT information AFAICT.
Yeah but all the prior vendor specific secure/restricted DMA heaps
relied on DT information.
> The dynamic allocation scheme, where the TEE implementation allocates a
> chunk of protected memory for us would have a similar behavior, I guess.
In a dynamic scheme, the allocation will still be from CMA or system
heap depending on TEE implementation capabilities but the restriction
will be enforced via interaction with TEE.
>
> > - Dynamic allocation of buffers and making them restricted requires
> > vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> > abstracts that out with underlying TEE implementation (eg. OP-TEE)
> > managing the dynamic buffer restriction.
>
> Yeah, the lend rstmem feature is clearly something tee specific, and I
> think that's okay to assume the user knows the protection request
> should go through the tee subsystem in that case.
Yeah but how will the user discover that? Rather than that it's better
for the user to directly ask the TEE device to allocate restricted
memory without worrying about how the memory restriction gets
enforced.
>
> > - TEE subsystem already has a well defined user-space interface for
> > managing shared memory buffers with TEE and restricted DMA buffers
> > will be yet another interface managed along similar lines.
>
> Okay, so the very reason I'm asking about the dma-buf heap interface is
> because there might be cases where the protected/restricted allocation
> doesn't go through the TEE (Mediatek has a TEE-free implementation
> for instance, but I realize vendor implementations are probably not the
> best selling point :-/).
You can always have a system with memory and peripheral access
permissions setup during boot (or even have a pre-configured hardware
as a special case) prior to booting up the kernel too. But that even
gets somehow configured by a TEE implementation during boot, so
calling it a TEE-free implementation seems over-simplified and not a
scalable solution. However, this patchset [1] from Mediatek requires
runtime TEE interaction too.
[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
> If we expose things as a dma-heap, we have
> a solution where integrators can pick the dma-heap they think is
> relevant for protected buffer allocations without the various drivers
> (GPU, video codec, ...) having to implement a dispatch function for all
> possible implementations. The same goes for userspace allocations,
> where passing a dma-heap name is simpler than supporting different
> ioctl()s based on the allocation backend.
There have been several attempts with DMA heaps in the past which all
resulted in a very vendor specific vertically integrated solution. But
the solution with TEE subsystem aims to make it generic and vendor
agnostic.
>
> [1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system_heap.c#L424
-Sumit
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 9:16 ` Sumit Garg
@ 2025-02-13 12:40 ` Boris Brezillon
2025-02-13 14:05 ` Daniel Stone
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-02-13 12:40 UTC (permalink / raw)
To: Sumit Garg
Cc: Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
On Thu, 13 Feb 2025 14:46:01 +0530
Sumit Garg <sumit.garg@linaro.org> wrote:
> On Thu, 13 Feb 2025 at 14:06, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Thu, 13 Feb 2025 12:11:52 +0530
> > Sumit Garg <sumit.garg@linaro.org> wrote:
> >
> > > Hi Boris,
> > >
> > > On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:
> > > >
> > > > +Florent, who's working on protected-mode support in Panthor.
> > > >
> > > > Hi Jens,
> > > >
> > > > On Tue, 17 Dec 2024 11:07:36 +0100
> > > > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > > >
> > > > We're currently working on protected-mode support for Panthor [1] and it
> > > > looks like your series (and the OP-TEE implementation that goes with
> > > > it) would allow us to have a fully upstream/open solution for the
> > > > protected content use case we're trying to support. I need a bit more
> > > > time to play with the implementation but this looks very promising
> > > > (especially the lend rstmem feature, which might help us allocate our
> > > > FW sections that are supposed to execute code accessing protected
> > > > content).
> > >
> > > Glad to hear that, if you can demonstrate an open source use case
> > > based on this series then it will help to land it. We really would
> > > love to see support for restricted DMA-buf consumers be it GPU, crypto
> > > accelerator, media pipeline etc.
> > >
> > > >
> > > > >
> > > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > > restrictions for the memory used for the DMA-bufs.
> > > > >
> > > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > > how to allocate the restricted physical memory.
> > > >
> > > > I'll probably have more questions soon, but here's one to start: any
> > > > particular reason you didn't go for a dma-heap to expose restricted
> > > > buffer allocation to userspace? I see you already have a cdev you can
> > > > take ioctl()s from, but my understanding was that dma-heap was the
> > > > standard solution for these device-agnostic/central allocators.
> > >
> > > This series started with the DMA heap approach only here [1] but later
> > > discussions [2] lead us here. To point out specifically:
> > >
> > > - DMA heaps require reliance on DT to discover static restricted
> > > regions carve-outs whereas via the TEE implementation driver (eg.
> > > OP-TEE) those can be discovered dynamically.
> >
> > Hm, the system heap [1] doesn't rely on any DT information AFAICT.
>
> Yeah but all the prior vendor specific secure/restricted DMA heaps
> relied on DT information.
Right, but there's nothing in the DMA heap provider API forcing that.
>
> > The dynamic allocation scheme, where the TEE implementation allocates a
> > chunk of protected memory for us would have a similar behavior, I guess.
>
> In a dynamic scheme, the allocation will still be from CMA or system
> heap depending on TEE implementation capabilities but the restriction
> will be enforced via interaction with TEE.
Sorry, that's a wording issue. By dynamic allocation I meant the mode
where allocations goes through the TEE, not the lend rstmem thing. BTW,
calling the lend mode dynamic-allocation is kinda confusing, because in
a sense, both modes can be considered dynamic allocation from the user
PoV. I get that when the TEE allocates memory, it's picking from its
fixed address/size pool, hence the name, but when I first read this, I
thought the dynamic mode was the other one, and the static mode was the
one where you reserve a mem range from the DT, query it from the driver
and pass it to the TEE to restrict access post reservation/static
allocation.
>
> >
> > > - Dynamic allocation of buffers and making them restricted requires
> > > vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> > > abstracts that out with underlying TEE implementation (eg. OP-TEE)
> > > managing the dynamic buffer restriction.
> >
> > Yeah, the lend rstmem feature is clearly something tee specific, and I
> > think that's okay to assume the user knows the protection request
> > should go through the tee subsystem in that case.
>
> Yeah but how will the user discover that?
There's nothing to discover here. It would just be explicitly specified:
- for in-kernel users it can be a module parameter (or a DT prop if
that's deemed acceptable)
- for userspace, it can be an envvar, a config file, or whatever the
app/lib uses to get config options
> Rather than that it's better
> for the user to directly ask the TEE device to allocate restricted
> memory without worrying about how the memory restriction gets
> enforced.
If the consensus is that restricted/protected memory allocation should
always be routed to the TEE, sure, but I had the feeling this wasn't as
clear as that. OTOH, using a dma-heap to expose the TEE-SDP
implementation provides the same benefits, without making potential
future non-TEE based implementations a pain for users. The dma-heap
ioctl being common to all implementations, it just becomes a
configuration matter if we want to change the heap we rely on for
protected/restricted buffer allocation. And because heaps have
unique/well-known names, users can still default to (or rely solely on)
the TEE-SDP implementation if they want.
>
> >
> > > - TEE subsystem already has a well defined user-space interface for
> > > managing shared memory buffers with TEE and restricted DMA buffers
> > > will be yet another interface managed along similar lines.
> >
> > Okay, so the very reason I'm asking about the dma-buf heap interface is
> > because there might be cases where the protected/restricted allocation
> > doesn't go through the TEE (Mediatek has a TEE-free implementation
> > for instance, but I realize vendor implementations are probably not the
> > best selling point :-/).
>
> You can always have a system with memory and peripheral access
> permissions setup during boot (or even have a pre-configured hardware
> as a special case) prior to booting up the kernel too. But that even
> gets somehow configured by a TEE implementation during boot, so
> calling it a TEE-free implementation seems over-simplified and not a
> scalable solution. However, this patchset [1] from Mediatek requires
> runtime TEE interaction too.
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
>
> > If we expose things as a dma-heap, we have
> > a solution where integrators can pick the dma-heap they think is
> > relevant for protected buffer allocations without the various drivers
> > (GPU, video codec, ...) having to implement a dispatch function for all
> > possible implementations. The same goes for userspace allocations,
> where passing a dma-heap name is simpler than supporting different
> > ioctl()s based on the allocation backend.
>
> There have been several attempts with DMA heaps in the past which all
> resulted in a very vendor specific vertically integrated solution. But
> the solution with TEE subsystem aims to make it generic and vendor
> agnostic.
Just because all previous protected/restricted dma-heap effort
failed to make it upstream, doesn't mean dma-heap is the wrong way of
exposing this feature IMHO.
Regards,
Boris
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 12:40 ` Boris Brezillon
@ 2025-02-13 14:05 ` Daniel Stone
2025-02-13 15:57 ` Jens Wiklander
0 siblings, 1 reply; 40+ messages in thread
From: Daniel Stone @ 2025-02-13 14:05 UTC (permalink / raw)
To: Boris Brezillon
Cc: Sumit Garg, Jens Wiklander, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
Hi,
On Thu, 13 Feb 2025 at 12:40, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
> On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg <sumit.garg@linaro.org> wrote:
> > Yeah but all the prior vendor specific secure/restricted DMA heaps
> > relied on DT information.
>
> Right, but there's nothing in the DMA heap provider API forcing that.
Yeah. DMA heaps are just a way to allocate memory from a specific
place. It allows people to settle on having a single way to do
allocations from weird platform-specific places; the only weird
platform-specific part userspace needs to deal with is figuring out
the name to use. The rest is at least a unified API: the point of
dma-heaps was exactly to have a single coherent API for userspace, not
to create one API for ZONE_CMA and DT ranges and everyone else doing
their own thing.
> > Rather than that it's better
> > for the user to directly ask the TEE device to allocate restricted
> > memory without worrying about how the memory restriction gets
> > enforced.
>
> If the consensus is that restricted/protected memory allocation should
> always be routed to the TEE, sure, but I had the feeling this wasn't as
> clear as that. OTOH, using a dma-heap to expose the TEE-SDP
> implementation provides the same benefits, without making potential
> future non-TEE based implementations a pain for users. The dma-heap
> ioctl being common to all implementations, it just becomes a
> configuration matter if we want to change the heap we rely on for
> protected/restricted buffer allocation. And because heaps have
> unique/well-known names, users can still default to (or rely solely on)
> the TEE-SDP implementation if they want.
>
> > There have been several attempts with DMA heaps in the past which all
> > resulted in a very vendor specific vertically integrated solution. But
> > the solution with TEE subsystem aims to make it generic and vendor
> > agnostic.
>
> Just because all previous protected/restricted dma-heap effort
> failed to make it upstream, doesn't mean dma-heap is the wrong way of
> exposing this feature IMHO.
To be fair, having a TEE implementation does give us a much better
chance of having a sensible cross-vendor plan. And the fact it's
already (sort of accidentally and only on one platform AFAICT) ready
for a 'test' interface, where we can still exercise protected
allocation paths but without having to go through all the
platform-specific setup that is inaccessible to most people, is also
really great! That's probably been the biggest barrier to having this
tested outside of IHVs and OEMs.
But just because TEE is one good backend implementation, doesn't mean
it should be the userspace ABI. Why should userspace care that TEE has
mediated the allocation instead of it being a predefined range within
DT? How does userspace pick which TEE device to use? What advantage
does userspace get from having to have a different codepath to get a
different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction.
Instead of working upwards from the implementation to userspace, start
with userspace and work downwards. The interesting property to focus
on is allocating memory, not that EL1 is involved behind the scenes.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 14:05 ` Daniel Stone
@ 2025-02-13 15:57 ` Jens Wiklander
2025-02-13 17:39 ` Daniel Stone
0 siblings, 1 reply; 40+ messages in thread
From: Jens Wiklander @ 2025-02-13 15:57 UTC (permalink / raw)
To: Daniel Stone
Cc: Boris Brezillon, Sumit Garg, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
Hi,
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Thu, 13 Feb 2025 at 12:40, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> > On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg <sumit.garg@linaro.org> wrote:
> > > Yeah but all the prior vendor specific secure/restricted DMA heaps
> > > relied on DT information.
> >
> > Right, but there's nothing in the DMA heap provider API forcing that.
>
> Yeah. DMA heaps are just a way to allocate memory from a specific
> place. It allows people to settle on having a single way to do
> allocations from weird platform-specific places; the only weird
> platform-specific part userspace needs to deal with is figuring out
> the name to use. The rest is at least a unified API: the point of
> dma-heaps was exactly to have a single coherent API for userspace, not
> to create one API for ZONE_CMA and DT ranges and everyone else doing
> their own thing.
>
> > > Rather than that it's better
> > > for the user to directly ask the TEE device to allocate restricted
> > > memory without worrying about how the memory restriction gets
> > > enforced.
> >
> > If the consensus is that restricted/protected memory allocation should
> > always be routed to the TEE, sure, but I had the feeling this wasn't as
> > clear as that. OTOH, using a dma-heap to expose the TEE-SDP
> > implementation provides the same benefits, without making potential
> > future non-TEE based implementations a pain for users. The dma-heap
> > ioctl being common to all implementations, it just becomes a
> > configuration matter if we want to change the heap we rely on for
> > protected/restricted buffer allocation. And because heaps have
> > unique/well-known names, users can still default to (or rely solely on)
> > the TEE-SDP implementation if they want.
> >
> > > There have been several attempts with DMA heaps in the past which all
> > > resulted in a very vendor specific vertically integrated solution. But
> > > the solution with TEE subsystem aims to make it generic and vendor
> > > agnostic.
> >
> > Just because all previous protected/restricted dma-heap effort
> > failed to make it upstream, doesn't mean dma-heap is the wrong way of
> > exposing this feature IMHO.
>
> To be fair, having a TEE implementation does give us a much better
> chance of having a sensible cross-vendor plan. And the fact it's
> already (sort of accidentally and only on one platform AFAICT) ready
> for a 'test' interface, where we can still exercise protected
> allocation paths but without having to go through all the
> platform-specific setup that is inaccessible to most people, is also
> really great! That's probably been the biggest barrier to having this
> tested outside of IHVs and OEMs.
>
> But just because TEE is one good backend implementation, doesn't mean
> it should be the userspace ABI. Why should userspace care that TEE has
> mediated the allocation instead of it being a predefined range within
> DT?
The TEE may very well use a predefined range; that part is abstracted
by the interface.
> How does userspace pick which TEE device to use?
There's normally only one, and even if there is more than one, it
should be safe to assume that only one of them should be used when
allocating restricted memory (TEE_GEN_CAP_RSTMEM from TEE_IOC_VERSION).
> What advantage
> does userspace get from having to have a different codepath to get a
> different handle to memory? What about x86?
>
> I think this proposal is looking at it from the wrong direction.
> Instead of working upwards from the implementation to userspace, start
> with userspace and work downwards. The interesting property to focus
> on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a
problem for userspace to handle this. If the kernel were to provide it
via a different ABI, how would it be easier to implement in the
kernel? I think we need an example to understand your suggestion.
Cheers,
Jens
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 15:57 ` Jens Wiklander
@ 2025-02-13 17:39 ` Daniel Stone
2025-02-14 10:07 ` Jens Wiklander
0 siblings, 1 reply; 40+ messages in thread
From: Daniel Stone @ 2025-02-13 17:39 UTC (permalink / raw)
To: Jens Wiklander
Cc: Boris Brezillon, Sumit Garg, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > But just because TEE is one good backend implementation, doesn't mean
> > it should be the userspace ABI. Why should userspace care that TEE has
> > mediated the allocation instead of it being a predefined range within
> > DT?
>
> The TEE may very well use a predefined range that part is abstracted
> with the interface.
Of course. But you can also (and this has been shipped on real
devices) handle this without any per-allocation TEE needs by simply
allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to
allocate memory from a predefined range which is delivered by DT to
the kernel, and one ABI to allocate memory from a predefined range
which is mediated by TEE?
> > What advantage
> > does userspace get from having to have a different codepath to get a
> > different handle to memory? What about x86?
> >
> > I think this proposal is looking at it from the wrong direction.
> > Instead of working upwards from the implementation to userspace, start
> > with userspace and work downwards. The interesting property to focus
> > on is allocating memory, not that EL1 is involved behind the scenes.
>
> From what I've gathered from earlier discussions, it wasn't much of a
> problem for userspace to handle this. If the kernel were to provide it
> via a different ABI, how would it be easier to implement in the
> kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable
parameters for allocation through the entire stack. If you look at the
dmabuf documentation in the kernel for how buffers should be allocated
and exchanged, you can see the negotiation flow for modifiers. This
permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
Standardising on heaps allows us to add those in a similar way. If we
have to add different allocation mechanisms, then the complexity
increases, permeating not only into all the different userspace APIs,
but also into the drivers which need to support every different
allocation mechanism even if they have no opinion on it - e.g. Mali
doesn't care in any way whether the allocation comes from a heap or
TEE or ACPI or whatever, it cares only that the memory is protected.
Does that help?
Cheers,
Daniel
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-13 17:39 ` Daniel Stone
@ 2025-02-14 10:07 ` Jens Wiklander
2025-02-14 13:07 ` Sumit Garg
0 siblings, 1 reply; 40+ messages in thread
From: Jens Wiklander @ 2025-02-14 10:07 UTC (permalink / raw)
To: Daniel Stone
Cc: Boris Brezillon, Sumit Garg, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
Hi,
On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > > But just because TEE is one good backend implementation, doesn't mean
> > > it should be the userspace ABI. Why should userspace care that TEE has
> > > mediated the allocation instead of it being a predefined range within
> > > DT?
> >
> > The TEE may very well use a predefined range that part is abstracted
> > with the interface.
>
> Of course. But you can also (and this has been shipped on real
> devices) handle this without any per-allocation TEE needs by simply
> allocating from a memory range which is predefined within DT.
>
> From the userspace point of view, why should there be one ABI to
> allocate memory from a predefined range which is delivered by DT to
> the kernel, and one ABI to allocate memory from a predefined range
> which is mediated by TEE?
We need some way to specify the protection profile (or use case as
I've called it in the ABI) required for the buffer. Whether it's
defined in DT seems irrelevant.
>
> > > What advantage
> > > does userspace get from having to have a different codepath to get a
> > > different handle to memory? What about x86?
> > >
> > > I think this proposal is looking at it from the wrong direction.
> > > Instead of working upwards from the implementation to userspace, start
> > > with userspace and work downwards. The interesting property to focus
> > > on is allocating memory, not that EL1 is involved behind the scenes.
> >
> > From what I've gathered from earlier discussions, it wasn't much of a
> > problem for userspace to handle this. If the kernel were to provide it
> > via a different ABI, how would it be easier to implement in the
> > kernel? I think we need an example to understand your suggestion.
>
> It is a problem for userspace, because we need to expose acceptable
> parameters for allocation through the entire stack. If you look at the
> dmabuf documentation in the kernel for how buffers should be allocated
> and exchanged, you can see the negotiation flow for modifiers. This
> permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
What dma-buf properties are you referring to?
dma_heap_ioctl_allocate() accepts a few flags for the resulting file
descriptor and no flags for the heap itself.
>
> Standardising on heaps allows us to add those in a similar way.
How would you solve this with heaps? Would you use one heap for each
protection profile (use case), add heap_flags, or do a bit of both?
> If we
> have to add different allocation mechanisms, then the complexity
> increases, permeating not only into all the different userspace APIs,
> but also into the drivers which need to support every different
> allocation mechanism even if they have no opinion on it - e.g. Mali
> doesn't care in any way whether the allocation comes from a heap or
> TEE or ACPI or whatever, it cares only that the memory is protected.
>
> Does that help?
I think you're missing the stage where an unprotected buffer is
received and decrypted into a protected buffer. If you use the TEE for
decryption or to configure the involved devices for the use case, it
makes sense to let the TEE allocate the buffers, too. A TEE doesn't
have to be an OS in the secure world; it can be an abstraction that
supports the use case, depending on the design. So the restricted buffer
is already allocated before we reach Mali in your example.
Allocating restricted buffers from the TEE subsystem saves us from
maintaining proxy dma-buf heaps.
Cheers,
Jens
>
> Cheers,
> Daniel
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-14 10:07 ` Jens Wiklander
@ 2025-02-14 13:07 ` Sumit Garg
2025-02-14 15:48 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Sumit Garg @ 2025-02-14 13:07 UTC (permalink / raw)
To: Jens Wiklander, Daniel Stone
Cc: Boris Brezillon, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote:
>
> Hi,
>
> On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote:
> >
> > Hi,
> >
> > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > > > But just because TEE is one good backend implementation, doesn't mean
> > > > it should be the userspace ABI. Why should userspace care that TEE has
> > > > mediated the allocation instead of it being a predefined range within
> > > > DT?
> > >
> > > The TEE may very well use a predefined range that part is abstracted
> > > with the interface.
> >
> > Of course. But you can also (and this has been shipped on real
> > devices) handle this without any per-allocation TEE needs by simply
> > allocating from a memory range which is predefined within DT.
> >
> > From the userspace point of view, why should there be one ABI to
> > allocate memory from a predefined range which is delivered by DT to
> > the kernel, and one ABI to allocate memory from a predefined range
> > which is mediated by TEE?
>
> We need some way to specify the protection profile (or use case as
> I've called it in the ABI) required for the buffer. Whether it's
> defined in DT seems irrelevant.
>
> >
> > > > What advantage
> > > > does userspace get from having to have a different codepath to get a
> > > > different handle to memory? What about x86?
> > > >
> > > > I think this proposal is looking at it from the wrong direction.
> > > > Instead of working upwards from the implementation to userspace, start
> > > > with userspace and work downwards. The interesting property to focus
> > > > on is allocating memory, not that EL1 is involved behind the scenes.
> > >
> > > From what I've gathered from earlier discussions, it wasn't much of a
> > > problem for userspace to handle this. If the kernel were to provide it
> > > via a different ABI, how would it be easier to implement in the
> > > kernel? I think we need an example to understand your suggestion.
> >
> > It is a problem for userspace, because we need to expose acceptable
> > parameters for allocation through the entire stack. If you look at the
> > dmabuf documentation in the kernel for how buffers should be allocated
> > and exchanged, you can see the negotiation flow for modifiers. This
> > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
>
> What dma-buf properties are you referring to?
> dma_heap_ioctl_allocate() accepts a few flags for the resulting file
> descriptor and no flags for the heap itself.
>
> >
> > Standardising on heaps allows us to add those in a similar way.
>
> How would you solve this with heaps? Would you use one heap for each
> protection profile (use case), add heap_flags, or do a bit of both?
Christian gave some historical background here [1] on why that hasn't
worked in the past with DMA heaps, given the scalability issues.
[1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
>
> > If we
> > have to add different allocation mechanisms, then the complexity
> > increases, permeating not only into all the different userspace APIs,
> > but also into the drivers which need to support every different
> > allocation mechanism even if they have no opinion on it - e.g. Mali
> > doesn't care in any way whether the allocation comes from a heap or
> > TEE or ACPI or whatever, it cares only that the memory is protected.
> >
> > Does that help?
>
> I think you're missing the stage where an unprotected buffer is
> received and decrypted into a protected buffer. If you use the TEE for
> decryption or to configure the involved devices for the use case, it
> makes sense to let the TEE allocate the buffers, too. A TEE doesn't
> have to be an OS in the secure world, it can be an abstraction to
> support the use case depending on the design. So the restricted buffer
> is already allocated before we reach Mali in your example.
>
> Allocating restricted buffers from the TEE subsystem saves us from
> maintaining proxy dma-buf heaps.
+1
-Sumit
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-14 13:07 ` Sumit Garg
@ 2025-02-14 15:48 ` Boris Brezillon
2025-02-17 6:12 ` Sumit Garg
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-02-14 15:48 UTC (permalink / raw)
To: Sumit Garg
Cc: Jens Wiklander, Daniel Stone, linux-kernel, linux-media,
dri-devel, linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
On Fri, 14 Feb 2025 18:37:14 +0530
Sumit Garg <sumit.garg@linaro.org> wrote:
> On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> >
> > Hi,
> >
> > On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > > > > But just because TEE is one good backend implementation, doesn't mean
> > > > > it should be the userspace ABI. Why should userspace care that TEE has
> > > > > mediated the allocation instead of it being a predefined range within
> > > > > DT?
> > > >
> > > > The TEE may very well use a predefined range that part is abstracted
> > > > with the interface.
> > >
> > > Of course. But you can also (and this has been shipped on real
> > > devices) handle this without any per-allocation TEE needs by simply
> > > allocating from a memory range which is predefined within DT.
> > >
> > > From the userspace point of view, why should there be one ABI to
> > > allocate memory from a predefined range which is delivered by DT to
> > > the kernel, and one ABI to allocate memory from a predefined range
> > > which is mediated by TEE?
> >
> > We need some way to specify the protection profile (or use case as
> > I've called it in the ABI) required for the buffer. Whether it's
> > defined in DT seems irrelevant.
> >
> > >
> > > > > What advantage
> > > > > does userspace get from having to have a different codepath to get a
> > > > > different handle to memory? What about x86?
> > > > >
> > > > > I think this proposal is looking at it from the wrong direction.
> > > > > Instead of working upwards from the implementation to userspace, start
> > > > > with userspace and work downwards. The interesting property to focus
> > > > > on is allocating memory, not that EL1 is involved behind the scenes.
> > > >
> > > > From what I've gathered from earlier discussions, it wasn't much of a
> > > > problem for userspace to handle this. If the kernel were to provide it
> > > > via a different ABI, how would it be easier to implement in the
> > > > kernel? I think we need an example to understand your suggestion.
> > >
> > > It is a problem for userspace, because we need to expose acceptable
> > > parameters for allocation through the entire stack. If you look at the
> > > dmabuf documentation in the kernel for how buffers should be allocated
> > > and exchanged, you can see the negotiation flow for modifiers. This
> > > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
> >
> > What dma-buf properties are you referring to?
> > dma_heap_ioctl_allocate() accepts a few flags for the resulting file
> > descriptor and no flags for the heap itself.
> >
> > >
> > > Standardising on heaps allows us to add those in a similar way.
> >
> > How would you solve this with heaps? Would you use one heap for each
> > protection profile (use case), add heap_flags, or do a bit of both?
I would say one heap per profile.
>
> Christian gave an historical background here [1] as to why that hasn't
> worked in the past with DMA heaps given the scalability issues.
>
> [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
Hm, I fail to see where Christian dismisses the dma-heaps solution in
this email. He even says:
> If the memory is not physically attached to any device, but rather just
> memory attached to the CPU or a system wide memory controller then
> expose the memory as DMA-heap with specific requirements (e.g. certain
> sized pages, contiguous, restricted, encrypted, ...).
>
> >
> > > If we
> > > have to add different allocation mechanisms, then the complexity
> > > increases, permeating not only into all the different userspace APIs,
> > > but also into the drivers which need to support every different
> > > allocation mechanism even if they have no opinion on it - e.g. Mali
> > > doesn't care in any way whether the allocation comes from a heap or
> > > TEE or ACPI or whatever, it cares only that the memory is protected.
> > >
> > > Does that help?
> >
> > I think you're missing the stage where an unprotected buffer is
> > received and decrypted into a protected buffer. If you use the TEE for
> > decryption or to configure the involved devices for the use case, it
> > makes sense to let the TEE allocate the buffers, too. A TEE doesn't
> > have to be an OS in the secure world, it can be an abstraction to
> > support the use case depending on the design. So the restricted buffer
> > is already allocated before we reach Mali in your example.
> >
> > Allocating restricted buffers from the TEE subsystem saves us from
> > maintaining proxy dma-buf heaps.
Honestly, when I look at dma-heap implementations, they seem
to be trivial shells around existing (more complex) allocators, and the
boilerplate [1] to expose a dma-heap is relatively small. The dma-buf
implementation you already have, so we're talking about a hundred
lines of code to maintain, which shouldn't be significantly more than
what you have for the new ioctl(), to be honest. And I'll insist on what
Daniel said: it's a small price to pay for a standard interface to
expose to userspace. If dma-heaps are not used for this kind of thing, I
honestly wonder what they will be used for...
Regards,
Boris
[1] https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system_heap.c#L314
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-14 15:48 ` Boris Brezillon
@ 2025-02-17 6:12 ` Sumit Garg
2025-02-18 16:22 ` Daniel Stone
0 siblings, 1 reply; 40+ messages in thread
From: Sumit Garg @ 2025-02-17 6:12 UTC (permalink / raw)
To: Boris Brezillon
Cc: Jens Wiklander, Daniel Stone, linux-kernel, linux-media,
dri-devel, linaro-mm-sig, op-tee, linux-arm-kernel, Olivier Masse,
Thierry Reding, Yong Wu, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T . J . Mercier, Christian König,
Matthias Brugger, AngeloGioacchino Del Regno, azarrabi,
Florent Tomasin
On Fri, 14 Feb 2025 at 21:19, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Fri, 14 Feb 2025 18:37:14 +0530
> Sumit Garg <sumit.garg@linaro.org> wrote:
>
> > On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > > > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > > > > > But just because TEE is one good backend implementation, doesn't mean
> > > > > > it should be the userspace ABI. Why should userspace care that TEE has
> > > > > > mediated the allocation instead of it being a predefined range within
> > > > > > DT?
> > > > >
> > > > > The TEE may very well use a predefined range that part is abstracted
> > > > > with the interface.
> > > >
> > > > Of course. But you can also (and this has been shipped on real
> > > > devices) handle this without any per-allocation TEE needs by simply
> > > > allocating from a memory range which is predefined within DT.
> > > >
> > > > From the userspace point of view, why should there be one ABI to
> > > > allocate memory from a predefined range which is delivered by DT to
> > > > the kernel, and one ABI to allocate memory from a predefined range
> > > > which is mediated by TEE?
> > >
> > > We need some way to specify the protection profile (or use case as
> > > I've called it in the ABI) required for the buffer. Whether it's
> > > defined in DT seems irrelevant.
> > >
> > > >
> > > > > > What advantage
> > > > > > does userspace get from having to have a different codepath to get a
> > > > > > different handle to memory? What about x86?
> > > > > >
> > > > > > I think this proposal is looking at it from the wrong direction.
> > > > > > Instead of working upwards from the implementation to userspace, start
> > > > > > with userspace and work downwards. The interesting property to focus
> > > > > > on is allocating memory, not that EL1 is involved behind the scenes.
> > > > >
> > > > > From what I've gathered from earlier discussions, it wasn't much of a
> > > > > problem for userspace to handle this. If the kernel were to provide it
> > > > > via a different ABI, how would it be easier to implement in the
> > > > > kernel? I think we need an example to understand your suggestion.
> > > >
> > > > It is a problem for userspace, because we need to expose acceptable
> > > > parameters for allocation through the entire stack. If you look at the
> > > > dmabuf documentation in the kernel for how buffers should be allocated
> > > > and exchanged, you can see the negotiation flow for modifiers. This
> > > > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
> > >
> > > What dma-buf properties are you referring to?
> > > dma_heap_ioctl_allocate() accepts a few flags for the resulting file
> > > descriptor and no flags for the heap itself.
> > >
> > > >
> > > > Standardising on heaps allows us to add those in a similar way.
> > >
> > > How would you solve this with heaps? Would you use one heap for each
> > > protection profile (use case), add heap_flags, or do a bit of both?
>
> I would say one heap per-profile.
>
And then it would have a per-vendor multiplication factor, as each
vendor enforces memory restrictions in a platform-specific manner,
which won't scale.
> >
> > Christian gave an historical background here [1] as to why that hasn't
> > worked in the past with DMA heaps given the scalability issues.
> >
> > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
>
> Hm, I fail to see where Christian dismiss the dma-heaps solution in
> this email. He even says:
>
> > If the memory is not physically attached to any device, but rather just
> > memory attached to the CPU or a system wide memory controller then
> > expose the memory as DMA-heap with specific requirements (e.g. certain
> > sized pages, contiguous, restricted, encrypted, ...).
I am not saying Christian dismissed DMA heaps, but rather that
scalability is an issue. What we are proposing here is a generic
interface via the TEE to the firmware/Trusted OS, which can perform all
the platform-specific memory restrictions. This solution will scale
across vendors.
>
> >
> > >
> > > > If we
> > > > have to add different allocation mechanisms, then the complexity
> > > > increases, permeating not only into all the different userspace APIs,
> > > > but also into the drivers which need to support every different
> > > > allocation mechanism even if they have no opinion on it - e.g. Mali
> > > > doesn't care in any way whether the allocation comes from a heap or
> > > > TEE or ACPI or whatever, it cares only that the memory is protected.
> > > >
> > > > Does that help?
> > >
> > > I think you're missing the stage where an unprotected buffer is
> > > received and decrypted into a protected buffer. If you use the TEE for
> > > decryption or to configure the involved devices for the use case, it
> > > makes sense to let the TEE allocate the buffers, too. A TEE doesn't
> > > have to be an OS in the secure world, it can be an abstraction to
> > > support the use case depending on the design. So the restricted buffer
> > > is already allocated before we reach Mali in your example.
> > >
> > > Allocating restricted buffers from the TEE subsystem saves us from
> > > maintaining proxy dma-buf heaps.
>
> Honestly, when I look at dma-heap implementations, they seem
> to be trivial shells around existing (more complex) allocators, and the
> boiler plate [1] to expose a dma-heap is relatively small. The dma-buf
> implementation, you already have, so we're talking about a hundred
> lines of code to maintain, which shouldn't be significantly more than
> what you have for the new ioctl() to be honest.
It would rather be redundant vendor-specific code under DMA heaps
calling into the firmware/Trusted OS to enforce memory restrictions, as
you can see in the Mediatek example [1]. With the TEE subsystem managing
that, it won't be the case, as we will provide a common abstraction for
the communication with the underlying firmware/Trusted OS.
[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
> And I'll insist on what
> Daniel said, it's a small price to pay to have a standard interface to
> expose to userspace. If dma-heaps are not used for this kind things, I
> honestly wonder what they will be used for...
Let's try not to forcefully find a use-case for DMA heaps when there
is a better alternative available. I am still failing to see why you
don't consider the following as a standardised user-space interface:
"When user-space has to work with restricted memory, ask TEE device to
allocate it"
-Sumit
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-17 6:12 ` Sumit Garg
@ 2025-02-18 16:22 ` Daniel Stone
2025-02-19 13:22 ` Simona Vetter
2025-02-21 11:24 ` Sumit Garg
0 siblings, 2 replies; 40+ messages in thread
From: Daniel Stone @ 2025-02-18 16:22 UTC (permalink / raw)
To: Boris Brezillon, Jens Wiklander, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
Hi Sumit,
On Mon, 17 Feb 2025 at 06:13, Sumit Garg <sumit.garg@linaro.org> wrote:
> On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > I would say one heap per-profile.
>
> And then it would have a per vendor multiplication factor as each
> vendor enforces memory restriction in a platform specific manner which
> won't scale.
Yes, they do enforce it in a platform-specific manner, but so does
TEE. There is no one golden set of semantics which is globally
applicable between all hardware and all products in a useful manner.
So, if we define protected,secure-video +
protected,secure-video-record + protected,trusted-ui heap names, we
have exactly the same number of axes. The only change is from uint32_t
to string.
> > > Christian gave an historical background here [1] as to why that hasn't
> > > worked in the past with DMA heaps given the scalability issues.
> > >
> > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
> >
> > Hm, I fail to see where Christian dismiss the dma-heaps solution in
> > this email. He even says:
> >
> > > If the memory is not physically attached to any device, but rather just
> > memory attached to the CPU or a system wide memory controller then
> > expose the memory as DMA-heap with specific requirements (e.g. certain
> > sized pages, contiguous, restricted, encrypted, ...).
>
> I am not saying Christian dismissed DMA heaps but rather how
> scalability is an issue. What we are proposing here is a generic
> interface via TEE to the firmware/Trusted OS which can perform all the
> platform specific memory restrictions. This solution will scale across
> vendors.
I read something completely different into Christian's mail.
What Christian is saying is that injecting generic constraint solving
into the kernel doesn't scale. It's not OK to build out generic
infrastructure in the kernel which queries a bunch of leaf drivers and
attempts to somehow come up with something which satisfies
userspace-provided constraints.
But this isn't the same thing as saying 'dma-heaps is wrong'! Again,
there is no additional complexity in the kernel between a dma-heap
which bridges over to TEE, and a TEE userspace interface which also
bridges over to TEE. Both of them are completely fine according to
what he's said.
> > Honestly, when I look at dma-heap implementations, they seem
> > to be trivial shells around existing (more complex) allocators, and the
> > boiler plate [1] to expose a dma-heap is relatively small. The dma-buf
> > implementation, you already have, so we're talking about a hundred
> > lines of code to maintain, which shouldn't be significantly more than
> > what you have for the new ioctl() to be honest.
>
> It will rather be redundant vendor specific code under DMA heaps
> calling into firmware/Trusted OS to enforce memory restrictions as you
> can look into Mediatek example [1]. With TEE subsystem managing that
> it won't be the case as we will provide a common abstraction for the
> communication with underlying firmware/Trusted OS.
Yes, it's common for everyone who uses TEE to implement SVP. It's not
common for the people who do _not_ use TEE to implement SVP. Which
means that userspace has to type out both, and what we're asking in
this thread is: why?
Why should userspace have to support dma-heap allocation for platforms
supporting SVP via a static DT-defined carveout as well as supporting
TEE API allocation for platforms supporting SVP via a dynamic
carveout? What benefit does it bring to have this surfaced as a
completely separate uAPI?
> > And I'll insist on what
> > Daniel said, it's a small price to pay to have a standard interface to
> > expose to userspace. If dma-heaps are not used for this kind things, I
> > honestly wonder what they will be used for...
>
> Let's try not to forcefully find a use-case for DMA heaps when there
> is a better alternative available.
What makes it better? If you could explain very clearly the benefit
userspace will gain from asking TEE to allocate $n bytes for
TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate
$n bytes for protected,secure-video, I think that would really help.
Right now, I don't understand how it would be better in any way
whatsoever for userspace. And I think your decision to implement it as
a separate API is based on a misunderstanding of Christian's position.
> I am still failing to see why you
> don't consider following as a standardised user-space interface:
>
> "When user-space has to work with restricted memory, ask TEE device to
> allocate it"
As far as I can tell, having userspace work with the TEE interface
brings zero benefit (again, please correct me if I'm wrong and explain
how it's better). The direct cost - call it a disbenefit - it brings
is that we have to spend a pile of time typing out support for TEE
allocation in every media/GPU/display driver/application, and when we
do any kind of negotiation, we have to have one protocol definition
for TEE and one for non-TEE.
dma-heaps was created to solve the problem of having too many
'allocate $n bytes from $specialplace' uAPIs. The proliferation was
painful and made it difficult for userspace to do what it needed to
do. Userspace doesn't _yet_ make full use of it, but the solution is
to make userspace make full use of it, not to go create entirely
separate allocation paths for unclear reasons.
Besides, I'm writing this from a platform that implements SVP not via
TEE. I've worked on platforms which implement SVP without any TEE,
where the TEE implementation would be at best a no-op stub, and at
worst flat-out impossible.
So that's 'why not TEE as the single uAPI for SVP'. So, again, let's
please turn this around: _why_ TEE? Who benefits from exposing this as
completely separate to the more generic uAPI that we specifically
designed to handle things like this?
Cheers,
Daniel
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-18 16:22 ` Daniel Stone
@ 2025-02-19 13:22 ` Simona Vetter
2025-02-21 11:24 ` Sumit Garg
1 sibling, 0 replies; 40+ messages in thread
From: Simona Vetter @ 2025-02-19 13:22 UTC (permalink / raw)
To: Daniel Stone
Cc: Boris Brezillon, Jens Wiklander, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
On Tue, Feb 18, 2025 at 04:22:10PM +0000, Daniel Stone wrote:
> Hi Sumit,
>
> On Mon, 17 Feb 2025 at 06:13, Sumit Garg <sumit.garg@linaro.org> wrote:
> > On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > > I would say one heap per-profile.
> >
> > And then it would have a per vendor multiplication factor as each
> > vendor enforces memory restriction in a platform specific manner which
> > won't scale.
>
> Yes, they do enforce it in a platform-specific manner, but so does
> TEE. There is no one golden set of semantics which is globally
> applicable between all hardware and all products in a useful manner.
>
> So, if we define protected,secure-video +
> protected,secure-video-record + protected,trusted-ui heap names, we
> have exactly the same number of axes. The only change is from uint32_t
> to string.
>
> > > > Christian gave an historical background here [1] as to why that hasn't
> > > > worked in the past with DMA heaps given the scalability issues.
> > > >
> > > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
> > >
> > > Hm, I fail to see where Christian dismiss the dma-heaps solution in
> > > this email. He even says:
> > >
> > > > If the memory is not physically attached to any device, but rather just
> > > memory attached to the CPU or a system wide memory controller then
> > > expose the memory as DMA-heap with specific requirements (e.g. certain
> > > sized pages, contiguous, restricted, encrypted, ...).
> >
> > I am not saying Christian dismissed DMA heaps but rather how
> > scalability is an issue. What we are proposing here is a generic
> > interface via TEE to the firmware/Trusted OS which can perform all the
> > platform specific memory restrictions. This solution will scale across
> > vendors.
>
> I read something completely different into Christian's mail.
>
> What Christian is saying is that injecting generic constraint solving
> into the kernel doesn't scale. It's not OK to build out generic
> infrastructure in the kernel which queries a bunch of leaf drivers and
> attempts to somehow come up with something which satisfies
> userspace-provided constraints.
Fully agreeing. The one thing we discussed, but haven't implemented yet,
is that we'd add sysfs links from devices to the dma-heaps they support.
Including allowing for priorities and different use-cases on the same
device. We just haven't gotten there yet.
But even with that it's up to userspace to do the constraint solving, not
the kernel.
> But this isn't the same thing as saying 'dma-heaps is wrong'! Again,
> there is no additional complexity in the kernel between a dma-heap
> which bridges over to TEE, and a TEE userspace interface which also
> bridges over to TEE. Both of them are completely fine according to
> what he's said.
>
> > > Honestly, when I look at dma-heap implementations, they seem
> > > to be trivial shells around existing (more complex) allocators, and the
> > > boiler plate [1] to expose a dma-heap is relatively small. The dma-buf
> > > implementation, you already have, so we're talking about a hundred
> > > lines of code to maintain, which shouldn't be significantly more than
> > > what you have for the new ioctl() to be honest.
> >
> > It will rather be redundant vendor specific code under DMA heaps
> > calling into firmware/Trusted OS to enforce memory restrictions as you
> > can look into Mediatek example [1]. With TEE subsystem managing that
> > it won't be the case as we will provide a common abstraction for the
> > communication with underlying firmware/Trusted OS.
>
> Yes, it's common for everyone who uses TEE to implement SVP. It's not
> common for the people who do _not_ use TEE to implement SVP. Which
> means that userspace has to type out both, and what we're asking in
> this thread is: why?
>
> Why should userspace have to support dma-heap allocation for platforms
> supporting SVP via a static DT-defined carveout as well as supporting
> TEE API allocation for platforms supporting SVP via a dynamic
> carveout? What benefit does it bring to have this surfaced as a
> completely separate uAPI?
>
> > > And I'll insist on what
> > > Daniel said, it's a small price to pay to have a standard interface to
> > > expose to userspace. If dma-heaps are not used for this kind things, I
> > > honestly wonder what they will be used for...
> >
> > Let's try not to forcefully find a use-case for DMA heaps when there
> > is a better alternative available.
>
> What makes it better? If you could explain very clearly the benefit
> userspace will gain from asking TEE to allocate $n bytes for
> TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate
> $n bytes for protected,secure-video, I think that would really help.
> Right now, I don't understand how it would be better in any way
> whatsoever for userspace. And I think your decision to implement it as
> a separate API is based on a misunderstanding of Christian's position.
>
> > I am still failing to see why you
> > don't consider following as a standardised user-space interface:
> >
> > "When user-space has to work with restricted memory, ask TEE device to
> > allocate it"
>
> As far as I can tell, having userspace work with the TEE interface
> brings zero benefit (again, please correct me if I'm wrong and explain
> how it's better). The direct cost - call it a disbenefit - it brings
> is that we have to spend a pile of time typing out support for TEE
> allocation in every media/GPU/display driver/application, and when we
> do any kind of negotiation, we have to have one protocol definition
> for TEE and one for non-TEE.
>
> dma-heaps was created to solve the problem of having too many
> 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> painful and making it difficult for userspace to do what it needed to
> do. Userspace doesn't _yet_ make full use of it, but the solution is
> to make userspace make full use of it, not to go create entirely
> separate allocation paths for unclear reasons.
>
> Besides, I'm writing this from a platform that implements SVP not via
> TEE. I've worked on platforms which implement SVP without any TEE,
> where the TEE implementation would be at best a no-op stub, and at
> worst flat-out impossible.
>
> So that's 'why not TEE as the single uAPI for SVP'. So, again, let's
> please turn this around: _why_ TEE? Who benefits from exposing this as
> completely separate to the more generic uAPI that we specifically
> designed to handle things like this?
Completely concurring on everything said above. TEE should be exposed
through a dma-buf heap (or maybe a special v4l allocation flag for
secure video playback), and then we prime-import that on the display
side. Maybe also through DRM render drivers for the EGL/VK protected
content extensions. The same goes for any other hw means to allocate
content-protected buffers; TEE is not special here at all.
Anything else needs seriously good justification as to why the entire
dma-buf heap design is busted.
Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-18 16:22 ` Daniel Stone
2025-02-19 13:22 ` Simona Vetter
@ 2025-02-21 11:24 ` Sumit Garg
2025-02-21 14:12 ` Daniel Stone
1 sibling, 1 reply; 40+ messages in thread
From: Sumit Garg @ 2025-02-21 11:24 UTC (permalink / raw)
To: Daniel Stone
Cc: Boris Brezillon, Jens Wiklander, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi Sumit,
>
> On Mon, 17 Feb 2025 at 06:13, Sumit Garg <sumit.garg@linaro.org> wrote:
> > On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > > I would say one heap per-profile.
> >
> > And then it would have a per vendor multiplication factor as each
> > vendor enforces memory restriction in a platform specific manner which
> > won't scale.
>
> Yes, they do enforce it in a platform-specific manner, but so does
> TEE. There is no one golden set of semantics which is globally
> applicable between all hardware and all products in a useful manner.
>
> So, if we define protected,secure-video +
> protected,secure-video-record + protected,trusted-ui heap names, we
> have exactly the same number of axes. The only change is from uint32_t
> to string.
>
> > > > Christian gave an historical background here [1] as to why that hasn't
> > > > worked in the past with DMA heaps given the scalability issues.
> > > >
> > > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
> > >
> > > Hm, I fail to see where Christian dismiss the dma-heaps solution in
> > > this email. He even says:
> > >
> > > > If the memory is not physically attached to any device, but rather just
> > > memory attached to the CPU or a system wide memory controller then
> > > expose the memory as DMA-heap with specific requirements (e.g. certain
> > > sized pages, contiguous, restricted, encrypted, ...).
> >
> > I am not saying Christian dismissed DMA heaps but rather how
> > scalability is an issue. What we are proposing here is a generic
> > interface via TEE to the firmware/Trusted OS which can perform all the
> > platform specific memory restrictions. This solution will scale across
> > vendors.
>
> I read something completely different into Christian's mail.
>
> What Christian is saying is that injecting generic constraint solving
> into the kernel doesn't scale. It's not OK to build out generic
> infrastructure in the kernel which queries a bunch of leaf drivers and
> attempts to somehow come up with something which satisfies
> userspace-provided constraints.
>
> But this isn't the same thing as saying 'dma-heaps is wrong'! Again,
> there is no additional complexity in the kernel between a dma-heap
> which bridges over to TEE, and a TEE userspace interface which also
> bridges over to TEE. Both of them are completely fine according to
> what he's said.
>
> > > Honestly, when I look at dma-heap implementations, they seem
> > > to be trivial shells around existing (more complex) allocators, and the
> > > boiler plate [1] to expose a dma-heap is relatively small. The dma-buf
> > > implementation, you already have, so we're talking about a hundred
> > > lines of code to maintain, which shouldn't be significantly more than
> > > what you have for the new ioctl() to be honest.
> >
> > It will rather be redundant vendor specific code under DMA heaps
> > calling into firmware/Trusted OS to enforce memory restrictions as you
> > can look into Mediatek example [1]. With TEE subsystem managing that
> > it won't be the case as we will provide a common abstraction for the
> > communication with underlying firmware/Trusted OS.
>
> Yes, it's common for everyone who uses TEE to implement SVP. It's not
> common for the people who do _not_ use TEE to implement SVP. Which
> means that userspace has to type out both, and what we're asking in
> this thread is: why?
>
> Why should userspace have to support dma-heap allocation for platforms
> supporting SVP via a static DT-defined carveout as well as supporting
> TEE API allocation for platforms supporting SVP via a dynamic
> carveout? What benefit does it bring to have this surfaced as a
> completely separate uAPI?
>
> > > And I'll insist on what
> > > Daniel said, it's a small price to pay to have a standard interface to
> > > expose to userspace. If dma-heaps are not used for this kind things, I
> > > honestly wonder what they will be used for...
> >
> > Let's try not to forcefully find a use-case for DMA heaps when there
> > is a better alternative available.
>
> What makes it better? If you could explain very clearly the benefit
> userspace will gain from asking TEE to allocate $n bytes for
> TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate
> $n bytes for protected,secure-video, I think that would really help.
> Right now, I don't understand how it would be better in any way
> whatsoever for userspace. And I think your decision to implement it as
> a separate API is based on a misunderstanding of Christian's position.
>
> > I am still failing to see why you
> > don't consider following as a standardised user-space interface:
> >
> > "When user-space has to work with restricted memory, ask TEE device to
> > allocate it"
>
> As far as I can tell, having userspace work with the TEE interface
> brings zero benefit (again, please correct me if I'm wrong and explain
> how it's better). The direct cost - call it a disbenefit - it brings
> is that we have to spend a pile of time typing out support for TEE
> allocation in every media/GPU/display driver/application, and when we
> do any kind of negotiation, we have to have one protocol definition
> for TEE and one for non-TEE.
>
> dma-heaps was created to solve the problem of having too many
> 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> painful and making it difficult for userspace to do what it needed to
> do. Userspace doesn't _yet_ make full use of it, but the solution is
> to make userspace make full use of it, not to go create entirely
> separate allocation paths for unclear reasons.
>
> Besides, I'm writing this from a platform that implements SVP not via
> TEE. I've worked on platforms which implement SVP without any TEE,
> where the TEE implementation would be at best a no-op stub, and at
> worst flat-out impossible.
Can you elaborate on the non-TEE use-case for Secure Video Path (SVP)
a bit more? How does the protected/encrypted media content pipeline
work? Which architecture support does your use-case require? Is there
any higher-privileged firmware interaction required to perform media
content decryption into restricted memory? Do you plan to upstream the
corresponding support in the near future?
Let me try to elaborate on the Secure Video Path (SVP) flow requiring
a TEE implementation (in general terms, a higher-privileged firmware
manages the pipeline, as the kernel/user-space has no access
permissions to the plaintext media content):
- Firstly, a content decryption key is securely provisioned into the
TEE implementation.
- Interaction with the TEE to set up access permissions of the
different peripherals in the media pipeline so that they can access
restricted memory.
- Interaction with the TEE to allocate restricted memory buffers.
- Interaction with the TEE to decrypt downloaded encrypted media
content from normal memory buffers into restricted memory buffers.
- The rest of the media pipeline is then able to process the plaintext
media content in restricted buffers and display it.
>
> So that's 'why not TEE as the single uAPI for SVP'.
Let's try to see if your SVP use-case really converges with TEE-based
SVP such that we really need a single uAPI.
> So, again, let's
> please turn this around: _why_ TEE? Who benefits from exposing this as
> completely separate to the more generic uAPI that we specifically
> designed to handle things like this?
The bridging between DMA heaps and the TEE would still require
user-space to perform an IOCTL into the TEE to register the DMA-bufs,
as you can see here [1]. Then there would rather be two handles for
user-space to manage. Similarly, during restricted memory
allocation/free, we would need another glue layer under DMA heaps into
the TEE subsystem.
The reason, which has been iterated many times in past threads, is
simply:
"If user-space has to interact with a TEE device for the SVP use-case
anyway, then why isn't it better to ask the TEE to allocate the
restricted DMA-bufs too?"
[1] https://lkml.indiana.edu/hypermail/linux/kernel/2408.3/08296.html
-Sumit
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-21 11:24 ` Sumit Garg
@ 2025-02-21 14:12 ` Daniel Stone
2025-03-04 7:17 ` Jens Wiklander
0 siblings, 1 reply; 40+ messages in thread
From: Daniel Stone @ 2025-02-21 14:12 UTC (permalink / raw)
To: Sumit Garg
Cc: Boris Brezillon, Jens Wiklander, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
Hi Sumit,
On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > dma-heaps was created to solve the problem of having too many
> > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > painful and making it difficult for userspace to do what it needed to
> > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > to make userspace make full use of it, not to go create entirely
> > separate allocation paths for unclear reasons.
> >
> > Besides, I'm writing this from a platform that implements SVP not via
> > TEE. I've worked on platforms which implement SVP without any TEE,
> > where the TEE implementation would be at best a no-op stub, and at
> > worst flat-out impossible.
>
> Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> bit more? As to how the protected/encrypted media content pipeline
> works? Which architecture support does your use-case require? Is there
> any higher privileged level firmware interaction required to perform
> media content decryption into restricted memory? Do you plan to
> upstream corresponding support in near future?
You can see the MTK SVP patches on the list, which use the MTK SMC to mediate it.
There are TI Jacinto platforms which implement a 'secure' area
configured statically by (IIRC) BL2, with static permissions defined
for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
heard of another SoC vendor doing the same, but I don't think I can
share those details. There is no TEE interaction.
I'm writing this message from an AMD laptop which implements
restricted content paths outside of TEE. I don't have the full picture
of how SVP is implemented on AMD systems, but I do know that I don't
have any TEE devices exposed.
> Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> a TEE implementation (in general terms a higher privileged firmware
> managing the pipeline as the kernel/user-space has no access
> permissions to the plain text media content):
>
> - [...]
Yeah, I totally understand the TEE usecase. I think that TEE is a good
design to implement this. I think that TEE should be used for SVP
where it makes sense.
Please understand that I am _not_ arguing that no-one should use TEE for SVP!
> > So, again, let's
> > please turn this around: _why_ TEE? Who benefits from exposing this as
> > completely separate to the more generic uAPI that we specifically
> > designed to handle things like this?
>
> The bridging between DMA heaps and TEE would still require user-space
> to perform an IOCTL into TEE to register the DMA-bufs as you can see
> here [1]. Then it will rather be two handles for user-space to manage.
Yes, the decoder would need to do this. That's common though: if you
want to share a buffer between V4L2 and DRM, you have three handles:
the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
bridge the two.
> Similarly during restricted memory allocation/free we need another
> glue layer under DMA heaps to TEE subsystem.
Yep.
> The reason is simply which has been iterated over many times in the
> past threads that:
>
> "If user-space has to interact with a TEE device for SVP use-case
> then why it's not better to ask TEE to allocate restricted DMA-bufs
> too"
The first word in your proposition is load-bearing.
Build out the usecase a little more here. You have a DRMed video
stream coming in, which you need to decode (involving TEE for this
usecase). You get a dmabuf handle to the decoded frame. You need to
pass the dmabuf across to the Wayland compositor. The compositor needs
to pass it to EGL/Vulkan to import and do composition, which in turn
passes it to the GPU DRM driver. The output of the composition is in
turn shared between the GPU DRM driver and the separate KMS DRM
driver, with the involvement of GBM.
For the platforms I'm interested in, the GPU DRM driver needs to
switch into protected mode, which has no involvement at all with TEE -
it's architecturally impossible to have TEE involved without moving
most of the GPU driver into TEE and destroying performance. The
display hardware also needs to engage protected mode, which again has
no involvement with TEE and again would need to have half the driver
moved into TEE for no benefit in order to do so. The Wayland
compositor also has no interest in TEE: it tells the GPU DRM driver
about the protected status of its buffers, and that's it.
What these components _are_ opinionated about, is the way buffers are
allocated and managed. We built out dmabuf modifiers for this usecase,
and we have a good negotiation protocol around that. We also really
care about buffer placement in some usecases - e.g. some display/codec
hardware requires buffers to be sourced from contiguous memory, other
hardware needs to know that when it shares buffers with another
device, it needs to place the buffers outside of inaccessible/slow
local RAM. So we built out dma-heaps, so that every component in the
stack can communicate its buffer-placement needs in the same way as we
do modifiers, and negotiate an acceptable allocation.
That's my starting point for this discussion. We have a mechanism to
deal with the fact that buffers need to be shared between different IP
blocks which have their own constraints on buffer placement, avoiding
the current problem of having every subsystem reinvent their own
allocation uAPI which was burying us in impedance mismatch and
confusion. That mechanism is dma-heaps. It seems like your starting
point from this discussion is that you've implemented a TEE-centric
design for SVP, and so all of userspace should bypass our existing
cross-subsystem special-purpose allocation mechanism, and write
specifically to one implementation. I believe that is a massive step
backwards and an immediate introduction of technical debt.
Again, having an implementation of SVP via TEE makes a huge amount of
sense. Having _most_ SVP implementations via TEE still makes a lot of
sense. Having _all_ SVP implementations eventually be via TEE would
still make sense. But even if we were at that point - which we aren't
- it still doesn't justify telling userspace 'use the generic dma-heap
uAPI for every device-specific allocation constraint, apart from SVP
which has a completely different way to allocate some bytes'.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-02-21 14:12 ` Daniel Stone
@ 2025-03-04 7:17 ` Jens Wiklander
2025-03-04 7:45 ` Sumit Garg
0 siblings, 1 reply; 40+ messages in thread
From: Jens Wiklander @ 2025-03-04 7:17 UTC (permalink / raw)
To: Daniel Stone
Cc: Sumit Garg, Boris Brezillon, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
Hi Daniel,
On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi Sumit,
>
> On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > dma-heaps was created to solve the problem of having too many
> > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > painful and making it difficult for userspace to do what it needed to
> > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > to make userspace make full use of it, not to go create entirely
> > > separate allocation paths for unclear reasons.
> > >
> > > Besides, I'm writing this from a platform that implements SVP not via
> > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > where the TEE implementation would be at best a no-op stub, and at
> > > worst flat-out impossible.
> >
> > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > bit more? As to how the protected/encrypted media content pipeline
> > works? Which architecture support does your use-case require? Is there
> > any higher privileged level firmware interaction required to perform
> > media content decryption into restricted memory? Do you plan to
> > upstream corresponding support in near future?
>
> You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
>
> There are TI Jacinto platforms which implement a 'secure' area
> configured statically by (IIRC) BL2, with static permissions defined
> for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> heard of another SoC vendor doing the same, but I don't think I can
> share those details. There is no TEE interaction.
>
> I'm writing this message from an AMD laptop which implements
> restricted content paths outside of TEE. I don't have the full picture
> of how SVP is implemented on AMD systems, but I do know that I don't
> have any TEE devices exposed.
>
> > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > a TEE implementation (in general terms a higher privileged firmware
> > managing the pipeline as the kernel/user-space has no access
> > permissions to the plain text media content):
> >
> > - [...]
>
> Yeah, I totally understand the TEE usecase. I think that TEE is a good
> design to implement this. I think that TEE should be used for SVP
> where it makes sense.
>
> Please understand that I am _not_ arguing that no-one should use TEE for SVP!
>
> > > So, again, let's
> > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > completely separate to the more generic uAPI that we specifically
> > > designed to handle things like this?
> >
> > The bridging between DMA heaps and TEE would still require user-space
> > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > here [1]. Then it will rather be two handles for user-space to manage.
>
> Yes, the decoder would need to do this. That's common though: if you
> want to share a buffer between V4L2 and DRM, you have three handles:
> the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> bridge the two.
>
> > Similarly during restricted memory allocation/free we need another
> > glue layer under DMA heaps to TEE subsystem.
>
> Yep.
>
> > The reason is simply which has been iterated over many times in the
> > past threads that:
> >
> > "If user-space has to interact with a TEE device for SVP use-case
> > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > too"
>
> The first word in your proposition is load-bearing.
>
> Build out the usecase a little more here. You have a DRMed video
> stream coming in, which you need to decode (involving TEE for this
> usecase). You get a dmabuf handle to the decoded frame. You need to
> pass the dmabuf across to the Wayland compositor. The compositor needs
> to pass it to EGL/Vulkan to import and do composition, which in turn
> passes it to the GPU DRM driver. The output of the composition is in
> turn shared between the GPU DRM driver and the separate KMS DRM
> driver, with the involvement of GBM.
>
> For the platforms I'm interested in, the GPU DRM driver needs to
> switch into protected mode, which has no involvement at all with TEE -
> it's architecturally impossible to have TEE involved without moving
> most of the GPU driver into TEE and destroying performance. The
> display hardware also needs to engage protected mode, which again has
> no involvement with TEE and again would need to have half the driver
> moved into TEE for no benefit in order to do so. The Wayland
> compositor also has no interest in TEE: it tells the GPU DRM driver
> about the protected status of its buffers, and that's it.
>
> What these components _are_ opinionated about, is the way buffers are
> allocated and managed. We built out dmabuf modifiers for this usecase,
> and we have a good negotiation protocol around that. We also really
> care about buffer placement in some usecases - e.g. some display/codec
> hardware requires buffers to be sourced from contiguous memory, other
> hardware needs to know that when it shares buffers with another
> device, it needs to place the buffers outside of inaccessible/slow
> local RAM. So we built out dma-heaps, so every part of the component
> in the stack can communicate their buffer-placement needs in the same
> way as we do modifiers, and negotiate an acceptable allocation.
>
> That's my starting point for this discussion. We have a mechanism to
> deal with the fact that buffers need to be shared between different IP
> blocks which have their own constraints on buffer placement, avoiding
> the current problem of having every subsystem reinvent their own
> allocation uAPI which was burying us in impedance mismatch and
> confusion. That mechanism is dma-heaps. It seems like your starting
> point from this discussion is that you've implemented a TEE-centric
> design for SVP, and so all of userspace should bypass our existing
> cross-subsystem special-purpose allocation mechanism, and write
> specifically to one implementation. I believe that is a massive step
> backwards and an immediate introduction of technical debt.
>
> Again, having an implementation of SVP via TEE makes a huge amount of
> sense. Having _most_ SVP implementations via TEE still makes a lot of
> sense. Having _all_ SVP implementations eventually be via TEE would
> still make sense. But even if we were at that point - which we aren't
> - it still doesn't justify telling userspace 'use the generic dma-heap
> uAPI for every device-specific allocation constraint, apart from SVP
> which has a completely different way to allocate some bytes'.
I must admit that I don't see how this makes a significant difference,
but then I haven't hacked much in the stacks you're talking about, so
I'm going to take your word for it.
I've experimented with providing a dma-heap to replace the TEE API. The
implementation is more complex than I first anticipated, adding about
400 lines to the patch set. From user space, it looks like just another
dma-heap. I'm using the names you gave earlier:
"protected,secure-video", "protected,trusted-ui", and
"protected,secure-video-record". However, I wonder if we shouldn't use
"restricted" instead of "protected", since we agreed earlier to call it
restricted memory.
I'll soon post this in a v6 and an updated demo.
Cheers,
Jens
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-03-04 7:17 ` Jens Wiklander
@ 2025-03-04 7:45 ` Sumit Garg
2025-03-18 18:38 ` Nicolas Dufresne
0 siblings, 1 reply; 40+ messages in thread
From: Sumit Garg @ 2025-03-04 7:45 UTC (permalink / raw)
To: Jens Wiklander
Cc: Daniel Stone, Boris Brezillon, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote:
> Hi Daniel,
>
> On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
> >
> > Hi Sumit,
> >
> > On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > > dma-heaps was created to solve the problem of having too many
> > > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > > painful and making it difficult for userspace to do what it needed to
> > > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > > to make userspace make full use of it, not to go create entirely
> > > > separate allocation paths for unclear reasons.
> > > >
> > > > Besides, I'm writing this from a platform that implements SVP not via
> > > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > > where the TEE implementation would be at best a no-op stub, and at
> > > > worst flat-out impossible.
> > >
> > > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > > bit more? As to how the protected/encrypted media content pipeline
> > > works? Which architecture support does your use-case require? Is there
> > > any higher privileged level firmware interaction required to perform
> > > media content decryption into restricted memory? Do you plan to
> > > upstream corresponding support in near future?
> >
> > You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
> >
> > There are TI Jacinto platforms which implement a 'secure' area
> > configured statically by (IIRC) BL2, with static permissions defined
> > for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> > heard of another SoC vendor doing the same, but I don't think I can
> > share those details. There is no TEE interaction.
> >
> > I'm writing this message from an AMD laptop which implements
> > restricted content paths outside of TEE. I don't have the full picture
> > of how SVP is implemented on AMD systems, but I do know that I don't
> > have any TEE devices exposed.
> >
> > > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > > a TEE implementation (in general terms a higher privileged firmware
> > > managing the pipeline as the kernel/user-space has no access
> > > permissions to the plain text media content):
> > >
> > > - [...]
> >
> > Yeah, I totally understand the TEE usecase. I think that TEE is a good
> > design to implement this. I think that TEE should be used for SVP
> > where it makes sense.
> >
> > Please understand that I am _not_ arguing that no-one should use TEE for SVP!
> >
> > > > So, again, let's
> > > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > > completely separate to the more generic uAPI that we specifically
> > > > designed to handle things like this?
> > >
> > > The bridging between DMA heaps and TEE would still require user-space
> > > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > > here [1]. Then it will rather be two handles for user-space to manage.
> >
> > Yes, the decoder would need to do this. That's common though: if you
> > want to share a buffer between V4L2 and DRM, you have three handles:
> > the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> > bridge the two.
> >
> > > Similarly during restricted memory allocation/free we need another
> > > glue layer under DMA heaps to TEE subsystem.
> >
> > Yep.
> >
> > > The reason is simply which has been iterated over many times in the
> > > past threads that:
> > >
> > > "If user-space has to interact with a TEE device for SVP use-case
> > > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > > too"
> >
> > The first word in your proposition is load-bearing.
> >
> > Build out the usecase a little more here. You have a DRMed video
> > stream coming in, which you need to decode (involving TEE for this
> > usecase). You get a dmabuf handle to the decoded frame. You need to
> > pass the dmabuf across to the Wayland compositor. The compositor needs
> > to pass it to EGL/Vulkan to import and do composition, which in turn
> > passes it to the GPU DRM driver. The output of the composition is in
> > turn shared between the GPU DRM driver and the separate KMS DRM
> > driver, with the involvement of GBM.
> >
> > For the platforms I'm interested in, the GPU DRM driver needs to
> > switch into protected mode, which has no involvement at all with TEE -
> > it's architecturally impossible to have TEE involved without moving
> > most of the GPU driver into TEE and destroying performance. The
> > display hardware also needs to engage protected mode, which again has
> > no involvement with TEE and again would need to have half the driver
> > moved into TEE for no benefit in order to do so. The Wayland
> > compositor also has no interest in TEE: it tells the GPU DRM driver
> > about the protected status of its buffers, and that's it.
> >
> > What these components _are_ opinionated about, is the way buffers are
> > allocated and managed. We built out dmabuf modifiers for this usecase,
> > and we have a good negotiation protocol around that. We also really
> > care about buffer placement in some usecases - e.g. some display/codec
> > hardware requires buffers to be sourced from contiguous memory, other
> > hardware needs to know that when it shares buffers with another
> > device, it needs to place the buffers outside of inaccessible/slow
> > local RAM. So we built out dma-heaps, so every part of the component
> > in the stack can communicate their buffer-placement needs in the same
> > way as we do modifiers, and negotiate an acceptable allocation.
> >
> > That's my starting point for this discussion. We have a mechanism to
> > deal with the fact that buffers need to be shared between different IP
> > blocks which have their own constraints on buffer placement, avoiding
> > the current problem of having every subsystem reinvent their own
> > allocation uAPI which was burying us in impedance mismatch and
> > confusion. That mechanism is dma-heaps. It seems like your starting
> > point from this discussion is that you've implemented a TEE-centric
> > design for SVP, and so all of userspace should bypass our existing
> > cross-subsystem special-purpose allocation mechanism, and write
> > specifically to one implementation. I believe that is a massive step
> > backwards and an immediate introduction of technical debt.
> >
> > Again, having an implementation of SVP via TEE makes a huge amount of
> > sense. Having _most_ SVP implementations via TEE still makes a lot of
> > sense. Having _all_ SVP implementations eventually be via TEE would
> > still make sense. But even if we were at that point - which we aren't
> > - it still doesn't justify telling userspace 'use the generic dma-heap
> > uAPI for every device-specific allocation constraint, apart from SVP
> > which has a completely different way to allocate some bytes'.
>
> I must admit that I don't see how this makes a significant difference,
> but then I haven't hacked much in the stacks you're talking about, so
> I'm going to take your word for it.
>
> I've experimented with providing a dma-heap replacing the TEE API. The
> implementation is more complex than I first anticipated, adding about
> 400 lines to the patch set.
I did anticipate this, but let's give it a try and see if DMA heaps
really add any value from the user-space point of view. If they do, then
it will be worth the maintenance overhead.
> From user space, it looks like another
> dma-heap. I'm using the names you gave earlier,
> protected,secure-video, protected,trusted-ui, and
> protected,secure-video-record. However, I wonder if we shouldn't use
> "restricted" instead of "protected" since we had agreed to call it
> restricted memory earlier.
Let's stick with "restricted" memory buffer references only.
-Sumit
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-03-04 7:45 ` Sumit Garg
@ 2025-03-18 18:38 ` Nicolas Dufresne
2025-03-19 7:37 ` Jens Wiklander
0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Dufresne @ 2025-03-18 18:38 UTC (permalink / raw)
To: Sumit Garg, Jens Wiklander
Cc: Daniel Stone, Boris Brezillon, Linux Kernel Mailing List,
Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
On Tuesday, March 4, 2025 at 13:15 +0530, Sumit Garg wrote:
> On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote:
> > Hi Daniel,
> >
> > On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > >
> > > Hi Sumit,
> > >
> > > On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > > > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > > > dma-heaps was created to solve the problem of having too many
> > > > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > > > painful and making it difficult for userspace to do what it needed to
> > > > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > > > to make userspace make full use of it, not to go create entirely
> > > > > separate allocation paths for unclear reasons.
> > > > >
> > > > > Besides, I'm writing this from a platform that implements SVP not via
> > > > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > > > where the TEE implementation would be at best a no-op stub, and at
> > > > > worst flat-out impossible.
> > > >
> > > > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > > > bit more? As to how the protected/encrypted media content pipeline
> > > > works? Which architecture support does your use-case require? Is there
> > > > any higher privileged level firmware interaction required to perform
> > > > media content decryption into restricted memory? Do you plan to
> > > > upstream corresponding support in near future?
> > >
> > > You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
> > >
> > > There are TI Jacinto platforms which implement a 'secure' area
> > > configured statically by (IIRC) BL2, with static permissions defined
> > > for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> > > heard of another SoC vendor doing the same, but I don't think I can
> > > share those details. There is no TEE interaction.
> > >
> > > I'm writing this message from an AMD laptop which implements
> > > restricted content paths outside of TEE. I don't have the full picture
> > > of how SVP is implemented on AMD systems, but I do know that I don't
> > > have any TEE devices exposed.
> > >
> > > > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > > > a TEE implementation (in general terms a higher privileged firmware
> > > > managing the pipeline as the kernel/user-space has no access
> > > > permissions to the plain text media content):
> > > >
> > > > - [...]
> > >
> > > Yeah, I totally understand the TEE usecase. I think that TEE is a good
> > > design to implement this. I think that TEE should be used for SVP
> > > where it makes sense.
> > >
> > > Please understand that I am _not_ arguing that no-one should use TEE for SVP!
> > >
> > > > > So, again, let's
> > > > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > > > completely separate to the more generic uAPI that we specifically
> > > > > designed to handle things like this?
> > > >
> > > > The bridging between DMA heaps and TEE would still require user-space
> > > > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > > > here [1]. Then it will rather be two handles for user-space to manage.
> > >
> > > Yes, the decoder would need to do this. That's common though: if you
> > > want to share a buffer between V4L2 and DRM, you have three handles:
> > > the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> > > bridge the two.
> > >
> > > > Similarly during restricted memory allocation/free we need another
> > > > glue layer under DMA heaps to TEE subsystem.
> > >
> > > Yep.
> > >
> > > > The reason is simply which has been iterated over many times in the
> > > > past threads that:
> > > >
> > > > "If user-space has to interact with a TEE device for SVP use-case
> > > > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > > > too"
> > >
> > > The first word in your proposition is load-bearing.
> > >
> > > Build out the usecase a little more here. You have a DRMed video
> > > stream coming in, which you need to decode (involving TEE for this
> > > usecase). You get a dmabuf handle to the decoded frame. You need to
> > > pass the dmabuf across to the Wayland compositor. The compositor needs
> > > to pass it to EGL/Vulkan to import and do composition, which in turn
> > > passes it to the GPU DRM driver. The output of the composition is in
> > > turn shared between the GPU DRM driver and the separate KMS DRM
> > > driver, with the involvement of GBM.
> > >
> > > For the platforms I'm interested in, the GPU DRM driver needs to
> > > switch into protected mode, which has no involvement at all with TEE -
> > > it's architecturally impossible to have TEE involved without moving
> > > most of the GPU driver into TEE and destroying performance. The
> > > display hardware also needs to engage protected mode, which again has
> > > no involvement with TEE and again would need to have half the driver
> > > moved into TEE for no benefit in order to do so. The Wayland
> > > compositor also has no interest in TEE: it tells the GPU DRM driver
> > > about the protected status of its buffers, and that's it.
> > >
> > > What these components _are_ opinionated about, is the way buffers are
> > > allocated and managed. We built out dmabuf modifiers for this usecase,
> > > and we have a good negotiation protocol around that. We also really
> > > care about buffer placement in some usecases - e.g. some display/codec
> > > hardware requires buffers to be sourced from contiguous memory, other
> > > hardware needs to know that when it shares buffers with another
> > > device, it needs to place the buffers outside of inaccessible/slow
> > > local RAM. So we built out dma-heaps, so every part of the component
> > > in the stack can communicate their buffer-placement needs in the same
> > > way as we do modifiers, and negotiate an acceptable allocation.
> > >
> > > That's my starting point for this discussion. We have a mechanism to
> > > deal with the fact that buffers need to be shared between different IP
> > > blocks which have their own constraints on buffer placement, avoiding
> > > the current problem of having every subsystem reinvent their own
> > > allocation uAPI which was burying us in impedance mismatch and
> > > confusion. That mechanism is dma-heaps. It seems like your starting
> > > point from this discussion is that you've implemented a TEE-centric
> > > design for SVP, and so all of userspace should bypass our existing
> > > cross-subsystem special-purpose allocation mechanism, and write
> > > specifically to one implementation. I believe that is a massive step
> > > backwards and an immediate introduction of technical debt.
> > >
> > > Again, having an implementation of SVP via TEE makes a huge amount of
> > > sense. Having _most_ SVP implementations via TEE still makes a lot of
> > > sense. Having _all_ SVP implementations eventually be via TEE would
> > > still make sense. But even if we were at that point - which we aren't
> > > - it still doesn't justify telling userspace 'use the generic dma-heap
> > > uAPI for every device-specific allocation constraint, apart from SVP
> > > which has a completely different way to allocate some bytes'.
> >
> > I must admit that I don't see how this makes a significant difference,
> > but then I haven't hacked much in the stacks you're talking about, so
> > I'm going to take your word for it.
> >
> > I've experimented with providing a dma-heap replacing the TEE API. The
> > implementation is more complex than I first anticipated, adding about
> > 400 lines to the patch set.
>
> I did anticipate this, but let's give it a try and see if DMA heaps
> really add any value from the user-space point of view. If they do, then
> it will be worth the maintenance overhead.
>
> > From user space, it looks like another
> > dma-heap. I'm using the names you gave earlier,
> > protected,secure-video, protected,trusted-ui, and
> > protected,secure-video-record. However, I wonder if we shouldn't use
> > "restricted" instead of "protected" since we had agreed to call it
> > restricted memory earlier.
>
> Let's stick with "restricted" memory buffer references only.
Until now, we didn't have a standard against which to weigh our naming
choice; we simply wanted to move away from "secure", which didn't mean
much, and "restricted" met our needs. I think the discussion is worth
having again, now that there is a standard that decided on "protected".
Matching the Khronos standard would reduce a lot of confusion.
https://docs.vulkan.org/guide/latest/protected.html
regards,
Nicolas
* Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations
2025-03-18 18:38 ` Nicolas Dufresne
@ 2025-03-19 7:37 ` Jens Wiklander
0 siblings, 0 replies; 40+ messages in thread
From: Jens Wiklander @ 2025-03-19 7:37 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Sumit Garg, Daniel Stone, Boris Brezillon,
Linux Kernel Mailing List, Linux Media Mailing List, dri-devel,
moderated list:DMA BUFFER SHARING FRAMEWORK, op-tee,
linux-arm-kernel, Olivier Masse, Thierry Reding, Yong Wu,
Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T . J . Mercier, Christian König, Matthias Brugger,
AngeloGioacchino Del Regno, azarrabi, Florent Tomasin
Hi,
On Tue, Mar 18, 2025 at 7:38 PM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> On Tuesday, March 4, 2025 at 13:15 +0530, Sumit Garg wrote:
> > On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote:
> > > Hi Daniel,
> > >
> > > On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > > >
> > > > Hi Sumit,
> > > >
> > > > On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > > > > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > > > > dma-heaps was created to solve the problem of having too many
> > > > > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > > > > painful and making it difficult for userspace to do what it needed to
> > > > > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > > > > to make userspace make full use of it, not to go create entirely
> > > > > > separate allocation paths for unclear reasons.
> > > > > >
> > > > > > Besides, I'm writing this from a platform that implements SVP not via
> > > > > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > > > > where the TEE implementation would be at best a no-op stub, and at
> > > > > > worst flat-out impossible.
> > > > >
> > > > > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > > > > bit more? As to how the protected/encrypted media content pipeline
> > > > > works? Which architecture support does your use-case require? Is there
> > > > > any higher privileged level firmware interaction required to perform
> > > > > media content decryption into restricted memory? Do you plan to
> > > > > upstream corresponding support in near future?
> > > >
> > > > You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
> > > >
> > > > There are TI Jacinto platforms which implement a 'secure' area
> > > > configured statically by (IIRC) BL2, with static permissions defined
> > > > for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> > > > heard of another SoC vendor doing the same, but I don't think I can
> > > > share those details. There is no TEE interaction.
> > > >
> > > > I'm writing this message from an AMD laptop which implements
> > > > restricted content paths outside of TEE. I don't have the full picture
> > > > of how SVP is implemented on AMD systems, but I do know that I don't
> > > > have any TEE devices exposed.
> > > >
> > > > > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > > > > a TEE implementation (in general terms a higher privileged firmware
> > > > > managing the pipeline as the kernel/user-space has no access
> > > > > permissions to the plain text media content):
> > > > >
> > > > > - [...]
> > > >
> > > > Yeah, I totally understand the TEE usecase. I think that TEE is a good
> > > > design to implement this. I think that TEE should be used for SVP
> > > > where it makes sense.
> > > >
> > > > Please understand that I am _not_ arguing that no-one should use TEE for SVP!
> > > >
> > > > > > So, again, let's
> > > > > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > > > > completely separate to the more generic uAPI that we specifically
> > > > > > designed to handle things like this?
> > > > >
> > > > > The bridging between DMA heaps and TEE would still require user-space
> > > > > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > > > > here [1]. Then it will rather be two handles for user-space to manage.
> > > >
> > > > Yes, the decoder would need to do this. That's common though: if you
> > > > want to share a buffer between V4L2 and DRM, you have three handles:
> > > > the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> > > > bridge the two.
> > > >
> > > > > Similarly during restricted memory allocation/free we need another
> > > > > glue layer under DMA heaps to TEE subsystem.
> > > >
> > > > Yep.
> > > >
> > > > > The reason is simply which has been iterated over many times in the
> > > > > past threads that:
> > > > >
> > > > > "If user-space has to interact with a TEE device for SVP use-case
> > > > > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > > > > too"
> > > >
> > > > The first word in your proposition is load-bearing.
> > > >
> > > > Build out the usecase a little more here. You have a DRMed video
> > > > stream coming in, which you need to decode (involving TEE for this
> > > > usecase). You get a dmabuf handle to the decoded frame. You need to
> > > > pass the dmabuf across to the Wayland compositor. The compositor needs
> > > > to pass it to EGL/Vulkan to import and do composition, which in turn
> > > > passes it to the GPU DRM driver. The output of the composition is in
> > > > turn shared between the GPU DRM driver and the separate KMS DRM
> > > > driver, with the involvement of GBM.
> > > >
> > > > For the platforms I'm interested in, the GPU DRM driver needs to
> > > > switch into protected mode, which has no involvement at all with TEE -
> > > > it's architecturally impossible to have TEE involved without moving
> > > > most of the GPU driver into TEE and destroying performance. The
> > > > display hardware also needs to engage protected mode, which again has
> > > > no involvement with TEE and again would need to have half the driver
> > > > moved into TEE for no benefit in order to do so. The Wayland
> > > > compositor also has no interest in TEE: it tells the GPU DRM driver
> > > > about the protected status of its buffers, and that's it.
> > > >
> > > > What these components _are_ opinionated about, is the way buffers are
> > > > allocated and managed. We built out dmabuf modifiers for this usecase,
> > > > and we have a good negotiation protocol around that. We also really
> > > > care about buffer placement in some usecases - e.g. some display/codec
> > > > hardware requires buffers to be sourced from contiguous memory, other
> > > > hardware needs to know that when it shares buffers with another
> > > > device, it needs to place the buffers outside of inaccessible/slow
> > > > local RAM. So we built out dma-heaps, so that every component in
> > > > the stack can communicate its buffer-placement needs in the same
> > > > way as we do modifiers, and negotiate an acceptable allocation.
> > > >
> > > > That's my starting point for this discussion. We have a mechanism to
> > > > deal with the fact that buffers need to be shared between different IP
> > > > blocks which have their own constraints on buffer placement, avoiding
> > > > the current problem of having every subsystem reinvent their own
> > > > allocation uAPI which was burying us in impedance mismatch and
> > > > confusion. That mechanism is dma-heaps. It seems like your starting
> > > > point from this discussion is that you've implemented a TEE-centric
> > > > design for SVP, and so all of userspace should bypass our existing
> > > > cross-subsystem special-purpose allocation mechanism, and write
> > > > specifically to one implementation. I believe that is a massive step
> > > > backwards and an immediate introduction of technical debt.
> > > >
> > > > Again, having an implementation of SVP via TEE makes a huge amount of
> > > > sense. Having _most_ SVP implementations via TEE still makes a lot of
> > > > sense. Having _all_ SVP implementations eventually be via TEE would
> > > > still make sense. But even if we were at that point - which we aren't
> > > > - it still doesn't justify telling userspace 'use the generic dma-heap
> > > > uAPI for every device-specific allocation constraint, apart from SVP
> > > > which has a completely different way to allocate some bytes'.
> > >
> > > I must admit that I don't see how this makes a significant difference,
> > > but then I haven't hacked much in the stacks you're talking about, so
> > > I'm going to take your word for it.
> > >
> > > I've experimented with providing a dma-heap replacing the TEE API. The
> > > implementation is more complex than I first anticipated, adding about
> > > 400 lines to the patch set.
> >
> > I did anticipate this, but let's give it a try and see if DMA heaps
> > really add any value from a user-space point of view. If they do, then
> > it will be worth the maintenance overhead.
> >
> > > From user space, it looks like another
> > > dma-heap. I'm using the names you gave earlier,
> > > protected,secure-video, protected,trusted-ui, and
> > > protected,secure-video-record. However, I wonder if we shouldn't use
> > > "restricted" instead of "protected" since we had agreed to call it
> > > restricted memory earlier.
> >
> > Let's stick with "restricted" memory buffer references only.
>
> Until now, we didn't have a standard to weigh our naming choice
> against; we simply wanted to move away from "secure", which didn't mean
> much, and "restricted" met our needs. I think the discussion is worth
> having again now that there is a standard that settled on "protected".
> Matching the Khronos standard would reduce a lot of confusion.
>
> https://docs.vulkan.org/guide/latest/protected.html
Yeah, that's fine with me. I don't mind changing the name again as
long as we progress. The latest version of the patchset is here [1].
I've published a demo and changed the patchset to provide a heap
interface instead of a special interface in the TEE subsystem for
memory allocations as requested. I'm interested in feedback on the
patches in general, but in particular, on how the heap interface is
provided.
[1] https://lore.kernel.org/lkml/20250305130634.1850178-1-jens.wiklander@linaro.org/
Cheers,
Jens
end of thread, other threads:[~2025-03-19 7:41 UTC | newest]
Thread overview: 40+ messages
2024-12-17 10:07 [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 1/6] tee: add restricted memory allocation Jens Wiklander
2025-01-08 16:54 ` Simona Vetter
2025-01-09 7:17 ` Jens Wiklander
2025-02-13 8:44 ` Boris Brezillon
2024-12-17 10:07 ` [PATCH v4 2/6] optee: account for direction while converting parameters Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 3/6] optee: sync secure world ABI headers Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 4/6] optee: support restricted memory allocation Jens Wiklander
2024-12-17 10:07 ` [PATCH v4 5/6] optee: FF-A: dynamic " Jens Wiklander
2024-12-20 14:38 ` kernel test robot
2024-12-21 9:40 ` kernel test robot
2024-12-17 10:07 ` [PATCH v4 6/6] optee: smc abi: " Jens Wiklander
2024-12-18 11:06 ` [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations Simona Vetter
2024-12-24 6:35 ` Sumit Garg
[not found] ` <Z2p-v-xjhzhPso6u@wunner.de>
[not found] ` <Z2p_ucfNOyKcbWhH@wunner.de>
2024-12-24 10:00 ` Dmitry Baryshkov
2024-12-26 5:59 ` Sumit Garg
[not found] ` <Z209ZegsmgN1xlNG@wunner.de>
2025-01-08 17:00 ` Simona Vetter
2025-01-08 16:57 ` Simona Vetter
2025-01-09 6:08 ` Sumit Garg
2025-02-12 19:56 ` Boris Brezillon
2025-02-13 6:41 ` Sumit Garg
2025-02-13 8:19 ` Jens Wiklander
2025-02-13 8:35 ` Boris Brezillon
2025-02-13 9:16 ` Sumit Garg
2025-02-13 12:40 ` Boris Brezillon
2025-02-13 14:05 ` Daniel Stone
2025-02-13 15:57 ` Jens Wiklander
2025-02-13 17:39 ` Daniel Stone
2025-02-14 10:07 ` Jens Wiklander
2025-02-14 13:07 ` Sumit Garg
2025-02-14 15:48 ` Boris Brezillon
2025-02-17 6:12 ` Sumit Garg
2025-02-18 16:22 ` Daniel Stone
2025-02-19 13:22 ` Simona Vetter
2025-02-21 11:24 ` Sumit Garg
2025-02-21 14:12 ` Daniel Stone
2025-03-04 7:17 ` Jens Wiklander
2025-03-04 7:45 ` Sumit Garg
2025-03-18 18:38 ` Nicolas Dufresne
2025-03-19 7:37 ` Jens Wiklander