* [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 16:20 ` Steven Price
2025-11-27 8:34 ` Thomas Zimmermann
2025-11-26 12:44 ` [PATCH v6 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
` (15 subsequent siblings)
16 siblings, 2 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
driver implements custom dma_buf_ops. Instead of duplicating a bunch
of helpers to work around it, let's provide a way for drivers to
expose their custom dma_buf_ops so the core prime helpers can rely on
that instead of hardcoding &drm_gem_prime_dmabuf_ops.
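For illustration, the fallback logic the patch introduces boils down to a
pointer-or-default lookup. Here is a minimal user-space sketch of that
pattern; all struct and function names below are toy stand-ins, not the
real DRM types:

```c
#include <stddef.h>

/* Toy stand-in for struct dma_buf_ops. */
struct dma_buf_ops { int dummy; };

/* Plays the role of &drm_gem_prime_dmabuf_ops. */
static const struct dma_buf_ops default_ops;

/* Plays the role of struct drm_driver, reduced to the new field. */
struct driver {
	const struct dma_buf_ops *dma_buf_ops; /* NULL => use the default */
};

/* Mirrors drm_gem_prime_get_dma_buf_ops(): return the driver's custom
 * table when set, otherwise fall back to the shared default one. */
static const struct dma_buf_ops *get_dma_buf_ops(const struct driver *drv)
{
	return drv->dma_buf_ops ? drv->dma_buf_ops : &default_ops;
}
```

Both the export path and the is-exported test then compare against the
same table, so a driver with custom ops is handled transparently.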
v5:
- New patch
v6:
- Pass custom dma_buf_ops directly instead of through a getter
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/drm_prime.c | 10 ++++++++--
include/drm/drm_drv.h | 8 ++++++++
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 21809a82187b..86fd95f0c105 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -904,6 +904,12 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
}
EXPORT_SYMBOL(drm_prime_get_contiguous_size);
+static const struct dma_buf_ops *
+drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
+{
+ return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
+}
+
/**
* drm_gem_prime_export - helper library implementation of the export callback
* @obj: GEM object to export
@@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
struct dma_buf_export_info exp_info = {
.exp_name = KBUILD_MODNAME, /* white lie for debug */
.owner = dev->driver->fops->owner,
- .ops = &drm_gem_prime_dmabuf_ops,
+ .ops = drm_gem_prime_get_dma_buf_ops(dev),
.size = obj->size,
.flags = flags,
.priv = obj,
@@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
{
struct drm_gem_object *obj = dma_buf->priv;
- return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) && (obj->dev == dev);
+ return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) && obj->dev == dev;
}
EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 42fc085f986d..1c6dae60d523 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -431,6 +431,14 @@ struct drm_driver {
* some examples.
*/
const struct file_operations *fops;
+
+ /**
+ * @dma_buf_ops:
+ *
+ * dma_buf_ops to use for buffers exported by this driver. When NULL,
+ * the drm_prime logic defaults to &drm_gem_prime_dmabuf_ops.
+ */
+ const struct dma_buf_ops *dma_buf_ops;
};
void *__devm_drm_dev_alloc(struct device *parent,
--
2.51.1
^ permalink raw reply related [flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-26 12:44 ` [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
@ 2025-11-26 16:20 ` Steven Price
2025-11-27 8:34 ` Thomas Zimmermann
1 sibling, 0 replies; 33+ messages in thread
From: Steven Price @ 2025-11-26 16:20 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On 26/11/2025 12:44, Boris Brezillon wrote:
> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> of helpers to work around it, let's provide a way for drivers to
> expose their custom dma_buf_ops so the core prime helpers can rely on
> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
>
> v5:
> - New patch
>
> v6:
> - Pass custom dma_buf_ops directly instead of through a getter
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> include/drm/drm_drv.h | 8 ++++++++
> 2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 21809a82187b..86fd95f0c105 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -904,6 +904,12 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
> }
> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
>
> +static const struct dma_buf_ops *
> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> +{
> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> +}
> +
> /**
> * drm_gem_prime_export - helper library implementation of the export callback
> * @obj: GEM object to export
> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
> struct dma_buf_export_info exp_info = {
> .exp_name = KBUILD_MODNAME, /* white lie for debug */
> .owner = dev->driver->fops->owner,
> - .ops = &drm_gem_prime_dmabuf_ops,
> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
> .size = obj->size,
> .flags = flags,
> .priv = obj,
> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
> {
> struct drm_gem_object *obj = dma_buf->priv;
>
> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) && (obj->dev == dev);
> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) && obj->dev == dev;
> }
> EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
>
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 42fc085f986d..1c6dae60d523 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -431,6 +431,14 @@ struct drm_driver {
> * some examples.
> */
> const struct file_operations *fops;
> +
> + /**
> + * @dma_buf_ops:
> + *
> + * dma_buf_ops to use for buffers exported by this driver. When NULL,
> + * the drm_prime logic defaults to &drm_gem_prime_dmabuf_ops.
> + */
> + const struct dma_buf_ops *dma_buf_ops;
> };
>
> void *__devm_drm_dev_alloc(struct device *parent,
^ permalink raw reply [flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-26 12:44 ` [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
2025-11-26 16:20 ` Steven Price
@ 2025-11-27 8:34 ` Thomas Zimmermann
2025-11-27 8:42 ` Thomas Zimmermann
2025-11-27 10:40 ` Boris Brezillon
1 sibling, 2 replies; 33+ messages in thread
From: Thomas Zimmermann @ 2025-11-27 8:34 UTC (permalink / raw)
To: Boris Brezillon, Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, David Airlie,
Simona Vetter, Faith Ekstrand, Thierry Reding, Mikko Perttunen,
Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
Hi
Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> of helpers to work around it, let's provide a way for drivers to
> expose their custom dma_buf_ops so the core prime helpers can rely on
> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
This can't go in as-is. I've spent an awful amount of patches on
removing buffer callbacks from struct drm_driver. Let's please not go
back to that.
>
> v5:
> - New patch
>
> v6:
> - Pass custom dma_buf_ops directly instead of through a getter
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> include/drm/drm_drv.h | 8 ++++++++
> 2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 21809a82187b..86fd95f0c105 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -904,6 +904,12 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
> }
> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
>
> +static const struct dma_buf_ops *
> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> +{
> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> +}
> +
> /**
> * drm_gem_prime_export - helper library implementation of the export callback
> * @obj: GEM object to export
> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
> struct dma_buf_export_info exp_info = {
> .exp_name = KBUILD_MODNAME, /* white lie for debug */
> .owner = dev->driver->fops->owner,
> - .ops = &drm_gem_prime_dmabuf_ops,
> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
Rather provide a new function drm_gem_prime_export_with_ops() that takes
an additional dma_ops instance. The current drm_gem_prime_export() would
call it with &drm_gem_prime_dmabuf_ops.
If this really does not work, you could add a pointer to dma_buf_ops to
drm_gem_object_funcs and fetch that from drm_gem_prime_export(). We
already have vm_ops there.
Other drivers, such as amdgpu, would also benefit from such a change.
> .size = obj->size,
> .flags = flags,
> .priv = obj,
> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
> {
> struct drm_gem_object *obj = dma_buf->priv;
>
> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) && (obj->dev == dev);
> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) && obj->dev == dev;
This is a bit more complicated and the test has been a pain point
before. For this case, I think we should add a GEM callback for this
struct drm_gem_object_funcs {
bool (*exported_by)(struct drm_gem_object *obj, struct drm_device *dev)
}
next to the existing export callback.
And drm_gem_is_prime_exported_dma_buf would then do
{
if (obj->funcs->exported_by)
return obj->funcs->exported_by(obj, dev)
return /* what we currently test */
}
IIRC amdgpu would again benefit from this.
These changes will isolate dma_buf handling near GEM code, keep
drm_driver clean, and even allow for a driver to have different
implementations of dma_buf_ops.
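A compilable user-space sketch of the callback dispatch suggested above
(all type names and the demo callback are toy stand-ins for the real DRM
structures):

```c
#include <stdbool.h>
#include <stddef.h>

struct drm_device { int id; };
struct drm_gem_object;

struct drm_gem_object_funcs {
	/* Proposed callback: the driver decides ownership itself. */
	bool (*exported_by)(struct drm_gem_object *obj, struct drm_device *dev);
};

struct drm_gem_object {
	const struct drm_gem_object_funcs *funcs;
	struct drm_device *dev;
};

/* Mirrors the suggested drm_gem_is_prime_exported_dma_buf() shape:
 * prefer the driver callback, otherwise fall back to the core test. */
static bool is_exported_by(struct drm_gem_object *obj, struct drm_device *dev)
{
	if (obj->funcs && obj->funcs->exported_by)
		return obj->funcs->exported_by(obj, dev);
	return obj->dev == dev; /* stand-in for the current core test */
}

/* Example driver callback: claims only objects on its own device. */
static bool demo_exported_by(struct drm_gem_object *obj, struct drm_device *dev)
{
	return obj->dev == dev && dev->id == 1;
}

static const struct drm_gem_object_funcs demo_funcs = {
	.exported_by = demo_exported_by,
};
```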
Best regards
Thomas
> }
> EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
>
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 42fc085f986d..1c6dae60d523 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -431,6 +431,14 @@ struct drm_driver {
> * some examples.
> */
> const struct file_operations *fops;
> +
> + /**
> + * @dma_buf_ops:
> + *
> + * dma_buf_ops to use for buffers exported by this driver. When NULL,
> + * the drm_prime logic defaults to &drm_gem_prime_dmabuf_ops.
> + */
> + const struct dma_buf_ops *dma_buf_ops;
> };
>
> void *__devm_drm_dev_alloc(struct device *parent,
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
^ permalink raw reply [flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 8:34 ` Thomas Zimmermann
@ 2025-11-27 8:42 ` Thomas Zimmermann
2025-11-27 8:58 ` Thomas Zimmermann
2025-11-27 10:34 ` Boris Brezillon
2025-11-27 10:40 ` Boris Brezillon
1 sibling, 2 replies; 33+ messages in thread
From: Thomas Zimmermann @ 2025-11-27 8:42 UTC (permalink / raw)
To: Boris Brezillon, Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, David Airlie,
Simona Vetter, Faith Ekstrand, Thierry Reding, Mikko Perttunen,
Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
Hi
Am 27.11.25 um 09:34 schrieb Thomas Zimmermann:
> Hi
>
> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
>> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
>> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
>> driver implements custom dma_buf_ops. Instead of duplicating a bunch
>> of helpers to work around it, let's provide a way for drivers to
>> expose their custom dma_buf_ops so the core prime helpers can rely on
>> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
>
> This can't go in as-is. I've spent an awful amount of patches on
> removing buffer callbacks from struct drm_driver. Let's please not go
> back to that.
>
>>
>> v5:
>> - New patch
>>
>> v6:
>> - Pass custom dma_buf_ops directly instead of through a getter
>>
>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>> ---
>> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
>> include/drm/drm_drv.h | 8 ++++++++
>> 2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 21809a82187b..86fd95f0c105 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -904,6 +904,12 @@ unsigned long
>> drm_prime_get_contiguous_size(struct sg_table *sgt)
>> }
>> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
>> +static const struct dma_buf_ops *
>> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
>> +{
>> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
>> +}
>> +
>> /**
>> * drm_gem_prime_export - helper library implementation of the
>> export callback
>> * @obj: GEM object to export
>> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct
>> drm_gem_object *obj,
>> struct dma_buf_export_info exp_info = {
>> .exp_name = KBUILD_MODNAME, /* white lie for debug */
>> .owner = dev->driver->fops->owner,
>> - .ops = &drm_gem_prime_dmabuf_ops,
>> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
>
> Rather provide a new function drm_gem_prime_export_with_ops() that
> takes an additional dma_ops instance. The current
> drm_gem_prime_export() would call it with &drm_gem_prime_dmabuf_ops.
>
> If this really does not work, you could add a pointer to dma_buf_ops
> to drm_gem_object_funcs and fetch that from drm_gem_prime_export(). We
> already vm_ops there.
>
> Other drivers, such as amdgpu, would also benefit from such a change
>
>> .size = obj->size,
>> .flags = flags,
>> .priv = obj,
>> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct
>> drm_device *dev,
>> {
>> struct drm_gem_object *obj = dma_buf->priv;
>> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) &&
>> (obj->dev == dev);
>> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) &&
>> obj->dev == dev;
On a second thought, we probably cannot be sure that dma_buf->priv
really is a GEM object until we tested the ops field. :/ IIRC that's
why the ops test goes first and the test for obj->dev goes second. So
neither solution works.
Best regards
Thomas
>
> This is a bit more complicated and the test has been a pain point
> before. For this case, I think we should add a GEM callback for this
>
> struct drm_gem_object_funcs {
> bool (*exported_by)(struct drm_gem_object *obj, struct drm_device
> *dev)
> }
>
> next to the existing export callback.
>
> And drm_gem_is_prime_exported_dma_buf would then do
>
> {
> if (obj->funcs->exported_by)
> return obj->funcs-<exported_by(obj, dev)
>
> return /* what we currently test */
> }
>
> IIRC amdgpu would again benefit from this.
>
> These changes will isolate dma_buf handling near GEM code, keep
> drm_driver clean, and even allow for a driver to have different
> implementations of dma_buf_ops.
>
> Best regards
> Thomas
>
>> }
>> EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>> index 42fc085f986d..1c6dae60d523 100644
>> --- a/include/drm/drm_drv.h
>> +++ b/include/drm/drm_drv.h
>> @@ -431,6 +431,14 @@ struct drm_driver {
>> * some examples.
>> */
>> const struct file_operations *fops;
>> +
>> + /**
>> + * @dma_buf_ops:
>> + *
>> + * dma_buf_ops to use for buffers exported by this driver. When
>> NULL,
>> + * the drm_prime logic defaults to &drm_gem_prime_dmabuf_ops.
>> + */
>> + const struct dma_buf_ops *dma_buf_ops;
>> };
>> void *__devm_drm_dev_alloc(struct device *parent,
>
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
^ permalink raw reply [flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 8:42 ` Thomas Zimmermann
@ 2025-11-27 8:58 ` Thomas Zimmermann
2025-11-27 10:44 ` Boris Brezillon
2025-11-27 10:34 ` Boris Brezillon
1 sibling, 1 reply; 33+ messages in thread
From: Thomas Zimmermann @ 2025-11-27 8:58 UTC (permalink / raw)
To: Boris Brezillon, Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, David Airlie,
Simona Vetter, Faith Ekstrand, Thierry Reding, Mikko Perttunen,
Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
Hi
Am 27.11.25 um 09:42 schrieb Thomas Zimmermann:
> Hi
>
> Am 27.11.25 um 09:34 schrieb Thomas Zimmermann:
>> Hi
>>
>> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
>>> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
>>> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
>>> driver implements custom dma_buf_ops. Instead of duplicating a bunch
>>> of helpers to work around it, let's provide a way for drivers to
>>> expose their custom dma_buf_ops so the core prime helpers can rely on
>>> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
>>
>> This can't go in as-is. I've spent an awful amount of patches on
>> removing buffer callbacks from struct drm_driver. Let's please not go
>> back to that.
>>
>>>
>>> v5:
>>> - New patch
>>>
>>> v6:
>>> - Pass custom dma_buf_ops directly instead of through a getter
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> ---
>>> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
>>> include/drm/drm_drv.h | 8 ++++++++
>>> 2 files changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>> index 21809a82187b..86fd95f0c105 100644
>>> --- a/drivers/gpu/drm/drm_prime.c
>>> +++ b/drivers/gpu/drm/drm_prime.c
>>> @@ -904,6 +904,12 @@ unsigned long
>>> drm_prime_get_contiguous_size(struct sg_table *sgt)
>>> }
>>> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
>>> +static const struct dma_buf_ops *
>>> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
>>> +{
>>> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
>>> +}
>>> +
>>> /**
>>> * drm_gem_prime_export - helper library implementation of the
>>> export callback
>>> * @obj: GEM object to export
>>> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct
>>> drm_gem_object *obj,
>>> struct dma_buf_export_info exp_info = {
>>> .exp_name = KBUILD_MODNAME, /* white lie for debug */
>>> .owner = dev->driver->fops->owner,
>>> - .ops = &drm_gem_prime_dmabuf_ops,
>>> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
>>
>> Rather provide a new function drm_gem_prime_export_with_ops() that
>> takes an additional dma_ops instance. The current
>> drm_gem_prime_export() would call it with &drm_gem_prime_dmabuf_ops.
>>
>> If this really does not work, you could add a pointer to dma_buf_ops
>> to drm_gem_object_funcs and fetch that from drm_gem_prime_export().
>> We already vm_ops there.
>>
>> Other drivers, such as amdgpu, would also benefit from such a change
>>
>>> .size = obj->size,
>>> .flags = flags,
>>> .priv = obj,
>>> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct
>>> drm_device *dev,
>>> {
>>> struct drm_gem_object *obj = dma_buf->priv;
>>> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) &&
>>> (obj->dev == dev);
>>> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) &&
>>> obj->dev == dev;
>
> On a second thought, we probably cannot be sure that dma_buf->priv
> really is a GEM object until we tested the ops field. :/ IIRC that's
> why the ops test goes first and the test for obj->dev goes second. So
> neither solution works.
I think, instead of looking at the ops field, the test could look at
dma_buf->owner == dev->driver->fops->owner. This will tell if the
dma_buf comes from the same driver and hence is a GEM object. In the
next step, do obj->dev == dev as before. This will also allow drivers
like amdgpu to use the helper for testing. See [1].
[1]
https://elixir.bootlin.com/linux/v6.17.9/source/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c#L512
Best regards
Thomas
>
> Best regards
> Thomas
>
>>
>> This is a bit more complicated and the test has been a pain point
>> before. For this case, I think we should add a GEM callback for this
>>
>> struct drm_gem_object_funcs {
>> bool (*exported_by)(struct drm_gem_object *obj, struct drm_device
>> *dev)
>> }
>>
>> next to the existing export callback.
>>
>> And drm_gem_is_prime_exported_dma_buf would then do
>>
>> {
>> if (obj->funcs->exported_by)
>> return obj->funcs-<exported_by(obj, dev)
>>
>> return /* what we currently test */
>> }
>>
>> IIRC amdgpu would again benefit from this.
>>
>> These changes will isolate dma_buf handling near GEM code, keep
>> drm_driver clean, and even allow for a driver to have different
>> implementations of dma_buf_ops.
>>
>> Best regards
>> Thomas
>>
>>> }
>>> EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>> index 42fc085f986d..1c6dae60d523 100644
>>> --- a/include/drm/drm_drv.h
>>> +++ b/include/drm/drm_drv.h
>>> @@ -431,6 +431,14 @@ struct drm_driver {
>>> * some examples.
>>> */
>>> const struct file_operations *fops;
>>> +
>>> + /**
>>> + * @dma_buf_ops:
>>> + *
>>> + * dma_buf_ops to use for buffers exported by this driver. When
>>> NULL,
>>> + * the drm_prime logic defaults to &drm_gem_prime_dmabuf_ops.
>>> + */
>>> + const struct dma_buf_ops *dma_buf_ops;
>>> };
>>> void *__devm_drm_dev_alloc(struct device *parent,
>>
>
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
^ permalink raw reply [flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 8:58 ` Thomas Zimmermann
@ 2025-11-27 10:44 ` Boris Brezillon
2025-11-27 12:32 ` Boris Brezillon
0 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-27 10:44 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On Thu, 27 Nov 2025 09:58:55 +0100
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> Am 27.11.25 um 09:42 schrieb Thomas Zimmermann:
> > Hi
> >
> > Am 27.11.25 um 09:34 schrieb Thomas Zimmermann:
> >> Hi
> >>
> >> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> >>> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> >>> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> >>> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> >>> of helpers to work around it, let's provide a way for drivers to
> >>> expose their custom dma_buf_ops so the core prime helpers can rely on
> >>> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
> >>
> >> This can't go in as-is. I've spent an awful amount of patches on
> >> removing buffer callbacks from struct drm_driver. Let's please not go
> >> back to that.
> >>
> >>>
> >>> v5:
> >>> - New patch
> >>>
> >>> v6:
> >>> - Pass custom dma_buf_ops directly instead of through a getter
> >>>
> >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> >>> ---
> >>> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> >>> include/drm/drm_drv.h | 8 ++++++++
> >>> 2 files changed, 16 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> >>> index 21809a82187b..86fd95f0c105 100644
> >>> --- a/drivers/gpu/drm/drm_prime.c
> >>> +++ b/drivers/gpu/drm/drm_prime.c
> >>> @@ -904,6 +904,12 @@ unsigned long
> >>> drm_prime_get_contiguous_size(struct sg_table *sgt)
> >>> }
> >>> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> >>> +static const struct dma_buf_ops *
> >>> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> >>> +{
> >>> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> >>> +}
> >>> +
> >>> /**
> >>> * drm_gem_prime_export - helper library implementation of the
> >>> export callback
> >>> * @obj: GEM object to export
> >>> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct
> >>> drm_gem_object *obj,
> >>> struct dma_buf_export_info exp_info = {
> >>> .exp_name = KBUILD_MODNAME, /* white lie for debug */
> >>> .owner = dev->driver->fops->owner,
> >>> - .ops = &drm_gem_prime_dmabuf_ops,
> >>> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
> >>
> >> Rather provide a new function drm_gem_prime_export_with_ops() that
> >> takes an additional dma_ops instance. The current
> >> drm_gem_prime_export() would call it with &drm_gem_prime_dmabuf_ops.
> >>
> >> If this really does not work, you could add a pointer to dma_buf_ops
> >> to drm_gem_object_funcs and fetch that from drm_gem_prime_export().
> >> We already vm_ops there.
> >>
> >> Other drivers, such as amdgpu, would also benefit from such a change
> >>
> >>> .size = obj->size,
> >>> .flags = flags,
> >>> .priv = obj,
> >>> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct
> >>> drm_device *dev,
> >>> {
> >>> struct drm_gem_object *obj = dma_buf->priv;
> >>> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) &&
> >>> (obj->dev == dev);
> >>> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) &&
> >>> obj->dev == dev;
> >
> > On a second thought, we probably cannot be sure that dma_buf->priv
> > really is a GEM object until we tested the ops field. :/ IIRC that's
> > why the ops test goes first and the test for obj->dev goes second. So
> > neither solution works.
>
> I think, instead of looking at the ops field, the test could look at
> dma_buf->owner == dev->driver->fops->owner. This will tell if the
> dma_buf comes from the same driver and hence is a GEM object. In the
> next step, do obj->dev == dev as before. This will also allow drivers
> like amdgpu to use the helper for testing. See [1].
Except this doesn't work when the driver is linked statically (not
enabled as a module), because THIS_MODULE is NULL in that case.
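The failure mode is easy to see in a toy model: with two built-in
drivers both owner pointers are NULL, so the comparison matches a buffer
against the wrong driver as well (structs below are stand-ins, not
kernel code):

```c
#include <stddef.h>

struct module;                               /* opaque, as in the kernel */
struct dma_buf { const struct module *owner; };
struct driver  { const struct module *owner; /* THIS_MODULE, NULL if built-in */ };

/* The proposed ownership test reduced to its essence. */
static int same_owner(const struct dma_buf *buf, const struct driver *drv)
{
	return buf->owner == drv->owner;
}
```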
^ permalink raw reply [flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 10:44 ` Boris Brezillon
@ 2025-11-27 12:32 ` Boris Brezillon
2025-11-27 13:05 ` Boris Brezillon
0 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-27 12:32 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
Hi Thomas,
On Thu, 27 Nov 2025 11:44:40 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> On Thu, 27 Nov 2025 09:58:55 +0100
> Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> > Hi
> >
> > Am 27.11.25 um 09:42 schrieb Thomas Zimmermann:
> > > Hi
> > >
> > > Am 27.11.25 um 09:34 schrieb Thomas Zimmermann:
> > >> Hi
> > >>
> > >> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> > >>> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> > >>> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> > >>> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> > >>> of helpers to work around it, let's provide a way for drivers to
> > >>> expose their custom dma_buf_ops so the core prime helpers can rely on
> > >>> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
> > >>
> > >> This can't go in as-is. I've spent an awful amount of patches on
> > >> removing buffer callbacks from struct drm_driver. Let's please not go
> > >> back to that.
> > >>
> > >>>
> > >>> v5:
> > >>> - New patch
> > >>>
> > >>> v6:
> > >>> - Pass custom dma_buf_ops directly instead of through a getter
> > >>>
> > >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > >>> ---
> > >>> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> > >>> include/drm/drm_drv.h | 8 ++++++++
> > >>> 2 files changed, 16 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > >>> index 21809a82187b..86fd95f0c105 100644
> > >>> --- a/drivers/gpu/drm/drm_prime.c
> > >>> +++ b/drivers/gpu/drm/drm_prime.c
> > >>> @@ -904,6 +904,12 @@ unsigned long
> > >>> drm_prime_get_contiguous_size(struct sg_table *sgt)
> > >>> }
> > >>> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> > >>> +static const struct dma_buf_ops *
> > >>> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> > >>> +{
> > >>> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> > >>> +}
> > >>> +
> > >>> /**
> > >>> * drm_gem_prime_export - helper library implementation of the
> > >>> export callback
> > >>> * @obj: GEM object to export
> > >>> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct
> > >>> drm_gem_object *obj,
> > >>> struct dma_buf_export_info exp_info = {
> > >>> .exp_name = KBUILD_MODNAME, /* white lie for debug */
> > >>> .owner = dev->driver->fops->owner,
> > >>> - .ops = &drm_gem_prime_dmabuf_ops,
> > >>> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
> > >>
> > >> Rather provide a new function drm_gem_prime_export_with_ops() that
> > >> takes an additional dma_ops instance. The current
> > >> drm_gem_prime_export() would call it with &drm_gem_prime_dmabuf_ops.
> > >>
> > >> If this really does not work, you could add a pointer to dma_buf_ops
> > >> to drm_gem_object_funcs and fetch that from drm_gem_prime_export().
> > >> We already vm_ops there.
> > >>
> > >> Other drivers, such as amdgpu, would also benefit from such a change
> > >>
> > >>> .size = obj->size,
> > >>> .flags = flags,
> > >>> .priv = obj,
> > >>> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct
> > >>> drm_device *dev,
> > >>> {
> > >>> struct drm_gem_object *obj = dma_buf->priv;
> > >>> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) &&
> > >>> (obj->dev == dev);
> > >>> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) &&
> > >>> obj->dev == dev;
> > >
> > > On a second thought, we probably cannot be sure that dma_buf->priv
> > > really is a GEM object until we tested the ops field. :/ IIRC that's
> > > why the ops test goes first and the test for obj->dev goes second. So
> > > neither solution works.
> >
> > I think, instead of looking at the ops field, the test could look at
> > dma_buf->owner == dev->driver->fops->owner. This will tell if the
> > dma_buf comes from the same driver and hence is a GEM object. In the
> > next step, do obj->dev == dev as before. This will also allow drivers
> > like amdgpu to use the helper for testing. See [1].
>
> Except this doesn't work when the driver is linked statically (not
> enabled as a module), because THIS_MODULE is NULL in that case.
A couple more alternatives, if someone is interested in pursuing that
path:
- Have a drm_device::dmabuf_ops and a drm_dev_set_dmabuf_ops() helper
to attach the driver dma_buf_ops to the device and allow a direct:
dmabuf->ops == dev->dmabuf_ops
test
- Have a dev field (struct device *) added to dma_buf, and have a
dmabuf->dev == drm_dev_dma_dev(dev)
test
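
For illustration only (an editor's sketch, not from the patch): the first
alternative boils down to a pointer-identity test. In the minimal userspace
model below, all structures are simplified stand-ins for the kernel ones,
and drm_dev_set_dmabuf_ops() is the hypothetical helper named above:

```c
#include <assert.h>

/* Simplified stand-ins for the kernel structures. */
struct dma_buf_ops { int unused; };
struct drm_device { const struct dma_buf_ops *dmabuf_ops; };
struct dma_buf { const struct dma_buf_ops *ops; void *priv; };

/* Hypothetical helper attaching the driver's dma_buf_ops to the device. */
static void drm_dev_set_dmabuf_ops(struct drm_device *dev,
                                   const struct dma_buf_ops *ops)
{
        dev->dmabuf_ops = ops;
}

/* The import-side ownership check then becomes a direct pointer test. */
static int dev_is_exporter(const struct drm_device *dev,
                           const struct dma_buf *buf)
{
        return buf->ops == dev->dmabuf_ops;
}
```

A driver would call the setter once at device-init time; the core helper
could then compare against dev->dmabuf_ops without knowing which ops table
the driver picked.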
On my side, I'll just drop all the drm_gem[_shmem] changes in this
series and duplicate the logic in panthor/panfrost for now, because it
seems there's no consensus on this code-sharing proposal, and I want the
cached CPU mapping stuff merged.
Just to be clear, I still think the proposed code sharing is
valuable to
- avoid simple mistakes in drivers (it's very easy to get something
wrong in the import/export sequence)
- ease propagation of fixes (all drivers using the common bits get the
fix automatically)
- ease refactoring of code (it's easier to patch one common helper than
half a dozen drivers)
Not to mention the fact that it could remove a bunch of boilerplate code in
various drivers. This being said, I'm not willing to spend time on
something that's likely to be rejected because of postures on
philosophical design decisions (which I understand, but not necessarily
agree with ;-)).
Regards,
Boris
^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 12:32 ` Boris Brezillon
@ 2025-11-27 13:05 ` Boris Brezillon
0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-27 13:05 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On Thu, 27 Nov 2025 13:32:30 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> Hi Thomas,
>
> On Thu, 27 Nov 2025 11:44:40 +0100
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
> > On Thu, 27 Nov 2025 09:58:55 +0100
> > Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >
> > > Hi
> > >
> > > Am 27.11.25 um 09:42 schrieb Thomas Zimmermann:
> > > > Hi
> > > >
> > > > Am 27.11.25 um 09:34 schrieb Thomas Zimmermann:
> > > >> Hi
> > > >>
> > > >> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> > > >>> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> > > >>> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> > > >>> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> > > >>> of helpers to work around it, let's provide a way for drivers to
> > > >>> expose their custom dma_buf_ops so the core prime helpers can rely on
> > > >>> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
> > > >>
> > > >> This can't go in as-is. I've spent an awful amount of patches on
> > > >> removing buffer callbacks from struct drm_driver. Let's please not go
> > > >> back to that.
> > > >>
> > > >>>
> > > >>> v5:
> > > >>> - New patch
> > > >>>
> > > >>> v6:
> > > >>> - Pass custom dma_buf_ops directly instead of through a getter
> > > >>>
> > > >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > >>> ---
> > > >>> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> > > >>> include/drm/drm_drv.h | 8 ++++++++
> > > >>> 2 files changed, 16 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > > >>> index 21809a82187b..86fd95f0c105 100644
> > > >>> --- a/drivers/gpu/drm/drm_prime.c
> > > >>> +++ b/drivers/gpu/drm/drm_prime.c
> > > >>> @@ -904,6 +904,12 @@ unsigned long
> > > >>> drm_prime_get_contiguous_size(struct sg_table *sgt)
> > > >>> }
> > > >>> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> > > >>> +static const struct dma_buf_ops *
> > > >>> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> > > >>> +{
> > > >>> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> > > >>> +}
> > > >>> +
> > > >>> /**
> > > >>> * drm_gem_prime_export - helper library implementation of the
> > > >>> export callback
> > > >>> * @obj: GEM object to export
> > > >>> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct
> > > >>> drm_gem_object *obj,
> > > >>> struct dma_buf_export_info exp_info = {
> > > >>> .exp_name = KBUILD_MODNAME, /* white lie for debug */
> > > >>> .owner = dev->driver->fops->owner,
> > > >>> - .ops = &drm_gem_prime_dmabuf_ops,
> > > >>> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
> > > >>
> > > >> Rather provide a new function drm_gem_prime_export_with_ops() that
> > > >> takes an additional dma_ops instance. The current
> > > >> drm_gem_prime_export() would call it with &drm_gem_prime_dmabuf_ops.
> > > >>
> > > >> If this really does not work, you could add a pointer to dma_buf_ops
> > > >> to drm_gem_object_funcs and fetch that from drm_gem_prime_export().
> > > >> We already have vm_ops there.
> > > >>
> > > >> Other drivers, such as amdgpu, would also benefit from such a change
> > > >>
> > > >>> .size = obj->size,
> > > >>> .flags = flags,
> > > >>> .priv = obj,
> > > >>> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct
> > > >>> drm_device *dev,
> > > >>> {
> > > >>> struct drm_gem_object *obj = dma_buf->priv;
> > > >>> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) &&
> > > >>> (obj->dev == dev);
> > > >>> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) &&
> > > >>> obj->dev == dev;
> > > >
> > > > On a second thought, we probably cannot be sure that dma_buf->priv
> > > > really is a GEM object until we tested the ops field. :/ IIRC that's
> > > > why the ops test goes first and the test for obj->dev goes second. So
> > > > neither solution works.
> > >
> > > I think, instead of looking at the ops field, the test could look at
> > > dma_buf->owner == dev->driver->fops->owner. This will tell if the
> > > dma_buf comes from the same driver and hence is a GEM object. In the
> > > next step, do obj->dev == dev as before. This will also allow drivers
> > > like amdgpu to use the helper for testing. See [1].
> >
> > Except this doesn't work when the driver is linked statically (not
> > enabled as a module), because THIS_MODULE is NULL in that case.
>
> A couple more alternatives, if someone is interested in pursuing that
> path:
>
> - Have a drm_device::dmabuf_ops and a drm_dev_set_dmabuf_ops() helper
> to attach the driver dma_buf_ops to the device and allow a direct:
>
> dmabuf->ops == dev->dmabuf_ops
>
> test
> - Have a dev field (struct device *) added to dma_buf, and have a
>
> dmabuf->dev == drm_dev_dma_dev(dev)
>
> test
>
> On my side, I'll just drop all the drm_gem[_shmem] changes in this
> series and duplicate the logic in panthor/panfrost for now, because it
> seems there's no consensus on this code-sharing proposal, and I want the
> cached CPU mapping stuff merged.
>
> Just to be clear, I still think the proposed code sharing is
> valuable to
>
> - avoid simple mistakes in drivers (it's very easy to get something
> wrong in the import/export sequence)
> - ease propagation of fixes (all drivers using the common bits get the
> fix automatically)
> - ease refactoring of code (it's easier to patch one common helper than
> half a dozen drivers)
>
> Not to mention the fact that it could remove a bunch of boilerplate code in
> various drivers. This being said, I'm not willing to spend time on
> something that's likely to be rejected because of postures on
> philosophical design decisions (which I understand, but not necessarily
> agree with ;-)).
Quick update based on the discussion that happened on IRC between Thomas
and me. Thomas's suggestion would be to add an optional
bool (*gem_prime_dev_is_exporter)(struct drm_device *dev,
struct dma_buf *dmabuf);
callback instead of the drm_driver::dma_buf_ops field added in this patch.
This means drivers would simply have to provide their own ops, and
implement their own drm_gem_object_funcs::export().
Christian, would you be happy with that?
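
Sketched for illustration (an editor's addition with stand-in types; only
the callback signature comes from the paragraph above, and the
device-ownership check done by the real helper is omitted for brevity):
the core would fall back to the default ops test whenever a driver does not
implement the optional callback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; only the callback shape comes from the mail. */
struct drm_device;
struct dma_buf_ops { int unused; };
struct dma_buf { const struct dma_buf_ops *ops; void *priv; };

struct drm_driver {
        /* Optional: drivers with custom dma_buf_ops answer the
         * "was this dma_buf exported by this device?" question. */
        bool (*gem_prime_dev_is_exporter)(struct drm_device *dev,
                                          struct dma_buf *dmabuf);
};

struct drm_device { const struct drm_driver *driver; };

static const struct dma_buf_ops drm_gem_prime_dmabuf_ops;

static bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
                                              struct dma_buf *dmabuf)
{
        if (dev->driver->gem_prime_dev_is_exporter)
                return dev->driver->gem_prime_dev_is_exporter(dev, dmabuf);

        return dmabuf->ops == &drm_gem_prime_dmabuf_ops;
}
```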
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 8:42 ` Thomas Zimmermann
2025-11-27 8:58 ` Thomas Zimmermann
@ 2025-11-27 10:34 ` Boris Brezillon
1 sibling, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-27 10:34 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On Thu, 27 Nov 2025 09:42:52 +0100
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> Am 27.11.25 um 09:34 schrieb Thomas Zimmermann:
> > Hi
> >
> > Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> >> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> >> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> >> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> >> of helpers to work around it, let's provide a way for drivers to
> >> expose their custom dma_buf_ops so the core prime helpers can rely on
> >> that instead of hardcoding &drm_gem_prime_dmabuf_ops.
> >
> > This can't go in as-is. I've spent an awful amount of patches on
> > removing buffer callbacks from struct drm_driver. Let's please not go
> > back to that.
> >
> >>
> >> v5:
> >> - New patch
> >>
> >> v6:
> >> - Pass custom dma_buf_ops directly instead of through a getter
> >>
> >> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> >> ---
> >> drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> >> include/drm/drm_drv.h | 8 ++++++++
> >> 2 files changed, 16 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> >> index 21809a82187b..86fd95f0c105 100644
> >> --- a/drivers/gpu/drm/drm_prime.c
> >> +++ b/drivers/gpu/drm/drm_prime.c
> >> @@ -904,6 +904,12 @@ unsigned long
> >> drm_prime_get_contiguous_size(struct sg_table *sgt)
> >> }
> >> EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> >> +static const struct dma_buf_ops *
> >> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> >> +{
> >> + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> >> +}
> >> +
> >> /**
> >> * drm_gem_prime_export - helper library implementation of the
> >> export callback
> >> * @obj: GEM object to export
> >> @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct
> >> drm_gem_object *obj,
> >> struct dma_buf_export_info exp_info = {
> >> .exp_name = KBUILD_MODNAME, /* white lie for debug */
> >> .owner = dev->driver->fops->owner,
> >> - .ops = &drm_gem_prime_dmabuf_ops,
> >> + .ops = drm_gem_prime_get_dma_buf_ops(dev),
> >
> > Rather provide a new function drm_gem_prime_export_with_ops() that
> > takes an additional dma_ops instance. The current
> > drm_gem_prime_export() would call it with &drm_gem_prime_dmabuf_ops.
> >
> > If this really does not work, you could add a pointer to dma_buf_ops
> > to drm_gem_object_funcs and fetch that from drm_gem_prime_export(). We
> > already have vm_ops there.
> >
> > Other drivers, such as amdgpu, would also benefit from such a change
> >
> >> .size = obj->size,
> >> .flags = flags,
> >> .priv = obj,
> >> @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct
> >> drm_device *dev,
> >> {
> >> struct drm_gem_object *obj = dma_buf->priv;
> >> - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) &&
> >> (obj->dev == dev);
> >> + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) &&
> >> obj->dev == dev;
>
> On a second thought, we probably cannot be sure that dma_buf->priv
> really is a GEM object until we tested the ops field. :/ IIRC that's
> why the ops test goes first and the test for obj->dev goes second. So
> neither solution works.
Hm, what do you mean by "neither solution works"? The original proposal
never dereferences obj until it's sure it's a drm_gem_object; that's
what
dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev)
is for.
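
To illustrate the ordering argument (an editor's sketch with stand-in
types, not the real kernel code): assigning the priv pointer is harmless,
and the && short-circuits, so obj->dev is only read once the ops comparison
has established that priv really holds a GEM object:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct dma_buf_ops { int unused; };
struct drm_device { int id; };
struct drm_gem_object { struct drm_device *dev; };
struct dma_buf { const struct dma_buf_ops *ops; void *priv; };

static const struct dma_buf_ops drm_gem_prime_dmabuf_ops;

/* The ops test guards the obj->dev dereference: a foreign dma_buf
 * with an unrelated (or NULL) priv is rejected without being touched. */
static int is_prime_exported(struct drm_device *dev, struct dma_buf *buf)
{
        struct drm_gem_object *obj = buf->priv;

        return buf->ops == &drm_gem_prime_dmabuf_ops && obj->dev == dev;
}
```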
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
2025-11-27 8:34 ` Thomas Zimmermann
2025-11-27 8:42 ` Thomas Zimmermann
@ 2025-11-27 10:40 ` Boris Brezillon
1 sibling, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-27 10:40 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On Thu, 27 Nov 2025 09:34:43 +0100
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> > drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> > drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> > driver implements custom dma_buf_ops. Instead of duplicating a bunch
> > of helpers to work around it, let's provide a way for drivers to
> > expose their custom dma_buf_ops so the core prime helpers can rely on
> > that instead of hardcoding &drm_gem_prime_dmabuf_ops.
>
> This can't go in as-is. I've spent an awful amount of patches on
> removing buffer callbacks from struct drm_driver.
First, it's no longer a callback, but a pointer to dma_buf_ops
(joking of course, I get what you mean :P).
> Let's please not go
> back to that.
>
> >
> > v5:
> > - New patch
> >
> > v6:
> > - Pass custom dma_buf_ops directly instead of through a getter
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > drivers/gpu/drm/drm_prime.c | 10 ++++++++--
> > include/drm/drm_drv.h | 8 ++++++++
> > 2 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index 21809a82187b..86fd95f0c105 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -904,6 +904,12 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
> > }
> > EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> >
> > +static const struct dma_buf_ops *
> > +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> > +{
> > + return dev->driver->dma_buf_ops ?: &drm_gem_prime_dmabuf_ops;
> > +}
> > +
> > /**
> > * drm_gem_prime_export - helper library implementation of the export callback
> > * @obj: GEM object to export
> > @@ -920,7 +926,7 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
> > struct dma_buf_export_info exp_info = {
> > .exp_name = KBUILD_MODNAME, /* white lie for debug */
> > .owner = dev->driver->fops->owner,
> > - .ops = &drm_gem_prime_dmabuf_ops,
> > + .ops = drm_gem_prime_get_dma_buf_ops(dev),
>
> Rather provide a new function drm_gem_prime_export_with_ops() that takes
> an additional dma_ops instance. The current drm_gem_prime_export() would
> call it with &drm_gem_prime_dmabuf_ops.
>
> If this really does not work, you could add a pointer to dma_buf_ops to
> drm_gem_object_funcs and fetch that from drm_gem_prime_export(). We
> already have vm_ops there.
It doesn't work because we need an ops to test against when a dma_buf
is imported, and that information has to be available at the
drm_{device,driver} level.
>
> Other drivers, such as amdgpu, would also benefit from such a change
>
> > .size = obj->size,
> > .flags = flags,
> > .priv = obj,
> > @@ -947,7 +953,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
> > {
> > struct drm_gem_object *obj = dma_buf->priv;
> >
> > - return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) && (obj->dev == dev);
> > + return dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev) && obj->dev == dev;
>
> This is a bit more complicated and the test has been a pain point
> before. For this case, I think we should add a GEM callback for this
>
> struct drm_gem_object_funcs {
> bool (*exported_by)(struct drm_gem_object *obj, struct drm_device *dev)
> }
>
> next to the existing export callback.
Nope, because dma_buf::ops needs to be tested against the driver
dma_buf_ops to determine if it's a drm_gem_object that's stored in
dma_buf::priv. If we add such a callback, it has to be at the
drm_driver level.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v6 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 16:23 ` [PATCH v6 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation Steven Price
2025-11-26 12:44 ` [PATCH v6 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
` (14 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
The default implementation simply takes care of invalidating/flushing
caches around CPU accesses. It takes care of both the exporter and
the importers, which forces us to overload the default
::[un]map_dma_buf() implementation provided by drm_gem.c to store the
sgt.
v5:
- New patch
v6:
- Fix doc
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 114 +++++++++++++++++++++++++
include/drm/drm_gem_shmem_helper.h | 10 +++
2 files changed, 124 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index dc94a27710e5..c91608d9d4d7 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -893,6 +893,120 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
}
EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_no_map);
+/**
+ * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
+ * @attach: attachment
+ * @sgt: SG table to unmap
+ * @dir: type of access done by this attachment
+ *
+ * Default implementation for dma_buf_ops::map_dma_buf(). This is just a wrapper
+ * around drm_gem_map_dma_buf() that lets us set the dma_buf_attachment::priv
+ * to the sgt so that drm_gem_shmem_prime_{begin,end}_cpu_access() can sync
+ * around CPU accesses.
+ */
+struct sg_table *
+drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
+ enum dma_data_direction dir)
+{
+ struct sg_table *sgt = drm_gem_map_dma_buf(attach, dir);
+
+ if (!IS_ERR(sgt))
+ attach->priv = sgt;
+
+ return sgt;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_map_dma_buf);
+
+/**
+ * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
+ * @attach: attachment
+ * @sgt: SG table to unmap
+ * @dir: type of access done by this attachment
+ *
+ * Default implementation for dma_buf_ops::unmap_dma_buf(). This is just a wrapper
+ * around drm_gem_unmap_dma_buf() that lets us reset the dma_buf_attachment::priv
+ * field so that drm_gem_shmem_prime_{begin,end}_cpu_access() don't consider it
+ * as a mapped attachment to sync against.
+ */
+void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
+ struct sg_table *sgt,
+ enum dma_data_direction dir)
+{
+ attach->priv = NULL;
+ drm_gem_unmap_dma_buf(attach, sgt, dir);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_unmap_dma_buf);
+
+/**
+ * drm_gem_shmem_prime_begin_cpu_access - Default begin_cpu_access() for exported buffers
+ * @dma_buf: The exported DMA buffer this acts on
+ * @dir: direction of the access
+ *
+ * Default implementation for dma_buf_ops::begin_cpu_access(). This only takes care of
+ * cache maintenance.
+ */
+int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction dir)
+{
+ struct drm_gem_object *obj = dma_buf->priv;
+ struct drm_device *dev = obj->dev;
+ struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+ struct dma_buf_attachment *attach;
+
+ dma_resv_lock(obj->resv, NULL);
+ if (shmem->sgt)
+ dma_sync_sgtable_for_cpu(dev->dev, shmem->sgt, dir);
+
+ if (shmem->vaddr)
+ invalidate_kernel_vmap_range(shmem->vaddr, shmem->base.size);
+
+ list_for_each_entry(attach, &dma_buf->attachments, node) {
+ struct sg_table *sgt = attach->priv;
+
+ if (sgt)
+ dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
+ }
+ dma_resv_unlock(obj->resv);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_begin_cpu_access);
+
+/**
+ * drm_gem_shmem_prime_end_cpu_access - Default end_cpu_access() for exported buffers
+ * @dma_buf: The exported DMA buffer this acts on
+ * @dir: direction of the access
+ *
+ * Default implementation for dma_buf_ops::end_cpu_access(). This only takes care of
+ * cache maintenance.
+ */
+int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction dir)
+{
+ struct drm_gem_object *obj = dma_buf->priv;
+ struct drm_device *dev = obj->dev;
+ struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+ struct dma_buf_attachment *attach;
+
+ dma_resv_lock(obj->resv, NULL);
+ list_for_each_entry(attach, &dma_buf->attachments, node) {
+ struct sg_table *sgt = attach->priv;
+
+ if (sgt)
+ dma_sync_sgtable_for_device(attach->dev, sgt, dir);
+ }
+
+ if (shmem->vaddr)
+ flush_kernel_vmap_range(shmem->vaddr, shmem->base.size);
+
+ if (shmem->sgt)
+ dma_sync_sgtable_for_device(dev->dev, shmem->sgt, dir);
+
+ dma_resv_unlock(obj->resv);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_end_cpu_access);
+
MODULE_DESCRIPTION("DRM SHMEM memory-management helpers");
MODULE_IMPORT_NS("DMA_BUF");
MODULE_LICENSE("GPL v2");
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index 589f7bfe7506..075275d6b2fd 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -291,6 +291,16 @@ int drm_gem_shmem_dumb_create(struct drm_file *file, struct drm_device *dev,
struct drm_mode_create_dumb *args);
struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
struct dma_buf *buf);
+struct sg_table *
+drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
+ enum dma_data_direction dir);
+void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
+ struct sg_table *sgt,
+ enum dma_data_direction dir);
+int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction dir);
+int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction dir);
/**
* DRM_GEM_SHMEM_DRIVER_OPS - Default shmem GEM operations
--
2.51.1
^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v6 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation
2025-11-26 12:44 ` [PATCH v6 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
@ 2025-11-26 16:23 ` Steven Price
0 siblings, 0 replies; 33+ messages in thread
From: Steven Price @ 2025-11-26 16:23 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On 26/11/2025 12:44, Boris Brezillon wrote:
> The default implementation simply takes care of invalidating/flushing
> caches around CPU accesses. It takes care of both the exporter and
> the importers, which forces us to overload the default
> ::[un]map_dma_buf() implementation provided by drm_gem.c to store the
> sgt.
>
> v5:
> - New patch
>
> v6:
> - Fix doc
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
One issue in the kerneldoc below, but with that fixed:
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/drm_gem_shmem_helper.c | 114 +++++++++++++++++++++++++
> include/drm/drm_gem_shmem_helper.h | 10 +++
> 2 files changed, 124 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index dc94a27710e5..c91608d9d4d7 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -893,6 +893,120 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
> }
> EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_no_map);
>
> +/**
> + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> + * @attach: attachment
> + * @sgt: SG table to unmap
> + * @dir: type of access done by this attachment
NIT: Above you need to do s/unmap/map/ three times ;)
Thanks,
Steve
> + *
> + * Default implementation for dma_buf_ops::map_dma_buf(). This is just a wrapper
> + * around drm_gem_map_dma_buf() that lets us set the dma_buf_attachment::priv
> + * to the sgt so that drm_gem_shmem_prime_{begin,end}_cpu_access() can sync
> + * around CPU accesses.
> + */
> +struct sg_table *
> +drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> + enum dma_data_direction dir)
> +{
> + struct sg_table *sgt = drm_gem_map_dma_buf(attach, dir);
> +
> + if (!IS_ERR(sgt))
> + attach->priv = sgt;
> +
> + return sgt;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_map_dma_buf);
> +
> +/**
> + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> + * @attach: attachment
> + * @sgt: SG table to unmap
> + * @dir: type of access done by this attachment
> + *
> + * Default implementation for dma_buf_ops::unmap_dma_buf(). This is just a wrapper
> + * around drm_gem_unmap_dma_buf() that lets us reset the dma_buf_attachment::priv
> + * field so that drm_gem_shmem_prime_{begin,end}_cpu_access() don't consider it
> + * as a mapped attachment to sync against.
> + */
> +void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> + struct sg_table *sgt,
> + enum dma_data_direction dir)
> +{
> + attach->priv = NULL;
> + drm_gem_unmap_dma_buf(attach, sgt, dir);
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_unmap_dma_buf);
> +
> +/**
> + * drm_gem_shmem_prime_begin_cpu_access - Default begin_cpu_access() for exported buffers
> + * @dma_buf: The exported DMA buffer this acts on
> + * @dir: direction of the access
> + *
> + * Default implementation for dma_buf_ops::begin_cpu_access(). This only takes care of
> + * cache maintenance.
> + */
> +int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
> + enum dma_data_direction dir)
> +{
> + struct drm_gem_object *obj = dma_buf->priv;
> + struct drm_device *dev = obj->dev;
> + struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
> + struct dma_buf_attachment *attach;
> +
> + dma_resv_lock(obj->resv, NULL);
> + if (shmem->sgt)
> + dma_sync_sgtable_for_cpu(dev->dev, shmem->sgt, dir);
> +
> + if (shmem->vaddr)
> + invalidate_kernel_vmap_range(shmem->vaddr, shmem->base.size);
> +
> + list_for_each_entry(attach, &dma_buf->attachments, node) {
> + struct sg_table *sgt = attach->priv;
> +
> + if (sgt)
> + dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
> + }
> + dma_resv_unlock(obj->resv);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_begin_cpu_access);
> +
> +/**
> + * drm_gem_shmem_prime_end_cpu_access - Default end_cpu_access() for exported buffers
> + * @dma_buf: The exported DMA buffer this acts on
> + * @dir: direction of the access
> + *
> + * Default implementation for dma_buf_ops::end_cpu_access(). This only takes care of
> + * cache maintenance.
> + */
> +int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
> + enum dma_data_direction dir)
> +{
> + struct drm_gem_object *obj = dma_buf->priv;
> + struct drm_device *dev = obj->dev;
> + struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
> + struct dma_buf_attachment *attach;
> +
> + dma_resv_lock(obj->resv, NULL);
> + list_for_each_entry(attach, &dma_buf->attachments, node) {
> + struct sg_table *sgt = attach->priv;
> +
> + if (sgt)
> + dma_sync_sgtable_for_device(attach->dev, sgt, dir);
> + }
> +
> + if (shmem->vaddr)
> + flush_kernel_vmap_range(shmem->vaddr, shmem->base.size);
> +
> + if (shmem->sgt)
> + dma_sync_sgtable_for_device(dev->dev, shmem->sgt, dir);
> +
> + dma_resv_unlock(obj->resv);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_end_cpu_access);
> +
> MODULE_DESCRIPTION("DRM SHMEM memory-management helpers");
> MODULE_IMPORT_NS("DMA_BUF");
> MODULE_LICENSE("GPL v2");
> diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
> index 589f7bfe7506..075275d6b2fd 100644
> --- a/include/drm/drm_gem_shmem_helper.h
> +++ b/include/drm/drm_gem_shmem_helper.h
> @@ -291,6 +291,16 @@ int drm_gem_shmem_dumb_create(struct drm_file *file, struct drm_device *dev,
> struct drm_mode_create_dumb *args);
> struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
> struct dma_buf *buf);
> +struct sg_table *
> +drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> + enum dma_data_direction dir);
> +void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> + struct sg_table *sgt,
> + enum dma_data_direction dir);
> +int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
> + enum dma_data_direction dir);
> +int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
> + enum dma_data_direction dir);
>
> /**
> * DRM_GEM_SHMEM_DRIVER_OPS - Default shmem GEM operations
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v6 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 04/16] drm/panthor: Provide a custom dma_buf implementation Boris Brezillon
` (13 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel, Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
This enables syncing mapped GEM objects between the CPU and GPU via calls
to dma_sync_*(). The operation is a bit annoying because it requires
walking the sg_table, so it's best not to have every driver hand-roll it.
v2:
- s/drm_gem_shmem_sync_mmap/drm_gem_shmem_sync/
- Change the prototype to match drm_gem_object_funcs::sync()
- Add a wrapper for drm_gem_object_funcs::sync()
v3:
- No changes
v4:
- Add Steve's R-b
v5:
- Change the semantics of the drm_gem_shmem_sync() helper to better
reflect the UMD cache flush/flush+invalidate semantics (discussed
with Faith)
- Drop R-bs
v6:
- Collect R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 93 ++++++++++++++++++++++++++
include/drm/drm_gem_shmem_helper.h | 14 ++++
2 files changed, 107 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index c91608d9d4d7..1f3ce3f70c08 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -690,6 +690,99 @@ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct
}
EXPORT_SYMBOL_GPL(drm_gem_shmem_mmap);
+/**
+ * drm_gem_shmem_sync - Sync CPU-mapped data
+ * @shmem: shmem GEM object
+ * @offset: Offset into the GEM object
+ * @size: Size of the area to sync
+ * @type: Type of synchronization
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_shmem_sync(struct drm_gem_shmem_object *shmem, size_t offset,
+ size_t size, enum drm_gem_shmem_sync_type type)
+{
+ const struct drm_device *dev = shmem->base.dev;
+ struct sg_table *sgt;
+ struct scatterlist *sgl;
+ unsigned int count;
+
+ /* Make sure the range is in bounds. */
+ if (offset + size < offset || offset + size > shmem->base.size)
+ return -EINVAL;
+
+ /* Disallow CPU-cache maintenance on imported buffers. */
+ if (drm_gem_is_imported(&shmem->base))
+ return -EINVAL;
+
+ switch (type) {
+ case DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH:
+ case DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ /* Don't bother if it's WC-mapped */
+ if (shmem->map_wc)
+ return 0;
+
+ /* Nothing to do if the size is zero. */
+ if (size == 0)
+ return 0;
+
+ sgt = drm_gem_shmem_get_pages_sgt(shmem);
+ if (IS_ERR(sgt))
+ return PTR_ERR(sgt);
+
+ for_each_sgtable_dma_sg(sgt, sgl, count) {
+ if (size == 0)
+ break;
+
+ dma_addr_t paddr = sg_dma_address(sgl);
+ size_t len = sg_dma_len(sgl);
+
+ if (len <= offset) {
+ offset -= len;
+ continue;
+ }
+
+ paddr += offset;
+ len -= offset;
+ len = min_t(size_t, len, size);
+ size -= len;
+ offset = 0;
+
+ /* It's unclear whether dma_sync_xxx() is the right API to do CPU
> + * cache maintenance given IOMMUs can register their own
+ * implementation doing more than just CPU cache flushes/invalidation,
+ * and what we really care about here is CPU caches only, but that's
+ * the best we have that is both arch-agnostic and does at least the
+ * CPU cache maintenance on a <page,offset,size> tuple.
+ *
+ * Also, I wish we could do a single
+ *
+ * dma_sync_single_for_device(BIDIR)
+ *
+ * and get a flush+invalidate, but that's not how it's implemented
+ * in practice (at least on arm64), so we have to make it
+ *
+ * dma_sync_single_for_device(TO_DEVICE)
+ * dma_sync_single_for_cpu(FROM_DEVICE)
+ *
+ * for the flush+invalidate case.
+ */
+ dma_sync_single_for_device(dev->dev, paddr, len, DMA_TO_DEVICE);
+ if (type == DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE)
+ dma_sync_single_for_cpu(dev->dev, paddr, len, DMA_FROM_DEVICE);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_sync);
+
/**
* drm_gem_shmem_print_info() - Print &drm_gem_shmem_object info for debugfs
* @shmem: shmem GEM object
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index 075275d6b2fd..b0b6d0104a9a 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -124,6 +124,20 @@ void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
struct iosys_map *map);
int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma);
+/**
+ * enum drm_gem_shmem_sync_type - Type of synchronization
+ */
+enum drm_gem_shmem_sync_type {
+ /** @DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH: Flush CPU caches */
+ DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH = 0,
+
+ /** @DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches */
+ DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE,
+};
+
+int drm_gem_shmem_sync(struct drm_gem_shmem_object *shmem, size_t offset,
+ size_t size, enum drm_gem_shmem_sync_type type);
+
int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem);
void drm_gem_shmem_unpin_locked(struct drm_gem_shmem_object *shmem);
--
2.51.1
* [PATCH v6 04/16] drm/panthor: Provide a custom dma_buf implementation
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (2 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 05/16] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
` (12 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
Before we introduce cached CPU mappings, we want a dma_buf
implementation satisfying synchronization requests around CPU
accesses coming from a dma_buf exported by our driver. Let's
provide our own implementation relying on the default
gem_shmem_prime helpers designed for that purpose.
v5:
- New patch
v6:
- Collect R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 1 +
drivers/gpu/drm/panthor/panthor_gem.c | 13 +++++++++++++
drivers/gpu/drm/panthor/panthor_gem.h | 2 ++
3 files changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index d1d4c50da5bf..cf2cf09335a4 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1621,6 +1621,7 @@ static const struct drm_driver panthor_drm_driver = {
.gem_create_object = panthor_gem_create_object,
.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
+ .dma_buf_ops = &panthor_dma_buf_ops,
#ifdef CONFIG_DEBUG_FS
.debugfs_init = panthor_debugfs_init,
#endif
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 10d255cccc09..3100a895513e 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -184,6 +184,19 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, struct panthor_vm *vm,
return ERR_PTR(ret);
}
+const struct dma_buf_ops panthor_dma_buf_ops = {
+ .attach = drm_gem_map_attach,
+ .detach = drm_gem_map_detach,
+ .map_dma_buf = drm_gem_shmem_prime_map_dma_buf,
+ .unmap_dma_buf = drm_gem_shmem_prime_unmap_dma_buf,
+ .release = drm_gem_dmabuf_release,
+ .mmap = drm_gem_dmabuf_mmap,
+ .vmap = drm_gem_dmabuf_vmap,
+ .vunmap = drm_gem_dmabuf_vunmap,
+ .begin_cpu_access = drm_gem_shmem_prime_begin_cpu_access,
+ .end_cpu_access = drm_gem_shmem_prime_end_cpu_access,
+};
+
static struct dma_buf *
panthor_gem_prime_export(struct drm_gem_object *obj, int flags)
{
diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
index 80c6e24112d0..27e565650374 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.h
+++ b/drivers/gpu/drm/panthor/panthor_gem.h
@@ -147,6 +147,8 @@ panthor_gem_create_with_handle(struct drm_file *file,
void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
+extern const struct dma_buf_ops panthor_dma_buf_ops;
+
static inline u64
panthor_kernel_bo_gpuva(struct panthor_kernel_bo *bo)
{
--
2.51.1
* [PATCH v6 05/16] drm/panthor: Fix panthor_gpu_coherency_set()
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (3 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 04/16] drm/panthor: Provide a custom dma_buf implementation Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
` (11 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel, Akash Goel
GPU_COHERENCY_PROTOCOL takes one of the GPU_COHERENCY_xx values,
not BIT(GPU_COHERENCY_xx).
v3:
- New commit
v4:
- Add Steve's R-b
v5:
- No changes
v6:
- No changes
Cc: Akash Goel <akash.goel@arm.com>
Fixes: dd7db8d911a1 ("drm/panthor: Explicitly set the coherency mode")
Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 06b231b2460a..cd38da5ad26c 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -51,7 +51,7 @@ struct panthor_gpu {
static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
{
gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
- ptdev->coherent ? GPU_COHERENCY_PROT_BIT(ACE_LITE) : GPU_COHERENCY_NONE);
+ ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
}
static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
--
2.51.1
* [PATCH v6 06/16] drm/panthor: Expose the selected coherency protocol to the UMD
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (4 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 05/16] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 13:25 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
` (10 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
If we want to be able to skip CPU cache maintenance operations on
CPU-cached mappings, the UMD needs to know the kind of coherency
in place. Add a field to drm_panthor_gpu_info to do that. We can re-use
a padding field for that since this object is write-only from the
KMD perspective, and UMDs unaware of the new field will just ignore it.
v2:
- New commit
v3:
- Make coherency protocol a real enum, not a bitmask
- Add BUILD_BUG_ON()s to make sure the values in panthor_regs.h and
those exposed through the uAPI match
v4:
- Add Steve's R-b
v5:
- No changes
v6:
- No changes
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_device.c | 10 +++++-
drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
include/uapi/drm/panthor_drm.h | 39 ++++++++++++++++++++++--
3 files changed, 46 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index e133b1e0ad6d..a66fc66999c2 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -27,6 +27,12 @@
static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
{
+ BUILD_BUG_ON(GPU_COHERENCY_NONE != DRM_PANTHOR_GPU_COHERENCY_NONE);
+ BUILD_BUG_ON(GPU_COHERENCY_ACE_LITE != DRM_PANTHOR_GPU_COHERENCY_ACE_LITE);
+ BUILD_BUG_ON(GPU_COHERENCY_ACE != DRM_PANTHOR_GPU_COHERENCY_ACE);
+
+ /* Start with no coherency, and update it if the device is flagged coherent. */
+ ptdev->gpu_info.selected_coherency = GPU_COHERENCY_NONE;
ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT;
if (!ptdev->coherent)
@@ -36,8 +42,10 @@ static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
* ACE protocol has never been supported for command stream frontend GPUs.
*/
if ((gpu_read(ptdev, GPU_COHERENCY_FEATURES) &
- GPU_COHERENCY_PROT_BIT(ACE_LITE)))
+ GPU_COHERENCY_PROT_BIT(ACE_LITE))) {
+ ptdev->gpu_info.selected_coherency = GPU_COHERENCY_ACE_LITE;
return 0;
+ }
drm_err(&ptdev->base, "Coherency not supported by the device");
return -ENOTSUPP;
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index cd38da5ad26c..b7c64be0b6e2 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -51,7 +51,7 @@ struct panthor_gpu {
static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
{
gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
- ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
+ ptdev->gpu_info.selected_coherency);
}
static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 467d365ed7ba..f0f637e0631d 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -245,6 +245,26 @@ enum drm_panthor_dev_query_type {
DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
};
+/**
+ * enum drm_panthor_gpu_coherency - Type of GPU coherency
+ */
+enum drm_panthor_gpu_coherency {
+ /**
+ * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE Lite coherency.
+ */
+ DRM_PANTHOR_GPU_COHERENCY_ACE_LITE = 0,
+
+ /**
+ * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE coherency.
+ */
+ DRM_PANTHOR_GPU_COHERENCY_ACE = 1,
+
+ /**
+ * @DRM_PANTHOR_GPU_COHERENCY_NONE: No coherency.
+ */
+ DRM_PANTHOR_GPU_COHERENCY_NONE = 31,
+};
+
/**
* struct drm_panthor_gpu_info - GPU information
*
@@ -301,7 +321,16 @@ struct drm_panthor_gpu_info {
*/
__u32 thread_max_barrier_size;
- /** @coherency_features: Coherency features. */
+ /**
+ * @coherency_features: Coherency features.
+ *
+ * Combination of drm_panthor_gpu_coherency flags.
+ *
+ * Note that these are just the coherency protocols supported by the
+ * GPU; the actual coherency in place depends on the SoC
+ * integration and is reflected by
+ * drm_panthor_gpu_info::selected_coherency.
+ */
__u32 coherency_features;
/** @texture_features: Texture features. */
@@ -310,8 +339,12 @@ struct drm_panthor_gpu_info {
/** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */
__u32 as_present;
- /** @pad0: MBZ. */
- __u32 pad0;
+ /**
+ * @selected_coherency: Coherency selected for this device.
+ *
+ * One of drm_panthor_gpu_coherency.
+ */
+ __u32 selected_coherency;
/** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */
__u64 shader_present;
--
2.51.1
^ permalink raw reply related [flat|nested] 33+ messages in thread* Re: [PATCH v6 06/16] drm/panthor: Expose the selected coherency protocol to the UMD
2025-11-26 12:44 ` [PATCH v6 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-11-26 13:25 ` Boris Brezillon
0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 13:25 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On Wed, 26 Nov 2025 13:44:45 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> If we want to be able to skip CPU cache maintenance operations on
> CPU-cached mappings, the UMD needs to know the kind of coherency
> in place. Add a field to drm_panthor_gpu_info to do that. We can re-use
> a padding field for that since this object is write-only from the
> KMD perspective, and the UMD should just ignore it.
>
> v2:
> - New commit
>
> v3:
> - Make coherency protocol a real enum, not a bitmask
> - Add BUILD_BUG_ON()s to make sure the values in panthor_regs.h and
> those exposed through the uAPI match
>
> v4:
> - Add Steve's R-b
>
> v5:
> - No changes
>
> v6:
> - No changes
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.c | 10 +++++-
> drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
> include/uapi/drm/panthor_drm.h | 39 ++++++++++++++++++++++--
> 3 files changed, 46 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index e133b1e0ad6d..a66fc66999c2 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -27,6 +27,12 @@
>
> static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
> {
> + BUILD_BUG_ON(GPU_COHERENCY_NONE != DRM_PANTHOR_GPU_COHERENCY_NONE);
> + BUILD_BUG_ON(GPU_COHERENCY_ACE_LITE != DRM_PANTHOR_GPU_COHERENCY_ACE_LITE);
> + BUILD_BUG_ON(GPU_COHERENCY_ACE != DRM_PANTHOR_GPU_COHERENCY_ACE);
> +
> + /* Start with no coherency, and update it if the device is flagged coherent. */
> + ptdev->gpu_info.selected_coherency = GPU_COHERENCY_NONE;
> ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT;
>
> if (!ptdev->coherent)
> @@ -36,8 +42,10 @@ static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
> * ACE protocol has never been supported for command stream frontend GPUs.
> */
> if ((gpu_read(ptdev, GPU_COHERENCY_FEATURES) &
> - GPU_COHERENCY_PROT_BIT(ACE_LITE)))
> + GPU_COHERENCY_PROT_BIT(ACE_LITE))) {
> + ptdev->gpu_info.selected_coherency = GPU_COHERENCY_ACE_LITE;
> return 0;
> + }
>
> drm_err(&ptdev->base, "Coherency not supported by the device");
> return -ENOTSUPP;
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index cd38da5ad26c..b7c64be0b6e2 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -51,7 +51,7 @@ struct panthor_gpu {
> static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
> {
> gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
> - ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
> + ptdev->gpu_info.selected_coherency);
> }
>
> static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index 467d365ed7ba..f0f637e0631d 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -245,6 +245,26 @@ enum drm_panthor_dev_query_type {
> DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
> };
>
> +/**
> + * enum drm_panthor_gpu_coherency - Type of GPU coherency
> + */
> +enum drm_panthor_gpu_coherency {
> + /**
> + * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE Lite coherency.
> + */
> + DRM_PANTHOR_GPU_COHERENCY_ACE_LITE = 0,
> +
> + /**
> + * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE coherency.
Forgot to fix:
* @DRM_PANTHOR_GPU_COHERENCY_ACE: ACE coherency.
> + */
> + DRM_PANTHOR_GPU_COHERENCY_ACE = 1,
> +
> + /**
> + * @DRM_PANTHOR_GPU_COHERENCY_NONE: No coherency.
> + */
> + DRM_PANTHOR_GPU_COHERENCY_NONE = 31,
> +};
> +
> /**
> * struct drm_panthor_gpu_info - GPU information
> *
> @@ -301,7 +321,16 @@ struct drm_panthor_gpu_info {
> */
> __u32 thread_max_barrier_size;
>
> - /** @coherency_features: Coherency features. */
> + /**
> + * @coherency_features: Coherency features.
> + *
> + * Combination of drm_panthor_gpu_coherency flags.
> + *
> + * Note that these are just the coherency protocols supported by the
> + * GPU; the actual coherency in place depends on the SoC
> + * integration and is reflected by
> + * drm_panthor_gpu_info::selected_coherency.
> + */
> __u32 coherency_features;
>
> /** @texture_features: Texture features. */
> @@ -310,8 +339,12 @@ struct drm_panthor_gpu_info {
> /** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */
> __u32 as_present;
>
> - /** @pad0: MBZ. */
> - __u32 pad0;
> + /**
> + * @selected_coherency: Coherency selected for this device.
> + *
> + * One of drm_panthor_gpu_coherency.
> + */
> + __u32 selected_coherency;
>
> /** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */
> __u64 shader_present;
* [PATCH v6 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (5 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 16:25 ` Steven Price
2025-11-26 12:44 ` [PATCH v6 08/16] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
` (9 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
This will be used by the UMD to synchronize CPU-cached mappings when
the UMD can't do it directly (no usermode cache maintenance instruction
on Arm32).
v2:
- Change the flags so they better match the drm_gem_shmem_sync()
semantics
v3:
- Add Steve's R-b
v4:
- No changes
v5:
- Drop Steve's R-b (the semantics changes call for a new review)
v6:
- Drop ret initialization in panthor_ioctl_bo_sync()
- Bail out early in panthor_ioctl_bo_sync() if ops.count is zero
- Drop unused PANTHOR_BO_SYNC_OP_FLAGS definition
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 41 ++++++++++++++++++++-
drivers/gpu/drm/panthor/panthor_gem.c | 21 +++++++++++
drivers/gpu/drm/panthor/panthor_gem.h | 2 ++
include/uapi/drm/panthor_drm.h | 52 +++++++++++++++++++++++++++
4 files changed, 115 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index cf2cf09335a4..7aba9e5d2579 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -177,7 +177,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
- PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
+ PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_bo_sync_op, size))
/**
* PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
@@ -1396,6 +1397,43 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
return 0;
}
+static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct drm_panthor_bo_sync *args = data;
+ struct drm_panthor_bo_sync_op *ops;
+ struct drm_gem_object *obj;
+ int ret;
+
+ if (!args->ops.count)
+ return 0;
+
+ ret = PANTHOR_UOBJ_GET_ARRAY(ops, &args->ops);
+ if (ret)
+ return ret;
+
+ for (u32 i = 0; i < args->ops.count; i++) {
+ obj = drm_gem_object_lookup(file, ops[i].handle);
+ if (!obj) {
+ ret = -ENOENT;
+ goto err_ops;
+ }
+
+ ret = panthor_gem_sync(obj, ops[i].type, ops[i].offset,
+ ops[i].size);
+
+ drm_gem_object_put(obj);
+
+ if (ret)
+ goto err_ops;
+ }
+
+err_ops:
+ kvfree(ops);
+
+ return ret;
+}
+
static int
panthor_open(struct drm_device *ddev, struct drm_file *file)
{
@@ -1470,6 +1508,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
+ PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
};
static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 3100a895513e..1a3c1afaf88d 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -344,6 +344,27 @@ panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label)
panthor_gem_bo_set_label(bo->obj, str);
}
+int
+panthor_gem_sync(struct drm_gem_object *obj, u32 type,
+ u64 offset, u64 size)
+{
+ enum drm_gem_shmem_sync_type shmem_sync_type;
+ struct panthor_gem_object *bo = to_panthor_bo(obj);
+
+ switch (type) {
+ case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH:
+ shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
+ break;
+ case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
+ shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
+}
+
#ifdef CONFIG_DEBUG_FS
struct gem_size_totals {
size_t size;
diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
index 27e565650374..1d9733373d74 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.h
+++ b/drivers/gpu/drm/panthor/panthor_gem.h
@@ -146,6 +146,8 @@ panthor_gem_create_with_handle(struct drm_file *file,
void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
+int panthor_gem_sync(struct drm_gem_object *obj,
+ u32 type, u64 offset, u64 size);
extern const struct dma_buf_ops panthor_dma_buf_ops;
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index f0f637e0631d..bb12760abe99 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -144,6 +144,9 @@ enum drm_panthor_ioctl_id {
* pgoff_t size.
*/
DRM_PANTHOR_SET_USER_MMIO_OFFSET,
+
+ /** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
+ DRM_PANTHOR_BO_SYNC,
};
/**
@@ -1073,6 +1076,53 @@ struct drm_panthor_set_user_mmio_offset {
__u64 offset;
};
+/**
+ * enum drm_panthor_bo_sync_op_type - BO sync type
+ */
+enum drm_panthor_bo_sync_op_type {
+ /** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH: Flush CPU caches. */
+ DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH = 0,
+
+ /** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches. */
+ DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE = 1,
+};
+
+/**
+ * struct drm_panthor_bo_sync_op - BO map sync op
+ */
+struct drm_panthor_bo_sync_op {
+ /** @handle: Handle of the buffer object to sync. */
+ __u32 handle;
+
+ /** @type: Type of operation. */
+ __u32 type;
+
+ /**
+ * @offset: Offset into the BO at which the sync range starts.
+ *
+ * This will be rounded down to the nearest cache line as needed.
+ */
+ __u64 offset;
+
+ /**
+ * @size: Size of the range to sync
+ *
+ * @size + @offset will be rounded up to the nearest cache line as
+ * needed.
+ */
+ __u64 size;
+};
+
+/**
+ * struct drm_panthor_bo_sync - BO map sync request
+ */
+struct drm_panthor_bo_sync {
+ /**
+ * @ops: Array of struct drm_panthor_bo_sync_op sync operations.
+ */
+ struct drm_panthor_obj_array ops;
+};
+
/**
* DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
* @__access: Access type. Must be R, W or RW.
@@ -1119,6 +1169,8 @@ enum {
DRM_IOCTL_PANTHOR(WR, BO_SET_LABEL, bo_set_label),
DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET =
DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
+ DRM_IOCTL_PANTHOR_BO_SYNC =
+ DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
};
#if defined(__cplusplus)
--
2.51.1
^ permalink raw reply related [flat|nested] 33+ messages in thread* Re: [PATCH v6 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
2025-11-26 12:44 ` [PATCH v6 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
@ 2025-11-26 16:25 ` Steven Price
0 siblings, 0 replies; 33+ messages in thread
From: Steven Price @ 2025-11-26 16:25 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On 26/11/2025 12:44, Boris Brezillon wrote:
> This will be used by the UMD to synchronize CPU-cached mappings when
> the UMD can't do it directly (no usermode cache maintenance instruction
> on Arm32).
>
> v2:
> - Change the flags so they better match the drm_gem_shmem_sync()
> semantics
>
> v3:
> - Add Steve's R-b
>
> v4:
> - No changes
>
> v5:
> - Drop Steve's R-b (the semantics changes call for a new review)
>
> v6:
> - Drop ret initialization in panthor_ioctl_bo_sync()
> - Bail out early in panthor_ioctl_bo_sync() if ops.count is zero
> - Drop unused PANTHOR_BO_SYNC_OP_FLAGS definition
>
> Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 41 ++++++++++++++++++++-
> drivers/gpu/drm/panthor/panthor_gem.c | 21 +++++++++++
> drivers/gpu/drm/panthor/panthor_gem.h | 2 ++
> include/uapi/drm/panthor_drm.h | 52 +++++++++++++++++++++++++++
> 4 files changed, 115 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index cf2cf09335a4..7aba9e5d2579 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -177,7 +177,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
> - PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
> + PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_bo_sync_op, size))
>
> /**
> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
> @@ -1396,6 +1397,43 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
> return 0;
> }
>
> +static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
> + struct drm_file *file)
> +{
> + struct drm_panthor_bo_sync *args = data;
> + struct drm_panthor_bo_sync_op *ops;
> + struct drm_gem_object *obj;
> + int ret;
> +
> + if (!args->ops.count)
> + return 0;
> +
> + ret = PANTHOR_UOBJ_GET_ARRAY(ops, &args->ops);
> + if (ret)
> + return ret;
> +
> + for (u32 i = 0; i < args->ops.count; i++) {
> + obj = drm_gem_object_lookup(file, ops[i].handle);
> + if (!obj) {
> + ret = -ENOENT;
> + goto err_ops;
> + }
> +
> + ret = panthor_gem_sync(obj, ops[i].type, ops[i].offset,
> + ops[i].size);
> +
> + drm_gem_object_put(obj);
> +
> + if (ret)
> + goto err_ops;
> + }
> +
> +err_ops:
> + kvfree(ops);
> +
> + return ret;
> +}
> +
> static int
> panthor_open(struct drm_device *ddev, struct drm_file *file)
> {
> @@ -1470,6 +1508,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
> PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
> PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
> + PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
> };
>
> static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
> diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
> index 3100a895513e..1a3c1afaf88d 100644
> --- a/drivers/gpu/drm/panthor/panthor_gem.c
> +++ b/drivers/gpu/drm/panthor/panthor_gem.c
> @@ -344,6 +344,27 @@ panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label)
> panthor_gem_bo_set_label(bo->obj, str);
> }
>
> +int
> +panthor_gem_sync(struct drm_gem_object *obj, u32 type,
> + u64 offset, u64 size)
> +{
> + enum drm_gem_shmem_sync_type shmem_sync_type;
> + struct panthor_gem_object *bo = to_panthor_bo(obj);
> +
> + switch (type) {
> + case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH:
> + shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
> + break;
> + case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
> + shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
> +}
> +
> #ifdef CONFIG_DEBUG_FS
> struct gem_size_totals {
> size_t size;
> diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
> index 27e565650374..1d9733373d74 100644
> --- a/drivers/gpu/drm/panthor/panthor_gem.h
> +++ b/drivers/gpu/drm/panthor/panthor_gem.h
> @@ -146,6 +146,8 @@ panthor_gem_create_with_handle(struct drm_file *file,
>
> void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
> void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
> +int panthor_gem_sync(struct drm_gem_object *obj,
> + u32 type, u64 offset, u64 size);
>
> extern const struct dma_buf_ops panthor_dma_buf_ops;
>
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index f0f637e0631d..bb12760abe99 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -144,6 +144,9 @@ enum drm_panthor_ioctl_id {
> * pgoff_t size.
> */
> DRM_PANTHOR_SET_USER_MMIO_OFFSET,
> +
> + /** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
> + DRM_PANTHOR_BO_SYNC,
> };
>
> /**
> @@ -1073,6 +1076,53 @@ struct drm_panthor_set_user_mmio_offset {
> __u64 offset;
> };
>
> +/**
> + * enum drm_panthor_bo_sync_op_type - BO sync type
> + */
> +enum drm_panthor_bo_sync_op_type {
> + /** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH: Flush CPU caches. */
> + DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH = 0,
> +
> + /** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches. */
> + DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE = 1,
> +};
> +
> +/**
> + * struct drm_panthor_bo_sync_op - BO map sync op
> + */
> +struct drm_panthor_bo_sync_op {
> + /** @handle: Handle of the buffer object to sync. */
> + __u32 handle;
> +
> + /** @type: Type of operation. */
> + __u32 type;
> +
> + /**
> + * @offset: Offset into the BO at which the sync range starts.
> + *
> + * This will be rounded down to the nearest cache line as needed.
> + */
> + __u64 offset;
> +
> + /**
> + * @size: Size of the range to sync
> + *
> + * @size + @offset will be rounded up to the nearest cache line as
> + * needed.
> + */
> + __u64 size;
> +};
> +
> +/**
> + * struct drm_panthor_bo_sync - BO map sync request
> + */
> +struct drm_panthor_bo_sync {
> + /**
> + * @ops: Array of struct drm_panthor_bo_sync_op sync operations.
> + */
> + struct drm_panthor_obj_array ops;
> +};
> +
> /**
> * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
> * @__access: Access type. Must be R, W or RW.
> @@ -1119,6 +1169,8 @@ enum {
> DRM_IOCTL_PANTHOR(WR, BO_SET_LABEL, bo_set_label),
> DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET =
> DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
> + DRM_IOCTL_PANTHOR_BO_SYNC =
> + DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
> };
>
> #if defined(__cplusplus)
^ permalink raw reply [flat|nested] 33+ messages in thread
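For readers wiring this up from userspace: a UMD fills an array of struct drm_panthor_bo_sync_op and passes it through DRM_IOCTL_PANTHOR_BO_SYNC. A minimal sketch follows; the struct and enum are mirrored from the uAPI above so the snippet is self-contained, but real code should include <drm/panthor_drm.h> and submit the array with drmIoctl() on the render node (the `bo_sync_op_init()` helper is illustrative, not part of the uAPI):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the uAPI added in this patch (sketch only; real code
 * includes <drm/panthor_drm.h>). */
enum drm_panthor_bo_sync_op_type {
	DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH = 0,
	DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE = 1,
};

struct drm_panthor_bo_sync_op {
	uint32_t handle;
	uint32_t type;
	uint64_t offset;	/* rounded down to a cache line by the kernel */
	uint64_t size;		/* offset + size rounded up to a cache line */
};

/* Fill one sync op covering [offset, offset + size) of the BO identified
 * by handle. */
static inline void bo_sync_op_init(struct drm_panthor_bo_sync_op *op,
				   uint32_t handle, uint32_t type,
				   uint64_t offset, uint64_t size)
{
	op->handle = handle;
	op->type = type;
	op->offset = offset;
	op->size = size;
}
```

A CPU write followed by a GPU read would use CPU_CACHE_FLUSH; a GPU write followed by a CPU read needs CPU_CACHE_FLUSH_AND_INVALIDATE so stale cachelines are discarded before reading.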
* [PATCH v6 08/16] drm/panthor: Add an ioctl to query BO flags
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (6 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
` (8 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
This is useful when importing BOs, so we can know about cacheability
and flush the caches when needed.
We can also know when the buffer comes from a different subsystem and
take proper actions (avoid CPU mappings, or do kernel-based syncs
instead of userland cache flushes).
v2:
- New commit
v3:
- Add Steve's R-b
v4:
- No changes
v5:
- No changes
v6:
- No changes
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 24 +++++++++++
include/uapi/drm/panthor_drm.h | 57 +++++++++++++++++++++++++++
2 files changed, 81 insertions(+)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 7aba9e5d2579..5df874538e88 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1434,6 +1434,29 @@ static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
return ret;
}
+static int panthor_ioctl_bo_query_info(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct drm_panthor_bo_query_info *args = data;
+ struct panthor_gem_object *bo;
+ struct drm_gem_object *obj;
+
+ obj = drm_gem_object_lookup(file, args->handle);
+ if (!obj)
+ return -ENOENT;
+
+ bo = to_panthor_bo(obj);
+ args->pad = 0;
+ args->create_flags = bo->flags;
+
+ args->extra_flags = 0;
+ if (drm_gem_is_imported(&bo->base.base))
+ args->extra_flags |= DRM_PANTHOR_BO_IS_IMPORTED;
+
+ drm_gem_object_put(obj);
+ return 0;
+}
+
static int
panthor_open(struct drm_device *ddev, struct drm_file *file)
{
@@ -1509,6 +1532,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
+ PANTHOR_IOCTL(BO_QUERY_INFO, bo_query_info, DRM_RENDER_ALLOW),
};
static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index bb12760abe99..7eec9f922183 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -147,6 +147,13 @@ enum drm_panthor_ioctl_id {
/** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
DRM_PANTHOR_BO_SYNC,
+
+ /**
+ * @DRM_PANTHOR_BO_QUERY_INFO: Query information about a BO.
+ *
+ * This is useful for imported BOs.
+ */
+ DRM_PANTHOR_BO_QUERY_INFO,
};
/**
@@ -1123,6 +1130,54 @@ struct drm_panthor_bo_sync {
struct drm_panthor_obj_array ops;
};
+/**
+ * enum drm_panthor_bo_extra_flags - Set of flags returned on a BO_QUERY_INFO request
+ *
> + * These are flags reflecting BO properties that do not directly come from the flags
> + * passed at creation time, or information on BOs that were imported from other drivers.
+ */
+enum drm_panthor_bo_extra_flags {
+ /**
+ * @DRM_PANTHOR_BO_IS_IMPORTED: BO has been imported from an external driver.
+ *
> + * Note that imported dma-buf handles are not flagged as imported if they
> + * were exported by panthor. Only buffers coming from other drivers
> + * (dma heaps, other GPUs, display controllers, V4L, ...) are.
+ *
+ * It's also important to note that all imported BOs are mapped cached and can't
+ * be considered IO-coherent even if the GPU is. This means they require explicit
+ * syncs that must go through the DRM_PANTHOR_BO_SYNC ioctl (userland cache
+ * maintenance is not allowed in that case, because extra operations might be
+ * needed to make changes visible to the CPU/device, like buffer migration when the
+ * exporter is a GPU with its own VRAM).
+ */
+ DRM_PANTHOR_BO_IS_IMPORTED = (1 << 0),
+};
+
+/**
+ * struct drm_panthor_bo_query_info - Query BO info
+ */
+struct drm_panthor_bo_query_info {
+ /** @handle: Handle of the buffer object to query flags on. */
+ __u32 handle;
+
+ /**
+ * @extra_flags: Combination of enum drm_panthor_bo_extra_flags flags.
+ */
+ __u32 extra_flags;
+
+ /**
+ * @create_flags: Flags passed at creation time.
+ *
+ * Combination of enum drm_panthor_bo_flags flags.
+ * Will be zero if the buffer comes from a different driver.
+ */
+ __u32 create_flags;
+
+ /** @pad: Will be zero on return. */
+ __u32 pad;
+};
+
/**
* DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
* @__access: Access type. Must be R, W or RW.
@@ -1171,6 +1226,8 @@ enum {
DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
DRM_IOCTL_PANTHOR_BO_SYNC =
DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
+ DRM_IOCTL_PANTHOR_BO_QUERY_INFO =
+ DRM_IOCTL_PANTHOR(WR, BO_QUERY_INFO, bo_query_info),
};
#if defined(__cplusplus)
--
2.51.1
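A userspace consumer of the new ioctl would query the flags once at import time and decide how cache maintenance has to be done. A hedged sketch (struct layout and flag value mirrored from the uAPI in this patch; the `bo_needs_kernel_sync()` helper name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the BO_QUERY_INFO uAPI from this patch (sketch only; real
 * code includes <drm/panthor_drm.h>). */
#define DRM_PANTHOR_BO_IS_IMPORTED (1u << 0)

struct drm_panthor_bo_query_info {
	uint32_t handle;
	uint32_t extra_flags;	/* enum drm_panthor_bo_extra_flags */
	uint32_t create_flags;	/* creation flags, zero for imported BOs */
	uint32_t pad;
};

/* Imported BOs are always mapped cached and may need exporter-side work
 * (e.g. migration out of VRAM), so their CPU cache maintenance must go
 * through DRM_IOCTL_PANTHOR_BO_SYNC rather than usermode cache ops. */
static inline bool bo_needs_kernel_sync(const struct drm_panthor_bo_query_info *info)
{
	return info->extra_flags & DRM_PANTHOR_BO_IS_IMPORTED;
}
```

In real code the struct would be zero-initialized with only @handle set, then filled in by the kernel through DRM_IOCTL_PANTHOR_BO_QUERY_INFO.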
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v6 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (7 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 08/16] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 16:27 ` Steven Price
2025-11-26 12:44 ` [PATCH v6 10/16] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
` (7 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Loïc Molinari, kernel, Boris Brezillon
From: Loïc Molinari <loic.molinari@collabora.com>
Will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or on which the access
pattern makes non-cacheable mappings inefficient.
Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
v2:
- Add more to the commit message
- Tweak the doc
- Make sure we sync the section of the BO pointing to the CS
syncobj before we read its seqno
v3:
- Fix formatting/spelling issues
v4:
- Add Steve's R-b
v5:
- Drop Steve's R-b (changes in the ioctl semantics requiring
new review)
v6:
- Fix the uAPI doc
- Fix inverted logic in some comment
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 7 ++++-
drivers/gpu/drm/panthor/panthor_gem.c | 37 +++++++++++++++++++++++--
drivers/gpu/drm/panthor/panthor_sched.c | 18 ++++++++++--
include/uapi/drm/panthor_drm.h | 9 ++++++
4 files changed, 66 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 5df874538e88..662be9649f92 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -902,7 +902,8 @@ static int panthor_ioctl_vm_destroy(struct drm_device *ddev, void *data,
return panthor_vm_pool_destroy_vm(pfile->vms, args->id);
}
-#define PANTHOR_BO_FLAGS DRM_PANTHOR_BO_NO_MMAP
+#define PANTHOR_BO_FLAGS (DRM_PANTHOR_BO_NO_MMAP | \
+ DRM_PANTHOR_BO_WB_MMAP)
static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
struct drm_file *file)
@@ -921,6 +922,10 @@ static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
goto out_dev_exit;
}
+ if ((args->flags & DRM_PANTHOR_BO_NO_MMAP) &&
+ (args->flags & DRM_PANTHOR_BO_WB_MMAP))
+ return -EINVAL;
+
if (args->exclusive_vm_id) {
vm = panthor_vm_pool_get_vm(pfile->vms, args->exclusive_vm_id);
if (!vm) {
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 1a3c1afaf88d..21d7a4bb2af5 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -59,6 +59,39 @@ static void panthor_gem_debugfs_set_usage_flags(struct panthor_gem_object *bo, u
static void panthor_gem_debugfs_bo_init(struct panthor_gem_object *bo) {}
#endif
+static bool
+should_map_wc(struct panthor_gem_object *bo, struct panthor_vm *exclusive_vm)
+{
+ struct panthor_device *ptdev = container_of(bo->base.base.dev, struct panthor_device, base);
+
+ /* We can't do uncached mappings if the device is coherent,
+ * because the zeroing done by the shmem layer at page allocation
+ * time happens on a cached mapping which isn't CPU-flushed (at least
+ * not on Arm64 where the flush is deferred to PTE setup time, and
+ * only done conditionally based on the mapping permissions). We can't
+ * rely on dma_map_sgtable()/dma_sync_sgtable_for_xxx() either to flush
> + * those, because they are NOPed if dev_is_dma_coherent() returns true.
+ *
+ * FIXME: Note that this problem is going to pop up again when we
+ * decide to support mapping buffers with the NO_MMAP flag as
+ * non-shareable (AKA buffers accessed only by the GPU), because we
+ * need the same CPU flush to happen after page allocation, otherwise
+ * there's a risk of data leak or late corruption caused by a dirty
+ * cacheline being evicted. At this point we'll need a way to force
+ * CPU cache maintenance regardless of whether the device is coherent
+ * or not.
+ */
+ if (ptdev->coherent)
+ return false;
+
+ /* Cached mappings are explicitly requested, so no write-combine. */
+ if (bo->flags & DRM_PANTHOR_BO_WB_MMAP)
+ return false;
+
+ /* The default is write-combine. */
+ return true;
+}
+
static void panthor_gem_free_object(struct drm_gem_object *obj)
{
struct panthor_gem_object *bo = to_panthor_bo(obj);
@@ -145,6 +178,7 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, struct panthor_vm *vm,
bo = to_panthor_bo(&obj->base);
kbo->obj = &obj->base;
bo->flags = bo_flags;
+ bo->base.map_wc = should_map_wc(bo, vm);
bo->exclusive_vm_root_gem = panthor_vm_root_gem(vm);
drm_gem_object_get(bo->exclusive_vm_root_gem);
bo->base.base.resv = bo->exclusive_vm_root_gem->resv;
@@ -242,7 +276,6 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = {
*/
struct drm_gem_object *panthor_gem_create_object(struct drm_device *ddev, size_t size)
{
- struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
struct panthor_gem_object *obj;
obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -250,7 +283,6 @@ struct drm_gem_object *panthor_gem_create_object(struct drm_device *ddev, size_t
return ERR_PTR(-ENOMEM);
obj->base.base.funcs = &panthor_gem_funcs;
- obj->base.map_wc = !ptdev->coherent;
mutex_init(&obj->label.lock);
panthor_gem_debugfs_bo_init(obj);
@@ -285,6 +317,7 @@ panthor_gem_create_with_handle(struct drm_file *file,
bo = to_panthor_bo(&shmem->base);
bo->flags = flags;
+ bo->base.map_wc = should_map_wc(bo, exclusive_vm);
if (exclusive_vm) {
bo->exclusive_vm_root_gem = panthor_vm_root_gem(exclusive_vm);
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index b834123a6560..45843fb4d5ca 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -872,8 +872,11 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
struct iosys_map map;
int ret;
- if (queue->syncwait.kmap)
- return queue->syncwait.kmap + queue->syncwait.offset;
+ if (queue->syncwait.kmap) {
+ bo = container_of(queue->syncwait.obj,
+ struct panthor_gem_object, base.base);
+ goto out_sync;
+ }
bo = panthor_vm_get_bo_for_va(group->vm,
queue->syncwait.gpu_va,
@@ -890,6 +893,17 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
if (drm_WARN_ON(&ptdev->base, !queue->syncwait.kmap))
goto err_put_syncwait_obj;
+out_sync:
+ /* Make sure the CPU caches are invalidated before the seqno is read.
+ * drm_gem_shmem_sync() is a NOP if map_wc=true, so no need to check
+ * it here.
+ */
+ drm_gem_shmem_sync(&bo->base, queue->syncwait.offset,
+ queue->syncwait.sync64 ?
+ sizeof(struct panthor_syncobj_64b) :
+ sizeof(struct panthor_syncobj_32b),
+ DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE);
+
return queue->syncwait.kmap + queue->syncwait.offset;
err_put_syncwait_obj:
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 7eec9f922183..20c6707e0f6e 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -681,6 +681,15 @@ struct drm_panthor_vm_get_state {
enum drm_panthor_bo_flags {
/** @DRM_PANTHOR_BO_NO_MMAP: The buffer object will never be CPU-mapped in userspace. */
DRM_PANTHOR_BO_NO_MMAP = (1 << 0),
+
+ /**
+ * @DRM_PANTHOR_BO_WB_MMAP: Force "Write-Back Cacheable" CPU mapping.
+ *
+ * CPU map the buffer object in userspace by forcing the "Write-Back
+ * Cacheable" cacheability attribute. The mapping otherwise uses the
+ * "Non-Cacheable" attribute if the GPU is not IO coherent.
+ */
+ DRM_PANTHOR_BO_WB_MMAP = (1 << 1),
};
/**
--
2.51.1
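The mapping policy implemented by should_map_wc() above can be restated in a standalone form for readers tracking the decision from userspace (a sketch of the kernel logic, not driver code; the flag values are copied from panthor_drm.h as of this patch, and the helper name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BO creation flags from panthor_drm.h as of this patch. */
#define DRM_PANTHOR_BO_NO_MMAP (1u << 0)
#define DRM_PANTHOR_BO_WB_MMAP (1u << 1)

/* Restates should_map_wc(): IO-coherent devices always get cacheable
 * mappings (the shmem zeroing path isn't CPU-flushed there), WB_MMAP
 * explicitly requests cacheable, and everything else defaults to
 * write-combine. */
static inline bool map_write_combine(bool dev_coherent, uint32_t bo_flags)
{
	if (dev_coherent)
		return false;
	if (bo_flags & DRM_PANTHOR_BO_WB_MMAP)
		return false;
	return true;
}
```

Note that NO_MMAP | WB_MMAP is rejected with -EINVAL at creation time, since a never-mapped BO has no CPU cacheability to force.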
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v6 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable
2025-11-26 12:44 ` [PATCH v6 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-11-26 16:27 ` Steven Price
0 siblings, 0 replies; 33+ messages in thread
From: Steven Price @ 2025-11-26 16:27 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Loïc Molinari, kernel
On 26/11/2025 12:44, Boris Brezillon wrote:
> From: Loïc Molinari <loic.molinari@collabora.com>
>
> Will be used by the UMD to optimize CPU accesses to buffers
> that are frequently read by the CPU, or on which the access
> pattern makes non-cacheable mappings inefficient.
>
> Mapping buffers CPU-cached implies taking care of the CPU
> cache maintenance in the UMD, unless the GPU is IO coherent.
>
> v2:
> - Add more to the commit message
> - Tweak the doc
> - Make sure we sync the section of the BO pointing to the CS
> syncobj before we read its seqno
>
> v3:
> - Fix formatting/spelling issues
>
> v4:
> - Add Steve's R-b
>
> v5:
> - Drop Steve's R-b (changes in the ioctl semantics requiring
> new review)
>
> v6:
> - Fix the uAPI doc
> - Fix inverted logic in some comment
>
> Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> [...]
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v6 10/16] drm/panthor: Bump the driver version to 1.6
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (8 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 11/16] drm/panfrost: Provide a custom dma_buf implementation Boris Brezillon
` (6 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel, Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
Bump the driver version to reflect the new cached-CPU mapping
capability.
v2:
- Quickly describe what the new version exposes in the commit message
v3:
- Add Steve's R-b
v4:
- No changes
v5:
- No changes
v6:
- No changes
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 662be9649f92..2f1743b5bda6 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1672,6 +1672,10 @@ static void panthor_debugfs_init(struct drm_minor *minor)
* - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
* - 1.4 - adds DRM_IOCTL_PANTHOR_BO_SET_LABEL ioctl
* - 1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
+ * - 1.6 - adds DRM_PANTHOR_BO_WB_MMAP flag
+ * - adds DRM_IOCTL_PANTHOR_BO_SYNC ioctl
+ * - adds DRM_IOCTL_PANTHOR_BO_QUERY_INFO ioctl
+ * - adds drm_panthor_gpu_info::selected_coherency
*/
static const struct drm_driver panthor_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -1685,7 +1689,7 @@ static const struct drm_driver panthor_drm_driver = {
.name = "panthor",
.desc = "Panthor DRM driver",
.major = 1,
- .minor = 5,
+ .minor = 6,
.gem_create_object = panthor_gem_create_object,
.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
--
2.51.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v6 11/16] drm/panfrost: Provide a custom dma_buf implementation
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (9 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 10/16] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
` (5 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
Before we introduce cached CPU mappings, we want a dma_buf
implementation that satisfies synchronization requests around CPU
accesses to dma_bufs exported by our driver. Let's provide our own
implementation relying on the generic drm_gem_shmem_prime helpers
designed for that purpose.
v5:
- New patch
v6:
- Collect R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
drivers/gpu/drm/panfrost/panfrost_gem.c | 13 +++++++++++++
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 ++
3 files changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 7d8c7c337606..3c62cdd43069 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -853,6 +853,7 @@ static const struct drm_driver panfrost_drm_driver = {
.gem_create_object = panfrost_gem_create_object,
.gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
+ .dma_buf_ops = &panfrost_dma_buf_ops,
#ifdef CONFIG_DEBUG_FS
.debugfs_init = panfrost_debugfs_init,
#endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 8041b65c6609..292f3ce6287f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -324,6 +324,19 @@ panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
return bo;
}
+const struct dma_buf_ops panfrost_dma_buf_ops = {
+ .attach = drm_gem_map_attach,
+ .detach = drm_gem_map_detach,
+ .map_dma_buf = drm_gem_shmem_prime_map_dma_buf,
+ .unmap_dma_buf = drm_gem_shmem_prime_unmap_dma_buf,
+ .release = drm_gem_dmabuf_release,
+ .mmap = drm_gem_dmabuf_mmap,
+ .vmap = drm_gem_dmabuf_vmap,
+ .vunmap = drm_gem_dmabuf_vunmap,
+ .begin_cpu_access = drm_gem_shmem_prime_begin_cpu_access,
+ .end_cpu_access = drm_gem_shmem_prime_end_cpu_access,
+};
+
struct drm_gem_object *
panfrost_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 8de3e76f2717..9ad35e2d99fc 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -131,6 +131,8 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach,
struct sg_table *sgt);
+extern const struct dma_buf_ops panfrost_dma_buf_ops;
+
struct panfrost_gem_object *
panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags);
--
2.51.1
* [PATCH v6 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (10 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 11/16] drm/panfrost: Provide a custom dma_buf implementation Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
` (4 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
This will be needed if we want to skip CPU cache maintenance operations
when the GPU can snoop CPU caches.
v2:
- New commit
v3:
- Fix the coherency values (enum instead of bitmask)
v4:
- Fix init/test on coherency_features
v5:
- No changes
v6:
- Collect R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
drivers/gpu/drm/panfrost/panfrost_gpu.c | 26 +++++++++++++++++++---
drivers/gpu/drm/panfrost/panfrost_regs.h | 10 +++++++--
include/uapi/drm/panfrost_drm.h | 7 ++++++
5 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index e61c4329fd07..0f3992412205 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -79,6 +79,7 @@ struct panfrost_features {
u32 thread_max_workgroup_sz;
u32 thread_max_barrier_sz;
u32 coherency_features;
+ u32 selected_coherency;
u32 afbc_features;
u32 texture_features[4];
u32 js_features[16];
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3c62cdd43069..54ac9d2ececf 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -95,6 +95,7 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
PANFROST_FEATURE_ARRAY(JS_FEATURES, js_features, 15);
PANFROST_FEATURE(NR_CORE_GROUPS, nr_core_groups);
PANFROST_FEATURE(THREAD_TLS_ALLOC, thread_tls_alloc);
+ PANFROST_FEATURE(SELECTED_COHERENCY, selected_coherency);
case DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP:
ret = panfrost_ioctl_query_timestamp(pfdev, &param->value);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 483d278eb154..7d555e63e21a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -159,8 +159,8 @@ static void panfrost_gpu_init_quirks(struct panfrost_device *pfdev)
pfdev->features.revision >= 0x2000)
quirks |= JM_MAX_JOB_THROTTLE_LIMIT << JM_JOB_THROTTLE_LIMIT_SHIFT;
else if (panfrost_model_eq(pfdev, 0x6000) &&
- pfdev->features.coherency_features == COHERENCY_ACE)
- quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
+ pfdev->features.coherency_features == BIT(COHERENCY_ACE))
+ quirks |= (BIT(COHERENCY_ACE_LITE) | BIT(COHERENCY_ACE)) <<
JM_FORCE_COHERENCY_FEATURES_SHIFT;
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
@@ -263,7 +263,27 @@ static int panfrost_gpu_init_features(struct panfrost_device *pfdev)
pfdev->features.max_threads = gpu_read(pfdev, GPU_THREAD_MAX_THREADS);
pfdev->features.thread_max_workgroup_sz = gpu_read(pfdev, GPU_THREAD_MAX_WORKGROUP_SIZE);
pfdev->features.thread_max_barrier_sz = gpu_read(pfdev, GPU_THREAD_MAX_BARRIER_SIZE);
- pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+
+ if (panfrost_has_hw_feature(pfdev, HW_FEATURE_COHERENCY_REG))
+ pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+ else
+ pfdev->features.coherency_features = BIT(COHERENCY_ACE_LITE);
+
+ BUILD_BUG_ON(COHERENCY_ACE_LITE != DRM_PANFROST_GPU_COHERENCY_ACE_LITE);
+ BUILD_BUG_ON(COHERENCY_ACE != DRM_PANFROST_GPU_COHERENCY_ACE);
+ BUILD_BUG_ON(COHERENCY_NONE != DRM_PANFROST_GPU_COHERENCY_NONE);
+
+ if (!pfdev->coherent) {
+ pfdev->features.selected_coherency = COHERENCY_NONE;
+ } else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE)) {
+ pfdev->features.selected_coherency = COHERENCY_ACE;
+ } else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE_LITE)) {
+ pfdev->features.selected_coherency = COHERENCY_ACE_LITE;
+ } else {
+ drm_WARN(&pfdev->base, true, "No known coherency protocol supported");
+ pfdev->features.selected_coherency = COHERENCY_NONE;
+ }
+
pfdev->features.afbc_features = gpu_read(pfdev, GPU_AFBC_FEATURES);
for (i = 0; i < 4; i++)
pfdev->features.texture_features[i] = gpu_read(pfdev, GPU_TEXTURE_FEATURES(i));
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 2b8f1617b836..ee15f6bf6e6f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -102,9 +102,15 @@
#define GPU_L2_PRESENT_LO 0x120 /* (RO) Level 2 cache present bitmap, low word */
#define GPU_L2_PRESENT_HI 0x124 /* (RO) Level 2 cache present bitmap, high word */
+/* GPU_COHERENCY_FEATURES is a bitmask of BIT(COHERENCY_xxx) values encoding the
+ * set of supported coherency protocols. GPU_COHERENCY_ENABLE is passed a
+ * COHERENCY_xxx value.
+ */
#define GPU_COHERENCY_FEATURES 0x300 /* (RO) Coherency features present */
-#define COHERENCY_ACE_LITE BIT(0)
-#define COHERENCY_ACE BIT(1)
+#define GPU_COHERENCY_ENABLE 0x304 /* (RW) Coherency protocol selection */
+#define COHERENCY_ACE_LITE 0
+#define COHERENCY_ACE 1
+#define COHERENCY_NONE 31
#define GPU_STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */
#define GPU_STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 1956431bb391..0c59714ae42b 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -228,6 +228,13 @@ enum drm_panfrost_param {
DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
+ DRM_PANFROST_PARAM_SELECTED_COHERENCY,
+};
+
+enum drm_panfrost_gpu_coherency {
+ DRM_PANFROST_GPU_COHERENCY_ACE_LITE = 0,
+ DRM_PANFROST_GPU_COHERENCY_ACE = 1,
+ DRM_PANFROST_GPU_COHERENCY_NONE = 31,
};
struct drm_panfrost_get_param {
--
2.51.1
* [PATCH v6 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (11 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 16:47 ` Steven Price
2025-11-26 12:44 ` [PATCH v6 14/16] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
` (3 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel, Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
This will be used by the UMD to synchronize CPU-cached mappings when
the UMD can't do it directly (no usermode cache maintenance instruction
on Arm32).
v2:
- Add more to the commit message
- Change the flags to better match the drm_gem_shmem_sync semantics
v3:
- Add Steve's R-b
v4:
- No changes
v5:
- Drop Steve's R-b (semantics changes requiring a new review)
v6:
- Bail out early in panfrost_ioctl_sync_bo() if op_count is zero
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 51 +++++++++++++++++++++++++
drivers/gpu/drm/panfrost/panfrost_gem.c | 21 ++++++++++
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +
include/uapi/drm/panfrost_drm.h | 45 ++++++++++++++++++++++
4 files changed, 119 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 54ac9d2ececf..85ae2eefca04 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -580,6 +580,56 @@ static int panfrost_ioctl_jm_ctx_destroy(struct drm_device *dev, void *data,
return panfrost_jm_ctx_destroy(file, args->handle);
}
+static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct drm_panfrost_sync_bo *args = data;
+ struct drm_panfrost_bo_sync_op *ops;
+ struct drm_gem_object *obj;
+ int ret;
+ u32 i;
+
+ if (args->pad)
+ return -EINVAL;
+
+ if (!args->op_count)
+ return 0;
+
+ ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
+ if (!ops) {
+ DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
+ args->op_count * sizeof(*ops))) {
+ DRM_DEBUG("Failed to copy in BO sync ops\n");
+ ret = -EFAULT;
+ goto err_ops;
+ }
+
+ for (i = 0; i < args->op_count; i++) {
+ obj = drm_gem_object_lookup(file, ops[i].handle);
+ if (!obj) {
+ ret = -ENOENT;
+ goto err_ops;
+ }
+
+ ret = panfrost_gem_sync(obj, ops[i].type,
+ ops[i].offset, ops[i].size);
+
+ drm_gem_object_put(obj);
+
+ if (ret)
+ goto err_ops;
+ }
+
+err_ops:
+ kvfree(ops);
+
+ return ret;
+}
+
int panfrost_unstable_ioctl_check(void)
{
if (!unstable_ioctls)
@@ -649,6 +699,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(SET_LABEL_BO, set_label_bo, DRM_RENDER_ALLOW),
PANFROST_IOCTL(JM_CTX_CREATE, jm_ctx_create, DRM_RENDER_ALLOW),
PANFROST_IOCTL(JM_CTX_DESTROY, jm_ctx_destroy, DRM_RENDER_ALLOW),
+ PANFROST_IOCTL(SYNC_BO, sync_bo, DRM_RENDER_ALLOW),
};
static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 292f3ce6287f..ca884bf216d6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -379,6 +379,27 @@ panfrost_gem_set_label(struct drm_gem_object *obj, const char *label)
kfree_const(old_label);
}
+int
+panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
+ u32 offset, u32 size)
+{
+ struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+ enum drm_gem_shmem_sync_type shmem_sync_type;
+
+ switch (type) {
+ case PANFROST_BO_SYNC_CPU_CACHE_FLUSH:
+ shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
+ break;
+ case PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
+ shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
+}
+
void
panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label)
{
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 9ad35e2d99fc..30438ad335de 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -150,6 +150,8 @@ int panfrost_gem_shrinker_init(struct drm_device *dev);
void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
void panfrost_gem_set_label(struct drm_gem_object *obj, const char *label);
+int panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
+ u32 offset, u32 size);
void panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label);
#ifdef CONFIG_DEBUG_FS
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 0c59714ae42b..e194e087a0c8 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -24,6 +24,7 @@ extern "C" {
#define DRM_PANFROST_SET_LABEL_BO 0x09
#define DRM_PANFROST_JM_CTX_CREATE 0x0a
#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
+#define DRM_PANFROST_SYNC_BO 0x0c
#define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
#define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -35,6 +36,7 @@ extern "C" {
#define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
+#define DRM_IOCTL_PANFROST_SYNC_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
/*
* Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -308,6 +310,49 @@ struct drm_panfrost_set_label_bo {
__u64 label;
};
+/* Valid values for drm_panfrost_bo_sync_op::type */
+#define PANFROST_BO_SYNC_CPU_CACHE_FLUSH 0
+#define PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE 1
+
+/**
+ * struct drm_panfrost_bo_sync_op - BO map sync op
+ */
+struct drm_panfrost_bo_sync_op {
+ /** @handle: Handle of the buffer object to sync. */
+ __u32 handle;
+
+ /** @type: Type of sync operation. */
+ __u32 type;
+
+ /**
+ * @offset: Offset into the BO at which the sync range starts.
+ *
+ * This will be rounded down to the nearest cache line as needed.
+ */
+ __u32 offset;
+
+ /**
+ * @size: Size of the range to sync
+ *
+ * @size + @offset will be rounded up to the nearest cache line as
+ * needed.
+ */
+ __u32 size;
+};
+
+/**
+ * struct drm_panfrost_sync_bo - ioctl argument for syncing BO maps
+ */
+struct drm_panfrost_sync_bo {
+ /** @ops: Array of struct drm_panfrost_bo_sync_op */
+ __u64 ops;
+
+ /** @op_count: Number of BO sync ops */
+ __u32 op_count;
+
+ __u32 pad;
+};
+
/* Definitions for coredump decoding in user space */
#define PANFROSTDUMP_MAJOR 1
#define PANFROSTDUMP_MINOR 0
--
2.51.1
* Re: [PATCH v6 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-11-26 12:44 ` [PATCH v6 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
@ 2025-11-26 16:47 ` Steven Price
0 siblings, 0 replies; 33+ messages in thread
From: Steven Price @ 2025-11-26 16:47 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On 26/11/2025 12:44, Boris Brezillon wrote:
> From: Faith Ekstrand <faith.ekstrand@collabora.com>
>
> This will be used by the UMD to synchronize CPU-cached mappings when
> the UMD can't do it directly (no usermode cache maintenance instruction
> on Arm32).
>
> v2:
> - Add more to the commit message
> - Change the flags to better match the drm_gem_shmem_sync semantics
>
> v3:
> - Add Steve's R-b
>
> v4:
> - No changes
>
> v5:
> - Drop Steve's R-b (semantics changes requiring a new review)
>
> v6:
> - Bail out early in panfrost_ioctl_sync_bo() if op_count is zero
>
> Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/panfrost/panfrost_drv.c | 51 +++++++++++++++++++++++++
> drivers/gpu/drm/panfrost/panfrost_gem.c | 21 ++++++++++
> drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +
> include/uapi/drm/panfrost_drm.h | 45 ++++++++++++++++++++++
> 4 files changed, 119 insertions(+)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 54ac9d2ececf..85ae2eefca04 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -580,6 +580,56 @@ static int panfrost_ioctl_jm_ctx_destroy(struct drm_device *dev, void *data,
> return panfrost_jm_ctx_destroy(file, args->handle);
> }
>
> +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> + struct drm_file *file)
> +{
> + struct drm_panfrost_sync_bo *args = data;
> + struct drm_panfrost_bo_sync_op *ops;
> + struct drm_gem_object *obj;
> + int ret;
> + u32 i;
> +
> + if (args->pad)
> + return -EINVAL;
> +
> + if (!args->op_count)
> + return 0;
> +
> + ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> + if (!ops) {
> + DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> + return -ENOMEM;
> + }
> +
> + if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> + args->op_count * sizeof(*ops))) {
> + DRM_DEBUG("Failed to copy in BO sync ops\n");
> + ret = -EFAULT;
> + goto err_ops;
> + }
NIT: there's a helper vmemdup_array_user() which does this, but we do
already have similar code in panfrost_copy_in_sync(), so it's fine by me
whichever way.
Thanks,
Steve
> +
> + for (i = 0; i < args->op_count; i++) {
> + obj = drm_gem_object_lookup(file, ops[i].handle);
> + if (!obj) {
> + ret = -ENOENT;
> + goto err_ops;
> + }
> +
> + ret = panfrost_gem_sync(obj, ops[i].type,
> + ops[i].offset, ops[i].size);
> +
> + drm_gem_object_put(obj);
> +
> + if (ret)
> + goto err_ops;
> + }
> +
> +err_ops:
> + kvfree(ops);
> +
> + return ret;
> +}
> +
> int panfrost_unstable_ioctl_check(void)
> {
> if (!unstable_ioctls)
> @@ -649,6 +699,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
> PANFROST_IOCTL(SET_LABEL_BO, set_label_bo, DRM_RENDER_ALLOW),
> PANFROST_IOCTL(JM_CTX_CREATE, jm_ctx_create, DRM_RENDER_ALLOW),
> PANFROST_IOCTL(JM_CTX_DESTROY, jm_ctx_destroy, DRM_RENDER_ALLOW),
> + PANFROST_IOCTL(SYNC_BO, sync_bo, DRM_RENDER_ALLOW),
> };
>
> static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 292f3ce6287f..ca884bf216d6 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -379,6 +379,27 @@ panfrost_gem_set_label(struct drm_gem_object *obj, const char *label)
> kfree_const(old_label);
> }
>
> +int
> +panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
> + u32 offset, u32 size)
> +{
> + struct panfrost_gem_object *bo = to_panfrost_bo(obj);
> + enum drm_gem_shmem_sync_type shmem_sync_type;
> +
> + switch (type) {
> + case PANFROST_BO_SYNC_CPU_CACHE_FLUSH:
> + shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
> + break;
> + case PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
> + shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
> +}
> +
> void
> panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label)
> {
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> index 9ad35e2d99fc..30438ad335de 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> @@ -150,6 +150,8 @@ int panfrost_gem_shrinker_init(struct drm_device *dev);
> void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
>
> void panfrost_gem_set_label(struct drm_gem_object *obj, const char *label);
> +int panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
> + u32 offset, u32 size);
> void panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label);
>
> #ifdef CONFIG_DEBUG_FS
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index 0c59714ae42b..e194e087a0c8 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -24,6 +24,7 @@ extern "C" {
> #define DRM_PANFROST_SET_LABEL_BO 0x09
> #define DRM_PANFROST_JM_CTX_CREATE 0x0a
> #define DRM_PANFROST_JM_CTX_DESTROY 0x0b
> +#define DRM_PANFROST_SYNC_BO 0x0c
>
> #define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> @@ -35,6 +36,7 @@ extern "C" {
> #define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
> #define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
> #define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
> +#define DRM_IOCTL_PANFROST_SYNC_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
>
> /*
> * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
> @@ -308,6 +310,49 @@ struct drm_panfrost_set_label_bo {
> __u64 label;
> };
>
> +/* Valid values for drm_panfrost_bo_sync_op::type */
> +#define PANFROST_BO_SYNC_CPU_CACHE_FLUSH 0
> +#define PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE 1
> +
> +/**
> + * struct drm_panfrost_bo_sync_op - BO map sync op
> + */
> +struct drm_panfrost_bo_sync_op {
> + /** @handle: Handle of the buffer object to sync. */
> + __u32 handle;
> +
> + /** @type: Type of sync operation. */
> + __u32 type;
> +
> + /**
> + * @offset: Offset into the BO at which the sync range starts.
> + *
> + * This will be rounded down to the nearest cache line as needed.
> + */
> + __u32 offset;
> +
> + /**
> + * @size: Size of the range to sync
> + *
> + * @size + @offset will be rounded up to the nearest cache line as
> + * needed.
> + */
> + __u32 size;
> +};
> +
> +/**
> + * struct drm_panfrost_sync_bo - ioctl argument for syncing BO maps
> + */
> +struct drm_panfrost_sync_bo {
> + /** Array of struct drm_panfrost_bo_sync_op */
> + __u64 ops;
> +
> + /** Number of BO sync ops */
> + __u32 op_count;
> +
> + __u32 pad;
> +};
> +
> /* Definitions for coredump decoding in user space */
> #define PANFROSTDUMP_MAJOR 1
> #define PANFROSTDUMP_MINOR 0
* [PATCH v6 14/16] drm/panfrost: Add an ioctl to query BO flags
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (12 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
` (2 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, Boris Brezillon, kernel
This is useful when importing BOs, so we can know about cacheability
and flush the caches when needed.
v2:
- New commit
v3:
- Add Steve's R-b
v4:
- No changes
v5:
- No changes
v6:
- No changes
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 33 +++++++++++++++++++++++++
include/uapi/drm/panfrost_drm.h | 19 ++++++++++++++
2 files changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 85ae2eefca04..44ca58d62432 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -630,6 +630,38 @@ static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
return ret;
}
+static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+ struct drm_panfrost_query_bo_info *args = data;
+ struct drm_gem_object *gem_obj;
+ struct panfrost_gem_object *bo;
+
+ gem_obj = drm_gem_object_lookup(file_priv, args->handle);
+ if (!gem_obj) {
+ DRM_DEBUG("Failed to look up GEM BO %d\n", args->handle);
+ return -ENOENT;
+ }
+
+ bo = to_panfrost_bo(gem_obj);
+ args->pad = 0;
+ args->create_flags = 0;
+ args->extra_flags = 0;
+
+ if (drm_gem_is_imported(gem_obj)) {
+ args->extra_flags |= DRM_PANFROST_BO_IS_IMPORTED;
+ } else {
+ if (bo->noexec)
+ args->create_flags |= PANFROST_BO_NOEXEC;
+
+ if (bo->is_heap)
+ args->create_flags |= PANFROST_BO_HEAP;
+ }
+
+ drm_gem_object_put(gem_obj);
+ return 0;
+}
+
int panfrost_unstable_ioctl_check(void)
{
if (!unstable_ioctls)
@@ -700,6 +732,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(JM_CTX_CREATE, jm_ctx_create, DRM_RENDER_ALLOW),
PANFROST_IOCTL(JM_CTX_DESTROY, jm_ctx_destroy, DRM_RENDER_ALLOW),
PANFROST_IOCTL(SYNC_BO, sync_bo, DRM_RENDER_ALLOW),
+ PANFROST_IOCTL(QUERY_BO_INFO, query_bo_info, DRM_RENDER_ALLOW),
};
static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index e194e087a0c8..36ae48ea50d3 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -25,6 +25,7 @@ extern "C" {
#define DRM_PANFROST_JM_CTX_CREATE 0x0a
#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
#define DRM_PANFROST_SYNC_BO 0x0c
+#define DRM_PANFROST_QUERY_BO_INFO 0x0d
#define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
#define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -37,6 +38,7 @@ extern "C" {
#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
#define DRM_IOCTL_PANFROST_SYNC_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
+#define DRM_IOCTL_PANFROST_QUERY_BO_INFO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_QUERY_BO_INFO, struct drm_panfrost_query_bo_info)
/*
* Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -353,6 +355,23 @@ struct drm_panfrost_sync_bo {
__u32 pad;
};
+/** BO comes from a different subsystem. */
+#define DRM_PANFROST_BO_IS_IMPORTED (1 << 0)
+
+struct drm_panfrost_query_bo_info {
+ /** @handle: Handle of the object being queried. */
+ __u32 handle;
+
+ /** @extra_flags: Extra flags that are not coming from the BO_CREATE ioctl(). */
+ __u32 extra_flags;
+
+ /** @create_flags: Flags passed at creation time. */
+ __u32 create_flags;
+
+ /** @pad: Will be zero on return. */
+ __u32 pad;
+};
+
/* Definitions for coredump decoding in user space */
#define PANFROSTDUMP_MAJOR 1
#define PANFROSTDUMP_MINOR 0
--
2.51.1
* [PATCH v6 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (13 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 14/16] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 12:44 ` [PATCH v6 16/16] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
2025-11-26 16:20 ` [PATCH v6 00/16] drm/panfrost,panthor: Cached maps and explicit flushing Thomas Zimmermann
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel, Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
This will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or whose access pattern makes
non-cacheable mappings inefficient.
Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
v2:
- Add more to the commit message
v3:
- No changes
v4:
- Fix the map_wc test in panfrost_ioctl_query_bo_info()
v5:
- Drop Steve's R-b (enough has changed to justify a new review)
v6:
- Collect R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 10 ++++++--
drivers/gpu/drm/panfrost/panfrost_gem.c | 33 +++++++++++++++++++++++++
drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++++
include/uapi/drm/panfrost_drm.h | 5 +++-
4 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 44ca58d62432..70bdcf639f54 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -126,6 +126,10 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
return 0;
}
+#define PANFROST_BO_FLAGS (PANFROST_BO_NOEXEC | \
+ PANFROST_BO_HEAP | \
+ PANFROST_BO_WB_MMAP)
+
static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
struct drm_file *file)
{
@@ -135,8 +139,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
struct panfrost_gem_mapping *mapping;
int ret;
- if (!args->size || args->pad ||
- (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
+ if (!args->size || args->pad || (args->flags & ~PANFROST_BO_FLAGS))
return -EINVAL;
/* Heaps should never be executable */
@@ -656,6 +659,9 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
if (bo->is_heap)
args->create_flags |= PANFROST_BO_HEAP;
+
+ if (!bo->base.map_wc)
+ args->create_flags |= PANFROST_BO_WB_MMAP;
}
drm_gem_object_put(gem_obj);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index ca884bf216d6..56ee5c705c4d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -270,6 +270,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
+ .export = drm_gem_prime_export,
.status = panfrost_gem_status,
.rss = panfrost_gem_rss,
.vm_ops = &drm_gem_shmem_vm_ops,
@@ -303,12 +304,42 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
return &obj->base.base;
}
+static bool
+should_map_wc(struct panfrost_gem_object *bo)
+{
+ struct panfrost_device *pfdev = to_panfrost_device(bo->base.base.dev);
+
+ /* We can't do uncached mappings if the device is coherent,
+ * because the zeroing done by the shmem layer at page allocation
+ * time happens on a cached mapping which isn't CPU-flushed (at least
+ * not on Arm64 where the flush is deferred to PTE setup time, and
+ * only done conditionally based on the mapping permissions). We can't
+ * rely on dma_map_sgtable()/dma_sync_sgtable_for_xxx() either to flush
+ * those, because they are NOPed if dma_dev_coherent() returns true.
+ */
+ if (pfdev->coherent)
+ return false;
+
+ /* Cached mappings are explicitly requested, so no write-combine. */
+ if (bo->wb_mmap)
+ return false;
+
+ /* The default is write-combine. */
+ return true;
+}
+
struct panfrost_gem_object *
panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
{
struct drm_gem_shmem_object *shmem;
struct panfrost_gem_object *bo;
+ /* The heap buffer is not supposed to be CPU-visible, so don't allow
+ * WB_MMAP on those.
+ */
+ if ((flags & PANFROST_BO_HEAP) && (flags & PANFROST_BO_WB_MMAP))
+ return ERR_PTR(-EINVAL);
+
/* Round up heap allocations to 2MB to keep fault handling simple */
if (flags & PANFROST_BO_HEAP)
size = roundup(size, SZ_2M);
@@ -320,6 +351,8 @@ panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
bo = to_panfrost_bo(&shmem->base);
bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
bo->is_heap = !!(flags & PANFROST_BO_HEAP);
+ bo->wb_mmap = !!(flags & PANFROST_BO_WB_MMAP);
+ bo->base.map_wc = should_map_wc(bo);
return bo;
}
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 30438ad335de..f62738f9eb50 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -98,6 +98,11 @@ struct panfrost_gem_object {
bool noexec :1;
bool is_heap :1;
+ /* On coherent devices, this reflects the creation flags, not the true
+ * cacheability attribute of the mapping.
+ */
+ bool wb_mmap :1;
+
#ifdef CONFIG_DEBUG_FS
struct panfrost_gem_debugfs debugfs;
#endif
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 36ae48ea50d3..50d5337f35ef 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -124,9 +124,12 @@ struct drm_panfrost_wait_bo {
__s64 timeout_ns;
};
-/* Valid flags to pass to drm_panfrost_create_bo */
+/* Valid flags to pass to drm_panfrost_create_bo.
+ * PANFROST_BO_WB_MMAP can't be set if PANFROST_BO_HEAP is.
+ */
#define PANFROST_BO_NOEXEC 1
#define PANFROST_BO_HEAP 2
+#define PANFROST_BO_WB_MMAP 4
/**
* struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
--
2.51.1
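Taken together, the create-path rules in this patch (the PANFROST_BO_FLAGS mask, the heap/WB_MMAP exclusion, and the should_map_wc() logic) can be condensed into a standalone sketch. This mirrors the kernel checks for illustration only; it is not the kernel code itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PANFROST_BO_NOEXEC  1
#define PANFROST_BO_HEAP    2
#define PANFROST_BO_WB_MMAP 4
#define PANFROST_BO_FLAGS (PANFROST_BO_NOEXEC | \
			   PANFROST_BO_HEAP | \
			   PANFROST_BO_WB_MMAP)

/* Mirror of the panfrost_ioctl_create_bo()/panfrost_gem_create()
 * validation: unknown flags are rejected, and heap BOs, which are not
 * CPU-visible, can't be combined with WB_MMAP. */
static int validate_create_flags(uint32_t flags)
{
	if (flags & ~PANFROST_BO_FLAGS)
		return -EINVAL;

	if ((flags & PANFROST_BO_HEAP) && (flags & PANFROST_BO_WB_MMAP))
		return -EINVAL;

	return 0;
}

/* Mirror of should_map_wc(): coherent devices always get cached CPU
 * maps, otherwise WB is opt-in and write-combine is the default. */
static bool map_wc(uint32_t flags, bool coherent)
{
	if (coherent)
		return false;

	return !(flags & PANFROST_BO_WB_MMAP);
}
```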
* [PATCH v6 16/16] drm/panfrost: Bump the driver version to 1.6
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (14 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-11-26 12:44 ` Boris Brezillon
2025-11-26 16:20 ` [PATCH v6 00/16] drm/panfrost,panthor: Cached maps and explicit flushing Thomas Zimmermann
16 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 12:44 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel, Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
Bump the driver version to reflect the new cached-CPU mapping
capability.
v2:
- Quickly describe what the new version exposes in the commit message
v3:
- Add Steve's R-b
v4:
- No changes
v5:
- No changes
v6:
- No changes
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 70bdcf639f54..6cd87381aca7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -928,6 +928,9 @@ static void panfrost_debugfs_init(struct drm_minor *minor)
* - 1.4 - adds SET_LABEL_BO
* - 1.5 - adds JM_CTX_{CREATE,DESTROY} ioctls and extend SUBMIT to allow
* context creation with configurable priorities/affinity
+ * - 1.6 - adds PANFROST_BO_MAP_WB, PANFROST_IOCTL_SYNC_BO,
+ * PANFROST_IOCTL_QUERY_BO_INFO and
+ * DRM_PANFROST_PARAM_SELECTED_COHERENCY
*/
static const struct drm_driver panfrost_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
@@ -940,7 +943,7 @@ static const struct drm_driver panfrost_drm_driver = {
.name = "panfrost",
.desc = "panfrost DRM",
.major = 1,
- .minor = 5,
+ .minor = 6,
.gem_create_object = panfrost_gem_create_object,
.gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
--
2.51.1
* Re: [PATCH v6 00/16] drm/panfrost,panthor: Cached maps and explicit flushing
2025-11-26 12:44 [PATCH v6 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (15 preceding siblings ...)
2025-11-26 12:44 ` [PATCH v6 16/16] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
@ 2025-11-26 16:20 ` Thomas Zimmermann
2025-11-26 17:24 ` Boris Brezillon
16 siblings, 1 reply; 33+ messages in thread
From: Thomas Zimmermann @ 2025-11-26 16:20 UTC (permalink / raw)
To: Boris Brezillon, Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, David Airlie,
Simona Vetter, Faith Ekstrand, Thierry Reding, Mikko Perttunen,
Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
Hi
Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> This series implements cached maps and explicit flushing for both panfrost
> and panthor. To avoid code/bug duplication, the tricky guts of the cache
> flushing ioctl which walk the sg list are broken into a new common shmem
> helper which can be used by any driver.
Gem-shmem is getting more and more complicated. I think gem-shmem would
be better off as a simple implementation for all the drivers that use
shadow buffering and software rendering. There are plenty of them. And
drivers like the ones in sysfb/ are our failure-mode fallback. They
should have uncomplicated memory management wherever possible.
Therefore, could we first duplicate the existing gem-shmem code into
gem-uma as we discussed recently on IRC? The changes are simple:
- copy the existing gem-shmem to gem-uma (plus renames)
- convert panthor and panfrost to the new interfaces
And on top of that, further improvements, such as the series at hand,
could be done. Later we'd convert other drivers to gem-uma where it
fits, such as lima.
Best regards
Thomas
>
> The PanVK MR to use this lives here:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385
>
> The questions about the DMA-API based CPU-cache-flush mechanism used
> in this patchset have been dropped. After briefly discussing it with
> Sima and Robin, it seems there's a consensus on the fact we should
> probably expose CPU cache maintenance without going through the DMA
> API (extending drm_cache? providing MM helpers for CPU cache
> flush/invalidation? It's not clear yet how, but this will be discussed
> in a separate thread). In the meantime, we can rely on dma_sync because
> that's good enough for our usecase.
>
> Changes in v2:
> - Expose the coherency so userspace can know when it should skip cache
> maintenance
> - Hook things up at drm_gem_object_funcs level so dma-buf cpu_prep hooks
> can be implemented generically
> - Revisit the semantics of the flags passed to gem_sync()
> - Add BO_QUERY_INFO ioctls to query BO flags on imported objects and
> let the UMD know when cache maintenance is needed on those
>
> Changes in v3:
> - New patch to fix panthor_gpu_coherency_set()
> - No other major changes, check each patch changelog for more details
>
> Changes in v4:
> - Two trivial fixes, check each patch changelog for more details
>
> Changes in v5:
> - Add a way to overload dma_buf_ops while still relying on the drm_prime
> boilerplate
> - Add default shmem implementation for
> dma_buf_ops::{begin,end}_cpu_access()
> - Provide custom dma_buf_ops to deal with CPU cache flushes around CPU
> accesses when the BO is CPU-cacheable
> - Go back to a version of drm_gem_shmem_sync() that only deals with
> cache maintenance, and adjust the semantics to make it clear this is
> the only thing it cares about
> - Adjust the BO_SYNC ioctls according to the new drm_gem_shmem_sync()
> semantics
>
> Changes in v6:
> - No major changes, check the changelog in each patch for more details
>
> Boris Brezillon (10):
> drm/prime: Simplify life of drivers needing custom dma_buf_ops
> drm/shmem: Provide a generic {begin,end}_cpu_access() implementation
> drm/panthor: Provide a custom dma_buf implementation
> drm/panthor: Fix panthor_gpu_coherency_set()
> drm/panthor: Expose the selected coherency protocol to the UMD
> drm/panthor: Add a PANTHOR_BO_SYNC ioctl
> drm/panthor: Add an ioctl to query BO flags
> drm/panfrost: Provide a custom dma_buf implementation
> drm/panfrost: Expose the selected coherency protocol to the UMD
> drm/panfrost: Add an ioctl to query BO flags
>
> Faith Ekstrand (5):
> drm/shmem: Add a drm_gem_shmem_sync() helper
> drm/panthor: Bump the driver version to 1.6
> drm/panfrost: Add a PANFROST_SYNC_BO ioctl
> drm/panfrost: Add flag to map GEM object Write-Back Cacheable
> drm/panfrost: Bump the driver version to 1.6
>
> Loïc Molinari (1):
> drm/panthor: Add flag to map GEM object Write-Back Cacheable
>
> drivers/gpu/drm/drm_gem_shmem_helper.c | 207 +++++++++++++++++++++
> drivers/gpu/drm/drm_prime.c | 10 +-
> drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
> drivers/gpu/drm/panfrost/panfrost_drv.c | 101 +++++++++-
> drivers/gpu/drm/panfrost/panfrost_gem.c | 67 +++++++
> drivers/gpu/drm/panfrost/panfrost_gem.h | 9 +
> drivers/gpu/drm/panfrost/panfrost_gpu.c | 26 ++-
> drivers/gpu/drm/panfrost/panfrost_regs.h | 10 +-
> drivers/gpu/drm/panthor/panthor_device.c | 10 +-
> drivers/gpu/drm/panthor/panthor_drv.c | 79 +++++++-
> drivers/gpu/drm/panthor/panthor_gem.c | 71 ++++++-
> drivers/gpu/drm/panthor/panthor_gem.h | 4 +
> drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
> drivers/gpu/drm/panthor/panthor_sched.c | 18 +-
> include/drm/drm_drv.h | 8 +
> include/drm/drm_gem_shmem_helper.h | 24 +++
> include/uapi/drm/panfrost_drm.h | 76 +++++++-
> include/uapi/drm/panthor_drm.h | 157 +++++++++++++++-
> 18 files changed, 857 insertions(+), 23 deletions(-)
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
* Re: [PATCH v6 00/16] drm/panfrost,panthor: Cached maps and explicit flushing
2025-11-26 16:20 ` [PATCH v6 00/16] drm/panfrost,panthor: Cached maps and explicit flushing Thomas Zimmermann
@ 2025-11-26 17:24 ` Boris Brezillon
0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2025-11-26 17:24 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
amd-gfx, kernel
On Wed, 26 Nov 2025 17:20:08 +0100
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> Am 26.11.25 um 13:44 schrieb Boris Brezillon:
> > This series implements cached maps and explicit flushing for both panfrost
> > and panthor. To avoid code/bug duplication, the tricky guts of the cache
> > flushing ioctl which walk the sg list are broken into a new common shmem
> > helper which can be used by any driver.
>
> Gem-shmem is getting more and more complicated. I think gem-shmem would
> be better off to be a simple implementation for all the drivers that use
> shadow buffering and software rendering. There are plenty of them. And
> drivers like the ones in sysfb/ are our failure-mode fallback. They
> should have non-complicated memory management wherever possible.
Just to be clear, none of the gem-shmem additions in this series are
forced on existing gem-shmem users, and no extra field is added to
drm_gem_shmem_object. Given how far we are, I'd be tempted to merge
this series, and revisit things later.
>
> Therefore, could we first duplicate the existing gem-shmem code into
> gem-uma as we discussed recently on IRC? The changes are simple:
>
> - copy the existing gem-shmem to gem-uma (plus renames)
> - convert panthor and panfrost to the new interfaces
Actually, I started a panthor patchset adding reclaim support, and one
of the patches parts ways with gem-shmem. I was planning on sending
that after I've got all the pending panthor patches merged.
Based on what we end up with, and if others are interested in moving to
this new implementation, I'll extract the code into a gem-uma layer, as
discussed.
>
> And on top of that, further improvements, such as the series at hand,
> could be done. Later we'd convert other drivers to gem-uma where it
> fits, such as lima.
I'm fine with the request to fork gem-shmem in order to simplify the
implementation for the non-GPU use cases; it's the ordering I'm not
super happy with, because we're already at v6, and I was expecting those
changes to be merged soon...