* [PATCH 0/4] panthor: print task pid and comm on gpu errors
@ 2025-06-20 23:50 Chia-I Wu
2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
This series saves task pid and comm in panthor_file, ensures panthor_group can
access panthor_file, and prints task pid and comm on gpu errors.
Chia-I Wu (4):
panthor: set owner field for driver fops
panthor: save panthor_file in panthor_group
panthor: save task pid and comm in panthor_file
panthor: dump task pid and comm on gpu errors
drivers/gpu/drm/panthor/panthor_device.h | 22 ++++++++++++++
drivers/gpu/drm/panthor/panthor_drv.c | 38 ++++++++++++++++--------
drivers/gpu/drm/panthor/panthor_mmu.c | 1 +
drivers/gpu/drm/panthor/panthor_sched.c | 31 +++++++++++++++----
4 files changed, 75 insertions(+), 17 deletions(-)
--
2.50.0.714.g196bf9f422-goog
* [PATCH 1/4] panthor: set owner field for driver fops
2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
2025-06-23 6:16 ` Boris Brezillon
2025-06-23 8:42 ` Steven Price
2025-06-20 23:50 ` [PATCH 2/4] panthor: save panthor_file in panthor_group Chia-I Wu
` (2 subsequent siblings)
3 siblings, 2 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
It allows us to get rid of manual try_module_get / module_put.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 1116f2d2826ee..775a66c394544 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1400,14 +1400,9 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
struct panthor_file *pfile;
int ret;
- if (!try_module_get(THIS_MODULE))
- return -EINVAL;
-
pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
- if (!pfile) {
- ret = -ENOMEM;
- goto err_put_mod;
- }
+ if (!pfile)
+ return -ENOMEM;
pfile->ptdev = ptdev;
pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET;
@@ -1439,9 +1434,6 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
err_free_file:
kfree(pfile);
-
-err_put_mod:
- module_put(THIS_MODULE);
return ret;
}
@@ -1454,7 +1446,6 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
panthor_vm_pool_destroy(pfile);
kfree(pfile);
- module_put(THIS_MODULE);
}
static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
@@ -1555,6 +1546,7 @@ static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file)
}
static const struct file_operations panthor_drm_driver_fops = {
+ .owner = THIS_MODULE,
.open = drm_open,
.release = drm_release,
.unlocked_ioctl = drm_ioctl,
--
2.50.0.714.g196bf9f422-goog
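
Editorial note: the patch above relies on the core open path pinning the module named in fops->owner, so the driver no longer has to do it by hand. The sketch below is a userspace model of that idea, not kernel code. Every name in it (module_model, fops_model, model_core_open and so on) is hypothetical and only stands in for the real try_module_get / module_put pairing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel's real definitions. */
struct module_model {
	int refcnt;
	bool unloading;
};

struct fops_model {
	struct module_model *owner;	/* NULL means nothing pins the module */
	int (*open)(void);
};

static bool model_try_module_get(struct module_model *mod)
{
	if (!mod)
		return true;	/* no owner set: nothing to pin */
	if (mod->unloading)
		return false;	/* refuse new users while unloading */
	mod->refcnt++;
	return true;
}

static void model_module_put(struct module_model *mod)
{
	if (!mod)
		return;
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

/* Core open path: pin the owner before calling into the driver. */
static int model_core_open(const struct fops_model *fops)
{
	int ret;

	if (!model_try_module_get(fops->owner))
		return -1;

	ret = fops->open();
	if (ret)
		model_module_put(fops->owner);	/* a failed open drops the pin */
	return ret;
}

/* Core release path: drop the pin taken at open time. */
static void model_core_release(const struct fops_model *fops)
{
	model_module_put(fops->owner);
}

/* Two toy driver callbacks: one succeeds, one fails with -12. */
static int model_driver_open_ok(void) { return 0; }
static int model_driver_open_fail(void) { return -12; }
```

In this model the driver callbacks never touch the module count, which is the simplification the patch is after: the error label and the put in postclose go away because the core path already balances the reference.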
* [PATCH 2/4] panthor: save panthor_file in panthor_group
2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
2025-06-23 6:21 ` Boris Brezillon
2025-06-20 23:50 ` [PATCH 3/4] panthor: save task pid and comm in panthor_file Chia-I Wu
2025-06-20 23:50 ` [PATCH 4/4] panthor: dump task pid and comm on gpu errors Chia-I Wu
3 siblings, 1 reply; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
We would like to access panthor_file from panthor_group on gpu errors.
Because panthor_group can outlive drm_file, add refcount to
panthor_file to ensure its lifetime.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_device.h | 16 ++++++++++++++++
drivers/gpu/drm/panthor/panthor_drv.c | 15 ++++++++++++++-
drivers/gpu/drm/panthor/panthor_mmu.c | 1 +
drivers/gpu/drm/panthor/panthor_sched.c | 6 ++++++
4 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 4fc7cf2aeed57..75ae6fd3a5128 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -256,8 +256,24 @@ struct panthor_file {
/** @stats: cycle and timestamp measures for job execution. */
struct panthor_gpu_usage stats;
+
+ /** @refcount: ref count of this file */
+ struct kref refcount;
};
+static inline struct panthor_file *panthor_file_get(struct panthor_file *pfile)
+{
+ kref_get(&pfile->refcount);
+ return pfile;
+}
+
+void panthor_file_release(struct kref *kref);
+
+static inline void panthor_file_put(struct panthor_file *pfile)
+{
+ kref_put(&pfile->refcount, panthor_file_release);
+}
+
int panthor_device_init(struct panthor_device *ptdev);
void panthor_device_unplug(struct panthor_device *ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 775a66c394544..aea9609684b77 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1393,6 +1393,16 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
return 0;
}
+void panthor_file_release(struct kref *kref)
+{
+ struct panthor_file *pfile =
+ container_of(kref, struct panthor_file, refcount);
+
+ WARN_ON(pfile->vms || pfile->groups);
+
+ kfree(pfile);
+}
+
static int
panthor_open(struct drm_device *ddev, struct drm_file *file)
{
@@ -1426,6 +1436,8 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
if (ret)
goto err_destroy_vm_pool;
+ kref_init(&pfile->refcount);
+
file->driver_priv = pfile;
return 0;
@@ -1442,10 +1454,11 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
{
struct panthor_file *pfile = file->driver_priv;
+ /* destroy vm and group handles now to avoid circular references */
panthor_group_pool_destroy(pfile);
panthor_vm_pool_destroy(pfile);
- kfree(pfile);
+ panthor_file_put(pfile);
}
static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index b39ea6acc6a96..ccbcfe11420ac 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -1604,6 +1604,7 @@ void panthor_vm_pool_destroy(struct panthor_file *pfile)
xa_destroy(&pfile->vms->xa);
kfree(pfile->vms);
+ pfile->vms = NULL;
}
/**
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index a2248f692a030..485072904cd7d 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -535,6 +535,9 @@ struct panthor_group {
/** @ptdev: Device. */
struct panthor_device *ptdev;
+ /** @pfile: File this group is created from. */
+ struct panthor_file *pfile;
+
/** @vm: VM bound to the group. */
struct panthor_vm *vm;
@@ -919,6 +922,7 @@ static void group_release_work(struct work_struct *work)
panthor_kernel_bo_destroy(group->syncobjs);
panthor_vm_put(group->vm);
+ panthor_file_put(group->pfile);
kfree(group);
}
@@ -3467,6 +3471,8 @@ int panthor_group_create(struct panthor_file *pfile,
INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
INIT_WORK(&group->release_work, group_release_work);
+ group->pfile = panthor_file_get(pfile);
+
group->vm = panthor_vm_pool_get_vm(pfile->vms, group_args->vm_id);
if (!group->vm) {
ret = -EINVAL;
--
2.50.0.714.g196bf9f422-goog
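
Editorial note: the lifetime rule this patch introduces can be modelled in userspace as follows. This is only a sketch under stated assumptions: file_model and group_model are hypothetical stand-ins for panthor_file and panthor_group, and a plain integer counter replaces struct kref.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for panthor_file and panthor_group (hypothetical). */
struct file_model {
	int refcount;
	int *released;	/* set to 1 by the release path so callers can observe it */
};

struct group_model {
	struct file_model *pfile;
};

/* Like kref_init(): the creator starts out holding one reference. */
static struct file_model *file_model_create(int *released)
{
	struct file_model *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->refcount = 1;
	f->released = released;
	return f;
}

static struct file_model *file_model_get(struct file_model *f)
{
	assert(f->refcount > 0);
	f->refcount++;
	return f;
}

/* Drop one reference; the last put frees the object. */
static void file_model_put(struct file_model *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		*f->released = 1;
		free(f);
	}
}

/* A group takes its own reference on the file it was created from. */
static struct group_model *group_model_create(struct file_model *f)
{
	struct group_model *g = calloc(1, sizeof(*g));

	if (!g)
		return NULL;
	g->pfile = file_model_get(f);
	return g;
}

/* Mirrors the put added to group_release_work() in the patch. */
static void group_model_release(struct group_model *g)
{
	file_model_put(g->pfile);
	free(g);
}
```

With this shape, dropping the opener's reference (the postclose step) leaves the file alive until the last group is released, which is what lets an error handler dereference group->pfile safely.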
* [PATCH 3/4] panthor: save task pid and comm in panthor_file
2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
2025-06-20 23:50 ` [PATCH 2/4] panthor: save panthor_file in panthor_group Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
2025-06-20 23:50 ` [PATCH 4/4] panthor: dump task pid and comm on gpu errors Chia-I Wu
3 siblings, 0 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
We would like to report them on gpu errors.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_device.h | 6 ++++++
drivers/gpu/drm/panthor/panthor_drv.c | 9 +++++++++
2 files changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 75ae6fd3a5128..8c31c1d4296b6 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -257,6 +257,12 @@ struct panthor_file {
/** @stats: cycle and timestamp measures for job execution. */
struct panthor_gpu_usage stats;
+ /** @pid: pid of the task that created this file */
+ pid_t pid;
+
+ /** @comm: comm of the task that created this file */
+ char *comm;
+
/** @refcount: ref count of this file */
struct kref refcount;
};
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index aea9609684b77..b9d86b86591db 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1400,6 +1400,7 @@ void panthor_file_release(struct kref *kref)
WARN_ON(pfile->vms || pfile->groups);
+ kfree(pfile->comm);
kfree(pfile);
}
@@ -1408,6 +1409,7 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
{
struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
struct panthor_file *pfile;
+ struct task_struct *task;
int ret;
pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
@@ -1436,6 +1438,13 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
if (ret)
goto err_destroy_vm_pool;
+ task = get_pid_task(rcu_access_pointer(file->pid), PIDTYPE_PID);
+ if (task) {
+ pfile->pid = task->pid;
+ pfile->comm = kstrdup(task->comm, GFP_KERNEL);
+ put_task_struct(task);
+ }
+
kref_init(&pfile->refcount);
file->driver_priv = pfile;
--
2.50.0.714.g196bf9f422-goog
* [PATCH 4/4] panthor: dump task pid and comm on gpu errors
2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
` (2 preceding siblings ...)
2025-06-20 23:50 ` [PATCH 3/4] panthor: save task pid and comm in panthor_file Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
3 siblings, 0 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
It is useful to know which tasks cause gpu errors.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_sched.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 485072904cd7d..f44cf95e8f1d1 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1359,8 +1359,12 @@ cs_slot_process_fatal_event_locked(struct panthor_device *ptdev,
fatal = cs_iface->output->fatal;
info = cs_iface->output->fatal_info;
- if (group)
+ if (group) {
+ drm_warn(&ptdev->base, "CS_FATAL: pid=%d, comm=%s\n",
+ group->pfile->pid, group->pfile->comm);
+
group->fatal_queues |= BIT(cs_id);
+ }
if (CS_EXCEPTION_TYPE(fatal) == DRM_PANTHOR_EXCEPTION_CS_UNRECOVERABLE) {
/* If this exception is unrecoverable, queue a reset, and make
@@ -1420,6 +1424,11 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
spin_unlock(&queue->fence_ctx.lock);
}
+ if (group) {
+ drm_warn(&ptdev->base, "CS_FAULT: pid=%d, comm=%s\n",
+ group->pfile->pid, group->pfile->comm);
+ }
+
drm_warn(&ptdev->base,
"CSG slot %d CS slot: %d\n"
"CS_FAULT.EXCEPTION_TYPE: 0x%x (%s)\n"
@@ -1636,11 +1645,16 @@ csg_slot_process_progress_timer_event_locked(struct panthor_device *ptdev, u32 c
lockdep_assert_held(&sched->lock);
- drm_warn(&ptdev->base, "CSG slot %d progress timeout\n", csg_id);
-
group = csg_slot->group;
- if (!drm_WARN_ON(&ptdev->base, !group))
+ if (!drm_WARN_ON(&ptdev->base, !group)) {
+ drm_warn(&ptdev->base,
+ "CSG_PROGRESS_TIMER_EVENT: pid=%d, comm=%s\n",
+ group->pfile->pid, group->pfile->comm);
+
group->timedout = true;
+ }
+
+ drm_warn(&ptdev->base, "CSG slot %d progress timeout\n", csg_id);
sched_queue_delayed_work(sched, tick, 0);
}
@@ -3222,7 +3236,8 @@ queue_timedout_job(struct drm_sched_job *sched_job)
struct panthor_scheduler *sched = ptdev->scheduler;
struct panthor_queue *queue = group->queues[job->queue_idx];
- drm_warn(&ptdev->base, "job timeout\n");
+ drm_warn(&ptdev->base, "job timeout: pid=%d, comm=%s, seqno=%llu\n",
+ group->pfile->pid, group->pfile->comm, job->done_fence->seqno);
drm_WARN_ON(&ptdev->base, atomic_read(&sched->reset.in_progress));
--
2.50.0.714.g196bf9f422-goog
* Re: [PATCH 1/4] panthor: set owner field for driver fops
2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
@ 2025-06-23 6:16 ` Boris Brezillon
2025-06-23 8:42 ` Steven Price
1 sibling, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2025-06-23 6:16 UTC (permalink / raw)
To: Chia-I Wu
Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
linux-kernel
On Fri, 20 Jun 2025 16:50:50 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:
> It allows us to get rid of manual try_module_get / module_put.
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 1116f2d2826ee..775a66c394544 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1400,14 +1400,9 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
> struct panthor_file *pfile;
> int ret;
>
> - if (!try_module_get(THIS_MODULE))
> - return -EINVAL;
> -
> pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
> - if (!pfile) {
> - ret = -ENOMEM;
> - goto err_put_mod;
> - }
> + if (!pfile)
> + return -ENOMEM;
>
> pfile->ptdev = ptdev;
> pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET;
> @@ -1439,9 +1434,6 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
>
> err_free_file:
> kfree(pfile);
> -
> -err_put_mod:
> - module_put(THIS_MODULE);
> return ret;
> }
>
> @@ -1454,7 +1446,6 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
> panthor_vm_pool_destroy(pfile);
>
> kfree(pfile);
> - module_put(THIS_MODULE);
> }
>
> static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> @@ -1555,6 +1546,7 @@ static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> }
>
> static const struct file_operations panthor_drm_driver_fops = {
> + .owner = THIS_MODULE,
> .open = drm_open,
> .release = drm_release,
> .unlocked_ioctl = drm_ioctl,
* Re: [PATCH 2/4] panthor: save panthor_file in panthor_group
2025-06-20 23:50 ` [PATCH 2/4] panthor: save panthor_file in panthor_group Chia-I Wu
@ 2025-06-23 6:21 ` Boris Brezillon
2025-06-23 9:07 ` Liviu Dudau
0 siblings, 1 reply; 10+ messages in thread
From: Boris Brezillon @ 2025-06-23 6:21 UTC (permalink / raw)
To: Chia-I Wu
Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, dri-devel, linux-kernel
On Fri, 20 Jun 2025 16:50:51 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:
> We would like to access panthor_file from panthor_group on gpu errors.
> Because panthor_group can outlive drm_file, add refcount to
> panthor_file to ensure its lifetime.
I'm not a huge fan of refcounting panthor_file because people tend to
put resources they expect to be released when the last handle goes away,
and if we don't refcount these sub-resources they might live longer
than they are meant to. Also not a huge fan of the circular referencing
that exists between file and groups after this change.
How about we move the process info to a sub-object that's refcounted
and let both panthor_file and panthor_group take a ref on this object
instead?
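
Editorial note: one way to read the suggestion above is a small refcounted object that carries only the process info, with the file and each group holding their own reference. The userspace sketch below assumes that reading. proc_info_model and its helpers are hypothetical names, and the put helper returns 1 when it frees the object, in the style of kref_put().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_COMM_LEN 16	/* same size as the kernel's TASK_COMM_LEN */

/* Hypothetical refcounted holder for the process info only. */
struct proc_info_model {
	int refcount;
	int pid;
	char comm[MODEL_COMM_LEN];
};

static struct proc_info_model *proc_info_create(int pid, const char *comm)
{
	struct proc_info_model *info = calloc(1, sizeof(*info));
	size_t n;

	if (!info)
		return NULL;

	info->refcount = 1;	/* the creator holds the first reference */
	info->pid = pid;

	/* Truncate to fit; calloc() already zeroed the terminator. */
	n = strlen(comm);
	if (n > MODEL_COMM_LEN - 1)
		n = MODEL_COMM_LEN - 1;
	memcpy(info->comm, comm, n);
	return info;
}

static struct proc_info_model *proc_info_get(struct proc_info_model *info)
{
	assert(info->refcount > 0);
	info->refcount++;
	return info;
}

/* Returns 1 if this put freed the object, 0 otherwise. */
static int proc_info_put(struct proc_info_model *info)
{
	assert(info->refcount > 0);
	if (--info->refcount == 0) {
		free(info);
		return 1;
	}
	return 0;
}
```

Because only this sub-object is shared, the file itself can still be freed at close time and there is no file-to-group reference cycle to break.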
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.h | 16 ++++++++++++++++
> drivers/gpu/drm/panthor/panthor_drv.c | 15 ++++++++++++++-
> drivers/gpu/drm/panthor/panthor_mmu.c | 1 +
> drivers/gpu/drm/panthor/panthor_sched.c | 6 ++++++
> 4 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 4fc7cf2aeed57..75ae6fd3a5128 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -256,8 +256,24 @@ struct panthor_file {
>
> /** @stats: cycle and timestamp measures for job execution. */
> struct panthor_gpu_usage stats;
> +
> + /** @refcount: ref count of this file */
> + struct kref refcount;
> };
>
> +static inline struct panthor_file *panthor_file_get(struct panthor_file *pfile)
> +{
> + kref_get(&pfile->refcount);
> + return pfile;
> +}
> +
> +void panthor_file_release(struct kref *kref);
> +
> +static inline void panthor_file_put(struct panthor_file *pfile)
> +{
> + kref_put(&pfile->refcount, panthor_file_release);
> +}
> +
> int panthor_device_init(struct panthor_device *ptdev);
> void panthor_device_unplug(struct panthor_device *ptdev);
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 775a66c394544..aea9609684b77 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1393,6 +1393,16 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
> return 0;
> }
>
> +void panthor_file_release(struct kref *kref)
> +{
> + struct panthor_file *pfile =
> + container_of(kref, struct panthor_file, refcount);
> +
> + WARN_ON(pfile->vms || pfile->groups);
> +
> + kfree(pfile);
> +}
> +
> static int
> panthor_open(struct drm_device *ddev, struct drm_file *file)
> {
> @@ -1426,6 +1436,8 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
> if (ret)
> goto err_destroy_vm_pool;
>
> + kref_init(&pfile->refcount);
> +
> file->driver_priv = pfile;
> return 0;
>
> @@ -1442,10 +1454,11 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
> {
> struct panthor_file *pfile = file->driver_priv;
>
> + /* destroy vm and group handles now to avoid circular references */
> panthor_group_pool_destroy(pfile);
> panthor_vm_pool_destroy(pfile);
>
> - kfree(pfile);
> + panthor_file_put(pfile);
> }
>
> static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index b39ea6acc6a96..ccbcfe11420ac 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -1604,6 +1604,7 @@ void panthor_vm_pool_destroy(struct panthor_file *pfile)
>
> xa_destroy(&pfile->vms->xa);
> kfree(pfile->vms);
> + pfile->vms = NULL;
> }
>
> /**
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index a2248f692a030..485072904cd7d 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -535,6 +535,9 @@ struct panthor_group {
> /** @ptdev: Device. */
> struct panthor_device *ptdev;
>
> + /** @pfile: File this group is created from. */
> + struct panthor_file *pfile;
> +
> /** @vm: VM bound to the group. */
> struct panthor_vm *vm;
>
> @@ -919,6 +922,7 @@ static void group_release_work(struct work_struct *work)
> panthor_kernel_bo_destroy(group->syncobjs);
>
> panthor_vm_put(group->vm);
> + panthor_file_put(group->pfile);
> kfree(group);
> }
>
> @@ -3467,6 +3471,8 @@ int panthor_group_create(struct panthor_file *pfile,
> INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
> INIT_WORK(&group->release_work, group_release_work);
>
> + group->pfile = panthor_file_get(pfile);
> +
> group->vm = panthor_vm_pool_get_vm(pfile->vms, group_args->vm_id);
> if (!group->vm) {
> ret = -EINVAL;
* Re: [PATCH 1/4] panthor: set owner field for driver fops
2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
2025-06-23 6:16 ` Boris Brezillon
@ 2025-06-23 8:42 ` Steven Price
1 sibling, 0 replies; 10+ messages in thread
From: Steven Price @ 2025-06-23 8:42 UTC (permalink / raw)
To: Chia-I Wu, Boris Brezillon, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 21/06/2025 00:50, Chia-I Wu wrote:
> It allows us to get rid of manual try_module_get / module_put.
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 1116f2d2826ee..775a66c394544 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1400,14 +1400,9 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
> struct panthor_file *pfile;
> int ret;
>
> - if (!try_module_get(THIS_MODULE))
> - return -EINVAL;
> -
> pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
> - if (!pfile) {
> - ret = -ENOMEM;
> - goto err_put_mod;
> - }
> + if (!pfile)
> + return -ENOMEM;
>
> pfile->ptdev = ptdev;
> pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET;
> @@ -1439,9 +1434,6 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
>
> err_free_file:
> kfree(pfile);
> -
> -err_put_mod:
> - module_put(THIS_MODULE);
> return ret;
> }
>
> @@ -1454,7 +1446,6 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
> panthor_vm_pool_destroy(pfile);
>
> kfree(pfile);
> - module_put(THIS_MODULE);
> }
>
> static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> @@ -1555,6 +1546,7 @@ static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> }
>
> static const struct file_operations panthor_drm_driver_fops = {
> + .owner = THIS_MODULE,
> .open = drm_open,
> .release = drm_release,
> .unlocked_ioctl = drm_ioctl,
* Re: [PATCH 2/4] panthor: save panthor_file in panthor_group
2025-06-23 6:21 ` Boris Brezillon
@ 2025-06-23 9:07 ` Liviu Dudau
2025-07-13 3:12 ` Chia-I Wu
0 siblings, 1 reply; 10+ messages in thread
From: Liviu Dudau @ 2025-06-23 9:07 UTC (permalink / raw)
To: Boris Brezillon
Cc: Chia-I Wu, Steven Price, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, dri-devel, linux-kernel
On Mon, Jun 23, 2025 at 08:21:22AM +0200, Boris Brezillon wrote:
> On Fri, 20 Jun 2025 16:50:51 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > We would like to access panthor_file from panthor_group on gpu errors.
> > Because panthor_group can outlive drm_file, add refcount to
> > panthor_file to ensure its lifetime.
>
> I'm not a huge fan of refcounting panthor_file because people tend to
> put resources they expect to be released when the last handle goes away,
> and if we don't refcount these sub-resources they might live longer
> than they are meant to. Also not a huge fan of the circular referencing
> that exists between file and groups after this change.
>
> How about we move the process info to a sub-object that's refcounted
> and let both panthor_file and panthor_group take a ref on this object
> instead?
I agree with Boris on this. One alternative is to put the pid and comm in
the panthor_group struct as panthor_file makes no use of the fields.
Best regards,
Liviu
>
> >
> > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > ---
> > drivers/gpu/drm/panthor/panthor_device.h | 16 ++++++++++++++++
> > drivers/gpu/drm/panthor/panthor_drv.c | 15 ++++++++++++++-
> > drivers/gpu/drm/panthor/panthor_mmu.c | 1 +
> > drivers/gpu/drm/panthor/panthor_sched.c | 6 ++++++
> > 4 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index 4fc7cf2aeed57..75ae6fd3a5128 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -256,8 +256,24 @@ struct panthor_file {
> >
> > /** @stats: cycle and timestamp measures for job execution. */
> > struct panthor_gpu_usage stats;
> > +
> > + /** @refcount: ref count of this file */
> > + struct kref refcount;
> > };
> >
> > +static inline struct panthor_file *panthor_file_get(struct panthor_file *pfile)
> > +{
> > + kref_get(&pfile->refcount);
> > + return pfile;
> > +}
> > +
> > +void panthor_file_release(struct kref *kref);
> > +
> > +static inline void panthor_file_put(struct panthor_file *pfile)
> > +{
> > + kref_put(&pfile->refcount, panthor_file_release);
> > +}
> > +
> > int panthor_device_init(struct panthor_device *ptdev);
> > void panthor_device_unplug(struct panthor_device *ptdev);
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> > index 775a66c394544..aea9609684b77 100644
> > --- a/drivers/gpu/drm/panthor/panthor_drv.c
> > +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> > @@ -1393,6 +1393,16 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
> > return 0;
> > }
> >
> > +void panthor_file_release(struct kref *kref)
> > +{
> > + struct panthor_file *pfile =
> > + container_of(kref, struct panthor_file, refcount);
> > +
> > + WARN_ON(pfile->vms || pfile->groups);
> > +
> > + kfree(pfile);
> > +}
> > +
> > static int
> > panthor_open(struct drm_device *ddev, struct drm_file *file)
> > {
> > @@ -1426,6 +1436,8 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
> > if (ret)
> > goto err_destroy_vm_pool;
> >
> > + kref_init(&pfile->refcount);
> > +
> > file->driver_priv = pfile;
> > return 0;
> >
> > @@ -1442,10 +1454,11 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
> > {
> > struct panthor_file *pfile = file->driver_priv;
> >
> > + /* destroy vm and group handles now to avoid circular references */
> > panthor_group_pool_destroy(pfile);
> > panthor_vm_pool_destroy(pfile);
> >
> > - kfree(pfile);
> > + panthor_file_put(pfile);
> > }
> >
> > static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > index b39ea6acc6a96..ccbcfe11420ac 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -1604,6 +1604,7 @@ void panthor_vm_pool_destroy(struct panthor_file *pfile)
> >
> > xa_destroy(&pfile->vms->xa);
> > kfree(pfile->vms);
> > + pfile->vms = NULL;
> > }
> >
> > /**
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index a2248f692a030..485072904cd7d 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -535,6 +535,9 @@ struct panthor_group {
> > /** @ptdev: Device. */
> > struct panthor_device *ptdev;
> >
> > + /** @pfile: File this group is created from. */
> > + struct panthor_file *pfile;
> > +
> > /** @vm: VM bound to the group. */
> > struct panthor_vm *vm;
> >
> > @@ -919,6 +922,7 @@ static void group_release_work(struct work_struct *work)
> > panthor_kernel_bo_destroy(group->syncobjs);
> >
> > panthor_vm_put(group->vm);
> > + panthor_file_put(group->pfile);
> > kfree(group);
> > }
> >
> > @@ -3467,6 +3471,8 @@ int panthor_group_create(struct panthor_file *pfile,
> > INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
> > INIT_WORK(&group->release_work, group_release_work);
> >
> > + group->pfile = panthor_file_get(pfile);
> > +
> > group->vm = panthor_vm_pool_get_vm(pfile->vms, group_args->vm_id);
> > if (!group->vm) {
> > ret = -EINVAL;
>
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
* Re: [PATCH 2/4] panthor: save panthor_file in panthor_group
2025-06-23 9:07 ` Liviu Dudau
@ 2025-07-13 3:12 ` Chia-I Wu
0 siblings, 0 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-07-13 3:12 UTC (permalink / raw)
To: Liviu Dudau
Cc: Boris Brezillon, Steven Price, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, dri-devel, linux-kernel
Hi,
On Mon, Jun 23, 2025 at 2:07 AM Liviu Dudau <liviu.dudau@arm.com> wrote:
>
> On Mon, Jun 23, 2025 at 08:21:22AM +0200, Boris Brezillon wrote:
> > On Fri, 20 Jun 2025 16:50:51 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > > We would like to access panthor_file from panthor_group on gpu errors.
> > > Because panthor_group can outlive drm_file, add refcount to
> > > panthor_file to ensure its lifetime.
> >
> > I'm not a huge fan of refcounting panthor_file because people tend to
> > put resources they expect to be released when the last handle goes away,
> > and if we don't refcount these sub-resources they might live longer
> > than they are meant to. Also not a huge fan of the circular referencing
> > that exists between file and groups after this change.
> >
> > How about we move the process info to a sub-object that's refcounted
> > and let both panthor_file and panthor_group take a ref on this object
> > instead?
>
> I agree with Boris on this. One alternative is to put the pid and comm in
> the panthor_group struct as panthor_file makes no use of the fields.
I took this suggestion in v2 because, when the task that opened the
node differs from the task that created the group, we are more
interested in the latter.
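
Editorial note: a minimal userspace sketch of the approach described above, assuming the group copies the pid and comm of whichever task creates it. The v2 code is not part of this thread, so task_model, group_info_model and both helpers are hypothetical. The message format is borrowed from the CS_FAULT line in patch 4/4.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_COMM_LEN 16	/* same size as the kernel's TASK_COMM_LEN */

/* Hypothetical stand-in for the task calling the group-create ioctl. */
struct task_model {
	int pid;
	const char *comm;
};

/* Hypothetical per-group copy of the creator's identity. */
struct group_info_model {
	int pid;
	char comm[MODEL_COMM_LEN];
};

/* Copy by value at creation time, so no reference counting is needed. */
static void group_record_creator(struct group_info_model *g,
				 const struct task_model *creator)
{
	g->pid = creator->pid;
	/* snprintf() truncates to fit and always NUL-terminates. */
	snprintf(g->comm, sizeof(g->comm), "%s", creator->comm);
}

/* Build the line an error handler would print for this group. */
static int group_format_fault(const struct group_info_model *g,
			      char *buf, size_t len)
{
	return snprintf(buf, len, "CS_FAULT: pid=%d, comm=%s",
			g->pid, g->comm);
}
```

Since the group owns a private copy, its lifetime no longer depends on panthor_file at all, and the report names the task that created the group even when a different task opened the device node.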