linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] panthor: print task pid and comm on gpu errors
@ 2025-06-20 23:50 Chia-I Wu
  2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

This series saves task pid and comm in panthor_file, ensures panthor_group can
access panthor_file, and prints task pid and comm on gpu errors.

Chia-I Wu (4):
  panthor: set owner field for driver fops
  panthor: save panthor_file in panthor_group
  panthor: save task pid and comm in panthor_file
  panthor: dump task pid and comm on gpu errors

 drivers/gpu/drm/panthor/panthor_device.h | 22 ++++++++++++++
 drivers/gpu/drm/panthor/panthor_drv.c    | 38 ++++++++++++++++--------
 drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
 drivers/gpu/drm/panthor/panthor_sched.c  | 31 +++++++++++++++----
 4 files changed, 75 insertions(+), 17 deletions(-)

-- 
2.50.0.714.g196bf9f422-goog



* [PATCH 1/4] panthor: set owner field for driver fops
  2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
  2025-06-23  6:16   ` Boris Brezillon
  2025-06-23  8:42   ` Steven Price
  2025-06-20 23:50 ` [PATCH 2/4] panthor: save panthor_file in panthor_group Chia-I Wu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

It allows us to get rid of manual try_module_get / module_put.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
 drivers/gpu/drm/panthor/panthor_drv.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 1116f2d2826ee..775a66c394544 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1400,14 +1400,9 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 	struct panthor_file *pfile;
 	int ret;
 
-	if (!try_module_get(THIS_MODULE))
-		return -EINVAL;
-
 	pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
-	if (!pfile) {
-		ret = -ENOMEM;
-		goto err_put_mod;
-	}
+	if (!pfile)
+		return -ENOMEM;
 
 	pfile->ptdev = ptdev;
 	pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET;
@@ -1439,9 +1434,6 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 
 err_free_file:
 	kfree(pfile);
-
-err_put_mod:
-	module_put(THIS_MODULE);
 	return ret;
 }
 
@@ -1454,7 +1446,6 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
 	panthor_vm_pool_destroy(pfile);
 
 	kfree(pfile);
-	module_put(THIS_MODULE);
 }
 
 static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
@@ -1555,6 +1546,7 @@ static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 }
 
 static const struct file_operations panthor_drm_driver_fops = {
+	.owner = THIS_MODULE,
 	.open = drm_open,
 	.release = drm_release,
 	.unlocked_ioctl = drm_ioctl,
-- 
2.50.0.714.g196bf9f422-goog



* [PATCH 2/4] panthor: save panthor_file in panthor_group
  2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
  2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
  2025-06-23  6:21   ` Boris Brezillon
  2025-06-20 23:50 ` [PATCH 3/4] panthor: save task pid and comm in panthor_file Chia-I Wu
  2025-06-20 23:50 ` [PATCH 4/4] panthor: dump task pid and comm on gpu errors Chia-I Wu
  3 siblings, 1 reply; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

We would like to access panthor_file from panthor_group on gpu errors.
Because panthor_group can outlive drm_file, add a refcount to
panthor_file to ensure its lifetime.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 16 ++++++++++++++++
 drivers/gpu/drm/panthor/panthor_drv.c    | 15 ++++++++++++++-
 drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
 drivers/gpu/drm/panthor/panthor_sched.c  |  6 ++++++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 4fc7cf2aeed57..75ae6fd3a5128 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -256,8 +256,24 @@ struct panthor_file {
 
 	/** @stats: cycle and timestamp measures for job execution. */
 	struct panthor_gpu_usage stats;
+
+	/** @refcount: ref count of this file */
+	struct kref refcount;
 };
 
+static inline struct panthor_file *panthor_file_get(struct panthor_file *pfile)
+{
+	kref_get(&pfile->refcount);
+	return pfile;
+}
+
+void panthor_file_release(struct kref *kref);
+
+static inline void panthor_file_put(struct panthor_file *pfile)
+{
+	kref_put(&pfile->refcount, panthor_file_release);
+}
+
 int panthor_device_init(struct panthor_device *ptdev);
 void panthor_device_unplug(struct panthor_device *ptdev);
 
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 775a66c394544..aea9609684b77 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1393,6 +1393,16 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
 	return 0;
 }
 
+void panthor_file_release(struct kref *kref)
+{
+	struct panthor_file *pfile =
+		container_of(kref, struct panthor_file, refcount);
+
+	WARN_ON(pfile->vms || pfile->groups);
+
+	kfree(pfile);
+}
+
 static int
 panthor_open(struct drm_device *ddev, struct drm_file *file)
 {
@@ -1426,6 +1436,8 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 	if (ret)
 		goto err_destroy_vm_pool;
 
+	kref_init(&pfile->refcount);
+
 	file->driver_priv = pfile;
 	return 0;
 
@@ -1442,10 +1454,11 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
 {
 	struct panthor_file *pfile = file->driver_priv;
 
+	/* destroy vm and group handles now to avoid circular references */
 	panthor_group_pool_destroy(pfile);
 	panthor_vm_pool_destroy(pfile);
 
-	kfree(pfile);
+	panthor_file_put(pfile);
 }
 
 static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index b39ea6acc6a96..ccbcfe11420ac 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -1604,6 +1604,7 @@ void panthor_vm_pool_destroy(struct panthor_file *pfile)
 
 	xa_destroy(&pfile->vms->xa);
 	kfree(pfile->vms);
+	pfile->vms = NULL;
 }
 
 /**
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index a2248f692a030..485072904cd7d 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -535,6 +535,9 @@ struct panthor_group {
 	/** @ptdev: Device. */
 	struct panthor_device *ptdev;
 
+	/** @pfile: File this group is created from. */
+	struct panthor_file *pfile;
+
 	/** @vm: VM bound to the group. */
 	struct panthor_vm *vm;
 
@@ -919,6 +922,7 @@ static void group_release_work(struct work_struct *work)
 	panthor_kernel_bo_destroy(group->syncobjs);
 
 	panthor_vm_put(group->vm);
+	panthor_file_put(group->pfile);
 	kfree(group);
 }
 
@@ -3467,6 +3471,8 @@ int panthor_group_create(struct panthor_file *pfile,
 	INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
 	INIT_WORK(&group->release_work, group_release_work);
 
+	group->pfile = panthor_file_get(pfile);
+
 	group->vm = panthor_vm_pool_get_vm(pfile->vms, group_args->vm_id);
 	if (!group->vm) {
 		ret = -EINVAL;
-- 
2.50.0.714.g196bf9f422-goog



* [PATCH 3/4] panthor: save task pid and comm in panthor_file
  2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
  2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
  2025-06-20 23:50 ` [PATCH 2/4] panthor: save panthor_file in panthor_group Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
  2025-06-20 23:50 ` [PATCH 4/4] panthor: dump task pid and comm on gpu errors Chia-I Wu
  3 siblings, 0 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

We would like to report them on gpu errors.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 6 ++++++
 drivers/gpu/drm/panthor/panthor_drv.c    | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 75ae6fd3a5128..8c31c1d4296b6 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -257,6 +257,12 @@ struct panthor_file {
 	/** @stats: cycle and timestamp measures for job execution. */
 	struct panthor_gpu_usage stats;
 
+	/** @pid: pid of the task that created this file */
+	pid_t pid;
+
+	/** @comm: comm of the task that created this file */
+	char *comm;
+
 	/** @refcount: ref count of this file */
 	struct kref refcount;
 };
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index aea9609684b77..b9d86b86591db 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1400,6 +1400,7 @@ void panthor_file_release(struct kref *kref)
 
 	WARN_ON(pfile->vms || pfile->groups);
 
+	kfree(pfile->comm);
 	kfree(pfile);
 }
 
@@ -1408,6 +1409,7 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 {
 	struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
 	struct panthor_file *pfile;
+	struct task_struct *task;
 	int ret;
 
 	pfile = kzalloc(sizeof(*pfile), GFP_KERNEL);
@@ -1436,6 +1438,13 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 	if (ret)
 		goto err_destroy_vm_pool;
 
+	task = get_pid_task(rcu_access_pointer(file->pid), PIDTYPE_PID);
+	if (task) {
+		pfile->pid = task->pid;
+		pfile->comm = kstrdup(task->comm, GFP_KERNEL);
+		put_task_struct(task);
+	}
+
 	kref_init(&pfile->refcount);
 
 	file->driver_priv = pfile;
-- 
2.50.0.714.g196bf9f422-goog



* [PATCH 4/4] panthor: dump task pid and comm on gpu errors
  2025-06-20 23:50 [PATCH 0/4] panthor: print task pid and comm on gpu errors Chia-I Wu
                   ` (2 preceding siblings ...)
  2025-06-20 23:50 ` [PATCH 3/4] panthor: save task pid and comm in panthor_file Chia-I Wu
@ 2025-06-20 23:50 ` Chia-I Wu
  3 siblings, 0 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-06-20 23:50 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

It is useful to know which tasks cause gpu errors.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 485072904cd7d..f44cf95e8f1d1 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1359,8 +1359,12 @@ cs_slot_process_fatal_event_locked(struct panthor_device *ptdev,
 	fatal = cs_iface->output->fatal;
 	info = cs_iface->output->fatal_info;
 
-	if (group)
+	if (group) {
+		drm_warn(&ptdev->base, "CS_FATAL: pid=%d, comm=%s\n",
+			 group->pfile->pid, group->pfile->comm);
+
 		group->fatal_queues |= BIT(cs_id);
+	}
 
 	if (CS_EXCEPTION_TYPE(fatal) == DRM_PANTHOR_EXCEPTION_CS_UNRECOVERABLE) {
 		/* If this exception is unrecoverable, queue a reset, and make
@@ -1420,6 +1424,11 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 		spin_unlock(&queue->fence_ctx.lock);
 	}
 
+	if (group) {
+		drm_warn(&ptdev->base, "CS_FAULT: pid=%d, comm=%s\n",
+			 group->pfile->pid, group->pfile->comm);
+	}
+
 	drm_warn(&ptdev->base,
 		 "CSG slot %d CS slot: %d\n"
 		 "CS_FAULT.EXCEPTION_TYPE: 0x%x (%s)\n"
@@ -1636,11 +1645,16 @@ csg_slot_process_progress_timer_event_locked(struct panthor_device *ptdev, u32 c
 
 	lockdep_assert_held(&sched->lock);
 
-	drm_warn(&ptdev->base, "CSG slot %d progress timeout\n", csg_id);
-
 	group = csg_slot->group;
-	if (!drm_WARN_ON(&ptdev->base, !group))
+	if (!drm_WARN_ON(&ptdev->base, !group)) {
+		drm_warn(&ptdev->base,
+			 "CSG_PROGRESS_TIMER_EVENT: pid=%d, comm=%s\n",
+			 group->pfile->pid, group->pfile->comm);
+
 		group->timedout = true;
+	}
+
+	drm_warn(&ptdev->base, "CSG slot %d progress timeout\n", csg_id);
 
 	sched_queue_delayed_work(sched, tick, 0);
 }
@@ -3222,7 +3236,8 @@ queue_timedout_job(struct drm_sched_job *sched_job)
 	struct panthor_scheduler *sched = ptdev->scheduler;
 	struct panthor_queue *queue = group->queues[job->queue_idx];
 
-	drm_warn(&ptdev->base, "job timeout\n");
+	drm_warn(&ptdev->base, "job timeout: pid=%d, comm=%s, seqno=%llu\n",
+		 group->pfile->pid, group->pfile->comm, job->done_fence->seqno);
 
 	drm_WARN_ON(&ptdev->base, atomic_read(&sched->reset.in_progress));
 
-- 
2.50.0.714.g196bf9f422-goog



* Re: [PATCH 1/4] panthor: set owner field for driver fops
  2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
@ 2025-06-23  6:16   ` Boris Brezillon
  2025-06-23  8:42   ` Steven Price
  1 sibling, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2025-06-23  6:16 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Fri, 20 Jun 2025 16:50:50 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> It allows us to get rid of manual try_module_get / module_put.
> 
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>




* Re: [PATCH 2/4] panthor: save panthor_file in panthor_group
  2025-06-20 23:50 ` [PATCH 2/4] panthor: save panthor_file in panthor_group Chia-I Wu
@ 2025-06-23  6:21   ` Boris Brezillon
  2025-06-23  9:07     ` Liviu Dudau
  0 siblings, 1 reply; 10+ messages in thread
From: Boris Brezillon @ 2025-06-23  6:21 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, dri-devel, linux-kernel

On Fri, 20 Jun 2025 16:50:51 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> We would like to access panthor_file from panthor_group on gpu errors.
> Because panthor_group can outlive drm_file, add a refcount to
> panthor_file to ensure its lifetime.

I'm not a huge fan of refcounting panthor_file, because people tend to
put resources there that they expect to be released when the last handle
goes away, and if we don't refcount these sub-resources they might live
longer than they are meant to. I'm also not a huge fan of the circular
referencing that exists between file and groups after this change.

How about we move the process info to a sub-object that's refcounted
and let both panthor_file and panthor_group take a ref on this object
instead?
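
As a rough illustration of that idea, here is a userspace model rather
than panthor code. A plain counter stands in for struct kref, and the
names proc_info, model_file and model_group are hypothetical. The point
is only that the file and the group each hold a reference on the small
info object, so neither has to keep the other alive.

```c
/*
 * Userspace model only. A plain int stands in for the kernel's struct kref,
 * and all names here (proc_info, model_file, model_group) are hypothetical.
 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COMM_LEN 16 /* same size as the kernel's TASK_COMM_LEN */

struct proc_info {
	int refcount;
	int pid;
	char comm[COMM_LEN];
};

/* Counts destroyed objects so the lifetime can be checked from a test. */
static int proc_info_freed;

static struct proc_info *proc_info_create(int pid, const char *comm)
{
	struct proc_info *info = calloc(1, sizeof(*info));
	size_t len;

	if (!info)
		return NULL;

	/* Truncate to fit; calloc() already zeroed the terminator. */
	len = strlen(comm);
	if (len >= COMM_LEN)
		len = COMM_LEN - 1;
	memcpy(info->comm, comm, len);

	info->pid = pid;
	info->refcount = 1; /* the creator's reference */
	return info;
}

static struct proc_info *proc_info_get(struct proc_info *info)
{
	assert(info->refcount > 0);
	info->refcount++;
	return info;
}

static void proc_info_put(struct proc_info *info)
{
	assert(info->refcount > 0);
	if (--info->refcount == 0) {
		free(info);
		proc_info_freed++;
	}
}

/* Each holder owns one reference; neither points at the other. */
struct model_file { struct proc_info *info; };
struct model_group { struct proc_info *info; };
```

With this shape, closing the file drops only the file's reference, and
a group that outlives it can still read the pid and comm until it drops
its own.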




* Re: [PATCH 1/4] panthor: set owner field for driver fops
  2025-06-20 23:50 ` [PATCH 1/4] panthor: set owner field for driver fops Chia-I Wu
  2025-06-23  6:16   ` Boris Brezillon
@ 2025-06-23  8:42   ` Steven Price
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Price @ 2025-06-23  8:42 UTC (permalink / raw)
  To: Chia-I Wu, Boris Brezillon, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

On 21/06/2025 00:50, Chia-I Wu wrote:
> It allows us to get rid of manual try_module_get / module_put.
> 
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>

Reviewed-by: Steven Price <steven.price@arm.com>




* Re: [PATCH 2/4] panthor: save panthor_file in panthor_group
  2025-06-23  6:21   ` Boris Brezillon
@ 2025-06-23  9:07     ` Liviu Dudau
  2025-07-13  3:12       ` Chia-I Wu
  0 siblings, 1 reply; 10+ messages in thread
From: Liviu Dudau @ 2025-06-23  9:07 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Chia-I Wu, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, dri-devel, linux-kernel

On Mon, Jun 23, 2025 at 08:21:22AM +0200, Boris Brezillon wrote:
> On Fri, 20 Jun 2025 16:50:51 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
> 
> > We would like to access panthor_file from panthor_group on gpu errors.
> > Because panthor_group can outlive drm_file, add a refcount to
> > panthor_file to ensure its lifetime.
> 
> I'm not a huge fan of refcounting panthor_file, because people tend to
> put resources there that they expect to be released when the last handle
> goes away, and if we don't refcount these sub-resources they might live
> longer than they are meant to. I'm also not a huge fan of the circular
> referencing that exists between file and groups after this change.
> 
> How about we move the process info to a sub-object that's refcounted
> and let both panthor_file and panthor_group take a ref on this object
> instead?

I agree with Boris on this. One alternative is to put the pid and comm in
the panthor_group struct as panthor_file makes no use of the fields.
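
A minimal sketch of that alternative, modelled in plain userspace C.
The names model_group and model_group_set_task are hypothetical, and the
pid and comm are passed in as parameters; in the driver they would
presumably be taken from the task creating the group, which is an
assumption here. Storing comm in a fixed 16-byte array (the size of
TASK_COMM_LEN) also avoids a separate allocation and free.

```c
/* Userspace model only; model_group and its helper are hypothetical names. */
#include <assert.h>
#include <string.h>

#define COMM_LEN 16 /* same size as the kernel's TASK_COMM_LEN */

struct model_group {
	int pid;             /* task that created the group */
	char comm[COMM_LEN]; /* its name, always NUL-terminated */
};

/* Record the creating task directly in the group, truncating comm to fit. */
static void model_group_set_task(struct model_group *group, int pid,
				 const char *comm)
{
	size_t len;

	assert(group && comm);

	len = strlen(comm);
	if (len >= COMM_LEN)
		len = COMM_LEN - 1;

	memcpy(group->comm, comm, len);
	group->comm[len] = '\0';
	group->pid = pid;
}
```

Since the group owns a copy of the data, nothing needs refcounting and
the group's lifetime no longer depends on the file.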

Best regards,
Liviu


-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯


* Re: [PATCH 2/4] panthor: save panthor_file in panthor_group
  2025-06-23  9:07     ` Liviu Dudau
@ 2025-07-13  3:12       ` Chia-I Wu
  0 siblings, 0 replies; 10+ messages in thread
From: Chia-I Wu @ 2025-07-13  3:12 UTC (permalink / raw)
  To: Liviu Dudau
  Cc: Boris Brezillon, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, dri-devel, linux-kernel

Hi,

On Mon, Jun 23, 2025 at 2:07 AM Liviu Dudau <liviu.dudau@arm.com> wrote:
>
> On Mon, Jun 23, 2025 at 08:21:22AM +0200, Boris Brezillon wrote:
> > On Fri, 20 Jun 2025 16:50:51 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > > We would like to access panthor_file from panthor_group on gpu errors.
> > > Because panthor_group can outlive drm_file, add a refcount to
> > > panthor_file to ensure its lifetime.
> >
> > I'm not a huge fan of refcounting panthor_file, because people tend to
> > put resources there that they expect to be released when the last handle
> > goes away, and if we don't refcount these sub-resources they might live
> > longer than they are meant to. I'm also not a huge fan of the circular
> > referencing that exists between file and groups after this change.
> >
> > How about we move the process info to a sub-object that's refcounted
> > and let both panthor_file and panthor_group take a ref on this object
> > instead?
>
> I agree with Boris on this. One alternative is to put the pid and comm in
> the panthor_group struct as panthor_file makes no use of the fields.

I took this suggestion in v2 because, when the task that opened the
node differs from the task that created the group, we are more
interested in the latter.
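
To make the distinction concrete, here is a small userspace model, not
the v2 code. All names are hypothetical, and the scenario (one process
opens the node, another ends up creating the group, for instance after
the fd is handed over) is only an assumed example. The error line is
built from the group's creator rather than the file's opener.

```c
/* Userspace model only; every name below is hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct model_task {
	int pid;
	const char *comm;
};

struct model_file { struct model_task opener; };
struct model_group { struct model_task creator; };

/* True when both records describe the same task. */
static int model_same_task(const struct model_task *a,
			   const struct model_task *b)
{
	return a->pid == b->pid && strcmp(a->comm, b->comm) == 0;
}

/* The group remembers whoever created it, not whoever opened the file. */
static struct model_group model_group_create(const struct model_file *file,
					     struct model_task current_task)
{
	struct model_group group;

	(void)file; /* the opener is deliberately not used for attribution */
	group.creator = current_task;
	return group;
}

/* Format an error line in the style of the CS_FAULT message in patch 4. */
static int model_report_fault(char *buf, size_t size,
			      const struct model_group *group)
{
	assert(buf && group);
	return snprintf(buf, size, "CS_FAULT: pid=%d, comm=%s",
			group->creator.pid, group->creator.comm);
}
```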
