All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] drm/panthor: Fix IO-page mmap() for 32-bit userspace on 64-bit kernel
@ 2024-03-25 13:57 Boris Brezillon
  2024-03-25 13:57 ` [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend() Boris Brezillon
  2024-03-25 13:57 ` [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path Boris Brezillon
  0 siblings, 2 replies; 10+ messages in thread
From: Boris Brezillon @ 2024-03-25 13:57 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Adrián Larumbe
  Cc: dri-devel, kernel, Lukas F . Hartmann

When mapping an IO region, the pseudo-file offset is dependent on the
userspace architecture. panthor_device_mmio_offset() abstracts that
away for us by turning a userspace MMIO offset into its kernel
equivalent, but we were not updating vm_area_struct::vm_pgoff
accordingly, leading us to attach the MMIO region to the wrong file
offset.

This has implications when we start mixing 64 bit and 32 bit apps, but
that's only really a problem when we start having more that 2^43 bytes of
memory allocated, which is very unlikely to happen.

What's more problematic is the fact this turns our
unmap_mapping_range(DRM_PANTHOR_USER_MMIO_OFFSET) calls, which are
supposed to kill the MMIO mapping when entering suspend, into NOPs.
Which means we either keep the dummy flush_id mapping active at all
times, or we risk a BUS_FAULT if the MMIO region was mapped, and the
GPU is suspended after that.

Solve that by patching vm_pgoff early in panthor_mmap(). With
this in place, we no longer need the panthor_device_mmio_offset()
helper.

v2:
- Kill panthor_device_mmio_offset()

Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block")
Reported-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reported-by: Lukas F. Hartmann <lukas@mntmn.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10835
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_device.c |  8 ++++----
 drivers/gpu/drm/panthor/panthor_device.h | 24 ------------------------
 drivers/gpu/drm/panthor/panthor_drv.c    | 17 ++++++++++++++++-
 3 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index bfe8da4a6e4c..3ddc4ba0acbe 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -334,7 +334,7 @@ static vm_fault_t panthor_mmio_vm_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct panthor_device *ptdev = vma->vm_private_data;
-	u64 id = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	u64 offset = (u64)vma->vm_pgoff << PAGE_SHIFT;
 	unsigned long pfn;
 	pgprot_t pgprot;
 	vm_fault_t ret;
@@ -347,7 +347,7 @@ static vm_fault_t panthor_mmio_vm_fault(struct vm_fault *vmf)
 	mutex_lock(&ptdev->pm.mmio_lock);
 	active = atomic_read(&ptdev->pm.state) == PANTHOR_DEVICE_PM_STATE_ACTIVE;
 
-	switch (panthor_device_mmio_offset(id)) {
+	switch (offset) {
 	case DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET:
 		if (active)
 			pfn = __phys_to_pfn(ptdev->phys_addr + CSF_GPU_LATEST_FLUSH_ID);
@@ -378,9 +378,9 @@ static const struct vm_operations_struct panthor_mmio_vm_ops = {
 
 int panthor_device_mmap_io(struct panthor_device *ptdev, struct vm_area_struct *vma)
 {
-	u64 id = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	u64 offset = (u64)vma->vm_pgoff << PAGE_SHIFT;
 
-	switch (panthor_device_mmio_offset(id)) {
+	switch (offset) {
 	case DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET:
 		if (vma->vm_end - vma->vm_start != PAGE_SIZE ||
 		    (vma->vm_flags & (VM_WRITE | VM_EXEC)))
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 51c9d61b6796..7ee4987a3796 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -365,30 +365,6 @@ static int panthor_request_ ## __name ## _irq(struct panthor_device *ptdev,			\
 					 pirq);							\
 }
 
-/**
- * panthor_device_mmio_offset() - Turn a user MMIO offset into a kernel one
- * @offset: Offset to convert.
- *
- * With 32-bit systems being limited by the 32-bit representation of mmap2's
- * pgoffset field, we need to make the MMIO offset arch specific. This function
- * converts a user MMIO offset into something the kernel driver understands.
- *
- * If the kernel and userspace architecture match, the offset is unchanged. If
- * the kernel is 64-bit and userspace is 32-bit, the offset is adjusted to match
- * 64-bit offsets. 32-bit kernel with 64-bit userspace is impossible.
- *
- * Return: Adjusted offset.
- */
-static inline u64 panthor_device_mmio_offset(u64 offset)
-{
-#ifdef CONFIG_ARM64
-	if (test_tsk_thread_flag(current, TIF_32BIT))
-		offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT;
-#endif
-
-	return offset;
-}
-
 extern struct workqueue_struct *panthor_cleanup_wq;
 
 #endif
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index ff484506229f..0097a01d0fc7 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1327,7 +1327,22 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
 	if (!drm_dev_enter(file->minor->dev, &cookie))
 		return -ENODEV;
 
-	if (panthor_device_mmio_offset(offset) >= DRM_PANTHOR_USER_MMIO_OFFSET)
+#ifdef CONFIG_ARM64
+	/*
+	 * With 32-bit systems being limited by the 32-bit representation of
+	 * mmap2's pgoffset field, we need to make the MMIO offset arch
+	 * specific. This converts a user MMIO offset into something the kernel
+	 * driver understands.
+	 */
+	if (test_tsk_thread_flag(current, TIF_32BIT) &&
+	    offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) {
+		offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT -
+			  DRM_PANTHOR_USER_MMIO_OFFSET_32BIT;
+		vma->vm_pgoff = offset >> PAGE_SHIFT;
+	}
+#endif
+
+	if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET)
 		ret = panthor_device_mmap_io(ptdev, vma);
 	else
 		ret = drm_gem_mmap(filp, vma);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()
  2024-03-25 13:57 [PATCH v2 1/3] drm/panthor: Fix IO-page mmap() for 32-bit userspace on 64-bit kernel Boris Brezillon
@ 2024-03-25 13:57 ` Boris Brezillon
  2024-03-25 15:23   ` Steven Price
  2024-03-25 17:16   ` Liviu Dudau
  2024-03-25 13:57 ` [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path Boris Brezillon
  1 sibling, 2 replies; 10+ messages in thread
From: Boris Brezillon @ 2024-03-25 13:57 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Adrián Larumbe
  Cc: dri-devel, kernel

Make sure we set suspended=true last to avoid generating an irq storm
in the unlikely case where an IRQ happens between the suspended=true
assignment and the _INT_MASK update.

v2:
- New patch

Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 7ee4987a3796..3a930a368ae1 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
 {												\
 	int cookie;										\
 												\
-	atomic_set(&pirq->suspended, true);							\
+	pirq->mask = 0;										\
 												\
 	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
 		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\
@@ -333,7 +333,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
 		drm_dev_exit(cookie);								\
 	}											\
 												\
-	pirq->mask = 0;										\
+	atomic_set(&pirq->suspended, true);							\
 }												\
 												\
 static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq, u32 mask)	\
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path
  2024-03-25 13:57 [PATCH v2 1/3] drm/panthor: Fix IO-page mmap() for 32-bit userspace on 64-bit kernel Boris Brezillon
  2024-03-25 13:57 ` [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend() Boris Brezillon
@ 2024-03-25 13:57 ` Boris Brezillon
  2024-03-25 17:24   ` Liviu Dudau
  1 sibling, 1 reply; 10+ messages in thread
From: Boris Brezillon @ 2024-03-25 13:57 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Adrián Larumbe
  Cc: dri-devel, kernel

panthor_xxx_irq_suspend() doesn't mask the interrupts if drm_dev_unplug()
has been called, which is always the case when our panthor_xxx_unplug()
helpers are called. Fix that by introducing a panthor_xxx_unplug() helper
that does what panthor_xxx_irq_suspend() except it does it
unconditionally.

v2:
- Add Steve's R-b

Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
Found inadvertently while debugging another issue. I guess I managed to
call rmmod during a PING and that led to the FW interrupt handler
being executed after the device suspend happened.
---
 drivers/gpu/drm/panthor/panthor_device.h | 8 ++++++++
 drivers/gpu/drm/panthor/panthor_fw.c     | 2 +-
 drivers/gpu/drm/panthor/panthor_gpu.c    | 2 +-
 drivers/gpu/drm/panthor/panthor_mmu.c    | 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 3a930a368ae1..5634e9490c7f 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -321,6 +321,14 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
 	return ret;										\
 }												\
 												\
+static inline void panthor_ ## __name ## _irq_unplug(struct panthor_irq *pirq)			\
+{												\
+	pirq->mask = 0;										\
+	gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);					\
+	synchronize_irq(pirq->irq);								\
+	atomic_set(&pirq->suspended, true);							\
+}												\
+												\
 static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
 {												\
 	int cookie;										\
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 33c87a59834e..7a9710a38c5f 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1128,7 +1128,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 
 	/* Make sure the IRQ handler can be called after that point. */
 	if (ptdev->fw->irq.irq)
-		panthor_job_irq_suspend(&ptdev->fw->irq);
+		panthor_job_irq_unplug(&ptdev->fw->irq);
 
 	panthor_fw_stop(ptdev);
 
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 6dbbc4cfbe7e..b84c5b650fd9 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -174,7 +174,7 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
 	unsigned long flags;
 
 	/* Make sure the IRQ handler is not running after that point. */
-	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
+	panthor_gpu_irq_unplug(&ptdev->gpu->irq);
 
 	/* Wake-up all waiters. */
 	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index fdd35249169f..1f333cdded0f 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -2622,7 +2622,7 @@ int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm
  */
 void panthor_mmu_unplug(struct panthor_device *ptdev)
 {
-	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+	panthor_mmu_irq_unplug(&ptdev->mmu->irq);
 
 	mutex_lock(&ptdev->mmu->as.slots_lock);
 	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()
  2024-03-25 13:57 ` [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend() Boris Brezillon
@ 2024-03-25 15:23   ` Steven Price
  2024-03-25 17:16   ` Liviu Dudau
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Price @ 2024-03-25 15:23 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau, Adrián Larumbe; +Cc: dri-devel, kernel

On 25/03/2024 13:57, Boris Brezillon wrote:
> Make sure we set suspended=true last to avoid generating an irq storm
> in the unlikely case where an IRQ happens between the suspended=true
> assignment and the _INT_MASK update.
> 
> v2:
> - New patch
> 
> Reported-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

Thanks!

Steve

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 7ee4987a3796..3a930a368ae1 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
>  {												\
>  	int cookie;										\
>  												\
> -	atomic_set(&pirq->suspended, true);							\
> +	pirq->mask = 0;										\
>  												\
>  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
>  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\
> @@ -333,7 +333,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
>  		drm_dev_exit(cookie);								\
>  	}											\
>  												\
> -	pirq->mask = 0;										\
> +	atomic_set(&pirq->suspended, true);							\
>  }												\
>  												\
>  static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq, u32 mask)	\


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()
  2024-03-25 13:57 ` [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend() Boris Brezillon
  2024-03-25 15:23   ` Steven Price
@ 2024-03-25 17:16   ` Liviu Dudau
  2024-03-25 18:02     ` Boris Brezillon
  1 sibling, 1 reply; 10+ messages in thread
From: Liviu Dudau @ 2024-03-25 17:16 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel

On Mon, Mar 25, 2024 at 02:57:04PM +0100, Boris Brezillon wrote:
> Make sure we set suspended=true last to avoid generating an irq storm
> in the unlikely case where an IRQ happens between the suspended=true
> assignment and the _INT_MASK update.
> 
> v2:
> - New patch
> 
> Reported-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 7ee4987a3796..3a930a368ae1 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
>  {												\
>  	int cookie;										\
>  												\
> -	atomic_set(&pirq->suspended, true);							\
> +	pirq->mask = 0;										\

I think you might still have a race between _irq_suspend() and _irq_threaded_handler() where the
status will be zero due to pirq->mask being zero, so no interrupt will be cleared but they will
be masked (kind of the opposite problem to patch 3/3).

I'm starting to think that pirq->mask should be local to _irq_threaded_handler() and not be messed
with in the other functions.

>  												\
>  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
>  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\

If you move the line above before the if condition, do you still need patch 3/3?

Best regards,
Liviu

> @@ -333,7 +333,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
>  		drm_dev_exit(cookie);								\
>  	}											\
>  												\
> -	pirq->mask = 0;										\
> +	atomic_set(&pirq->suspended, true);							\
>  }												\
>  												\
>  static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq, u32 mask)	\
> -- 
> 2.44.0
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path
  2024-03-25 13:57 ` [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path Boris Brezillon
@ 2024-03-25 17:24   ` Liviu Dudau
  2024-03-25 18:05     ` Boris Brezillon
  0 siblings, 1 reply; 10+ messages in thread
From: Liviu Dudau @ 2024-03-25 17:24 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel

On Mon, Mar 25, 2024 at 02:57:05PM +0100, Boris Brezillon wrote:
> panthor_xxx_irq_suspend() doesn't mask the interrupts if drm_dev_unplug()
> has been called, which is always the case when our panthor_xxx_unplug()
> helpers are called. Fix that by introducing a panthor_xxx_unplug() helper
> that does what panthor_xxx_irq_suspend() except it does it
> unconditionally.

I understand that drm_dev_unplug() messes up with the cleanup, but I'm a bit
reluctant to see a function that completely ignores if the device has been
unplugged or not. Like mentioned on the review of 2/3, can we move the masking
of the interrupts outside the critical section and not add drm_dev_unplug() ?


> 
> v2:
> - Add Steve's R-b
> 
> Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block")
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> Found inadvertently while debugging another issue. I guess I managed to
> call rmmod during a PING and that led to the FW interrupt handler
> being executed after the device suspend happened.
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 8 ++++++++
>  drivers/gpu/drm/panthor/panthor_fw.c     | 2 +-
>  drivers/gpu/drm/panthor/panthor_gpu.c    | 2 +-
>  drivers/gpu/drm/panthor/panthor_mmu.c    | 2 +-
>  4 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 3a930a368ae1..5634e9490c7f 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -321,6 +321,14 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>  	return ret;										\
>  }												\
>  												\
> +static inline void panthor_ ## __name ## _irq_unplug(struct panthor_irq *pirq)			\
> +{												\
> +	pirq->mask = 0;										\
> +	gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);					\
> +	synchronize_irq(pirq->irq);								\
> +	atomic_set(&pirq->suspended, true);							\
> +}												\
> +												\
>  static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
>  {												\
>  	int cookie;										\
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 33c87a59834e..7a9710a38c5f 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1128,7 +1128,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>  
>  	/* Make sure the IRQ handler can be called after that point. */

While reviewing this I've spotted that the comment needs updating: "... handler *can't* be called ..."

Best regards,
Liviu

>  	if (ptdev->fw->irq.irq)
> -		panthor_job_irq_suspend(&ptdev->fw->irq);
> +		panthor_job_irq_unplug(&ptdev->fw->irq);
>  
>  	panthor_fw_stop(ptdev);
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index 6dbbc4cfbe7e..b84c5b650fd9 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -174,7 +174,7 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
>  	unsigned long flags;
>  
>  	/* Make sure the IRQ handler is not running after that point. */
> -	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
> +	panthor_gpu_irq_unplug(&ptdev->gpu->irq);
>  
>  	/* Wake-up all waiters. */
>  	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index fdd35249169f..1f333cdded0f 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -2622,7 +2622,7 @@ int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm
>   */
>  void panthor_mmu_unplug(struct panthor_device *ptdev)
>  {
> -	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +	panthor_mmu_irq_unplug(&ptdev->mmu->irq);
>  
>  	mutex_lock(&ptdev->mmu->as.slots_lock);
>  	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
> -- 
> 2.44.0
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()
  2024-03-25 17:16   ` Liviu Dudau
@ 2024-03-25 18:02     ` Boris Brezillon
  2024-03-26  9:58       ` Liviu Dudau
  0 siblings, 1 reply; 10+ messages in thread
From: Boris Brezillon @ 2024-03-25 18:02 UTC (permalink / raw)
  To: Liviu Dudau; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel

On Mon, 25 Mar 2024 17:16:16 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Mon, Mar 25, 2024 at 02:57:04PM +0100, Boris Brezillon wrote:
> > Make sure we set suspended=true last to avoid generating an irq storm
> > in the unlikely case where an IRQ happens between the suspended=true
> > assignment and the _INT_MASK update.
> > 
> > v2:
> > - New patch
> > 
> > Reported-by: Steven Price <steven.price@arm.com>
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index 7ee4987a3796..3a930a368ae1 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
> >  {												\
> >  	int cookie;										\
> >  												\
> > -	atomic_set(&pirq->suspended, true);							\
> > +	pirq->mask = 0;										\  
> 
> I think you might still have a race between _irq_suspend() and _irq_threaded_handler() where the
> status will be zero due to pirq->mask being zero, so no interrupt will be cleared but they will
> be masked (kind of the opposite problem to patch 3/3).

Right, but I'm trying to find a case where this is an issue. Yes, we
might lose events, but at the same time, when _irq_suspend() is called,
we are supposed to be idle, so all this mask=0 assignment does is
speed-up the synchronization with the irq-thread. If there's anything
we need to be done before suspending the IRQ, this should really use
its own synchronization model.

> 
> I'm starting to think that pirq->mask should be local to _irq_threaded_handler() and not be messed
> with in the other functions.

It kinda is, as we don't modify panthor_irq::mask outside the
suspend/resume (and now unplug) path, and each of these accesses has a
reason to exist:

- in the resume path, we know all IRQs are masked, and we reset the
  SW-side mask to the interrupts we want to accept before updating
  _INT_MASK. No risk of race in that one
- in the unplug path, I don't think we care about unhandled interrupts,
  because the device will become unusable after that point, so updating
  the panthor_irq::mask early and losing events should be okay.
- the suspend case has been described above. As explained, I don't think
  it matters if we lose events there, because really, if there's any
  synchronization needed, it should have happened explicitly before
  _irq_suspend() is called. The synchronize_irq() we have is just here
  to make sure there's nothing accessing registers when we turn the
  device clk/power-domain off.

> 
> >  												\
> >  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
> >  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\  
> 
> If you move the line above before the if condition, do you still need patch 3/3?

The whole point of the drm_dev_enter/exit() section was to prevent
access to registers after the device has been unplugged, so, if I move
the gpu_write() outside of this block, I'd rather drop the entire
drm_dev_enter/exit() section (both here and in _irq_resume()). That
should be safe actually, as I don't expect the PM hooks or the reset
handler to be called after the device and its resource have been
removed, and those are the two only paths where _irq_suspend/resume()
can be called.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path
  2024-03-25 17:24   ` Liviu Dudau
@ 2024-03-25 18:05     ` Boris Brezillon
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2024-03-25 18:05 UTC (permalink / raw)
  To: Liviu Dudau; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel

On Mon, 25 Mar 2024 17:24:17 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Mon, Mar 25, 2024 at 02:57:05PM +0100, Boris Brezillon wrote:
> > panthor_xxx_irq_suspend() doesn't mask the interrupts if drm_dev_unplug()
> > has been called, which is always the case when our panthor_xxx_unplug()
> > helpers are called. Fix that by introducing a panthor_xxx_unplug() helper
> > that does what panthor_xxx_irq_suspend() except it does it
> > unconditionally.  
> 
> I understand that drm_dev_unplug() messes up with the cleanup, but I'm a bit
> reluctant to see a function that completely ignores if the device has been
> unplugged or not. Like mentioned on the review of 2/3, can we move the masking
> of the interrupts outside the critical section and not add drm_dev_unplug() ?

Nope, because the whole point of this drm_dev_enter/exit() section was
to prevent accesses to registers when the associated iomem range has
been returned to the system after the device has been removed. If you
move this gpu_write() outside of the drm_dev_enter/exit() you're better
off dropping this check entirely...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()
  2024-03-25 18:02     ` Boris Brezillon
@ 2024-03-26  9:58       ` Liviu Dudau
  2024-03-26 10:25         ` Boris Brezillon
  0 siblings, 1 reply; 10+ messages in thread
From: Liviu Dudau @ 2024-03-26  9:58 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel

On Mon, Mar 25, 2024 at 07:02:13PM +0100, Boris Brezillon wrote:
> On Mon, 25 Mar 2024 17:16:16 +0000
> Liviu Dudau <liviu.dudau@arm.com> wrote:
> 
> > On Mon, Mar 25, 2024 at 02:57:04PM +0100, Boris Brezillon wrote:
> > > Make sure we set suspended=true last to avoid generating an irq storm
> > > in the unlikely case where an IRQ happens between the suspended=true
> > > assignment and the _INT_MASK update.
> > > 
> > > v2:
> > > - New patch
> > > 
> > > Reported-by: Steven Price <steven.price@arm.com>
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > index 7ee4987a3796..3a930a368ae1 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
> > >  {												\
> > >  	int cookie;										\
> > >  												\
> > > -	atomic_set(&pirq->suspended, true);							\
> > > +	pirq->mask = 0;										\  
> > 
> > I think you might still have a race between _irq_suspend() and _irq_threaded_handler() where the
> > status will be zero due to pirq->mask being zero, so no interrupt will be cleared but they will
> > be masked (kind of the opposite problem to patch 3/3).
> 
> Right, but I'm trying to find a case where this is an issue. Yes, we
> might lose events, but at the same time, when _irq_suspend() is called,
> we are supposed to be idle, so all this mask=0 assignment does is
> speed-up the synchronization with the irq-thread. If there's anything
> we need to be done before suspending the IRQ, this should really use
> its own synchronization model.

Perf counter collection before going idle ;)

I know that's not yet on your radar, but we need to keep it in mind to
test that scenario when we add support for keeping runtime PM alive
during perf counters dumping.

> 
> > 
> > I'm starting to think that pirq->mask should be local to _irq_threaded_handler() and not be messed
> > with in the other functions.
> 
> It kinda is, as we don't modify panthor_irq::mask outside the
> suspend/resume (and now unplug) path, and each of these accesses has a
> reason to exist:
> 
> - in the resume path, we know all IRQs are masked, and we reset the
>   SW-side mask to the interrupts we want to accept before updating
>   _INT_MASK. No risk of race in that one
> - in the unplug path, I don't think we care about unhandled interrupts,
>   because the device will become unusable after that point, so updating
>   the panthor_irq::mask early and losing events should be okay.
> - the suspend case has been described above. As explained, I don't think
>   it matters if we lose events there, because really, if there's any
>   synchronization needed, it should have happened explicitly before
>   _irq_suspend() is called. The synchronize_irq() we have is just here
>   to make sure there's nothing accessing registers when we turn the
>   device clk/power-domain off.
> 
> > 
> > >  												\
> > >  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
> > >  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\  
> > 
> > If you move the line above before the if condition, do you still need patch 3/3?
> 
> The whole point of the drm_dev_enter/exit() section was to prevent
> access to registers after the device has been unplugged, so, if I move
> the gpu_write() outside of this block, I'd rather drop the entire
> drm_dev_enter/exit() section (both here and in _irq_resume()). That
> should be safe actually, as I don't expect the PM hooks or the reset
> handler to be called after the device and its resource have been
> removed, and those are the two only paths where _irq_suspend/resume()
> can be called.

Agree. It feels strange to care a lot about preventing access to registers
(as we should) and then going and adding the unplug function which replaces
the suspend in 3 out of 7 cases. New contributors might get confused about
which one to use in the future.

What we should do is rename suspend to unplug and move the drm_dev_enter/exit()
condition at the calling point for the cases where it makes sense.

Best regards,
Liviu

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()
  2024-03-26  9:58       ` Liviu Dudau
@ 2024-03-26 10:25         ` Boris Brezillon
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2024-03-26 10:25 UTC (permalink / raw)
  To: Liviu Dudau; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel

On Tue, 26 Mar 2024 09:58:17 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Mon, Mar 25, 2024 at 07:02:13PM +0100, Boris Brezillon wrote:
> > On Mon, 25 Mar 2024 17:16:16 +0000
> > Liviu Dudau <liviu.dudau@arm.com> wrote:
> >   
> > > On Mon, Mar 25, 2024 at 02:57:04PM +0100, Boris Brezillon wrote:  
> > > > Make sure we set suspended=true last to avoid generating an irq storm
> > > > in the unlikely case where an IRQ happens between the suspended=true
> > > > assignment and the _INT_MASK update.
> > > > 
> > > > v2:
> > > > - New patch
> > > > 
> > > > Reported-by: Steven Price <steven.price@arm.com>
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > ---
> > > >  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > > index 7ee4987a3796..3a930a368ae1 100644
> > > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > > @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
> > > >  {												\
> > > >  	int cookie;										\
> > > >  												\
> > > > -	atomic_set(&pirq->suspended, true);							\
> > > > +	pirq->mask = 0;										\    
> > > 
> > > I think you might still have a race between _irq_suspend() and _irq_threaded_handler() where the
> > > status will be zero due to pirq->mask being zero, so no interrupt will be cleared but they will
> > > be masked (kind of the opposite problem to patch 3/3).  
> > 
> > Right, but I'm trying to find a case where this is an issue. Yes, we
> > might lose events, but at the same time, when _irq_suspend() is called,
> > we are supposed to be idle, so all this mask=0 assignment does is
> > speed-up the synchronization with the irq-thread. If there's anything
> > we need to be done before suspending the IRQ, this should really use
> > its own synchronization model.  
> 
> Perf counter collection before going idle ;)

That's typically a case of synchronization that should be done
explicitly rather than relying on the synchronize_irq() we have in
_irq_suspend().

> 
> I know that's not yet on your radar, but we need to keep it in mind to
> test that scenario when we add support for keeping runtime PM alive
> during perf counters dumping.

It's definitely on our radar, but I don't think _irq_suspend() is the
right way to handle perfcnt dump synchronization. IMHO, it deserves its
own _perfcnt_suspend() function that's called before _fw_suspend() to
make sure any in-flight perfcnt dumps get flushed before the
JOB/FW irq suspend happens.

> 
> >   
> > > 
> > > I'm starting to think that pirq->mask should be local to _irq_threaded_handler() and not be messed
> > > with in the other functions.  
> > 
> > It kinda is, as we don't modify panthor_irq::mask outside the
> > suspend/resume (and now unplug) path, and each of these accesses has a
> > reason to exist:
> > 
> > - in the resume path, we know all IRQs are masked, and we reset the
> >   SW-side mask to the interrupts we want to accept before updating
> >   _INT_MASK. No risk of race in that one
> > - in the unplug path, I don't think we care about unhandled interrupts,
> >   because the device will become unusable after that point, so updating
> >   the panthor_irq::mask early and losing events should be okay.
> > - the suspend case has been described above. As explained, I don't think
> >   it matters if we lose events there, because really, if there's any
> >   synchronization needed, it should have happened explicitly before
> >   _irq_suspend() is called. The synchronize_irq() we have is just here
> >   to make sure there's nothing accessing registers when we turn the
> >   device clk/power-domain off.
> >   
> > >   
> > > >  												\
> > > >  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
> > > >  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\    
> > > 
> > > If you move the line above before the if condition, do you still need patch 3/3?  
> > 
> > The whole point of the drm_dev_enter/exit() section was to prevent
> > access to registers after the device has been unplugged, so, if I move
> > the gpu_write() outside of this block, I'd rather drop the entire
> > drm_dev_enter/exit() section (both here and in _irq_resume()). That
> > should be safe actually, as I don't expect the PM hooks or the reset
> > handler to be called after the device and its resource have been
> > removed, and those are the two only paths where _irq_suspend/resume()
> > can be called.  
> 
> Agree. It feels strange to care a lot about preventing access to registers
> (as we should) and then going and adding the unplug function which replaces
> the suspend in 3 out of 7 cases. New contributors might get confused about
> which one to use in the future.

Actually, irq_unplug() should be used in the _unplug() path, as the
name implies, I don't think that's confusing :P. My point was more on
the fact we probably don't need the dev_enter/exit() sections in the
first place, so why introduce a new helper if we can make the
_irq_suspend() helper simpler and re-usable for the _unplug() case?

> 
> What we should do is rename suspend to unplug and move the drm_dev_enter/exit()
> condition at the calling point for the cases where it makes sense.

That'd be even more confusing IMO, why would you call an helper named
_unplug() if all you want to do is temporarily suspend the IRQ?

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-03-26 10:26 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-25 13:57 [PATCH v2 1/3] drm/panthor: Fix IO-page mmap() for 32-bit userspace on 64-bit kernel Boris Brezillon
2024-03-25 13:57 ` [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend() Boris Brezillon
2024-03-25 15:23   ` Steven Price
2024-03-25 17:16   ` Liviu Dudau
2024-03-25 18:02     ` Boris Brezillon
2024-03-26  9:58       ` Liviu Dudau
2024-03-26 10:25         ` Boris Brezillon
2024-03-25 13:57 ` [PATCH v2 3/3] drm/panthor: Actually suspend IRQs in the unplug path Boris Brezillon
2024-03-25 17:24   ` Liviu Dudau
2024-03-25 18:05     ` Boris Brezillon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.