Re: [PATCH v8 2/4] drm/panthor: Extend IRQ helpers for mask modification/restoration

From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>,
	Liviu Dudau <liviu.dudau@arm.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Chia-I Wu <olvaffe@gmail.com>,
	Karunika Choo <karunika.choo@arm.com>,
	kernel@collabora.com, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v8 2/4] drm/panthor: Extend IRQ helpers for mask modification/restoration
Date: Thu, 15 Jan 2026 12:15:22 +0100
Message-ID: <3991854.44csPzL39Z@workhorse>
In-Reply-To: <20260112161252.09396916@fedora>

On Monday, 12 January 2026 16:12:52 Central European Standard Time Boris Brezillon wrote:
> On Mon, 12 Jan 2026 15:37:50 +0100
> Nicolas Frattaroli <nicolas.frattaroli@collabora.com> wrote:
> 
> > The current IRQ helpers do not guarantee mutual exclusion that covers
> > the entire transaction, from accessing the mask member to modifying the
> > mask register.
> > 
> > This makes it hard, if not impossible, to implement mask modification
> > helpers that may change one of these outside the normal
> > suspend/resume/isr code paths.
> > 
> > Add a spinlock to struct panthor_irq that protects both the mask member
> > and register. Acquire it in all code paths that access these, but drop
> > it before processing the threaded handler function. Then, add the
> > aforementioned new helpers: enable_events and disable_events. They work
> > by ORing bits into the mask and clearing them with AND-NOT.
> > 
> > resume is changed to no longer have a mask passed, as pirq->mask is
> > supposed to be the user-requested mask now, rather than a mirror of the
> > INT_MASK register contents. Users of the resume helper are adjusted
> > accordingly, including a rather painful refactor in panthor_mmu.c.
> > 
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h |  72 +++++++--
> >  drivers/gpu/drm/panthor/panthor_fw.c     |   3 +-
> >  drivers/gpu/drm/panthor/panthor_gpu.c    |   2 +-
> >  drivers/gpu/drm/panthor/panthor_mmu.c    | 247 ++++++++++++++++---------------
> >  drivers/gpu/drm/panthor/panthor_pwr.c    |   2 +-
> >  5 files changed, 187 insertions(+), 139 deletions(-)
> > 
> [... snip ...]
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > index 198d59f42578..71b8318eab31 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -655,125 +655,6 @@ static void panthor_vm_release_as_locked(struct panthor_vm *vm)
> >  	vm->as.id = -1;
> >  }
> >  
> > -/**
> > - * panthor_vm_active() - Flag a VM as active
> > - * @vm: VM to flag as active.
> > - *
> > - * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > - *
> > - * Return: 0 on success, a negative error code otherwise.
> > - */
> > -int panthor_vm_active(struct panthor_vm *vm)
> > -{
> > -	struct panthor_device *ptdev = vm->ptdev;
> > -	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > -	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > -	int ret = 0, as, cookie;
> > -	u64 transtab, transcfg;
> > -
> > -	if (!drm_dev_enter(&ptdev->base, &cookie))
> > -		return -ENODEV;
> > -
> > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > -		goto out_dev_exit;
> > -
> > -	/* Make sure we don't race with lock/unlock_region() calls
> > -	 * happening around VM bind operations.
> > -	 */
> > -	mutex_lock(&vm->op_lock);
> > -	mutex_lock(&ptdev->mmu->as.slots_lock);
> > -
> > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > -		goto out_unlock;
> > -
> > -	as = vm->as.id;
> > -	if (as >= 0) {
> > -		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > -		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > -		 */
> > -		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > -			goto out_enable_as;
> > -
> > -		goto out_make_active;
> > -	}
> > -
> > -	/* Check for a free AS */
> > -	if (vm->for_mcu) {
> > -		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > -		as = 0;
> > -	} else {
> > -		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > -	}
> > -
> > -	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > -		struct panthor_vm *lru_vm;
> > -
> > -		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > -						  struct panthor_vm,
> > -						  as.lru_node);
> > -		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > -			ret = -EBUSY;
> > -			goto out_unlock;
> > -		}
> > -
> > -		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > -		as = lru_vm->as.id;
> > -
> > -		ret = panthor_mmu_as_disable(ptdev, as, true);
> > -		if (ret)
> > -			goto out_unlock;
> > -
> > -		panthor_vm_release_as_locked(lru_vm);
> > -	}
> > -
> > -	/* Assign the free or reclaimed AS to the FD */
> > -	vm->as.id = as;
> > -	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > -	ptdev->mmu->as.slots[as].vm = vm;
> > -
> > -out_enable_as:
> > -	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > -	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > -		   AS_TRANSCFG_PTW_RA |
> > -		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > -		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > -	if (ptdev->coherent)
> > -		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > -
> > -	/* If the VM is re-activated, we clear the fault. */
> > -	vm->unhandled_fault = false;
> > -
> > -	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > -	 * before enabling the AS.
> > -	 */
> > -	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
> > -		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
> > -		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
> > -		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
> > -		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> > -	}
> > -
> > -	/* The VM update is guarded by ::op_lock, which we take at the beginning
> > -	 * of this function, so we don't expect any locked region here.
> > -	 */
> > -	drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size > 0);
> > -	ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > -
> > -out_make_active:
> > -	if (!ret) {
> > -		refcount_set(&vm->as.active_cnt, 1);
> > -		list_del_init(&vm->as.lru_node);
> > -	}
> > -
> > -out_unlock:
> > -	mutex_unlock(&ptdev->mmu->as.slots_lock);
> > -	mutex_unlock(&vm->op_lock);
> > -
> > -out_dev_exit:
> > -	drm_dev_exit(cookie);
> > -	return ret;
> > -}
> > -
> >  /**
> >   * panthor_vm_idle() - Flag a VM idle
> >   * @vm: VM to flag as idle.
> > @@ -1762,6 +1643,128 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> >  }
> >  PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler);
> >  
> > +/**
> > + * panthor_vm_active() - Flag a VM as active
> > + * @vm: VM to flag as active.
> > + *
> > + * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > + *
> > + * Return: 0 on success, a negative error code otherwise.
> > + */
> > +int panthor_vm_active(struct panthor_vm *vm)
> > +{
> > +	struct panthor_device *ptdev = vm->ptdev;
> > +	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > +	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > +	int ret = 0, as, cookie;
> > +	u64 transtab, transcfg;
> > +	u32 fault_mask;
> > +
> > +	if (!drm_dev_enter(&ptdev->base, &cookie))
> > +		return -ENODEV;
> > +
> > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > +		goto out_dev_exit;
> > +
> > +	/* Make sure we don't race with lock/unlock_region() calls
> > +	 * happening around VM bind operations.
> > +	 */
> > +	mutex_lock(&vm->op_lock);
> > +	mutex_lock(&ptdev->mmu->as.slots_lock);
> > +
> > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > +		goto out_unlock;
> > +
> > +	as = vm->as.id;
> > +	if (as >= 0) {
> > +		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > +		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > +		 */
> > +		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > +			goto out_enable_as;
> > +
> > +		goto out_make_active;
> > +	}
> > +
> > +	/* Check for a free AS */
> > +	if (vm->for_mcu) {
> > +		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > +		as = 0;
> > +	} else {
> > +		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > +	}
> > +
> > +	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > +		struct panthor_vm *lru_vm;
> > +
> > +		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > +						  struct panthor_vm,
> > +						  as.lru_node);
> > +		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > +			ret = -EBUSY;
> > +			goto out_unlock;
> > +		}
> > +
> > +		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > +		as = lru_vm->as.id;
> > +
> > +		ret = panthor_mmu_as_disable(ptdev, as, true);
> > +		if (ret)
> > +			goto out_unlock;
> > +
> > +		panthor_vm_release_as_locked(lru_vm);
> > +	}
> > +
> > +	/* Assign the free or reclaimed AS to the FD */
> > +	vm->as.id = as;
> > +	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > +	ptdev->mmu->as.slots[as].vm = vm;
> > +
> > +out_enable_as:
> > +	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > +	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > +		   AS_TRANSCFG_PTW_RA |
> > +		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > +		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > +	if (ptdev->coherent)
> > +		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > +
> > +	/* If the VM is re-activated, we clear the fault. */
> > +	vm->unhandled_fault = false;
> > +
> > +	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > +	 * before enabling the AS.
> > +	 */
> > +	fault_mask = panthor_mmu_as_fault_mask(ptdev, as);
> > +	if (ptdev->mmu->as.faulty_mask & fault_mask) {
> > +		gpu_write(ptdev, MMU_INT_CLEAR, fault_mask);
> > +		ptdev->mmu->as.faulty_mask &= ~fault_mask;
> > +		panthor_mmu_irq_enable_events(&ptdev->mmu->irq, fault_mask);
> > +		panthor_mmu_irq_disable_events(&ptdev->mmu->irq, ptdev->mmu->as.faulty_mask);
> 
> Why do we need a _disable_events() here?

It's what the code originally did, as far as I can tell. That's not super
obvious from the diff because I had to move the function, but the old version did:

	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
	 * before enabling the AS.
	 */
	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
	}

In terms of the faulty_mask value from before the update, we write
`~(ptdev->mmu->as.faulty_mask & ~panthor_mmu_as_fault_mask(ptdev, as))` to
the mask register. Looking at it again, though, I don't think my new version
expands to the same thing at all. As far as I can tell,
`ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);` clears
this AS's one bit from the fault mask, and then the negation in the write
re-enables it but clears all other bits? That can't be right. If the intent was
to re-enable interrupts, it should OR the bit into the register contents, not
overwrite them.

I feel a little better about the me from a few days ago: even with a fresh set
of eyes, I still can't tell what the code is actually trying to do, other than
by trusting the comment.

Also, genuinely what is the point of `panthor_mmu_as_fault_mask`? Half of its
parameters are unused and its entire implementation is shorter than the function
name.

So yeah, I think I'll remove the disable_events here and double-check what this
code is actually supposed to do. The version I'm replacing is very non-obvious:
I can't see how what it does corresponds to what the comment says it does
(clear the fault and re-enable its interrupt). This also plays into your
remark below.
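
As a sanity check, the mask arithmetic discussed above can be modelled in
userspace. This is only a sketch under stated assumptions, not the driver code:
registers are plain variables, the per-AS fault mask is assumed to be a single
bit (BIT(as)), and the v8 helpers are assumed to write the software mask to the
register. The names old_int_mask() and v8_int_mask() are made up for the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the per-AS fault mask is a single bit, BIT(as). */
static uint32_t as_fault_mask(unsigned int as)
{
	assert(as < 32);
	return UINT32_C(1) << as;
}

/*
 * Model of the old code path. Returns the value that would be written
 * to MMU_INT_MASK. Both masks are updated in place, as in the driver.
 */
static uint32_t old_int_mask(uint32_t *faulty_mask, uint32_t *irq_mask,
			     unsigned int as)
{
	uint32_t fm = as_fault_mask(as);

	*faulty_mask &= ~fm;	/* this AS is no longer flagged as faulty */
	*irq_mask |= fm;	/* software copy of the mask */
	return ~*faulty_mask;	/* register is overwritten, not ORed */
}

/*
 * Model of the v8 sequence: enable_events(fm) followed by
 * disable_events(faulty_mask). Assumes the helpers write the software
 * mask to the register, so the return value is that mask.
 */
static uint32_t v8_int_mask(uint32_t *faulty_mask, uint32_t *irq_mask,
			    unsigned int as)
{
	uint32_t fm = as_fault_mask(as);

	*faulty_mask &= ~fm;
	*irq_mask |= fm;		/* enable_events() */
	*irq_mask &= ~*faulty_mask;	/* disable_events() */
	return *irq_mask;
}
```

With AS 1 and AS 2 both faulty (faulty_mask = 0x6) and a software mask of
0xfff9, re-activating AS 1 makes the old path write 0xfffffffb, so every bit is
set except the one for the AS that is still faulty. The v8 sequence ends up
with 0xfffb. In this model the two agree on the bits that were already in the
software mask, and the old overwrite additionally sets every other non-faulty
bit.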

> 
> > +	}
> > +
> > +	/* The VM update is guarded by ::op_lock, which we take at the beginning
> > +	 * of this function, so we don't expect any locked region here.
> > +	 */
> > +	drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size > 0);
> > +	ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > +
> > +out_make_active:
> > +	if (!ret) {
> > +		refcount_set(&vm->as.active_cnt, 1);
> > +		list_del_init(&vm->as.lru_node);
> > +	}
> > +
> > +out_unlock:
> > +	mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +	mutex_unlock(&vm->op_lock);
> > +
> > +out_dev_exit:
> > +	drm_dev_exit(cookie);
> > +	return ret;
> > +}
> > +
> > +
> 
> nit: one too many empty lines.
> 
> >  /**
> >   * panthor_mmu_suspend() - Suspend the MMU logic
> >   * @ptdev: Device.
> > @@ -1805,7 +1808,8 @@ void panthor_mmu_resume(struct panthor_device *ptdev)
> >  	ptdev->mmu->as.faulty_mask = 0;
> >  	mutex_unlock(&ptdev->mmu->as.slots_lock);
> >  
> > -	panthor_mmu_irq_resume(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> > +	panthor_mmu_irq_enable_events(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> 
> I don't think we should touch the events mask in the suspend/resume
> path. The way I see it, events should be:
> 
> - enabled when an AS is enabled (as_enable())
> - disabled when an AS is disabled (as_disable())
> - disabled when a VM has an unhandled fault
> 
> Because making a VM active might imply evicting another VM, we might
> end up with disable+enable_events() pairs that could have been
> optimized into a NOP, but the overhead should be negligible, and if we
> have to rotate VMs on AS slots we've already lost anyway (in terms of
> perf).

Yep, I'll do that. I think I was naively trying to translate the code,
but since we now have pirq->mask preserved for us, this explicit juggling
of state can be removed.
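
A rough userspace model of that scheme, to make the intended behaviour
concrete. Everything here is an assumption for illustration: struct irq_model
and its helpers are hypothetical stand-ins for struct panthor_irq and the
generated helpers, the spinlock is left out, the suspend state is reduced to a
bool, and one event bit per AS is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct panthor_irq (no locking modelled). */
struct irq_model {
	uint32_t mask;		/* user-requested events, kept across suspend */
	uint32_t int_mask_reg;	/* models the INT_MASK register contents */
	bool suspended;
};

/* Push the software mask to the "register", or 0 while suspended. */
static void irq_sync(struct irq_model *irq)
{
	irq->int_mask_reg = irq->suspended ? 0 : irq->mask;
}

static void irq_enable_events(struct irq_model *irq, uint32_t events)
{
	irq->mask |= events;
	irq_sync(irq);
}

static void irq_disable_events(struct irq_model *irq, uint32_t events)
{
	irq->mask &= ~events;
	irq_sync(irq);
}

static void irq_suspend(struct irq_model *irq)
{
	irq->suspended = true;
	irq_sync(irq);
}

/* No mask argument: resume restores whatever was requested before. */
static void irq_resume(struct irq_model *irq)
{
	irq->suspended = false;
	irq_sync(irq);
}

/* The three places where the event mask would change. */
static void as_enable(struct irq_model *irq, unsigned int as)
{
	assert(as < 32);
	irq_enable_events(irq, UINT32_C(1) << as);
}

static void as_disable(struct irq_model *irq, unsigned int as)
{
	assert(as < 32);
	irq_disable_events(irq, UINT32_C(1) << as);
}

static void as_unhandled_fault(struct irq_model *irq, unsigned int as)
{
	assert(as < 32);
	irq_disable_events(irq, UINT32_C(1) << as);
}
```

In this model, enabling AS 1 and AS 3 and then taking an unhandled fault on
AS 3 leaves only bit 1 set. A suspend/resume cycle zeroes the register and then
restores bit 1 without the caller passing any mask, which is why the
suspend/resume and post-reset paths would not need to touch the events.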

> 
> > +	panthor_mmu_irq_resume(&ptdev->mmu->irq);
> >  }
> >  
> >  /**
> > @@ -1859,7 +1863,8 @@ void panthor_mmu_post_reset(struct panthor_device *ptdev)
> >  
> >  	mutex_unlock(&ptdev->mmu->as.slots_lock);
> >  
> > -	panthor_mmu_irq_resume(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> > +	panthor_mmu_irq_enable_events(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> 
> Same here, I don't think we need to change the event mask.
> 
> > +	panthor_mmu_irq_resume(&ptdev->mmu->irq);
> >  
> >  	/* Restart the VM_BIND queues. */
> >  	mutex_lock(&ptdev->mmu->vm.lock);
> > diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> > index 57cfc7ce715b..ed3b2b4479ca 100644
> > --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> > +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> > @@ -545,5 +545,5 @@ void panthor_pwr_resume(struct panthor_device *ptdev)
> >  	if (!ptdev->pwr)
> >  		return;
> >  
> > -	panthor_pwr_irq_resume(&ptdev->pwr->irq, PWR_INTERRUPTS_MASK);
> > +	panthor_pwr_irq_resume(&ptdev->pwr->irq);
> >  }
> > 
> 
> 

Thanks for the review.

Kind regards,
Nicolas Frattaroli
