Re: [PATCH v8 2/4] drm/panthor: Extend IRQ helpers for mask modification/restoration

From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>,
	Liviu Dudau <liviu.dudau@arm.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Chia-I Wu <olvaffe@gmail.com>,
	Karunika Choo <karunika.choo@arm.com>,
	kernel@collabora.com, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v8 2/4] drm/panthor: Extend IRQ helpers for mask modification/restoration
Date: Thu, 15 Jan 2026 12:15:22 +0100
Message-ID: <3991854.44csPzL39Z@workhorse>
In-Reply-To: <20260112161252.09396916@fedora>

On Monday, 12 January 2026 16:12:52 Central European Standard Time Boris Brezillon wrote:
> On Mon, 12 Jan 2026 15:37:50 +0100
> Nicolas Frattaroli <nicolas.frattaroli@collabora.com> wrote:
> 
> > The current IRQ helpers do not guarantee mutual exclusion that covers
> > the entire transaction from accessing the mask member and modifying the
> > mask register.
> > 
> > This makes it hard, if not impossible, to implement mask modification
> > helpers that may change one of these outside the normal
> > suspend/resume/isr code paths.
> > 
> > Add a spinlock to struct panthor_irq that protects both the mask member
> > and register. Acquire it in all code paths that access these, but drop
> > it before processing the threaded handler function. Then, add the
> > aforementioned new helpers: enable_events and disable_events. They work
> > by ORing the given bits into the mask and clearing them from it (AND-NOT).
> > 
> > resume is changed to no longer have a mask passed, as pirq->mask is
> > supposed to be the user-requested mask now, rather than a mirror of the
> > INT_MASK register contents. Users of the resume helper are adjusted
> > accordingly, including a rather painful refactor in panthor_mmu.c.
> > 
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h |  72 +++++++--
> >  drivers/gpu/drm/panthor/panthor_fw.c     |   3 +-
> >  drivers/gpu/drm/panthor/panthor_gpu.c    |   2 +-
> >  drivers/gpu/drm/panthor/panthor_mmu.c    | 247 ++++++++++++++++---------------
> >  drivers/gpu/drm/panthor/panthor_pwr.c    |   2 +-
> >  5 files changed, 187 insertions(+), 139 deletions(-)
> > 
> [... snip ...]
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > index 198d59f42578..71b8318eab31 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -655,125 +655,6 @@ static void panthor_vm_release_as_locked(struct panthor_vm *vm)
> >  	vm->as.id = -1;
> >  }
> >  
> > -/**
> > - * panthor_vm_active() - Flag a VM as active
> > - * @vm: VM to flag as active.
> > - *
> > - * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > - *
> > - * Return: 0 on success, a negative error code otherwise.
> > - */
> > -int panthor_vm_active(struct panthor_vm *vm)
> > -{
> > -	struct panthor_device *ptdev = vm->ptdev;
> > -	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > -	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > -	int ret = 0, as, cookie;
> > -	u64 transtab, transcfg;
> > -
> > -	if (!drm_dev_enter(&ptdev->base, &cookie))
> > -		return -ENODEV;
> > -
> > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > -		goto out_dev_exit;
> > -
> > -	/* Make sure we don't race with lock/unlock_region() calls
> > -	 * happening around VM bind operations.
> > -	 */
> > -	mutex_lock(&vm->op_lock);
> > -	mutex_lock(&ptdev->mmu->as.slots_lock);
> > -
> > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > -		goto out_unlock;
> > -
> > -	as = vm->as.id;
> > -	if (as >= 0) {
> > -		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > -		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > -		 */
> > -		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > -			goto out_enable_as;
> > -
> > -		goto out_make_active;
> > -	}
> > -
> > -	/* Check for a free AS */
> > -	if (vm->for_mcu) {
> > -		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > -		as = 0;
> > -	} else {
> > -		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > -	}
> > -
> > -	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > -		struct panthor_vm *lru_vm;
> > -
> > -		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > -						  struct panthor_vm,
> > -						  as.lru_node);
> > -		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > -			ret = -EBUSY;
> > -			goto out_unlock;
> > -		}
> > -
> > -		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > -		as = lru_vm->as.id;
> > -
> > -		ret = panthor_mmu_as_disable(ptdev, as, true);
> > -		if (ret)
> > -			goto out_unlock;
> > -
> > -		panthor_vm_release_as_locked(lru_vm);
> > -	}
> > -
> > -	/* Assign the free or reclaimed AS to the FD */
> > -	vm->as.id = as;
> > -	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > -	ptdev->mmu->as.slots[as].vm = vm;
> > -
> > -out_enable_as:
> > -	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > -	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > -		   AS_TRANSCFG_PTW_RA |
> > -		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > -		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > -	if (ptdev->coherent)
> > -		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > -
> > -	/* If the VM is re-activated, we clear the fault. */
> > -	vm->unhandled_fault = false;
> > -
> > -	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > -	 * before enabling the AS.
> > -	 */
> > -	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
> > -		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
> > -		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
> > -		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
> > -		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> > -	}
> > -
> > -	/* The VM update is guarded by ::op_lock, which we take at the beginning
> > -	 * of this function, so we don't expect any locked region here.
> > -	 */
> > -	drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size > 0);
> > -	ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > -
> > -out_make_active:
> > -	if (!ret) {
> > -		refcount_set(&vm->as.active_cnt, 1);
> > -		list_del_init(&vm->as.lru_node);
> > -	}
> > -
> > -out_unlock:
> > -	mutex_unlock(&ptdev->mmu->as.slots_lock);
> > -	mutex_unlock(&vm->op_lock);
> > -
> > -out_dev_exit:
> > -	drm_dev_exit(cookie);
> > -	return ret;
> > -}
> > -
> >  /**
> >   * panthor_vm_idle() - Flag a VM idle
> >   * @vm: VM to flag as idle.
> > @@ -1762,6 +1643,128 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> >  }
> >  PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler);
> >  
> > +/**
> > + * panthor_vm_active() - Flag a VM as active
> > + * @vm: VM to flag as active.
> > + *
> > + * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > + *
> > + * Return: 0 on success, a negative error code otherwise.
> > + */
> > +int panthor_vm_active(struct panthor_vm *vm)
> > +{
> > +	struct panthor_device *ptdev = vm->ptdev;
> > +	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > +	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > +	int ret = 0, as, cookie;
> > +	u64 transtab, transcfg;
> > +	u32 fault_mask;
> > +
> > +	if (!drm_dev_enter(&ptdev->base, &cookie))
> > +		return -ENODEV;
> > +
> > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > +		goto out_dev_exit;
> > +
> > +	/* Make sure we don't race with lock/unlock_region() calls
> > +	 * happening around VM bind operations.
> > +	 */
> > +	mutex_lock(&vm->op_lock);
> > +	mutex_lock(&ptdev->mmu->as.slots_lock);
> > +
> > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > +		goto out_unlock;
> > +
> > +	as = vm->as.id;
> > +	if (as >= 0) {
> > +		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > +		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > +		 */
> > +		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > +			goto out_enable_as;
> > +
> > +		goto out_make_active;
> > +	}
> > +
> > +	/* Check for a free AS */
> > +	if (vm->for_mcu) {
> > +		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > +		as = 0;
> > +	} else {
> > +		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > +	}
> > +
> > +	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > +		struct panthor_vm *lru_vm;
> > +
> > +		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > +						  struct panthor_vm,
> > +						  as.lru_node);
> > +		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > +			ret = -EBUSY;
> > +			goto out_unlock;
> > +		}
> > +
> > +		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > +		as = lru_vm->as.id;
> > +
> > +		ret = panthor_mmu_as_disable(ptdev, as, true);
> > +		if (ret)
> > +			goto out_unlock;
> > +
> > +		panthor_vm_release_as_locked(lru_vm);
> > +	}
> > +
> > +	/* Assign the free or reclaimed AS to the FD */
> > +	vm->as.id = as;
> > +	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > +	ptdev->mmu->as.slots[as].vm = vm;
> > +
> > +out_enable_as:
> > +	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > +	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > +		   AS_TRANSCFG_PTW_RA |
> > +		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > +		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > +	if (ptdev->coherent)
> > +		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > +
> > +	/* If the VM is re-activated, we clear the fault. */
> > +	vm->unhandled_fault = false;
> > +
> > +	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > +	 * before enabling the AS.
> > +	 */
> > +	fault_mask = panthor_mmu_as_fault_mask(ptdev, as);
> > +	if (ptdev->mmu->as.faulty_mask & fault_mask) {
> > +		gpu_write(ptdev, MMU_INT_CLEAR, fault_mask);
> > +		ptdev->mmu->as.faulty_mask &= ~fault_mask;
> > +		panthor_mmu_irq_enable_events(&ptdev->mmu->irq, fault_mask);
> > +		panthor_mmu_irq_disable_events(&ptdev->mmu->irq, ptdev->mmu->as.faulty_mask);
> 
> Why do we need a _disable_events() here?

It's what the code originally did, as far as I can tell. It's not super
obvious because I had to move the function, but it did:

	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
	 * before enabling the AS.
	 */
	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
	}

So it writes `~(ptdev->mmu->as.faulty_mask & ~panthor_mmu_as_fault_mask(ptdev, as))` to
the mask register. Looking at it again, though, I don't think my new version
expands to the same thing at all. From what I can tell,
`ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);` clears
this AS's bit from the faulty mask, and then the negation in the write
re-enables that bit but clears all other bits? That can't be right. If
anything, if it wanted to re-enable interrupts, it should OR the bit into the
register contents, not overwrite them.
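
To sanity-check the bit arithmetic, here is a minimal userspace sketch, not
driver code. It assumes that panthor_mmu_as_fault_mask() reduces to one bit per
AS, and that the cached IRQ mask equals the complement of the faulty mask
before the call. Neither assumption is confirmed here, and all function names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one fault bit per address space, i.e. BIT(as). */
static uint32_t as_fault_mask(unsigned int as)
{
	return UINT32_C(1) << as;
}

/* Value the replaced code writes to MMU_INT_MASK. */
static uint32_t old_int_mask(uint32_t faulty_mask, unsigned int as)
{
	faulty_mask &= ~as_fault_mask(as);
	return ~faulty_mask;
}

/* Cached mask after enable_events(fault), then disable_events(faulty). */
static uint32_t new_int_mask(uint32_t cached_mask, uint32_t faulty_mask,
			     unsigned int as)
{
	uint32_t fault = as_fault_mask(as);

	faulty_mask &= ~fault;
	cached_mask |= fault;		/* enable_events: OR */
	cached_mask &= ~faulty_mask;	/* disable_events: AND-NOT */
	return cached_mask;
}

static void self_check(void)
{
	uint32_t faulty = 0x0a;		/* AS 1 and AS 3 have faulted */
	uint32_t cached = ~faulty;	/* assumed starting point */

	assert(old_int_mask(faulty, 1) == UINT32_C(0xfffffff7));
	assert(new_int_mask(cached, faulty, 1) == old_int_mask(faulty, 1));
	/* The AND-NOT step is a no-op here: OR alone gives the same mask. */
	assert((cached | as_fault_mask(1)) == new_int_mask(cached, faulty, 1));
}
```

Under those assumptions both versions end up with the same value, and the
trailing disable_events step changes nothing. If the cached mask starts from a
different value, the two results can differ.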

I feel a little better about the me from a few days ago when I can look at the
code with a fresh set of eyes and still not get what it's actually trying to do,
other than trusting the comment.

Also, genuinely, what is the point of `panthor_mmu_as_fault_mask`? Half of its
parameters are unused and its entire implementation is shorter than the function
name.

So yeah, I think I'll remove the disable_events here and double-check what this
code is actually supposed to do. The version I'm replacing seems very
non-obvious: I can't see how what it does corresponds to what the comment
says it does (clear the fault and re-enable its interrupt). This also plays
into your remark below.

> 
> > +	}
> > +
> > +	/* The VM update is guarded by ::op_lock, which we take at the beginning
> > +	 * of this function, so we don't expect any locked region here.
> > +	 */
> > +	drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size > 0);
> > +	ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > +
> > +out_make_active:
> > +	if (!ret) {
> > +		refcount_set(&vm->as.active_cnt, 1);
> > +		list_del_init(&vm->as.lru_node);
> > +	}
> > +
> > +out_unlock:
> > +	mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +	mutex_unlock(&vm->op_lock);
> > +
> > +out_dev_exit:
> > +	drm_dev_exit(cookie);
> > +	return ret;
> > +}
> > +
> > +
> 
> nit: one too many empty lines.
> 
> >  /**
> >   * panthor_mmu_suspend() - Suspend the MMU logic
> >   * @ptdev: Device.
> > @@ -1805,7 +1808,8 @@ void panthor_mmu_resume(struct panthor_device *ptdev)
> >  	ptdev->mmu->as.faulty_mask = 0;
> >  	mutex_unlock(&ptdev->mmu->as.slots_lock);
> >  
> > -	panthor_mmu_irq_resume(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> > +	panthor_mmu_irq_enable_events(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> 
> I don't think we should touch the events mask in the suspend/resume
> path. The way I see it, events should be:
> 
> - enabled when an AS is enabled (as_enable())
> - disabled when an AS is disabled (as_disable())
> - disabled when a VM has unhandled faults
> 
> Because making a VM active might imply evicting another VM, we might
> end up with disable+enable_events() pairs that could have been
> optimized into a NOP, but the overhead should be negligible, and if we
> have to rotate VMs on AS slots we've already lost anyway (in terms of
> performance).

Yep, I'll do that. I think I was naively trying to translate the code,
but since we now have pirq->mask preserved for us, this explicit juggling
of state can be removed.
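
As a rough illustration of why resume no longer needs a mask argument, here is
a small single-threaded userspace model. It is a sketch under assumptions, not
the patch's code: `struct model_irq` and the `model_irq_*` functions are
hypothetical, the spinlock is left out, and the rule that the register reads 0
while suspended is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; not the driver's struct panthor_irq. */
struct model_irq {
	uint32_t mask;		/* events requested by the IRQ's user */
	uint32_t int_mask;	/* stands in for the INT_MASK register */
	bool suspended;
};

/* Mirror the requested mask into the "register" unless suspended. */
static void model_irq_sync(struct model_irq *irq)
{
	irq->int_mask = irq->suspended ? 0 : irq->mask;
}

static void model_irq_enable_events(struct model_irq *irq, uint32_t events)
{
	irq->mask |= events;
	model_irq_sync(irq);
}

static void model_irq_disable_events(struct model_irq *irq, uint32_t events)
{
	irq->mask &= ~events;
	model_irq_sync(irq);
}

static void model_irq_suspend(struct model_irq *irq)
{
	irq->suspended = true;
	model_irq_sync(irq);
}

/* No mask parameter: the requested mask survives suspend. */
static void model_irq_resume(struct model_irq *irq)
{
	irq->suspended = false;
	model_irq_sync(irq);
}

static void model_irq_self_check(void)
{
	struct model_irq irq = { .suspended = true };

	model_irq_enable_events(&irq, 0x3);	/* AS 0 and AS 1 enabled */
	assert(irq.int_mask == 0);		/* still suspended */
	model_irq_resume(&irq);
	assert(irq.int_mask == 0x3);
	model_irq_disable_events(&irq, 0x2);	/* AS 1 disabled or faulted */
	model_irq_suspend(&irq);
	model_irq_resume(&irq);
	assert(irq.int_mask == 0x1);		/* AS 1 stays masked */
}
```

In this model the enable/disable calls would sit in the AS enable/disable and
unhandled-fault paths, and suspend/resume never touch the requested mask.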

> 
> > +	panthor_mmu_irq_resume(&ptdev->mmu->irq);
> >  }
> >  
> >  /**
> > @@ -1859,7 +1863,8 @@ void panthor_mmu_post_reset(struct panthor_device *ptdev)
> >  
> >  	mutex_unlock(&ptdev->mmu->as.slots_lock);
> >  
> > -	panthor_mmu_irq_resume(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> > +	panthor_mmu_irq_enable_events(&ptdev->mmu->irq, panthor_mmu_fault_mask(ptdev, ~0));
> 
> Same here, I don't think we need to change the event mask.
> 
> > +	panthor_mmu_irq_resume(&ptdev->mmu->irq);
> >  
> >  	/* Restart the VM_BIND queues. */
> >  	mutex_lock(&ptdev->mmu->vm.lock);
> > diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> > index 57cfc7ce715b..ed3b2b4479ca 100644
> > --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> > +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> > @@ -545,5 +545,5 @@ void panthor_pwr_resume(struct panthor_device *ptdev)
> >  	if (!ptdev->pwr)
> >  		return;
> >  
> > -	panthor_pwr_irq_resume(&ptdev->pwr->irq, PWR_INTERRUPTS_MASK);
> > +	panthor_pwr_irq_resume(&ptdev->pwr->irq);
> >  }
> > 
> 
> 

Thanks for the review.

Kind regards,
Nicolas Frattaroli

Thread overview: 10+ messages
2026-01-12 14:37 [PATCH v8 0/4] Add a few tracepoints to panthor Nicolas Frattaroli
2026-01-12 14:37 ` [PATCH v8 1/4] drm/panthor: Rework panthor_irq::suspended into panthor_irq::state Nicolas Frattaroli
2026-01-14 16:07   ` Steven Price
2026-01-12 14:37 ` [PATCH v8 2/4] drm/panthor: Extend IRQ helpers for mask modification/restoration Nicolas Frattaroli
2026-01-12 15:12   ` Boris Brezillon
2026-01-15 11:15     ` Nicolas Frattaroli [this message]
2026-01-15 11:30       ` Boris Brezillon
2026-01-13 12:23   ` Boris Brezillon
2026-01-12 14:37 ` [PATCH v8 3/4] drm/panthor: Add tracepoint for hardware utilisation changes Nicolas Frattaroli
2026-01-12 14:37 ` [PATCH v8 4/4] drm/panthor: Add gpu_job_irq tracepoint Nicolas Frattaroli
