From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4F4936215E
	for <linux-kernel@vger.kernel.org>; Thu, 15 Jan 2026 11:31:00 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.251.105.195
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1768476662; cv=none; b=IEHVtoYQC/y3A7/t70zDuDNnW9jdWXIre2+3oYGBCcS9VoCXEC3fL8lkDB4k8gON/ofRyasNRlfPcf5gaNQCmYuyQMMhyRN++c9JXJmmwsO+XwoP8YnE6yffIL7rv2aYQTmflbXa0DQg4i/59c8g+Hb1P+7itjVgzONZec+gSDI=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1768476662; c=relaxed/simple;
	bh=4BSQh18x4p2LA8T7wpkFltsvFgNZGSy+EpUsFfN67Vw=;
	h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type; b=GqQD50yOXtEn4caaOKXBchumQbixi8YT0I3Nq5c/B1/vYvdfNl4RVRWsR1s0IuhiKs/qsClFC5jMFLVd1NrH0bOc9ETv8xHNrJ4neOIaMyZ4LTww/0go/yuotBIpVTSQPqrm9JbECsffoRTajBzSPhoSWqWv/B9DSxg8iNDAu+I=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=collabora.com; spf=pass smtp.mailfrom=collabora.com; dkim=pass (2048-bit key) header.d=collabora.com header.i=@collabora.com header.b=i4WEFWPY; arc=none smtp.client-ip=148.251.105.195
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=collabora.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=collabora.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=collabora.com header.i=@collabora.com header.b="i4WEFWPY"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com;
	s=mail; t=1768476659;
	bh=4BSQh18x4p2LA8T7wpkFltsvFgNZGSy+EpUsFfN67Vw=;
	h=Date:From:To:Cc:Subject:In-Reply-To:References:From;
	b=i4WEFWPYM5px7QUWmkefLoc9pM9YwR9pcZ3xlmaHyt5t7r8Zt2FcSXRgXBMpNrgpJ
	 mKQS9eyOaqX5LNGtprCXpJ2MagfkScFogfjrh1urCXuxUq2HN607eruLJgcxTzo5P6
	 ToRoN0lc8GfhBhbD2nZuEIML9DtwCC1zCCpDXbwmehxoxBPkw2P3q6FgIP3pqy9tA2
	 PLOGLhCeWHD3zhQvqw+uAK6YVE3ZoqZlwYAmXVYtP0HjR2b4Z8fnf9Ign/Pv8m2YFz
	 PDvIw6VLECsyb7HDbaswqg/VglSA9oqFODSUeki0QpkhuNQQE+S390xOvh7Olm0IHh
	 GraxFL1aRQSjw==
Received: from fedora (unknown [IPv6:2a01:e0a:2c:6930:d919:a6e:5ea1:8a9f])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange ECDHE (prime256v1) server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	(Authenticated sender: bbrezillon)
	by bali.collaboradmins.com (Postfix) with ESMTPSA id 8348217E01E7;
	Thu, 15 Jan 2026 12:30:58 +0100 (CET)
Date: Thu, 15 Jan 2026 12:30:52 +0100
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Cc: Steven Price <steven.price@arm.com>, Liviu Dudau <liviu.dudau@arm.com>,
 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>, Maxime Ripard
 <mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>, David Airlie
 <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>, Chia-I Wu
 <olvaffe@gmail.com>, Karunika Choo <karunika.choo@arm.com>,
 kernel@collabora.com, linux-kernel@vger.kernel.org,
 dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v8 2/4] drm/panthor: Extend IRQ helpers for mask
 modification/restoration
Message-ID: <20260115123052.5c30e714@fedora>
In-Reply-To: <3991854.44csPzL39Z@workhorse>
References: <20260112-panthor-tracepoints-v8-0-63efcb421d22@collabora.com>
	<20260112-panthor-tracepoints-v8-2-63efcb421d22@collabora.com>
	<20260112161252.09396916@fedora>
	<3991854.44csPzL39Z@workhorse>
Organization: Collabora
X-Mailer: Claws Mail 4.3.1 (GTK 3.24.51; x86_64-redhat-linux-gnu)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 15 Jan 2026 12:15:22 +0100
Nicolas Frattaroli <nicolas.frattaroli@collabora.com> wrote:

> On Monday, 12 January 2026 16:12:52 Central European Standard Time Boris Brezillon wrote:
> > On Mon, 12 Jan 2026 15:37:50 +0100
> > Nicolas Frattaroli <nicolas.frattaroli@collabora.com> wrote:
> >   
> > > The current IRQ helpers do not guarantee mutual exclusion that covers
> > > the entire transaction from accessing the mask member and modifying the
> > > mask register.
> > > 
> > > This makes it hard, if not impossible, to implement mask modification
> > > helpers that may change one of these outside the normal
> > > suspend/resume/isr code paths.
> > > 
> > > Add a spinlock to struct panthor_irq that protects both the mask member
> > > and register. Acquire it in all code paths that access these, but drop
> > > it before processing the threaded handler function. Then, add the
> > > aforementioned new helpers: enable_events, and disable_events. They work
> > > by ORing and NANDing the mask bits.
> > > 
> > > resume is changed to no longer have a mask passed, as pirq->mask is
> > > supposed to be the user-requested mask now, rather than a mirror of the
> > > INT_MASK register contents. Users of the resume helper are adjusted
> > > accordingly, including a rather painful refactor in panthor_mmu.c.
> > > 
> > > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_device.h |  72 +++++++--
> > >  drivers/gpu/drm/panthor/panthor_fw.c     |   3 +-
> > >  drivers/gpu/drm/panthor/panthor_gpu.c    |   2 +-
> > >  drivers/gpu/drm/panthor/panthor_mmu.c    | 247 ++++++++++++++++---------------
> > >  drivers/gpu/drm/panthor/panthor_pwr.c    |   2 +-
> > >  5 files changed, 187 insertions(+), 139 deletions(-)
> > >   
> > [... snip ...]  
> > > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > index 198d59f42578..71b8318eab31 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > @@ -655,125 +655,6 @@ static void panthor_vm_release_as_locked(struct panthor_vm *vm)
> > >  	vm->as.id = -1;
> > >  }
> > >  
> > > -/**
> > > - * panthor_vm_active() - Flag a VM as active
> > > - * @vm: VM to flag as active.
> > > - *
> > > - * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > > - *
> > > - * Return: 0 on success, a negative error code otherwise.
> > > - */
> > > -int panthor_vm_active(struct panthor_vm *vm)
> > > -{
> > > -	struct panthor_device *ptdev = vm->ptdev;
> > > -	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > > -	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > > -	int ret = 0, as, cookie;
> > > -	u64 transtab, transcfg;
> > > -
> > > -	if (!drm_dev_enter(&ptdev->base, &cookie))
> > > -		return -ENODEV;
> > > -
> > > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > -		goto out_dev_exit;
> > > -
> > > -	/* Make sure we don't race with lock/unlock_region() calls
> > > -	 * happening around VM bind operations.
> > > -	 */
> > > -	mutex_lock(&vm->op_lock);
> > > -	mutex_lock(&ptdev->mmu->as.slots_lock);
> > > -
> > > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > -		goto out_unlock;
> > > -
> > > -	as = vm->as.id;
> > > -	if (as >= 0) {
> > > -		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > > -		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > > -		 */
> > > -		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > > -			goto out_enable_as;
> > > -
> > > -		goto out_make_active;
> > > -	}
> > > -
> > > -	/* Check for a free AS */
> > > -	if (vm->for_mcu) {
> > > -		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > > -		as = 0;
> > > -	} else {
> > > -		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > > -	}
> > > -
> > > -	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > > -		struct panthor_vm *lru_vm;
> > > -
> > > -		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > > -						  struct panthor_vm,
> > > -						  as.lru_node);
> > > -		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > > -			ret = -EBUSY;
> > > -			goto out_unlock;
> > > -		}
> > > -
> > > -		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > > -		as = lru_vm->as.id;
> > > -
> > > -		ret = panthor_mmu_as_disable(ptdev, as, true);
> > > -		if (ret)
> > > -			goto out_unlock;
> > > -
> > > -		panthor_vm_release_as_locked(lru_vm);
> > > -	}
> > > -
> > > -	/* Assign the free or reclaimed AS to the FD */
> > > -	vm->as.id = as;
> > > -	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > > -	ptdev->mmu->as.slots[as].vm = vm;
> > > -
> > > -out_enable_as:
> > > -	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > > -	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > > -		   AS_TRANSCFG_PTW_RA |
> > > -		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > > -		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > > -	if (ptdev->coherent)
> > > -		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > > -
> > > -	/* If the VM is re-activated, we clear the fault. */
> > > -	vm->unhandled_fault = false;
> > > -
> > > -	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > > -	 * before enabling the AS.
> > > -	 */
> > > -	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
> > > -		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
> > > -		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
> > > -		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
> > > -		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> > > -	}
> > > -
> > > -	/* The VM update is guarded by ::op_lock, which we take at the beginning
> > > -	 * of this function, so we don't expect any locked region here.
> > > -	 */
> > > -	drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size > 0);
> > > -	ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > > -
> > > -out_make_active:
> > > -	if (!ret) {
> > > -		refcount_set(&vm->as.active_cnt, 1);
> > > -		list_del_init(&vm->as.lru_node);
> > > -	}
> > > -
> > > -out_unlock:
> > > -	mutex_unlock(&ptdev->mmu->as.slots_lock);
> > > -	mutex_unlock(&vm->op_lock);
> > > -
> > > -out_dev_exit:
> > > -	drm_dev_exit(cookie);
> > > -	return ret;
> > > -}
> > > -
> > >  /**
> > >   * panthor_vm_idle() - Flag a VM idle
> > >   * @vm: VM to flag as idle.
> > > @@ -1762,6 +1643,128 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> > >  }
> > >  PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler);
> > >  
> > > +/**
> > > + * panthor_vm_active() - Flag a VM as active
> > > + * @vm: VM to flag as active.
> > > + *
> > > + * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > > + *
> > > + * Return: 0 on success, a negative error code otherwise.
> > > + */
> > > +int panthor_vm_active(struct panthor_vm *vm)
> > > +{
> > > +	struct panthor_device *ptdev = vm->ptdev;
> > > +	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > > +	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > > +	int ret = 0, as, cookie;
> > > +	u64 transtab, transcfg;
> > > +	u32 fault_mask;
> > > +
> > > +	if (!drm_dev_enter(&ptdev->base, &cookie))
> > > +		return -ENODEV;
> > > +
> > > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > +		goto out_dev_exit;
> > > +
> > > +	/* Make sure we don't race with lock/unlock_region() calls
> > > +	 * happening around VM bind operations.
> > > +	 */
> > > +	mutex_lock(&vm->op_lock);
> > > +	mutex_lock(&ptdev->mmu->as.slots_lock);
> > > +
> > > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > +		goto out_unlock;
> > > +
> > > +	as = vm->as.id;
> > > +	if (as >= 0) {
> > > +		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > > +		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > > +		 */
> > > +		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > > +			goto out_enable_as;
> > > +
> > > +		goto out_make_active;
> > > +	}
> > > +
> > > +	/* Check for a free AS */
> > > +	if (vm->for_mcu) {
> > > +		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > > +		as = 0;
> > > +	} else {
> > > +		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > > +	}
> > > +
> > > +	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > > +		struct panthor_vm *lru_vm;
> > > +
> > > +		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > > +						  struct panthor_vm,
> > > +						  as.lru_node);
> > > +		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > > +			ret = -EBUSY;
> > > +			goto out_unlock;
> > > +		}
> > > +
> > > +		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > > +		as = lru_vm->as.id;
> > > +
> > > +		ret = panthor_mmu_as_disable(ptdev, as, true);
> > > +		if (ret)
> > > +			goto out_unlock;
> > > +
> > > +		panthor_vm_release_as_locked(lru_vm);
> > > +	}
> > > +
> > > +	/* Assign the free or reclaimed AS to the FD */
> > > +	vm->as.id = as;
> > > +	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > > +	ptdev->mmu->as.slots[as].vm = vm;
> > > +
> > > +out_enable_as:
> > > +	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > > +	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > > +		   AS_TRANSCFG_PTW_RA |
> > > +		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > > +		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > > +	if (ptdev->coherent)
> > > +		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > > +
> > > +	/* If the VM is re-activated, we clear the fault. */
> > > +	vm->unhandled_fault = false;
> > > +
> > > +	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > > +	 * before enabling the AS.
> > > +	 */
> > > +	fault_mask = panthor_mmu_as_fault_mask(ptdev, as);
> > > +	if (ptdev->mmu->as.faulty_mask & fault_mask) {
> > > +		gpu_write(ptdev, MMU_INT_CLEAR, fault_mask);
> > > +		ptdev->mmu->as.faulty_mask &= ~fault_mask;
> > > +		panthor_mmu_irq_enable_events(&ptdev->mmu->irq, fault_mask);
> > > +		panthor_mmu_irq_disable_events(&ptdev->mmu->irq, ptdev->mmu->as.faulty_mask);  
> > 
> > Why do we need a _disable_events() here?  
> 
> It's what the code originally did as far as I can tell. Not super obvious
> because I had to move the function, but it did:
> 
> 	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> 	 * before enabling the AS.
> 	 */
> 	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
> 		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
> 		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
> 		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
> 		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> 	}
> 
> We write `~(ptdev->mmu->as.faulty_mask & ~panthor_mmu_as_fault_mask(ptdev, as))` to
> the mask register. Though now looking at it again, I don't think my new version
> expands to the same thing at all, since
> `ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);` is trying
> to clear the fault mask of the one bit this translates to from what I can tell,
> and then the negation in the write re-enables it but clears all other bits? That
> can't be right. If anything if it wanted to re-enable interrupts it should OR
> the register contents, not overwrite them.

It's more an "enable anything that's not faulty" than a "re-enable
events for this specific AS". So all we were doing is keep track of the
faulty+not-acknowledged mask, and then clearing bits in this mask as AS
slots get acknowledged or recycled.

> 
> I feel a little better about the me from a few days ago when I can look at the
> code with a fresh set of eyes and still not get what it's actually trying to do,
> other than trusting the comment.
> 
> Also, genuinely what is the point of `panthor_mmu_as_fault_mask`? Half of its
> parameters are unused and its entire implementation is shorter than the function
> name.

panthor_mmu_as_fault_mask() is only here because at some point I
intended to port JM HW support to panthor, the fault mask for an AS
there is `BIT(as) | BIT(as + 16)` instead of just `BIT(as)` on new HW.
So the plan was for panthor_mmu_as_fault_mask() to abstract that for us.