From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6E354C83F17 for ; Tue, 15 Jul 2025 15:04:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E32F810E227; Tue, 15 Jul 2025 15:04:08 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (1024-bit key; unprotected) header.d=collabora.com header.i=caterina.shablia@collabora.com header.b="RXdpNAOs"; dkim-atps=neutral X-Greylist: delayed 904 seconds by postgrey-1.36 at gabe; Tue, 15 Jul 2025 15:04:08 UTC Received: from sender4-op-o14.zoho.com (sender4-op-o14.zoho.com [136.143.188.14]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2F33E10E227 for ; Tue, 15 Jul 2025 15:04:08 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; t=1752590929; cv=none; d=zohomail.com; s=zohoarc; b=NLKS4l7nztxW21MIa6vh82w6apF0HOua5Qkdoh9PRNtE93kYvnEgj+wruyJ2hOTsdEdd/Pi4OFHAL5yKf3dTLjxt+Ws43SgaEmkbvBIIDB1csFz/8NmRArN6LT7G7MS3CQsb0sdaMhiFdZVSpsMFR7xv0PhaCOYV/HqHwv1e6bI= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1752590929; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; bh=xFVZ1c6WnKw4ORI58frXJc1W+Rm8JeEJu25qU2qDMtE=; b=QG8/HMyAVQpYVD6I4rKCEOp8v2DJo/57nRsUQf2YWLcnppaetyxBaidDDa8tr53Qs7JyjpHOsGJVMnPwovtxjQCpe/bEWa0ILQXXmDk3t2vV3HgES320Jq8LwVhHh14Op5ILMRk/f8JjGKUUIyo474N0IkzYM8t8tL6z7Dr0TIE= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=collabora.com; spf=pass smtp.mailfrom=caterina.shablia@collabora.com; dmarc=pass header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1752590929; s=zohomail; d=collabora.com; i=caterina.shablia@collabora.com; h=From:From:To:To:Cc:Cc:Subject:Subject:Date:Date:Message-ID:In-Reply-To:References:MIME-Version:Content-Transfer-Encoding:Content-Type:Message-Id:Reply-To; bh=xFVZ1c6WnKw4ORI58frXJc1W+Rm8JeEJu25qU2qDMtE=; b=RXdpNAOshXyaTWeIuqgynTqFV7bbnyYpgn3AIS4gAb+shAz8CyWp9Al8KnQ3MCxm up640mI+WcKTnZFnP0p+Ld7muybq/DuMTTRZ0UfeBPcktFaxzViF/U76+Smdxh8CHQF IgT4l8Ek9N/dDC2Lmo3z+tCPaLVTadGjDmNckbj8= Received: by mx.zohomail.com with SMTPS id 1752590927616603.6004449742125; Tue, 15 Jul 2025 07:48:47 -0700 (PDT) From: Caterina Shablia To: Adrian Larumbe Cc: Boris Brezillon , Steven Price , Liviu Dudau , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Simona Vetter , kernel@collabora.com, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v2 1/7] drm/panthor: Add support for atomic page table updates Date: Tue, 15 Jul 2025 16:48:43 +0200 Message-ID: <2696295.fDdHjke4Dd@xps> In-Reply-To: <5wxljw27mc4f2i6ag54upmpjxjj5odnd6d57fiiozpb3hjl4zi@okwx34aj56b6> References: <20250703152908.16702-2-caterina.shablia@collabora.com> <20250703152908.16702-3-caterina.shablia@collabora.com> <5wxljw27mc4f2i6ag54upmpjxjj5odnd6d57fiiozpb3hjl4zi@okwx34aj56b6> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" El viernes, 11 de julio de 2025 21:11:15 (hora de verano de Europa central)= ,=20 Adrian Larumbe escribi=C3=B3: > Hi Caterina, >=20 > On 03.07.2025 15:28, Caterina Shablia wrote: > > From: Boris Brezillon > >=20 > > Move the lock/flush_mem operations around the gpuvm_sm_map() calls so > > we can implement true atomic page updates, where any access in the > > locked range done by the GPU has to wait for the page table updates > > to land before proceeding. > >=20 > > This is needed for vkQueueBindSparse(), so we can replace the dummy > > page mapped over the entire object by actual BO backed pages in an atom= ic > > way. > >=20 > > Signed-off-by: Boris Brezillon > > Signed-off-by: Caterina Shablia > > --- > >=20 > > drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++-- > > 1 file changed, 62 insertions(+), 3 deletions(-) > >=20 > > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c > > b/drivers/gpu/drm/panthor/panthor_mmu.c index b39ea6acc6a9..1e58948587a9 > > 100644 > > --- a/drivers/gpu/drm/panthor/panthor_mmu.c > > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c > > @@ -387,6 +387,15 @@ struct panthor_vm { > >=20 > > * flagged as faulty as a result. > > */ > > =09 > > bool unhandled_fault; > >=20 > > + > > + /** @locked_region: Information about the currently locked region > > currently. */ > Nit: delete second 'current' >=20 > > + struct { > > + /** @locked_region.start: Start of the locked region.=20 */ > > + u64 start; > > + > > + /** @locked_region.size: Size of the locked region. */ > > + u64 size; > > + } locked_region; > >=20 > > }; > > =20 > > /** > >=20 > > @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm) > >=20 > > } > > =09 > > ret =3D panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab,=20 transcfg, > > vm->memattr);>=20 > > + if (!ret && vm->locked_region.size) { > > + lock_region(ptdev, vm->as.id, vm->locked_region.start, > > vm->locked_region.size); > Why do we need to lock the region after enabling a new AS? See explanation further below for what happens in panthor_vm_lock_region. I= n=20 short, we might poke vm_bind on a vm that's not active, locking a region.=20 Because vm is not active, there's nothing for us to do to lock the region,= =20 beyond just specifying the region itself. If a vm in the meanwhile becomes= =20 active while some region is locked, we'll need to inform the mmu about the= =20 region that's supposed to be locked.=20 >=20 > > + ret =3D wait_ready(ptdev, vm->as.id); > > + } > >=20 > > out_make_active: > > if (!ret) { > >=20 > > @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm > > *vm, u64 iova, u64 size)>=20 > > struct io_pgtable_ops *ops =3D vm->pgtbl_ops; > > u64 offset =3D 0; > >=20 > > + drm_WARN_ON(&ptdev->base, > > + (iova < vm->locked_region.start) || > > + (iova + size > vm->locked_region.start + vm- >locked_region.size)); > >=20 > > drm_dbg(&ptdev->base, "unmap: as=3D%d, iova=3D%llx, len=3D%llx", vm- >as.id, > > iova, size); > > =09 > > while (offset < size) { > >=20 > > @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct panthor_= vm > > *vm, u64 iova, u64 size)>=20 > > iova + offset + unmapped_sz, > > iova + offset + pgsize * pgcount, > > iova, iova + size); > >=20 > > - panthor_vm_flush_range(vm, iova, offset +=20 unmapped_sz); >=20 > We've removed all calls to panthor_vm_flush_range(), but I don't see it > being done in panthor_vm_exec_op() before the region is unlocked. It's > effectively become dead code. >=20 > However, even if we did 'panthor_vm_flush_range(vm, op->va.addr, > op->va.range);' in panthor_vm_exec_op() right before we unlock the region, > we wouldn't be dealing well with the case in which only a partial unmap > happens, but maybe this isn't a big deal either. panthor_vm_unlock_region will do the necessary flushes. >=20 > > return -EINVAL; > > =09 > > } > > offset +=3D unmapped_sz; > > =09 > > } > >=20 > > - return panthor_vm_flush_range(vm, iova, size); > > + return 0; > >=20 > > } > > =20 > > static int > >=20 > > @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 io= va, > > int prot,>=20 > > if (!size) > > =09 > > return 0; > >=20 > > + drm_WARN_ON(&ptdev->base, > > + (iova < vm->locked_region.start) || > > + (iova + size > vm->locked_region.start + vm- >locked_region.size)); > > + > >=20 > > for_each_sgtable_dma_sg(sgt, sgl, count) { > > =09 > > dma_addr_t paddr =3D sg_dma_address(sgl); > > size_t len =3D sg_dma_len(sgl); > >=20 > > @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 io= va, > > int prot,>=20 > > offset =3D 0; > > =09 > > } > >=20 > > - return panthor_vm_flush_range(vm, start_iova, iova - start_iova); > >=20 > >=20 > > + return 0; > >=20 > > } > > =20 > > static int flags_to_prot(u32 flags) > >=20 > > @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct > > panthor_device *ptdev,>=20 > > } > > =20 > > } > >=20 > > +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 > > size) +{ > > + struct panthor_device *ptdev =3D vm->ptdev; > > + int ret; > > + > > + mutex_lock(&ptdev->mmu->as.slots_lock); > > + drm_WARN_ON(&ptdev->base, vm->locked_region.start || > > vm->locked_region.size); + vm->locked_region.start =3D start; > > + vm->locked_region.size =3D size; > > + if (vm->as.id >=3D 0) { >=20 > I guess VM bind operations don't increase the active_cnt of a VM, so we > might try to be mapping addresses from UM while no active groups are > submitting jobs targetting this VM? vm_bind, beyond locking the region, does not involve the gpu and is perform= ed=20 entirely on host. If we were to try vm_binding while active_cnt is 0, we wo= n't=20 poke the mmu to do any locking. Instead, when panthor_vm_active is poked to= =20 make the vm active, we'll poke the mmu to actually lock the region supposed= to=20 be locked. > > + lock_region(ptdev, vm->as.id, start, size); > > + ret =3D wait_ready(ptdev, vm->as.id); >=20 > I've noticed in mmu_hw_do_operation_locked() we don't do wait_ready() aft= er > locking the region. Is it missing or else maybe waiting for the AS to be > locked isn't necessary here? > > + } > > + mutex_unlock(&ptdev->mmu->as.slots_lock); > > + > > + return ret; > > +} > > + > > +static void panthor_vm_unlock_region(struct panthor_vm *vm) > > +{ > > + struct panthor_device *ptdev =3D vm->ptdev; > > + > > + mutex_lock(&ptdev->mmu->as.slots_lock); > > + if (vm->as.id >=3D 0) { > > + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM); >=20 > I guess this is why we no longer need to call panthor_vm_flush_range() ri= ght > before this function. Does AS_COMMAND_FLUSH_MEM only flush the locked > region? Also, why not AS_COMMAND_FLUSH_PT instead? IIUC FLUSH_MEM is less heavy of the two hammers. The doc says that it only= =20 invalidates MMU caches when used without a previous LOCK, so I guess it doe= s=20 something smarter in this case. FLUSH_PT invalidates MMU caches=20 unconditionally. > > + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm- >as.id)); > > + } > > + vm->locked_region.start =3D 0; > > + vm->locked_region.size =3D 0; > > + mutex_unlock(&ptdev->mmu->as.slots_lock); > > +} > > + > >=20 > > static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 > > status) { > > =20 > > bool has_unhandled_faults =3D false; > >=20 > > @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct > > panthor_vm_op_ctx *op,>=20 > > mutex_lock(&vm->op_lock); > > vm->op_ctx =3D op; > >=20 > > + > > + ret =3D panthor_vm_lock_region(vm, op->va.addr, op->va.range); > > + if (ret) > > + goto out; > > + > >=20 > > switch (op_type) { > > =09 > > case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP: > > if (vm->unusable) { > >=20 > > @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct > > panthor_vm_op_ctx *op,>=20 > > break; > > =09 > > } > >=20 > > + panthor_vm_unlock_region(vm); > > + > >=20 > > +out: > > if (ret && flag_vm_unusable_on_failure) > > =09 > > vm->unusable =3D true; > >=20 > > -- > > 2.47.2 >=20 > Adrian Larumbe