From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 662C7C4707B for ; Mon, 15 Jan 2024 14:18:23 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CD0E010E2B8; Mon, 15 Jan 2024 14:18:22 +0000 (UTC) Received: from madrid.collaboradmins.com (madrid.collaboradmins.com [46.235.227.194]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9327810E2B8 for ; Mon, 15 Jan 2024 14:18:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1705328300; bh=z6KcxdknYsSFxGXKIVsllpOIvVGKC47lgB87/ecGrAc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=hbcZS9+GVdHRXrDn5zFilkKoGe/wz9Un8A6DldbXHgDSf3F5jW03spi4g+UeM9s1j Q+pBh/TWMcpJlB3q6wEy7VXqKYTJtOalK3LGzcCNlgITfDJZ7EOs5OMEsoX/oPBgwU 5ajYj4WJ+rCSaEhhi0E1bv5BSmWvuSeKligdsuos0NBRsbwP1XIYwjJXIZ9qIBl5EQ rbV7C+AzMdUX42vFNHodFAw0LgvNvpBAqUEbWqYURNaDJHEk6SQxo3whjhBi07So4a ogw5Hx/TLhHwxj8I+6vtcP3rYOxhxziF8NethYeciszzty/BWArM3cH2Nmz0CvbtOE hUG42xxPXCKMw== Received: from localhost (cola.collaboradmins.com [195.201.22.229]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by madrid.collaboradmins.com (Postfix) with ESMTPSA id 683413781F80; Mon, 15 Jan 2024 14:18:19 +0000 (UTC) Date: Mon, 15 Jan 2024 15:18:18 +0100 From: Boris Brezillon To: Faith Ekstrand Subject: Re: [PATCH v3 00/14] drm: Add a driver for CSF-based Mali GPUs Message-ID: <20240115151818.521622ca@collabora.com> In-Reply-To: References: <20231204173313.2098733-1-boris.brezillon@collabora.com> <08C0F75B-292B-444D-A509-1451B08D0059@gmail.com> <20231211095206.13e3d7ba@collabora.com> Organization: Collabora X-Mailer: Claws Mail 4.2.0 (GTK 3.24.38; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Neil Armstrong , Tatsuyuki Ishi , Nicolas Boichat , Daniel Stone , Liviu Dudau , dri-devel@lists.freedesktop.org, Steven Price , =?UTF-8?B?Q2zDqW1lbnQgUMOpcm9u?= , kernel@collabora.com, Robin Murphy , "Marty E . Plummer" Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Hi Faith, Sorry for the late reply, I only got back to panthor very recently. On Mon, 11 Dec 2023 12:18:04 -0600 Faith Ekstrand wrote: > On Mon, 2023-12-11 at 09:52 +0100, Boris Brezillon wrote: > > Hi, > >=20 > > On Sun, 10 Dec 2023 13:58:51 +0900 > > Tatsuyuki Ishi wrote: > > =20 > > > > On Dec 5, 2023, at 2:32, Boris Brezillon > > > > wrote: > > > >=20 > > > > Hello, > > > >=20 > > > > This is the 3rd version of the kernel driver for Mali CSF-based > > > > GPUs. > > > >=20 > > > > With all the DRM dependencies being merged (drm-sched single > > > > entity and > > > > drm_gpuvm), I thought now was a good time to post a new version. > > > > Note > > > > that the iommu series we depend on [1] has been merged recently. > > > > The > > > > only remaining dependency that hasn't been merged yet is this > > > > rather > > > > trival drm_gpuvm [2] patch. > > > >=20 > > > > As for v2, I pushed a branch based on drm-misc-next and > > > > containing > > > > all the dependencies that are not yet available in drm-misc-next > > > > here[3], and another [4] containing extra patches to have things > > > > working on rk3588. The CSF firmware binary can be found here[5], > > > > and > > > > should be placed under > > > > /lib/firmware/arm/mali/arch10.8/mali_csffw.bin. > > > >=20 > > > > The mesa MR adding v10 support on top of panthor is available > > > > here [6]. > > > >=20 > > > > Regarding the GPL2+MIT relicensing, Liviu did an audit and found > > > > two > > > > more people that I didn't spot initially: Cl=C3=A9ment P=C3=A9ron f= or the > > > > devfreq > > > > code, and Alexey Sheplyakov for some bits in panthor_gpu.c. Both > > > > are > > > > Cc-ed on the relevant patches. The rest of the code is either > > > > new, or > > > > covered by the Linaro, Arm and Collabora acks. > > > >=20 > > > > And here is a non-exhaustive changelog, check each commit for a > > > > detailed > > > > changelog. > > > >=20 > > > > v3; > > > > - Quite a few changes at the MMU/sched level to make the fix some > > > > =C2=A0race conditions and deadlocks > > > > - Addition of the a sync-only VM_BIND operation (to support > > > > =C2=A0vkQueueSparseBind with zero commands).=C2=A0 =20 > > >=20 > > > Hi Boris, > > >=20 > > > Just wanted to point out that vkQueueBindSparse's semantics is > > > rather different > > > from vkQueueSubmit when it comes to synchronization.=C2=A0 In short, > > > vkQueueBindSparse does not operate on a particular timeline (aka > > > scheduling > > > queue / drm_sched_entity).=C2=A0 The property of following a timeline > > > order is known > > > as =E2=80=9Csubmission order=E2=80=9D [1] in Vulkan, and applies to v= kQueueSubmit > > > only and not > > > vkQueueBindSparse. =20 > >=20 > > Hm, okay. I really though the same ordering guarantees applied to > > sparse binding queues too, as the spec [1] says > >=20 > > " > > Batches begin execution in the order they appear in pBindInfo, but > > may complete out of order. > > " =20 >=20 > Right. So this is something where the Vulkan spec isn't terribly clear > and I think I need to go file a spec bug.=C2=A0 I'm fairly sure that the > intent is that bind operations MAY complete out-of-order but are not > required to complete out-of-order.=C2=A0 In other words, I'm fairly sure > that implementations are allowed but not required to insert extra > dependencies that force some amount of ordering.=C2=A0 We take advantage = of > this in Mesa today to properly implement vkQueueWaitIdle() on sparse > binding queues. Do I get it correctly that, for Mesa's generic vk_common_QueueWaitIdle() implementation to work correctly, we not only need to guarantee in-order submission, but also in-order completion on the queue. I mean, that's no problem for panvk/panthor, because that's how it's implemented anyway, but I didn't realize this constraint existed until you mentioned it. > This is also a requirement of Windows WDDM2 as far as > I can tell so I'd be very surprised if we disallowed it in the spec. >=20 > That said, that's all very unclear and IDK where you'd get that from > the spec.=C2=A0 I'm going to go file a spec bug so we can get this sorted > out. >=20 > ~Faith >=20 >=20 > > which means things are submitted in order inside a vkQueueSparseBind > > context, so I was expecting the submission ordering guarantee to > > apply > > across vkQueueSparseBind() calls on the same queue too. Just want to > > mention that all kernel implementations I have seen so far assume > > VM_BIND submissions on a given queue are serialized (that's how > > drm_sched works, and Xe, Nouveau and Panthor are basing their VM_BIND > > implemenation on drm_sched). > >=20 > > Maybe Faith, or anyone deeply involved in the Vulkan specification, > > can > > confirm that submission ordering guarantees are relaxed on sparse > > binding queues. > > =20 > > >=20 > > > Hence, an implementation that takes full advantage of Vulkan > > > semantics would > > > essentially have an infinite amount of VM_BIND contexts. =20 > >=20 > > Uh, that's definitely not how I understood it initially... > > =20 > > > It would also not need > > > sync-only VM_BIND submissions, assuming that drmSyncobjTransfer > > > works. =20 > >=20 > > Sure, if each vkQueueSparseBind() has its own timeline, an internal > > timeline-syncobj with a bunch of drmSyncobjTransfer() calls would do > > the > > trick (might require several ioctls() to merge input syncobjs, but > > that's probably not the end of the world). Back to the kernel-side implementation. As Tatsuyuki pointed out, we can always replace the sync-only VM_BIND (no actual operation on the VM, just a bunch of wait/signal sync operations) with X calls to drmSyncobjTransfer() and a mesa-driver-specific timeline syncobj that's used to consolidate all the operations we have submitted so far. But given Nouveau supports this sync-only operation (at least that's my understanding of the nouveau_uvmm.c code and how VM_BIND is currently used in NVK), I guess there are good reasons to support that natively. Besides the potential optimization on the number of ioctl() calls, and the fact adding support for VM_BIND with zero VM ops is trivial enough that it's probably worth adding to simplify the UMD implementation, is there anything else I'm missing? Regards, Boris