From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0AB32C8303C for ; Mon, 7 Jul 2025 09:29:49 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B034210E319; Mon, 7 Jul 2025 09:29:48 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=holodeck1.com header.i=@holodeck1.com header.b="gOCucfBN"; dkim-atps=neutral X-Greylist: delayed 360 seconds by postgrey-1.36 at gabe; Mon, 07 Jul 2025 09:29:46 UTC Received: from mx.kolabnow.com (mx.kolabnow.com [212.103.80.155]) by gabe.freedesktop.org (Postfix) with ESMTPS id DAD7210E319 for ; Mon, 7 Jul 2025 09:29:46 +0000 (UTC) Received: from localhost (unknown [127.0.0.1]) by mx.kolabnow.com (Postfix) with ESMTP id F192520B275B; Mon, 7 Jul 2025 11:23:44 +0200 (CEST) Authentication-Results: ext-mx-out011.mykolab.com (amavis); dkim=pass reason="pass (just generated, assumed good)" header.d=holodeck1.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=holodeck1.com; h=content-type:content-type:mime-version:references:in-reply-to :organization:message-id:date:date:subject:subject:from:from :received:received:received; s=dkim2; t=1751880223; x= 1753694624; bh=ucVJc7Nis8vBbLGTPWF3kBljITl4fTU5kUGtxKRPEug=; b=g OCucfBNhM6AG4KG8jN90wDFNI2k2dpFYMM4eGfB6vclUIWApjjtrcvwgsZW+fwHd jePdpoFli9T7odxWpbdRD4GSKcPOs+5fHdy8vUk1s8d3Xut4DWDbbO2RTEUs7hLV 1yVCi1oNpdKAz6tG98AqjugfAhp39OIXAatANm7Miam37oydoPNGdlxnJ6K10Ho+ ACWGpSu0w5y+O09jFBQZuPQoAN8TCE/2CKph/lB2gQ4aRcCUjm4Lfh5plL71yCsY 5Hv1pvOdaiixNNrDrJotbQ1T9ZYekFe/idprv0W1dqfNpWJXHPROJnuOma5oyTLL pwkN5IFahwdXfbVk8j4Yg== X-Virus-Scanned: amavis at mykolab.com Received: from mx.kolabnow.com ([127.0.0.1]) by localhost (ext-mx-out011.mykolab.com [127.0.0.1]) (amavis, port 10024) with ESMTP id HGYzK6BE8JMj; Mon, 7 Jul 2025 11:23:43 +0200 (CEST) Received: from int-mx009.mykolab.com (unknown [10.9.13.9]) by mx.kolabnow.com (Postfix) with ESMTPS id 40B9120B2745; Mon, 7 Jul 2025 11:23:43 +0200 (CEST) Received: from ext-subm010.mykolab.com (unknown [10.9.6.10]) by int-mx009.mykolab.com (Postfix) with ESMTPS id F070D2093AFD; Mon, 7 Jul 2025 11:23:42 +0200 (CEST) From: Jure Repinc To: Alex Deucher , amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] Documentation: add initial documenation for user queues Date: Mon, 07 Jul 2025 11:23:35 +0200 Message-ID: <1908655.LH7GnMWURc@excalibur> Organization: Holodeck 1 In-Reply-To: <20250502205924.935319-1-alexander.deucher@amd.com> References: <20250502205924.935319-1-alexander.deucher@amd.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart3555875.sQuhbGJ8Bu"; micalg="sha256"; protocol="application/pkcs7-signature" X-BeenThere: amd-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussion list for AMD gfx List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: amd-gfx-bounces@lists.freedesktop.org Sender: "amd-gfx" --nextPart3555875.sQuhbGJ8Bu Content-Type: multipart/alternative; boundary="nextPart2208764.yiUUSuA9gR" Content-Transfer-Encoding: 7Bit This is a multi-part message in MIME format. --nextPart2208764.yiUUSuA9gR Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="utf-8" Hi Some suggestions in addition to ones from Rodrigo > Add an initial documentation page for user mode queues. > > Signed-off-by: Alex Deucher > --- > Documentation/gpu/amdgpu/index.rst | 1 + > Documentation/gpu/amdgpu/userq.rst | 196 +++++++++++++++++++++++++++++ > 2 files changed, 197 insertions(+) > create mode 100644 Documentation/gpu/amdgpu/userq.rst > > diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst > index bb2894b5edaf2..45523e9860fc5 100644 > --- a/Documentation/gpu/amdgpu/index.rst > +++ b/Documentation/gpu/amdgpu/index.rst > @@ -12,6 +12,7 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures. > module-parameters > gc/index > display/index > + userq > flashing > xgmi > ras > diff --git a/Documentation/gpu/amdgpu/userq.rst b/Documentation/gpu/amdgpu/userq.rst > new file mode 100644 > index 0000000000000..53e6b053f652f > --- /dev/null > +++ b/Documentation/gpu/amdgpu/userq.rst > @@ -0,0 +1,196 @@ > +================== > + User Mode Queues > +================== > + > +Introduction > +============ > + > +Similar to the KFD, GPU engine queues move into userspace. The idea is to let > +user processes manage their submissions to the GPU engines directly, bypassing > +IOCTL calls to the driver to submit work. This reduces overhead and also allows > +the GPU to submit work to itself. Applications can set up work graphs of jobs > +across multiple GPU engines without needing trips through the CPU. > + > +UMDs directly interface with firmware via per application shared memory areas. > +The main vehicle for this is queue. A queue is a ring buffer with a read > +pointer (rptr) and a write pointer (wptr). The UMD writes IP specific packets > +into the queue and the firmware processes those packets, kicking off work on the > +GPU engines. The CPU in the application (or another queue or device) updates > +the wptr to tell the firmware how far into the ring buffer to process packets > +and the rtpr provides feedback to the UMD on how far the firmware has progressed > +in executing those packets. When the wptr and the rptr are equal, the queue is > +idle. > + > +Theory of Operation > +=================== > + > +The various engines on modern AMD GPUs support multiple queues per engine with a > +scheduling firmware which handles dynamically scheduling user queues on the > +available hardware queue slots. When the number of user queues outnumbers the > +available hardware queue slots, the scheduling firmware dynamically maps and > +unmaps queues based on priority and time quanta. The state of each user queue > +is managed in the kernel driver in an MQD (Memory Queue Descriptor). This is a > +buffer in GPU accessible memory that stores the state of a user queue. The > +scheduling firmware uses the MQD to load the queue state into an HQD (Hardware > +Queue Descriptor) when a user queue is mapped. Each user queue requires a > +number of additional buffers which represent the ring buffer and any metadata > +needed by the engine for runtime operation. On most engines this consists of > +the ring buffer itself, a rptr buffer (where the firmware will shadow the rptr > +to userspace), a wrptr buffer (where the application will write the wptr for the wrptr -> wptr > +firmware to fetch it), and a doorbell. A doorbell is a piece of the device's > +MMIO BAR which can be mapped to specific user queues. Writing to the doorbell > +wakes the firmware and causes it to fetch the wptr and start processing the > +packets in the queue. Each 4K page of the doorbell BAR supports specific offset > +ranges for specific engines. The doorbell of a queue most be mapped into the > +aperture aligned to the IP used by the queue (e.g., GFX, VCN, SDMA, etc.). > +These doorbell apertures are set up via NBIO registers. Doorbells are 32 bit or > +64 bit (depending on the engine) chunks of the doorbell BAR. A 4K doorbell page > +provides 512 64-bit doorbells for up to 512 user queues. A subset of each page > +is reserved for each IP type supported on the device. The user can query the > +doorbell ranges for each IP via the INFO IOCTL. > + > +When an application wants to create a user queue, it allocates the the necessary the the -> the > +buffers for the queue (ring buffer, wptr and rptr, context save areas, etc.). > +These can be separate buffers or all part of one larger buffer. The application > +would map the buffer(s) into its GPUVM and use the GPU virtual addresses of for > +the areas of memory they want t use for the user queue. They would also > +allocate a doorbell page for the doorbells used by the user queues. The > +application would then populate the MQD in the USERQ IOCTL structure with the > +GPU virtual addresses and doorbell index they want to use. The user can also > +specify the attributes for the user queue (priority, whether the queue is secure > +for protected content, etc.). The application would then call the USERQ > +create IOCTL to create the queue from using the specified MQD. The create IOCTL -> CREATE IOCTL > +kernel driver then validates the MQD provided by the application and translates > +the MQD into the engine specific MQD format for the IP. The IP specific MQD > +would be allocated and the queue would be added to the run list maintained by > +the scheduling firmware. Once the queue has been created, the application can > +write packets directly into the queue, update the wptr, and write to the > +doorbell offset to kick off work in the user queue. > + > +When the application is done with the user queue, it would call the USERQ > +FREE IOCTL to destroy it. The kernel driver would preempt the queue and > +remove it from the scheduling firmware's run list. Then the IP specific MQD > +would be freed and the user queue state would be cleaned up. > + > +Some engines may require the aggregated doorbell to if the engine does not > +support doorbells from unmapped queues. The aggregated doorbell is a special > +page of doorbell space which wakes the scheduler. In cases where the engine may > +be oversubscribed, some queues may not be mapped. If the doorbell is rung when > +the queue is not mapped, the engine firmware may miss the request. Some > +scheduling firmware may work around this my polling wptr shadows when the > +hardware is oversubscribed, other engines may support doorbell updates from > +unmapped queues. In the event that one of these options is not available, the > +kernel driver will map a page of aggregated doorbell space into each GPUVM > +space. The UMD will then update the doorbell and wptr as normal and then write > +to the aggregated doorbell as well. > + > +Special Packets > +--------------- > + > +In order to support legacy implicit synchronization, as well as mixed user and > +kernel queues, we need a synchronization mechanism that is secure. Because > +kernel queues or memory management tasks depend on kernel fences, we need a way > +for user queues to update memory that the kernel can use for a fence, that can't > +be messed with by a bad actor. To support this, we've added protected fence > +packet. This packet works by writing the a monotonically increasing value to the a -> a > +a memory location that is only the privileged clients have write access to. is only -> only > +User queues only have read access. When this packet is executed, the memory > +location is updated and other queues (kernel or user) can see the results. > + > +Memory Management > +================= > + > +It is assumed that all buffers mapped into the GPUVM space for the process are > +valid when engines on the GPU are running. The kernel driver will only allow > +user queues to run when all buffers are mapped. If there is a memory event that > +requires buffer migration, the kernel driver will preempt the user queues, > +migrate buffers to where they need to be, update the GPUVM page tables and > +invaldidate the TLB, and then resume the user queues. > + > +Interaction with Kernel Queues > +============================== > + > +Depending on the IP and the scheduling firmware, you can enable kernel queues > +and user queues at the same time, However, you are limited by the HQD slots. > +Kernel queues are always mapped so any work the goes into kernel queues will the goes -> that goes > +take priority. This limits the available HQD slots for user queues. > + > +Not all IPs will support user queues on all GPUs. As such, UMDs will need to > +support both user queues and kernel queues depending on the IP. For example, a > +GPU may support user queues for GFX, compute, and SDMA, but not for VCN, JPEG, > +and VPE. UMDs need to support both. The kernel driver provides a way to > +determine if user queues and kernel queues are supported on a per IP basis. > +UMDs can query this information via the INFO IOCTL and determine whether to use > +kernel queues or user queues for each IP. > + > +Queue Resets > +============ > + > +For most engines, queues can be reset individually. GFX, compute, and SDMA > +queues can be reset individually. When a hung queue is detected, it can be > +reset either via the scheduling firmware or MMIO. Since there are no kernel > +fences for most user queues, they will usually only be detected when some other > +event happens; e.g., a memory event which requires migration of buffers. When > +the queues are preempted, if the queue is hung, the preemption will fail. > +Driver will them look up the queues that failed to preempt and reset them and them -> then > +record which queues are hung. > + > + > +On the UMD side, we will add an USERQ QUERY_STATUS IOCTL to query the queue an USERQ -> a USERQ > +status. UMD will provide the queue id in the IOCTL and the kernel driver > +will check if it has already recorded the queue as hung (e.g., due to failed > +peemption) and report back the status. > + > +IOCTL Interfaces > +================ > + > +GPU virtual addresses used for queues and related data (rptrs, wptrs, context > +save areas, etc.) should be validated by the kernel mode driver to prevent the > +user from specifying invalid GPU virtual addresses. If the user provides > +invalid GPU virtual addresses or doorbell indicies, the IOCTL should return an > +error message. These buffers should also be tracked in the kernel driver so > +that if the user attempts to unmap the buffer(s) from the GPUVM, the umap call > +would return an error. > + > +INFO > +---- > +There are several new INFO queries related to user queues in order to query the > +size of user queue meta data needed for a user queue (e.g., context save areas > +or shadow buffers), and whether kernel or user queues or both are supported > +for each IP type. > + > +USERQ > +----- > +The USERQ IOCTL is used for creating, freeing, and querying the status of user > +queues. It supports 3 opcodes: > + > +1. CREATE - Create a user queue. The application provides a MQD-like structure > + that devices the type of queue and associated metadata and flags for that devices -> describes/defines > + queue type. Returns the queue id. > +2. FREE - Free a user queue. > +3. QUERY_STATRUS - Query that status of a queue. Used to check if the queue is > + healthy or not. E.g., if the queue has been reset. (WIP) > + > +USERQ_SIGNAL > +------------ > +The USERQ_SIGNAL IOCTL is used to provide a list of sync objects to be signaled. > + > +USERQ_WAIT > +---------- > +The USERQ_WAIT IOCTL is used to provide a list of sync object to be waited on. > + > +Kernel and User Queues > +====================== > + > +In order to properly validate and test performance, we have a driver option to > +select what type of queues are enabled (kernel queues, user queues or both). > +The user_queue driver parameter allows you to enable kernel queues only (0), > +user queues and kernel queues (1), and user queues only (2). Enabling user > +queues only will free up static queue assignments that would otherwise be used > +by kernel queues for use by the scheduling firmware. Some kernel queues are > +required for kernel driver operation and they will always be created. When the > +kernel queues are not enabled, they are not registered with the drm scheduler > +and the CS IOCTL will reject any incoming command submissions which target those > +queue types. Kernel queues only mirrors the behavior on all existing GPUs. > +Enabling both queues allows for backwards compatibility with old userspace while > +still supporting user queues. > Have a great time, Jure Repinc -- Jabber/XMPP: JLP@jabber.org Matrix: @jlp:matrix.org Mastodon/ActivityPub: @JRepin@mstdn.io --nextPart2208764.yiUUSuA9gR Content-Transfer-Encoding: 7Bit Content-Type: text/html; charset="utf-8"

Hi


Some suggestions in addition to ones from Rodrigo


> Add an initial documentation page for user mode queues.

>

> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

> ---

>  Documentation/gpu/amdgpu/index.rst |   1 +

>  Documentation/gpu/amdgpu/userq.rst | 196 +++++++++++++++++++++++++++++

>  2 files changed, 197 insertions(+)

>  create mode 100644 Documentation/gpu/amdgpu/userq.rst

>

> diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst

> index bb2894b5edaf2..45523e9860fc5 100644

> --- a/Documentation/gpu/amdgpu/index.rst

> +++ b/Documentation/gpu/amdgpu/index.rst

> @@ -12,6 +12,7 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures.

>     module-parameters

>     gc/index

>     display/index

> +   userq

>     flashing

>     xgmi

>     ras

> diff --git a/Documentation/gpu/amdgpu/userq.rst b/Documentation/gpu/amdgpu/userq.rst

> new file mode 100644

> index 0000000000000..53e6b053f652f

> --- /dev/null

> +++ b/Documentation/gpu/amdgpu/userq.rst

> @@ -0,0 +1,196 @@

> +==================

> + User Mode Queues

> +==================

> +

> +Introduction

> +============

> +

> +Similar to the KFD, GPU engine queues move into userspace.  The idea is to let

> +user processes manage their submissions to the GPU engines directly, bypassing

> +IOCTL calls to the driver to submit work.  This reduces overhead and also allows

> +the GPU to submit work to itself.  Applications can set up work graphs of jobs

> +across multiple GPU engines without needing trips through the CPU.

> +

> +UMDs directly interface with firmware via per application shared memory areas.

> +The main vehicle for this is queue.  A queue is a ring buffer with a read

> +pointer (rptr) and a write pointer (wptr).  The UMD writes IP specific packets

> +into the queue and the firmware processes those packets, kicking off work on the

> +GPU engines.  The CPU in the application (or another queue or device) updates

> +the wptr to tell the firmware how far into the ring buffer to process packets

> +and the rtpr provides feedback to the UMD on how far the firmware has progressed

> +in executing those packets.  When the wptr and the rptr are equal, the queue is

> +idle.

> +

> +Theory of Operation

> +===================

> +

> +The various engines on modern AMD GPUs support multiple queues per engine with a

> +scheduling firmware which handles dynamically scheduling user queues on the

> +available hardware queue slots.  When the number of user queues outnumbers the

> +available hardware queue slots, the scheduling firmware dynamically maps and

> +unmaps queues based on priority and time quanta.  The state of each user queue

> +is managed in the kernel driver in an MQD (Memory Queue Descriptor).  This is a

> +buffer in GPU accessible memory that stores the state of a user queue.  The

> +scheduling firmware uses the MQD to load the queue state into an HQD (Hardware

> +Queue Descriptor) when a user queue is mapped.  Each user queue requires a

> +number of additional buffers which represent the ring buffer and any metadata

> +needed by the engine for runtime operation.  On most engines this consists of

> +the ring buffer itself, a rptr buffer (where the firmware will shadow the rptr

> +to userspace), a wrptr buffer (where the application will write the wptr for the

wrptr -> wptr


> +firmware to fetch it), and a doorbell.  A doorbell is a piece of the device's

> +MMIO BAR which can be mapped to specific user queues.  Writing to the doorbell

> +wakes the firmware and causes it to fetch the wptr and start processing the

> +packets in the queue. Each 4K page of the doorbell BAR supports specific offset

> +ranges for specific engines.  The doorbell of a queue most be mapped into the

> +aperture aligned to the IP used by the queue (e.g., GFX, VCN, SDMA, etc.).

> +These doorbell apertures are set up via NBIO registers.  Doorbells are 32 bit or

> +64 bit (depending on the engine) chunks of the doorbell BAR.  A 4K doorbell page

> +provides 512 64-bit doorbells for up to 512 user queues.  A subset of each page

> +is reserved for each IP type supported on the device.  The user can query the

> +doorbell ranges for each IP via the INFO IOCTL.

> +

> +When an application wants to create a user queue, it allocates the the necessary

the the -> the


> +buffers for the queue (ring buffer, wptr and rptr, context save areas, etc.).

> +These can be separate buffers or all part of one larger buffer.  The application

> +would map the buffer(s) into its GPUVM and use the GPU virtual addresses of for

> +the areas of memory they want t use for the user queue.  They would also

> +allocate a doorbell page for the doorbells used by the user queues.  The

> +application would then populate the MQD in the USERQ IOCTL structure with the

> +GPU virtual addresses and doorbell index they want to use.  The user can also

> +specify the attributes for the user queue (priority, whether the queue is secure

> +for protected content, etc.).  The application would then call the USERQ

> +create IOCTL to create the queue from using the specified MQD.  The

create IOCTL -> CREATE IOCTL

> +kernel driver then validates the MQD provided by the application and translates

> +the MQD into the engine specific MQD format for the IP.  The IP specific MQD

> +would be allocated and the queue would be added to the run list maintained by

> +the scheduling firmware.  Once the queue has been created, the application can

> +write packets directly into the queue, update the wptr, and write to the

> +doorbell offset to kick off work in the user queue.

> +

> +When the application is done with the user queue, it would call the USERQ

> +FREE IOCTL to destroy it.  The kernel driver would preempt the queue and

> +remove it from the scheduling firmware's run list.  Then the IP specific MQD

> +would be freed and the user queue state would be cleaned up.

> +

> +Some engines may require the aggregated doorbell to if the engine does not

> +support doorbells from unmapped queues.  The aggregated doorbell is a special

> +page of doorbell space which wakes the scheduler.  In cases where the engine may

> +be oversubscribed, some queues may not be mapped.  If the doorbell is rung when

> +the queue is not mapped, the engine firmware may miss the request.  Some

> +scheduling firmware may work around this my polling wptr shadows when the

> +hardware is oversubscribed, other engines may support doorbell updates from

> +unmapped queues.  In the event that one of these options is not available, the

> +kernel driver will map a page of aggregated doorbell space into each GPUVM

> +space.  The UMD will then update the doorbell and wptr as normal and then write

> +to the aggregated doorbell as well.

> +

> +Special Packets

> +---------------

> +

> +In order to support legacy implicit synchronization, as well as mixed user and

> +kernel queues, we need a synchronization mechanism that is secure.  Because

> +kernel queues or memory management tasks depend on kernel fences, we need a way

> +for user queues to update memory that the kernel can use for a fence, that can't

> +be messed with by a bad actor.  To support this, we've added protected fence

> +packet.  This packet works by writing the a monotonically increasing value to

the a -> a


> +a memory location that is only the privileged clients have write access to.

is only -> only


> +User queues only have read access.  When this packet is executed, the memory

> +location is updated and other queues (kernel or user) can see the results.

> +

> +Memory Management

> +=================

> +

> +It is assumed that all buffers mapped into the GPUVM space for the process are

> +valid when engines on the GPU are running.  The kernel driver will only allow

> +user queues to run when all buffers are mapped.  If there is a memory event that

> +requires buffer migration, the kernel driver will preempt the user queues,

> +migrate buffers to where they need to be, update the GPUVM page tables and

> +invaldidate the TLB, and then resume the user queues.

> +

> +Interaction with Kernel Queues

> +==============================

> +

> +Depending on the IP and the scheduling firmware, you can enable kernel queues

> +and user queues at the same time,  However, you are limited by the HQD slots.

> +Kernel queues are always mapped so any work the goes into kernel queues will

the goes -> that goes


> +take priority.  This limits the available HQD slots for user queues.

> +

> +Not all IPs will support user queues on all GPUs.  As such, UMDs will need to

> +support both user queues and kernel queues depending on the IP.  For example, a

> +GPU may support user queues for GFX, compute, and SDMA, but not for VCN, JPEG,

> +and VPE.  UMDs need to support both.  The kernel driver provides a way to

> +determine if user queues and kernel queues are supported on a per IP basis.

> +UMDs can query this information via the INFO IOCTL and determine whether to use

> +kernel queues or user queues for each IP.

> +

> +Queue Resets

> +============

> +

> +For most engines, queues can be reset individually.  GFX, compute, and SDMA

> +queues can be reset individually.  When a hung queue is detected, it can be

> +reset either via the scheduling firmware or MMIO.  Since there are no kernel

> +fences for most user queues, they will usually only be detected when some other

> +event happens; e.g., a memory event which requires migration of buffers.  When

> +the queues are preempted, if the queue is hung, the preemption will fail.

> +Driver will them look up the queues that failed to preempt and reset them and

them -> then


> +record which queues are hung.

> +

> +

> +On the UMD side, we will add an USERQ QUERY_STATUS IOCTL to query the queue

an USERQ -> a USERQ


> +status.  UMD will provide the queue id in the IOCTL and the kernel driver

> +will check if it has already recorded the queue as hung (e.g., due to failed

> +peemption) and report back the status.

> +

> +IOCTL Interfaces

> +================

> +

> +GPU virtual addresses used for queues and related data (rptrs, wptrs, context

> +save areas, etc.) should be validated by the kernel mode driver to prevent the

> +user from specifying invalid GPU virtual addresses.  If the user provides

> +invalid GPU virtual addresses or doorbell indicies, the IOCTL should return an

> +error message.  These buffers should also be tracked in the kernel driver so

> +that if the user attempts to unmap the buffer(s) from the GPUVM, the umap call

> +would return an error.

> +

> +INFO

> +----

> +There are several new INFO queries related to user queues in order to query the

> +size of user queue meta data needed for a user queue (e.g., context save areas

> +or shadow buffers), and whether kernel or user queues or both are supported

> +for each IP type.

> +

> +USERQ

> +-----

> +The USERQ IOCTL is used for creating, freeing, and querying the status of user

> +queues.  It supports 3 opcodes:

> +

> +1. CREATE - Create a user queue.  The application provides a MQD-like structure

> +   that devices the type of queue and associated metadata and flags for that

devices -> describes/defines


> +   queue type.  Returns the queue id.

> +2. FREE - Free a user queue.

> +3. QUERY_STATRUS - Query that status of a queue.  Used to check if the queue is

> +   healthy or not.  E.g., if the queue has been reset. (WIP)

> +

> +USERQ_SIGNAL

> +------------

> +The USERQ_SIGNAL IOCTL is used to provide a list of sync objects to be signaled.

> +

> +USERQ_WAIT

> +----------

> +The USERQ_WAIT IOCTL is used to provide a list of sync object to be waited on.

> +

> +Kernel and User Queues

> +======================

> +

> +In order to properly validate and test performance, we have a driver option to

> +select what type of queues are enabled (kernel queues, user queues or both).

> +The user_queue driver parameter allows you to enable kernel queues only (0),

> +user queues and kernel queues (1), and user queues only (2).  Enabling user

> +queues only will free up static queue assignments that would otherwise be used

> +by kernel queues for use by the scheduling firmware.  Some kernel queues are

> +required for kernel driver operation and they will always be created.  When the

> +kernel queues are not enabled, they are not registered with the drm scheduler

> +and the CS IOCTL will reject any incoming command submissions which target those

> +queue types.  Kernel queues only mirrors the behavior on all existing GPUs.

> +Enabling both queues allows for backwards compatibility with old userspace while

> +still supporting user queues.

>


Have a great time,

Jure Repinc


--

         Jabber/XMPP: JLP@jabber.org

              Matrix: @jlp:matrix.org

Mastodon/ActivityPub: @JRepin@mstdn.io


--nextPart2208764.yiUUSuA9gR-- --nextPart3555875.sQuhbGJ8Bu Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Disposition: attachment; filename="smime.p7s" Content-Transfer-Encoding: base64 MIIP3gYJKoZIhvcNAQcCoIIPzzCCD8sCAQExDzANBglghkgBZQMEAgEFADALBgkqhkiG9w0BBwGg gg1qMIIGdTCCBN2gAwIBAgIMKMOYHQAAAABXHdDnMA0GCSqGSIb3DQEBCwUAMFwxCzAJBgNVBAYT AlNJMRwwGgYDVQQKExNSZXB1Ymxpa2EgU2xvdmVuaWphMRcwFQYDVQRhEw5WQVRTSS0xNzY1OTk1 NzEWMBQGA1UEAxMNU0ktVFJVU1QgUm9vdDAeFw0xNjA1MjQxMTQ5NDFaFw0zNjA0MjMyMjAwMDBa MFoxCzAJBgNVBAYTAlNJMRwwGgYDVQQKExNSZXB1Ymxpa2EgU2xvdmVuaWphMRcwFQYDVQRhEw5W QVRTSS0xNzY1OTk1NzEUMBIGA1UEAxMLU0lHRU4tQ0EgRzIwggGiMA0GCSqGSIb3DQEBAQUAA4IB jwAwggGKAoIBgQDRJl0dqc7nAg7Bi5WGSvyYlSuJq1N/3IqHKoA4JK2iB46GMBSc/akw2EMOigD1 9Uce9jNnKi3cZpDMQglTi3MlAX3pv8wajmMBMfQ2P5ID2F3VkcZVKWlPUAnWjzr+3SW3neGqMN+/ 3jixXPuyB45BGhW1kjqZ5i8DIwppQuF3dUYAkyESdGCwtqYAWn1d1vATzdRs7fn5uKNCGqbMcYaL 7hhC5Z0j+hnfKuZKRtzSH9xM07+xXKIoF8gvYEfWB/lkcIdVEUBANSa8TefuhVoClTapLH/zjZBV tHt4xBMbc+6go8KD/p7J+V9+uH3QCgzM3RdIIgl13TJUtMWAByzmkq74UFM0CsOTlVvpfUzqEUUY Zi0PYtg6/fDzg1k2dtqOEeQonQVBtPot0bP643D0zDS/kk0+V4zeQjhhVawBAvwTsOXOx9MRzyt8 5mnlMMN4Vdqk2vJd2C+uADBknlFXv5CM2CmtAbOXH4OeS0qICTqqWw8uh6T+DVl4eommyY8CAwEA AaOCAjcwggIzMBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgEGMDoGA1UdIAQzMDEw LwYEVR0gADAnMCUGCCsGAQUFBwIBFhlodHRwOi8vd3d3LmNhLmdvdi5zaS9jcHMvMGkGCCsGAQUF BwEBBF0wWzA2BggrBgEFBQcwAoYqaHR0cDovL3d3dy5jYS5nb3Yuc2kvY3J0L3NpLXRydXN0LXJv b3QuY3J0MCEGCCsGAQUFBzABhhVodHRwOi8vb2NzcC5jYS5nb3Yuc2kwEQYDVR0OBAoECEwlJ4yo LXKeMIIBPAYDVR0fBIIBMzCCAS8wgbeggbSggbGGKmh0dHA6Ly93d3cuY2EuZ292LnNpL2NybC9z aS10cnVzdC1yb290LmNybIaBgmxkYXA6Ly94NTAwLmdvdi5zaS9jbj1TSS1UUlVTVCUyMFJvb3Qs b3JnYW5pemF0aW9uSWRlbnRpZmllcj1WQVRTSS0xNzY1OTk1NyxvPVJlcHVibGlrYSUyMFNsb3Zl bmlqYSxjPVNJP2NlcnRpZmljYXRlUmV2b2NhdGlvbkxpc3Qwc6BxoG+kbTBrMQswCQYDVQQGEwJT STEcMBoGA1UEChMTUmVwdWJsaWthIFNsb3ZlbmlqYTEXMBUGA1UEYRMOVkFUU0ktMTc2NTk5NTcx FjAUBgNVBAMTDVNJLVRSVVNUIFJvb3QxDTALBgNVBAMTBENSTDEwEwYDVR0jBAwwCoAITKPDaF4I AmMwDQYJKoZIhvcNAQELBQADggGBADVgVyRt64166Ry1oNxqii/zIzrKEr24IPkI7vJmczGJ/lRm 6vIaZBdxeuebD1KICR8YQznI0xRiQZ/cz2oINJdVqCqXlOddRZvufWIcZXpmDPlig86+gueYtJIR Lq+gk4Fjz8tdPo6GhYN7b9wQ15FYDIjgPzEDnq/YQJ+ZJs9aPotskKHvKyaDg8NibS7qdiLXaToo WFCaSV4h+JPtStbw+R7MaLnHvyp8sqhl4vgnPNv3TLwPmWr1jU+EP1gx5axEkKpJamc1zMgTWw/F S4VzrxSKsDZM/7E6cCZHCWziPWs8C3uLqye2tBBBCgjmyNY5XC8rj85Rbpl5K1SIlg9jetEfUNoa WXP0S/GgAtgB5EQ9IXwSjf9D/DxqvOme5bhK7o2l3r/1/OvPmoYttgQhBmpIYQfzacB94yTHDCJZ rWqFc+DW4BOg/dyLLsykcNEnYWClibUWiU/ITlW/AcKkuovMQVMAHYu4u5LveEWEymkbaTxRmHx3 /swn3eZi2jCCBu0wggVVoAMCAQICDBSL7r4AAAAAVyU92jANBgkqhkiG9w0BAQsFADBaMQswCQYD VQQGEwJTSTEcMBoGA1UEChMTUmVwdWJsaWthIFNsb3ZlbmlqYTEXMBUGA1UEYRMOVkFUU0ktMTc2 NTk5NTcxFDASBgNVBAMTC1NJR0VOLUNBIEcyMB4XDTIwMTAwNjE2MzMwMFoXDTI1MTAwNjE3MDMw MFowfzELMAkGA1UEBhMCU0kxEjAQBgNVBAgTCVNsb3ZlbmlqYTEUMBIGA1UECxMLaW5kaXZpZHVh bHMxRjALBgNVBCoTBEp1cmUwDQYDVQQEEwZSZXBpbmMwEgYDVQQDEwtKdXJlIFJlcGluYzAUBgNV BAUTDTI0NjY0NzgzMTIwMzkwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCu1oYnY4M9 ZB4zQrL+Ynjuc1zIYxmHZNwqgv38oz5QzIdJIyg0XRgaOrFcevE30AZMrSwYsVD689pEnQgqrkbA JqExEBi3TtmsH4N5czwtIDiP2MYp9lkrNV/+/ND8hUUVz1i4LkFt8k8PHujuPrLgY/zmsS6Sz07I G05lf/gxlkPMKn8oOKwiCAny4qUYEEZeQG6B2gjkayXn11Uz4DVEIpUd76BaQAYgmC1kicmJXWYm CqKjQDMlXekvNhpFgmse3jnAlunPfoBtWM3OjSra7EJGSrWIkL1aHcXc8wMhwT2n9IsEuTitZOKw S17bOKEqWzQ/fXGgTyt8BSSMB6H3AgMBAAGjggMMMIIDCDAOBgNVHQ8BAf8EBAMCBeAwTAYDVR0g BEUwQzA2BgsrBgEEAa9ZAgIDBTAnMCUGCCsGAQUFBwIBFhlodHRwOi8vd3d3LmNhLmdvdi5zaS9j cHMvMAkGBwQAi+xAAQAwgaUGCCsGAQUFBwEDBIGYMIGVMAgGBgQAjkYBATB0BgYEAI5GAQUwajAz Fi1odHRwczovL3d3dy5jYS5nb3Yuc2kvY3BzL3NpZ2VuY2EyX3Bkc19lbi5wZGYTAmVuMDMWLWh0 dHBzOi8vd3d3LmNhLmdvdi5zaS9jcHMvc2lnZW5jYTJfcGRzX3NsLnBkZhMCc2wwEwYGBACORgEG MAkGBwQAjkYBBgEwcQYIKwYBBQUHAQEEZTBjMDwGCCsGAQUFBzAChjBodHRwOi8vd3d3LnNpZ2Vu LWNhLnNpL2NydC9zaWdlbi1jYS1nMi1jZXJ0cy5wN2MwIwYIKwYBBQUHMAGGF2h0dHA6Ly9vY3Nw LnNpZ2VuLWNhLnNpMBwGA1UdEQQVMBOBEWpscEBob2xvZGVjazEuY29tMIIBOgYDVR0fBIIBMTCC AS0wgbWggbKgga+GKmh0dHA6Ly93d3cuc2lnZW4tY2Euc2kvY3JsL3NpZ2VuLWNhLWcyLmNybIaB gGxkYXA6Ly94NTAwLmdvdi5zaS9jbj1TSUdFTi1DQSUyMEcyLG9yZ2FuaXphdGlvbklkZW50aWZp ZXI9VkFUU0ktMTc2NTk5NTcsbz1SZXB1Ymxpa2ElMjBTbG92ZW5pamEsYz1TST9jZXJ0aWZpY2F0 ZVJldm9jYXRpb25MaXN0MHOgcaBvpG0wazELMAkGA1UEBhMCU0kxHDAaBgNVBAoTE1JlcHVibGlr YSBTbG92ZW5pamExFzAVBgNVBGETDlZBVFNJLTE3NjU5OTU3MRQwEgYDVQQDEwtTSUdFTi1DQSBH MjEPMA0GA1UEAxMGQ1JMMzM1MBMGA1UdIwQMMAqACEwlJ4yoLXKeMBEGA1UdDgQKBAhBsPrpktbp jTAJBgNVHRMEAjAAMA0GCSqGSIb3DQEBCwUAA4IBgQB7FJBhacs9mzlIcERXuoNW+MjDa9dEflGl cfDWQsZSmuuu8k11KGfoQtqB6a/CT7mYf54abpcWMjQ3H9y0gwft/ZX5sHQsZV/qGyh+2KjE+gwA XXZ2R5lIAYDkZQBBU88dE6XzR3ibgpQ6O7wmGBN6dmIwLK1LMpOtEwHQqw/WWB6gRS+mpPnZhk9Z 6i4WVRHf5Gs/Xl3vUtV2V5VziGTc4evMh49+VG4oNG9EyF110JSAdAIbZ/F+Jz49UlzGrau6dh3H pS/tBk0m+nTdHurklnCobrPsE9viAdUVjIQ5vfN5SXTxBSDH5Ag/8OSEfPoJBykgGYkNST52HvXT M6WZqx9JSwA/tQrJpFrpDORgwyVRRgxgR7z7Dx38TViV7HeuqLVEJTtRIk8c6/Z2CYOlmSsUNNLw q72j2Wv2zXSzWS2aD7cMR5v8jPzatYf4TYVOdxygrGOEBHYoElWk7ixXHyheYEqix5CFBYA7PuQm CdDquv2oAC7tTM5m/uoS1N4xggI4MIICNAIBATBqMFoxCzAJBgNVBAYTAlNJMRwwGgYDVQQKExNS ZXB1Ymxpa2EgU2xvdmVuaWphMRcwFQYDVQRhEw5WQVRTSS0xNzY1OTk1NzEUMBIGA1UEAxMLU0lH RU4tQ0EgRzICDBSL7r4AAAAAVyU92jANBglghkgBZQMEAgEFAKCBoDAYBgkqhkiG9w0BCQMxCwYJ KoZIhvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0yNTA3MDcwOTIzMzVaMC8GCSqGSIb3DQEJBDEiBCB+ C5wR6WniP459hCkCijAGPXVpKHpRHvuk41kEg3gSPzA1BgkqhkiG9w0BCQ8xKDAmMAsGCWCGSAFl AwQBKjALBglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDQYJKoZIhvcNAQEBBQAEggEASYDnWCCuzPPt /SnpTZSOSgOzDJFZREbO6oJOEJSBz71SJQ/9rgYZswh+Jh6Wtd10IDrAbbtSw4Aln6KcdHSH9IQ3 eRt6xAj8hVaecbzA2bjbcoLdVrB5jvhaTZ3Fi6N8BsMdoL42bpucB+cvL6PlwFLdjG3tWIVmg0u+ h00gCaf8HMopr8c3nSThXHX53ucATXELG//0oU4vxFbD7deDHRRRjDJKtuIdH/39bLaGuBWXq7yD yfdkLR71moxkoNBa4qEsiFVcnNbZNKMriNnjmc79HROsZruOuwxVs5KHEGXlHu7PJ0UZ+fUb7zqV uXnEcbDNn1Ycj6Ur8bjPftLROQ== --nextPart3555875.sQuhbGJ8Bu--