* Looking for clarifications around gfx/kcq/kiq
From: Yann Dirson @ 2021-12-05 20:18 UTC
To: amd-gfx list, Rodrigo Siqueira

Hello,

Context: trying to understand what happens with my Renoir passed through
to a Xen domu [0] (starting with the "VCN disabled" because I don't need it
now (so let's postpone the problem with its _fini) and with "PSP disabled"
because the alternative issue seems easier to solve -- so ip_block_mask=0xF7).

I'm slowed down by a number of additional terms:

* KIQ: we have the acronym, but a few more words about it would be great:
  it seems to relate to a ring buffer provided by the GFX IP, but this one
  does not talk much to me (e.g. it tells me less than the names of the
  "gfx" and "compute" ones)
* "me", "mec" = ? In some places at least "me" stands for "micro engine" but
  what are those ? A "mec" contains pipes which contain queues. And in
  amdgpu_ring the "me" field seems to identify a "mec"
* "mes", rather looks like an IP/block family than the plural of "me".
  A specific list of those IPs / hw blocks would be useful (maybe with
  a diagram showing how they interact, much as what was started by
  Rodrigo for the DC pipeline, but a first components/subcomponents diagram
  would probably be helpful)
* RLC ? Looks like a "micro engine" inside the GFX IPs ?
* one starting point for enhancing doc would be to start with amdgpu.h, where
  a number of acronyms used in structs are not self-explanatory: IB, SS, CP,
  ACP, CAC, HPD, ...

Do we have somewhere a description of what the hardware expects to find in
those queues ?

About amdgpu_gfx_enable_kcq():
- Isn't the `DRM_INFO("kiq ring mec %d pipe %d q %d\n"` line rather meant as
  DRM_DEBUG ?
- An error from amdgpu_ring_alloc() is reported as "failed to lock", but looks
  like "failed to allocate space on ring" ?
amdgpu_ring_alloc() itself is unconditionally setting count_dw, which looked
suspicious to me -- so I added the check shown below, and it does look like
ring_alloc() gets called again too soon. Am I right in thinking this could be
the cause of amdgpu_ring_test_helper() failing in timeout ?

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
        if (WARN_ON_ONCE(ndw > ring->max_dw))
                return -ENOMEM;

+       /* check we're not allocating too fast */
+       WARN_ON_ONCE(ring->count_dw);
+
        ring->count_dw = ndw;
        ring->wptr_old = ring->wptr;


About gfx_v9_0_sw_fini():
- the 2 calls to bo_free are called here without condition, whereas they are
  allocated from rlc_init, not directly from sw_init. Is this asymmetry wanted ?


Maybe such info should join the documentation at some point?

[0] https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html
* Re: Looking for clarifications around gfx/kcq/kiq
From: Alex Deucher @ 2021-12-06 20:01 UTC
To: Yann Dirson; +Cc: Rodrigo Siqueira, amd-gfx list

On Mon, Dec 6, 2021 at 5:29 AM Yann Dirson <ydirson@free.fr> wrote:
>
> Hello,
>
> Context: trying to understand what happens with my Renoir passed through
> to a Xen domu [0] (starting with the "VCN disabled" because I don't need it
> now (so let's postpone the problem with its _fini) and with "PSP disabled"
> because the alternative issue seems easier to solve -- so ip_block_mask=0xF7).
>
> I'm slowed down by a number of additional terms:
>
> * KIQ: we have the acronym, but a few more words about it would be great:
>   it seems to relate to a ring buffer provided by the GFX IP, but this one
>   does not talk much to me (e.g. it tells me less than the names of the
>   "gfx" and "compute" ones)

Kernel Interface Queue. This is a control queue used by the kernel
driver to manage other gfx and compute queues on the GFX/compute
engine. You can use it to map/unmap additional queues, etc.

> * "me", "mec" = ? In some places at least "me" stands for "micro engine" but
>   what are those ? A "mec" contains pipes which contain queues. And in
>   amdgpu_ring the "me" field seems to identify a "mec"

MicroEngine Compute. This is the microcontroller that controls the
compute queues on the GFX/compute engine.

> * "mes", rather looks like an IP/block family than the plural of "me".
>   A specific list of those IPs / hw blocks would be useful (maybe with
>   a diagram showing how they interact, much as what was started by
>   Rodrigo for the DC pipeline, but a first components/subcomponents diagram
>   would probably be helpful)

MicroEngine Scheduler.
This is a new engine for managing queues. This is currently unused.

> * RLC ? Looks like a "micro engine" inside the GFX IPs ?

RunList Controller. This is another microcontroller in the
GFX/Compute engine. It handles power management related functionality
within the GFX/Compute engine. The name is a vestige of old hardware
where it was originally added and doesn't really have much relation to
what the engine does now.

> * one starting point for enhancing doc would be to start with amdgpu.h, where
>   a number of acronyms used in structs are not self-explanatory: IB, SS, CP,
>   ACP, CAC, HPD, ...

IB = Indirect Buffer. A command buffer for a particular engine.
Rather than writing commands directly to the queue, you can write the
commands into a piece of memory and then put a pointer to the memory
into the queue. The hardware will then follow the pointer and execute
the commands in the memory, then return to the rest of the commands
in the ring.

SS = Spread Spectrum.

CP = Command Processor. The name for the hardware block that
encompasses the front end of the GFX/Compute pipeline. Consists
mainly of a bunch of microcontrollers (PFP, ME, CE, MEC). The
firmware that runs on these microcontrollers provides the driver
interface to interact with the GFX/Compute engine.

>
> Do we have somewhere a description of what the hardware expects to find in
> those queues ?

It depends on the engine. Each engine has its own packet format.
GFX/Compute uses one format, SDMA uses another, VCN uses another.
They are documented in the code and headers for the relevant engines.

>
> About amdgpu_gfx_enable_kcq():
> - Isn't the `DRM_INFO("kiq ring mec %d pipe %d q %d\n"` line rather meant as
>   DRM_DEBUG ?

It's informational so we can see what queue slot is being used for
KIQ. There are requirements around the physical queue slot for KIQ so
it is useful to know it. That said, it could probably be made debug
only.
> - An error from amdgpu_ring_alloc() is reported as "failed to lock", but looks
>   like "failed to allocate space on ring" ?
>
> amdgpu_ring_alloc() itself is unconditionally setting count_dw, which looked
> suspicious to me -- so I added the check shown below, and it does look like
> ring_alloc() gets called again too soon. Am I right in thinking this could be
> the cause of amdgpu_ring_test_helper() failing in timeout ?
>

Not likely. The PSP failing to load firmware is most likely the
problem. You need to have a functional PSP for any of the other
engines to be usable. If we can't load the firmware for the
microcontrollers, the driver can't interact with them.

> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
>         if (WARN_ON_ONCE(ndw > ring->max_dw))
>                 return -ENOMEM;
>
> +       /* check we're not allocating too fast */
> +       WARN_ON_ONCE(ring->count_dw);
> +
>         ring->count_dw = ndw;
>         ring->wptr_old = ring->wptr;
>
>
> About gfx_v9_0_sw_fini():
> - the 2 calls to bo_free are called here without condition, whereas they are
>   allocated from rlc_init, not directly from sw_init. Is this asymmetry wanted ?
>
>
> Maybe such info should join the documentation at some point?

Yeah, would be useful.

Alex

>
> [0] https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html
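The Indirect Buffer (IB) mechanism described in the reply above can be pictured with a toy model. The sketch below is not amdgpu code and does not use any real packet format: the opcodes `OP_ADD`, `OP_IB` and `OP_END`, the `struct toy_cmd` layout and the `run_cmds()` helper are all invented for illustration. It only shows the control flow: the consumer walks the ring, follows a pointer into separate memory when it meets an IB entry, executes the commands found there, then resumes with the rest of the ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy command stream (not a real packet format). */
enum toy_op { OP_END = 0, OP_ADD = 1, OP_IB = 2 };

struct toy_cmd {
    enum toy_op op;
    uint32_t value;             /* operand for OP_ADD */
    const struct toy_cmd *ib;   /* target buffer for OP_IB */
    size_t ib_len;              /* number of commands in the IB */
};

/*
 * Execute `len` commands starting at `cmds`. An OP_IB entry makes the toy
 * "engine" run the commands it points to, then come back to this stream.
 */
static uint32_t run_cmds(const struct toy_cmd *cmds, size_t len, uint32_t acc)
{
    for (size_t i = 0; i < len; i++) {
        switch (cmds[i].op) {
        case OP_ADD:
            acc += cmds[i].value;
            break;
        case OP_IB:
            assert(cmds[i].ib != NULL);
            acc = run_cmds(cmds[i].ib, cmds[i].ib_len, acc);
            break;
        case OP_END:
            return acc;
        }
    }
    return acc;
}

/*
 * Build a ring holding one direct command, one IB pointer and one more
 * direct command, and return the accumulated result.
 */
static uint32_t toy_ib_demo(void)
{
    static const struct toy_cmd ib[] = {
        { OP_ADD, 10, NULL, 0 },
        { OP_ADD, 20, NULL, 0 },
    };
    const struct toy_cmd ring[] = {
        { OP_ADD, 1, NULL, 0 },
        { OP_IB, 0, ib, 2 },      /* pointer into memory outside the ring */
        { OP_ADD, 100, NULL, 0 }, /* executed after the IB returns */
    };
    return run_cmds(ring, 3, 0);
}
```

Calling `toy_ib_demo()` walks the first ring entry, both IB entries, and then the last ring entry, in that order. The point of the indirection is that the ring itself only needs to carry a small pointer entry, while the bulk of the commands can live in separately managed memory.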
* gpu block diagram
From: Yann Dirson @ 2021-12-07 22:07 UTC
To: Alex Deucher, amd-gfx list; +Cc: Rodrigo Siqueira

Thanks for the details Alex!

Here is an attempt to formalize the decomposition of a
(mostly Renoir) APU, using plantuml. That's highly preliminary,
focusing on blocks/sub-blocks/firmware, based on my current
partial (and surely incorrect at places) understanding.
I focused on getting contents quickly, so the formalism itself
is shaky.

Can you spot any error ? Fill those holes (usually marked with
"?") ? What additional blocks would make sense (caches at least,
I guess) ? What additional information would help understand how
they work together (control/data flows, busses...) ?

Indentation is shaky too, better format it to read (e.g. by
pasting in http://www.plantuml.com/plantuml/uml/)

------ >8 -------
@startuml
package "APU" {
  package CPU {
  }
  package GPU {
    package common? [[{"GPU Family"?}]] {
    }
    package GFX [[{Graphics and Compute Engine}]] {
      package CP [[{Command Processor}]] {
        package PFP [[{MicroEngine Compute}]] {
          package "pfp fw" #cccccc {
          }
        }
        package ME [[{MicroEngine ?}]] {
          package "me fw" #cccccc {
          }
        }
        package CE [[{?}]] {
          package "ce fw" #cccccc {
          }
        }
        package MEC [[{MicroEngine Compute}]] {
          package "mec fw" #cccccc {
          }
          package "mec2 fw?" #cccccc {
          }
        }
      }
      package RLC [[{RunList Controller (pm)}]] {
        package "rlc fw" #cccccc {
        }
      }
    }

    package '"management"'<<Cloud>> {
      package MES [[{Micro-Engine Scheduler}]] {
      }
      package SMU [[{System Management Unit}]] {
      }
      package PSP [[{Platform Security Processor}]] {
        package "asd fw" #cccccc {
        }
        package "ta fw" #cccccc {
        }
      }

      package IH [[{Interrupt Handler}]] {
      }
      package GMC [[{Graphics Memory Controller}]] {
      }
      package SDMA [[{System DMA}]] {
        package "sdma fw" #cccccc {
        }
      }
    }

    package DM [[{Display Manager, link to...}]] {
      package "DMUB? DMU?" [[{Display Micro-Controller Unit}]] {
        package "dmcub fw" #cccccc {
        }
      }
      package ... {
      }
    }

    package multimedia <<Cloud>> {
      package .... {
      }
      package VCN {
        package "vcn fw" #cccccc {
        }
      }
      package JPEG {
      }
    }
  }
}
@enduml
------ >8 -------
* Re: gpu block diagram
From: Alex Deucher @ 2021-12-09 4:49 UTC
To: Yann Dirson; +Cc: Rodrigo Siqueira, amd-gfx list

On Tue, Dec 7, 2021 at 5:07 PM Yann Dirson <ydirson@free.fr> wrote:
>
> Thanks for the details Alex!
>
> Here is an attempt to formalize the decomposition of a
> (mostly Renoir) APU, using plantuml. That's highly preliminary,
> focusing on blocks/sub-blocks/firmware, based on my current
> partial (and surely incorrect at places) understanding.
> I focused on getting contents quickly, so the formalism itself
> is shaky.
>
> Can you spot any error ? Fill those holes (usually marked with
> "?") ? What additional blocks would make sense (caches at least,
> I guess) ? What additional information would help understand how
> they work together (control/data flows, busses...) ?

Each asic is a collection of hardware blocks. We refer to them as
"IPs" (Intellectual Property blocks). Each IP encapsulates certain
functionality. IPs are versioned and can also be mixed and matched.
E.g., you might have two different asics that both have SDMA 5.x IPs.
The driver is arranged by IPs. There are driver components to handle
the initialization and operation of each IP. There are also a bunch
of smaller IPs that don't really need much if any driver interaction.
Those end up getting lumped into the common stuff in the soc files.
The soc files (e.g., vi.c, soc15.c, nv.c) contain code for aspects of
the SoC itself rather than specific IPs. E.g., things like GPU resets
and register access functions are SoC dependent.

An APU contains more than just CPU and GPU, it also contains all of
the platform stuff (audio, usb, gpio, etc.). Also, a lot of
components are shared between the CPU, platform, and the GPU (e.g.,
SMU, PSP, etc.). Specific components (CPU, GPU, etc.) usually have
their interface to interact with those common components.
For things like S0i3 there is a ton of coordination required across all
the components, but that is probably a bit beyond the scope of this
thread.

With respect to the GPU, we have the following major IPs:

1. GMC (Graphics Memory Controller). This was a dedicated IP on older
pre-vega chips, but has since become somewhat decentralized on vega
and newer chips. They now have dedicated memory hubs for specific IPs
or groups of IPs. We still treat it as a single component in the
driver however since the programming model is still pretty similar.
This is how the different IPs on the GPU access memory (VRAM or
system memory). It also provides the support for per process GPU
virtual address spaces.

2. IH (Interrupt Handler). This is the interrupt controller on the
GPU. All of the IPs feed their interrupts into this IP and it
aggregates them into a set of ring buffers that the driver can parse
to handle interrupts from different IPs.

3. PSP (Platform Security Processor). This handles security policy
for the SoC and executes trusted applications, and validates and loads
firmwares for other blocks.

4. SMU (System Management Unit). This is the power management
microcontroller. It manages the entire SoC. The driver interacts
with it to control power management features like clocks, voltages,
power rails, etc.

5. DCN (Display Controller Next). This is the display controller.
It handles the display hardware.

6. SDMA (System DMA). This is a multi-purpose DMA engine. The
kernel driver uses it for various things including paging and GPU page
table updates. It's also exposed to userspace for use by user mode
drivers (OpenGL, Vulkan, etc.)

7. GC (Graphics and Compute). This is the graphics and compute
engine, i.e., the block that encompasses the 3D pipeline and shader
blocks. This is by far the largest block on the GPU. The 3D pipeline
has tons of sub-blocks. In addition to that, it also contains the CP
microcontrollers (ME, PFP, CE, MEC) and the RLC microcontroller.
It's exposed to userspace for user mode drivers (OpenGL, Vulkan,
OpenCL, etc.)

8. VCN (Video Core Next). This is the multi-media engine. It handles
video and image encode and decode. It's exposed to userspace for user
mode drivers (VA-API, OpenMAX, etc.)

In general, the driver has a list of all of the IPs on a particular
SoC and for things like init/fini/suspend/resume, more or less just
walks the list and handles each IP.

Alex

>
> Indentation is shaky too, better format it to read (e.g. by
> pasting in http://www.plantuml.com/plantuml/uml/)
>
> ------ >8 -------
> @startuml
> package "APU" {
>   package CPU {
>   }
>   package GPU {
>     package common? [[{"GPU Family"?}]] {
>     }
>     package GFX [[{Graphics and Compute Engine}]] {
>       package CP [[{Command Processor}]] {
>         package PFP [[{MicroEngine Compute}]] {
>           package "pfp fw" #cccccc {
>           }
>         }
>         package ME [[{MicroEngine ?}]] {
>           package "me fw" #cccccc {
>           }
>         }
>         package CE [[{?}]] {
>           package "ce fw" #cccccc {
>           }
>         }
>         package MEC [[{MicroEngine Compute}]] {
>           package "mec fw" #cccccc {
>           }
>           package "mec2 fw?" #cccccc {
>           }
>         }
>       }
>       package RLC [[{RunList Controller (pm)}]] {
>         package "rlc fw" #cccccc {
>         }
>       }
>     }
>
>     package '"management"'<<Cloud>> {
>       package MES [[{Micro-Engine Scheduler}]] {
>       }
>       package SMU [[{System Management Unit}]] {
>       }
>       package PSP [[{Platform Security Processor}]] {
>         package "asd fw" #cccccc {
>         }
>         package "ta fw" #cccccc {
>         }
>       }
>
>       package IH [[{Interrupt Handler}]] {
>       }
>       package GMC [[{Graphics Memory Controller}]] {
>       }
>       package SDMA [[{System DMA}]] {
>         package "sdma fw" #cccccc {
>         }
>       }
>     }
>
>     package DM [[{Display Manager, link to...}]] {
>       package "DMUB? DMU?" [[{Display Micro-Controller Unit}]] {
>         package "dmcub fw" #cccccc {
>         }
>       }
>       package ... {
>       }
>     }
>
>     package multimedia <<Cloud>> {
>       package .... {
>       }
>       package VCN {
>         package "vcn fw" #cccccc {
>         }
>       }
>       package JPEG {
>       }
>     }
>   }
> }
> @enduml
> ------ >8 -------
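The "walk the list of IPs" approach described in the reply above can be sketched as follows. This is a simplified, hypothetical model and not the driver's actual data structures: `struct toy_ip_block`, `toy_init_all()`, `toy_fini_all()` and the trace helpers are invented names, and tearing blocks down in reverse order is an assumption made for this illustration. The `mask` argument mimics the idea behind the `ip_block_mask` setting mentioned earlier in the thread, where a cleared bit disables a block; the bit-to-block mapping used here is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-IP descriptor: a name plus init/fini callbacks. */
struct toy_ip_block {
    const char *name;
    int (*init)(void);   /* returns 0 on success */
    void (*fini)(void);
    int initialized;
};

/* Trace buffer so the order of calls can be inspected afterwards. */
static char toy_trace[32];
static size_t toy_trace_len;

static void toy_note(char c)
{
    if (toy_trace_len + 1 < sizeof(toy_trace))
        toy_trace[toy_trace_len++] = c;
}

/* Tear down every initialized block, last one first (an assumption). */
static void toy_fini_all(struct toy_ip_block *blocks, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        if (!blocks[i].initialized)
            continue;
        blocks[i].fini();
        blocks[i].initialized = 0;
    }
}

/* Initialize blocks in list order, skipping those whose mask bit is clear. */
static int toy_init_all(struct toy_ip_block *blocks, size_t n, uint32_t mask)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(mask & (1u << i)))
            continue;            /* block disabled by the mask */
        if (blocks[i].init() != 0) {
            toy_fini_all(blocks, n);  /* unwind what was already set up */
            return -1;
        }
        blocks[i].initialized = 1;
        count++;
    }
    return count;
}

static int gmc_init(void) { toy_note('G'); return 0; }
static void gmc_fini(void) { toy_note('g'); }
static int ih_init(void)  { toy_note('I'); return 0; }
static void ih_fini(void)  { toy_note('i'); }
static int psp_init(void) { toy_note('P'); return 0; }
static void psp_fini(void) { toy_note('p'); }
static int gfx_init(void) { toy_note('X'); return 0; }
static void gfx_fini(void) { toy_note('x'); }

/* Run one init/fini cycle and return the recorded call order. */
static const char *toy_ip_demo(uint32_t mask)
{
    struct toy_ip_block blocks[] = {
        { "gmc", gmc_init, gmc_fini, 0 },
        { "ih",  ih_init,  ih_fini,  0 },
        { "psp", psp_init, psp_fini, 0 },
        { "gfx", gfx_init, gfx_fini, 0 },
    };

    memset(toy_trace, 0, sizeof(toy_trace));
    toy_trace_len = 0;
    if (toy_init_all(blocks, 4, mask) >= 0)
        toy_fini_all(blocks, 4);
    return toy_trace;
}
```

In the trace, an uppercase letter marks an init call and a lowercase letter the matching fini call. Passing a mask with one bit cleared shows that the corresponding block is skipped in both directions, which is the kind of asymmetry question raised about `gfx_v9_0_sw_fini()` earlier: teardown code has to cope with blocks that were never set up.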
* Re: Looking for clarifications around gfx/kcq/kiq
From: Yann Dirson @ 2021-12-10 20:36 UTC
To: Alex Deucher; +Cc: Rodrigo Siqueira, amd-gfx list

> > amdgpu_ring_alloc() itself is unconditionally setting count_dw,
> > which looked suspicious to me -- so I added the check shown below,
> > and it does look like ring_alloc() gets called again too soon.
> > Am I right in thinking this could be the cause of
> > amdgpu_ring_test_helper() failing in timeout ?
> >
>
> Not likely. The PSP failing to load firmware is most likely the
> problem. You need to have a functional PSP for any of the other
> engines to be usable. If we can't load the firmware for the
> microcontrollers, the driver can't interact with them.

Even if it has no effect on my primary issue, I still have doubts
about this: if we call amdgpu_ring_alloc() twice without ensuring the
allocated space has been padded with NOPs (i.e. 0xFFFFFFFF, right ?),
what happens when the GFX IP (or should we rather say "GC"?) parses
those ?

My reading of gfx_enable_kcq() is that this is the case there. Isn't it
missing a call to ring_commit() before ring_test() ?

>
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
> >         if (WARN_ON_ONCE(ndw > ring->max_dw))
> >                 return -ENOMEM;
> >
> > +       /* check we're not allocating too fast */
> > +       WARN_ON_ONCE(ring->count_dw);
> > +
> >         ring->count_dw = ndw;
> >         ring->wptr_old = ring->wptr;
> >
> >
> > About gfx_v9_0_sw_fini():
> > - the 2 calls to bo_free are called here without condition, whereas
> >   they are allocated from rlc_init, not directly from sw_init. Is this
> >   asymmetry wanted ?
> >
> >
> > Maybe such info should join the documentation at some point?
>
> Yeah, would be useful.
>
> Alex
>
> >
> > [0] https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html
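The open question above, about calling `amdgpu_ring_alloc()` twice without an intervening commit, can be explored with a toy model of a ring writer. This is a hypothetical sketch, not the driver's implementation: `struct toy_ring`, `toy_ring_alloc()`, `toy_ring_write()`, `toy_ring_commit()`, the `TOY_NOP` value and the 8-dword alignment are all assumptions chosen for illustration, and whether the real driver clears its count on commit is not established in this thread. The model only shows what a check like the `WARN_ON_ONCE(ring->count_dw)` in the quoted diff would be probing for: a reservation that is still open when the next one arrives.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_DW  64u          /* ring size in dwords (power of two) */
#define TOY_ALIGN_DW 8u           /* assumed commit alignment, in dwords */
#define TOY_NOP      0x80000000u  /* made-up NOP encoding */

struct toy_ring {
    uint32_t buf[TOY_RING_DW];
    uint32_t wptr;            /* writer-side write pointer, in dwords */
    uint32_t committed_wptr;  /* value last published to the consumer */
    uint32_t count_dw;        /* dwords still reserved by the last alloc */
};

/*
 * Reserve ndw dwords. Returns -1 if the request is too large or if a
 * previous reservation was never closed by a commit.
 */
static int toy_ring_alloc(struct toy_ring *r, uint32_t ndw)
{
    if (ndw > TOY_RING_DW || r->count_dw != 0)
        return -1;
    r->count_dw = ndw;
    return 0;
}

/* Write one dword into the reserved space. */
static void toy_ring_write(struct toy_ring *r, uint32_t v)
{
    assert(r->count_dw > 0);
    r->buf[r->wptr & (TOY_RING_DW - 1)] = v;
    r->wptr++;
    r->count_dw--;
}

/* Pad with NOPs up to the alignment, publish wptr, close the reservation. */
static void toy_ring_commit(struct toy_ring *r)
{
    while (r->wptr % TOY_ALIGN_DW != 0) {
        r->buf[r->wptr & (TOY_RING_DW - 1)] = TOY_NOP;
        r->wptr++;
    }
    r->committed_wptr = r->wptr;
    r->count_dw = 0;
}

/* Returns 0 when the model behaves as described in the comments below. */
static int toy_ring_demo(void)
{
    struct toy_ring r = { {0}, 0, 0, 0 };
    int second, third;

    if (toy_ring_alloc(&r, 4) != 0)
        return -1;
    toy_ring_write(&r, 0x11);
    toy_ring_write(&r, 0x22);

    /* Two of the four reserved dwords are unused and nothing is committed. */
    second = toy_ring_alloc(&r, 4);   /* rejected by the toy model */

    toy_ring_commit(&r);              /* pads dwords 2..7 with TOY_NOP */
    third = toy_ring_alloc(&r, 4);    /* accepted after the commit */

    return (second == -1 && third == 0 &&
            r.committed_wptr == 8 && r.buf[2] == TOY_NOP) ? 0 : 1;
}
```

In this model the consumer only ever sees whole, NOP-padded chunks, because the write pointer is published in `toy_ring_commit()` alone. A writer that re-allocates before committing would leave the consumer unaware of the earlier dwords, which is one way to picture the missing `ring_commit()` concern raised above.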