* Looking for clarifications around gfx/kcq/kiq
       [not found] <1473700406.26541073.1638716639650.JavaMail.root@zimbra39-e7>
@ 2021-12-05 20:18 ` Yann Dirson
  2021-12-06 20:01   ` Alex Deucher
  0 siblings, 1 reply; 5+ messages in thread
From: Yann Dirson @ 2021-12-05 20:18 UTC
  To: amd-gfx list, Rodrigo Siqueira

Hello,

Context: trying to understand what happens with my Renoir passed through
to a Xen domu [0] (starting with the "VCN disabled" because I don't need it
now (so let's postpone the problem with its _fini) and with "PSP disabled"
because the alternative issue seems easier to solve -- so ip_block_mask=0xF7).

I'm slowed down by a number of additional terms:

* KIQ: we have the acronym, but a few more words about it would be great:
  it seems to relate to a ring buffer provided by the GFX IP, but the name
  does not tell me much (e.g. it tells me less than the names of the
  "gfx" and "compute" ones)
* "me", "mec" = ?  In some places at least "me" stands for "micro engine" but
  what are those ?  A "mec" contains pipes which contain queues.  And in
  amdgpu_ring the "me" field seems to identify a "mec"
* "mes", rather looks like an IP/block family than the plural of "me".
  A specific list of those IPs / hw blocks would be useful (maybe with
  a diagram showing how they interact, much as what was started by
  Rodrigo for the DC pipeline, but a first components/subcomponents diagram
  would probably be helpful)
* RLC ?  Looks like a "micro engine" inside the GFX IPs ?
* one starting point for enhancing doc would be to start with amdgpu.h, where
  a number of acronyms used in structs are not self-explanatory: IB, SS, CP,
  ACP, CAC, HPD, ...

Do we have somewhere a description of what the hardware expects to find in
those queues ?

About amdgpu_gfx_enable_kcq():
- Isn't the `DRM_INFO("kiq ring mec %d pipe %d q %d\n"` line rather meant as
  DRM_DEBUG ?
- An error from amdgpu_ring_alloc() is reported as "failed to lock", but looks
  like "failed to allocate space on ring" ?

amdgpu_ring_alloc() itself is unconditionally setting count_dw, which looked
suspicious to me -- so I added the check shown below, and it does look like
ring_alloc() gets called again too soon.  Am I right in thinking this could be
the cause of amdgpu_ring_test_helper() failing in timeout ?

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
        if (WARN_ON_ONCE(ndw > ring->max_dw))
                return -ENOMEM;
 
+       /* check we're not allocating too fast */
+       WARN_ON_ONCE(ring->count_dw);
+
        ring->count_dw = ndw;
        ring->wptr_old = ring->wptr;


About gfx_v9_0_sw_fini():
- the 2 calls to bo_free are called here without condition, whereas they are
  allocated from rlc_init, not directly from sw_init.  Is this asymmetry wanted ?


Maybe such info should join the documentation at some point?

[0] https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html


* Re: Looking for clarifications around gfx/kcq/kiq
  2021-12-05 20:18 ` Looking for clarifications around gfx/kcq/kiq Yann Dirson
@ 2021-12-06 20:01   ` Alex Deucher
  2021-12-07 22:07     ` gpu block diagram Yann Dirson
  2021-12-10 20:36     ` Looking for clarifications around gfx/kcq/kiq Yann Dirson
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Deucher @ 2021-12-06 20:01 UTC
  To: Yann Dirson; +Cc: Rodrigo Siqueira, amd-gfx list

On Mon, Dec 6, 2021 at 5:29 AM Yann Dirson <ydirson@free.fr> wrote:
>
> Hello,
>
> Context: trying to understand what happens with my Renoir passed through
> to a Xen domu [0] (starting with the "VCN disabled" because I don't need it
> now (so let's postpone the problem with its _fini) and with "PSP disabled"
> because the alternative issue seems easier to solve -- so ip_block_mask=0xF7).
>
> I'm slowed down by a number of additional terms:
>
> * KIQ: we have the acronym, but a few more words about it would be great:
>   it seems to relate to a ring buffer provided by the GFX IP, but the name
>   does not tell me much (e.g. it tells me less than the names of the
>   "gfx" and "compute" ones)

Kernel Interface Queue.  This is a control queue used by the kernel
driver to manage other gfx and compute queues on the GFX/compute
engine.  You can use it to map/unmap additional queues, etc.

> * "me", "mec" = ?  In some places at least "me" stands for "micro engine" but
>   what are those ?  A "mec" contains pipes which contain queues.  And in
>   amdgpu_ring the "me" field seems to identify a "mec"

MicroEngine Compute.  This is the microcontroller that controls the
compute queues on the GFX/compute engine.

> * "mes", rather looks like an IP/block family than the plural of "me".
>   A specific list of those IPs / hw blocks would be useful (maybe with
>   a diagram showing how they interact, much as what was started by
>   Rodrigo for the DC pipeline, but a first components/subcomponents diagram
>   would probably be helpful)

MicroEngine Scheduler.  This is a new engine for managing queues.
This is currently unused.

> * RLC ?  Looks like a "micro engine" inside the GFX IPs ?

RunList Controller.  This is another microcontroller in the
GFX/Compute engine.  It handles power management related functionality
within the GFX/Compute engine.  The name is a vestige of old hardware
where it was originally added and doesn't really have much relation to
what the engine does now.

> * one starting point for enhancing doc would be to start with amdgpu.h, where
>   a number of acronyms used in structs are not self-explanatory: IB, SS, CP,
>   ACP, CAC, HPD, ...

IB = Indirect Buffer.  A command buffer for a particular engine.
Rather than writing commands directly to the queue, you can write the
commands into a piece of memory and then put a pointer to the memory
into the queue.  The hardware will then follow the pointer and execute
the commands in the memory, then return to the rest of the commands
in the ring.
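
To make the indirection concrete, here is a minimal sketch in plain C.
It is a toy model, not driver code: the opcodes, the toy_ib struct and
toy_execute() are all invented for illustration, and the "pointer" is
simplified to an index into a table of buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes, invented for this sketch (not real hardware packets). */
enum { OP_ADD = 1, OP_IB = 2 };

/* A toy indirect buffer: commands living in memory outside the ring. */
struct toy_ib {
	const uint32_t *cmds;   /* pairs of (opcode, operand) */
	size_t ndw;             /* length in dwords */
};

/*
 * Execute a stream of (opcode, operand) pairs and return the accumulated sum.
 * OP_ADD adds its operand.  OP_IB treats its operand as an index into ibs[],
 * runs that buffer to completion, then resumes with the next ring command.
 */
static uint32_t toy_execute(const uint32_t *cmds, size_t ndw,
			    const struct toy_ib *ibs, size_t n_ibs)
{
	uint32_t acc = 0;

	for (size_t i = 0; i + 1 < ndw; i += 2) {
		uint32_t op = cmds[i], arg = cmds[i + 1];

		if (op == OP_ADD) {
			acc += arg;
		} else if (op == OP_IB && arg < n_ibs) {
			/* Follow the "pointer"; nested IBs are not modelled. */
			acc += toy_execute(ibs[arg].cmds, ibs[arg].ndw, NULL, 0);
		}
	}
	return acc;
}
```

With a ring holding { OP_ADD, 1, OP_IB, 0, OP_ADD, 2 } and buffer 0
holding { OP_ADD, 10, OP_ADD, 20 }, the commands in the buffer run
between the two ring commands and the result is 33.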

SS = Spread Spectrum.

CP = Command Processor.  The name for the hardware block that
encompasses the front end of the GFX/Compute pipeline.  Consists
mainly of a bunch of microcontrollers (PFP, ME, CE, MEC).  The
firmware that runs on these microcontrollers provides the driver
interface to interact with the GFX/Compute engine.

>
> Do we have somewhere a description of what the hardware expects to find in
> those queues ?

It depends on the engine.  Each engine has its own packet format.
GFX/Compute uses one format, SDMA uses another, VCN uses another.
They are documented in the code and headers for the relevant engines.
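
As an illustration of what a packet format looks like for GFX/Compute,
here is a small sketch of a type-3 packet header.  The bit layout (type
in bits 31:30, count in bits 29:16, opcode in bits 15:8) and the NOP
opcode 0x10 are assumptions based on the PACKET3() macro in the driver
headers (e.g. soc15d.h); the toy_* names are invented here, so check
the headers before relying on any of it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PACKET_TYPE3 3u

/* Build a type-3 header: type in 31:30, count in 29:16, opcode in 15:8. */
static uint32_t toy_packet3(uint32_t opcode, uint32_t count)
{
	return (TOY_PACKET_TYPE3 << 30) |
	       ((count & 0x3FFFu) << 16) |
	       ((opcode & 0xFFu) << 8);
}

/* Extract the individual fields from a header dword. */
static uint32_t toy_packet_type(uint32_t header)
{
	return header >> 30;
}

static uint32_t toy_packet3_count(uint32_t header)
{
	return (header >> 16) & 0x3FFFu;
}

static uint32_t toy_packet3_opcode(uint32_t header)
{
	return (header >> 8) & 0xFFu;
}
```

Under those assumptions, toy_packet3(0x10, 0x3FFF) evaluates to
0xFFFF1000.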

>
> About amdgpu_gfx_enable_kcq():
> - Isn't the `DRM_INFO("kiq ring mec %d pipe %d q %d\n"` line rather meant as
>   DRM_DEBUG ?

It's informational so we can see what queue slot is being used for
KIQ.  There are requirements around the physical queue slot for KIQ so
it is useful to know it.  That said, it could probably be made debug
only.

> - An error from amdgpu_ring_alloc() is reported as "failed to lock", but looks
>   like "failed to allocate space on ring" ?
>
> amdgpu_ring_alloc() itself is unconditionally setting count_dw, which looked
> suspicious to me -- so I added the check shown below, and it does look like
> ring_alloc() gets called again too soon.  Am I right in thinking this could be
> the cause of amdgpu_ring_test_helper() failing in timeout ?
>

Not likely.  The PSP failing to load firmware is most likely the
problem.  You need to have a functional PSP for any of the other
engines to be usable.  If we can't load the firmware for the
microcontrollers, the driver can't interact with them.

> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
>         if (WARN_ON_ONCE(ndw > ring->max_dw))
>                 return -ENOMEM;
>
> +       /* check we're not allocating too fast */
> +       WARN_ON_ONCE(ring->count_dw);
> +
>         ring->count_dw = ndw;
>         ring->wptr_old = ring->wptr;
>
>
> About gfx_v9_0_sw_fini():
> - the 2 calls to bo_free are called here without condition, whereas they are
>   allocated from rlc_init, not directly from sw_init.  Is this asymmetry wanted ?
>
>
> Maybe such info should join the documentation at some point?

Yeah, would be useful.

Alex

>
> [0] https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html


* gpu block diagram
  2021-12-06 20:01   ` Alex Deucher
@ 2021-12-07 22:07     ` Yann Dirson
  2021-12-09  4:49       ` Alex Deucher
  2021-12-10 20:36     ` Looking for clarifications around gfx/kcq/kiq Yann Dirson
  1 sibling, 1 reply; 5+ messages in thread
From: Yann Dirson @ 2021-12-07 22:07 UTC
  To: Alex Deucher, amd-gfx list; +Cc: Rodrigo Siqueira

Thanks for the details Alex!

Here is an attempt to formalize the decomposition of a
(mostly Renoir) APU, using plantuml.  That's highly preliminary,
focusing on blocks/sub-blocks/firmware, based on my current
partial (and surely incorrect at places) understanding.
I focused on getting contents quickly, so the formalism itself
is shaky.

Can you spot any error ?  Fill those holes (usually marked with
"?") ?  What additional blocks would make sense (caches at least,
I guess) ?  What additional information would help understand how
they work together (control/data flows, busses...) ?

Indentation is shaky too, better format it to read (e.g. by
pasting in http://www.plantuml.com/plantuml/uml/)

------ >8 -------
@startuml
package "APU" {
 package CPU {
 }
 package GPU {
  package common? [[{"GPU Family"?}]] {
  }
  package GFX [[{Graphics and Compute Engine}]] {
   package CP [[{Command Processor}]] {
    package PFP [[{MicroEngine Compute}]] {
      package "pfp fw" #cccccc {
      }
    }
    package ME [[{MicroEngine ?}]] {
      package "me fw" #cccccc {
      }
    }
    package CE [[{?}]] {
      package "ce fw" #cccccc {
      }
    }
    package MEC [[{MicroEngine Compute}]] {
      package "mec fw" #cccccc {
      }
      package "mec2 fw?" #cccccc {
      }
    }
   }
   package RLC [[{RunList Controller (pm)}]] {
      package "rlc fw" #cccccc {
      }
   }
  }

  package '"management"'<<Cloud>>  {
  package MES [[{Micro-Engine Scheduler}]] {
  }
  package SMU [[{System Management Unit}]] {
  }
  package PSP [[{Platform Security Processor}]] {
    package "asd fw" #cccccc {
    }
    package "ta fw" #cccccc {
    }
  }

  package IH [[{Interrupt Handler}]] {
  }
  package GMC [[{Graphics Memory Controller}]] {
  }
  package SDMA [[{System DMA}]] {
    package "sdma fw" #cccccc {
    }
  }
  }

  package DM [[{Display Manager, link to...}]] {
   package "DMUB? DMU?" [[{Display Micro-Controller Unit}]] {
    package "dmcub fw" #cccccc {
    }
   }
   package ... {
   }
  }

  package multimedia <<Cloud>> {
   package .... {
   }
   package VCN {
    package "vcn fw" #cccccc {
    }
   }
   package JPEG {
   }
  }
 }
}
@enduml
------ >8 -------


* Re: gpu block diagram
  2021-12-07 22:07     ` gpu block diagram Yann Dirson
@ 2021-12-09  4:49       ` Alex Deucher
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Deucher @ 2021-12-09  4:49 UTC
  To: Yann Dirson; +Cc: Rodrigo Siqueira, amd-gfx list

On Tue, Dec 7, 2021 at 5:07 PM Yann Dirson <ydirson@free.fr> wrote:
>
> Thanks for the details Alex!
>
> Here is an attempt to formalize the decomposition of a
> (mostly Renoir) APU, using plantuml.  That's highly preliminary,
> focusing on blocks/sub-blocks/firmware, based on my current
> partial (and surely incorrect at places) understanding.
> I focused on getting contents quickly, so the formalism itself
> is shaky.
>
> Can you spot any error ?  Fill those holes (usually marked with
> "?") ?  What additional blocks would make sense (caches at least,
> I guess) ?  What additional information would help understand how
> they work together (control/data flows, busses...) ?

Each asic is a collection of hardware blocks.  We refer to them as
"IPs" (Intellectual Property blocks).  Each IP encapsulates certain
functionality. IPs are versioned and can also be mixed and matched.
E.g., you might have two different asics that both have SDMA 5.x IPs.
The driver is arranged by IPs.  There are driver components to handle
the initialization and operation of each IP.  There are also a bunch
of smaller IPs that don't really need much if any driver interaction.
Those end up getting lumped into the common stuff in the soc files.
The soc files (e.g., vi.c, soc15.c, nv.c) contain code for aspects of
the SoC itself rather than specific IPs.  E.g., things like GPU resets
and register access functions are SoC dependent.

An APU contains more than just CPU and GPU, it also contains all of
the platform stuff (audio, usb, gpio, etc.).  Also, a lot of
components are shared between the CPU, platform, and the GPU (e.g.,
SMU, PSP, etc.).  Specific components (CPU, GPU, etc.) usually have
their own interface to interact with those common components.  For things
like S0i3 there is a ton of coordination required across all the
components, but that is probably a bit beyond the scope of this
thread.

With respect to the GPU, we have the following major IPs:

1. GMC (Graphics Memory Controller).  This was a dedicated IP on older
pre-vega chips, but has since become somewhat decentralized on vega
and newer chips.  They now have dedicated memory hubs for specific IPs
or groups of IPs.  We still treat it as a single component in the
driver however since the programming model is still pretty similar.
This is how the different IPs on the GPU access memory (VRAM or
system memory).  It also provides the support for per process GPU
virtual address spaces.
2. IH (Interrupt Handler).  This is the interrupt controller on the
GPU.  All of the IPs feed their interrupts into this IP and it
aggregates them into a set of ring buffers that the driver can parse
to handle interrupts from different IPs.
3. PSP (Platform Security Processor).  This handles security policy
for the SoC and executes trusted applications, and validates and loads
firmwares for other blocks.
4. SMU (System Management Unit).  This is the power management
microcontroller.  It manages the entire SoC.  The driver interacts
with it to control power management features like clocks, voltages,
power rails, etc.
5. DCN (Display Controller Next).  This is the display controller.  It
handles the display hardware.
6. SDMA (System DMA).  This is a multi-purpose DMA engine.  The kernel
driver uses it for various things including paging and GPU page table
updates.  It's also exposed to userspace for use by user mode drivers
(OpenGL, Vulkan, etc.)
7. GC (Graphics and Compute).  This is the graphics and compute
engine, i.e., the block that encompasses the 3D pipeline and
shader blocks.  This is by far the largest block on the GPU.  The 3D
pipeline has tons of sub-blocks.  In addition to that, it also
contains the CP microcontrollers (ME, PFP, CE, MEC) and the RLC
microcontroller.  It's exposed to userspace for user mode drivers
(OpenGL, Vulkan, OpenCL, etc.)
8. VCN (Video Core Next).  This is the multi-media engine.  It handles
video and image encode and decode.  It's exposed to userspace for user
mode drivers (VA-API, OpenMAX, etc.)

In general, the driver has a list of all of the IPs on a particular
SoC and for things like init/fini/suspend/resume, more or less just
walks the list and handles each IP.
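
A minimal sketch of that "walk the list" pattern in plain C follows.
Everything in it is hypothetical (the toy_* names, the single generic
init/fini pair, the block order); it only shows the shape of the idea,
including a bitmask in the spirit of the ip_block_mask parameter
mentioned earlier in the thread, where a cleared bit i skips block i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ip_block;

/* Per-IP callbacks, a much reduced stand-in for a driver component. */
struct toy_ip_funcs {
	int (*hw_init)(struct toy_ip_block *blk);
	void (*hw_fini)(struct toy_ip_block *blk);
};

struct toy_ip_block {
	const char *name;
	const struct toy_ip_funcs *funcs;
	int enabled;   /* cleared when the block is masked out */
	int ready;     /* set by hw_init, cleared by hw_fini */
};

static int toy_generic_init(struct toy_ip_block *blk)
{
	blk->ready = 1;
	return 0;
}

static void toy_generic_fini(struct toy_ip_block *blk)
{
	blk->ready = 0;
}

static const struct toy_ip_funcs toy_generic_funcs = {
	toy_generic_init, toy_generic_fini
};

/* Init blocks in list order; returns 0 or the first error encountered. */
static int toy_ip_init_all(struct toy_ip_block *blocks, size_t n, uint32_t mask)
{
	assert(n <= 32);   /* one mask bit per block */

	for (size_t i = 0; i < n; i++) {
		blocks[i].enabled = (mask >> i) & 1u;
		if (!blocks[i].enabled)
			continue;
		int r = blocks[i].funcs->hw_init(&blocks[i]);
		if (r)
			return r;
	}
	return 0;
}

/* Tear down in reverse order, skipping blocks that never came up. */
static void toy_ip_fini_all(struct toy_ip_block *blocks, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (blocks[i].ready)
			blocks[i].funcs->hw_fini(&blocks[i]);
	}
}
```

With six blocks listed as common, gmc, ih, psp, smu, gfx (an assumed
order) and a mask of 0xF7, bit 3 is clear, so the fourth block is
skipped at init and is also left alone at fini.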

Alex


>
> Indentation is shaky too, better format it to read (e.g. by
> pasting in http://www.plantuml.com/plantuml/uml/)
>
> ------ >8 -------
> @startuml
> package "APU" {
>  package CPU {
>  }
>  package GPU {
>   package common? [[{"GPU Family"?}]] {
>   }
>   package GFX [[{Graphics and Compute Engine}]] {
>    package CP [[{Command Processor}]] {
>     package PFP [[{MicroEngine Compute}]] {
>       package "pfp fw" #cccccc {
>       }
>     }
>     package ME [[{MicroEngine ?}]] {
>       package "me fw" #cccccc {
>       }
>     }
>     package CE [[{?}]] {
>       package "ce fw" #cccccc {
>       }
>     }
>     package MEC [[{MicroEngine Compute}]] {
>       package "mec fw" #cccccc {
>       }
>       package "mec2 fw?" #cccccc {
>       }
>     }
>    }
>    package RLC [[{RunList Controller (pm)}]] {
>       package "rlc fw" #cccccc {
>       }
>    }
>   }
>
>   package '"management"'<<Cloud>>  {
>   package MES [[{Micro-Engine Scheduler}]] {
>   }
>   package SMU [[{System Management Unit}]] {
>   }
>   package PSP [[{Platform Security Processor}]] {
>     package "asd fw" #cccccc {
>     }
>     package "ta fw" #cccccc {
>     }
>   }
>
>   package IH [[{Interrupt Handler}]] {
>   }
>   package GMC [[{Graphics Memory Controller}]] {
>   }
>   package SDMA [[{System DMA}]] {
>     package "sdma fw" #cccccc {
>     }
>   }
>   }
>
>   package DM [[{Display Manager, link to...}]] {
>    package "DMUB? DMU?" [[{Display Micro-Controller Unit}]] {
>     package "dmcub fw" #cccccc {
>     }
>    }
>    package ... {
>    }
>   }
>
>   package multimedia <<Cloud>> {
>    package .... {
>    }
>    package VCN {
>     package "vcn fw" #cccccc {
>     }
>    }
>    package JPEG {
>    }
>   }
>  }
> }
> @enduml
> ------ >8 -------


* Re: Looking for clarifications around gfx/kcq/kiq
  2021-12-06 20:01   ` Alex Deucher
  2021-12-07 22:07     ` gpu block diagram Yann Dirson
@ 2021-12-10 20:36     ` Yann Dirson
  1 sibling, 0 replies; 5+ messages in thread
From: Yann Dirson @ 2021-12-10 20:36 UTC
  To: Alex Deucher; +Cc: Rodrigo Siqueira, amd-gfx list

> > amdgpu_ring_alloc() itself is unconditionally setting count_dw,
> > which looked
> > suspicious to me -- so I added the check shown below, and it does
> > look like
> > ring_alloc() gets called again too soon.  Am I right in thinking
> > this could be
> > the cause of amdgpu_ring_test_helper() failing in timeout ?
> >
> 
> Not likely.  The PSP failing to load firmware is most likely the
> problem.  You need to have a functional PSP for any of the other
> engines to be usable.  If we can't load the firmware for the
> microcontrollers, the driver can't interact with them.

Even if it has no effect on my primary issue, I still have doubts
about this: if we call amdgpu_ring_alloc() twice without ensuring the
allocated space has been padded with NOPs (i.e. 0xFFFFFFFF, right ?),
what happens when the GFX IP (or should we rather say "GC"?)
parses those ?

My reading of gfx_enable_kcq() is that this is what happens there.  Isn't
it missing a call to ring_commit() before ring_test() ?
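
To pin down the scenario being asked about, here is a toy model in
plain C of an alloc/write/commit ring.  It follows the premise of the
question (commit pads the unused part of a reservation with NOPs and
closes it), which is an assumption and not necessarily what
amdgpu_ring_commit() does; all toy_* names and the NOP value are
placeholders invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_DW 16u          /* ring size in dwords, a power of two */
#define TOY_NOP     0x80000000u  /* arbitrary placeholder filler value */

struct toy_ring {
	uint32_t buf[TOY_RING_DW];
	uint32_t wptr;       /* next dword to write (driver side) */
	uint32_t committed;  /* wptr value last published to the "hardware" */
	int count_dw;        /* dwords still reserved by the last alloc */
};

/*
 * Reserve ndw dwords.  Returns 0 on success, -1 if ndw exceeds the ring
 * size, -2 if the previous reservation was never committed (the condition
 * the WARN_ON_ONCE in the patch probes for).  Free space is not modelled.
 */
static int toy_ring_alloc(struct toy_ring *ring, unsigned ndw)
{
	if (ndw > TOY_RING_DW)
		return -1;
	if (ring->count_dw != 0)
		return -2;
	ring->count_dw = (int)ndw;
	return 0;
}

/* Write one dword into the current reservation. */
static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
	assert(ring->count_dw > 0);
	ring->buf[ring->wptr & (TOY_RING_DW - 1)] = v;
	ring->wptr++;
	ring->count_dw--;
}

/* Pad the rest of the reservation with NOPs, then publish wptr. */
static void toy_ring_commit(struct toy_ring *ring)
{
	while (ring->count_dw > 0)
		toy_ring_write(ring, TOY_NOP);
	ring->committed = ring->wptr;
}
```

In this model, allocating 4 dwords, writing 1 and allocating again
without a commit returns -2; after toy_ring_commit() the three unused
slots hold TOY_NOP and the next allocation succeeds.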

> 
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring,
> > unsigned ndw)
> >         if (WARN_ON_ONCE(ndw > ring->max_dw))
> >                 return -ENOMEM;
> >
> > +       /* check we're not allocating too fast */
> > +       WARN_ON_ONCE(ring->count_dw);
> > +
> >         ring->count_dw = ndw;
> >         ring->wptr_old = ring->wptr;
> >
> >
> > About gfx_v9_0_sw_fini():
> > - the 2 calls to bo_free are called here without condition, whereas
> > they are
> >   allocated from rlc_init, not directly from sw_init.  Is this
> >   asymmetry wanted ?
> >
> >
> > Maybe such info should join the documentation at some point?
> 
> Yeah, would be useful.
> 
> Alex
> 
> >
> > [0]
> > https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html
> 
