Re: [PATCH] drm/amdgpu: optimize the padding with hw optimization

From: "Khatri, Sunil" <sukhatri@amd.com>
To: "Marek Olšák" <maraeo@gmail.com>, "Sunil Khatri" <sunil.khatri@amd.com>
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	amd-gfx@lists.freedesktop.org,
	"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
	"Tvrtko Ursulin" <tursulin@igalia.com>,
	"Marek Olšák" <marek.olsak@amd.com>
Subject: Re: [PATCH] drm/amdgpu: optimize the padding with hw optimization
Date: Thu, 1 Aug 2024 10:02:21 +0530
Message-ID: <00e7303a-e7cb-43ed-b2a3-8aebc388eeda@amd.com>
In-Reply-To: <CAAxE2A7CcN+ePP83Z55X-gqFdBg0YTPxRniLtphiJdMrZEXAcA@mail.gmail.com>


On 8/1/2024 8:52 AM, Marek Olšák wrote:
> On Wed, Jul 31, 2024 at 11:19 PM Marek Olšák <maraeo@gmail.com> wrote:
>> On Tue, Jul 30, 2024 at 8:43 AM Sunil Khatri <sunil.khatri@amd.com> wrote:
>>> Adding NOP packets one by one in the ring
>>> does not use the CP efficiently.
>>>
>>> Solution:
>>> Use CP optimization while adding NOP packet's so PFP
>>> can discard NOP packets based on information of count
>>> from the Header instead of fetching all NOP packets
>>> one by one.
>>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
>>> Cc: Tvrtko Ursulin <tursulin@igalia.com>
>>> Cc: Marek Olšák <marek.olsak@amd.com>
>>> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 24 +++++++++++++++++++++---
>>>   1 file changed, 21 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> index 853084a2ce7f..edf5b5c4d185 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> @@ -9397,6 +9397,24 @@ static void gfx_v10_0_emit_mem_sync(struct amdgpu_ring *ring)
>>>          amdgpu_ring_write(ring, gcr_cntl); /* GCR_CNTL */
>>>   }
>>>
>>> +static void amdgpu_gfx10_ring_insert_nop(struct amdgpu_ring *ring, uint32_t num_nop)
>>> +{
>>> +       int i;
>>> +
>>> +       /* Header itself is a NOP packet */
>>> +       if (num_nop == 1) {
>>> +               amdgpu_ring_write(ring, ring->funcs->nop);
>>> +               return;
>>> +       }
>>> +
>>> +       /* Max HW optimization till 0x3ffe, followed by remaining one NOP at a time*/
>>> +       amdgpu_ring_write(ring, PACKET3(PACKET3_NOP, min(num_nop - 2, 0x3ffe)));
>>> +
>>> +       /* Header is at index 0, followed by num_nops - 1 NOP packet's */
>>> +       for (i = 1; i < num_nop; i++)
>>> +               amdgpu_ring_write(ring, ring->funcs->nop);
>> This loop should be removed. It's unnecessary CPU overhead and we
>> should never get more than 0x3fff NOPs (maybe use BUG_ON). Leaving the
>> whole packet body uninitialized is the fastest option.
> If you remove amdgpu_ring_write, you still need to move wptr somehow.
> amdgpu_ring_write_multiple gives a hint about how to do it:
>
> ring->wptr += count_dw;
> ring->wptr &= ring->ptr_mask;
> ring->count_dw -= count_dw;
I gave the reason in the previous commit for why we did not simply move
the wptr instead of writing all the NOPs into the ring. I also tried
exactly what is given above, just moving the wptr, but the device did
not boot.

Possibly the calculation is wrong, or on some targets the NOP is not
working as expected. With this approach, if the NOP works as per the
spec it helps save GPU cycles, and if it does not, the device still
won't crash because the NOPs are still there in the ring.

I did not spend more time analysing the crash caused by just shifting
the wptr, for the reason explained in the previous commit. The original
understanding was to just move the wptr, but based on Christian's
feedback we are still filling the ring with NOPs.
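
To make the two options discussed above concrete, here is a minimal
userspace sketch on a toy ring. It is not the driver code: struct
toy_ring, TOY_RING_DW and the toy_* helpers are hypothetical names, the
PACKET3 macro and the 0x3FFF single-dword NOP encoding are restated from
my reading of the amdgpu headers and should be treated as assumptions,
and assert() stands in for the BUG_ON that was suggested. It only shows
that both variants emit the same header and leave wptr and count_dw in
the same state; it says nothing about why skipping the body failed to
boot on real hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the driver's PM4 macros; restated so the sketch is standalone. */
#define PACKET_TYPE3 3u
#define PACKET3_NOP  0x10u
#define PACKET3(op, n) \
	((PACKET_TYPE3 << 30) | (((op) & 0xFFu) << 8) | (((n) & 0x3FFFu) << 16))

#define TOY_RING_DW 64u /* hypothetical ring size in dwords, power of two */

struct toy_ring {
	uint32_t buf[TOY_RING_DW];
	uint32_t wptr;     /* write pointer, in dwords */
	uint32_t ptr_mask; /* TOY_RING_DW - 1 */
	uint32_t count_dw; /* dwords still available */
	uint32_t nop;      /* stand-in for ring->funcs->nop */
};

static void toy_ring_init(struct toy_ring *ring)
{
	memset(ring, 0, sizeof(*ring));
	ring->ptr_mask = TOY_RING_DW - 1;
	ring->count_dw = TOY_RING_DW;
	/* Assumption: count 0x3FFF encodes a one-dword NOP. */
	ring->nop = PACKET3(PACKET3_NOP, 0x3FFF);
}

static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
	assert(ring->count_dw > 0);
	ring->buf[ring->wptr & ring->ptr_mask] = v;
	ring->wptr = (ring->wptr + 1) & ring->ptr_mask;
	ring->count_dw--;
}

/* Variant in the patch: counted header, body still filled with NOPs. */
static void toy_insert_nop_filled(struct toy_ring *ring, uint32_t num_nop)
{
	uint32_t i;

	if (num_nop == 0)
		return;
	if (num_nop == 1) {
		toy_ring_write(ring, ring->nop);
		return;
	}
	assert(num_nop - 2 <= 0x3FFE); /* userspace stand-in for BUG_ON */
	toy_ring_write(ring, PACKET3(PACKET3_NOP, num_nop - 2));
	for (i = 1; i < num_nop; i++)
		toy_ring_write(ring, ring->nop);
}

/* Suggested variant: counted header, body left untouched, wptr advanced. */
static void toy_insert_nop_skipped(struct toy_ring *ring, uint32_t num_nop)
{
	uint32_t body;

	if (num_nop == 0)
		return;
	if (num_nop == 1) {
		toy_ring_write(ring, ring->nop);
		return;
	}
	assert(num_nop - 2 <= 0x3FFE);
	toy_ring_write(ring, PACKET3(PACKET3_NOP, num_nop - 2));

	body = num_nop - 1;
	assert(body <= ring->count_dw);
	ring->wptr += body;
	ring->wptr &= ring->ptr_mask;
	ring->count_dw -= body;
}
```

For a request of 8 dwords both helpers write a header with count 6 and
end with wptr advanced by 8; the only difference is whether the 7 body
dwords hold NOPs or whatever was in the ring before.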

>
> Marek
>
>> Marek
>>
>>> +}
>>> +
>>>   static void gfx_v10_ip_print(void *handle, struct drm_printer *p)
>>>   {
>>>          struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>> @@ -9588,7 +9606,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_gfx = {
>>>          .emit_hdp_flush = gfx_v10_0_ring_emit_hdp_flush,
>>>          .test_ring = gfx_v10_0_ring_test_ring,
>>>          .test_ib = gfx_v10_0_ring_test_ib,
>>> -       .insert_nop = amdgpu_ring_insert_nop,
>>> +       .insert_nop = amdgpu_gfx10_ring_insert_nop,
>>>          .pad_ib = amdgpu_ring_generic_pad_ib,
>>>          .emit_switch_buffer = gfx_v10_0_ring_emit_sb,
>>>          .emit_cntxcntl = gfx_v10_0_ring_emit_cntxcntl,
>>> @@ -9629,7 +9647,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_compute = {
>>>          .emit_hdp_flush = gfx_v10_0_ring_emit_hdp_flush,
>>>          .test_ring = gfx_v10_0_ring_test_ring,
>>>          .test_ib = gfx_v10_0_ring_test_ib,
>>> -       .insert_nop = amdgpu_ring_insert_nop,
>>> +       .insert_nop = amdgpu_gfx10_ring_insert_nop,
>>>          .pad_ib = amdgpu_ring_generic_pad_ib,
>>>          .emit_wreg = gfx_v10_0_ring_emit_wreg,
>>>          .emit_reg_wait = gfx_v10_0_ring_emit_reg_wait,
>>> @@ -9659,7 +9677,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_kiq = {
>>>          .emit_fence = gfx_v10_0_ring_emit_fence_kiq,
>>>          .test_ring = gfx_v10_0_ring_test_ring,
>>>          .test_ib = gfx_v10_0_ring_test_ib,
>>> -       .insert_nop = amdgpu_ring_insert_nop,
>>> +       .insert_nop = amdgpu_gfx10_ring_insert_nop,
>>>          .pad_ib = amdgpu_ring_generic_pad_ib,
>>>          .emit_rreg = gfx_v10_0_ring_emit_rreg,
>>>          .emit_wreg = gfx_v10_0_ring_emit_wreg,
>>> --
>>> 2.34.1
>>>
