* [RFC v4 00/12] Ring padding micro-optimisation etc
@ 2024-12-27 11:19 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2024-12-27 11:19 UTC
  To: amd-gfx; +Cc: kernel-dev, Tvrtko Ursulin, Christian König, Sunil Khatri

From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

[Re-sent after the fdo mailman recovered from its -ENOSPC state.]

There are a few ideas in this series and not all of them might stick.

Trivial stuff aside, the two main things to highlight are:

1) The departure from the existing state of "duplicate everything" by
consolidating some SDMA insert nop vfuncs.

2) Conversion of amdgpu_ring_write() to variadic to allow for more compact
compiled code.

For the latter I have only included VCE, GFX v10.0 and SDMA v5.2 as examples.
(But note the code shrink is already noticeable even with only those three.)

But it is churny and looks different so people may not like it. TBD.
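
To illustrate the general idea behind a variadic ring write, here is a minimal
userspace sketch. It is not the code from the patches: struct demo_ring,
demo_ring_write_multiple() and the demo_ring_write() macro are hypothetical
names, and the ring is assumed to have a power-of-two size in dwords. The point
is only that one call can emit several dwords while loading and storing the
write pointer once, instead of once per dword.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring; not the real struct amdgpu_ring. */
struct demo_ring {
	uint32_t *buf;      /* backing storage, power-of-two number of dwords */
	uint32_t mask;      /* size in dwords minus one */
	uint32_t wptr;      /* free-running write position in dwords */
	uint32_t count_dw;  /* dwords still reserved for writing */
};

/* Write n dwords, touching the ring bookkeeping only once. */
static void demo_ring_write_multiple(struct demo_ring *ring,
				     const uint32_t *src, size_t n)
{
	uint32_t wptr = ring->wptr; /* cache in a local across the loop */
	size_t i;

	for (i = 0; i < n; i++)
		ring->buf[wptr++ & ring->mask] = src[i];

	ring->wptr = wptr;
	ring->count_dw -= (uint32_t)n;
}

/*
 * Variadic front end: the arguments are gathered into a C99 compound
 * literal. sizeof does not evaluate its operand here, so each argument
 * expression is evaluated exactly once.
 */
#define demo_ring_write(ring, ...) \
	demo_ring_write_multiple((ring), (const uint32_t[]){ __VA_ARGS__ }, \
		sizeof((const uint32_t[]){ __VA_ARGS__ }) / sizeof(uint32_t))
```

With this shape, a sequence of single-dword writes can be expressed as one
demo_ring_write(ring, a, b, c) call, which is where the churn in the callers
comes from.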

Other than those two, the remaining general idea of the series is to consolidate
the padding approach to memset32, especially on rings with 64 or 256 dword
alignment.
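
As a rough sketch of what block-filling padding looks like, the userspace
example below pads a ring up to an alignment boundary with at most two fill
calls rather than one write per dword. It rests on assumptions: struct toy_ring,
toy_ring_fill() and toy_ring_pad() are hypothetical names, fill32() is a
stand-in for the kernel's memset32() (which takes a destination, a 32-bit value
and a count of 32-bit elements), and both the ring size and the alignment are
assumed to be powers of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's memset32(); same argument order. */
static void fill32(uint32_t *dst, uint32_t value, size_t count)
{
	while (count--)
		*dst++ = value;
}

/* Hypothetical, simplified ring; not the real struct amdgpu_ring. */
struct toy_ring {
	uint32_t *buf;   /* backing storage, power-of-two number of dwords */
	uint32_t mask;   /* size in dwords minus one */
	uint32_t wptr;   /* free-running write position in dwords */
};

/* Fill count dwords with value, splitting once if the span wraps. */
static void toy_ring_fill(struct toy_ring *ring, uint32_t value, uint32_t count)
{
	uint32_t pos = ring->wptr & ring->mask;
	uint32_t first = ring->mask + 1 - pos; /* dwords left before the end */

	if (first > count)
		first = count;

	fill32(&ring->buf[pos], value, first);
	fill32(ring->buf, value, count - first); /* wrapped part, may be 0 */
	ring->wptr += count;
}

/* Pad with nop up to the next align_dw boundary; returns dwords written. */
static uint32_t toy_ring_pad(struct toy_ring *ring, uint32_t nop,
			     uint32_t align_dw)
{
	uint32_t count = (align_dw - (ring->wptr & (align_dw - 1))) &
			 (align_dw - 1);

	toy_ring_fill(ring, nop, count);
	return count;
}
```

The larger the alignment (64 or 256 dwords), the more dwords a single pad can
cover, so that is where replacing a per-dword loop should matter most.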

Binary size comparison:

    text    data     bss     dec     hex filename
 10439777   542501  188232 11170510   aa72ce amdgpu.ko.before
 10412793   542609  188232 11143634   aa09d2 amdgpu.ko.after

Cc: Christian König <christian.koenig@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>

Tvrtko Ursulin (12):
  drm/amdgpu: Use memset32 for IB padding
  drm/amdgpu: Use memset32 for ring clearing
  drm/amdgpu: Cache SDMA instance and index in the ring
  drm/amdgpu: Consolidate a bunch of similar sdma insert nop vfuncs
  drm/amdgpu: Use memset32 for sdma insert nops
  drm/amdgpu: Use amdgpu_ring_fill() for VPE padding
  drm/amdgpu: Convert JPEG, VCE and UVD to more efficient ring padding
  drm/amdgpu: Cache some values in ring emission helpers
  drm/amdgpu: Optimise amdgpu_ring_write()
  drm/amdgpu: Convert VCE to variadic amdgpu_ring_write()
  drm/amdgpu: Convert GFX v10.0 to variadic amdgpu_ring_write()
  drm/amdgpu: Convert SDMA v5.2 to variadic amdgpu_ring_write()

 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  32 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 321 +++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c |  43 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |  22 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c  |  13 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c    |   4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 399 ++++++++++++-----------
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   |  24 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   |  24 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  31 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c |  31 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   |  28 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   | 182 +++++------
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   |  31 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c   |  31 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c    |   9 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c    |   8 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c    |   7 +-
 26 files changed, 745 insertions(+), 551 deletions(-)

-- 
2.47.1



end of thread, other threads:[~2025-01-04 16:58 UTC | newest]

Thread overview: 19+ messages
2024-12-27 11:19 [RFC v4 00/12] Ring padding micro-optimisation etc Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 01/12] drm/amdgpu: Use memset32 for IB padding Tvrtko Ursulin
2025-01-02 13:45   ` Christian König
2025-01-03 14:40     ` Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 02/12] drm/amdgpu: Use memset32 for ring clearing Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 03/12] drm/amdgpu: Cache SDMA instance and index in the ring Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 04/12] drm/amdgpu: Consolidate a bunch of similar sdma insert nop vfuncs Tvrtko Ursulin
2025-01-02 13:49   ` Christian König
2025-01-03 14:25     ` Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 05/12] drm/amdgpu: Use memset32 for sdma insert nops Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 06/12] drm/amdgpu: Use amdgpu_ring_fill() for VPE padding Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 07/12] drm/amdgpu: Convert JPEG, VCE and UVD to more efficient ring padding Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 08/12] drm/amdgpu: Cache some values in ring emission helpers Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 09/12] drm/amdgpu: Optimise amdgpu_ring_write() Tvrtko Ursulin
2025-01-02 13:55   ` Christian König
2025-01-03 13:02     ` Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 10/12] drm/amdgpu: Convert VCE to variadic amdgpu_ring_write() Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 11/12] drm/amdgpu: Convert GFX v10.0 " Tvrtko Ursulin
2024-12-27 11:19 ` [PATCH 12/12] drm/amdgpu: Convert SDMA v5.2 " Tvrtko Ursulin
