From: "Timur Kristóf" <timur.kristof@gmail.com>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: amd-gfx@lists.freedesktop.org, Alex Deucher <alexander.deucher@amd.com>
Subject: Re: [PATCH 6/7] drm/amdgpu/gfx9: rework pipeline sync packet sequence
Date: Fri, 19 Dec 2025 10:33:53 -0600 [thread overview]
Message-ID: <7981459.W097sEU6C4@timur-max> (raw)
In-Reply-To: <CADnq5_OwQpWwMPywzApHMe-y2TS68217bTh3afwgjPYoOf9jtQ@mail.gmail.com>
On 2025. december 19., péntek 9:37:16 középső államokbeli zónaidő Alex Deucher
wrote:
> On Thu, Dec 18, 2025 at 9:36 PM Timur Kristóf <timur.kristof@gmail.com>
wrote:
> > On 2025. december 18., csütörtök 16:41:40 középső államokbeli zónaidő Alex
> >
> > Deucher wrote:
> > > Replace WAIT_REG_MEM with EVENT_WRITE flushes for all
> > > shader types and PFP_SYNC_ME. That should accomplish
> > > the same thing and avoid having to wait on a fence
> > > preventing any issues with pipeline syncs during
> > > queue resets.
> > >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > ---
> > >
> > > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 ++++++++++++++++++---------
> > > 1 file changed, 21 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index
> > > 7b012ca1153ea..d9dee3c11a05d
> > > 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > @@ -5572,15 +5572,26 @@ static void gfx_v9_0_ring_emit_fence(struct
> > > amdgpu_ring *ring, u64 addr, amdgpu_ring_write(ring, 0);
> > >
> > > }
> > >
> > > -static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > +static void gfx_v9_0_ring_emit_event_write(struct amdgpu_ring *ring,
> > > + uint32_t event_type,
> > > + uint32_t
> >
> > event_index)
> >
> > > {
> > >
> > > - int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> > > - uint32_t seq = ring->fence_drv.sync_seq;
> > > - uint64_t addr = ring->fence_drv.gpu_addr;
> > > + amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> > > + amdgpu_ring_write(ring, EVENT_TYPE(event_type) |
> > > + EVENT_INDEX(event_index));
> > > +}
> > >
> > > - gfx_v9_0_wait_reg_mem(ring, usepfp, 1, 0,
> > > - lower_32_bits(addr),
> >
> > upper_32_bits(addr),
> >
> > > - seq, 0xffffffff, 4);
> > > +static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > +{
> > > + if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
> > > + gfx_v9_0_ring_emit_event_write(ring, VS_PARTIAL_FLUSH,
> >
> > 4);
> >
> > Is VS_PARTIAL_FLUSH necessary when we already have PS_PARTIAL_FLUSH?
> > When we wait for all PS to finish, wouldn't that imply that all VS had
> > already finished as well?
>
> I'm not sure. The CP docs recommend all 3 if you want to wait for the
> engine to idle.
Alright, it doesn't hurt to have it here.
>
> > > + gfx_v9_0_ring_emit_event_write(ring, PS_PARTIAL_FLUSH,
> >
> > 4);
> >
> > > + gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH,
> >
> > 4);
> >
> > > + amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME,
> >
> > 0));
> >
> > > + amdgpu_ring_write(ring, 0x0);
> >
> > The above sequence just waits for all shaders to finish, but as far as I
> > understand it doesn't wait for memory writes and cache flushes. Please
> > correct me if I'm wrong about this. For that, I think we do need an
> > ACQUIRE_MEM packet. (And, if the ACQUIRE_MEM is done on the PFP then we
> > won't need the PFP_SYNC_ME.)
>
> There is already a RELEASE_MEM (the fence from the previous job) prior
> to this packet that would have flushed the caches. We just want to
> block the PFP from further fetching until that is complete. In the
> good case, the RELEASE_MEM would have handled pipeline idling and
> cache flushes so these would be effectively noops and in the reset
> case, we don't care because that bad job is gone anyway. I guess
> probably all we really need is the PFP_SYNC_ME.
>
> Alex
RELEASE_MEM doesn't wait for the GPU to go idle, RELEASE_MEM just promises to
write to the given fence address when the specified operations (eg. shaders and
cache flush) are complete. Here in the pipeline sync, we actually want to wait
for the GPU to go idle, and AFAIK we need an ACQUIRE_MEM for that.
>
> > > + } else {
> > > + gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH,
> >
> > 4);
> >
> > > + }
> > >
> > > }
> > >
> > > static void gfx_v9_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
> > >
> > > @@ -7404,7 +7415,7 @@ static const struct amdgpu_ring_funcs
> > > gfx_v9_0_ring_funcs_gfx = { .set_wptr = gfx_v9_0_ring_set_wptr_gfx,
> > >
> > > .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > >
> > > 5 + /* COND_EXEC */
> > >
> > > - 7 + /* PIPELINE_SYNC */
> > > + 8 + /* PIPELINE_SYNC */
> > >
> > > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > 2 + /* VM_FLUSH */
> > >
> > > @@ -7460,7 +7471,7 @@ static const struct amdgpu_ring_funcs
> > > gfx_v9_0_sw_ring_funcs_gfx = { .set_wptr = amdgpu_sw_ring_set_wptr_gfx,
> > >
> > > .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > >
> > > 5 + /* COND_EXEC */
> > >
> > > - 7 + /* PIPELINE_SYNC */
> > > + 8 + /* PIPELINE_SYNC */
> > >
> > > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > 2 + /* VM_FLUSH */
> > >
> > > @@ -7521,7 +7532,7 @@ static const struct amdgpu_ring_funcs
> > > gfx_v9_0_ring_funcs_compute = { 20 + /* gfx_v9_0_ring_emit_gds_switch */
> > >
> > > 7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > > 5 + /* hdp invalidate */
> > >
> > > - 7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > > + 2 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > >
> > > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > 8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user
> >
> > fence, vm fence */
> >
> > > @@ -7564,7 +7575,6 @@ static const struct amdgpu_ring_funcs
> > > gfx_v9_0_ring_funcs_kiq = { 20 + /* gfx_v9_0_ring_emit_gds_switch */
> > >
> > > 7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > > 5 + /* hdp invalidate */
> > >
> > > - 7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > >
> > > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user
> >
> > fence, vm fence */
next prev parent reply other threads:[~2025-12-19 16:33 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-18 22:41 [PATCH 1/7] drm/amdgpu: don't reemit ring contents more than once Alex Deucher
2025-12-18 22:41 ` [PATCH 2/7] drm/amdgpu: always backup and reemit fences Alex Deucher
2025-12-18 22:41 ` [PATCH 3/7] drm/amdgpu: use dma_fence_get_status() for adapter reset Alex Deucher
2025-12-18 22:41 ` [PATCH 4/7] drm/amdgpu: avoid a warning in timedout job handler Alex Deucher
2025-12-18 22:41 ` [PATCH 5/7] drm/amdgpu: mark fences with errors before ring reset Alex Deucher
2025-12-18 22:41 ` [PATCH 6/7] drm/amdgpu/gfx9: rework pipeline sync packet sequence Alex Deucher
2025-12-19 1:17 ` Timur Kristóf
2025-12-19 15:37 ` Alex Deucher
2025-12-19 16:33 ` Timur Kristóf [this message]
2025-12-19 16:46 ` Deucher, Alexander
2025-12-18 22:41 ` [PATCH 7/7] drm/amdgpu/gfx9: Implement KGQ ring reset Alex Deucher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7981459.W097sEU6C4@timur-max \
--to=timur.kristof@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=alexdeucher@gmail.com \
--cc=amd-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.