From: Timur Kristóf
To: Alex Deucher
Cc: amd-gfx@lists.freedesktop.org, Alex Deucher
Subject: Re: [PATCH 6/7] drm/amdgpu/gfx9: rework pipeline sync packet sequence
Date: Fri, 19 Dec 2025 10:33:53 -0600
Message-ID: <7981459.W097sEU6C4@timur-max>
References: <20251218224141.5841-1-alexander.deucher@amd.com> <3069977.btlEUcBR6m@timur-max>
List-Id: Discussion list for AMD gfx

On Friday, December 19, 2025, 9:37:16 Central Standard Time, Alex Deucher wrote:
> On Thu, Dec 18, 2025 at 9:36 PM Timur Kristóf wrote:
> > On Thursday, December 18, 2025, 16:41:40 Central Standard Time, Alex Deucher wrote:
> > > Replace WAIT_REG_MEM with EVENT_WRITE flushes for all
> > > shader types and PFP_SYNC_ME. That should accomplish
> > > the same thing and avoid having to wait on a fence
> > > preventing any issues with pipeline syncs during
> > > queue resets.
> > >
> > > Signed-off-by: Alex Deucher
> > > ---
> > >
> > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 ++++++++++++++++++---------
> > >  1 file changed, 21 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > index 7b012ca1153ea..d9dee3c11a05d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > @@ -5572,15 +5572,26 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
> > >      amdgpu_ring_write(ring, 0);
> > >  }
> > >
> > > -static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > +static void gfx_v9_0_ring_emit_event_write(struct amdgpu_ring *ring,
> > > +                                           uint32_t event_type,
> > > +                                           uint32_t event_index)
> > >  {
> > > -    int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> > > -    uint32_t seq = ring->fence_drv.sync_seq;
> > > -    uint64_t addr = ring->fence_drv.gpu_addr;
> > > +    amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> > > +    amdgpu_ring_write(ring, EVENT_TYPE(event_type) |
> > > +                      EVENT_INDEX(event_index));
> > > +}
> > >
> > > -    gfx_v9_0_wait_reg_mem(ring, usepfp, 1, 0,
> > > -                          lower_32_bits(addr), upper_32_bits(addr),
> > > -                          seq, 0xffffffff, 4);
> > > +static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > +{
> > > +    if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
> > > +        gfx_v9_0_ring_emit_event_write(ring, VS_PARTIAL_FLUSH, 4);
> >
> > Is VS_PARTIAL_FLUSH necessary when we already have PS_PARTIAL_FLUSH?
> > When we wait for all PS to finish, wouldn't that imply that all VS had
> > already finished as well?
>
> I'm not sure. The CP docs recommend all 3 if you want to wait for the
> engine to idle.

Alright, it doesn't hurt to have it here.

> > > +        gfx_v9_0_ring_emit_event_write(ring, PS_PARTIAL_FLUSH, 4);
> > > +        gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
> > > +        amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
> > > +        amdgpu_ring_write(ring, 0x0);
> >
> > The above sequence just waits for all shaders to finish, but as far as I
> > understand it doesn't wait for memory writes and cache flushes. Please
> > correct me if I'm wrong about this. For that, I think we do need an
> > ACQUIRE_MEM packet. (And, if the ACQUIRE_MEM is done on the PFP then we
> > won't need the PFP_SYNC_ME.)
>
> There is already a RELEASE_MEM (the fence from the previous job) prior
> to this packet that would have flushed the caches. We just want to
> block the PFP from further fetching until that is complete. In the
> good case, the RELEASE_MEM would have handled pipeline idling and
> cache flushes so these would be effectively noops and in the reset
> case, we don't care because that bad job is gone anyway. I guess
> probably all we really need is the PFP_SYNC_ME.
>
> Alex

RELEASE_MEM doesn't wait for the GPU to go idle; it just promises to
write to the given fence address when the specified operations (e.g. shaders
and cache flush) are complete. Here in the pipeline sync, we actually want to
wait for the GPU to go idle, and AFAIK we need an ACQUIRE_MEM for that.

> > > +    } else {
> > > +        gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
> > > +    }
> > >  }
> > >
> > >  static void gfx_v9_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
> > >
> > > @@ -7404,7 +7415,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> > >      .set_wptr = gfx_v9_0_ring_set_wptr_gfx,
> > >      .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > >          5 + /* COND_EXEC */
> > > -        7 + /* PIPELINE_SYNC */
> > > +        8 + /* PIPELINE_SYNC */
> > >          SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >          SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >          2 + /* VM_FLUSH */
> > >
> > > @@ -7460,7 +7471,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
> > >      .set_wptr = amdgpu_sw_ring_set_wptr_gfx,
> > >      .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > >          5 + /* COND_EXEC */
> > > -        7 + /* PIPELINE_SYNC */
> > > +        8 + /* PIPELINE_SYNC */
> > >          SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >          SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >          2 + /* VM_FLUSH */
> > >
> > > @@ -7521,7 +7532,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
> > >          20 + /* gfx_v9_0_ring_emit_gds_switch */
> > >          7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > >          5 + /* hdp invalidate */
> > > -        7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > > +        2 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > >          SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >          SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >          8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
> > >
> > > @@ -7564,7 +7575,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
> > >          20 + /* gfx_v9_0_ring_emit_gds_switch */
> > >          7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > >          5 + /* hdp invalidate */
> > > -        7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > >          SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >          SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >          8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, vm fence */
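
For reference, the emit_frame_size changes in the quoted patch (7 -> 8 on the gfx rings, 7 -> 2 on the compute ring) follow from counting the dwords the new pipeline sync writes. Below is a minimal userspace sketch of that accounting. It is not kernel code: struct toy_ring and the toy_* helpers are hypothetical stand-ins for amdgpu_ring and amdgpu_ring_write(), and the packet encodings and numeric opcode/event values are illustrative assumptions rather than values checked against the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the amdgpu packet macros. The bit layout and
 * numeric values are assumptions for illustration only. */
#define PACKET3(op, n) \
    ((3u << 30) | (((uint32_t)(op) & 0xFFu) << 8) | (((uint32_t)(n) & 0x3FFFu) << 16))
#define PACKET3_EVENT_WRITE 0x46u
#define PACKET3_PFP_SYNC_ME 0x42u
#define EVENT_TYPE(x)  ((uint32_t)(x) << 0)
#define EVENT_INDEX(x) ((uint32_t)(x) << 8)

enum { VS_PARTIAL_FLUSH = 0x0f, PS_PARTIAL_FLUSH = 0x10, CS_PARTIAL_FLUSH = 0x07 };
enum ring_type { RING_TYPE_GFX, RING_TYPE_COMPUTE };

/* Toy ring: a flat dword buffer plus a write counter. */
struct toy_ring {
    enum ring_type type;
    uint32_t buf[64];
    size_t count; /* dwords written so far */
};

static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
    assert(ring->count < sizeof(ring->buf) / sizeof(ring->buf[0]));
    ring->buf[ring->count++] = v;
}

/* EVENT_WRITE as emitted in the patch: one header dword, one payload dword. */
static void toy_emit_event_write(struct toy_ring *ring, uint32_t type, uint32_t index)
{
    toy_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
    toy_ring_write(ring, EVENT_TYPE(type) | EVENT_INDEX(index));
}

/* Mirrors the control flow of the patched pipeline sync. */
static void toy_emit_pipeline_sync(struct toy_ring *ring)
{
    if (ring->type == RING_TYPE_GFX) {
        toy_emit_event_write(ring, VS_PARTIAL_FLUSH, 4);
        toy_emit_event_write(ring, PS_PARTIAL_FLUSH, 4);
        toy_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
        toy_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
        toy_ring_write(ring, 0x0);
    } else {
        toy_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
    }
}

/* Returns how many dwords one pipeline sync occupies on the given ring type. */
static size_t toy_pipeline_sync_dwords(enum ring_type type)
{
    struct toy_ring ring = { .type = type };

    toy_emit_pipeline_sync(&ring);
    return ring.count;
}
```

Calling toy_pipeline_sync_dwords() for each ring type gives the number to compare against the PIPELINE_SYNC entries in emit_frame_size: three 2-dword EVENT_WRITE packets plus a 2-dword PFP_SYNC_ME on gfx, and a single EVENT_WRITE on compute. If an ACQUIRE_MEM were added as discussed in the reply, these counts would have to grow by that packet's size.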