* [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
@ 2025-03-20 10:11 Kenneth Graunke
2025-03-20 10:17 ` ✗ LGCI.VerificationFailed: failure for " Patchwork
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Kenneth Graunke @ 2025-03-20 10:11 UTC (permalink / raw)
To: intel-xe; +Cc: zhanjun.dong, Kenneth Graunke, stable
Historically, the Vertex Fetcher unit has not been an L3 client. That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.
Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3. However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.
To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe." In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.
Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.
xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches). Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Cc: stable@vger.kernel.org # v6.13+
---
drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index a255946b6f77e..8cfcd3360896c 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -41,6 +41,7 @@
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
#define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 0c230ee53bba5..9d8901a33205a 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
int i)
{
- u32 flags = PIPE_CONTROL_CS_STALL |
+ u32 flags0 = 0;
+ u32 flags1 = PIPE_CONTROL_CS_STALL |
PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
PIPE_CONTROL_STORE_DATA_INDEX;
if (invalidate_tlb)
- flags |= PIPE_CONTROL_TLB_INVALIDATE;
+ flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
- flags &= ~mask_flags;
+ flags1 &= ~mask_flags;
- return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
+ if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
+ flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
+
+ return emit_pipe_control(dw, i, flags0, flags1,
+ LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
}
static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
--
2.48.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too
2025-03-20 10:11 [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too Kenneth Graunke
@ 2025-03-20 10:17 ` Patchwork
2025-03-27 22:39 ` [PATCH] " Dong, Zhanjun
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Patchwork @ 2025-03-20 10:17 UTC (permalink / raw)
To: Kenneth Graunke; +Cc: intel-xe
== Series Details ==
Series: drm/xe: Invalidate L3 read-only cachelines for geometry streams too
URL : https://patchwork.freedesktop.org/series/146522/
State : failure
== Summary ==
Address 'kenneth@whitecape.org' is not on the allowlist!
Exception occurred during validation, bailing out!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
2025-03-20 10:11 [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too Kenneth Graunke
2025-03-20 10:17 ` ✗ LGCI.VerificationFailed: failure for " Patchwork
@ 2025-03-27 22:39 ` Dong, Zhanjun
2025-03-27 23:49 ` Dong, Zhanjun
2025-03-28 19:18 ` Rodrigo Vivi
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Dong, Zhanjun @ 2025-03-27 22:39 UTC (permalink / raw)
To: Kenneth Graunke, intel-xe; +Cc: stable
On 2025-03-20 6:11 a.m., Kenneth Graunke wrote:
> Historically, the Vertex Fetcher unit has not been an L3 client. That
> meant that, when a buffer containing vertex data was written to, it was
> necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
> VF L2 cachelines associated with that buffer, so the new value would be
> properly read from memory.
>
> Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
> have included an "L3 Bypass Enable" bit which userspace drivers can set
> to request that the vertex fetcher unit snoop L3. However, unlike most
> true L3 clients, the "VF Cache Invalidate" bit continues to only
> invalidate the VF L2 cache - and not any associated L3 lines.
>
> To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
> Bit", which according to the docs, "controls the invalidation of the
> Geometry streams cached in L3 cache at the top of the pipe." In other
> words, the vertex and index buffer data that gets cached in L3 when
> "L3 Bypass Disable" is set.
>
> Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
> whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
> Cache Invalidate so that both L2 and L3 vertex data is invalidated.
>
> xe is issuing VF cache invalidates too (which handles cases like CPU
> writes to a buffer between GPU batches). Because userspace may enable
> L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
>
> Fixes significant flickering in Firefox on Meteorlake, which was writing
> to vertex buffers via the CPU between batches; the missing L3 Read Only
> invalidates were causing the vertex fetcher to read stale data from L3.
>
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
> Cc: stable@vger.kernel.org # v6.13+
> ---
> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
> 2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index a255946b6f77e..8cfcd3360896c 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -41,6 +41,7 @@
>
> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
>
> +#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
>
> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 0c230ee53bba5..9d8901a33205a 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
> static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> int i)
> {
> - u32 flags = PIPE_CONTROL_CS_STALL |
> + u32 flags0 = 0;
> + u32 flags1 = PIPE_CONTROL_CS_STALL |
> PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
> @@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> PIPE_CONTROL_STORE_DATA_INDEX;
>
> if (invalidate_tlb)
> - flags |= PIPE_CONTROL_TLB_INVALIDATE;
> + flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>
> - flags &= ~mask_flags;
> + flags1 &= ~mask_flags;
>
> - return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> + if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
> + flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
> +
> + return emit_pipe_control(dw, i, flags0, flags1,
> + LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
New PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE defined as spec documented.
New flags0/1 handling looks good to me.
For some reason this patch did not triggers automatic CI run:
Address 'kenneth@whitecape.org' is not on the allowlist!
Exception occurred during validation, bailing out!
Let me check what we can do. CI run result is required before moving
forward.
> }
>
> static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
2025-03-27 22:39 ` [PATCH] " Dong, Zhanjun
@ 2025-03-27 23:49 ` Dong, Zhanjun
0 siblings, 0 replies; 9+ messages in thread
From: Dong, Zhanjun @ 2025-03-27 23:49 UTC (permalink / raw)
To: Kenneth Graunke, intel-xe; +Cc: stable
Hi Kenneth,
I'm trying to resend your patch from me to trigger the CI run,
meanwhile, I found your patch missed "Signed-off-by" tag, could you
resend with this tag? If CI still not run, I will resend your patch and try.
According to:
https://docs.kernel.org/process/5.Posting.html#before-creating-patches
Code without a proper signoff cannot be merged into the mainline.
Regards,
Zhanjun Dong
On 2025-03-27 6:39 p.m., Dong, Zhanjun wrote:
>
> On 2025-03-20 6:11 a.m., Kenneth Graunke wrote:
>> Historically, the Vertex Fetcher unit has not been an L3 client. That
>> meant that, when a buffer containing vertex data was written to, it was
>> necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
>> VF L2 cachelines associated with that buffer, so the new value would be
>> properly read from memory.
>>
>> Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
>> have included an "L3 Bypass Enable" bit which userspace drivers can set
>> to request that the vertex fetcher unit snoop L3. However, unlike most
>> true L3 clients, the "VF Cache Invalidate" bit continues to only
>> invalidate the VF L2 cache - and not any associated L3 lines.
>>
>> To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
>> Bit", which according to the docs, "controls the invalidation of the
>> Geometry streams cached in L3 cache at the top of the pipe." In other
>> words, the vertex and index buffer data that gets cached in L3 when
>> "L3 Bypass Disable" is set.
>>
>> Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
>> whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
>> Cache Invalidate so that both L2 and L3 vertex data is invalidated.
>>
>> xe is issuing VF cache invalidates too (which handles cases like CPU
>> writes to a buffer between GPU batches). Because userspace may enable
>> L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
>>
>> Fixes significant flickering in Firefox on Meteorlake, which was writing
>> to vertex buffers via the CPU between batches; the missing L3 Read Only
>> invalidates were causing the vertex fetcher to read stale data from L3.
>>
>> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
>> Cc: stable@vger.kernel.org # v6.13+
>> ---
>> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
>> drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
>> 2 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/
>> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>> index a255946b6f77e..8cfcd3360896c 100644
>> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>> @@ -41,6 +41,7 @@
>> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|
>> ((len)-2))
>> +#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE
>> BIT(10) /* gen12 */
>> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /*
>> gen12 */
>> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
>> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/
>> xe_ring_ops.c
>> index 0c230ee53bba5..9d8901a33205a 100644
>> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
>> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>> @@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0,
>> u32 bit_group_1, u32 offset,
>> static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb,
>> u32 *dw,
>> int i)
>> {
>> - u32 flags = PIPE_CONTROL_CS_STALL |
>> + u32 flags0 = 0;
>> + u32 flags1 = PIPE_CONTROL_CS_STALL |
>> PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
>> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
>> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>> @@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags,
>> bool invalidate_tlb, u32 *dw,
>> PIPE_CONTROL_STORE_DATA_INDEX;
>> if (invalidate_tlb)
>> - flags |= PIPE_CONTROL_TLB_INVALIDATE;
>> + flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>> - flags &= ~mask_flags;
>> + flags1 &= ~mask_flags;
>> - return emit_pipe_control(dw, i, 0, flags,
>> LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
>> + if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
>> + flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
>> +
>> + return emit_pipe_control(dw, i, flags0, flags1,
>> + LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> New PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE defined as spec
> documented.
> New flags0/1 handling looks good to me.
>
> For some reason this patch did not triggers automatic CI run:
>
> Address 'kenneth@whitecape.org' is not on the allowlist!
> Exception occurred during validation, bailing out!
>
> Let me check what we can do. CI run result is required before moving
> forward.
>
>> }
>> static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
2025-03-20 10:11 [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too Kenneth Graunke
2025-03-20 10:17 ` ✗ LGCI.VerificationFailed: failure for " Patchwork
2025-03-27 22:39 ` [PATCH] " Dong, Zhanjun
@ 2025-03-28 19:18 ` Rodrigo Vivi
2025-03-28 22:34 ` Kenneth Graunke
2025-03-28 22:37 ` [PATCH v2] " Kenneth Graunke
2025-03-28 21:48 ` ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev2) Patchwork
2025-03-28 22:40 ` ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev3) Patchwork
4 siblings, 2 replies; 9+ messages in thread
From: Rodrigo Vivi @ 2025-03-28 19:18 UTC (permalink / raw)
To: Kenneth Graunke; +Cc: intel-xe, zhanjun.dong, stable
On Thu, Mar 20, 2025 at 03:11:55AM -0700, Kenneth Graunke wrote:
> Historically, the Vertex Fetcher unit has not been an L3 client. That
> meant that, when a buffer containing vertex data was written to, it was
> necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
> VF L2 cachelines associated with that buffer, so the new value would be
> properly read from memory.
>
> Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
> have included an "L3 Bypass Enable" bit which userspace drivers can set
> to request that the vertex fetcher unit snoop L3. However, unlike most
> true L3 clients, the "VF Cache Invalidate" bit continues to only
> invalidate the VF L2 cache - and not any associated L3 lines.
>
> To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
> Bit", which according to the docs, "controls the invalidation of the
> Geometry streams cached in L3 cache at the top of the pipe." In other
> words, the vertex and index buffer data that gets cached in L3 when
> "L3 Bypass Disable" is set.
>
> Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
> whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
> Cache Invalidate so that both L2 and L3 vertex data is invalidated.
>
> xe is issuing VF cache invalidates too (which handles cases like CPU
> writes to a buffer between GPU batches). Because userspace may enable
> L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
>
> Fixes significant flickering in Firefox on Meteorlake, which was writing
> to vertex buffers via the CPU between batches; the missing L3 Read Only
> invalidates were causing the vertex fetcher to read stale data from L3.
>
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
> Cc: stable@vger.kernel.org # v6.13+
> ---
> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
> 2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index a255946b6f77e..8cfcd3360896c 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -41,6 +41,7 @@
>
> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
>
> +#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
this definitely matches the spec on the bitgroup0, but Mesa
code got me a bit confused:
PIPE_CONTROL_L3_READ_ONLY_CACHE_INVALIDATE = (1 << 28),
> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
>
> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 0c230ee53bba5..9d8901a33205a 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
> static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> int i)
> {
> - u32 flags = PIPE_CONTROL_CS_STALL |
> + u32 flags0 = 0;
> + u32 flags1 = PIPE_CONTROL_CS_STALL |
> PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
> @@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> PIPE_CONTROL_STORE_DATA_INDEX;
>
> if (invalidate_tlb)
> - flags |= PIPE_CONTROL_TLB_INVALIDATE;
> + flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>
> - flags &= ~mask_flags;
> + flags1 &= ~mask_flags;
>
> - return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> + if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
> + flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
> +
> + return emit_pipe_control(dw, i, flags0, flags1,
> + LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
Well, it respects the spec and if it is solving the issue let's go with it.
But last question, should we expect some performance change with this
extra invalidation in the Geometry streams caches?
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(I tried to trigger the CI manually, please confirm it is okay to
add your signed-off-by so we can get this merged soon)
Thanks a lot for finding and fixing this.
> }
>
> static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev2)
2025-03-20 10:11 [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too Kenneth Graunke
` (2 preceding siblings ...)
2025-03-28 19:18 ` Rodrigo Vivi
@ 2025-03-28 21:48 ` Patchwork
2025-03-28 22:40 ` ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev3) Patchwork
4 siblings, 0 replies; 9+ messages in thread
From: Patchwork @ 2025-03-28 21:48 UTC (permalink / raw)
To: Kenneth Graunke; +Cc: intel-xe
== Series Details ==
Series: drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev2)
URL : https://patchwork.freedesktop.org/series/146522/
State : failure
== Summary ==
Address 'kenneth@whitecape.org' is not on the allowlist!
Exception occurred during validation, bailing out!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
2025-03-28 19:18 ` Rodrigo Vivi
@ 2025-03-28 22:34 ` Kenneth Graunke
2025-03-28 22:37 ` [PATCH v2] " Kenneth Graunke
1 sibling, 0 replies; 9+ messages in thread
From: Kenneth Graunke @ 2025-03-28 22:34 UTC (permalink / raw)
To: Rodrigo Vivi; +Cc: intel-xe, zhanjun.dong, stable
On Friday, March 28, 2025 12:18:29 PM Pacific Daylight Time Rodrigo Vivi wrote:
> On Thu, Mar 20, 2025 at 03:11:55AM -0700, Kenneth Graunke wrote:
> > Historically, the Vertex Fetcher unit has not been an L3 client. That
> > meant that, when a buffer containing vertex data was written to, it was
> > necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
> > VF L2 cachelines associated with that buffer, so the new value would be
> > properly read from memory.
> >
> > Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
> > have included an "L3 Bypass Enable" bit which userspace drivers can set
> > to request that the vertex fetcher unit snoop L3. However, unlike most
> > true L3 clients, the "VF Cache Invalidate" bit continues to only
> > invalidate the VF L2 cache - and not any associated L3 lines.
> >
> > To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
> > Bit", which according to the docs, "controls the invalidation of the
> > Geometry streams cached in L3 cache at the top of the pipe." In other
> > words, the vertex and index buffer data that gets cached in L3 when
> > "L3 Bypass Disable" is set.
> >
> > Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
> > whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
> > Cache Invalidate so that both L2 and L3 vertex data is invalidated.
> >
> > xe is issuing VF cache invalidates too (which handles cases like CPU
> > writes to a buffer between GPU batches). Because userspace may enable
> > L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
> >
> > Fixes significant flickering in Firefox on Meteorlake, which was writing
> > to vertex buffers via the CPU between batches; the missing L3 Read Only
> > invalidates were causing the vertex fetcher to read stale data from L3.
> >
> > References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
> > Cc: stable@vger.kernel.org # v6.13+
> > ---
> >
> > drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
> > drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
> > 2 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> > b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h index
> > a255946b6f77e..8cfcd3360896c 100644
> > --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> > +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> > @@ -41,6 +41,7 @@
> >
> > #define
> > GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
> >
> > +#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10)
/* gen12 */
>
> this definitely matches the spec on the bitgroup0, but Mesa
> code got me a bit confused:
> PIPE_CONTROL_L3_READ_ONLY_CACHE_INVALIDATE = (1 << 28),
Yeah...the values in iris_context.h is just a bitfield enum of every possible
flag, in no particular order. The comment above tries to clarify that they
don't match the hardware packing. Sorry for the confusion! src/intel/genxml/
gen125.xml defines PIPE_CONTROL's actual bit layout where things get packed.
You can see that it's bit 10 there.
> > #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9)
/* gen12 */
> >
> > #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > b/drivers/gpu/drm/xe/xe_ring_ops.c index 0c230ee53bba5..9d8901a33205a
> > 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32
> > bit_group_1, u32 offset,>
> > static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32
> > *dw,>
> > int i)
> >
> > {
> >
> > - u32 flags = PIPE_CONTROL_CS_STALL |
> > + u32 flags0 = 0;
> > + u32 flags1 = PIPE_CONTROL_CS_STALL |
> >
> > PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> > PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
> > PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
> >
> > @@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool
> > invalidate_tlb, u32 *dw,>
> > PIPE_CONTROL_STORE_DATA_INDEX;
> >
> > if (invalidate_tlb)
> >
> > - flags |= PIPE_CONTROL_TLB_INVALIDATE;
> > + flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
> >
> > - flags &= ~mask_flags;
> > + flags1 &= ~mask_flags;
> >
> > - return emit_pipe_control(dw, i, 0, flags,
> > LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0); + if (flags1 &
> > PIPE_CONTROL_VF_CACHE_INVALIDATE)
> > + flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
> > +
> > + return emit_pipe_control(dw, i, flags0, flags1,
> > +
LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
>
> Well, it respects the spec and if it is solving the issue let's go with it.
>
> But last question, should we expect some performance change with this
> extra invalidation in the Geometry streams caches?
Not that I'm aware of. I don't think we saw much of a performance drop when
enabling L3 bypass and the extra flushing. But having VF be an L3 client along
with most everything else let us have compute shaders do fewer expensive Data
Cache Flushes and instead only do cheaper HDC flushes, which did lead to
performance gains. So if there was a drop, I think it was more than offset.
I haven't specifically tried to benchmark this patch.
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>
> (I tried to trigger the CI manually, please confirm it is okay to
> add your signed-off-by so we can get this merged soon)
Thank you! I'll resend with my Signed-off-by for clarity (I definitely meant to
include it)
> Thanks a lot for finding and fixing this.
No problem!
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
2025-03-28 19:18 ` Rodrigo Vivi
2025-03-28 22:34 ` Kenneth Graunke
@ 2025-03-28 22:37 ` Kenneth Graunke
1 sibling, 0 replies; 9+ messages in thread
From: Kenneth Graunke @ 2025-03-28 22:37 UTC (permalink / raw)
To: intel-xe; +Cc: Kenneth Graunke, stable, Rodrigo Vivi
Historically, the Vertex Fetcher unit has not been an L3 client. That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.
Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3. However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.
To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe." In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.
Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.
xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches). Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index a255946b6f77e..8cfcd3360896c 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -41,6 +41,7 @@
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
#define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 917fc16de866a..a7582b097ae67 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -137,7 +137,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
int i)
{
- u32 flags = PIPE_CONTROL_CS_STALL |
+ u32 flags0 = 0;
+ u32 flags1 = PIPE_CONTROL_CS_STALL |
PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -148,11 +149,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
PIPE_CONTROL_STORE_DATA_INDEX;
if (invalidate_tlb)
- flags |= PIPE_CONTROL_TLB_INVALIDATE;
+ flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
- flags &= ~mask_flags;
+ flags1 &= ~mask_flags;
- return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
+ if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
+ flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
+
+ return emit_pipe_control(dw, i, flags0, flags1,
+ LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
}
static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
--
2.48.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev3)
2025-03-20 10:11 [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too Kenneth Graunke
` (3 preceding siblings ...)
2025-03-28 21:48 ` ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev2) Patchwork
@ 2025-03-28 22:40 ` Patchwork
4 siblings, 0 replies; 9+ messages in thread
From: Patchwork @ 2025-03-28 22:40 UTC (permalink / raw)
To: Kenneth Graunke; +Cc: intel-xe
== Series Details ==
Series: drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev3)
URL : https://patchwork.freedesktop.org/series/146522/
State : failure
== Summary ==
Address 'kenneth@whitecape.org' is not on the allowlist!
Exception occurred during validation, bailing out!
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-03-28 22:40 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-20 10:11 [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too Kenneth Graunke
2025-03-20 10:17 ` ✗ LGCI.VerificationFailed: failure for " Patchwork
2025-03-27 22:39 ` [PATCH] " Dong, Zhanjun
2025-03-27 23:49 ` Dong, Zhanjun
2025-03-28 19:18 ` Rodrigo Vivi
2025-03-28 22:34 ` Kenneth Graunke
2025-03-28 22:37 ` [PATCH v2] " Kenneth Graunke
2025-03-28 21:48 ` ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev2) Patchwork
2025-03-28 22:40 ` ✗ LGCI.VerificationFailed: failure for drm/xe: Invalidate L3 read-only cachelines for geometry streams too (rev3) Patchwork
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox