From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com
Subject: Re: [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out
Date: Thu, 25 Sep 2025 08:24:23 +0100 [thread overview]
Message-ID: <7e07606b-d542-4407-a092-476f202cc8e2@igalia.com> (raw)
In-Reply-To: <aNRx66g6y4c-eoZk@intel.com>
On 24/09/2025 23:35, Ville Syrjälä wrote:
> On Wed, Sep 24, 2025 at 02:09:42PM +0100, Tvrtko Ursulin wrote:
>>
>> On 23/09/2025 15:52, Ville Syrjälä wrote:
>>> On Tue, Sep 23, 2025 at 03:40:55PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 23/09/2025 14:20, Ville Syrjälä wrote:
>>>>> On Tue, Sep 23, 2025 at 01:25:58PM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 23/09/2025 13:01, Ville Syrjälä wrote:
>>>>>>> On Tue, Sep 23, 2025 at 11:48:59AM +0100, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 23/09/2025 11:19, Ville Syrjälä wrote:
>>>>>>>>> On Tue, Sep 23, 2025 at 11:08:04AM +0100, Tvrtko Ursulin wrote:
>>>>>>>>>> Even though frame buffer objects are created as write-combined, in
>>>>>>>>>> practice, on top of all the ring buffer flushing, an additional clflush
>>>>>>>>>> seems to be needed before display engine can coherently scan out the
>>>>>>>>>> AuxCCS compressed data without transient artifacts.
>>>>>>>>>>
>>>>>>>>>> If for comparison we look at how i915 handles things (where AuxCCS works
>>>>>>>>>> fine), as it happens it has this same clflush before a frame buffer is
>>>>>>>>>> pinned for display for the first time, courtesy the dynamic tracking of
>>>>>>>>>> the buffer cache mode and setting the latter to uncached before handing
>>>>>>>>>> to display.
>>>>>>>>>>
>>>>>>>>>> Since xe considers the buffer object caching mode as static we can
>>>>>>>>>> implement the same approach by adding a flag telling us if the buffer
>>>>>>>>>> was ever pinned for display and flush on the first pin. Subsequent re-pins
>>>>>>>>>> will not repeat the clflush but so far I have not observed any glitching
>>>>>>>>>> after the first pin.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 14 +++++++++++++-
>>>>>>>>>> drivers/gpu/drm/xe/xe_bo_types.h | 14 +++++++++-----
>>>>>>>>>> 2 files changed, 22 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>>>> index d8aa23b8cf14..f247c0da6b9e 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>>>> @@ -382,6 +382,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>>>>>>>>>> struct xe_bo *bo = gem_to_xe_bo(obj);
>>>>>>>>>> struct xe_validation_ctx ctx;
>>>>>>>>>> struct drm_exec exec;
>>>>>>>>>> + bool first_pin;
>>>>>>>>>> int ret = 0;
>>>>>>>>>>
>>>>>>>>>> if (!vma)
>>>>>>>>>> @@ -422,8 +423,11 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>>>>>>>>>> ret = xe_bo_validate(bo, NULL, true, &exec);
>>>>>>>>>> drm_exec_retry_on_contention(&exec);
>>>>>>>>>> xe_validation_retry_on_oom(&ctx, &ret);
>>>>>>>>>> - if (!ret)
>>>>>>>>>> + if (!ret) {
>>>>>>>>>> ttm_bo_pin(&bo->ttm);
>>>>>>>>>> + first_pin = !bo->display_pin;
>>>>>>>>>> + bo->display_pin = true;
>>>>>>>>>> + }
>>>>>>>>>> }
>>>>>>>>>> if (ret)
>>>>>>>>>> goto err;
>>>>>>>>>> @@ -436,6 +440,14 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>>>>>>>>>> if (ret)
>>>>>>>>>> goto err_unpin;
>>>>>>>>>>
>>>>>>>>>> + /*
>>>>>>>>>> + * Force flush frame buffer data for non-coherent display access when
>>>>>>>>>> + * AuxCCS formats are used.
>>>>>>>>>> + */
>>>>>>>>>> + if (first_pin && !xe_bo_is_vram(bo) && !xe_bo_is_stolen(bo) &&
>>>>>>>>>> + intel_fb_is_ccs_modifier(fb->base.modifier))
>>>>>>>>>> + drm_clflush_sg(xe_bo_sg(bo));
>>>>>>>>>
>>>>>>>>> You still haven't found the actual bug that causes the dirty cache?
>>>>>>>>
>>>>>>>> Sadly no. I cross referenced everything numerous times, including
>>>>>>>> workarounds, tried pretty much 1:1 i915 vs xe ring buffer programming,
>>>>>>>> but this extra flush always remains required. I also heard that how
>>>>>>>> flushing of the aux metadata works isn't documented anywhere and i915
>>>>>>>> does have this flush, by design or accident I don't know.
>>>>>>>
>>>>>>> i915 has the flush for the *whole* bo because it started out as
>>>>>>> cached. As soon as it undergoes a cached->uncached change (either
>>>>>>> due to set_caching ioctl or becoming a display scanout buffer) we
>>>>>>> clflush it and switch the GPU page tables from WB to UC. After
>>>>>>> that the bo stays uncached, and will need no further clflushes,
>>>>>>> assuming that everyone follows the rules:
>>>>>>>
>>>>>>> - CPU accesses via a WB mapping to an uncached bo will explicitly
>>>>>>> clflush before (invalidate) and after (write-back) the access
>>>>>>> - userspace always sets the GPU to use use "consult the PTE" MOCS
>>>>>>> setting for any potential scanout buffer. That way the GPU will
>>>>>>> use WB as long as the bo is cached, and once it becomes
>>>>>>> uncached the GPU also switches to UC accesses
>>>>>>>
>>>>>>> With xe I presume the BO should already start out as UC/WC with
>>>>>>> clean caches, and nothing should be dirtying the caches unless
>>>>>>> there is a real bug somewhere.
>>>>>>
>>>>>> Correct, it is all WC and MOCS are correctly set as WC.
>>>>>>
>>>>>> For example for the media compression flavour that was broken and needed
>>>>>> fixing in b412b144685f ("lib/intel/veboxcopy: Respect buffer MOCS
>>>>>> index"). (For the render compression flavour MOCS was already correct.)
>>>>>> But MOCS is not enough on its own for some reason.
>>>>>>
>>>>>> I also looked into the rendercopy surface state programming and that too
>>>>>> looks okay. In fact it is possible to see how using the wrong MOCS makes
>>>>>> things worse. But it also appears the surface state MOCS only applies to
>>>>>> the main surface, while the aux state is the one which appears to
>>>>>> contain cached/unflushed data.
>>>>>
>>>>> Are you saying that if you start scanning out a compressed buffer,
>>>>> then clean the caches, and then do frontbuffer rendering you get
>>>>> more cache dirt?
>>>>
>>>> Render with the GPU or CPU? Do you know of any tests or userspaces which
>>>> do that?
>>>
>>> On the GPU.
>>>
>>> Can't think of anything nice off the top of my head.
>>>
>>> I'd probably just write a quick igt:
>>> 1. create a compressed fb
>>> 2. flip to it
>>> 3. manually clflush the whole thing to be sure it's clean
>>> 4. rendercopy some junk around, and keep an eye out for cache dirt
>>
>> Looks the same dirt on i915 and xe, which would support the theory that
>> MOCS settings do not apply to the aux surface.
>>
>> If you look at these videos frame by frame you can see it:
>>
>> https://people.igalia.com/tursulin/i915-ccs-cache-dirt.MOV
>> https://people.igalia.com/tursulin/xe-ccs-cache-dirt.MOV
>>
>
> That doesn't really look like cache dirt since it doesn't linger.
> So I'd say that is more tearing than anything. Tearing can look
> rather weird with tiled/compressed buffers.
Some occurrences have two frames of artefacts before the grey square
settles. With a 90Hz display and camera taking 60fps video can it be
tearing?
> To raelly see cache dirt you should tweak the test to
> select a wb mocs.
>
> Oh, and disable CPU c-states. Turns out deep c-states cause extra
> cache flushes all the time which will hide the dirt in short order.
> Without those extra flushes cache dirt will linger for a long time
> on screen.
This I didn't do, will try.
> And also enable_psr=0 + enable_fbc=0 to make sure the display
> engine is really fetching from the memory we just wrote.
>
This I did since the laptop has PSR.
Regards,
Tvrtko
next prev parent reply other threads:[~2025-09-25 7:24 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-23 10:07 [PATCH v12 00/13] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 01/13] drm/xe/xelpg: Flush CCS when flushing caches Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 02/13] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
2025-10-01 15:47 ` Rodrigo Vivi
2025-09-23 10:07 ` [PATCH v12 03/13] drm/xe/xelp: Support auxccs invalidation on blitter Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 04/13] drm/xe/xelp: Use MI_FLUSH_DW_CCS on auxccs platforms Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 05/13] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 06/13] drm/xe: Export xe_emit_aux_table_inv Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 07/13] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 08/13] drm/xe: Flush GGTT writes after populating DPT Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 09/13] drm/xe: Handle DPT in system memory Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 10/13] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out Tvrtko Ursulin
2025-09-23 10:19 ` Ville Syrjälä
2025-09-23 10:48 ` Tvrtko Ursulin
2025-09-23 12:01 ` Ville Syrjälä
2025-09-23 12:25 ` Tvrtko Ursulin
2025-09-23 13:20 ` Ville Syrjälä
2025-09-23 14:40 ` Tvrtko Ursulin
2025-09-23 14:52 ` Ville Syrjälä
2025-09-24 13:09 ` Tvrtko Ursulin
2025-09-24 22:35 ` Ville Syrjälä
2025-09-25 7:24 ` Tvrtko Ursulin [this message]
2025-09-25 10:08 ` Tvrtko Ursulin
2025-09-26 7:41 ` Ville Syrjälä
2025-09-26 19:35 ` Ville Syrjälä
2025-10-02 14:01 ` Tvrtko Ursulin
2025-10-02 14:36 ` Tvrtko Ursulin
2025-10-02 16:23 ` Ville Syrjälä
2025-10-02 17:04 ` Tvrtko Ursulin
2025-10-02 17:14 ` Ville Syrjälä
2025-10-02 22:02 ` Ville Syrjälä
2025-09-23 10:44 ` [PATCH v13 " Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 12/13] drm/xe: Do not use stolen memory for DPT on IGFX and AuxCCS Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 13/13] drm/i915/display: Expose AuxCCS frame buffer modifiers for Xe Tvrtko Ursulin
2025-09-23 10:15 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers (rev15) Patchwork
2025-09-23 10:16 ` ✓ CI.KUnit: success " Patchwork
2025-09-23 11:15 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-23 11:21 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers (rev16) Patchwork
2025-09-23 11:22 ` ✓ CI.KUnit: success " Patchwork
2025-09-23 12:03 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-23 13:26 ` ✗ Xe.CI.Full: failure for AuxCCS handling and render compression modifiers (rev15) Patchwork
2025-09-23 14:12 ` ✗ Xe.CI.Full: failure for AuxCCS handling and render compression modifiers (rev16) Patchwork
2025-09-23 20:12 ` [PATCH v12 00/13] AuxCCS handling and render compression modifiers Ville Syrjälä
2025-09-24 7:59 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7e07606b-d542-4407-a092-476f202cc8e2@igalia.com \
--to=tvrtko.ursulin@igalia.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=kernel-dev@igalia.com \
--cc=ville.syrjala@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox