Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com
Subject: Re: [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out
Date: Fri, 26 Sep 2025 10:41:56 +0300	[thread overview]
Message-ID: <aNZDxGoYxVFfRNOu@intel.com> (raw)
In-Reply-To: <cfc54165-b391-4905-a00a-27d2889225e3@igalia.com>

On Thu, Sep 25, 2025 at 11:08:50AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/09/2025 08:24, Tvrtko Ursulin wrote:
> > 
> > On 24/09/2025 23:35, Ville Syrjälä wrote:
> >> On Wed, Sep 24, 2025 at 02:09:42PM +0100, Tvrtko Ursulin wrote:
> >>>
> >>> On 23/09/2025 15:52, Ville Syrjälä wrote:
> >>>> On Tue, Sep 23, 2025 at 03:40:55PM +0100, Tvrtko Ursulin wrote:
> >>>>>
> >>>>> On 23/09/2025 14:20, Ville Syrjälä wrote:
> >>>>>> On Tue, Sep 23, 2025 at 01:25:58PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>
> >>>>>>> On 23/09/2025 13:01, Ville Syrjälä wrote:
> >>>>>>>> On Tue, Sep 23, 2025 at 11:48:59AM +0100, Tvrtko Ursulin wrote:
> >>>>>>>>>
> >>>>>>>>> On 23/09/2025 11:19, Ville Syrjälä wrote:
> >>>>>>>>>> On Tue, Sep 23, 2025 at 11:08:04AM +0100, Tvrtko Ursulin wrote:
> >>>>>>>>>>> Even though frame buffer objects are created as write- 
> >>>>>>>>>>> combined, in
> >>>>>>>>>>> practice, on top of all the ring buffer flushing, an 
> >>>>>>>>>>> additional clflush
> >>>>>>>>>>> seems to be needed before display engine can coherently scan 
> >>>>>>>>>>> out the
> >>>>>>>>>>> AuxCCS compressed data without transient artifacts.
> >>>>>>>>>>>
> >>>>>>>>>>> If for comparison we look at how i915 handles things (where 
> >>>>>>>>>>> AuxCCS works
> >>>>>>>>>>> fine), as it happens it has this same clflush before a frame 
> >>>>>>>>>>> buffer is
> >>>>>>>>>>> pinned for display for the first time, courtesy the dynamic 
> >>>>>>>>>>> tracking of
> >>>>>>>>>>> the buffer cache mode and setting the latter to uncached 
> >>>>>>>>>>> before handing
> >>>>>>>>>>> to display.
> >>>>>>>>>>>
> >>>>>>>>>>> Since xe considers the buffer object caching mode as static 
> >>>>>>>>>>> we can
> >>>>>>>>>>> implement the same approach by adding a flag telling us if 
> >>>>>>>>>>> the buffer
> >>>>>>>>>>> was ever pinned for display and flush on the first pin. 
> >>>>>>>>>>> Subsequent re-pins
> >>>>>>>>>>> will not repeat the clflush but so far I have not observed 
> >>>>>>>>>>> any glitching
> >>>>>>>>>>> after the first pin.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>       drivers/gpu/drm/xe/display/xe_fb_pin.c | 14 +++++++++++++-
> >>>>>>>>>>>       drivers/gpu/drm/xe/xe_bo_types.h       | 14 +++++++++-----
> >>>>>>>>>>>       2 files changed, 22 insertions(+), 6 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/ 
> >>>>>>>>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>>>>>>>>> index d8aa23b8cf14..f247c0da6b9e 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>>>>>>>>> @@ -382,6 +382,7 @@ static struct i915_vma 
> >>>>>>>>>>> *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >>>>>>>>>>>           struct xe_bo *bo = gem_to_xe_bo(obj);
> >>>>>>>>>>>           struct xe_validation_ctx ctx;
> >>>>>>>>>>>           struct drm_exec exec;
> >>>>>>>>>>> +    bool first_pin;
> >>>>>>>>>>>           int ret = 0;
> >>>>>>>>>>>           if (!vma)
> >>>>>>>>>>> @@ -422,8 +423,11 @@ static struct i915_vma 
> >>>>>>>>>>> *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >>>>>>>>>>>                   ret = xe_bo_validate(bo, NULL, true, &exec);
> >>>>>>>>>>>               drm_exec_retry_on_contention(&exec);
> >>>>>>>>>>>               xe_validation_retry_on_oom(&ctx, &ret);
> >>>>>>>>>>> -        if (!ret)
> >>>>>>>>>>> +        if (!ret) {
> >>>>>>>>>>>                   ttm_bo_pin(&bo->ttm);
> >>>>>>>>>>> +            first_pin = !bo->display_pin;
> >>>>>>>>>>> +            bo->display_pin = true;
> >>>>>>>>>>> +        }
> >>>>>>>>>>>           }
> >>>>>>>>>>>           if (ret)
> >>>>>>>>>>>               goto err;
> >>>>>>>>>>> @@ -436,6 +440,14 @@ static struct i915_vma 
> >>>>>>>>>>> *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >>>>>>>>>>>           if (ret)
> >>>>>>>>>>>               goto err_unpin;
> >>>>>>>>>>> +    /*
> >>>>>>>>>>> +     * Force flush frame buffer data for non-coherent 
> >>>>>>>>>>> display access when
> >>>>>>>>>>> +     * AuxCCS formats are used.
> >>>>>>>>>>> +     */
> >>>>>>>>>>> +    if (first_pin && !xe_bo_is_vram(bo) && ! 
> >>>>>>>>>>> xe_bo_is_stolen(bo) &&
> >>>>>>>>>>> +        intel_fb_is_ccs_modifier(fb->base.modifier))
> >>>>>>>>>>> +        drm_clflush_sg(xe_bo_sg(bo));
> >>>>>>>>>>
> >>>>>>>>>> You still haven't found the actual bug that causes the dirty 
> >>>>>>>>>> cache?
> >>>>>>>>>
> >>>>>>>>> Sadly no. I cross referenced everything numerous times, including
> >>>>>>>>> workarounds, tried pretty much 1:1 i915 vs xe ring buffer 
> >>>>>>>>> programming,
> >>>>>>>>> but this extra flush always remains required. I also heard that 
> >>>>>>>>> how
> >>>>>>>>> flushing of the aux metadata works isn't documented anywhere 
> >>>>>>>>> and i915
> >>>>>>>>> does have this flush, by design or accident I don't know.
> >>>>>>>>
> >>>>>>>> i915 has the flush for the *whole* bo because it started out as
> >>>>>>>> cached. As soon as it undergoes a cached->uncached change (either
> >>>>>>>> due to set_caching ioctl or becoming a display scanout buffer) we
> >>>>>>>> clflush it and switch the GPU page tables from WB to UC. After
> >>>>>>>> that the bo stays uncached, and will need no further clflushes,
> >>>>>>>> assuming that everyone follows the rules:
> >>>>>>>>
> >>>>>>>> - CPU accesses via a WB mapping to an uncached bo will explicitly
> >>>>>>>>       clflush before (invalidate) and after (write-back) the access
> >>>>>>>> - userspace always sets the GPU to use use "consult the PTE" MOCS
> >>>>>>>>       setting for any potential scanout buffer. That way the GPU 
> >>>>>>>> will
> >>>>>>>>       use WB as long as the bo is cached, and once it becomes
> >>>>>>>>       uncached the GPU also switches to UC accesses
> >>>>>>>>
> >>>>>>>> With xe I presume the BO should already start out as UC/WC with
> >>>>>>>> clean caches, and nothing should be dirtying the caches unless
> >>>>>>>> there is a real bug somewhere.
> >>>>>>>
> >>>>>>> Correct, it is all WC and MOCS are correctly set as WC.
> >>>>>>>
> >>>>>>> For example for the media compression flavour that was broken and 
> >>>>>>> needed
> >>>>>>> fixing in b412b144685f ("lib/intel/veboxcopy: Respect buffer MOCS
> >>>>>>> index"). (For the render compression flavour MOCS was already 
> >>>>>>> correct.)
> >>>>>>> But MOCS is not enough on its own for some reason.
> >>>>>>>
> >>>>>>> I also looked into the rendercopy surface state programming and 
> >>>>>>> that too
> >>>>>>> looks okay. In fact it is possible to see how using the wrong 
> >>>>>>> MOCS makes
> >>>>>>> things worse. But it also appears the surface state MOCS only 
> >>>>>>> applies to
> >>>>>>> the main surface, while the aux state is the one which appears to
> >>>>>>> contain cached/unflushed data.
> >>>>>>
> >>>>>> Are you saying that if you start scanning out a compressed buffer,
> >>>>>> then clean the caches, and then do frontbuffer rendering you get
> >>>>>> more cache dirt?
> >>>>>
> >>>>> Render with the GPU or CPU? Do you know of any tests or userspaces 
> >>>>> which
> >>>>> do that?
> >>>>
> >>>> On the GPU.
> >>>>
> >>>> Can't think of anything nice off the top of my head.
> >>>>
> >>>> I'd probably just write a quick igt:
> >>>> 1. create a compressed fb
> >>>> 2. flip to it
> >>>> 3. manually clflush the whole thing to be sure it's clean
> >>>> 4. rendercopy some junk around, and keep an eye out for cache dirt
> >>>
> >>> Looks the same dirt on i915 and xe, which would support the theory that
> >>> MOCS settings do not apply to the aux surface.
> >>>
> >>> If you look at these videos frame by frame you can see it:
> >>>
> >>> https://people.igalia.com/tursulin/i915-ccs-cache-dirt.MOV
> >>> https://people.igalia.com/tursulin/xe-ccs-cache-dirt.MOV
> >>>
> >>
> >> That doesn't really look like cache dirt since it doesn't linger.
> >> So I'd say that is more tearing than anything. Tearing can look
> >> rather weird with tiled/compressed buffers.
> > 
> > Some occurrences have two frames of artefacts before the grey square 
> > settles. With a 90Hz display and camera taking 60fps video can it be 
> > tearing?
> > 
> >> To raelly see cache dirt you should tweak the test to
> >> select a wb mocs.
> >>
> >> Oh, and disable CPU c-states. Turns out deep c-states cause extra
> >> cache flushes all the time which will hide the dirt in short order.
> >> Without those extra flushes cache dirt will linger for a long time
> >> on screen.
> > 
> > This I didn't do, will try.
> 
> With no C-states results are indeed much more "fun". Although still the 
> same between i915 and xe:
> 
> https://people.igalia.com/tursulin/i915-ccs-cache-dirt-nocstates.MOV
> https://people.igalia.com/tursulin/xe-ccs-cache-dirt-nocstates.MOV
> 
> Still MOCS 3 so uncached in render state verified for both.

I reverse engineered this a bit and there's definitely a
MOCS issue at play.

First I noticed that if filled the entire MOCS table with
UC the problem went away. I then filled the entire table
with WB and essentially bisected what I need to make UC
to fix it. And I had to repeat that same process starting
from the other end of table.

Looks like there is some undocumented magic in the hardware.

MOCS 61 really is special:
- MOCS 61 UC, others WB, select MOCS 61 -> no corruption

MOCS 0 and 63 are special in other ways:
- MOCS X UC, others WB, select MOCS X -> corruption
- MOCS X+0 UC, others WB, select MOCS X -> corruption
- MOCS X+63 UC, others WB, select MOCS X -> corruption
- MOCS X+0+63 UC, others WB, select MOCS X -> no corruption
  where X != 61

I didn't actually test all values of X there, but I did spot
check a handful of them.

Also, ADL is affected, but TGL doesn't seem to be. Though I
still need to check the situation on TGL a bit more thoroughly.

-- 
Ville Syrjälä
Intel

  reply	other threads:[~2025-09-26  7:42 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-23 10:07 [PATCH v12 00/13] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 01/13] drm/xe/xelpg: Flush CCS when flushing caches Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 02/13] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
2025-10-01 15:47   ` Rodrigo Vivi
2025-09-23 10:07 ` [PATCH v12 03/13] drm/xe/xelp: Support auxccs invalidation on blitter Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 04/13] drm/xe/xelp: Use MI_FLUSH_DW_CCS on auxccs platforms Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 05/13] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 06/13] drm/xe: Export xe_emit_aux_table_inv Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 07/13] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 08/13] drm/xe: Flush GGTT writes after populating DPT Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 09/13] drm/xe: Handle DPT in system memory Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 10/13] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out Tvrtko Ursulin
2025-09-23 10:19   ` Ville Syrjälä
2025-09-23 10:48     ` Tvrtko Ursulin
2025-09-23 12:01       ` Ville Syrjälä
2025-09-23 12:25         ` Tvrtko Ursulin
2025-09-23 13:20           ` Ville Syrjälä
2025-09-23 14:40             ` Tvrtko Ursulin
2025-09-23 14:52               ` Ville Syrjälä
2025-09-24 13:09                 ` Tvrtko Ursulin
2025-09-24 22:35                   ` Ville Syrjälä
2025-09-25  7:24                     ` Tvrtko Ursulin
2025-09-25 10:08                       ` Tvrtko Ursulin
2025-09-26  7:41                         ` Ville Syrjälä [this message]
2025-09-26 19:35                           ` Ville Syrjälä
2025-10-02 14:01                             ` Tvrtko Ursulin
2025-10-02 14:36                               ` Tvrtko Ursulin
2025-10-02 16:23                               ` Ville Syrjälä
2025-10-02 17:04                                 ` Tvrtko Ursulin
2025-10-02 17:14                                   ` Ville Syrjälä
2025-10-02 22:02                                     ` Ville Syrjälä
2025-09-23 10:44   ` [PATCH v13 " Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 12/13] drm/xe: Do not use stolen memory for DPT on IGFX and AuxCCS Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 13/13] drm/i915/display: Expose AuxCCS frame buffer modifiers for Xe Tvrtko Ursulin
2025-09-23 10:15 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers (rev15) Patchwork
2025-09-23 10:16 ` ✓ CI.KUnit: success " Patchwork
2025-09-23 11:15 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-23 11:21 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers (rev16) Patchwork
2025-09-23 11:22 ` ✓ CI.KUnit: success " Patchwork
2025-09-23 12:03 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-23 13:26 ` ✗ Xe.CI.Full: failure for AuxCCS handling and render compression modifiers (rev15) Patchwork
2025-09-23 14:12 ` ✗ Xe.CI.Full: failure for AuxCCS handling and render compression modifiers (rev16) Patchwork
2025-09-23 20:12 ` [PATCH v12 00/13] AuxCCS handling and render compression modifiers Ville Syrjälä
2025-09-24  7:59   ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aNZDxGoYxVFfRNOu@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox