All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com
Subject: Re: [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out
Date: Fri, 26 Sep 2025 10:41:56 +0300	[thread overview]
Message-ID: <aNZDxGoYxVFfRNOu@intel.com> (raw)
In-Reply-To: <cfc54165-b391-4905-a00a-27d2889225e3@igalia.com>

On Thu, Sep 25, 2025 at 11:08:50AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/09/2025 08:24, Tvrtko Ursulin wrote:
> > 
> > On 24/09/2025 23:35, Ville Syrjälä wrote:
> >> On Wed, Sep 24, 2025 at 02:09:42PM +0100, Tvrtko Ursulin wrote:
> >>>
> >>> On 23/09/2025 15:52, Ville Syrjälä wrote:
> >>>> On Tue, Sep 23, 2025 at 03:40:55PM +0100, Tvrtko Ursulin wrote:
> >>>>>
> >>>>> On 23/09/2025 14:20, Ville Syrjälä wrote:
> >>>>>> On Tue, Sep 23, 2025 at 01:25:58PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>
> >>>>>>> On 23/09/2025 13:01, Ville Syrjälä wrote:
> >>>>>>>> On Tue, Sep 23, 2025 at 11:48:59AM +0100, Tvrtko Ursulin wrote:
> >>>>>>>>>
> >>>>>>>>> On 23/09/2025 11:19, Ville Syrjälä wrote:
> >>>>>>>>>> On Tue, Sep 23, 2025 at 11:08:04AM +0100, Tvrtko Ursulin wrote:
> >>>>>>>>>>> Even though frame buffer objects are created as write- 
> >>>>>>>>>>> combined, in
> >>>>>>>>>>> practice, on top of all the ring buffer flushing, an 
> >>>>>>>>>>> additional clflush
> >>>>>>>>>>> seems to be needed before display engine can coherently scan 
> >>>>>>>>>>> out the
> >>>>>>>>>>> AuxCCS compressed data without transient artifacts.
> >>>>>>>>>>>
> >>>>>>>>>>> If for comparison we look at how i915 handles things (where 
> >>>>>>>>>>> AuxCCS works
> >>>>>>>>>>> fine), as it happens it has this same clflush before a frame 
> >>>>>>>>>>> buffer is
> >>>>>>>>>>> pinned for display for the first time, courtesy the dynamic 
> >>>>>>>>>>> tracking of
> >>>>>>>>>>> the buffer cache mode and setting the latter to uncached 
> >>>>>>>>>>> before handing
> >>>>>>>>>>> to display.
> >>>>>>>>>>>
> >>>>>>>>>>> Since xe considers the buffer object caching mode as static 
> >>>>>>>>>>> we can
> >>>>>>>>>>> implement the same approach by adding a flag telling us if 
> >>>>>>>>>>> the buffer
> >>>>>>>>>>> was ever pinned for display and flush on the first pin. 
> >>>>>>>>>>> Subsequent re-pins
> >>>>>>>>>>> will not repeat the clflush but so far I have not observed 
> >>>>>>>>>>> any glitching
> >>>>>>>>>>> after the first pin.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>       drivers/gpu/drm/xe/display/xe_fb_pin.c | 14 +++++++++++++-
> >>>>>>>>>>>       drivers/gpu/drm/xe/xe_bo_types.h       | 14 +++++++++-----
> >>>>>>>>>>>       2 files changed, 22 insertions(+), 6 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/ 
> >>>>>>>>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>>>>>>>>> index d8aa23b8cf14..f247c0da6b9e 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>>>>>>>>> @@ -382,6 +382,7 @@ static struct i915_vma 
> >>>>>>>>>>> *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >>>>>>>>>>>           struct xe_bo *bo = gem_to_xe_bo(obj);
> >>>>>>>>>>>           struct xe_validation_ctx ctx;
> >>>>>>>>>>>           struct drm_exec exec;
> >>>>>>>>>>> +    bool first_pin;
> >>>>>>>>>>>           int ret = 0;
> >>>>>>>>>>>           if (!vma)
> >>>>>>>>>>> @@ -422,8 +423,11 @@ static struct i915_vma 
> >>>>>>>>>>> *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >>>>>>>>>>>                   ret = xe_bo_validate(bo, NULL, true, &exec);
> >>>>>>>>>>>               drm_exec_retry_on_contention(&exec);
> >>>>>>>>>>>               xe_validation_retry_on_oom(&ctx, &ret);
> >>>>>>>>>>> -        if (!ret)
> >>>>>>>>>>> +        if (!ret) {
> >>>>>>>>>>>                   ttm_bo_pin(&bo->ttm);
> >>>>>>>>>>> +            first_pin = !bo->display_pin;
> >>>>>>>>>>> +            bo->display_pin = true;
> >>>>>>>>>>> +        }
> >>>>>>>>>>>           }
> >>>>>>>>>>>           if (ret)
> >>>>>>>>>>>               goto err;
> >>>>>>>>>>> @@ -436,6 +440,14 @@ static struct i915_vma 
> >>>>>>>>>>> *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >>>>>>>>>>>           if (ret)
> >>>>>>>>>>>               goto err_unpin;
> >>>>>>>>>>> +    /*
> >>>>>>>>>>> +     * Force flush frame buffer data for non-coherent 
> >>>>>>>>>>> display access when
> >>>>>>>>>>> +     * AuxCCS formats are used.
> >>>>>>>>>>> +     */
> >>>>>>>>>>> +    if (first_pin && !xe_bo_is_vram(bo) && ! 
> >>>>>>>>>>> xe_bo_is_stolen(bo) &&
> >>>>>>>>>>> +        intel_fb_is_ccs_modifier(fb->base.modifier))
> >>>>>>>>>>> +        drm_clflush_sg(xe_bo_sg(bo));
> >>>>>>>>>>
> >>>>>>>>>> You still haven't found the actual bug that causes the dirty 
> >>>>>>>>>> cache?
> >>>>>>>>>
> >>>>>>>>> Sadly no. I cross referenced everything numerous times, including
> >>>>>>>>> workarounds, tried pretty much 1:1 i915 vs xe ring buffer 
> >>>>>>>>> programming,
> >>>>>>>>> but this extra flush always remains required. I also heard that 
> >>>>>>>>> how
> >>>>>>>>> flushing of the aux metadata works isn't documented anywhere 
> >>>>>>>>> and i915
> >>>>>>>>> does have this flush, by design or accident I don't know.
> >>>>>>>>
> >>>>>>>> i915 has the flush for the *whole* bo because it started out as
> >>>>>>>> cached. As soon as it undergoes a cached->uncached change (either
> >>>>>>>> due to set_caching ioctl or becoming a display scanout buffer) we
> >>>>>>>> clflush it and switch the GPU page tables from WB to UC. After
> >>>>>>>> that the bo stays uncached, and will need no further clflushes,
> >>>>>>>> assuming that everyone follows the rules:
> >>>>>>>>
> >>>>>>>> - CPU accesses via a WB mapping to an uncached bo will explicitly
> >>>>>>>>       clflush before (invalidate) and after (write-back) the access
> >>>>>>>> - userspace always sets the GPU to use use "consult the PTE" MOCS
> >>>>>>>>       setting for any potential scanout buffer. That way the GPU 
> >>>>>>>> will
> >>>>>>>>       use WB as long as the bo is cached, and once it becomes
> >>>>>>>>       uncached the GPU also switches to UC accesses
> >>>>>>>>
> >>>>>>>> With xe I presume the BO should already start out as UC/WC with
> >>>>>>>> clean caches, and nothing should be dirtying the caches unless
> >>>>>>>> there is a real bug somewhere.
> >>>>>>>
> >>>>>>> Correct, it is all WC and MOCS are correctly set as WC.
> >>>>>>>
> >>>>>>> For example for the media compression flavour that was broken and 
> >>>>>>> needed
> >>>>>>> fixing in b412b144685f ("lib/intel/veboxcopy: Respect buffer MOCS
> >>>>>>> index"). (For the render compression flavour MOCS was already 
> >>>>>>> correct.)
> >>>>>>> But MOCS is not enough on its own for some reason.
> >>>>>>>
> >>>>>>> I also looked into the rendercopy surface state programming and 
> >>>>>>> that too
> >>>>>>> looks okay. In fact it is possible to see how using the wrong 
> >>>>>>> MOCS makes
> >>>>>>> things worse. But it also appears the surface state MOCS only 
> >>>>>>> applies to
> >>>>>>> the main surface, while the aux state is the one which appears to
> >>>>>>> contain cached/unflushed data.
> >>>>>>
> >>>>>> Are you saying that if you start scanning out a compressed buffer,
> >>>>>> then clean the caches, and then do frontbuffer rendering you get
> >>>>>> more cache dirt?
> >>>>>
> >>>>> Render with the GPU or CPU? Do you know of any tests or userspaces 
> >>>>> which
> >>>>> do that?
> >>>>
> >>>> On the GPU.
> >>>>
> >>>> Can't think of anything nice off the top of my head.
> >>>>
> >>>> I'd probably just write a quick igt:
> >>>> 1. create a compressed fb
> >>>> 2. flip to it
> >>>> 3. manually clflush the whole thing to be sure it's clean
> >>>> 4. rendercopy some junk around, and keep an eye out for cache dirt
> >>>
> >>> Looks the same dirt on i915 and xe, which would support the theory that
> >>> MOCS settings do not apply to the aux surface.
> >>>
> >>> If you look at these videos frame by frame you can see it:
> >>>
> >>> https://people.igalia.com/tursulin/i915-ccs-cache-dirt.MOV
> >>> https://people.igalia.com/tursulin/xe-ccs-cache-dirt.MOV
> >>>
> >>
> >> That doesn't really look like cache dirt since it doesn't linger.
> >> So I'd say that is more tearing than anything. Tearing can look
> >> rather weird with tiled/compressed buffers.
> > 
> > Some occurrences have two frames of artefacts before the grey square 
> > settles. With a 90Hz display and camera taking 60fps video can it be 
> > tearing?
> > 
> >> To raelly see cache dirt you should tweak the test to
> >> select a wb mocs.
> >>
> >> Oh, and disable CPU c-states. Turns out deep c-states cause extra
> >> cache flushes all the time which will hide the dirt in short order.
> >> Without those extra flushes cache dirt will linger for a long time
> >> on screen.
> > 
> > This I didn't do, will try.
> 
> With no C-states results are indeed much more "fun". Although still the 
> same between i915 and xe:
> 
> https://people.igalia.com/tursulin/i915-ccs-cache-dirt-nocstates.MOV
> https://people.igalia.com/tursulin/xe-ccs-cache-dirt-nocstates.MOV
> 
> Still MOCS 3 so uncached in render state verified for both.

I reverse engineered this a bit and there's definitely a
MOCS issue at play.

First I noticed that if filled the entire MOCS table with
UC the problem went away. I then filled the entire table
with WB and essentially bisected what I need to make UC
to fix it. And I had to repeat that same process starting
from the other end of table.

Looks like there is some undocumented magic in the hardware.

MOCS 61 really is special:
- MOCS 61 UC, others WB, select MOCS 61 -> no corruption

MOCS 0 and 63 are special in other ways:
- MOCS X UC, others WB, select MOCS X -> corruption
- MOCS X+0 UC, others WB, select MOCS X -> corruption
- MOCS X+63 UC, others WB, select MOCS X -> corruption
- MOCS X+0+63 UC, others WB, select MOCS X -> no corruption
  where X != 61

I didn't actually test all values of X there, but I did spot
check a handful of them.

Also, ADL is affected, but TGL doesn't seem to be. Though I
still need to check the situation on TGL a bit more thoroughly.

-- 
Ville Syrjälä
Intel

  reply	other threads:[~2025-09-26  7:42 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-23 10:07 [PATCH v12 00/13] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 01/13] drm/xe/xelpg: Flush CCS when flushing caches Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 02/13] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
2025-10-01 15:47   ` Rodrigo Vivi
2025-09-23 10:07 ` [PATCH v12 03/13] drm/xe/xelp: Support auxccs invalidation on blitter Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 04/13] drm/xe/xelp: Use MI_FLUSH_DW_CCS on auxccs platforms Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 05/13] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
2025-09-23 10:07 ` [PATCH v12 06/13] drm/xe: Export xe_emit_aux_table_inv Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 07/13] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 08/13] drm/xe: Flush GGTT writes after populating DPT Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 09/13] drm/xe: Handle DPT in system memory Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 10/13] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out Tvrtko Ursulin
2025-09-23 10:19   ` Ville Syrjälä
2025-09-23 10:48     ` Tvrtko Ursulin
2025-09-23 12:01       ` Ville Syrjälä
2025-09-23 12:25         ` Tvrtko Ursulin
2025-09-23 13:20           ` Ville Syrjälä
2025-09-23 14:40             ` Tvrtko Ursulin
2025-09-23 14:52               ` Ville Syrjälä
2025-09-24 13:09                 ` Tvrtko Ursulin
2025-09-24 22:35                   ` Ville Syrjälä
2025-09-25  7:24                     ` Tvrtko Ursulin
2025-09-25 10:08                       ` Tvrtko Ursulin
2025-09-26  7:41                         ` Ville Syrjälä [this message]
2025-09-26 19:35                           ` Ville Syrjälä
2025-10-02 14:01                             ` Tvrtko Ursulin
2025-10-02 14:36                               ` Tvrtko Ursulin
2025-10-02 16:23                               ` Ville Syrjälä
2025-10-02 17:04                                 ` Tvrtko Ursulin
2025-10-02 17:14                                   ` Ville Syrjälä
2025-10-02 22:02                                     ` Ville Syrjälä
2025-09-23 10:44   ` [PATCH v13 " Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 12/13] drm/xe: Do not use stolen memory for DPT on IGFX and AuxCCS Tvrtko Ursulin
2025-09-23 10:08 ` [PATCH v12 13/13] drm/i915/display: Expose AuxCCS frame buffer modifiers for Xe Tvrtko Ursulin
2025-09-23 10:15 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers (rev15) Patchwork
2025-09-23 10:16 ` ✓ CI.KUnit: success " Patchwork
2025-09-23 11:15 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-23 11:21 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers (rev16) Patchwork
2025-09-23 11:22 ` ✓ CI.KUnit: success " Patchwork
2025-09-23 12:03 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-23 13:26 ` ✗ Xe.CI.Full: failure for AuxCCS handling and render compression modifiers (rev15) Patchwork
2025-09-23 14:12 ` ✗ Xe.CI.Full: failure for AuxCCS handling and render compression modifiers (rev16) Patchwork
2025-09-23 20:12 ` [PATCH v12 00/13] AuxCCS handling and render compression modifiers Ville Syrjälä
2025-09-24  7:59   ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aNZDxGoYxVFfRNOu@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.