From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 968FDCAC5AC for ; Tue, 23 Sep 2025 13:20:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 576DA10E126; Tue, 23 Sep 2025 13:20:53 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="S/3wxcu0"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id E2C7510E126 for ; Tue, 23 Sep 2025 13:20:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1758633652; x=1790169652; h=date:from:to:cc:subject:message-id:references: mime-version:content-transfer-encoding:in-reply-to; bh=0j3nfMgVnOn+BhcN9vRJQygn2k2H+xs/4LpPwBNEqwY=; b=S/3wxcu0hmyZn++KQHwLI+YVj7ePms+OC+cev/N8dfei18wYRKc8InXW r0MbFCwMBWdAIvsjVzjFYyN/TDqoSLCWw/QzCQAvQR/1+K1s/nNbTZRkG v9al7c3h/tDQR7A6IH83fa/fj7v+4sdEMTrDaq2GrRJozIMtZEANLbZhU Z3x98WK0ok6OeNLtuZ4LxJ2P8vbD/up7EDbEnl/34GiHb8H6QGmWVse/D APBg7Qxdl1ITO+AD7S+YBkHob5bpwtIlb0HRA/f9iTMP6PuCCXyGgX9ia MfzSXSwhEBLucknmNVrThY+ChFJ78LpH62unTlIHC491DwMqa6mahZhJF Q==; X-CSE-ConnectionGUID: YR4xE2QFRBug4VturHjETA== X-CSE-MsgGUID: /A2uqNhsSkmCKz5SZ67TZQ== X-IronPort-AV: E=McAfee;i="6800,10657,11561"; a="63540435" X-IronPort-AV: E=Sophos;i="6.18,288,1751266800"; d="scan'208";a="63540435" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Sep 2025 06:20:51 -0700 X-CSE-ConnectionGUID: xjE86kTJQLadkZy+FAK1rw== X-CSE-MsgGUID: c6jk4XsdQE223rkrhIQOfw== X-ExtLoop1: 1 Received: from klitkey1-mobl1.ger.corp.intel.com (HELO localhost) ([10.245.245.13]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Sep 2025 06:20:50 -0700 Date: Tue, 23 Sep 2025 16:20:47 +0300 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= To: Tvrtko Ursulin Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com Subject: Re: [PATCH v12 11/13] drm/xe: Force flush system memory AuxCCS framebuffers before scan out Message-ID: References: <20250923100812.88257-1-tvrtko.ursulin@igalia.com> <20250923100812.88257-12-tvrtko.ursulin@igalia.com> <8bec1692-1bec-45ff-a552-10ff8c7420b5@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Patchwork-Hint: comment Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Tue, Sep 23, 2025 at 01:25:58PM +0100, Tvrtko Ursulin wrote: > > On 23/09/2025 13:01, Ville Syrjälä wrote: > > On Tue, Sep 23, 2025 at 11:48:59AM +0100, Tvrtko Ursulin wrote: > >> > >> On 23/09/2025 11:19, Ville Syrjälä wrote: > >>> On Tue, Sep 23, 2025 at 11:08:04AM +0100, Tvrtko Ursulin wrote: > >>>> Even though frame buffer objects are created as write-combined, in > >>>> practice, on top of all the ring buffer flushing, an additional clflush > >>>> seems to be needed before display engine can coherently scan out the > >>>> AuxCCS compressed data without transient artifacts. > >>>> > >>>> If for comparison we look at how i915 handles things (where AuxCCS works > >>>> fine), as it happens it has this same clflush before a frame buffer is > >>>> pinned for display for the first time, courtesy the dynamic tracking of > >>>> the buffer cache mode and setting the latter to uncached before handing > >>>> to display. > >>>> > >>>> Since xe considers the buffer object caching mode as static we can > >>>> implement the same approach by adding a flag telling us if the buffer > >>>> was ever pinned for display and flush on the first pin. Subsequent re-pins > >>>> will not repeat the clflush but so far I have not observed any glitching > >>>> after the first pin. > >>>> > >>>> Signed-off-by: Tvrtko Ursulin > >>>> --- > >>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 14 +++++++++++++- > >>>> drivers/gpu/drm/xe/xe_bo_types.h | 14 +++++++++----- > >>>> 2 files changed, 22 insertions(+), 6 deletions(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c > >>>> index d8aa23b8cf14..f247c0da6b9e 100644 > >>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c > >>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c > >>>> @@ -382,6 +382,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb, > >>>> struct xe_bo *bo = gem_to_xe_bo(obj); > >>>> struct xe_validation_ctx ctx; > >>>> struct drm_exec exec; > >>>> + bool first_pin; > >>>> int ret = 0; > >>>> > >>>> if (!vma) > >>>> @@ -422,8 +423,11 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb, > >>>> ret = xe_bo_validate(bo, NULL, true, &exec); > >>>> drm_exec_retry_on_contention(&exec); > >>>> xe_validation_retry_on_oom(&ctx, &ret); > >>>> - if (!ret) > >>>> + if (!ret) { > >>>> ttm_bo_pin(&bo->ttm); > >>>> + first_pin = !bo->display_pin; > >>>> + bo->display_pin = true; > >>>> + } > >>>> } > >>>> if (ret) > >>>> goto err; > >>>> @@ -436,6 +440,14 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb, > >>>> if (ret) > >>>> goto err_unpin; > >>>> > >>>> + /* > >>>> + * Force flush frame buffer data for non-coherent display access when > >>>> + * AuxCCS formats are used. > >>>> + */ > >>>> + if (first_pin && !xe_bo_is_vram(bo) && !xe_bo_is_stolen(bo) && > >>>> + intel_fb_is_ccs_modifier(fb->base.modifier)) > >>>> + drm_clflush_sg(xe_bo_sg(bo)); > >>> > >>> You still haven't found the actual bug that causes the dirty cache? > >> > >> Sadly no. I cross referenced everything numerous times, including > >> workarounds, tried pretty much 1:1 i915 vs xe ring buffer programming, > >> but this extra flush always remains required. I also heard that how > >> flushing of the aux metadata works isn't documented anywhere and i915 > >> does have this flush, by design or accident I don't know. > > > > i915 has the flush for the *whole* bo because it started out as > > cached. As soon as it undergoes a cached->uncached change (either > > due to set_caching ioctl or becoming a display scanout buffer) we > > clflush it and switch the GPU page tables from WB to UC. After > > that the bo stays uncached, and will need no further clflushes, > > assuming that everyone follows the rules: > > > > - CPU accesses via a WB mapping to an uncached bo will explicitly > > clflush before (invalidate) and after (write-back) the access > > - userspace always sets the GPU to use use "consult the PTE" MOCS > > setting for any potential scanout buffer. That way the GPU will > > use WB as long as the bo is cached, and once it becomes > > uncached the GPU also switches to UC accesses > > > > With xe I presume the BO should already start out as UC/WC with > > clean caches, and nothing should be dirtying the caches unless > > there is a real bug somewhere. > > Correct, it is all WC and MOCS are correctly set as WC. > > For example for the media compression flavour that was broken and needed > fixing in b412b144685f ("lib/intel/veboxcopy: Respect buffer MOCS > index"). (For the render compression flavour MOCS was already correct.) > But MOCS is not enough on its own for some reason. > > I also looked into the rendercopy surface state programming and that too > looks okay. In fact it is possible to see how using the wrong MOCS makes > things worse. But it also appears the surface state MOCS only applies to > the main surface, while the aux state is the one which appears to > contain cached/unflushed data. Are you saying that if you start scanning out a compressed buffer, then clean the caches, and then do frontbuffer rendering you get more cache dirt? -- Ville Syrjälä Intel