From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <80aa2409-e1b4-706b-91a3-0867a926fca2@linux.intel.com>
Date: Wed, 19 Apr 2023 17:14:35 +0200
From: "Das, Nirmoy"
To: fei.yang@intel.com, intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
References: <20230417062503.1884465-1-fei.yang@intel.com> <20230417062503.1884465-5-fei.yang@intel.com>
In-Reply-To: <20230417062503.1884465-5-fei.yang@intel.com>
Subject: Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

On 4/17/2023 8:24 AM, fei.yang@intel.com wrote:
> From: Fei Yang
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).
>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang

This was a great find :)

Acked-by: Nirmoy Das

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++++++++++++------
>  4 files changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index d76508fa3af7..f9bddaa876d9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -743,6 +743,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..98e682b7df07 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
>  	}
>  	GEM_BUG_ON(tail > size);
>
> -	/*
> -	 * make sure H2G buffer update and LRC tail update (if this triggering a
> -	 * submission) are visible before updating the descriptor tail
> -	 */
> -	intel_guc_write_barrier(ct_to_guc(ct));
> -
>  	/* update local copies */
>  	ctb->tail = tail;
>  	GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN);
> @@ -429,6 +423,12 @@
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->tail, tail);
>
> +	/*
> +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> +	 * submission) are visible before updating the descriptor tail
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>  	return 0;
>
>  corrupted:
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>  	return available - len;
>
>  corrupted:
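For anyone following the ordering argument here, the producer-side pattern the patch enforces can be sketched in plain userspace C. This is only an illustrative stand-in, not the i915 code: fake_ctb_desc, fake_ctb_write and the flat array are invented names, and the C11 release fence merely marks where the real code needs intel_guc_write_barrier() (an actual sfence/wmb) to drain the CPU's write-combining buffers before the tail word that GuC polls is bumped:

```c
#include <stdatomic.h>

/*
 * Illustrative sketch only -- invented names, not the i915 API.
 * With a WC mapping, the payload stores below may linger in the CPU's
 * write-combining buffers. The fence marks the point where the real
 * code issues intel_guc_write_barrier() so the payload is globally
 * visible before the descriptor tail is published.
 */
struct fake_ctb_desc {
	unsigned int head;	/* written by the consumer (GuC) */
	unsigned int tail;	/* written by the producer (host) */
};

static void fake_ctb_write(unsigned int *cmds, struct fake_ctb_desc *desc,
			   const unsigned int *msg, unsigned int len)
{
	unsigned int tail = desc->tail;
	unsigned int i;

	for (i = 0; i < len; i++)
		cmds[tail + i] = msg[i];	/* payload into the (WC) buffer */

	/* flush before publishing: stand-in for intel_guc_write_barrier() */
	atomic_thread_fence(memory_order_release);

	desc->tail = tail + len;		/* publish; consumer reads this */
}
```

Note that a C11 release fence by itself does not drain write-combining buffers on x86, which is exactly why the real code needs the mfence/sfence; the sketch only shows where the barrier must sit relative to the tail update, matching the move of intel_guc_write_barrier() after WRITE_ONCE(desc->tail, tail) in the hunk above.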