From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Matt Roper <matthew.d.roper@intel.com>
Cc: "Intel-gfx@lists.freedesktop.org"
<Intel-gfx@lists.freedesktop.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
Chris Wilson <chris.p.wilson@linux.intel.com>
Subject: Re: [Intel-gfx] [RFC 2/2] drm/i915: Remove PAT hack from i915_gem_object_can_bypass_llc
Date: Mon, 17 Jul 2023 11:55:30 +0100
Message-ID: <b9afd2fe-426e-7d4a-2768-44c6d2507e29@linux.intel.com>
In-Reply-To: <20230715002023.GA138014@mdroper-desk1.amr.corp.intel.com>
On 15/07/2023 01:20, Matt Roper wrote:
> On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote:
>>
>> On 14/07/2023 06:43, Yang, Fei wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> According to the comment in i915_gem_object_can_bypass_llc the
>>>> purpose of the function is to return false if the platform/object
>>>> has a caching mode where GPU can bypass the LLC.
>>>>
>>>> So far the only platforms which allegedly can do this are Jasperlake
>>>> and Elkhartlake, and that via MOCS (not PAT).
>>>>
>>>> Instead of blindly assuming that objects where userspace has set the
>>>> PAT index can (bypass the LLC), question is is there a such PAT index
>>>> on a platform. Probably starting with Meteorlake since that one is the
>>>> only one where set PAT extension can be currently used. Or if there is
>>>> a MOCS entry which can achieve the same thing on Meteorlake.
>>>>
>>>> If there is such PAT, now that i915 can be made to understand them
>>>> better, we can make the check more fine grained. Or if there is a MOCS
>>>> entry then we probably should apply the blanket IS_METEORLAKE condition.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>> Cc: Fei Yang <fei.yang@intel.com>
>>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
>>>> 1 file changed, 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> index 33a1e97d18b3..1e34171c4162 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>>> if (!(obj->flags & I915_BO_ALLOC_USER))
>>>> return false;
>>>>
>>>> - /*
>>>> - * Always flush cache for UMD objects at creation time.
>>>> - */
>>>> - if (obj->pat_set_by_user)
>>>
>>> I'm afraid this is going to break MESA. Can we run MESA tests with this patch?
>>
>> I can't, but question is why it would break Mesa which would need a nice
>> comment here?
>>
>> For instance should the check be IS_METEORLAKE?
>>
>> Or should it be "is wb" && "not has 1-way coherent"?
>>
>> Or both?
>>
>> Or, given how Meteorlake does not have LLC, how can anything bypass it
>> there? Or is it about snooping on Meteorlake and how?
>
> I think the "LLC" in the function name is a bit misleading since this is
> really all just about the ability to avoid coherency (which might come
> from an LLC on some platforms or from snooping on others).
>
> The concern is that the CPU writes to the buffer and those writes sit in
> a CPU cache without making it to RAM immediately. If the GPU then
> reads the object with any of the non-coherent PAT settings that were
> introduced in Xe_LPG, it will not snoop the CPU cache and will read old,
> stale data from RAM.
>
> So I think we'd want a condition like ("Xe_LPG or later" && "any non
> coherent PAT"). The WB/WT/UC status of the GPU behavior shouldn't
> matter here, just the coherency setting.

Right, sounds plausible to me. So with this series the new condition in this function would look like this:

i915_gem_object_can_bypass_llc(..)
{
	...
	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) != 1)
		return true;

("!= 1" in the condition means that either the object is not coherent, or i915 does not know because the table is incomplete, for example if some PAT index on a future platform was never defined.)

That would catch any platform with non-coherent WB, as long as the PAT-to-i915-cache-mode tables are correct. It would currently only apply to Meteorlake:

#define MTL_CACHE_MODES \
	.cache_modes = { \
		[0] = I915_CACHE(WB), \
		[1] = I915_CACHE(WT), \
		[2] = I915_CACHE(UC), \
		[3] = _I915_CACHE(WB, COH1W), \
		[4] = __I915_CACHE(WB, BIT(I915_CACHE_FLAG_COH1W) | BIT(I915_CACHE_FLAG_COH2W)), \
	}

Or are you saying it should apply to UC and WT too somehow?
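
To make the difference between the two candidate conditions concrete, here is a minimal standalone sketch. It is not the i915 code from this series: every type and function name below is a hypothetical stand-in, the table merely mirrors the Meteorlake entries quoted above, and it assumes that both lookups return -1 when a PAT index has no table entry. The "Xe_LPG or later" platform check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not the i915 definitions. */
enum cache_mode { MODE_UNKNOWN = 0, MODE_UC, MODE_WB, MODE_WT };
enum { FLAG_COH1W = 1u << 0, FLAG_COH2W = 1u << 1 };

struct cache_entry {
	enum cache_mode mode;
	unsigned int flags;
};

#define MTL_NUM_PAT 8

/* Mirrors the Meteorlake table quoted above; indices 5..7 stay undefined. */
static const struct cache_entry mtl_modes[MTL_NUM_PAT] = {
	[0] = { MODE_WB, 0 },
	[1] = { MODE_WT, 0 },
	[2] = { MODE_UC, 0 },
	[3] = { MODE_WB, FLAG_COH1W },
	[4] = { MODE_WB, FLAG_COH1W | FLAG_COH2W },
};

/* Returns the table entry, or NULL if the PAT index is not described. */
static const struct cache_entry *
lookup(const struct cache_entry *tbl, size_t n, unsigned int pat)
{
	assert(tbl != NULL);
	if (pat >= n || tbl[pat].mode == MODE_UNKNOWN)
		return NULL;
	return &tbl[pat];
}

/* Tri-state: 1 = matches, 0 = does not match, -1 = not known. */
static int has_mode(const struct cache_entry *tbl, size_t n,
		    unsigned int pat, enum cache_mode mode)
{
	const struct cache_entry *e = lookup(tbl, n, pat);

	return e ? e->mode == mode : -1;
}

/* Tri-state: 1 = flag set, 0 = flag clear, -1 = not known. */
static int has_flag(const struct cache_entry *tbl, size_t n,
		    unsigned int pat, unsigned int flag)
{
	const struct cache_entry *e = lookup(tbl, n, pat);

	return e ? (e->flags & flag) != 0 : -1;
}

/* Variant from this reply: WB (or unknown) and not known to be coherent. */
bool can_bypass_wb_only(const struct cache_entry *tbl, size_t n,
			unsigned int pat)
{
	return has_mode(tbl, n, pat, MODE_WB) &&
	       has_flag(tbl, n, pat, FLAG_COH1W) != 1;
}

/* Variant per Matt's description: any PAT not known to be coherent. */
bool can_bypass_any_noncoherent(const struct cache_entry *tbl, size_t n,
				unsigned int pat)
{
	return has_flag(tbl, n, pat, FLAG_COH1W) != 1;
}
```

Under these assumptions, the WB-only check flags only PAT index 0 on the Meteorlake table, while the coherency-only check flags indices 0, 1 and 2. An index with no table entry is flagged by both, which is the conservative outcome that the "!= 1" comparison is meant to give.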

I'll also try to join the sub-thread with Fei's reply here.

So in terms of the stated issue with _CPU_ access from Mesa seeing stale data (non-zeroed pages) depending on the PAT index, I don't understand that yet. That seems to be entirely a CPU cache problem, and I do not understand how the PAT index gets into the picture.

But the proposed patch from your email, Fei, looks like it would be covered by the snippet I have in this reply.

Regards,

Tvrtko