Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gem: only allow WB for smem only placements

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gem: only allow WB for smem only placements
Date: Mon, 28 Jun 2021 12:55:57 +0200	[thread overview]
Message-ID: <03645a57-f044-ed25-61b4-a95a40a3f714@linux.intel.com> (raw)
In-Reply-To: <55e1981f-8945-1370-8753-d5749e0d69ba@intel.com>

On 6/28/21 12:20 PM, Matthew Auld wrote:
> On 28/06/2021 10:38, Thomas Hellström wrote:
>> Hi,
>>
>> On 6/28/21 11:12 AM, Matthew Auld wrote:
>>> On 28/06/2021 08:41, Thomas Hellström wrote:
>>>>
>>>> On 6/25/21 2:27 PM, Matthew Auld wrote:
>>>>> We only support single mode and this should be immutable. For smem 
>>>>> only
>>>>> placements on DGFX this should be WB. On DG1 everything is snooped,
>>>>> always, and so should be coherent.
>>>>>
>>>>> I915_GEM_DOMAIN_GTT looks like it's for the aperture which is now 
>>>>> gone
>>>>> for DGFX, so hopefully can also be safely rejected.
>>>>>
>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c |  7 +++++++
>>>>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c   | 10 ++++++++++
>>>>>   2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>> index d0c91697bb22..e3459a524e64 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>> @@ -577,6 +577,13 @@ i915_gem_set_domain_ioctl(struct drm_device 
>>>>> *dev, void *data,
>>>>>           goto out_unpin;
>>>>>       }
>>>>> +    if (IS_DGFX(to_i915(obj->base.dev)) && obj->mm.n_placements 
>>>>> == 1 &&
>>>>> +        i915_gem_object_placements_contain_type(obj, 
>>>>> INTEL_MEMORY_SYSTEM) &&
>>>>> +        read_domains != I915_GEM_DOMAIN_CPU) {
>>>>> +        err = -EINVAL;
>>>>> +        goto out_unpin;
>>>>> +    }
>>>>> +
>>>>>       if (read_domains & I915_GEM_DOMAIN_WC)
>>>>>           err = i915_gem_object_set_to_wc_domain(obj, write_domain);
>>>>>       else if (read_domains & I915_GEM_DOMAIN_GTT)
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>>> index f3586b36dd53..afc9f3dc38b9 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>>> @@ -673,6 +673,7 @@ __assign_mmap_offset(struct 
>>>>> drm_i915_gem_object *obj,
>>>>>                enum i915_mmap_type mmap_type,
>>>>>                u64 *offset, struct drm_file *file)
>>>>>   {
>>>>> +    struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>>>>       struct i915_mmap_offset *mmo;
>>>>>       if (i915_gem_object_never_mmap(obj))
>>>>> @@ -697,6 +698,15 @@ __assign_mmap_offset(struct 
>>>>> drm_i915_gem_object *obj,
>>>>>           i915_gem_object_placements_contain_type(obj, 
>>>>> INTEL_MEMORY_LOCAL))
>>>>>           return -ENODEV;
>>>>> +    /*
>>>>> +     * For smem only placements on DGFX we need to default to WB. 
>>>>> On DG1
>>>>> +     * everything is snooped always, so should always be coherent.
>>>>> +     */
>>>>> +     if (IS_DGFX(i915) &&
>>>>> +         mmap_type != I915_MMAP_TYPE_WB && obj->mm.n_placements 
>>>>> == 1 &&
>>>>> +         i915_gem_object_placements_contain_type(obj, 
>>>>> INTEL_MEMORY_SYSTEM))
>>>>> +        return -ENODEV;
>>>>> +
>>>>
>>>> Same thing here as in the previous patch.
>>>>
>>>> Also do we need to modify i915_coherent_map_type() to also include 
>>>> HAS_SNOOP()?
>>>>
>>>> While we're at it, that "always_coherent" argument to 
>>>> i915_coherent_map_type() appears scary to me and probably needs 
>>>> some documentation. It seems used for page-tables. Is it because we 
>>>> know those are always snooped?
>>>
>>> Yeah, it's either because the caller has/will mark the pages as 
>>> coherent(which translates to some special ppGTT bits), or we 
>>> manually flush ourselves. In i915_coherent_map_type() we should 
>>> account for DG1 somehow.
>>>
>>> Historically I don't think we enabled snooping by default since it's 
>>> considered slow compared to shared LLC. On DG1 this is a different 
>>> story though.
>>>
>>> Also the pin_map() interface is pretty much only for kernel internal 
>>> objects, so I don't think we have any users which try to map 
>>> userspace objects with that interface. Ok, except for vm_access it 
>>> seems, but that should hopefully be a simple fix to use the correct 
>>> caching mode? We can maybe add some sanity checking there if someone 
>>> tries to map a userspace object?
>> I'm not fully sure that's sufficient, see below.
>>>
>>> For all the other callers of pin_map() which should all be kernel 
>>> internal do we still need to force WB for system memory? By design 
>>> we only support a single mm.mapping there. For lmem we already use 
>>> WC only.
>>
>> We're only allowed to map with the same caching mode as the linear 
>> kernel mapping for discrete. Otherwise things may blow up on 
>> non-intel architectures. We can probably update 
>> 195_ttm_select_tt_caching to always use WB for system pages for 
>> kernel objects, but then we must make sure we don't try to map these WC.
>
> Ok, do you think that should be a separate series? It looks like our 
> internal objects don't use ttm(?). Should it? If so should we make a 
> region for it, or can we just make create_internal use the ttm system 
> region? It should be pretty much the same, except we don't want 
> swapping, clearing or eviction, and ideally we would have some way of 
> marking the pages as volatile(I think we can just keep IS_SHRINKABLE 
> for that).
>
> Or can we keep create_internal as is and then it's just a case of 
> dealing with all the pin_map() callers?
>
For now, I think, the internal objects don't really need to use ttm, and 
if we want to move them over, as you say, it's just a matter of reusing 
the SYSTEM region with suitable flags. I think we should restrict this 
to deal with pin_map() for now, and a separate series would work fine.

Although there seem to be callers, at least in selftests, that use 
pin_map with create_shmem() which maps to TTM system on discrete.

/Thomas

>>
>> /Thomas
>>
>>
>>>
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>>       mmo = mmap_offset_attach(obj, mmap_type, file);
>>>>>       if (IS_ERR(mmo))
>>>>>           return PTR_ERR(mmo);
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx