From: "Christian König" <christian.koenig@amd.com>
To: "Matthew Auld" <matthew.auld@intel.com>,
dri-devel@lists.freedesktop.org,
"Intel Graphics" <intel-gfx@lists.freedesktop.org>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage
Date: Wed, 31 Aug 2022 10:16:26 +0200
Message-ID: <ce090a95-a822-5079-7b86-0c949e98cd64@amd.com>
In-Reply-To: <cc4c59ad-5d69-b174-5464-bd9896459169@intel.com>
Hi Matthew,
On 30.08.22 at 12:45, Matthew Auld wrote:
> Hi,
>
> On 30/08/2022 08:33, Christian König wrote:
>> Hi guys,
>>
>> can we get an rb/acked-by for this i915 change?
>>
>> Basically we are just making sure that the driver doesn't crash when
>> bo->resource is NULL and a bo doesn't have any backing store assigned
>> to it.
>>
>> The Intel CI seems to be happy with this change, so I'm pretty sure
>> it is correct.
>
> It looks like DG2/DG1 (which happen to use TTM here) are no longer
> loading the module:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/index.html
>
>
> According to the logs the firmware is failing to load, so perhaps
> related to I915_BO_ALLOC_CPU_CLEAR, since that is one of the rare
> users. See below.
>
>>
>> Thanks,
>> Christian.
>>
>> On 24.08.22 at 16:23, Luben Tuikov wrote:
>>> From: Christian König <christian.koenig@amd.com>
>>>
>>> Make sure we can at least move and alloc TT objects without backing
>>> store.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++----
>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +-
>>> 2 files changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> index bc9c432edffe03..45ce2d1f754cc4 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> @@ -271,8 +271,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct
>>> ttm_buffer_object *bo,
>>> {
>>> struct drm_i915_private *i915 = container_of(bo->bdev,
>>> typeof(*i915),
>>> bdev);
>>> - struct ttm_resource_manager *man =
>>> - ttm_manager_type(bo->bdev, bo->resource->mem_type);
>>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>>> unsigned long ccs_pages = 0;
>>> enum ttm_caching caching;
>>> @@ -286,8 +284,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct
>>> ttm_buffer_object *bo,
>>> if (!i915_tt)
>>> return NULL;
>>> - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR &&
>>> - man->use_tt)
>>> + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource &&
>>> + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt)
>>> page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
>
> AFAICT i915 was massively relying on everything starting out in a
> "dummy" system memory resource (ttm_tt), where it then later
> "transitions" to the real resource. And if we need to clear the memory
> we rely on ZERO_ALLOC being set before calling into the
> i915_ttm_move() callback (even when allocating local-memory).
>
> For ttm_bo_type_device objects (userspace stuff) it looks like this
> was previously handled by ttm_bo_validate() always doing:
>
> ret = ttm_tt_create(bo, true); /* clear = true */
>
> Which we would always hit since the resource was always "compatible"
> for the dummy case. But it looks like this is no longer even called,
> since we can now call into ttm_move with bo->resource == NULL, which
> still calls tt_create eventually, but not always with clear = true.
>
> All other objects are then ttm_bo_type_kernel so we don't care about
> clearing, except in the rare case of ALLOC_CPU_CLEAR, which was
> handled as per above in i915_ttm_tt_create(). But I think here
> bo->resource is NULL at the start when first creating the object,
> which will skip setting ZERO_ALLOC, which might explain the CI failure.
>
> The other possible concern (not sure since CI didn't get that far) is
> around ttm_bo_pipeline_gutting(), which now leaves bo->resource =
> NULL. It looks like i915_ttm_shrink() was relying on that to
> unpopulate the ttm_tt. When later calling ttm_bo_validate(),
> i915_ttm_move() would see the SWAPPED flag set on the ttm_tt,
> re-populate it and then potentially move it back to local-memory.
>
> What are your thoughts here? Also sorry if i915 is making a bit of a
> mess here.
First of all, thanks a lot for taking a look. We previously got reports
about strange crashes with this patch, but couldn't really reproduce
them (not even by sending them out again). This explains that a bit.

The simplest solution would be to just change the && into a ||, i.e.
we clear when either no resource is allocated yet or the resource
requires a tt.

That should give you the same behavior as before, but I agree that this
is a bit messy.
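
A minimal sketch of that && to || change, written as a standalone C model
rather than the real driver code. The names model_resource, model_bo,
ALLOC_CPU_CLEAR and both helper functions are hypothetical stand-ins for
the TTM/i915 types and are not kernel APIs. The sketch also assumes that
the intended condition is "no resource yet, or the resource uses a tt",
as described in the sentence above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for I915_BO_ALLOC_CPU_CLEAR. */
#define ALLOC_CPU_CLEAR (1u << 0)

/* Hypothetical model: only the field the condition reads. */
struct model_resource {
	bool use_tt;	/* models ttm_resource_manager::use_tt */
};

struct model_bo {
	unsigned int flags;			/* models obj->flags */
	const struct model_resource *resource;	/* NULL: no backing store yet */
};

/* Condition as posted in the patch: a NULL resource never sets ZERO_ALLOC. */
bool zero_alloc_posted(const struct model_bo *bo)
{
	return (bo->flags & ALLOC_CPU_CLEAR) &&
	       bo->resource && bo->resource->use_tt;
}

/*
 * Assumed reading of the suggested change: request zeroing when there is
 * no resource yet, or when the current resource is backed by a tt.
 */
bool zero_alloc_suggested(const struct model_bo *bo)
{
	return (bo->flags & ALLOC_CPU_CLEAR) &&
	       (!bo->resource || bo->resource->use_tt);
}
```

For a freshly created object with the clear flag set and no resource, the
posted condition is false while the suggested one is true. For objects
that already have a resource, both conditions agree.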

I've been considering replacing ttm_bo_type with a bunch of
behavior flags for a bo. I'm hoping that this will clean things up a bit.
Regards,
Christian.
>
>>> caching = i915_ttm_select_tt_caching(obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> index 9a7e50534b84bb..c420d1ab605b6f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo,
>>> bool evict,
>>> bool clear;
>>> int ret;
>>> - if (GEM_WARN_ON(!obj)) {
>>> + if (GEM_WARN_ON(!obj) || !bo->resource) {
>>> ttm_bo_move_null(bo, dst_mem);
>>> return 0;
>>> }
>>