Intel-GFX Archive on lore.kernel.org
From: Matthew Auld <matthew.auld@intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org,
	"Intel Graphics" <intel-gfx@lists.freedesktop.org>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage
Date: Wed, 31 Aug 2022 17:32:33 +0100
Message-ID: <aa115f44-3771-fd37-8ac5-d831d13021fb@intel.com>
In-Reply-To: <d96ee168-f298-63fe-058c-cd421e2c49a0@intel.com>

On 31/08/2022 15:53, Matthew Auld wrote:
> On 31/08/2022 14:34, Christian König wrote:
>> On 31.08.22 at 14:50, Matthew Auld wrote:
>>> On 31/08/2022 13:35, Christian König wrote:
>>>> On 31.08.22 at 14:06, Matthew Auld wrote:
>>>>> On 31/08/2022 12:03, Christian König wrote:
>>>>>> On 31.08.22 at 12:37, Matthew Auld wrote:
>>>>>>> [SNIP]
>>>>>>>>>
>>>>>>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping 
>>>>>>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty 
>>>>>>>>> placements to force the pipeline-gutting path, which 
>>>>>>>>> importantly unpopulates the ttm_tt for us (since 
>>>>>>>>> ttm_tt_unpopulate is not exported it seems). But AFAICT it 
>>>>>>>>> looks like that will now also nuke the bo->resource, instead of 
>>>>>>>>> just leaving it in system memory. My assumption is that when 
>>>>>>>>> later calling ttm_bo_validate(), it will just do the 
>>>>>>>>> bo_move_null() in i915_ttm_move(), instead of re-populating the 
>>>>>>>>> ttm_tt and then potentially copying it back to local-memory?
>>>>>>>>
>>>>>>>> Well you do ttm_bo_validate() with something like GTT domain, 
>>>>>>>> don't you? This should result in re-populating the tt object, 
>>>>>>>> but I'm not 100% sure if that really works as expected.
>>>>>>>
>>>>>>> AFAIK for domains we either have system memory (which uses ttm_tt 
>>>>>>> and might be shmem underneath) or local-memory. But perhaps i915 
>>>>>>> is doing something wrong here, or abusing TTM in some way. I'm 
>>>>>>> not sure tbh.
>>>>>>>
>>>>>>> Anyway, I think we have two cases here:
>>>>>>>
>>>>>>> - We have some system memory only object. After doing 
>>>>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call 
>>>>>>> ttm_bo_validate() at some later point, but here we don't need to 
>>>>>>> copy anything, but it also looks like ttm_bo_handle_move_mem() 
>>>>>>> won't populate the ttm_tt for us either, since mem_type == 
>>>>>>> TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of 
>>>>>>> this, but now we just call ttm_bo_move_null().
>>>>>>>
>>>>>>> - We have a local-memory only object, which was evicted to shmem, 
>>>>>>> and then swapped out by the shrinker like above. The bo->resource 
>>>>>>> is NULL. However this time when calling ttm_bo_validate() we need 
>>>>>>> to actually do a copy in i915_ttm_move(), as well as re-populate 
>>>>>>> the ttm_tt. i915_ttm_move() was taking care of this, but now we 
>>>>>>> just call ttm_bo_move_null().
>>>>>>>
>>>>>>> Perhaps i915 is doing something wrong in the above two cases?
>>>>>>
>>>>>> Mhm, as far as I can see that should still work.
>>>>>>
>>>>>> See, previously you should have got a transition from SYSTEM->GTT in 
>>>>>> i915_ttm_move() to re-create your backing store. Now you get 
>>>>>> NULL->SYSTEM, which is handled by ttm_bo_move_null(), and then 
>>>>>> SYSTEM->GTT.
>>>>>
>>>>> What is GTT here in TTM world? Also I'm not seeing where there is 
>>>>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, 
>>>>> i915 is only calling ttm_bo_validate() once when acquiring the 
>>>>> pages, and we don't call it again, unless it was evicted (and 
>>>>> potentially swapped out).
>>>>
>>>> Well GTT means TTM_PL_TT.
>>>>
>>>> And calling it only once is perfectly fine, TTM will internally see 
>>>> that we need two hops to reach TTM_PL_TT and so does the 
>>>> NULL->SYSTEM transition and then SYSTEM->TT.
>>>
>>> Ah interesting, so that's what the multi-hop thing does. But AFAICT 
>>> i915 is not using either TTM_PL_TT or -EMULTIHOP.
>>
>> Mhm, it could be that we then have a problem and the i915 driver only 
>> sees NULL->TT directly. But I really don't know the i915 driver code 
>> well enough to judge that.
>>
>> Can you take a look at this and test it maybe?
> 
> I'll grab a machine and try to see what is going on here.

Well at least the issue with the firmware not loading looks to be fixed now.

So running some eviction + oom tests, it looks like it now does:

/* eviction kicks in */
i915_ttm_move(bo):  LMEM -> PL_SYSTEM

/* shrinker/oom kicks in at some point */
i915_ttm_shrink(bo):
     bo->resource = NULL, /* pipeline_gutting */
     shmem ttm_tt is unpopulated and pages are correctly swapped out

/* user touches the same object later */
i915_ttm_move(bo):  NULL -> LMEM, bo_move_null()

So it seems to incorrectly skip swapping the pages back in and then 
copying them over to lmem. It just allocates directly in lmem.

And previously the last two steps would have been:

i915_ttm_shrink(bo):
     bo->resource = PL_SYSTEM, /* pipeline_gutting */
     shmem ttm_tt is unpopulated and pages are correctly swapped out

i915_ttm_move(bo):
     PL_SYSTEM -> LMEM,
     ttm_tt is repopulated and pages are copied over to lmem

> 
>>
>>>
>>> Also what is the difference between TTM_PL_TT and TTM_PL_SYSTEM? When 
>>> should you use one over the other?
>>
>> TTM_PL_SYSTEM means the device is not accessing the buffer and TTM has 
>> control over the backing store and can swap it out/in as it wants.
>>
>> TTM_PL_TT means that the device is accessing the data (TT stands for 
>> translation table) and so TTM can't swap the backing store in/out.
>>
>> TTM_PL_VRAM well that one is obvious.
> 
> Thanks for the explanation. So it looks like i915 is using TTM_PL_SYSTEM 
> even for device access.
> 
>>
>> Thanks,
>> Christian.
>>
>>>
>>>>
>>>> As far as I can see that should work like it did before.
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>>>
>>>>>> If you just validated to SYSTEM memory before I think the tt 
>>>>>> object wouldn't have been populated either.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I've been considering replacing the ttm_bo_type with a 
>>>>>>>>>> bunch of behavior flags for a bo. I'm hoping that this will 
>>>>>>>>>> clean things up a bit.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>       caching = i915_ttm_select_tt_caching(obj);
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
>>>>>>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>>>>>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>>>>>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct 
>>>>>>>>>>>>> ttm_buffer_object *bo, bool evict,
>>>>>>>>>>>>>       bool clear;
>>>>>>>>>>>>>       int ret;
>>>>>>>>>>>>> -    if (GEM_WARN_ON(!obj)) {
>>>>>>>>>>>>> +    if (GEM_WARN_ON(!obj) || !bo->resource) {
>>>>>>>>>>>>>           ttm_bo_move_null(bo, dst_mem);
>>>>>>>>>>>>>           return 0;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

Thread overview: 24+ messages
2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov
2022-08-24 14:23 ` [Intel-gfx] [PATCH 2/3] drm/ttm: stop allocating dummy resources during BO creation Luben Tuikov
2022-08-24 14:23 ` [Intel-gfx] [PATCH 3/3] drm/ttm: stop allocating a dummy resource for pipelined gutting Luben Tuikov
2022-08-24 16:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: audit bo->resource usage Patchwork
2022-08-24 16:44 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-08-30  7:33 ` [Intel-gfx] [PATCH 1/3] " Christian König
2022-08-30 10:45   ` Matthew Auld
2022-08-31  8:16     ` Christian König
2022-08-31  9:26       ` Matthew Auld
2022-08-31  9:38         ` Christian König
2022-08-31 10:37           ` Matthew Auld
2022-08-31 11:03             ` Christian König
2022-08-31 12:06               ` Matthew Auld
2022-08-31 12:35                 ` Christian König
2022-08-31 12:50                   ` Matthew Auld
2022-08-31 13:34                     ` Christian König
2022-08-31 14:53                       ` Matthew Auld
2022-08-31 16:32                         ` Matthew Auld [this message]
2022-09-01  8:00                           ` Christian König
2022-09-01 12:52                             ` Matthew Auld
2022-09-01 17:48                       ` Thomas Hellström
2022-09-01  8:43 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/3] drm/i915: audit bo->resource usage (rev2) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2022-07-12 11:46 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Christian König
2022-07-13 10:08 ` Matthew Auld
