From: Riana Tauro <riana.tauro@intel.com>
To: Matthew Auld <matthew.william.auld@gmail.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH 0/2] Fix deadlock issue on d3cold
Date: Mon, 4 Dec 2023 17:27:38 +0530
Message-ID: <6a0bbe2f-c990-4be2-8a14-b09e224feead@intel.com>
In-Reply-To: <CAM0jSHOk7sGu6tHOmnsYryRf7dNSJM5rk_gqV-+6wF4ZvzNxBQ@mail.gmail.com>



On 12/4/2023 4:27 PM, Matthew Auld wrote:
> Hi,
> 
> On Mon, 4 Dec 2023 at 05:18, Riana Tauro <riana.tauro@intel.com> wrote:
>>
>> Kernel BOs need to be restored to the same place in VRAM, and with
>> d3cold that means any VRAM allocation can potentially steal the
>> spot from a kernel BO, which then blows up when waking the device up.
>>
>> However, if we move xe_device_mem_access_get() much higher up in the
>> hierarchy (to the start of gem_create_ioctl), this is no longer
>> possible.
>>
>> This fixes the deadlock issue seen in
>> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/256
>> The series also enables d3cold to get CI results.
>>
>> Riana Tauro (2):
>>    RFC drm/xe: Move xe_device_mem_access_get to the top of
>>      gem_create_ioctl
>>    CI drm/xe: Enable d3cold
> 
Hi Matthew


> Tried this locally on DG2 and it triggers lockdep splats for me when
> loading the module, so it looks like a lot more is needed before
> turning on d3cold. 

The lockdep splat seen on load when d3cold is enabled has the stack
trace below.

xe_tile_init_noalloc() is called before runtime PM is initialized by
xe_pm_init(), so the runtime suspend path cannot run at that point.
This looks like a false positive.

[  150.900520]
                -> #1 (xe_device_mem_access_lockdep_map){+.+.}-{0:0}:
[  150.908078]        lock_acquire+0x169/0x3d0
[  150.912276]        xe_device_mem_access_get+0x53/0x220 [xe]
[  150.918067]        __xe_ggtt_insert_bo_at+0x12a/0x3e0 [xe]
[  150.923760]        __xe_bo_create_locked+0x2f5/0x6e0 [xe]
[  150.929353]        xe_bo_create_pin_map_at+0x42/0x270 [xe]
[  150.935033]        xe_bo_create_pin_map+0x1a/0x20 [xe]
[  150.940366]        xe_sa_bo_manager_init+0xac/0x300 [xe]
[  150.945884]        xe_tile_init_noalloc+0x74/0x110 [xe]
[  150.951316]        xe_device_probe+0x765/0xaa0 [xe]
[  150.956392]        xe_pci_probe+0x53d/0x860 [xe]
[  150.961220]        local_pci_probe+0x7d/0xe0

                -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
[  151.049443]        check_prev_add+0x1ba/0x14a0
[  151.053886]        __lock_acquire+0x203e/0x2ff0
[  151.058413]        lock_acquire+0x169/0x3d0
[  151.062596]        __ww_mutex_lock.constprop.0+0x164/0x1e50
[  151.068161]        ww_mutex_lock+0x42/0x1a0
[  151.072343]        xe_bo_lock+0x2f/0x40 [xe]
[  151.076817]        xe_bo_evict_all+0x57d/0x610 [xe]
[  151.081893]        xe_pm_runtime_suspend+0x38f/0x3b0 [xe]

This does not affect the functionality of d3cold.
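To illustrate why lockdep still reports this, here is a minimal
userspace sketch (not xe code) of order-based checking. The enum,
reset_orders(), record_order() and replay_splat() are hypothetical names
for illustration; only the two lock-class names come from the splat. It
assumes, as the #0 entry suggests, that xe_pm_runtime_suspend() holds the
mem_access lockdep map when xe_bo_evict_all() takes the BO lock, and that
__xe_bo_create_locked() holds the BO's reservation lock across
__xe_ggtt_insert_bo_at(), as its name suggests. Like lockdep, the checker
only sees acquisition order; it has no idea that the #1 ordering happened
during probe, before xe_pm_init(), so it flags the inversion anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The two lock classes named in the splat (illustrative enum, not xe code). */
enum lock_class { MEM_ACCESS_MAP, RESV_WW_MUTEX, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[MEM_ACCESS_MAP] = "xe_device_mem_access_lockdep_map",
	[RESV_WW_MUTEX]  = "reservation_ww_class_mutex",
};

/* taken_after[a][b]: class b has been acquired while class a was held. */
static bool taken_after[NR_CLASSES][NR_CLASSES];

static void reset_orders(void)
{
	memset(taken_after, 0, sizeof(taken_after));
}

/*
 * Record "acquire next while holding held". Returns false if the opposite
 * order was already seen. Only direct A->B vs B->A inversions are checked;
 * real lockdep walks the whole dependency graph.
 */
static bool record_order(enum lock_class held, enum lock_class next)
{
	assert(held < NR_CLASSES && next < NR_CLASSES);
	if (taken_after[next][held]) {
		printf("possible circular locking: %s -> %s, but %s -> %s seen before\n",
		       class_name[held], class_name[next],
		       class_name[next], class_name[held]);
		return false;
	}
	taken_after[held][next] = true;
	return true;
}

/*
 * Replay the two chains from the splat in the order they happen:
 * probe first, runtime suspend later. Returns true if an inversion is flagged.
 */
static bool replay_splat(void)
{
	reset_orders();
	/* #1, probe: BO reservation lock held (assumed) when
	 * __xe_ggtt_insert_bo_at() calls xe_device_mem_access_get(). */
	bool ok = record_order(RESV_WW_MUTEX, MEM_ACCESS_MAP);
	/* #0, runtime suspend: mem_access map held (assumed) when
	 * xe_bo_evict_all() takes the BO lock via xe_bo_lock(). */
	ok = record_order(MEM_ACCESS_MAP, RESV_WW_MUTEX) && ok;
	return !ok;
}
```

Calling replay_splat() flags the inversion even though, in the driver,
the two orderings never overlap in time if the probe-time one really
only happens before runtime PM is set up.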

> However I also had to manually set the
> d3cold.capable=true. Wondering if we have machines in CI that are
> d3cold capable, since BAT results are reporting success?

Yeah, I didn't see this lockdep splat on load on the CI DG2. It also has
display enabled, so it won't enter runtime suspend.

Thanks
Riana Tauro

> 
>>
>>   drivers/gpu/drm/xe/xe_bo.c | 26 ++++++++++++++++++++------
>>   drivers/gpu/drm/xe/xe_pm.h |  2 +-
>>   2 files changed, 21 insertions(+), 7 deletions(-)
>>
>> --
>> 2.40.0
>>


Thread overview: 13+ messages
2023-12-04  5:26 [Intel-xe] [PATCH 0/2] Fix deadlock issue on d3cold Riana Tauro
2023-12-04  5:21 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-12-04  5:21 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-12-04  5:22 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-12-04  5:26 ` [Intel-xe] [PATCH 1/2] RFC drm/xe: Move xe_device_mem_access_get to the top of gem_create_ioctl Riana Tauro
2023-12-04  5:26 ` [Intel-xe] [PATCH 2/2] CI drm/xe: Enable d3cold Riana Tauro
2023-12-04  5:30 ` [Intel-xe] ✓ CI.Build: success for Fix deadlock issue on d3cold Patchwork
2023-12-04  5:30 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-12-04  5:31 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-12-04  6:02 ` [Intel-xe] ✓ CI.BAT: " Patchwork
2023-12-04 10:57 ` [Intel-xe] [PATCH 0/2] " Matthew Auld
2023-12-04 11:19   ` Thomas Hellström
2023-12-04 11:57   ` Riana Tauro [this message]
