From: Imre Deak <imre.deak@intel.com>
To: "Kandpal, Suraj" <suraj.kandpal@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Shankar, Uma" <uma.shankar@intel.com>
Subject: Re: [PATCH] drm/xe/pm: Move xe_rpm_lockmap_acquire
Date: Wed, 11 Sep 2024 15:05:54 +0300
Message-ID: <ZuGHotS095lXkdBF@ideak-desk.fi.intel.com>
In-Reply-To: <SN7PR11MB6750912B7086DD851CB8CC07E39B2@SN7PR11MB6750.namprd11.prod.outlook.com>
On Wed, Sep 11, 2024 at 03:01:17PM +0300, Kandpal, Suraj wrote:
>
>
> > -----Original Message-----
> > From: Deak, Imre <imre.deak@intel.com>
> > Sent: Wednesday, September 11, 2024 5:05 PM
> > To: Kandpal, Suraj <suraj.kandpal@intel.com>
> > Cc: intel-xe@lists.freedesktop.org; Shankar, Uma <uma.shankar@intel.com>
> > Subject: Re: [PATCH] drm/xe/pm: Move xe_rpm_lockmap_acquire
> >
> > On Wed, Sep 11, 2024 at 03:00:25PM +0530, Suraj Kandpal wrote:
> > > Move xe_rpm_lockmap_acquire after display_pm_suspend and resume
> > > functions to avoid circular locking dependency because of locks being
> > > taken in intel_fbdev, intel_dp_mst_mgr suspend and resume functions.
> > >
> > > Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
> >
> > The actual problem is that MST is being suspended during runtime suspend. This
> > is not only unnecessary (it just adds overhead) but also incorrect, as it
> > involves AUX transfers, which themselves depend on the device being runtime
> > resumed. This is what lockdep is also trying to say.
> >
> > So the solution would be not to suspend/resume MST during runtime
> > suspend/resume.
>
> I think the same would apply to intel_fbdev_set_suspend: it should
> not be called during runtime suspend/resume either, which is where
> we see
Yes, that should be addressed already by Rodrigo's
[PATCH 4/4] drm/xe/display: Reduce and streamline d3cold display sequence
patch.
> <4> [213.826919]
> -> #3 (xe_rpm_d3cold_map){+.+.}-{0:0}:
> <4> [213.826924] xe_rpm_lockmap_acquire+0x5f/0x70 [xe]
> <4> [213.827102] xe_pm_runtime_get+0x59/0x110 [xe]
> <4> [213.827270] xe_gem_fault+0x85/0x280 [xe]
> <4> [213.827384] __do_fault+0x36/0x140
> <4> [213.827391] do_pte_missing+0x68/0xe10
> <4> [213.827401] __handle_mm_fault+0x7a6/0xe60
> <4> [213.827406] handle_mm_fault+0x12e/0x2a0
> <4> [213.827411] do_user_addr_fault+0x366/0x970
> <4> [213.827418] exc_page_fault+0x87/0x2b0
> <4> [213.827423] asm_exc_page_fault+0x27/0x30
> <4> [213.827428]
> -> #2 (&mm->mmap_lock){++++}-{3:3}:
> <4> [213.827432] __might_fault+0x63/0x90
> <4> [213.827435] _copy_to_user+0x23/0x70
> <4> [213.827441] tty_ioctl+0x846/0x9a0
> <4> [213.827447] __x64_sys_ioctl+0x95/0xd0
> <4> [213.827453] x64_sys_call+0x1205/0x20d0
> <4> [213.827459] do_syscall_64+0x85/0x140
> <4> [213.827464] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [213.827468]
> -> #1 (&tty->winsize_mutex){+.+.}-{3:3}:
> <4> [213.827471] __mutex_lock+0x9a/0xde0
> <4> [213.827476] mutex_lock_nested+0x1b/0x30
> <4> [213.827480] tty_do_resize+0x27/0x90
> <4> [213.827482] vc_do_resize+0x3ee/0x550
> <4> [213.827488] __vc_resize+0x23/0x30
> <4> [213.827493] fbcon_do_set_font+0x140/0x2f0
> <4> [213.827498] fbcon_set_font+0x30a/0x530
> <4> [213.827500] con_font_op+0x284/0x410
> <4> [213.827503] vt_ioctl+0x3dd/0x1580
> <4> [213.827507] tty_ioctl+0x39e/0x9a0
> <4> [213.827510] __x64_sys_ioctl+0x95/0xd0
> <4> [213.827515] x64_sys_call+0x1205/0x20d0
> <4> [213.827518] do_syscall_64+0x85/0x140
> <4> [213.827523] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [213.827526]
> -> #0 (console_lock){+.+.}-{0:0}:
> <4> [213.827530] __lock_acquire+0x126b/0x26f0
> <4> [213.827536] lock_acquire+0xc7/0x2e0
> <4> [213.827539] console_lock+0x54/0xa0
> <4> [213.827545] intel_fbdev_set_suspend+0x169/0x1f0 [xe]
> <4> [213.827729] xe_display_pm_suspend+0x6a/0x260 [xe]
> <4> [213.827945] xe_display_pm_runtime_suspend+0x4b/0x70 [xe]
> <4> [213.828158] xe_pm_runtime_suspend+0xbc/0x3c0 [xe]
> <4> [213.828327] xe_pci_runtime_suspend+0x1f/0xc0 [xe]
> <4> [213.828491] pci_pm_runtime_suspend+0x6a/0x1e0
> <4> [213.828497] __rpm_callback+0x48/0x120
> <4> [213.828505] rpm_callback+0x60/0x70
> <4> [213.828509] rpm_suspend+0x124/0x650
> <4> [213.828515] rpm_idle+0x237/0x3d0
> <4> [213.828520] pm_runtime_work+0x9f/0xd0
> <4> [213.828523] process_scheduled_works+0x39f/0x730
> <4> [213.828527] worker_thread+0x14f/0x2c0
> <4> [213.828529] kthread+0xf5/0x130
> <4> [213.828533] ret_from_fork+0x39/0x60
> <4> [213.828538] ret_from_fork_asm+0x1a/0x30
> <4> [213.828542]
> other info that might help us debug this:
> <4> [213.828543] Chain exists of:
> console_lock --> &mm->mmap_lock --> xe_rpm_d3cold_map
> <4> [213.828548] Possible unsafe locking scenario:
> <4> [213.828549] CPU0 CPU1
> <4> [213.828550] ---- ----
> <4> [213.828551] lock(xe_rpm_d3cold_map);
> <4> [213.828553] lock(&mm->mmap_lock);
> <4> [213.828555] lock(xe_rpm_d3cold_map);
> <4> [213.828557] lock(console_lock);
> <4> [213.828559]
> *** DEADLOCK ***
>
> Regards,
> Suraj Kandpal
> >
> > > ---
> > > drivers/gpu/drm/xe/xe_pm.c | 28 ++++++++++++++--------------
> > > 1 file changed, 14 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > index a3d1509066f7..7f33e553728a 100644
> > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > @@ -363,6 +363,18 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > > /* Disable access_ongoing asserts and prevent recursive pm calls */
> > > xe_pm_write_callback_task(xe, current);
> > >
> > > + /*
> > > + * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
> > > + * also checks and delets bo entry from user fault list.
> > > + */
> > > + mutex_lock(&xe->mem_access.vram_userfault.lock);
> > > + list_for_each_entry_safe(bo, on,
> > > + &xe->mem_access.vram_userfault.list, vram_userfault_link)
> > > + xe_bo_runtime_pm_release_mmap_offset(bo);
> > > + mutex_unlock(&xe->mem_access.vram_userfault.lock);
> > > +
> > > + xe_display_pm_runtime_suspend(xe);
> > > +
> > > /*
> > > * The actual xe_pm_runtime_put() is always async underneath, so
> > > * exactly where that is called should makes no difference to us. However
> > > @@ -386,18 +398,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > > */
> > > xe_rpm_lockmap_acquire(xe);
> > >
> > > - /*
> > > - * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
> > > - * also checks and delets bo entry from user fault list.
> > > - */
> > > - mutex_lock(&xe->mem_access.vram_userfault.lock);
> > > - list_for_each_entry_safe(bo, on,
> > > - &xe->mem_access.vram_userfault.list, vram_userfault_link)
> > > - xe_bo_runtime_pm_release_mmap_offset(bo);
> > > - mutex_unlock(&xe->mem_access.vram_userfault.lock);
> > > -
> > > - xe_display_pm_runtime_suspend(xe);
> > > -
> > > if (xe->d3cold.allowed) {
> > > err = xe_bo_evict_all(xe);
> > > if (err)
> > > @@ -438,8 +438,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > > /* Disable access_ongoing asserts and prevent recursive pm calls */
> > > xe_pm_write_callback_task(xe, current);
> > >
> > > - xe_rpm_lockmap_acquire(xe);
> > > -
> > > if (xe->d3cold.allowed) {
> > > err = xe_pcode_ready(xe, true);
> > > if (err)
> > > @@ -463,6 +461,8 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > >
> > > xe_display_pm_runtime_resume(xe);
> > >
> > > + xe_rpm_lockmap_acquire(xe);
> > > +
> > > if (xe->d3cold.allowed) {
> > > err = xe_bo_restore_user(xe);
> > > if (err)
> > > --
> > > 2.43.2
> > >