From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org,
Matthew Auld <matthew.auld@intel.com>,
intel-gfx@lists.freedesktop.org,
Francois Dugast <francois.dugast@intel.com>
Subject: Re: [PATCH] drm/xe/xe_ggtt: No need to use xe_pm_runtime_get_noresume
Date: Mon, 29 Apr 2024 14:19:06 -0700
Message-ID: <85plu7yiid.wl-ashutosh.dixit@intel.com>
In-Reply-To: <Zi_l25fsIrvUO12U@intel.com>
On Mon, 29 Apr 2024 11:24:27 -0700, Rodrigo Vivi wrote:
>
> On Mon, Apr 29, 2024 at 09:29:15AM -0700, Ashutosh Dixit wrote:
> > Switching from xe_device_mem_access_get/put to xe_pm_runtime_get/put
> > results in the following WARNING in xe_oa:
> >
> > [11614.356168] xe 0000:00:02.0: Missing outer runtime PM protection
> > [11614.356187] WARNING: CPU: 1 PID: 13075 at drivers/gpu/drm/xe/xe_pm.c:549 xe_pm_runtime_get_noresume+0x60/0x80 [xe]
> > ...
> > [11614.356377] Call Trace:
> > [11614.356379] <TASK>
> > [11614.356381] ? __warn+0x7e/0x180
> > [11614.356387] ? xe_pm_runtime_get_noresume+0x60/0x80 [xe]
> > [11614.356507] xe_ggtt_remove_node+0x22/0x80 [xe]
> > [11614.356546] xe_ttm_bo_destroy+0xea/0xf0 [xe]
> > [11614.356579] xe_oa_stream_destroy+0xf7/0x120 [xe]
> > [11614.356627] xe_oa_release+0x35/0xc0 [xe]
> > [11614.356673] __fput+0xa1/0x2d0
> > [11614.356679] __x64_sys_close+0x37/0x80
> > [11614.356697] do_syscall_64+0x6d/0x140
> > [11614.356700] entry_SYSCALL_64_after_hwframe+0x71/0x79
> > [11614.356702] RIP: 0033:0x7f2b37314f67
> >
> > There seems to be no reason to use xe_pm_runtime_get_noresume in xe_ggtt
> > functions. Just use xe_pm_runtime_get.
> >
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_ggtt.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index 0d541f55b4fc..8548a2eb3b32 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -404,7 +404,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > if (err)
> > return err;
> >
> > - xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile));
> > + xe_pm_runtime_get(tile_to_xe(ggtt->tile));
> > mutex_lock(&ggtt->lock);
> > err = drm_mm_insert_node_in_range(&ggtt->mm, &bo->ggtt_node, bo->size,
> > alignment, 0, start, end, 0);
> > @@ -433,7 +433,7 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > bool invalidate)
> > {
> > - xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile));
> > + xe_pm_runtime_get(tile_to_xe(ggtt->tile));
>
> We cannot do this, because this function gets called from locked contexts.
> This is a deadlock risk.
> We need an outer caller of xe_pm_runtime_get that wakes the device first;
> then we continue with the _noresume variant here, which only takes an
> extra reference.
>
> These warnings are indeed popping up in multiple places, and this is a good
> thing since we killed the mem_access... at least now we know and have a
> backtrace of the places that are putting our device at risk of deadlock,
> and can use this information to find the right outer protection points.
>
> https://gitlab.freedesktop.org/drm/xe/kernel/issues/1705
OK Rodrigo, thanks for the explanation. I wasn't sure, so I thought I'd
send the patch. Anyway, I'll add an outer call to xe_pm_runtime_get.
Thanks.
>
> >
> > mutex_lock(&ggtt->lock);
> > xe_ggtt_clear(ggtt, node->start, node->size);
> > --
> > 2.41.0
> >
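The pattern discussed in this thread (an outer caller resumes the device, inner helpers only take an extra reference) can be illustrated with a small userspace model. This is a minimal sketch under stated assumptions, not the xe driver's implementation: `struct toy_dev` and every `toy_*` function are hypothetical names invented for illustration, and suspend is modelled as immediate when the count reaches zero. Only the warning text is taken from the log quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a device's runtime PM state (hypothetical, not an xe struct). */
struct toy_dev {
	int usage;     /* runtime PM reference count */
	bool active;   /* true while the device is resumed */
	int warnings;  /* how often the noresume variant found no outer reference */
};

/* Models a resuming get: takes a reference and wakes the device if needed.
 * In a real driver this may sleep, so it belongs outside locked sections. */
static void toy_pm_get(struct toy_dev *d)
{
	d->usage++;
	if (!d->active)
		d->active = true; /* stands in for the resume path */
}

/* Models the _noresume variant: only takes an extra reference, and warns
 * when no outer caller is already keeping the device awake. */
static void toy_pm_get_noresume(struct toy_dev *d)
{
	if (!d->active || d->usage == 0) {
		d->warnings++;
		fprintf(stderr, "Missing outer runtime PM protection\n");
	}
	d->usage++;
}

/* Drops a reference; the model suspends as soon as the count hits zero. */
static void toy_pm_put(struct toy_dev *d)
{
	assert(d->usage > 0);
	if (--d->usage == 0)
		d->active = false;
}

/* Inner helper, similar in shape to a node-removal function that can be
 * reached with locks held: it must not try to resume the device itself. */
static void toy_remove_node(struct toy_dev *d)
{
	toy_pm_get_noresume(d);
	/* ... take lock, clear entries, release lock ... */
	toy_pm_put(d);
}

/* Hypothetical outer entry point (for example a release path): it wakes the
 * device before any lock is taken, then calls the inner helper. */
static void toy_release(struct toy_dev *d)
{
	toy_pm_get(d);
	toy_remove_node(d);
	toy_pm_put(d);
}
```

In this model, calling `toy_remove_node()` on an idle device triggers the warning, while going through `toy_release()` does not, because the outer reference is already held when the inner helper runs.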
Thread overview: 9+ messages
2024-04-29 16:29 [PATCH] drm/xe/xe_ggtt: No need to use xe_pm_runtime_get_noresume Ashutosh Dixit
2024-04-29 16:34 ` ✓ CI.Patch_applied: success for " Patchwork
2024-04-29 16:34 ` ✓ CI.checkpatch: " Patchwork
2024-04-29 16:35 ` ✓ CI.KUnit: " Patchwork
2024-04-29 17:17 ` ✗ Fi.CI.SPARSE: warning " Patchwork
2024-04-29 17:23 ` ✓ Fi.CI.BAT: success " Patchwork
2024-04-29 18:24 ` [PATCH] " Rodrigo Vivi
2024-04-29 21:19 ` Dixit, Ashutosh [this message]
2024-04-29 22:16 ` ✓ Fi.CI.IGT: success for " Patchwork