From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org,
"Paulo Zanoni" <paulo.r.zanoni@intel.com>,
"Francois Dugast" <francois.dugast@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH] drm/xe: Fix missing runtime outer protection for ggtt_remove_node
Date: Fri, 31 May 2024 12:31:34 -0400
Message-ID: <Zln7ZnxjwWjbkABK@intel.com>
In-Reply-To: <Zln3o08O5Oz+SwQ3@DUT025-TGLU.fm.intel.com>
On Fri, May 31, 2024 at 04:15:31PM +0000, Matthew Brost wrote:
> On Fri, May 31, 2024 at 12:02:05PM -0400, Rodrigo Vivi wrote:
> > Defer the ggtt node removal to a thread if runtime_pm is not active.
> >
> > The ggtt node removal can be called from multiple places, including
> > places where we cannot protect with outer callers and places we are
> > within other locks. So, try to grab the runtime reference if the
> > device is already active, otherwise defer the removal to a separate
> > thread from where we are sure we can wake the device up.
> >
> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > Cc: Francois Dugast <francois.dugast@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_ggtt.c | 56 ++++++++++++++++++++++++++++++++----
> > 1 file changed, 51 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index b01a670fecb8..d63bf1a744b5 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -443,16 +443,14 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
> > }
> >
> > -void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > - bool invalidate)
> > +static void ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > + bool invalidate)
> > {
> > struct xe_device *xe = tile_to_xe(ggtt->tile);
> > bool bound;
> > int idx;
> >
> > bound = drm_dev_enter(&xe->drm, &idx);
> > - if (bound)
> > - xe_pm_runtime_get_noresume(xe);
> >
> > mutex_lock(&ggtt->lock);
> > if (bound)
> > @@ -467,10 +465,58 @@ void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > if (invalidate)
> > xe_ggtt_invalidate(ggtt);
> >
> > - xe_pm_runtime_put(xe);
> > drm_dev_exit(idx);
> > }
> >
> > +struct remove_node_work {
> > + struct work_struct work;
> > + struct xe_ggtt *ggtt;
> > + struct drm_mm_node *node;
> > + bool invalidate;
> > +};
> > +
> > +static void ggtt_remove_node_work_func(struct work_struct *work)
> > +{
> > + struct remove_node_work *remove_node = container_of(work, struct remove_node_work, work);
> > + struct xe_device *xe = tile_to_xe(remove_node->ggtt->tile);
> > +
> > + xe_pm_runtime_get(xe);
> > + ggtt_remove_node(remove_node->ggtt, remove_node->node, remove_node->invalidate);
> > + xe_pm_runtime_put(xe);
> > +
> > + kfree(remove_node);
> > +}
> > +
> > +static void ggtt_queue_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > + bool invalidate)
> > +{
> > + struct remove_node_work *remove_node;
> > +
> > + remove_node = kmalloc(sizeof(*remove_node), GFP_KERNEL);
>
> Are we sure this code cannot be in an atomic context or in the path of a
> dma-fence? If either of the former is true, then we cannot allocate
> memory here.
Not sure, to be honest.
> Alternatively, we could use GFP_ATOMIC or preallocate
> 'remove_node_work' as part of the initial GGTT node allocation. The
> latter requires a bit more memory, but GGTT allocations are heavyweight
> objects, and using a bit more memory seems fine to me.
I had thought about simply going with GFP_ATOMIC.
Pre-allocation doesn't work unless we encapsulate the drm_mm_node
in an xe_mm_node that carries the removal info.
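
To make the encapsulation idea concrete, here is a minimal userspace
sketch, not driver code. The struct layout, the field names
(invalidate, removal_queued) and the helper names are assumptions for
illustration only; drm_mm_node is reduced to a two-field stand-in and
container_of() is re-defined locally so the example builds outside the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in for struct drm_mm_node; only two fields are modelled. */
struct drm_mm_node {
	uint64_t start;
	uint64_t size;
};

/* Hypothetical wrapper: the node plus pre-allocated removal state. */
struct xe_mm_node {
	struct drm_mm_node base;
	bool invalidate;     /* removal argument, stored at queue time */
	bool removal_queued; /* set once a deferred removal is requested */
};

/* Recover the wrapper from the embedded node pointer that callers hold. */
static struct xe_mm_node *to_xe_mm_node(struct drm_mm_node *node)
{
	assert(node != NULL);
	return container_of(node, struct xe_mm_node, base);
}

/* Record a deferred removal without allocating any memory. */
static void xe_mm_node_queue_removal(struct drm_mm_node *node, bool invalidate)
{
	struct xe_mm_node *xe_node = to_xe_mm_node(node);

	xe_node->invalidate = invalidate;
	xe_node->removal_queued = true;
}
```

The point of the sketch is only that the removal state lives in the
same allocation as the node, so nothing has to be allocated on the
removal path.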
> Also if we do the
> latter, maybe just add the node to a list and kick a dedicated work item
> which processes all nodes on the list.
The list with a single worker also sounds like an elegant solution here,
since it processes all the removals in the same way. But for simplicity,
if GFP_ATOMIC works I would prefer to go with that, since it keeps the
worker minimal and gives a 1:1 mapping between work items and nodes.
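
For comparison, the list-plus-single-worker variant could look roughly
like the userspace model below. All names here (pending_removal,
removal_queue and the two helpers) are hypothetical, the work_queued
flag merely stands in for queueing one dedicated work item, and locking
is left out entirely; a real implementation would have to protect the
list. Batching the invalidation into one call per drain is also an
assumption of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node removal record, embedded in the node wrapper. */
struct pending_removal {
	struct pending_removal *next;
	bool invalidate;
	bool removed;
};

/* Hypothetical per-GGTT state: pending list plus a "work queued" flag. */
struct removal_queue {
	struct pending_removal *head;
	bool work_queued;
	int invalidations; /* counts batched invalidations, for illustration */
};

/* Caller side: link the record in and kick the single worker. */
static void removal_queue_add(struct removal_queue *q,
			      struct pending_removal *r, bool invalidate)
{
	assert(q != NULL && r != NULL);
	r->invalidate = invalidate;
	r->removed = false;
	r->next = q->head;
	q->head = r;
	q->work_queued = true; /* models queueing the one work item */
}

/* Worker side: detach the whole list, then process every entry. */
static int removal_queue_drain(struct removal_queue *q)
{
	struct pending_removal *r = q->head;
	bool need_invalidate = false;
	int count = 0;

	q->head = NULL;
	q->work_queued = false;

	while (r) {
		struct pending_removal *next = r->next;

		r->next = NULL;
		r->removed = true; /* models the actual node removal */
		need_invalidate |= r->invalidate;
		count++;
		r = next;
	}

	/* One invalidation covers every node removed in this pass. */
	if (need_invalidate)
		q->invalidations++;

	return count;
}
```

Compared with one allocated work item per node, this needs no
allocation at removal time, at the cost of the extra list handling.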
>
> > + if (!remove_node)
> > + return;
> > +
> > + INIT_WORK(&remove_node->work, ggtt_remove_node_work_func);
> > + remove_node->ggtt = ggtt;
> > + remove_node->node = node;
> > + remove_node->invalidate = invalidate;
> > +
> > + queue_work(system_unbound_wq, &remove_node->work);
>
> I think we need to be careful with system wq usage. Recently we have had
> two bugs [1][2] exposed in 6.9 in which we deadlocked by using system
> wqs. I think it is likely safer to use a driver dedicated queue here.
Ouch! It's probably good to create a dedicated wq for xe_ggtt so we don't
interfere with anything else.
>
> Other than these questions, design of patch (try grabbing a PM, if we
> can't defer to worker) LGTM.
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/133210/
> [2] https://patchwork.freedesktop.org/patch/586095/?series=131904&rev=1
>
> > +}
> > +
> > +void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > + bool invalidate)
> > +{
> > + struct xe_device *xe = tile_to_xe(ggtt->tile);
> > +
> > + if (xe_pm_runtime_get_if_active(xe)) {
> > + ggtt_remove_node(ggtt, node, invalidate);
> > + xe_pm_runtime_put(xe);
> > + } else {
> > + ggtt_queue_remove_node(ggtt, node, invalidate);
> > + }
> > +}
> > +
> > void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > {
> > if (XE_WARN_ON(!bo->ggtt_node.size))
> > --
> > 2.45.1
> >