From: Matthew Brost <matthew.brost@intel.com>
To: Francois Dugast <francois.dugast@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v7 11/13] drm/xe/hw_engine_group: Resume exec queues suspended by dma fence jobs
Date: Thu, 8 Aug 2024 03:48:13 +0000
Message-ID: <ZrQ//cHkUKTpolmJ@DUT025-TGLU.fm.intel.com>
In-Reply-To: <20240807162416.1307061-12-francois.dugast@intel.com>
On Wed, Aug 07, 2024 at 06:23:40PM +0200, Francois Dugast wrote:
> Submission of a dma fence job leads to suspending the faulting long
> running exec queues of the hw engine group. Work is queued in the resume
> worker for this group and execution is resumed on the attached exec queues
> in faulting long running mode.
>
> This is another entry point for execution on the hw engine group so the
> execution mode is updated.
>
> v2: Kick the resume worker from exec IOCTL, switch to unordered workqueue,
> destroy it after use (Matt Brost)
>
> v3: Do not resume if no exec queue was suspended (Matt Brost)
>

The same comment as in [1] applies here; the patch itself LGTM though.

Matt

[1] https://patchwork.freedesktop.org/patch/607432/?series=136192&rev=7#comment_1104033

> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec.c | 3 ++
> drivers/gpu/drm/xe/xe_hw_engine_group.c | 49 ++++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_hw_engine_group.h | 1 +
> 3 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 2169fbf766d3..484acfbe0e61 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -324,6 +324,9 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		spin_unlock(&xe->ttm.lru_lock);
>  	}
>
> +	if (mode == EXEC_MODE_LR)
> +		xe_hw_engine_group_resume_faulting_lr_jobs(group);
> +
>  err_repin:
>  	if (!xe_vm_in_lr_mode(vm))
>  		up_read(&vm->userptr.notifier_lock);
> diff --git a/drivers/gpu/drm/xe/xe_hw_engine_group.c b/drivers/gpu/drm/xe/xe_hw_engine_group.c
> index 4781d6d606aa..170355e984ea 100644
> --- a/drivers/gpu/drm/xe/xe_hw_engine_group.c
> +++ b/drivers/gpu/drm/xe/xe_hw_engine_group.c
> @@ -17,9 +17,36 @@ hw_engine_group_free(struct drm_device *drm, void *arg)
>  {
>  	struct xe_hw_engine_group *group = arg;
>
> +	destroy_workqueue(group->resume_wq);
>  	kfree(group);
>  }
>
> +static void
> +hw_engine_group_resume_lr_jobs_func(struct work_struct *w)
> +{
> +	struct xe_exec_queue *q;
> +	struct xe_hw_engine_group *group = container_of(w, struct xe_hw_engine_group, resume_work);
> +	int err;
> +	enum xe_hw_engine_group_execution_mode previous_mode;
> +
> +	err = xe_hw_engine_group_get_mode(group, EXEC_MODE_LR, &previous_mode);
> +	if (err)
> +		return;
> +
> +	if (previous_mode == EXEC_MODE_LR)
> +		goto put;
> +
> +	list_for_each_entry(q, &group->exec_queue_list, hw_engine_group_link) {
> +		if (!xe_vm_in_fault_mode(q->vm))
> +			continue;
> +
> +		q->ops->resume(q);
> +	}
> +
> +put:
> +	xe_hw_engine_group_put(group);
> +}
> +
>  static struct xe_hw_engine_group *
>  hw_engine_group_alloc(struct xe_device *xe)
>  {
> @@ -30,7 +57,12 @@ hw_engine_group_alloc(struct xe_device *xe)
>  	if (!group)
>  		return ERR_PTR(-ENOMEM);
>
> +	group->resume_wq = alloc_workqueue("xe-resume-lr-jobs-wq", 0, 0);
> +	if (!group->resume_wq)
> +		return ERR_PTR(-ENOMEM);
> +
>  	init_rwsem(&group->mode_sem);
> +	INIT_WORK(&group->resume_work, hw_engine_group_resume_lr_jobs_func);
>  	INIT_LIST_HEAD(&group->exec_queue_list);
>
>  	err = drmm_add_action_or_reset(&xe->drm, hw_engine_group_free, group);
> @@ -130,7 +162,7 @@ int xe_hw_engine_group_add_exec_queue(struct xe_hw_engine_group *group, struct x
>  	if (xe_vm_in_fault_mode(q->vm) && group->cur_mode == EXEC_MODE_DMA_FENCE) {
>  		q->ops->suspend(q);
>  		q->ops->suspend_wait(q);
> -		queue_work(group->resume_wq, &group->resume_work);
> +		xe_hw_engine_group_resume_faulting_lr_jobs(group);
>  	}
>
>  	list_add(&q->hw_engine_group_link, &group->exec_queue_list);
> @@ -156,6 +188,16 @@ void xe_hw_engine_group_del_exec_queue(struct xe_hw_engine_group *group, struct
>  	up_write(&group->mode_sem);
>  }
>
> +/**
> + * xe_hw_engine_group_resume_faulting_lr_jobs() - Asynchronously resume the hw engine group's
> + * faulting LR jobs
> + * @group: The hw engine group
> + */
> +void xe_hw_engine_group_resume_faulting_lr_jobs(struct xe_hw_engine_group *group)
> +{
> +	queue_work(group->resume_wq, &group->resume_work);
> +}
> +
>  /**
>   * xe_hw_engine_group_suspend_faulting_lr_jobs() - Suspend the faulting LR jobs of this group
>   * @group: The hw engine group
> @@ -163,6 +205,7 @@ void xe_hw_engine_group_del_exec_queue(struct xe_hw_engine_group *group, struct
>  static void xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_group *group)
>  {
>  	struct xe_exec_queue *q;
> +	bool need_resume = false;
>
>  	lockdep_assert_held_write(&group->mode_sem);
>
> @@ -170,6 +213,7 @@ static void xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_grou
>  		if (!xe_vm_in_fault_mode(q->vm))
>  			continue;
>
> +		need_resume = true;
>  		q->ops->suspend(q);
>  	}
>
> @@ -179,6 +223,9 @@ static void xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_grou
>
>  		q->ops->suspend_wait(q);
>  	}
> +
> +	if (need_resume)
> +		xe_hw_engine_group_resume_faulting_lr_jobs(group);
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_hw_engine_group.h b/drivers/gpu/drm/xe/xe_hw_engine_group.h
> index 0f196c0ad98d..797ee81acbf2 100644
> --- a/drivers/gpu/drm/xe/xe_hw_engine_group.h
> +++ b/drivers/gpu/drm/xe/xe_hw_engine_group.h
> @@ -24,5 +24,6 @@ void xe_hw_engine_group_put(struct xe_hw_engine_group *group);
>
>  enum xe_hw_engine_group_execution_mode
>  xe_hw_engine_group_find_exec_mode(struct xe_exec_queue *q);
> +void xe_hw_engine_group_resume_faulting_lr_jobs(struct xe_hw_engine_group *group);
>
>  #endif
> --
> 2.43.0
>
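
For readers following the thread, the flow described in the commit message (a dma fence job suspends the faulting LR queues, a resume is queued only if something was suspended, and the worker later switches the group back to LR mode) can be modeled in plain userspace C. This is only a sketch under assumptions: `model_group`, `model_queue` and the `model_*` functions are hypothetical names that do not exist in the driver, the `resume_pending` flag stands in for `queue_work()` on `group->resume_work`, and both the `mode_sem` locking and the wait for pending dma fence jobs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the xe driver. */
enum model_exec_mode { MODEL_MODE_LR, MODEL_MODE_DMA_FENCE };

struct model_queue {
	bool faulting;	/* stands in for xe_vm_in_fault_mode(q->vm) */
	bool suspended;
};

struct model_group {
	enum model_exec_mode cur_mode;
	struct model_queue *queues;
	size_t nr_queues;
	bool resume_pending;	/* stands in for queue_work() on resume_work */
};

/* Suspend every faulting queue; request a resume only if one was suspended. */
static void model_suspend_faulting(struct model_group *g)
{
	bool need_resume = false;
	size_t i;

	for (i = 0; i < g->nr_queues; i++) {
		if (!g->queues[i].faulting)
			continue;
		need_resume = true;
		g->queues[i].suspended = true;
	}

	if (need_resume)
		g->resume_pending = true;
}

/* A dma fence job is submitted: leave LR mode, suspending faulting queues. */
static void model_submit_dma_fence_job(struct model_group *g)
{
	assert(g != NULL);
	if (g->cur_mode == MODEL_MODE_LR)
		model_suspend_faulting(g);
	g->cur_mode = MODEL_MODE_DMA_FENCE;
}

/* Body of the resume worker; returns how many queues were resumed. */
static size_t model_run_resume_worker(struct model_group *g)
{
	enum model_exec_mode previous_mode = g->cur_mode;
	size_t i, resumed = 0;

	if (!g->resume_pending)
		return 0;
	g->resume_pending = false;

	/* The worker is another entry point, so the mode is updated here. */
	g->cur_mode = MODEL_MODE_LR;
	if (previous_mode == MODEL_MODE_LR)
		return 0;	/* already in LR mode, nothing to resume */

	for (i = 0; i < g->nr_queues; i++) {
		if (!g->queues[i].faulting || !g->queues[i].suspended)
			continue;
		g->queues[i].suspended = false;
		resumed++;
	}
	return resumed;
}
```

With two faulting queues and one non-faulting queue in a group that starts in LR mode, calling `model_submit_dma_fence_job()` suspends the two faulting queues and sets `resume_pending`; a later `model_run_resume_worker()` puts the group back in LR mode and resumes both. A group without faulting queues never sets `resume_pending`, which is the v3 behaviour.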