From: Matthew Brost <matthew.brost@intel.com>
To: Francois Dugast <francois.dugast@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v4 11/13] drm/xe/hw_engine_group: Resume exec queues suspended by dma fence jobs
Date: Mon, 5 Aug 2024 20:32:55 +0000 [thread overview]
Message-ID: <ZrE2994xDxw/Dg4t@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <20240801125748.355078-12-francois.dugast@intel.com>
On Thu, Aug 01, 2024 at 02:56:52PM +0200, Francois Dugast wrote:
> Submission of a dma fence job leads to suspending the faulting long
> running exec queues of the hw engine group. Work is queued in the resume
> worker for this group and execution is resumed on the attached exec queues
> in faulting long running mode.
>
> This is another entry point for execution on the hw engine group so the
> execution mode is updated.
>
> v2: Kick the resume worker from exec IOCTL, switch to unordered workqueue,
> destroy it after use (Matt Brost)
>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec.c | 3 ++
> drivers/gpu/drm/xe/xe_hw_engine_group.c | 46 ++++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_hw_engine_group.h | 1 +
> 3 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 2169fbf766d3..484acfbe0e61 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -324,6 +324,9 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> spin_unlock(&xe->ttm.lru_lock);
> }
>
> + if (mode == EXEC_MODE_LR)
> + xe_hw_engine_group_resume_faulting_lr_jobs(group);
> +
> err_repin:
> if (!xe_vm_in_lr_mode(vm))
> up_read(&vm->userptr.notifier_lock);
> diff --git a/drivers/gpu/drm/xe/xe_hw_engine_group.c b/drivers/gpu/drm/xe/xe_hw_engine_group.c
> index 16635f00c5f0..ca8b06df63f6 100644
> --- a/drivers/gpu/drm/xe/xe_hw_engine_group.c
> +++ b/drivers/gpu/drm/xe/xe_hw_engine_group.c
> @@ -17,9 +17,36 @@ hw_engine_group_free(struct drm_device *drm, void *arg)
> {
> struct xe_hw_engine_group *group = arg;
>
> + destroy_workqueue(group->resume_wq);
> kfree(group);
> }
>
> +static void
> +hw_engine_group_resume_lr_jobs_func(struct work_struct *w)
> +{
> + struct xe_exec_queue *q;
> + struct xe_hw_engine_group *group = container_of(w, struct xe_hw_engine_group, resume_work);
> + int err;
> + enum xe_hw_engine_group_execution_mode previous_mode;
> +
> + err = xe_hw_engine_group_get_mode(group, EXEC_MODE_LR, &previous_mode);
> + if (err)
> + return;
> +
> + if (previous_mode == EXEC_MODE_LR)
> + goto put;
> +
> + list_for_each_entry(q, &group->exec_queue_list, hw_engine_group_link) {
> + if (!xe_vm_in_fault_mode(q->vm))
> + continue;
> +
> + q->ops->resume(q);
> + }
> +
> +put:
> + xe_hw_engine_group_put(group);
> +}
> +
> static struct xe_hw_engine_group *
> hw_engine_group_alloc(struct xe_device *xe)
> {
> @@ -30,7 +57,12 @@ hw_engine_group_alloc(struct xe_device *xe)
> if (!group)
> return ERR_PTR(-ENOMEM);
>
> + group->resume_wq = alloc_workqueue("xe-resume-lr-jobs-wq", 0, 0);
> + if (!group->resume_wq)
> + return ERR_PTR(-ENOMEM);
> +
> init_rwsem(&group->mode_sem);
> + INIT_WORK(&group->resume_work, hw_engine_group_resume_lr_jobs_func);
> INIT_LIST_HEAD(&group->exec_queue_list);
>
> err = drmm_add_action_or_reset(&xe->drm, hw_engine_group_free, group);
> @@ -125,7 +157,7 @@ int xe_hw_engine_group_add_exec_queue(struct xe_hw_engine_group *group, struct x
> if (xe_vm_in_fault_mode(q->vm) && group->cur_mode == EXEC_MODE_DMA_FENCE) {
> q->ops->suspend(q);
> q->ops->suspend_wait(q);
> - queue_work(group->resume_wq, &group->resume_work);
> + xe_hw_engine_group_resume_faulting_lr_jobs(group);
> }
>
> list_add(&q->hw_engine_group_link, &group->exec_queue_list);
> @@ -151,6 +183,16 @@ void xe_hw_engine_group_del_exec_queue(struct xe_hw_engine_group *group, struct
> up_write(&group->mode_sem);
> }
>
> +/**
> + * xe_hw_engine_group_resume_faulting_lr_jobs() - Asynchronously resume the hw engine group's
> + * faulting LR jobs
> + * @group: The hw engine group
> + */
> +void xe_hw_engine_group_resume_faulting_lr_jobs(struct xe_hw_engine_group *group)
> +{
> + queue_work(group->resume_wq, &group->resume_work);
> +}
> +
> /**
> * xe_hw_engine_group_suspend_faulting_lr_jobs() - Suspend the faulting LR jobs of this group
> * @group: The hw engine group
> @@ -174,6 +216,8 @@ static void xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_grou
>
> q->ops->suspend_wait(q);
> }
> +
> + xe_hw_engine_group_resume_faulting_lr_jobs(group);
You are going to want to skip this call if none of the queues in the
group are in LR mode, as it would needlessly queue the worker, which
waits on dma-fences under a lock, serializing all dma-fence jobs in the
group. This is also the source of a CI failure [1].
Matt
[1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-136192v5/bat-bmg-1/igt@xe_vm@bind-execqueues-independent.html
> }
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_hw_engine_group.h b/drivers/gpu/drm/xe/xe_hw_engine_group.h
> index 0f196c0ad98d..797ee81acbf2 100644
> --- a/drivers/gpu/drm/xe/xe_hw_engine_group.h
> +++ b/drivers/gpu/drm/xe/xe_hw_engine_group.h
> @@ -24,5 +24,6 @@ void xe_hw_engine_group_put(struct xe_hw_engine_group *group);
>
> enum xe_hw_engine_group_execution_mode
> xe_hw_engine_group_find_exec_mode(struct xe_exec_queue *q);
> +void xe_hw_engine_group_resume_faulting_lr_jobs(struct xe_hw_engine_group *group);
>
> #endif
> --
> 2.43.0
>