From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: robdclark@chromium.org, sarah.walker@imgtec.com,
	ketil.johnsen@arm.com, Liviu.Dudau@arm.com, mcanal@igalia.com,
	frank.binns@imgtec.com, dri-devel@lists.freedesktop.org,
	christian.koenig@amd.com, luben.tuikov@amd.com,
	donald.robson@imgtec.com, daniel@ffwll.ch, lina@asahilina.net,
	airlied@gmail.com, intel-xe@lists.freedesktop.org,
	faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Tue, 12 Sep 2023 09:29:53 +0200
Message-ID: <20230912092953.36a7cdf1@collabora.com>
In-Reply-To: <20230912021615.2086698-3-matthew.brost@intel.com>

On Mon, 11 Sep 2023 19:16:04 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
>   *
>   * @sched: scheduler instance
>   * @ops: backend operations for this scheduler
> + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
>   * @hw_submission: number of hw submissions that can be in flight
>   * @hang_limit: number of times to allow a job to hang before dropping it
>   * @timeout: timeout value in jiffies for the scheduler
> @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
>   */
>  int drm_sched_init(struct drm_gpu_scheduler *sched,
>  		   const struct drm_sched_backend_ops *ops,
> +		   struct workqueue_struct *submit_wq,
>  		   unsigned hw_submission, unsigned hang_limit,
>  		   long timeout, struct workqueue_struct *timeout_wq,
>  		   atomic_t *score, const char *name, struct device *dev)
>  {
> -	int i, ret;
> +	int i;
>  	sched->ops = ops;
>  	sched->hw_submission_limit = hw_submission;
>  	sched->name = name;
> +	sched->submit_wq = submit_wq ? : system_wq;

My understanding is that the new design splits drm_sched_main() into
work items that can be scheduled independently, so drivers can insert
their own steps/work items without requiring changes to drm_sched.
This approach relies on the properties of ordered workqueues (one work
item executed at a time, in FIFO order) to guarantee that these steps
still execute in order and one at a time.

Given what you're trying to achieve, I think we should create an
ordered workqueue instead of falling back to system_wq when submit_wq
is NULL. Otherwise you lose the ordering/serialization guarantee that
both the dedicated kthread and an ordered wq provide. It will probably
work for most drivers, but it might lead to subtle, hard-to-spot
ordering issues.
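
To make the suggestion concrete, here is a rough, untested sketch of the
fallback. It is written so it compiles in userspace: struct
workqueue_struct, alloc_ordered_workqueue() and destroy_workqueue() below
are stand-ins for the kernel versions (the real alloc_ordered_workqueue()
is a macro taking a printf-style format, flags and arguments). The
own_submit_wq field and the sched_init_submit_wq()/sched_fini_submit_wq()
helpers are hypothetical names used only for illustration; they are not
part of this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct workqueue_struct. */
struct workqueue_struct {
	bool ordered;	/* true: one work item at a time, FIFO order */
};

/* Stand-in for the kernel's alloc_ordered_workqueue(fmt, flags, ...). */
static struct workqueue_struct *alloc_ordered_workqueue(const char *name,
							unsigned int flags)
{
	struct workqueue_struct *wq = calloc(1, sizeof(*wq));

	(void)name;
	(void)flags;
	if (wq)
		wq->ordered = true;
	return wq;
}

/* Stand-in for the kernel's destroy_workqueue(). */
static void destroy_workqueue(struct workqueue_struct *wq)
{
	free(wq);
}

/* Reduced scheduler struct: only the fields this sketch touches. */
struct drm_gpu_scheduler {
	const char *name;
	struct workqueue_struct *submit_wq;
	bool own_submit_wq;	/* hypothetical: set when we allocated submit_wq */
};

/*
 * Use the driver-provided workqueue if there is one; otherwise allocate
 * a private ordered workqueue instead of falling back to system_wq.
 */
static int sched_init_submit_wq(struct drm_gpu_scheduler *sched,
				struct workqueue_struct *submit_wq,
				const char *name)
{
	sched->name = name;

	if (submit_wq) {
		sched->submit_wq = submit_wq;
		sched->own_submit_wq = false;
		return 0;
	}

	sched->submit_wq = alloc_ordered_workqueue(name, 0);
	if (!sched->submit_wq)
		return -ENOMEM;

	sched->own_submit_wq = true;
	return 0;
}

/* Only destroy the workqueue if the scheduler allocated it itself. */
static void sched_fini_submit_wq(struct drm_gpu_scheduler *sched)
{
	if (sched->own_submit_wq)
		destroy_workqueue(sched->submit_wq);
	sched->submit_wq = NULL;
	sched->own_submit_wq = false;
}
```

The ownership flag is there because drm_sched_init() would then be
allocating a resource of its own: the teardown path has to release it,
but must leave a driver-provided workqueue alone. It also means
drm_sched_init() keeps an error path for the allocation failure.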

>  	sched->timeout = timeout;
>  	sched->timeout_wq = timeout_wq ? : system_wq;
>  	sched->hang_limit = hang_limit;
> @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>  	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>  		drm_sched_rq_init(sched, &sched->sched_rq[i]);
>  
> -	init_waitqueue_head(&sched->wake_up_worker);
>  	init_waitqueue_head(&sched->job_scheduled);
>  	INIT_LIST_HEAD(&sched->pending_list);
>  	spin_lock_init(&sched->job_list_lock);
>  	atomic_set(&sched->hw_rq_count, 0);
>  	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> +	INIT_WORK(&sched->work_submit, drm_sched_main);
>  	atomic_set(&sched->_score, 0);
>  	atomic64_set(&sched->job_id_count, 0);
> -
> -	/* Each scheduler will run on a seperate kernel thread */
> -	sched->thread = kthread_run(drm_sched_main, sched, sched->name);
> -	if (IS_ERR(sched->thread)) {
> -		ret = PTR_ERR(sched->thread);
> -		sched->thread = NULL;
> -		DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
> -		return ret;
> -	}
> +	sched->pause_submit = false;
>  
>  	sched->ready = true;
>  	return 0;
