Re: [Intel-xe] [RFC PATCH 01/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: robdclark@chromium.org, Sarah Walker <sarah.walker@imgtec.com>,
	airlied@linux.ie, lina@asahilina.net,
	Frank Binns <frank.binns@imgtec.com>,
	dri-devel@lists.freedesktop.org, christian.koenig@amd.com,
	Donald Robson <donald.robson@imgtec.com>,
	daniel@ffwll.ch, intel-xe@lists.freedesktop.org,
	faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [RFC PATCH 01/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Fri, 9 Jun 2023 08:58:39 +0200	[thread overview]
Message-ID: <20230609085839.3fc9e237@collabora.com> (raw)
In-Reply-To: <20230404002211.3611376-2-matthew.brost@intel.com>

Hi Matthew,

On Mon,  3 Apr 2023 17:22:02 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> -static int drm_sched_main(void *param)
> +static void drm_sched_main(struct work_struct *w)
>  {
> -	struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
> +	struct drm_gpu_scheduler *sched =
> +		container_of(w, struct drm_gpu_scheduler, work_run);
>  	int r;
>  
> -	sched_set_fifo_low(current);
> -
> -	while (!kthread_should_stop()) {
> -		struct drm_sched_entity *entity = NULL;
> +	while (!READ_ONCE(sched->pause_run_wq)) {

During an informal discussion on IRC I mentioned that this loop might
become problematic if all the 1:1 entities share the same wq
(especially if it's an ordered wq), and one of them is getting passed a
lot of requests. Just wanted to tell you that we've hit that case in
PowerVR:

Geometry and fragment queues get passed X requests respectively, each
pair of request corresponding to a rendering operation. Because we're
using an ordered wq (which I know we shouldn't do, and I intend to
fix that, but I think it shows the problem exists by making it more
visible), all geometry requests get submitted first, then come the
fragment requests. It turns out the submission time is non-negligible
compared to the geometry job execution time, and geometry jobs end up
generating data for the fragment jobs that are not consumed fast enough
by the fragment job to allow the following geom jobs to re-use the same
portion of memory, leading to on-demand allocation of extra memory
chunks which wouldn't happen if submissions were interleaved.

I know you were not fundamentally opposed to killing this loop and doing
one iteration at a time (you even provided a patch doing that), just
wanted to share my findings to prove this is not just a theoretical
issue, and the lack of fairness in the submission path can cause trouble
in practice.

Best Regards,

Boris

> +		struct drm_sched_entity *entity;
>  		struct drm_sched_fence *s_fence;
>  		struct drm_sched_job *sched_job;
>  		struct dma_fence *fence;
> -		struct drm_sched_job *cleanup_job = NULL;
> +		struct drm_sched_job *cleanup_job;
>  
> -		wait_event_interruptible(sched->wake_up_worker,
> -					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
> -					 (!drm_sched_blocked(sched) &&
> -					  (entity = drm_sched_select_entity(sched))) ||
> -					 kthread_should_stop());
> +		cleanup_job = drm_sched_get_cleanup_job(sched);
> +		entity = drm_sched_select_entity(sched);
>  
>  		if (cleanup_job)
>  			sched->ops->free_job(cleanup_job);
>  
> -		if (!entity)
> +		if (!entity) {
> +			if (!cleanup_job)
> +				break;
>  			continue;
> +		}
>  
>  		sched_job = drm_sched_entity_pop_job(entity);
>  
>  		if (!sched_job) {
>  			complete_all(&entity->entity_idle);
> +			if (!cleanup_job)
> +				break;
>  			continue;
>  		}
>  
> @@ -1055,14 +1083,14 @@ static int drm_sched_main(void *param)
>  					  r);
>  		} else {
>  			if (IS_ERR(fence))
> -				dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
> +				dma_fence_set_error(&s_fence->finished,
> +						    PTR_ERR(fence));
>  
>  			drm_sched_job_done(sched_job);
>  		}
>  
>  		wake_up(&sched->job_scheduled);
>  	}
> -	return 0;
>  }

next prev parent reply	other threads:[~2023-06-09  6:58 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04  0:22 [Intel-xe] [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 01/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-06-09  6:58   ` Boris Brezillon [this message]
2023-07-31  0:56     ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-04-05 17:37   ` Luben Tuikov
2023-04-05 18:29     ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 03/10] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 04/10] drm/sched: Add generic scheduler message interface Matthew Brost
2023-05-04  5:28   ` Luben Tuikov
2023-07-31  2:42     ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 05/10] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 06/10] drm/sched: Submit job before starting TDR Matthew Brost
2023-05-04  5:23   ` Luben Tuikov
2023-07-31  1:00     ` Matthew Brost
2023-07-31  7:26       ` Boris Brezillon
2023-08-31 19:48         ` Luben Tuikov
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 07/10] drm/sched: Add helper to set TDR timeout Matthew Brost
2023-05-04  5:28   ` Luben Tuikov
2023-07-31  1:09     ` Matthew Brost
2023-08-31 19:52       ` Luben Tuikov
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences Matthew Brost
2023-04-04  9:09   ` Christian König
2023-04-04 12:54     ` Thomas Hellström
2023-04-04 13:10       ` Christian König
2023-04-04 18:14         ` Thomas Hellström (Intel)
2023-04-04 19:02           ` Matthew Brost
2023-04-04 19:25             ` Daniel Vetter
2023-04-04 19:48               ` Matthew Brost
2023-04-05 13:09                 ` Daniel Vetter
2023-04-05 23:58                   ` Matthew Brost
2023-04-06  6:32                     ` Daniel Vetter
2023-04-06 16:58                       ` Matthew Brost
2023-04-06 17:09                         ` Daniel Vetter
2023-04-05 12:35               ` Thomas Hellström
2023-04-05 12:39                 ` Christian König
2023-04-05 12:45                   ` Daniel Vetter
2023-04-05 14:08                     ` Christian König
2023-04-04 19:00         ` Daniel Vetter
2023-04-04 20:03           ` Matthew Brost
2023-04-04 20:11             ` Daniel Vetter
2023-04-04 20:19               ` Matthew Brost
2023-04-04 20:31                 ` Daniel Vetter
2023-04-04 20:46                   ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 09/10] drm/sched: Support long-running sched entities Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 10/10] drm/syncobj: Warn on long running dma-fences Matthew Brost
2023-04-04  0:24 ` [Intel-xe] ✗ CI.Patch_applied: failure for Xe DRM scheduler and long running workload plans Patchwork
2023-04-04  1:07 ` [Intel-xe] [RFC PATCH 00/10] " Asahi Lina
2023-04-04  1:58   ` Matthew Brost
2023-04-08  7:05     ` Asahi Lina
2023-04-11 14:07       ` Daniel Vetter
2023-04-12  5:47         ` Asahi Lina
2023-04-12  8:18           ` Daniel Vetter
2023-04-17  0:03       ` Matthew Brost
2023-04-04  9:04 ` Christian König
2023-04-04 13:23   ` Matthew Brost
2023-04-04  9:13 ` Christian König
2023-04-04 13:37   ` Matthew Brost
2023-04-05  7:41     ` Christian König
2023-04-05  8:34       ` Daniel Vetter
2023-04-05  8:53         ` Christian König
2023-04-05  9:07           ` Daniel Vetter
2023-04-05  9:57             ` Christian König
2023-04-05 10:12               ` Daniel Vetter
2023-04-06  2:08                 ` Matthew Brost
2023-04-06  6:37                   ` Daniel Vetter
2023-04-06 10:14                     ` Christian König
2023-04-06 10:32                       ` Daniel Vetter
2023-04-04  9:43 ` Tvrtko Ursulin
2023-04-04  9:48   ` Christian König
2023-04-04 13:43     ` Matthew Brost
2023-04-04 13:52   ` Matthew Brost
2023-04-04 17:29     ` Tvrtko Ursulin
2023-04-04 19:07       ` Daniel Vetter
2023-04-04 18:02 ` Zeng, Oak
2023-04-04 18:08   ` Matthew Brost
2023-04-05  7:30     ` Christian König
2023-04-05  8:42       ` Daniel Vetter
2023-04-05 18:06       ` Zeng, Oak
2023-04-05 18:53         ` Matthew Brost
2023-04-06 10:04           ` Christian König
2023-04-07  0:20           ` Zeng, Oak
2023-04-11  9:02             ` Christian König
2023-04-11 14:13               ` Daniel Vetter
2023-04-17  6:47                 ` Christian König
2023-04-17  8:39                   ` Daniel Vetter
2023-04-18 15:10 ` Liviu Dudau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230609085839.3fc9e237@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=airlied@linux.ie \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=donald.robson@imgtec.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=frank.binns@imgtec.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lina@asahilina.net \
    --cc=matthew.brost@intel.com \
    --cc=robdclark@chromium.org \
    --cc=sarah.walker@imgtec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox