Date: Tue, 12 Sep 2023 17:13:22 +0200
From: Boris Brezillon
To: Danilo Krummrich
Message-ID: <20230912171322.6c47a973@collabora.com>
In-Reply-To: <20230912164909.018d13c8@collabora.com>
References: <20230811023137.659037-1-matthew.brost@intel.com>
 <20230811023137.659037-2-matthew.brost@intel.com>
 <69b648f8-c6b3-5846-0d03-05a380d010d8@redhat.com>
 <069e6cd0-abd3-fdd9-217d-173e8f8e1d29@amd.com>
 <982800c1-e7d3-f276-51d0-1a431f92eacb@amd.com>
 <5fdf7d59-3323-24b5-a35a-bd60b06b4ce5@redhat.com>
 <0bf839df-db7f-41fa-8b34-59792d2ba8be@amd.com>
 <20230912162838.34135959@collabora.com>
 <20230912164909.018d13c8@collabora.com>
Organization: Collabora
Subject: Re: [Intel-xe] [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Cc: robdclark@chromium.org, sarah.walker@imgtec.com, ketil.johnsen@arm.com, lina@asahilina.net, Liviu.Dudau@arm.com, dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org, luben.tuikov@amd.com, donald.robson@imgtec.com, Christian König, faith.ekstrand@collabora.com

On Tue, 12 Sep 2023 16:49:09 +0200 Boris Brezillon wrote:

> On Tue, 12 Sep 2023 16:33:01 +0200 Danilo Krummrich wrote:
> 
> > On 9/12/23 16:28, Boris Brezillon wrote:
> > > On Thu, 17 Aug 2023 13:13:31 +0200 Danilo Krummrich wrote:
> > > 
> > >> I think that's a misunderstanding. I'm not trying to say that it is
> > >> *always* beneficial to fill up the ring as much as possible. But I
> > >> think it is under certain circumstances, exactly those circumstances
> > >> I described for Nouveau.
> > >>
> > >> As mentioned, in Nouveau the size of a job is only really limited by
> > >> the ring size, which means that one job can (but does not
> > >> necessarily) fill up the whole ring.
> > >> We both agree that this is inefficient, because it potentially
> > >> results in the HW running dry due to hw_submission_limit == 1.
> > >>
> > >> I recognize you said that one should define hw_submission_limit and
> > >> adjust the other parts of the equation accordingly. The options I
> > >> see are:
> > >>
> > >> (1) Increase the ring size while keeping the maximum job size.
> > >> (2) Decrease the maximum job size while keeping the ring size.
> > >> (3) Let the scheduler track the actual job size rather than the
> > >>     maximum job size.
> > >>
> > >> (1) results in potentially wasted ring memory, because we're not
> > >> always reaching the maximum job size, but the scheduler assumes so.
> > >>
> > >> (2) results in more IOCTLs from userspace for the same amount of
> > >> IBs, and more jobs result in more memory allocations and more work
> > >> being submitted to the workqueue (with Matt's patches).
> > >>
> > >> (3) doesn't seem to have any of those drawbacks.
> > >>
> > >> What would be your take on that?
> > >>
> > >> Actually, if none of the other drivers is interested in a more
> > >> precise way of keeping track of the ring utilization, I'd be totally
> > >> fine to do it in a driver-specific way. However, unfortunately I
> > >> don't see how this would be possible.
> > > 
> > > I'm not entirely sure, but I think PowerVR is pretty close to your
> > > description: the job size is dynamic, and the ring buffer size is
> > > picked by the driver at queue initialization time. What we did was to
> > > set hw_submission_limit to an arbitrarily high value of 64k (we could
> > > have used something like ringbuf_size/min_job_size instead), and then
> > > have the control flow implemented with ->prepare_job() [1] (CCCB is
> > > the PowerVR ring buffer). This allows us to maximize ring buffer
> > > utilization while still allowing dynamic-size jobs.
> > 
> > I guess this would work, but I think it would be better to bake this
> > in, especially if more drivers do have this need. I already have an
> > implementation [1] for doing that in the scheduler. My plan was to
> > push that as soon as Matt sends out V3.
> > 
> > [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95
> 
> PowerVR's ->can_fit_in_ringbuf() logic is a bit more involved in that
> native fence waits are passed to the FW, and those add to the job size.
> When we know our job is ready for execution (all non-native deps are
> signaled), we evict already-signaled native deps (or native fences) to
> shrink the job size further, but that's something we need to calculate
> late if we want the job size to be minimal. Of course, we can always
> over-estimate the job size, but if we go for a full-blown drm_sched
> integration, I wonder if it wouldn't be preferable to have a
> ->get_job_size() callback returning the number of units needed by the
> job, and have the core pick 1 when the hook is not implemented.

FWIW, I think the last time I asked how to do that, I was pointed to
->prepare_job() by someone (I don't remember if it was Daniel or
Christian), hence the PowerVR implementation. If that's still the
preferred solution, there's some opportunity to have a generic layer to
automate ringbuf utilization tracking, plus some helpers to prepare
wait_for_ringbuf dma_fences that drivers could return from
->prepare_job() (those fences would then be signaled when the driver
calls drm_ringbuf_job_done() and the next job waiting for ringbuf space
fits in the ringbuf).