Date: Fri, 23 May 2025 16:16:07 +0200
From: Danilo Krummrich <dakr@kernel.org>
To: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: tursulin@ursulin.net, phasta@mailbox.org, amd-gfx@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
Message-ID: <aDCDJ-sK9rRI6wse@cassiopeiae>
References: <20250523125643.7540-1-christian.koenig@amd.com>
 <20250523125643.7540-2-christian.koenig@amd.com>
 <aDCCF0JFhO7lR2VJ@cassiopeiae>
In-Reply-To: <aDCCF0JFhO7lR2VJ@cassiopeiae>

On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > It turned out that we can actually massively optimize here.
> > 
> > The previous code was horribly inefficient since it constantly released
> > and re-acquired the lock of the xarray and started each iteration from the
> > base of the array to avoid concurrent modification, which in our case
> > doesn't exist.
> > 
> > In addition to that, the xas_find() and xas_store() functions are explicitly
> > made in a way so that you can efficiently check entries and, if you don't
> > find a match, store a new one at the end or replace existing ones.
> > 
> > So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
> > It's a bit more code, but should be much faster in the end.
> 
> This commit message explains neither the motivation of the commit nor what it
> does. Instead, it describes what belongs in the changelog between versions.

Sorry, the comment above is wrong. I got confused; the commit message is
perfectly fine. :)

The rest still applies though.

> Speaking of versioning of the patch series, AFAIK there were previous versions,
> but this series was sent as a whole new series -- why?
> 
> Please resend with a proper commit message, version and changelog. Thanks!
> 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index f7118497e47a..cf200b1b643e 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> >  int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  				 struct dma_fence *fence)
> >  {
> > +	XA_STATE(xas, &job->dependencies, 0);
> >  	struct dma_fence *entry;
> > -	unsigned long index;
> > -	u32 id = 0;
> > -	int ret;
> >  
> >  	if (!fence)
> >  		return 0;
> > @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  	 * This lets the size of the array of deps scale with the number of
> >  	 * engines involved, rather than the number of BOs.
> >  	 */
> > -	xa_for_each(&job->dependencies, index, entry) {
> > +	xa_lock(&job->dependencies);
> > +	xas_for_each(&xas, entry, ULONG_MAX) {
> >  		if (entry->context != fence->context)
> >  			continue;
> >  
> >  		if (dma_fence_is_later(fence, entry)) {
> >  			dma_fence_put(entry);
> > -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > +			xas_store(&xas, fence);
> >  		} else {
> >  			dma_fence_put(fence);
> >  		}
> > -		return 0;
> > +		xa_unlock(&job->dependencies);
> > +		return xas_error(&xas);
> >  	}
> >  
> > -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> > -	if (ret != 0)
> > +retry:
> > +	entry = xas_store(&xas, fence);
> > +	xa_unlock(&job->dependencies);
> > +
> > +	/* There shouldn't be any concurrent add, so no need to loop again */
> 
> Concurrency shouldn't matter: xas_nomem() stores the pre-allocated memory in the
> XA_STATE, not in the xarray. Hence, I think we should remove the comment.
> 
> > +	if (xas_nomem(&xas, GFP_KERNEL)) {
> > +		xa_lock(&job->dependencies);
> > +		goto retry;
> 
> Please don't use a goto here. If we fail to allocate memory, this becomes an
> endless loop until we eventually succeed. It would be equivalent to:
> 
> 	while (!ptr) {
> 		ptr = kmalloc();
> 	}
> 
> Instead just take the lock and call xas_store() again.
> 
> > +	}
> > +
> > +	if (xas_error(&xas))
> >  		dma_fence_put(fence);
> > +	else
> > +		WARN_ON(entry);
> 
> Please don't call WARN_ON() here. This isn't fatal; we only need to return the
> error code.
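
To make the two requests above concrete (no goto loop, no WARN_ON()), here is
a rough user-space sketch of the control flow I have in mind. It is not kernel
code: struct mock_state, mock_store() and mock_nomem() are hypothetical
stand-ins for the XA_STATE, xas_store() and xas_nomem(), and the first store
is assumed to always need a fresh allocation. Locking is only indicated by
comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the XA_STATE (not a kernel type). */
struct mock_state {
	int err;		/* 0 or -ENOMEM from the last store attempt */
	bool prealloc;		/* memory reserved for the next store */
	int allocs_left;	/* allocations the mock allocator still grants */
	int stores;		/* number of store attempts made */
};

/* Mock store: succeeds only if memory was reserved beforehand. */
static void mock_store(struct mock_state *s)
{
	s->stores++;
	if (s->prealloc) {
		s->prealloc = false;
		s->err = 0;
	} else {
		s->err = -ENOMEM;
	}
}

/*
 * Mock of the "allocate outside the lock" step. Returns true only if the
 * last store failed with -ENOMEM and memory could be reserved, i.e. if a
 * retry makes sense. On allocation failure the error stays in the state.
 */
static bool mock_nomem(struct mock_state *s)
{
	if (s->err != -ENOMEM || s->allocs_left <= 0)
		return false;

	s->allocs_left--;
	s->prealloc = true;
	s->err = 0;
	return true;
}

/* Store with at most one retry and hand the error code to the caller. */
static int add_with_single_retry(struct mock_state *s)
{
	/* lock */
	mock_store(s);
	/* unlock */

	if (mock_nomem(s)) {
		/* lock */
		mock_store(s);
		/* unlock */
	}

	/* No warning here, the caller decides what to do with the error. */
	return s->err;
}
```

With one allocation available the second store succeeds and 0 is returned;
with none available the function returns -ENOMEM after a single attempt
instead of spinning.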