Date: Fri, 23 May 2025 16:16:07 +0200
From: Danilo Krummrich
To: Christian König
Cc: tursulin@ursulin.net, phasta@mailbox.org, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re:
[PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
References: <20250523125643.7540-1-christian.koenig@amd.com> <20250523125643.7540-2-christian.koenig@amd.com>

On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > It turned out that we can actually massively optimize here.
> >
> > The previous code was horrible inefficient since it constantly released
> > and re-acquired the lock of the xarray and started each iteration from the
> > base of the array to avoid concurrent modification which in our case
> > doesn't exist.
> >
> > Additional to that the xas_find() and xas_store() functions are explicitly
> > made in a way so that you can efficiently check entries and if you don't
> > find a match store a new one at the end or replace existing ones.
> >
> > So use xas_for_each()/xa_store() instead of xa_for_each()/xa_alloc().
> > It's a bit more code, but should be much faster in the end.
>
> This commit message explains neither the motivation of the commit nor what it
> does. Instead, it describes what belongs in the changelog between versions.

Sorry, this is wrong. I got confused, the commit message is perfectly fine. :)

The rest still applies though.

> Speaking of versioning of the patch series, AFAIK there were previous versions,
> but this series was sent as a whole new series -- why?
>
> Please resend with a proper commit message, version and changelog. Thanks!
>
> > Signed-off-by: Christian König
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index f7118497e47a..cf200b1b643e 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> >  int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  				 struct dma_fence *fence)
> >  {
> > +	XA_STATE(xas, &job->dependencies, 0);
> >  	struct dma_fence *entry;
> > -	unsigned long index;
> > -	u32 id = 0;
> > -	int ret;
> >
> >  	if (!fence)
> >  		return 0;
> > @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  	 * This lets the size of the array of deps scale with the number of
> >  	 * engines involved, rather than the number of BOs.
> >  	 */
> > -	xa_for_each(&job->dependencies, index, entry) {
> > +	xa_lock(&job->dependencies);
> > +	xas_for_each(&xas, entry, ULONG_MAX) {
> >  		if (entry->context != fence->context)
> >  			continue;
> >
> >  		if (dma_fence_is_later(fence, entry)) {
> >  			dma_fence_put(entry);
> > -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > +			xas_store(&xas, fence);
> >  		} else {
> >  			dma_fence_put(fence);
> >  		}
> > -		return 0;
> > +		xa_unlock(&job->dependencies);
> > +		return xas_error(&xas);
> >  	}
> >
> > -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> > -	if (ret != 0)
> > +retry:
> > +	entry = xas_store(&xas, fence);
> > +	xa_unlock(&job->dependencies);
> > +
> > +	/* There shouldn't be any concurrent add, so no need to loop again */
>
> Concurrency shouldn't matter, xas_nomem() stores the pre-allocated memory in the
> XA_STATE not the xarray. Hence, I think we should remove the comment.
>
> > +	if (xas_nomem(&xas, GFP_KERNEL)) {
> > +		xa_lock(&job->dependencies);
> > +		goto retry;
>
> Please don't use a goto here; if we failed to allocate memory here, this
> would loop endlessly until we eventually succeed. It would be equivalent to:
>
> 	while (!ptr) {
> 		ptr = kmalloc();
> 	}
>
> Instead just take the lock and call xas_store() again.
>
> > +	}
> > +
> > +	if (xas_error(&xas))
> >  		dma_fence_put(fence);
> > +	else
> > +		WARN_ON(entry);
>
> Please don't call WARN_ON() here; this isn't fatal, we only need to return the
> error code.
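
To make the "take the lock and call xas_store() again" suggestion concrete, here is a minimal userspace sketch of a store that is retried at most once. It is only a model under stated assumptions, not kernel code: fake_state, fake_store(), fake_nomem() and store_retry_once() are hypothetical stand-ins, and locking is left out entirely. The model assumes that a store fails with -ENOMEM when no node has been preallocated, and that the nomem helper returns true only when it managed to preallocate one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the iteration state. Like XA_STATE in the
 * review comment above, the preallocated memory lives in the state,
 * not in the container itself.
 */
struct fake_state {
	int err;          /* 0 on success, -ENOMEM after a failed store */
	bool has_node;    /* true once memory has been preallocated */
	int alloc_budget; /* test knob: how many allocations may succeed */
};

/* Assumed behaviour: the store only succeeds with a preallocated node. */
static void fake_store(struct fake_state *st, void **slot, void *value)
{
	if (!st->has_node) {
		st->err = -ENOMEM;
		return;
	}
	st->has_node = false; /* the node is consumed by the store */
	st->err = 0;
	*slot = value;
}

/*
 * Assumed behaviour: returns true only if the previous store failed
 * with -ENOMEM and memory could be preallocated, meaning a retry is
 * worthwhile. On allocation failure the error is left in place.
 */
static bool fake_nomem(struct fake_state *st)
{
	if (st->err != -ENOMEM)
		return false;
	if (st->alloc_budget <= 0)
		return false;
	st->alloc_budget--;
	st->has_node = true;
	st->err = 0;
	return true;
}

/* Try the store, and retry exactly once if memory became available. */
int store_retry_once(struct fake_state *st, void **slot, void *value)
{
	assert(st != NULL && slot != NULL);

	fake_store(st, slot, value);
	if (fake_nomem(st))
		fake_store(st, slot, value);

	/* The caller only needs the error code, no warning required. */
	return st->err;
}
```

In this model, a state with one allocation available returns 0 after a single retry, while a state with none returns -ENOMEM without looping.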