public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: Rob Herring <robh@kernel.org>
Cc: "Tomeu Vizoso" <tomeu@tomeuvizoso.net>,
	"Krzysztof Kozlowski" <krzk+dt@kernel.org>,
	"Conor Dooley" <conor+dt@kernel.org>,
	"Oded Gabbay" <ogabbay@kernel.org>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Steven Price" <steven.price@arm.com>,
	"Daniel Stone" <daniel@fooishbar.org>,
	"Frank Li" <Frank.li@nxp.com>,
	"Sui Jingfeng" <sui.jingfeng@linux.dev>,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH v5 2/2] accel: Add Arm Ethos-U NPU driver
Date: Sat, 18 Oct 2025 00:42:30 -0700	[thread overview]
Message-ID: <aPNE5po45Umson5V@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <aPM3J2jZcct7ODIp@lstrano-desk.jf.intel.com>

On Fri, Oct 17, 2025 at 11:43:51PM -0700, Matthew Brost wrote:
> On Fri, Oct 17, 2025 at 10:37:46AM -0500, Rob Herring wrote:
> > On Thu, Oct 16, 2025 at 11:25:34PM -0700, Matthew Brost wrote:
> > > On Thu, Oct 16, 2025 at 04:06:05PM -0500, Rob Herring (Arm) wrote:
> > > > Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> > > > relatively simple interface with single command stream to describe
> > > > buffers, operation settings, and network operations. It supports up to 8
> > > > memory regions (though no h/w bounds on a region). The Ethos NPUs
> > > > are designed to use an SRAM for scratch memory. Region 2 is reserved
> > > > for SRAM (like the downstream driver stack and compiler). Userspace
> > > > doesn't need access to the SRAM.
> > 
> > Thanks for the review.
> > 
> > [...]
> > 
> > > > +static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
> > > > +{
> > > > +	struct ethosu_job *job = to_ethosu_job(sched_job);
> > > > +	struct ethosu_device *dev = job->dev;
> > > > +	struct dma_fence *fence = NULL;
> > > > +	int ret;
> > > > +
> > > > +	if (unlikely(job->base.s_fence->finished.error))
> > > > +		return NULL;
> > > > +
> > > > +	fence = ethosu_fence_create(dev);
> > > 
> > > Another reclaim issue: ethosu_fence_create allocates memory using
> > > GFP_KERNEL. Since we're already in the DMA fence signaling path
> > > (reclaim), this can lead to a deadlock.
> > > 
> > > Without too much thought, you likely want to move this allocation to
> > > ethosu_job_do_push, but before taking dev->sched_lock or calling
> > > drm_sched_job_arm.
> > > 
> > > We really should fix the DRM scheduler work queue to be tainted with
> > > reclaim. If I recall correctly, we'd need to update the work queue
> > > layer. Let me look into that—I've seen this type of bug several times,
> > > and lockdep should be able to catch it.
> > 
> > Likely the rocket driver suffers from the same issues...
> > 
> 
> I am not surprised by this statement.
> 
> > > 
> > > > +	if (IS_ERR(fence))
> > > > +		return fence;
> > > > +
> > > > +	if (job->done_fence)
> > > > +		dma_fence_put(job->done_fence);
> > > > +	job->done_fence = dma_fence_get(fence);
> > > > +
> > > > +	ret = pm_runtime_get_sync(dev->base.dev);
> > > 
> > > I haven't looked at your PM design, but this generally looks quite
> > > dangerous with respect to reclaim. For example, if your PM resume paths
> > > allocate memory or take locks that allocate memory underneath, you're
> > > likely to run into issues.
> > > 
> > > A better approach would be to attach a PM reference to your job upon
> > > creation and release it upon job destruction. That would be safer and
> > > save you headaches in the long run.
> > 
> > Our PM is nothing more than clock enable/disable and register init. 
> > 
> > If the runtime PM API doesn't work and needs special driver wrappers, 
> > then I'm inclined to just not use it and manage clocks directly (as 
> > that's all it is doing).
> > 
> 
> Yes, then you’re probably fine. More complex drivers can do all sorts of
> things during a PM wake, which is why PM wakes should generally be the
> outermost layer. I still suggest, to future-proof your code, that you
> move the PM reference to an outer layer.
> 

Also, taking a PM reference in a function call — as opposed to tying it
to a object's lifetime — is risky. It can quickly lead to imbalances in
PM references if things go sideways or function calls become unbalanced.
Depending on how your driver uses the DRM scheduler, this seems like a
real possibility.

Matt

> > > 
> > > This is what we do in Xe [1] [2].
> > > 
> > > Also, in general, this driver has been reviewed (RB’d), but it's not
> > > great that I spotted numerous issues within just five minutes. I suggest
> > > taking a step back and thoroughly evaluating everything this driver is
> > > doing.
> > 
> > Well, if it is hard to get simple drivers right, then it's a problem 
> > with the subsystem APIs IMO.
> > 
> 
> Yes, agreed. We should have assertions and lockdep annotations in place
> to catch driver-side misuses. This is the second driver I’ve randomly
> looked at over the past year that has broken DMA fencing and reclaim
> rules. I’ll take an action item to fix this in the DRM scheduler, but
> I’m afraid I’ll likely break multiple drivers in the process as misuess
> / lockdep will complain. 
> 
> Matt
> 
> > Rob

  reply	other threads:[~2025-10-18  7:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-16 21:06 [PATCH v5 0/2] accel: Add Arm Ethos-U NPU Rob Herring (Arm)
2025-10-16 21:06 ` [PATCH v5 1/2] dt-bindings: npu: Add Arm Ethos-U65/U85 Rob Herring (Arm)
2025-10-16 21:06 ` [PATCH v5 2/2] accel: Add Arm Ethos-U NPU driver Rob Herring (Arm)
2025-10-16 21:35   ` Frank Li
2025-10-17  6:03   ` Matthew Brost
2025-10-17  6:25   ` Matthew Brost
2025-10-17 15:37     ` Rob Herring
2025-10-18  6:43       ` Matthew Brost
2025-10-18  7:42         ` Matthew Brost [this message]
2025-10-21 21:43           ` Matthew Brost
2025-10-22 16:22             ` Rob Herring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPNE5po45Umson5V@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=Frank.li@nxp.com \
    --cc=airlied@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=conor+dt@kernel.org \
    --cc=daniel@fooishbar.org \
    --cc=devicetree@vger.kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=krzk+dt@kernel.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=ogabbay@kernel.org \
    --cc=robh@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=simona@ffwll.ch \
    --cc=steven.price@arm.com \
    --cc=sui.jingfeng@linux.dev \
    --cc=sumit.semwal@linaro.org \
    --cc=tomeu@tomeuvizoso.net \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox