From: Philipp Stanner <phasta@mailbox.org>
To: "Pierre-Eric Pelloux-Prayer" <pierre-eric@damsy.net>,
phasta@kernel.org,
"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Sumit Semwal" <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>,
dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH v1] drm/sched: fix deadlock in drm_sched_entity_kill_jobs_cb
Date: Thu, 30 Oct 2025 13:26:10 +0100
Message-ID: <c51ea5a408ca6d404074be1df219077457ea76f6.camel@mailbox.org>
In-Reply-To: <442d0e70-c9e2-4bd6-a144-ea083dbf86d2@damsy.net>
On Thu, 2025-10-30 at 13:06 +0100, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 30/10/2025 at 12:17, Philipp Stanner wrote:
> > On Wed, 2025-10-29 at 10:11 +0100, Pierre-Eric Pelloux-Prayer wrote:
> > > https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908 pointed out
> >
> > This link should be moved to the tag section at the bottom as a Closes:
> > tag. Optionally a Reported-by:, too.
>
> The bug report is about a different issue. The potential deadlock being fixed by
> this patch was discovered while investigating it.
> I'll add a Reported-by tag though.
>
> >
> > > a possible deadlock:
> > >
> > > [ 1231.611031] Possible interrupt unsafe locking scenario:
> > >
> > > > [ 1231.611033]        CPU0                    CPU1
> > > > [ 1231.611034]        ----                    ----
> > > > [ 1231.611035]   lock(&xa->xa_lock#17);
> > > > [ 1231.611038]                                local_irq_disable();
> > > > [ 1231.611039]                                lock(&fence->lock);
> > > > [ 1231.611041]                                lock(&xa->xa_lock#17);
> > > > [ 1231.611044]   <Interrupt>
> > > > [ 1231.611045]     lock(&fence->lock);
> > > [ 1231.611047]
> > > *** DEADLOCK ***
> > >
> >
> > The commit message is lacking an explanation as to _how_ and _when_ the
> > deadlock comes to be. That's a prerequisite for understanding why the
> > below is the proper fix and solution.
>
> I copy-pasted a small chunk of the full deadlock analysis report included in the
> ticket because it's 300+ lines long. Copying the full log isn't useful IMO, but
> I can add more context.
The log wouldn't be useful, but a human-written explanation like the one
you give below would be.
>
> The problem is that a thread (CPU0 above) can lock the job's dependencies
> xarray without disabling interrupts.
Which drm_sched function would that be?
> If a fence signals while CPU0 holds this lock and drm_sched_entity_kill_jobs_cb
> is called, it will try to grab the xarray lock, which is not possible because
> CPU0 already holds it.
You mean an *interrupt* signals the fence? Shouldn't interrupt issues
be solved with spin_lock_irqsave()? Or can't we have that because it's
the xarray that takes the lock internally?
You don't have to explain that in this mail thread; a v2 detailing that
would be sufficient.
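
For reference, the single-CPU variant of the scenario described above can
be modeled in userspace roughly as follows. This is only a sketch under
stated assumptions, not drm_sched code: every model_* name is hypothetical,
a C11 atomic_flag stands in for the xarray's spinlock, and the "interrupt"
is a plain function call made while the lock is held. It contrasts a
callback that takes the lock directly with one that only defers the work,
which is one possible way to avoid touching the lock from interrupt context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dependencies xarray's internal spinlock. */
static atomic_flag model_xa_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for "a work item has been queued". */
static bool model_work_pending;

/* Try to take the model lock once; returns true on success. */
static bool model_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/*
 * Models a fence callback that touches the xarray directly.
 * Returning false means a real spinlock would spin forever here,
 * because the lock holder is the context that was interrupted.
 */
static bool model_cb_takes_lock(void)
{
	if (!model_trylock(&model_xa_lock))
		return false;
	model_unlock(&model_xa_lock);
	return true;
}

/* Models a callback that only defers and takes no lock itself. */
static bool model_cb_defers(void)
{
	model_work_pending = true;
	return true;
}

/*
 * Models process context holding the lock with interrupts enabled
 * while the "interrupt" runs the given callback on the same CPU.
 * Returns whether the callback could make progress.
 */
static bool model_interrupt_while_locked(bool (*cb)(void))
{
	bool cb_ok;
	bool locked = model_trylock(&model_xa_lock);

	assert(locked);
	cb_ok = cb();		/* the fence signals here */
	model_unlock(&model_xa_lock);
	return cb_ok;
}
```

In this model, model_interrupt_while_locked(model_cb_takes_lock) reports
that the callback cannot make progress, whereas the deferring callback
completes and merely marks the work as pending. The two-CPU inversion from
the lockdep report (xa_lock vs. fence->lock) is not covered by this sketch.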
>
>
> >
> > The issue seems to be that you cannot perform certain tasks from within
> > that work item?
[…]
> >
> > > > +static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
> > > > +					  struct dma_fence_cb *cb);
> > > > +
> > > >  static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)
> > > >  {
> > > >  	struct drm_sched_job *job = container_of(wrk, typeof(*job), work);
> > > > -
> > > > -	drm_sched_fence_scheduled(job->s_fence, NULL);
> > > > -	drm_sched_fence_finished(job->s_fence, -ESRCH);
> > > > -	WARN_ON(job->s_fence->parent);
> > > > -	job->sched->ops->free_job(job);
> >
> > Can free_job() really not be called from within work item context?
>
> It's still called from drm_sched_entity_kill_jobs_work but the diff is slightly
> confusing.
OK, probably my bad. But just asking: do you generate your patches with

    git format-patch --histogram

or with the default diff algorithm? Histogram often produces better
diffs than the default.

P.