From: Matthew Brost <matthew.brost@intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>,
<intel-xe@lists.freedesktop.org>,
John Harrison <John.C.Harrison@intel.com>
Subject: Re: [PATCH] drm/xe/pxp: Don't kill queues while holding the spinlock
Date: Tue, 18 Feb 2025 19:20:27 -0800 [thread overview]
Message-ID: <Z7VN+7p5eFfrZcmx@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <Z7VNhu4kgfEqTjb5@lstrano-desk.jf.intel.com>
On Tue, Feb 18, 2025 at 07:18:30PM -0800, Matthew Brost wrote:
> On Tue, Feb 18, 2025 at 04:38:34PM -0800, Daniele Ceraolo Spurio wrote:
> >
> >
> > On 2/13/2025 12:19 PM, Matthew Brost wrote:
> > > On Thu, Feb 13, 2025 at 09:23:26AM -0800, Daniele Ceraolo Spurio wrote:
> > > >
> > > > On 2/12/2025 10:42 PM, Dan Carpenter wrote:
> > > > > On Wed, Feb 12, 2025 at 05:26:55PM -0800, Matthew Brost wrote:
> > > > > > On Wed, Feb 12, 2025 at 04:40:32PM -0800, Daniele Ceraolo Spurio wrote:
> > > > > > > xe_exec_queue_kill can sleep, so we can't call it from under the lock.
> > > > > > > We can instead move the queues to a separate list and then kill them all
> > > > > > > after we release the lock.
> > > > > > >
> > > > > > > Since being in the list is used to track whether RPM cleanup is needed,
> > > > > > > we can no longer defer that to queue_destroy, so we perform it
> > > > > > > immediately instead.
> > > > > > >
> > > > > > > Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> > > > > > > Fixes: f8caa80154c4 ("drm/xe/pxp: Add PXP queue tracking and session start")
> > > > > > > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > > > Patch LGTM, but can this actually happen? i.e., can or do we enable
> > > > > > PXP on LR queues?
> > > > > >
> > > > > This isn't really an answer to your question, but when I reported this
> > > > > bug I didn't notice the if (xe_vm_in_preempt_fence_mode()) check in
> > > > > xe_vm_remove_compute_exec_queue(). So it's possible that this was a
> > > > > false positive?
> > > > We currently don't have a use-case where we need a vm in preempt_fence_mode
> > > > for a queue that uses PXP, but I didn't block the combination because there
> > > > is a chance we might want to use it in the future (compute PXP is supported
> > > > by the HW, even if we don't currently support it in Xe), so a user can still
> > > > set things up that way.
> > > >
> > > > > > Also, as a follow-up, should we add a might_sleep() to
> > > > > > xe_exec_queue_kill() to catch this type of bug immediately?
> > > > > There is a might_sleep() in down_write(). If this is a real bug that
> > > > > would have caught it. The problem is that people don't generally test
> > > > > with CONFIG_DEBUG_ATOMIC_SLEEP so the might_sleep() calls are turned off.
> > > > We do have CONFIG_DEBUG_ATOMIC_SLEEP enabled in CI (and I have it locally
> > > > since I use the CI config), but since PXP + preempt_fence_mode is not an
> > > > expected use-case we don't have any tests that cover that combination, so we
> > > > return early from that xe_vm_remove_compute_exec_queue() and don't hit the
> > > > down_write/might_sleep. I'll see if I can add a test to cover it, as there
> > > > might be other issues I've missed.
> > > > Also, I don't think it'd be right to add a might_sleep at the top of the
> > > > exec_queue_kill() function either, because if a caller is sure that
> > > > xe_vm_in_preempt_fence_mode() is false they should be allowed to
> > > > call exec_queue_kill() from atomic context.
> > > I see what you are saying here, but allowing something like 'we know this
> > > is not a preempt queue so it is safe to kill in an atomic context' seems
> > > risky and a very odd thing to support. IMO we should just make it clear
> > > that this function can't be called in an atomic context.
> > >
> > > We likely have some upcoming TLB invalidation changes too which I think
> > > will move all queues to a per VM list with the list being protected by a
> > > sleeping lock. Removal from this list should likely be done in kill too.
> > > This is speculation however.
> > >
> > > I agree some test cases for preempt queues and PXP would be a good idea
> > > if this isn't explicitly disallowed at the IOCTL level.
> >
> > I have a test written locally and I've managed to repro the atomic sleep
> > issue. Unfortunately, I have found a second issue with a locking inversion:
> > we take pxp->mutex under the vm lock when we create an exec_queue that uses
> > PXP, while here we do the opposite with the kill() taking the vm lock under
> > pxp mutex. Not sure yet which of the two sides is easier to fix, and
> > therefore whether this patch needs an update, so I'll hold off on merging
> > until I have a clearer idea.
> >
>
> I'd suggest moving q->pxp.link to a pxp->kill.list plus a worker to do
> the kill then. The kill is already async, so it's likely not a big deal
> to make it more async.
>
Or, if we want to make this generic: add xe_exec_queue_kill_async with
the same idea as above, but in a non-PXP-specific way.
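
Something like this, purely for illustration (completely untested sketch;
the kill_async struct, list, and link names are all made up, and locking
details like irqsave would need checking):

```c
/*
 * Sketch only: defer the sleeping xe_exec_queue_kill() to a worker so
 * callers holding a spinlock can safely request a kill.
 */
static void xe_exec_queue_kill_work_func(struct work_struct *w)
{
	struct xe_device *xe = container_of(w, struct xe_device,
					    kill_async.work);
	struct xe_exec_queue *q, *next;
	LIST_HEAD(to_kill);

	/* Splice under the spinlock, kill outside of it. */
	spin_lock(&xe->kill_async.lock);
	list_splice_init(&xe->kill_async.list, &to_kill);
	spin_unlock(&xe->kill_async.lock);

	list_for_each_entry_safe(q, next, &to_kill, kill_async_link) {
		list_del_init(&q->kill_async_link);
		xe_exec_queue_kill(q);	/* may sleep, no locks held */
		xe_exec_queue_put(q);	/* ref taken in kill_async() */
	}
}

void xe_exec_queue_kill_async(struct xe_exec_queue *q)
{
	struct xe_device *xe = gt_to_xe(q->gt);

	/* Safe from atomic context: just queue and kick the worker. */
	xe_exec_queue_get(q);
	spin_lock(&xe->kill_async.lock);
	list_add_tail(&q->kill_async_link, &xe->kill_async.list);
	spin_unlock(&xe->kill_async.lock);
	queue_work(system_wq, &xe->kill_async.work);
}
```

That would also sidestep the pxp->mutex vs. vm lock inversion, since the
worker never holds either when it calls kill.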
Matt
> Matt
>
> > Daniele
> >
> > >
> > > Matt
> > >
> > > > Thanks,
> > > > Daniele
> > > >
> > > > > regards,
> > > > > dan carpenter
> > > > >
> >
Thread overview: 12+ messages
2025-02-13 0:40 [PATCH] drm/xe/pxp: Don't kill queues while holding the spinlock Daniele Ceraolo Spurio
2025-02-13 0:47 ` ✓ CI.Patch_applied: success for " Patchwork
2025-02-13 0:47 ` ✗ CI.checkpatch: warning " Patchwork
2025-02-13 0:48 ` ✗ CI.KUnit: failure " Patchwork
2025-02-13 1:26 ` [PATCH] " Matthew Brost
2025-02-13 6:42 ` Dan Carpenter
2025-02-13 17:23 ` Daniele Ceraolo Spurio
2025-02-13 20:19 ` Matthew Brost
2025-02-19 0:38 ` Daniele Ceraolo Spurio
2025-02-19 3:18 ` Matthew Brost
2025-02-19 3:20 ` Matthew Brost [this message]
2025-02-19 21:33 ` Daniele Ceraolo Spurio