Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: <htejun@gmail.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>,
	<intel-xe@lists.freedesktop.org>,
	<thomas.hellstrom@linux.intel.com>, <rodrigo.vivi@intel.com>
Subject: Re: [PATCH 0/3] Rework work queue usage
Date: Thu, 28 Mar 2024 19:30:41 +0000	[thread overview]
Message-ID: <ZgXFYWuukcLjMtmf@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <ZgXBcpRv_FTIoq91@slm.duckdns.org>

On Thu, Mar 28, 2024 at 09:13:54AM -1000, htejun@gmail.com wrote:
> Hello,
> 
> On Thu, Mar 28, 2024 at 02:02:47PM -0500, Lucas De Marchi wrote:
> > On Thu, Mar 28, 2024 at 11:21:44AM -0700, Matthew Brost wrote:
> > > Avoid sleeping or grabbing locks in work queues shared with the system.
> > > Recent changes to work queues [1] have exposed deadlocks [2] in Xe.
> 
> Can you elaborate it a bit? I'm having a bit of hard time imagining how the
> latest workqueue changes would have exposed deadlocks.
> 

Sure. Let me explain what is happening in CI failure.

The test creates 100s of exec queues that all can be preempted in
parallel. In the current code this results in each exec queue kicking a
worker which is scheduled on the system_unbound_wq, these workers wait
and sleep (using a waitqueue) on signaling from another worker. The
other worker, which is also scheduled system_unbound_wq, is processing a
queue which interacts with the GPU. I'm thinking the worker which
interacts with hardware gets straved by the waiter resulting in a
deadlock.

This patch changes the waiters to uses a device private ordered work
queue so at most we have 1 waiter a time. Regardless of the new work
queue behavior this a better design.

It is beyond my knowledge if the old behavior, albiet poorly designed,
should still work with the work queue changes in 6.9.

> > I think we need some of this information in the commit message in patch
> > 1. Because patch 1 simply says it's moving to a device private wq to
> > avoid hogging the system one, but the issue is much more serious.
> > 
> > Also, is the "Fixes:"  really correct? It seems more like a regression
> > from the wq changes and there could be other drivers showing similar
> > issues now. But it could alos be my lack of understanding of the real
> > issue.
> 
> I don't have enough context to tell whether this is a workqueue problem but
> if so we should definitely fix workqueue.
> 

It is beyond my knowledge if the old behavior, albeit poorly designed,
should still work with the work queue changes in 6.9.

Matt

> Thanks.
> 
> -- 
> tejun

  reply	other threads:[~2024-03-28 19:29 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-28 18:21 [PATCH 0/3] Rework work queue usage Matthew Brost
2024-03-28 18:21 ` [PATCH 1/3] drm/xe: Use ordered wq for preempt fence waiting Matthew Brost
2024-03-28 18:56   ` Matt Roper
2024-03-28 19:00     ` Matthew Brost
2024-04-01 19:37       ` Matthew Brost
2024-03-28 18:21 ` [PATCH 2/3] drm/xe: Use device, gt ordered work queues for resource cleanup Matthew Brost
2024-03-28 18:21 ` [PATCH 3/3] drm/xe: Use ordered WQ for TLB invalidation fences Matthew Brost
2024-03-28 19:02 ` [PATCH 0/3] Rework work queue usage Lucas De Marchi
2024-03-28 19:13   ` htejun
2024-03-28 19:30     ` Matthew Brost [this message]
2024-03-28 19:40       ` Tejun Heo
2024-03-29 16:52         ` Matthew Brost
2024-03-29  1:59 ` ✓ CI.Patch_applied: success for " Patchwork
2024-03-29  2:00 ` ✓ CI.checkpatch: " Patchwork
2024-03-29  2:00 ` ✓ CI.KUnit: " Patchwork
2024-03-29  2:12 ` ✓ CI.Build: " Patchwork
2024-03-29  2:15 ` ✓ CI.Hooks: " Patchwork
2024-03-29  2:16 ` ✓ CI.checksparse: " Patchwork
2024-03-29  2:53 ` ✗ CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZgXFYWuukcLjMtmf@DUT025-TGLU.fm.intel.com \
    --to=matthew.brost@intel.com \
    --cc=htejun@gmail.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox