Intel-XE Archive on lore.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, jiangshanlai@gmail.com,
	tj@kernel.org, simona.vetter@ffwll.ch, pstanner@redhat.com,
	dakr@kernel.org
Subject: Re: [RFC PATCH 1/3] workqueue: Add an interface to taint workqueue lockdep with reclaim
Date: Wed, 29 Oct 2025 10:48:43 +0100	[thread overview]
Message-ID: <9565366c-6821-4767-bcfc-079378fb4348@amd.com> (raw)
In-Reply-To: <aQEkq7DYy5/AaJ4R@lstrano-desk.jf.intel.com>

On 10/28/25 21:16, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 10:32:54AM +0100, Christian König wrote:
>> On 10/21/25 23:39, Matthew Brost wrote:
>>> Drivers often use workqueues that are in the reclaim path (e.g., DRM
>>> scheduler workqueues). It is useful to teach lockdep that memory cannot
>>> be allocated on these workqueues. Add an interface to taint workqueue
>>> lockdep with reclaim.
>>
>> Oh that is so wonderfully evil. I'm absolutely in favor of doing this.
>>
>> But can't we check for the existing WQ_MEM_RECLAIM flag in the workqueue handling instead?
>>
> 
> Tejun suggested tying the lockdep annotation to WQ_MEM_RECLAIM, but the
> entire kernel explodes because many workqueues throughout Linux don’t
> adhere to this rule. Here's a link to my latest reply to Tejun [1].
> 
> [1] https://patchwork.freedesktop.org/patch/682494/?series=156284&rev=1#comment_1255380 

Sorry, my fault; I hadn't read up to the latest discussion when I wrote the mail.

My educated guess is that a lot of workqueues just set WQ_MEM_RECLAIM to be guaranteed to start even under memory pressure.

So yeah, probably best to keep your approach here for now; somebody from core MM should take a look at cleaning it up later on.
>> In addition to that, we should also make sure that the same wq is used for timeout and free, and that this wq is single threaded *and* has the WQ_MEM_RECLAIM flag set.
>>
> 
> Currently, free runs on the same work queue as run_job. We could look
> into moving it to a separate queue, but that’s a separate issue.

We really need to make sure the free and timeout wq are the same and single threaded.

The hack the scheduler currently does with removing and re-inserting the job on a timeout is something we should really try to fix.
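On the driver side, something along these lines would give us a single-threaded, reclaim-capable wq that both timeout and free work could share. This is only a sketch under the assumption that the scheduler grows a way to accept one wq for both; the wq name is illustrative:

```c
/*
 * One ordered (single-threaded) workqueue with a rescuer thread.
 * alloc_ordered_workqueue() guarantees at most one work item runs at a
 * time, and WQ_MEM_RECLAIM guarantees forward progress under memory
 * pressure via a dedicated rescuer.
 */
struct workqueue_struct *wq;

wq = alloc_ordered_workqueue("drm-sched-timeout", WQ_MEM_RECLAIM);
if (!wq)
	return -ENOMEM;

/* Hand the same wq to the scheduler for both timeout and free work. */
```

Ordering then comes for free: a timeout handler can never race the free path, because they serialize on the same single-threaded wq.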

> IIRC the workqueue_struct is private and so we can't fish that out in
> the DRM scheduler without adding helpers to workqueue layer. Ofc we
> could do that too if you think this would be helpful.

I might be wrong, but IIRC there was a helper to get the flags from the wq.

That should be enough to test if it is single threaded or not.
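If no such helper exists, a trivial one in kernel/workqueue.c would be enough. This is hypothetical code, not something currently in the tree; both the helper name and the exported flags are assumptions:

```c
/*
 * Hypothetical helper (not in the tree): expose the workqueue flags so
 * callers like the DRM scheduler can sanity-check that a caller-supplied
 * wq is ordered (single threaded) and has WQ_MEM_RECLAIM set.
 */
unsigned int workqueue_get_flags(struct workqueue_struct *wq)
{
	return wq->flags;
}
EXPORT_SYMBOL_GPL(workqueue_get_flags);
```

The scheduler could then WARN_ON at init time if the required flags are missing, instead of relying on lockdep to catch the misuse at runtime.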

> 
>> Otherwise we run into the same lifetime issue with the job and memory reclaim during device reset as well.
>>
> 
> My patches in this series taint the submit_wq and timeout_wq in the DRM
> scheduler [2]. I have a solid understanding of reclaim rules, and this
> change helped uncover some convoluted cases in Xe—specifically in our
> device reset code involving power management and reclaim [3]. So I can
> confirm this has been quite helpful.
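For reference, the driver-facing call pattern from [2] boils down to the sketch below, assuming the taint_reclaim_workqueue() interface added by patch 1 of this series; the wq variable names are illustrative:

```c
/*
 * At scheduler init, after creating the workqueues: teach lockdep that
 * GFP_KERNEL allocations must not happen on work items queued here,
 * i.e. that these workqueues sit in the reclaim path.
 */
taint_reclaim_workqueue(submit_wq, GFP_KERNEL);
taint_reclaim_workqueue(timeout_wq, GFP_KERNEL);
```

Any later allocation from a work item on either wq then triggers a lockdep splat instead of a hard-to-reproduce deadlock under memory pressure.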

Yeah, completely agree. We most likely have quite a bunch of issues in our reset code path as well.

Regards,
Christian.

> 
> Matt
> 
> [2] https://patchwork.freedesktop.org/patch/682496/?series=156284&rev=1
> [3] https://patchwork.freedesktop.org/series/156292/
> 
>> Regards,
>> Christian.
>>
>>>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  include/linux/workqueue.h | 19 +++++++++++++++++++
>>>  kernel/workqueue.c        |  9 +++++++++
>>>  2 files changed, 28 insertions(+)
>>>
>>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>>> index dabc351cc127..954c7eb7e225 100644
>>> --- a/include/linux/workqueue.h
>>> +++ b/include/linux/workqueue.h
>>> @@ -553,6 +553,25 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags, int max_active,
>>>  						1, lockdep_map, ##args))
>>>  #endif
>>>  
>>> +
>>> +#ifdef CONFIG_LOCKDEP
>>> +/**
>>> + * taint_reclaim_workqueue - taint workqueue lockdep map with reclaim
>>> + * @wq: workqueue to taint with reclaim
>>> + * @gfp: GFP flags to taint with (e.g. GFP_KERNEL)
>>> + *
>>> + * Drivers often use workqueues that are in the reclaim path (e.g., DRM
>>> + * scheduler workqueues). It is useful to teach lockdep that memory cannot be
>>> + * allocated on these workqueues.
>>> + */
>>> +extern void taint_reclaim_workqueue(struct workqueue_struct *wq, gfp_t gfp);
>>> +#else
>>> +static inline void taint_reclaim_workqueue(struct workqueue_struct *wq,
>>> +					   gfp_t gfp)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>  /**
>>>   * alloc_ordered_workqueue - allocate an ordered workqueue
>>>   * @fmt: printf format for the name of the workqueue
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index 45320e27a16c..fea410c20b71 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -5846,6 +5846,15 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags,
>>>  	return wq;
>>>  }
>>>  EXPORT_SYMBOL_GPL(alloc_workqueue_lockdep_map);
>>> +
>>> +void taint_reclaim_workqueue(struct workqueue_struct *wq, gfp_t gfp)
>>> +{
>>> +	fs_reclaim_acquire(gfp);
>>> +	lock_map_acquire(wq->lockdep_map);
>>> +	lock_map_release(wq->lockdep_map);
>>> +	fs_reclaim_release(gfp);
>>> +}
>>> +EXPORT_SYMBOL_GPL(taint_reclaim_workqueue);
>>>  #endif
>>>  
>>>  static bool pwq_busy(struct pool_workqueue *pwq)
>>


