From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
DRI <dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] How to convert drivers/gpu/drm/i915/ to use local workqueue?
Date: Thu, 30 Jun 2022 09:59:00 -0400 [thread overview]
Message-ID: <Yr2sJBdZ3TnXZF+s@intel.com> (raw)
In-Reply-To: <b21d9f1e-65e3-8f2f-a5c3-04bf866823e3@linux.intel.com>
On Thu, Jun 30, 2022 at 02:09:02PM +0100, Tvrtko Ursulin wrote:
>
> On 30/06/2022 12:19, Tetsuo Handa wrote:
> > On 2022/06/30 19:17, Tvrtko Ursulin wrote:
> > > Could you give more context on reasoning here please? What is the difference
> > > between using the system_wq and flushing it from a random context? Or concern
> > > is about flushing from specific contexts?
> >
> > Excuse me, but I couldn't catch what you want. Thus, I show three patterns of
> > problems with an example.
> >
> > Commit c4f135d643823a86 ("workqueue: Wrap flush_workqueue() using a macro") says:
> >
> > Tejun Heo commented that it makes no sense at all to call flush_workqueue()
> > on the shared WQs as the caller has no idea what it's gonna end up waiting for.
> >
> > The "Think twice before calling this function! It's very easy to get into trouble
> > if you don't take great care." warning message does not help avoiding problems.
> >
> > Let's change the direction to that developers had better use their local WQs
> > if flush_scheduled_work()/flush_workqueue(system_*_wq) is inevitable.
> >
> > Three patterns of problems are:
> >
> > (a) Flushing from specific contexts (e.g. GFP_NOFS/GFP_NOIO) can cause deadlock
> > (circular locking dependency) problem.
> >
> > (b) Flushing with specific locks (e.g. module_mutex) held can cause deadlock
> > (circular locking dependency) problem.
> >
> > (c) Even if there is no possibility of deadlock, flushing with specific locks
> > held can cause khungtaskd to complain.
> >
> > An example of (a):
> >
> > ext4 filesystem called flush_scheduled_work(), which meant to wait for only
> > work item scheduled by ext4 filesystem, tried to also wait for work item
> > scheduled by 9p filesystem.
> > https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8
> >
> > Fixed by reverting the problematic patch.
> >
> > An example of (b):
> >
> > It is GFP_KERNEL context when module's __exit function is called. But whether
> > flush_workqueue() is called from restricted context depends on what locks does
> > the module's __exit function hold.
> >
> > If request_module() is called from some work function using one of system-wide WQs,
> > and flush_workqueue() is called on that WQ from module's __exit function, the kernel
> > might deadlock on module_mutex lock. Making sure that flush_workqueue() is not called
> > on system-wide WQs is the safer choice.
> >
> > Commit 1b3ce51dde365296 ("Input: psmouse-smbus - avoid flush_scheduled_work() usage")
> > is for drivers/input/mouse/psmouse-smbus.c .
> >
> > An example of (c):
> >
> > ath9k driver calls schedule_work() via request_firmware_nowait().
> > https://syzkaller.appspot.com/bug?id=78a242c8f1f4d15752c8ef4efc22974e2c52c833
> >
> > ath6kl driver calls flush_scheduled_work() which needlessly waits for completion
> > of works scheduled by ath9k driver (i.e. loading firmware used by ath9k driver).
> > https://syzkaller.appspot.com/bug?id=10a1cba59c42d11e12f897644341156eac9bb7ee
> >
> > Commit 4b2b8e748467985c ("ath6kl: avoid flush_scheduled_work() usage") in linux-next.git
> > might be able to mitigate these problems. (Waiting for next merge window...)
>
> Okay, from 1b3ce51dde365296:
>
> "Flushing system-wide workqueues is dangerous and will be forbidden."
>
> Thank you, this exactly explains the motivation which is what I was after. I
> certainly agree there is a possibility for lock coupling via the shared wq
> so that is fine by me.
>
> > > On the i915 specifics, the caller in drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > I am pretty sure can be removed. It is synchronized with the error capture side of
> > > things which is not required for the test to work.
> > >
> > > I can send a patch for that or you can, as you prefer?
> >
> > OK. Please send a patch for that, for that patch will go linux-next.git tree via
> > a tree for gpu/drm/i915 driver.
>
> Patch sent. If I am right the easiest solution was just to remove the flush.
> If I was wrong though I'll need to create a dedicated wq so we will see what
> our automated CI will say.
But besides of flush_scheduled_work() it looks like
we also need to take care of the flush_workqueue() calls, no?!
* i915_gem_drain_workqueue()
* intel_ggtt.c: flush_workqueue(ggtt->vm.i915->wq);
* i915_gem_pm.c: flush_workqueue(i915->wq);
and the display ones for
dev_priv->modeset_wq
i915->flip_wq
besides the flush_scheduled_work in intel_modeset_driver_remove_noirq
>
> Regards,
>
> Tvrtko
WARNING: multiple messages have this Message-ID (diff)
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
DRI <dri-devel@lists.freedesktop.org>
Subject: Re: How to convert drivers/gpu/drm/i915/ to use local workqueue?
Date: Thu, 30 Jun 2022 09:59:00 -0400 [thread overview]
Message-ID: <Yr2sJBdZ3TnXZF+s@intel.com> (raw)
In-Reply-To: <b21d9f1e-65e3-8f2f-a5c3-04bf866823e3@linux.intel.com>
On Thu, Jun 30, 2022 at 02:09:02PM +0100, Tvrtko Ursulin wrote:
>
> On 30/06/2022 12:19, Tetsuo Handa wrote:
> > On 2022/06/30 19:17, Tvrtko Ursulin wrote:
> > > Could you give more context on reasoning here please? What is the difference
> > > between using the system_wq and flushing it from a random context? Or concern
> > > is about flushing from specific contexts?
> >
> > Excuse me, but I couldn't catch what you want. Thus, I show three patterns of
> > problems with an example.
> >
> > Commit c4f135d643823a86 ("workqueue: Wrap flush_workqueue() using a macro") says:
> >
> > Tejun Heo commented that it makes no sense at all to call flush_workqueue()
> > on the shared WQs as the caller has no idea what it's gonna end up waiting for.
> >
> > The "Think twice before calling this function! It's very easy to get into trouble
> > if you don't take great care." warning message does not help avoiding problems.
> >
> > Let's change the direction to that developers had better use their local WQs
> > if flush_scheduled_work()/flush_workqueue(system_*_wq) is inevitable.
> >
> > Three patterns of problems are:
> >
> > (a) Flushing from specific contexts (e.g. GFP_NOFS/GFP_NOIO) can cause deadlock
> > (circular locking dependency) problem.
> >
> > (b) Flushing with specific locks (e.g. module_mutex) held can cause deadlock
> > (circular locking dependency) problem.
> >
> > (c) Even if there is no possibility of deadlock, flushing with specific locks
> > held can cause khungtaskd to complain.
> >
> > An example of (a):
> >
> > ext4 filesystem called flush_scheduled_work(), which meant to wait for only
> > work item scheduled by ext4 filesystem, tried to also wait for work item
> > scheduled by 9p filesystem.
> > https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8
> >
> > Fixed by reverting the problematic patch.
> >
> > An example of (b):
> >
> > It is GFP_KERNEL context when module's __exit function is called. But whether
> > flush_workqueue() is called from restricted context depends on what locks does
> > the module's __exit function hold.
> >
> > If request_module() is called from some work function using one of system-wide WQs,
> > and flush_workqueue() is called on that WQ from module's __exit function, the kernel
> > might deadlock on module_mutex lock. Making sure that flush_workqueue() is not called
> > on system-wide WQs is the safer choice.
> >
> > Commit 1b3ce51dde365296 ("Input: psmouse-smbus - avoid flush_scheduled_work() usage")
> > is for drivers/input/mouse/psmouse-smbus.c .
> >
> > An example of (c):
> >
> > ath9k driver calls schedule_work() via request_firmware_nowait().
> > https://syzkaller.appspot.com/bug?id=78a242c8f1f4d15752c8ef4efc22974e2c52c833
> >
> > ath6kl driver calls flush_scheduled_work() which needlessly waits for completion
> > of works scheduled by ath9k driver (i.e. loading firmware used by ath9k driver).
> > https://syzkaller.appspot.com/bug?id=10a1cba59c42d11e12f897644341156eac9bb7ee
> >
> > Commit 4b2b8e748467985c ("ath6kl: avoid flush_scheduled_work() usage") in linux-next.git
> > might be able to mitigate these problems. (Waiting for next merge window...)
>
> Okay, from 1b3ce51dde365296:
>
> "Flushing system-wide workqueues is dangerous and will be forbidden."
>
> Thank you, this exactly explains the motivation which is what I was after. I
> certainly agree there is a possibility for lock coupling via the shared wq
> so that is fine by me.
>
> > > On the i915 specifics, the caller in drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > I am pretty sure can be removed. It is synchronized with the error capture side of
> > > things which is not required for the test to work.
> > >
> > > I can send a patch for that or you can, as you prefer?
> >
> > OK. Please send a patch for that, for that patch will go linux-next.git tree via
> > a tree for gpu/drm/i915 driver.
>
> Patch sent. If I am right the easiest solution was just to remove the flush.
> If I was wrong though I'll need to create a dedicated wq so we will see what
> our automated CI will say.
But besides of flush_scheduled_work() it looks like
we also need to take care of the flush_workqueue() calls, no?!
* i915_gem_drain_workqueue()
* intel_ggtt.c: flush_workqueue(ggtt->vm.i915->wq);
* i915_gem_pm.c: flush_workqueue(i915->wq);
and the display ones for
dev_priv->modeset_wq
i915->flip_wq
besides the flush_scheduled_work in intel_modeset_driver_remove_noirq
>
> Regards,
>
> Tvrtko
next prev parent reply other threads:[~2022-06-30 13:59 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-10 14:57 [Intel-gfx] How to convert drivers/gpu/drm/i915/ to use local workqueue? Tetsuo Handa
2022-06-10 14:57 ` Tetsuo Handa
2022-06-30 4:30 ` [Intel-gfx] " Tetsuo Handa
2022-06-30 4:30 ` Tetsuo Handa
2022-06-30 7:46 ` [Intel-gfx] " Tvrtko Ursulin
2022-06-30 7:46 ` Tvrtko Ursulin
2022-06-30 8:06 ` [Intel-gfx] " Tetsuo Handa
2022-06-30 8:06 ` Tetsuo Handa
2022-06-30 10:17 ` [Intel-gfx] " Tvrtko Ursulin
2022-06-30 10:17 ` Tvrtko Ursulin
2022-06-30 11:19 ` [Intel-gfx] " Tetsuo Handa
2022-06-30 11:19 ` Tetsuo Handa
2022-06-30 13:09 ` [Intel-gfx] " Tvrtko Ursulin
2022-06-30 13:09 ` Tvrtko Ursulin
2022-06-30 13:59 ` Rodrigo Vivi [this message]
2022-06-30 13:59 ` Rodrigo Vivi
2022-06-30 14:34 ` [Intel-gfx] " Jani Nikula
2022-06-30 14:34 ` Jani Nikula
2022-06-30 15:25 ` [Intel-gfx] " Rodrigo Vivi
2022-06-30 15:25 ` Rodrigo Vivi
2022-08-05 13:14 ` [Intel-gfx] " Tetsuo Handa
2022-08-09 16:22 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yr2sJBdZ3TnXZF+s@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=tvrtko.ursulin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.