From: Lyude Paul <lyude@redhat.com>
To: Lukas Wunner <lukas@wunner.de>, Tejun Heo <tj@kernel.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Alex Deucher <alexander.deucher@amd.com>,
Dave Airlie <airlied@redhat.com>, Ben Skeggs <bskeggs@redhat.com>
Cc: dri-devel@lists.freedesktop.org, Peter Wu <peter@lekensteyn.nl>,
nouveau@lists.freedesktop.org,
Hans de Goede <hdegoede@redhat.com>,
Pierre Moreau <pierre.morrow@free.fr>,
linux-kernel@vger.kernel.org,
Ismo Toijala <ismo.toijala@gmail.com>,
intel-gfx@lists.freedesktop.org,
Liviu Dudau <Liviu.Dudau@arm.com>,
Archit Taneja <architt@codeaurora.org>
Subject: Re: [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
Date: Mon, 12 Feb 2018 12:43:20 -0500 [thread overview]
Message-ID: <1518457400.5319.5.camel@redhat.com> (raw)
In-Reply-To: <cover.1518338788.git.lukas@wunner.de>
This patchset is a no-brainer for me. I'm ashamed to say I think I might have
seen this issue before, but polling gets used so rarely nowadays it slipped
under the rug by accident.
Anyway, for the whole series (except patch #2, which I will respond to with a
short comment in just a moment):
Reviewed-by: Lyude Paul <lyude@redhat.com>
On Sun, 2018-02-11 at 10:38 +0100, Lukas Wunner wrote:
> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
>
> DRM drivers poll connectors in 10 sec intervals. The poll worker is
> stopped on ->runtime_suspend with cancel_delayed_work_sync(). However
> the poll worker invokes the DRM drivers' ->detect callbacks, which call
> pm_runtime_get_sync(). If the poll worker starts after runtime suspend
> has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> with the intention of runtime resuming the device afterwards. The result
> is a circular wait between poll worker and autosuspend worker.
>
> I'm seeing this deadlock so often it's not funny anymore. I've also found
> 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
> [4/5].)
>
> One theoretical solution would be to add a flag to the ->detect callback
> to tell it that it's running in the poll worker's context. In that case
> it doesn't need to call pm_runtime_get_sync() because the poll worker is
> only enabled while runtime active and we know that ->runtime_suspend
> waits for it to finish. But this approach would require touching every
> single DRM driver's ->detect hook. Moreover the ->detect hook is called
> from numerous other places, both in the DRM library as well as in each
> driver, making it necessary to trace every possible call chain and check
> if it's coming from the poll worker or not. I've tried to do that for
> nouveau (see the call sites listed in the commit message of patch [3/5])
> and concluded that this approach is a nightmare to implement.
>
> Another reason for the infeasibility of this approach is that ->detect
> is called from within the DRM library without driver involvement, e.g.
> via DRM's sysfs interface. In those cases, pm_runtime_get_sync() would
> have to be called by the DRM library, but the DRM library is supposed to
> stay generic, wheras runtime PM is driver-specific.
>
> So instead, I've come up with this much simpler solution which gleans
> from the current task's flags if it's a workqueue worker, retrieves the
> work_struct and checks if it's the poll worker. All that's needed is
> one small helper in the workqueue code and another small helper in the
> DRM library. This solution lends itself nicely to stable-backporting.
>
> The patches for radeon and amdgpu are compile-tested only, I only have a
> MacBook Pro with an Nvidia GK107 to test. To test the patches, add an
> "msleep(12*1000);" at the top of the driver's ->runtime_suspend hook.
> This ensures that the poll worker runs after ->runtime_suspend has begun.
> Wait 12 sec after the GPU has begun runtime suspend, then check
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status. Without this
> series, the status will be stuck at "suspending" and you'll get hung task
> errors in dmesg after a few minutes.
>
> i915, malidp and msm "solved" this issue by not stopping the poll worker
> on runtime suspend. But this results in the GPU bouncing back and forth
> between D0 and D3 continuously. That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy. (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable. msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend. cc += Archit, Liviu, intel-gfx)
>
> Please review. Thanks,
>
> Lukas
>
> Lukas Wunner (5):
> workqueue: Allow retrieval of current task's work struct
> drm: Allow determining if current task is output poll worker
> drm/nouveau: Fix deadlock on runtime suspend
> drm/radeon: Fix deadlock on runtime suspend
> drm/amdgpu: Fix deadlock on runtime suspend
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
> drivers/gpu/drm/drm_probe_helper.c | 14 +++++
> drivers/gpu/drm/nouveau/nouveau_connector.c | 18 +++++--
> drivers/gpu/drm/radeon/radeon_connectors.c | 74 +++++++++++++++++----
> -----
> include/drm/drm_crtc_helper.h | 1 +
> include/linux/workqueue.h | 1 +
> kernel/workqueue.c | 16 ++++++
> 7 files changed, 132 insertions(+), 50 deletions(-)
>
--
Cheers,
Lyude Paul
next prev parent reply other threads:[~2018-02-12 17:43 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-11 9:38 [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Lukas Wunner
2018-02-11 9:38 ` [PATCH 2/5] drm: Allow determining if current task is output poll worker Lukas Wunner
2018-02-12 17:46 ` Lyude Paul
2018-02-12 17:50 ` [Intel-gfx] " Chris Wilson
2018-02-12 18:40 ` Lukas Wunner
2018-02-14 8:19 ` Lukas Wunner
2018-02-14 7:41 ` [PATCH v2] " Lukas Wunner
2018-02-14 19:07 ` Lyude Paul
2018-02-11 9:38 ` [PATCH 4/5] drm/radeon: Fix deadlock on runtime suspend Lukas Wunner
2018-02-11 9:38 ` [PATCH 1/5] workqueue: Allow retrieval of current task's work struct Lukas Wunner
2018-02-12 17:07 ` Tejun Heo
2018-02-11 9:38 ` [PATCH 5/5] drm/amdgpu: Fix deadlock on runtime suspend Lukas Wunner
2018-02-11 9:38 ` [PATCH 3/5] drm/nouveau: " Lukas Wunner
2018-02-11 18:58 ` [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Mike Lothian
2018-02-11 19:23 ` Lukas Wunner
2018-02-11 19:41 ` Lukas Wunner
2018-02-12 0:35 ` Mike Lothian
2018-02-12 3:39 ` Lukas Wunner
2018-02-12 9:03 ` Mike Lothian
2018-02-12 9:45 ` Lukas Wunner
2018-02-12 18:58 ` Alex Deucher
2018-02-13 8:17 ` Lukas Wunner
2018-02-13 15:19 ` Alex Deucher
2018-02-12 13:04 ` Imre Deak
2018-02-16 8:49 ` i915 runtime PM (was: Re: [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers) Lukas Wunner
2018-02-16 12:33 ` Imre Deak
2018-02-12 17:43 ` Lyude Paul [this message]
[not found] ` <cover.1518338788.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-02-13 10:55 ` [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Liviu Dudau
2018-02-13 11:52 ` Lukas Wunner
2018-02-13 15:46 ` Liviu Dudau
[not found] ` <20180213154608.GI9111-A/Nd4k6kWRHZROr8t4l/smS4ubULX0JqMm0uRHvK7Nw@public.gmane.org>
2018-02-14 13:57 ` Lukas Wunner
2018-02-14 8:46 ` Lukas Wunner
2018-02-14 9:26 ` Maarten Lankhorst
[not found] ` <1ab1ea48-125c-a104-c11d-16d1e9d94b98-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2018-02-14 14:08 ` Sean Paul
2018-02-14 14:43 ` Michel Dänzer
2018-02-14 14:58 ` Sean Paul
2018-02-15 5:38 ` Lukas Wunner
2018-02-19 11:48 ` [Intel-gfx] " Daniel Vetter
2018-02-19 12:22 ` Lukas Wunner
2018-02-17 10:38 ` Lukas Wunner
2018-02-19 11:34 ` [Nouveau] " Daniel Vetter
2018-02-19 11:58 ` Lukas Wunner
2018-02-19 14:05 ` [Intel-gfx] " Daniel Vetter
2018-02-19 14:47 ` Lukas Wunner
2018-02-19 14:54 ` Daniel Vetter
2018-02-19 15:50 ` [Intel-gfx] " Alex Deucher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1518457400.5319.5.camel@redhat.com \
--to=lyude@redhat.com \
--cc=Liviu.Dudau@arm.com \
--cc=airlied@redhat.com \
--cc=alexander.deucher@amd.com \
--cc=architt@codeaurora.org \
--cc=bskeggs@redhat.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hdegoede@redhat.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=ismo.toijala@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=nouveau@lists.freedesktop.org \
--cc=peter@lekensteyn.nl \
--cc=pierre.morrow@free.fr \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox