From: Lyude Paul <lyude@redhat.com>
To: Lukas Wunner <lukas@wunner.de>, Tejun Heo <tj@kernel.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Alex Deucher <alexander.deucher@amd.com>,
Dave Airlie <airlied@redhat.com>, Ben Skeggs <bskeggs@redhat.com>
Cc: dri-devel@lists.freedesktop.org, Peter Wu <peter@lekensteyn.nl>,
nouveau@lists.freedesktop.org,
Hans de Goede <hdegoede@redhat.com>,
Pierre Moreau <pierre.morrow@free.fr>,
linux-kernel@vger.kernel.org,
Ismo Toijala <ismo.toijala@gmail.com>,
intel-gfx@lists.freedesktop.org,
Liviu Dudau <Liviu.Dudau@arm.com>,
Archit Taneja <architt@codeaurora.org>
Subject: Re: [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
Date: Mon, 12 Feb 2018 12:43:20 -0500 [thread overview]
Message-ID: <1518457400.5319.5.camel@redhat.com> (raw)
In-Reply-To: <cover.1518338788.git.lukas@wunner.de>
This patchset is a no-brainer for me. I'm ashamed to say I think I might have
seen this issue before, but polling gets used so rarely nowadays it slipped
under the rug by accident.
Anyway, for the whole series (except patch #2, which I will respond to with a
short comment in just a moment):
Reviewed-by: Lyude Paul <lyude@redhat.com>
On Sun, 2018-02-11 at 10:38 +0100, Lukas Wunner wrote:
> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
>
> DRM drivers poll connectors in 10 sec intervals. The poll worker is
> stopped on ->runtime_suspend with cancel_delayed_work_sync(). However
> the poll worker invokes the DRM drivers' ->detect callbacks, which call
> pm_runtime_get_sync(). If the poll worker starts after runtime suspend
> has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> with the intention of runtime resuming the device afterwards. The result
> is a circular wait between poll worker and autosuspend worker.
>
> I'm seeing this deadlock so often it's not funny anymore. I've also found
> 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
> [4/5].)
>
> One theoretical solution would be to add a flag to the ->detect callback
> to tell it that it's running in the poll worker's context. In that case
> it doesn't need to call pm_runtime_get_sync() because the poll worker is
> only enabled while runtime active and we know that ->runtime_suspend
> waits for it to finish. But this approach would require touching every
> single DRM driver's ->detect hook. Moreover the ->detect hook is called
> from numerous other places, both in the DRM library as well as in each
> driver, making it necessary to trace every possible call chain and check
> if it's coming from the poll worker or not. I've tried to do that for
> nouveau (see the call sites listed in the commit message of patch [3/5])
> and concluded that this approach is a nightmare to implement.
>
> Another reason for the infeasibility of this approach is that ->detect
> is called from within the DRM library without driver involvement, e.g.
> via DRM's sysfs interface. In those cases, pm_runtime_get_sync() would
> have to be called by the DRM library, but the DRM library is supposed to
> stay generic, wheras runtime PM is driver-specific.
>
> So instead, I've come up with this much simpler solution which gleans
> from the current task's flags if it's a workqueue worker, retrieves the
> work_struct and checks if it's the poll worker. All that's needed is
> one small helper in the workqueue code and another small helper in the
> DRM library. This solution lends itself nicely to stable-backporting.
>
> The patches for radeon and amdgpu are compile-tested only, I only have a
> MacBook Pro with an Nvidia GK107 to test. To test the patches, add an
> "msleep(12*1000);" at the top of the driver's ->runtime_suspend hook.
> This ensures that the poll worker runs after ->runtime_suspend has begun.
> Wait 12 sec after the GPU has begun runtime suspend, then check
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status. Without this
> series, the status will be stuck at "suspending" and you'll get hung task
> errors in dmesg after a few minutes.
>
> i915, malidp and msm "solved" this issue by not stopping the poll worker
> on runtime suspend. But this results in the GPU bouncing back and forth
> between D0 and D3 continuously. That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy. (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable. msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend. cc += Archit, Liviu, intel-gfx)
>
> Please review. Thanks,
>
> Lukas
>
> Lukas Wunner (5):
> workqueue: Allow retrieval of current task's work struct
> drm: Allow determining if current task is output poll worker
> drm/nouveau: Fix deadlock on runtime suspend
> drm/radeon: Fix deadlock on runtime suspend
> drm/amdgpu: Fix deadlock on runtime suspend
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
> drivers/gpu/drm/drm_probe_helper.c | 14 +++++
> drivers/gpu/drm/nouveau/nouveau_connector.c | 18 +++++--
> drivers/gpu/drm/radeon/radeon_connectors.c | 74 +++++++++++++++++----
> -----
> include/drm/drm_crtc_helper.h | 1 +
> include/linux/workqueue.h | 1 +
> kernel/workqueue.c | 16 ++++++
> 7 files changed, 132 insertions(+), 50 deletions(-)
>
--
Cheers,
Lyude Paul
next prev parent reply other threads:[~2018-02-12 17:43 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-11 9:38 [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Lukas Wunner
2018-02-11 9:38 ` [PATCH 2/5] drm: Allow determining if current task is output poll worker Lukas Wunner
2018-02-12 17:46 ` Lyude Paul
2018-02-12 17:46 ` Lyude Paul
2018-02-12 17:50 ` [Intel-gfx] " Chris Wilson
2018-02-12 17:50 ` Chris Wilson
2018-02-12 18:40 ` Lukas Wunner
2018-02-14 8:19 ` Lukas Wunner
2018-02-14 7:41 ` [PATCH v2] " Lukas Wunner
2018-02-14 19:07 ` Lyude Paul
2018-02-11 9:38 ` [PATCH 1/5] workqueue: Allow retrieval of current task's work struct Lukas Wunner
2018-02-12 17:07 ` Tejun Heo
2018-02-12 17:07 ` Tejun Heo
2018-02-11 9:38 ` [PATCH 5/5] drm/amdgpu: Fix deadlock on runtime suspend Lukas Wunner
2018-02-11 9:38 ` [PATCH 3/5] drm/nouveau: " Lukas Wunner
2018-02-11 9:38 ` [PATCH 4/5] drm/radeon: " Lukas Wunner
2018-02-11 18:58 ` [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Mike Lothian
2018-02-11 18:58 ` Mike Lothian
2018-02-11 19:23 ` Lukas Wunner
2018-02-11 19:41 ` Lukas Wunner
2018-02-11 19:41 ` Lukas Wunner
2018-02-12 0:35 ` Mike Lothian
2018-02-12 0:35 ` Mike Lothian
2018-02-12 3:39 ` Lukas Wunner
2018-02-12 3:39 ` Lukas Wunner
2018-02-12 9:03 ` Mike Lothian
2018-02-12 9:03 ` Mike Lothian
2018-02-12 9:45 ` Lukas Wunner
2018-02-12 9:45 ` Lukas Wunner
2018-02-12 18:58 ` Alex Deucher
2018-02-12 18:58 ` Alex Deucher
2018-02-13 8:17 ` Lukas Wunner
2018-02-13 8:17 ` Lukas Wunner
2018-02-13 15:19 ` Alex Deucher
2018-02-12 13:04 ` Imre Deak
2018-02-12 13:04 ` Imre Deak
2018-02-16 8:49 ` i915 runtime PM (was: Re: [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers) Lukas Wunner
2018-02-16 12:33 ` Imre Deak
2018-02-12 17:43 ` Lyude Paul [this message]
[not found] ` <cover.1518338788.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-02-13 10:55 ` [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Liviu Dudau
2018-02-13 10:55 ` Liviu Dudau
2018-02-13 11:52 ` Lukas Wunner
2018-02-13 15:46 ` Liviu Dudau
2018-02-13 15:46 ` Liviu Dudau
[not found] ` <20180213154608.GI9111-A/Nd4k6kWRHZROr8t4l/smS4ubULX0JqMm0uRHvK7Nw@public.gmane.org>
2018-02-14 13:57 ` Lukas Wunner
2018-02-14 13:57 ` Lukas Wunner
2018-02-14 8:46 ` Lukas Wunner
2018-02-14 9:26 ` Maarten Lankhorst
[not found] ` <1ab1ea48-125c-a104-c11d-16d1e9d94b98-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2018-02-14 14:08 ` Sean Paul
2018-02-14 14:43 ` Michel Dänzer
2018-02-14 14:58 ` Sean Paul
2018-02-15 5:38 ` Lukas Wunner
2018-02-19 11:48 ` [Intel-gfx] " Daniel Vetter
2018-02-19 12:22 ` Lukas Wunner
2018-02-17 10:38 ` Lukas Wunner
2018-02-17 10:38 ` Lukas Wunner
2018-02-19 11:34 ` [Nouveau] " Daniel Vetter
2018-02-19 11:34 ` Daniel Vetter
2018-02-19 11:58 ` Lukas Wunner
2018-02-19 11:58 ` Lukas Wunner
2018-02-19 14:05 ` [Intel-gfx] " Daniel Vetter
2018-02-19 14:05 ` Daniel Vetter
2018-02-19 14:47 ` Lukas Wunner
2018-02-19 14:47 ` Lukas Wunner
2018-02-19 14:54 ` Daniel Vetter
2018-02-19 14:54 ` [Intel-gfx] " Daniel Vetter
2018-02-19 15:50 ` Alex Deucher
2018-02-19 15:50 ` Alex Deucher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1518457400.5319.5.camel@redhat.com \
--to=lyude@redhat.com \
--cc=Liviu.Dudau@arm.com \
--cc=airlied@redhat.com \
--cc=alexander.deucher@amd.com \
--cc=architt@codeaurora.org \
--cc=bskeggs@redhat.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hdegoede@redhat.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=ismo.toijala@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=nouveau@lists.freedesktop.org \
--cc=peter@lekensteyn.nl \
--cc=pierre.morrow@free.fr \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.