From: Stephen Boyd <sboyd@kernel.org>
To: Michael Turquette <mturquette@baylibre.com>,
Stephen Boyd <sboyd@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-clk@vger.kernel.org,
patches@lists.linux.dev, linux-arm-msm@vger.kernel.org,
Douglas Anderson <dianders@chromium.org>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Ulf Hansson <ulf.hansson@linaro.org>,
Krzysztof Kozlowski <krzk@kernel.org>
Subject: Re: [PATCH v2 4/5] clk: Get runtime PM before walking tree during disable_unused
Date: Sun, 07 Apr 2024 19:36:49 -0700 [thread overview]
Message-ID: <c1dc0e4c1d4c9ba2b5e9c0fc207db267.sboyd@kernel.org> (raw)
In-Reply-To: <20240325184204.745706-5-sboyd@kernel.org>
Quoting Stephen Boyd (2024-03-25 11:41:58)
> Doug reported [1] the following hung task:
>
> INFO: task swapper/0:1 blocked for more than 122 seconds.
> Not tainted 5.15.149-21875-gf795ebc40eb8 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
> Call trace:
> __switch_to+0xf4/0x1f4
> __schedule+0x418/0xb80
> schedule+0x5c/0x10c
> rpm_resume+0xe0/0x52c
> rpm_resume+0x178/0x52c
> __pm_runtime_resume+0x58/0x98
> clk_pm_runtime_get+0x30/0xb0
> clk_disable_unused_subtree+0x58/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused_subtree+0x38/0x208
> clk_disable_unused+0x4c/0xe4
> do_one_initcall+0xcc/0x2d8
> do_initcall_level+0xa4/0x148
> do_initcalls+0x5c/0x9c
> do_basic_setup+0x24/0x30
> kernel_init_freeable+0xec/0x164
> kernel_init+0x28/0x120
> ret_from_fork+0x10/0x20
> INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
> Not tainted 5.15.149-21875-gf795ebc40eb8 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008
> Workqueue: events_unbound deferred_probe_work_func
> Call trace:
> __switch_to+0xf4/0x1f4
> __schedule+0x418/0xb80
> schedule+0x5c/0x10c
> schedule_preempt_disabled+0x2c/0x48
> __mutex_lock+0x238/0x488
> __mutex_lock_slowpath+0x1c/0x28
> mutex_lock+0x50/0x74
> clk_prepare_lock+0x7c/0x9c
> clk_core_prepare_lock+0x20/0x44
> clk_prepare+0x24/0x30
> clk_bulk_prepare+0x40/0xb0
> mdss_runtime_resume+0x54/0x1c8
> pm_generic_runtime_resume+0x30/0x44
> __genpd_runtime_resume+0x68/0x7c
> genpd_runtime_resume+0x108/0x1f4
> __rpm_callback+0x84/0x144
> rpm_callback+0x30/0x88
> rpm_resume+0x1f4/0x52c
> rpm_resume+0x178/0x52c
> __pm_runtime_resume+0x58/0x98
> __device_attach+0xe0/0x170
> device_initial_probe+0x1c/0x28
> bus_probe_device+0x3c/0x9c
> device_add+0x644/0x814
> mipi_dsi_device_register_full+0xe4/0x170
> devm_mipi_dsi_device_register_full+0x28/0x70
> ti_sn_bridge_probe+0x1dc/0x2c0
> auxiliary_bus_probe+0x4c/0x94
> really_probe+0xcc/0x2c8
> __driver_probe_device+0xa8/0x130
> driver_probe_device+0x48/0x110
> __device_attach_driver+0xa4/0xcc
> bus_for_each_drv+0x8c/0xd8
> __device_attach+0xf8/0x170
> device_initial_probe+0x1c/0x28
> bus_probe_device+0x3c/0x9c
> deferred_probe_work_func+0x9c/0xd8
> process_one_work+0x148/0x518
> worker_thread+0x138/0x350
> kthread+0x138/0x1e0
> ret_from_fork+0x10/0x20
>
> The first thread is walking the clk tree and calling
> clk_pm_runtime_get() to power on devices required to read the clk
> hardware via struct clk_ops::is_enabled(). This thread holds the clk
> prepare_lock, and is trying to runtime PM resume a device, when it finds
> that the device is in the process of resuming so the thread schedule()s
> away waiting for the device to finish resuming before continuing. The
> second thread is runtime PM resuming the same device, but the runtime
> resume callback is calling clk_prepare(), trying to grab the
> prepare_lock waiting on the first thread.
>
> This is a classic ABBA deadlock. To properly fix the deadlock, we must
> never runtime PM resume or suspend a device with the clk prepare_lock
> held. Actually doing that is near impossible today because the global
> prepare_lock would have to be dropped in the middle of the tree, the
> device runtime PM resumed/suspended, and then the prepare_lock grabbed
> again to ensure consistency of the clk tree topology. If anything
> changes with the clk tree in the meantime, we've lost and will need to
> start the operation all over again.
>
> Luckily, most of the time we're simply incrementing or decrementing the
> runtime PM count on an active device, so we don't have the chance to
> schedule away with the prepare_lock held. Let's fix this immediate
> problem that can be triggered more easily by simply booting on Qualcomm
> sc7180.
>
> Introduce a list of clk_core structures that have been registered, or
> are in the process of being registered, that require runtime PM to
> operate. Iterate this list and call clk_pm_runtime_get() on each of them
> without holding the prepare_lock during clk_disable_unused(). This way
> we can be certain that the runtime PM state of the devices will be
> active and resumed so we can't schedule away while walking the clk tree
> with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
> the prepare_lock held to properly drop the runtime PM reference. We
> remove the calls to clk_pm_runtime_{get,put}() in this path because
> they're superfluous now that we know the devices are runtime resumed.
>
> Reported-by: Douglas Anderson <dianders@chromium.org>
> Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1]
> Closes: https://issuetracker.google.com/328070191
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Fixes: 9a34b45397e5 ("clk: Add support for runtime PM")
> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
> ---
Applied to clk-fixes
next prev parent reply other threads:[~2024-04-08 2:36 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-25 18:41 [PATCH v2 0/5] Fix a deadlock with clk_pm_runtime_get() Stephen Boyd
2024-03-25 18:41 ` [PATCH v2 1/5] clk: Remove prepare_lock hold assertion in __clk_release() Stephen Boyd
2024-04-08 2:35 ` Stephen Boyd
2024-03-25 18:41 ` [PATCH v2 2/5] clk: Don't hold prepare_lock when calling kref_put() Stephen Boyd
2024-04-08 2:35 ` Stephen Boyd
2024-03-25 18:41 ` [PATCH v2 3/5] clk: Initialize struct clk_core kref earlier Stephen Boyd
2024-04-08 2:36 ` Stephen Boyd
2024-03-25 18:41 ` [PATCH v2 4/5] clk: Get runtime PM before walking tree during disable_unused Stephen Boyd
2024-03-25 19:39 ` Doug Anderson
2024-04-08 2:36 ` Stephen Boyd [this message]
2024-04-09 10:32 ` Ulf Hansson
2024-04-09 11:55 ` Stephen Boyd
2024-03-25 18:41 ` [PATCH v2 5/5] clk: Get runtime PM before walking tree for clk_summary Stephen Boyd
2024-03-25 19:39 ` Doug Anderson
2024-04-08 2:37 ` Stephen Boyd
2024-03-25 18:44 ` [PATCH v2 0/5] Fix a deadlock with clk_pm_runtime_get() Stephen Boyd
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c1dc0e4c1d4c9ba2b5e9c0fc207db267.sboyd@kernel.org \
--to=sboyd@kernel.org \
--cc=dianders@chromium.org \
--cc=krzk@kernel.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-clk@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=m.szyprowski@samsung.com \
--cc=mturquette@baylibre.com \
--cc=patches@lists.linux.dev \
--cc=ulf.hansson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.