* [PATCH] drm/msm/gpu: Fix crash when throttling GPU immediately during boot
@ 2025-04-29 8:33 Stephan Gerhold
2025-04-29 15:54 ` Doug Anderson
2025-04-30 11:18 ` Konrad Dybcio
0 siblings, 2 replies; 3+ messages in thread
From: Stephan Gerhold @ 2025-04-29 8:33 UTC (permalink / raw)
To: Rob Clark
Cc: Sean Paul, Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov,
Marijn Suijten, David Airlie, Simona Vetter, Douglas Anderson,
linux-arm-msm, dri-devel, freedreno, linux-kernel, Johan Hovold
There is a small chance that the GPU is already hot during boot. In that
case, the call to of_devfreq_cooling_register() will immediately try to
apply devfreq cooling, as seen in the following crash:
Unable to handle kernel paging request at virtual address 0000000000014110
pc : a6xx_gpu_busy+0x1c/0x58 [msm]
lr : msm_devfreq_get_dev_status+0xbc/0x140 [msm]
Call trace:
a6xx_gpu_busy+0x1c/0x58 [msm] (P)
devfreq_simple_ondemand_func+0x3c/0x150
devfreq_update_target+0x44/0xd8
qos_max_notifier_call+0x30/0x84
blocking_notifier_call_chain+0x6c/0xa0
pm_qos_update_target+0xd0/0x110
freq_qos_apply+0x3c/0x74
apply_constraint+0x88/0x148
__dev_pm_qos_update_request+0x7c/0xcc
dev_pm_qos_update_request+0x38/0x5c
devfreq_cooling_set_cur_state+0x98/0xf0
__thermal_cdev_update+0x64/0xb4
thermal_cdev_update+0x4c/0x58
step_wise_manage+0x1f0/0x318
__thermal_zone_device_update+0x278/0x424
__thermal_cooling_device_register+0x2bc/0x308
thermal_of_cooling_device_register+0x10/0x1c
of_devfreq_cooling_register_power+0x240/0x2bc
of_devfreq_cooling_register+0x14/0x20
msm_devfreq_init+0xc4/0x1a0 [msm]
msm_gpu_init+0x304/0x574 [msm]
adreno_gpu_init+0x1c4/0x2e0 [msm]
a6xx_gpu_init+0x5c8/0x9c8 [msm]
adreno_bind+0x2a8/0x33c [msm]
...
At this point we haven't initialized the GMU at all yet, so we cannot read
the GMU registers inside a6xx_gpu_busy(). A similar issue was fixed before
in commit 6694482a70e9 ("drm/msm: Avoid unclocked GMU register access in
6xx gpu_busy"): msm_devfreq_init() does call devfreq_suspend_device(), but
unlike msm_devfreq_suspend(), it doesn't set the df->suspended flag
accordingly. This means the df->suspended flag does not match the actual
devfreq state after initialization and msm_devfreq_get_dev_status() will
end up accessing GMU registers, causing the crash.
Fix this by setting df->suspended correctly during initialization.
Cc: stable@vger.kernel.org
Fixes: 6694482a70e9 ("drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
---
drivers/gpu/drm/msm/msm_gpu_devfreq.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 6970b0f7f457c8535ecfeaa705db871594ae5fc4..2e1d5c3432728cde15d91f69da22bb915588fe86 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -156,6 +156,7 @@ void msm_devfreq_init(struct msm_gpu *gpu)
priv->gpu_devfreq_config.downdifferential = 10;
mutex_init(&df->lock);
+ df->suspended = true;
ret = dev_pm_qos_add_request(&gpu->pdev->dev, &df->boost_freq,
DEV_PM_QOS_MIN_FREQUENCY, 0);
---
base-commit: 33035b665157558254b3c21c3f049fd728e72368
change-id: 20250428-drm-msm-gpu-hot-devfreq-boot-36184dbc7075
Best regards,
--
Stephan Gerhold <stephan.gerhold@linaro.org>
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] drm/msm/gpu: Fix crash when throttling GPU immediately during boot
2025-04-29 8:33 [PATCH] drm/msm/gpu: Fix crash when throttling GPU immediately during boot Stephan Gerhold
@ 2025-04-29 15:54 ` Doug Anderson
2025-04-30 11:18 ` Konrad Dybcio
1 sibling, 0 replies; 3+ messages in thread
From: Doug Anderson @ 2025-04-29 15:54 UTC (permalink / raw)
To: Stephan Gerhold
Cc: Rob Clark, Sean Paul, Konrad Dybcio, Abhinav Kumar,
Dmitry Baryshkov, Marijn Suijten, David Airlie, Simona Vetter,
linux-arm-msm, dri-devel, freedreno, linux-kernel, Johan Hovold
Hi,
On Tue, Apr 29, 2025 at 1:34 AM Stephan Gerhold
<stephan.gerhold@linaro.org> wrote:
>
> There is a small chance that the GPU is already hot during boot. In that
> case, the call to of_devfreq_cooling_register() will immediately try to
> apply devfreq cooling, as seen in the following crash:
>
> Unable to handle kernel paging request at virtual address 0000000000014110
> pc : a6xx_gpu_busy+0x1c/0x58 [msm]
> lr : msm_devfreq_get_dev_status+0xbc/0x140 [msm]
> Call trace:
> a6xx_gpu_busy+0x1c/0x58 [msm] (P)
> devfreq_simple_ondemand_func+0x3c/0x150
> devfreq_update_target+0x44/0xd8
> qos_max_notifier_call+0x30/0x84
> blocking_notifier_call_chain+0x6c/0xa0
> pm_qos_update_target+0xd0/0x110
> freq_qos_apply+0x3c/0x74
> apply_constraint+0x88/0x148
> __dev_pm_qos_update_request+0x7c/0xcc
> dev_pm_qos_update_request+0x38/0x5c
> devfreq_cooling_set_cur_state+0x98/0xf0
> __thermal_cdev_update+0x64/0xb4
> thermal_cdev_update+0x4c/0x58
> step_wise_manage+0x1f0/0x318
> __thermal_zone_device_update+0x278/0x424
> __thermal_cooling_device_register+0x2bc/0x308
> thermal_of_cooling_device_register+0x10/0x1c
> of_devfreq_cooling_register_power+0x240/0x2bc
> of_devfreq_cooling_register+0x14/0x20
> msm_devfreq_init+0xc4/0x1a0 [msm]
> msm_gpu_init+0x304/0x574 [msm]
> adreno_gpu_init+0x1c4/0x2e0 [msm]
> a6xx_gpu_init+0x5c8/0x9c8 [msm]
> adreno_bind+0x2a8/0x33c [msm]
> ...
>
> At this point we haven't initialized the GMU at all yet, so we cannot read
> the GMU registers inside a6xx_gpu_busy(). A similar issue was fixed before
> in commit 6694482a70e9 ("drm/msm: Avoid unclocked GMU register access in
> 6xx gpu_busy"): msm_devfreq_init() does call devfreq_suspend_device(), but
> unlike msm_devfreq_suspend(), it doesn't set the df->suspended flag
> accordingly. This means the df->suspended flag does not match the actual
> devfreq state after initialization and msm_devfreq_get_dev_status() will
> end up accessing GMU registers, causing the crash.
>
> Fix this by setting df->suspended correctly during initialization.
>
> Cc: stable@vger.kernel.org
> Fixes: 6694482a70e9 ("drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy")
> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> ---
> drivers/gpu/drm/msm/msm_gpu_devfreq.c | 1 +
> 1 file changed, 1 insertion(+)
Thanks!
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] drm/msm/gpu: Fix crash when throttling GPU immediately during boot
2025-04-29 8:33 [PATCH] drm/msm/gpu: Fix crash when throttling GPU immediately during boot Stephan Gerhold
2025-04-29 15:54 ` Doug Anderson
@ 2025-04-30 11:18 ` Konrad Dybcio
1 sibling, 0 replies; 3+ messages in thread
From: Konrad Dybcio @ 2025-04-30 11:18 UTC (permalink / raw)
To: Stephan Gerhold, Rob Clark
Cc: Sean Paul, Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov,
Marijn Suijten, David Airlie, Simona Vetter, Douglas Anderson,
linux-arm-msm, dri-devel, freedreno, linux-kernel, Johan Hovold
On 4/29/25 10:33 AM, Stephan Gerhold wrote:
> There is a small chance that the GPU is already hot during boot. In that
> case, the call to of_devfreq_cooling_register() will immediately try to
> apply devfreq cooling, as seen in the following crash:
>
> Unable to handle kernel paging request at virtual address 0000000000014110
> pc : a6xx_gpu_busy+0x1c/0x58 [msm]
> lr : msm_devfreq_get_dev_status+0xbc/0x140 [msm]
> Call trace:
> a6xx_gpu_busy+0x1c/0x58 [msm] (P)
> devfreq_simple_ondemand_func+0x3c/0x150
> devfreq_update_target+0x44/0xd8
> qos_max_notifier_call+0x30/0x84
> blocking_notifier_call_chain+0x6c/0xa0
> pm_qos_update_target+0xd0/0x110
> freq_qos_apply+0x3c/0x74
> apply_constraint+0x88/0x148
> __dev_pm_qos_update_request+0x7c/0xcc
> dev_pm_qos_update_request+0x38/0x5c
> devfreq_cooling_set_cur_state+0x98/0xf0
> __thermal_cdev_update+0x64/0xb4
> thermal_cdev_update+0x4c/0x58
> step_wise_manage+0x1f0/0x318
> __thermal_zone_device_update+0x278/0x424
> __thermal_cooling_device_register+0x2bc/0x308
> thermal_of_cooling_device_register+0x10/0x1c
> of_devfreq_cooling_register_power+0x240/0x2bc
> of_devfreq_cooling_register+0x14/0x20
> msm_devfreq_init+0xc4/0x1a0 [msm]
> msm_gpu_init+0x304/0x574 [msm]
> adreno_gpu_init+0x1c4/0x2e0 [msm]
> a6xx_gpu_init+0x5c8/0x9c8 [msm]
> adreno_bind+0x2a8/0x33c [msm]
> ...
>
> At this point we haven't initialized the GMU at all yet, so we cannot read
> the GMU registers inside a6xx_gpu_busy(). A similar issue was fixed before
> in commit 6694482a70e9 ("drm/msm: Avoid unclocked GMU register access in
> 6xx gpu_busy"): msm_devfreq_init() does call devfreq_suspend_device(), but
> unlike msm_devfreq_suspend(), it doesn't set the df->suspended flag
> accordingly. This means the df->suspended flag does not match the actual
> devfreq state after initialization and msm_devfreq_get_dev_status() will
> end up accessing GMU registers, causing the crash.
>
> Fix this by setting df->suspended correctly during initialization.
>
> Cc: stable@vger.kernel.org
> Fixes: 6694482a70e9 ("drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy")
> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-04-30 11:18 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-29 8:33 [PATCH] drm/msm/gpu: Fix crash when throttling GPU immediately during boot Stephan Gerhold
2025-04-29 15:54 ` Doug Anderson
2025-04-30 11:18 ` Konrad Dybcio
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).