Linux ARM-MSM sub-architecture
 help / color / mirror / Atom feed
From: Konrad Dybcio <konrad.dybcio@linaro.org>
To: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Johan Hovold <johan@kernel.org>,
	linux-arm-msm@vger.kernel.org, andersson@kernel.org,
	agross@kernel.org, marijn.suijten@somainline.org,
	Rob Clark <robdclark@gmail.com>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	Sean Paul <sean@poorly.run>, David Airlie <airlied@gmail.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Johan Hovold <johan+linaro@kernel.org>,
	Akhil P Oommen <quic_akhilpo@quicinc.com>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>,
	Nathan Chancellor <nathan@kernel.org>,
	dri-devel@lists.freedesktop.org, freedreno@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/msm/adreno: adreno_gpu: Use suspend() instead of idle() on load error
Date: Thu, 30 Mar 2023 16:34:41 +0200	[thread overview]
Message-ID: <b985e536-227d-df86-0f23-b58882e9654b@linaro.org> (raw)
In-Reply-To: <CAA8EJpohEo+kMw7fx5112m+z7JHSLDmsqOL4T7hmyvr2fPP8vQ@mail.gmail.com>



On 29.03.2023 21:45, Dmitry Baryshkov wrote:
> On Wed, 29 Mar 2023 at 18:48, Konrad Dybcio <konrad.dybcio@linaro.org> wrote:
>>
>>
>>
>> On 29.03.2023 16:37, Johan Hovold wrote:
>>> On Wed, Mar 29, 2023 at 04:04:44PM +0200, Konrad Dybcio wrote:
>>>> If we fail to initialize the GPU for whatever reason (say we don't
>>>> embed the GPU firmware files in the initrd), the error path involves
>>>> pm_runtime_put_sync() which then calls idle() instead of suspend().
>>>>
>>>> This is suboptimal, as it means that we're not going through the
>>>> clean shutdown sequence. With at least A619_holi, this makes the GPU
>>>> not wake up until it goes through at least one more start-fail-stop
>>>> cycle. Fix that by using pm_runtime_put_sync_suspend to force a clean
>>>> shutdown.
>>>
>>> This does not sound right. If pm_runtime_put_sync() fails to suspend the
>>> device when the usage count drops to zero, then you have a bug somewhere
>>> else.
>> I was surprised to see that it was not called as well, but I wasn't able
>> to track it down before..
> 
> Could you please check that it's autosuspend who kicks in? In other
> words, if we disable autosuspend, the pm_runtime_put_sync is enough()?
> 
> That would probably mean that we lack some kind of reset in the hw_init path.
> 
> On the other hand, I do not know how the device will react to the
> error-in-the-middle state. Modems for example, can enter the state
> where you can not properly turn it off once it starts the boot
> process.
> 
> And if we remember the efforts that Akhil has put into making sure
> that the GPU is properly reset in case of an _error_, it might be
> nearly impossible to shut it down in a proper way.
> 
> Thus said, I think that unless there is an obvious way to restart the
> init process, Korad's pm_runtime_put_sync_suspend() looks like a
> correct fix to me.
On the GPU side, when you cut GX and CX GDSCs, the hardware is off.
Some clock / gdsc logic may be retained, but the GPU itself gets
cut off. Parking the clocks and shuttting down VDD_GX (if exists)
only makes that stronger.

> 
>>> Also since commit 2c087a336676 ("drm/msm/adreno: Load the firmware
>>> before bringing up the hardware") the firmware is loaded before even
>>> hitting these paths so the above description does not sound right in
>>> that respect either (or is missing some details).
>> ..but I did some more digging and I found that the precise "firmware"
>> that fails is the ZAP blob, which is not checked like SQE in the
>> commit you mentioned!
>>
>> Now I don't think that we can easily check for it as-is since
>> zap_shader_load_mdt() does the entire find-load-authenticate
>> dance which is required with secure assets, but it's obviously
>> possible to rip out the find-load part of that and go on from
>> there.
> 
> Yes, I think we should load all firmware early. ZAP shader is a bit
> unique since the DT can override the name, but it might be nice to
> check for its presence earlier.
> 
> At the same time it probably should not stop us from fixing the idle()
> vs suspend() bug.
I'm open to both solutions, as long as it can unblock me from
resubmitting the (hopefully) final version of GMU wrapper!

Konrad
> 
>>
>> Do you think that would be a better solution?
>>
>> Konrad
>>
>>>
>>>> Test cases:
>>>> 1. firmware baked into kernel
>>>> 2. error loading fw in initrd -> load from rootfs at DE start
>>>>
>>>> Both succeed on A619_holi (SM6375) and A630 (SDM845).
>>>>
>>>> Fixes: 0d997f95b70f ("drm/msm/adreno: fix runtime PM imbalance at gpu load")
>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>>> ---
>>>>  drivers/gpu/drm/msm/adreno/adreno_device.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>>> index f61896629be6..59f3302e8167 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>>> @@ -477,7 +477,7 @@ struct msm_gpu *adreno_load_gpu(struct drm_device *dev)
>>>>      return gpu;
>>>>
>>>>  err_put_rpm:
>>>> -    pm_runtime_put_sync(&pdev->dev);
>>>> +    pm_runtime_put_sync_suspend(&pdev->dev);
>>>>  err_disable_rpm:
>>>>      pm_runtime_disable(&pdev->dev);
>>>
>>> Johan
> 
> 
> 

  reply	other threads:[~2023-03-30 14:34 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-29 14:04 [PATCH] drm/msm/adreno: adreno_gpu: Use suspend() instead of idle() on load error Konrad Dybcio
2023-03-29 14:37 ` Johan Hovold
2023-03-29 15:48   ` Konrad Dybcio
2023-03-29 17:30     ` Rob Clark
2023-03-29 17:31       ` Konrad Dybcio
2023-03-29 19:45     ` Dmitry Baryshkov
2023-03-30 14:34       ` Konrad Dybcio [this message]
2023-03-30 14:52         ` Dmitry Baryshkov
2023-03-30 14:49       ` Johan Hovold

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b985e536-227d-df86-0f23-b58882e9654b@linaro.org \
    --to=konrad.dybcio@linaro.org \
    --cc=agross@kernel.org \
    --cc=airlied@gmail.com \
    --cc=andersson@kernel.org \
    --cc=daniel@ffwll.ch \
    --cc=dmitry.baryshkov@linaro.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=joel@joelfernandes.org \
    --cc=johan+linaro@kernel.org \
    --cc=johan@kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marijn.suijten@somainline.org \
    --cc=nathan@kernel.org \
    --cc=quic_abhinavk@quicinc.com \
    --cc=quic_akhilpo@quicinc.com \
    --cc=robdclark@gmail.com \
    --cc=sean@poorly.run \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox