Linux ARM-MSM sub-architecture
 help / color / mirror / Atom feed
From: Akhil P Oommen <akhilpo@oss.qualcomm.com>
To: Aaron Kling <webgeek1234@gmail.com>
Cc: rob.clark@oss.qualcomm.com,
	Neil Armstrong <neil.armstrong@linaro.org>,
	linux-arm-msm@vger.kernel.org
Subject: Re: Questions About SM8550 Support
Date: Thu, 5 Feb 2026 18:59:20 +0530	[thread overview]
Message-ID: <593afb89-031d-4376-8ea7-024b645c62cb@oss.qualcomm.com> (raw)
In-Reply-To: <CALHNRZ_=c0JZ4B779rCciP+_U+YMqEbby1F5RaeyUTZiNZdc2Q@mail.gmail.com>

On 2/5/2026 1:31 PM, Aaron Kling wrote:
> On Thu, Jan 29, 2026 at 8:35 PM Aaron Kling <webgeek1234@gmail.com> wrote:
>>
>> On Thu, Jan 29, 2026 at 5:11 PM Akhil P Oommen <akhilpo@oss.qualcomm.com> wrote:
>>>
>>> On 1/28/2026 11:24 PM, Aaron Kling wrote:
>>>> On Wed, Jan 28, 2026 at 8:46 AM Rob Clark <rob.clark@oss.qualcomm.com> wrote:
>>>>>
>>>>> On Wed, Jan 28, 2026 at 12:54 AM Neil Armstrong
>>>>> <neil.armstrong@linaro.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 1/27/26 23:48, Aaron Kling wrote:
>>>>>>> I am working on the AYN Odin 2 qcs8550 series of devices, specifically
>>>>>>> for Android, using mainline kernel drivers. I have come across some
>>>>>>> missing functionality and failures that I would like to inquire about.
>>>>>>>
>>>>>>> * ABL fails to load a dtbo using a baseline dtb unmodified from
>>>>>>> mainline. Using changes described in the gunyah watchdog thread [0], a
>>>>>>> dtbo loads and the devices boot as expected. If any of the changes in
>>>>>>> that post don't exist in the base dtb, abl will fail to load the dtbo
>>>>>>> and go to the bootloader menu. This appears to be an issue in the
>>>>>>> baseline abl code, affecting all devices of that generation. Would it
>>>>>>> be allowable to merge a change adding those changes to the sm8550
>>>>>>> dtsi, allowing an unmodified mainline dtb to work with overlays?
>>>>>>
>>>>>> Any addition to the DT must be documented in dt-bindings, so if it's needed
>>>>>> for boot they should be documented and added for sure.
>>>>>>
>>>>>>>
>>>>>>> * SM8550 does not have cpu opp tables, thus cpufreq does not work. I
>>>>>>> have locally copied the commits from sm8650 and adapted for sm8550,
>>>>>>> and that seems to work okay. But no measuring of bandwidth was done,
>>>>>>> so the numbers are likely not entirely correct. Is there any plan to
>>>>>>> generate correct tables for sm8550?
>>>>>>
>>>>>> Cpufreq works but not the interconnect scaling, so doing the same as sm8650
>>>>>> is fine but since the values were calculated from downstream DT tables,
>>>>>> the same should be done for sm8550.
>>>>>>
>>>>>>>
>>>>>>> * As part of a series to support the original Odin 2, a patch to
>>>>>>> update sm8550 EAS values was submitted [1]. But that series stalled
>>>>>>> and this was never merged. If this change is valid, which per that
>>>>>>> discussion it appears to be, can it be resubmitted by itself and
>>>>>>> merged?
>>>>>>
>>>>>> I missed this patch, please re-submit, I also need to update the ones
>>>>>> for SM8650.
>>>>>>
>>>>>>>
>>>>>>> * Per the mainline kernel device trees and audio topology provide by
>>>>>>> the oem, these devices use primary i2s for the speakers path. There
>>>>>>> was a commit adding clock support for that as part of an hdmi series
>>>>>>> [2], but that seems to have stalled. Is this going to be picked back
>>>>>>> up?
>>>>>>
>>>>>> No, I do not plan to do this work, it required adding callbacks in the
>>>>>> code to handle the clocks like done for the pre-audioreach firmwares.
>>>>>>
>>>>>>>
>>>>>>> * Inline crypto fails to detect hwkm support. And I see other logs
>>>>>>> online, such as for the sm8550 qrd, that logs the same way my device
>>>>>>> does. I traced the issue to the check for wrapped key support [3]. On
>>>>>>> my devices, the derive call is supported, but the other three calls
>>>>>>> are not. I was pointed at the downstream headers for sm8550 support
>>>>>>> and only derive is listed there, the other three don't appear to be
>>>>>>> used in the downstream driver. Is this expected? And if so, will this
>>>>>>> case be added to the mainline drivers?
>>>>>>
>>>>>> Does hwkm work with you remove the last 3 calls ?
>>>>>>
>>>>>>>
>>>>>>> * Some gpu related clocks complain about being stuck off during boot,
>>>>>>> causing stack traces, but the gpu does work. I tried to do some
>>>>>>> research into this, but quickly got lost in the weeds and I have no
>>>>>>> idea where to even look.
>>>>>>> [    0.367278] gpu_cc_cxo_clk status stuck at 'off'
>>>>>>> [    0.367962] gpu_cc_hub_cx_int_clk status stuck at 'off'
>>>>>>> [    0.368595] gpu_cc_cx_gmu_clk status stuck at 'off'
>>>>>>> [    0.369245] disp_cc_mdss_ahb1_clk status stuck at 'off'
>>>>>>
>>>>>> This may be related with the display handoff from ABL, did you add the
>>>>>> plat region to the reserved memories ?
>>>>>>
>>>>>>>
>>>>>>> * Sometimes when starting rendering, a bandwidth submission times out,
>>>>>>> then the driver immediately complains that said id was left on the
>>>>>>> queue. I have tried increasing the timeout, but the same sequence
>>>>>>> still happens. Timeout happens, immediately followed by a matching
>>>>>>> unexpected response. Implying that this isn't actually a delay /
>>>>>>> timeout issue.
>>>>>>> [ 1848.517020] platform 3d6a000.gmu:
>>>>>>> [drm:a6xx_hfi_wait_for_msg_interrupt [msm]] *ERROR* Message
>>>>>>> HFI_H2F_MSG_GX_BW_PERF_VOTE id 1015 timed out waiting for response
>>>>>>> [ 1848.518020] platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg [msm]]
>>>>>>> *ERROR* Unexpected message id 1015 on the response queue
>>>>>>
>>>>>> Weird the timeout was extended for this very purpose
>>>>>>
>>>>>>>
>>>>>>> * Some 3dmark benchmarks such as solar bay cause a gpu crash. I am
>>>>>>> unsure if this is a kernel problem or userspace, so I'm submitting
>>>>>>> here first. If the consensus is that it's a userspace issue, I'll
>>>>>>> submit it to mesa.
>>>>>>> [ 1860.112008] adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu
>>>>>>> fault ring 2 fence a261 status 00EF0585 rb 06df/090f ib1
>>>>>>> 00000001512E9000/003d ib2 00000001512E7000/0000
>>>>>>> [ 1860.113122] msm_dpu ae01000.display-controller: [drm:recover_worker
>>>>>>> [msm]] *ERROR* 67.5.10.1: hangcheck recover!
>>>>>>> [ 1860.113238] msm_dpu ae01000.display-controller: [drm:recover_worker
>>>>>>> [msm]] *ERROR* 67.5.10.1: offending task: Thread-23
>>>>>>> (com.futuremark.dmandroid.application)
>>>>>>> [ 1860.258126] revision: 0 (67.5.10.1)
>>>>>>> [ 1860.258132] rb 0: fence:    2884/2884
>>>>>>> [ 1860.258133] rptr:     36
>>>>>>> [ 1860.258134] rb wptr:  36
>>>>>>> [ 1860.258135] rb 1: fence:    -256/-256
>>>>>>> [ 1860.258138] rptr:     0
>>>>>>> [ 1860.258138] rb wptr:  0
>>>>>>> [ 1860.258139] rb 2: fence:    41563/41569
>>>>>>> [ 1860.258140] rptr:     1752
>>>>>>> [ 1860.258140] rb wptr:  2319
>>>>>>> [ 1860.258141] rb 3: fence:    -256/-256
>>>>>>> [ 1860.258141] rptr:     0
>>>>>>> [ 1860.258142] rb wptr:  0
>>>>>>> [ 1860.258146] adreno 3d00000.gpu: [drm:a6xx_recover [msm]] CP_SCRATCH_REG0: 0
>>>>>>> [ 1860.258220] adreno 3d00000.gpu: [drm:a6xx_recover [msm]] CP_SCRATCH_REG1: 0
>>>>>>> [ 1860.258266] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>> CP_SCRATCH_REG2: 41562
>>>>>>> [ 1860.258310] adreno 3d00000.gpu: [drm:a6xx_recover [msm]] CP_SCRATCH_REG3: 0
>>>>>>> [ 1860.258354] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>> CP_SCRATCH_REG4: 3736059565
>>>>>>> [ 1860.258399] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>> CP_SCRATCH_REG5: 3736059565
>>>>>>> [ 1860.258443] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>> CP_SCRATCH_REG6: 3736059565
>>>>>>> [ 1860.258487] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>> CP_SCRATCH_REG7: 3736059565
>>>>>>
>>>>>> @rob do you have any idea how to solve this crash on a740 ?
>>>>>
>>>>> The clk and a6xx_hfi_wait_for_msg_interrupt errors indicate that
>>>>> something is unhappy about gpu pm.  I'd focus on that first, since
>>>>> that is almost certainly the cause of the later issues.  If things
>>>>> _sorta_ work (rendering UI, etc) you could try removing all but the
>>>>> lowest gpu OPP as an experiment.  Could be that power related problems
>>>>> surface when the GPU ramps up to higher OPPs.
>>>>
>>>> Things work amazingly well compared to what I was expecting. Using
>>>> mesa staging 26.0 as of yesterday, I'm getting roughly 80% performance
>>>> in the benchmarks that do run, compared to the stock Android. And
>>>> rendering is correct everywhere that I've seen so far. Mesa 25.3.3
>>>> gives about 89% compared to stock, but there are graphical glitches in
>>>> some of the benchmarks.
>>>>
>>>> I set gpu max_freq via devfreq to the minimum available frequency and
>>>> ran the failing benchmark again. It completed once, but failed with a
>>>> similar stack trace on the second run. And per sysfs, the gpu did stay
>>>> at that minimum. Of note, that causes the benchmark to fail, but
>>>> rendering does recover and the unit is still usable afterwards.
>>>
>>> In sm8550.dtsi, I see that ACD values are not specified in the GPU OPP
>>> table. Can we add those (from downstream dt) and try again?
>>
>> I don't know what I'm looking for in the downstream dt. But if such a
>> change gets pushed to lkml, I can grab that and verify.
> 
> I took at look at the downstream dt and took a guess at importing the
> acd values. I'm not sure if the gpu here is the baseline kalama or
> kalama v2. I guessed the former. There were a couple values missing
> however, that I had to extrapolate based on other frequencies. This
> however changed nothing about my test results. Still getting crashes.

Please use the values from kalama v2 dtsi. And if the acd property is
missing in any OPP node, that is a hint to the the driver+gmu-fw that
ACD should be kept disabled for that freq corner. So, please follow the
same.

ACD configurations are required to meet the hw specifications. We can't
predict how the hw fails in case of a spec violation. I don't know if
this issue is ACD related, but we should ensure that all power related
configurations are accurate first.

Also, could you please try the latest firmwares (sqe and gmu) from here:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/qcom?id=30979b116b5c5857b72c4332db8db0ff1ca2dc08

> 
> From my perspective, this part does not appear to be a PM or frequency
> related issue. Some of the 3dmark benchmarks I have never seen crash.
> Like Wild Life Extreme. I can run the stress variant of that and it
> beats the unit for 20 minutes at full clocks with a screaming fan and>
that runs perfectly stable. Solar Bay Extreme also runs completely
> stable in all of its glorious 3 fps. The two problems are the standard
> non-extreme Solar Bay and Steel Nomad Light. Both of these
> intermittently crash with similar traces to what I posted before.
> There doesn't seem to be consistency in the faults, sometimes it will
> be almost immediately after starting the benchmark, other times it
> will get 90% through and then fail. But they virtually always fail to
> complete. For another point of data, I have never seen GravityMark
> cause a fault either.

The peak current draw can vary between benchmarks. So we can't rule out
power issues. And are you able to reproduce the same issue on another
device?

-Akhil.

> 
> Aaron


  parent reply	other threads:[~2026-02-05 13:29 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-27 22:48 Questions About SM8550 Support Aaron Kling
2026-01-28  8:50 ` Neil Armstrong
2026-01-28 14:46   ` Rob Clark
2026-01-28 17:54     ` Aaron Kling
2026-01-29 23:11       ` Akhil P Oommen
2026-01-30  2:35         ` Aaron Kling
2026-02-05  8:01           ` Aaron Kling
2026-02-05 10:54             ` Konrad Dybcio
2026-02-05 13:29             ` Akhil P Oommen [this message]
2026-02-05 17:40               ` Aaron Kling
2026-03-10 21:33                 ` Akhil P Oommen
2026-03-10 21:53                   ` Aaron Kling
2026-03-11  8:47                     ` Konrad Dybcio
2026-03-11 23:33                       ` Aaron Kling
2026-02-05 14:43             ` Dmitry Baryshkov
2026-01-28 18:42   ` Aaron Kling
2026-02-06 15:04     ` Neil Armstrong
2026-01-28 14:03 ` Konrad Dybcio
2026-01-28 18:20   ` Aaron Kling
2026-02-02  9:35   ` Taniya Das
2026-02-02 23:01     ` Aaron Kling
2026-02-03  6:34       ` Jagadeesh Kona
2026-02-03 23:21         ` Aaron Kling
2026-02-04 16:53           ` Taniya Das
2026-02-04 18:18             ` Aaron Kling
2026-01-30 11:01 ` Konrad Dybcio
2026-01-30 17:13   ` Aaron Kling
2026-02-02 10:36     ` Konrad Dybcio
2026-02-02 23:12       ` Aaron Kling
2026-02-03 10:31         ` Konrad Dybcio
2026-02-03 17:31           ` Aaron Kling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=593afb89-031d-4376-8ea7-024b645c62cb@oss.qualcomm.com \
    --to=akhilpo@oss.qualcomm.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=neil.armstrong@linaro.org \
    --cc=rob.clark@oss.qualcomm.com \
    --cc=webgeek1234@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox