From: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
To: Aaron Kling <webgeek1234@gmail.com>,
Akhil P Oommen <akhilpo@oss.qualcomm.com>
Cc: rob.clark@oss.qualcomm.com,
Neil Armstrong <neil.armstrong@linaro.org>,
linux-arm-msm@vger.kernel.org
Subject: Re: Questions About SM8550 Support
Date: Wed, 11 Mar 2026 09:47:04 +0100
Message-ID: <71ec1014-e357-4368-9ed7-37083ead9989@oss.qualcomm.com>
In-Reply-To: <CALHNRZ-Uc9HEHc_8wJ3SAKxHX+cE0Gu7_BeakvF5muCmS9wmhg@mail.gmail.com>

On 3/10/26 10:53 PM, Aaron Kling wrote:
> On Tue, Mar 10, 2026 at 4:33 PM Akhil P Oommen <akhilpo@oss.qualcomm.com> wrote:
>>
>> On 2/5/2026 11:10 PM, Aaron Kling wrote:
>>> On Thu, Feb 5, 2026 at 7:29 AM Akhil P Oommen <akhilpo@oss.qualcomm.com> wrote:
>>>>
>>>> On 2/5/2026 1:31 PM, Aaron Kling wrote:
>>>>> On Thu, Jan 29, 2026 at 8:35 PM Aaron Kling <webgeek1234@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jan 29, 2026 at 5:11 PM Akhil P Oommen <akhilpo@oss.qualcomm.com> wrote:
>>>>>>>
>>>>>>> On 1/28/2026 11:24 PM, Aaron Kling wrote:
>>>>>>>> On Wed, Jan 28, 2026 at 8:46 AM Rob Clark <rob.clark@oss.qualcomm.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jan 28, 2026 at 12:54 AM Neil Armstrong
>>>>>>>>> <neil.armstrong@linaro.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 1/27/26 23:48, Aaron Kling wrote:
>>>>>>>>>>> I am working on the AYN Odin 2 qcs8550 series of devices, specifically
>>>>>>>>>>> for Android, using mainline kernel drivers. I have come across some
>>>>>>>>>>> missing functionality and failures that I would like to inquire about.
>>>>>>>>>>>
>>>>>>>>>>> * ABL fails to load a dtbo using a baseline dtb unmodified from
>>>>>>>>>>> mainline. Using changes described in the gunyah watchdog thread [0], a
>>>>>>>>>>> dtbo loads and the devices boot as expected. If any of the changes in
>>>>>>>>>>> that post don't exist in the base dtb, abl will fail to load the dtbo
>>>>>>>>>>> and go to the bootloader menu. This appears to be an issue in the
>>>>>>>>>>> baseline abl code, affecting all devices of that generation. Would it
>>>>>>>>>>> be allowable to merge a change adding those changes to the sm8550
>>>>>>>>>>> dtsi, allowing an unmodified mainline dtb to work with overlays?
>>>>>>>>>>
>>>>>>>>>> Any addition to the DT must be documented in dt-bindings, so if it's needed
>>>>>>>>>> for boot they should be documented and added for sure.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * SM8550 does not have cpu opp tables, thus cpufreq does not work. I
>>>>>>>>>>> have locally copied the commits from sm8650 and adapted for sm8550,
>>>>>>>>>>> and that seems to work okay. But no measuring of bandwidth was done,
>>>>>>>>>>> so the numbers are likely not entirely correct. Is there any plan to
>>>>>>>>>>> generate correct tables for sm8550?
>>>>>>>>>>
>>>>>>>>>> Cpufreq works but not the interconnect scaling, so doing the same as sm8650
>>>>>>>>>> is fine but since the values were calculated from downstream DT tables,
>>>>>>>>>> the same should be done for sm8550.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * As part of a series to support the original Odin 2, a patch to
>>>>>>>>>>> update sm8550 EAS values was submitted [1]. But that series stalled
>>>>>>>>>>> and this was never merged. If this change is valid, which per that
>>>>>>>>>>> discussion it appears to be, can it be resubmitted by itself and
>>>>>>>>>>> merged?
>>>>>>>>>>
>>>>>>>>>> I missed this patch, please re-submit, I also need to update the ones
>>>>>>>>>> for SM8650.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * Per the mainline kernel device trees and audio topology provided by
>>>>>>>>>>> the oem, these devices use primary i2s for the speakers path. There
>>>>>>>>>>> was a commit adding clock support for that as part of an hdmi series
>>>>>>>>>>> [2], but that seems to have stalled. Is this going to be picked back
>>>>>>>>>>> up?
>>>>>>>>>>
>>>>>>>>>> No, I do not plan to do this work, it required adding callbacks in the
>>>>>>>>>> code to handle the clocks like done for the pre-audioreach firmwares.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * Inline crypto fails to detect hwkm support. And I see other logs
>>>>>>>>>>> online, such as for the sm8550 qrd, that logs the same way my device
>>>>>>>>>>> does. I traced the issue to the check for wrapped key support [3]. On
>>>>>>>>>>> my devices, the derive call is supported, but the other three calls
>>>>>>>>>>> are not. I was pointed at the downstream headers for sm8550 support
>>>>>>>>>>> and only derive is listed there, the other three don't appear to be
>>>>>>>>>>> used in the downstream driver. Is this expected? And if so, will this
>>>>>>>>>>> case be added to the mainline drivers?
>>>>>>>>>>
>>>>>>>>>> Does hwkm work if you remove the last 3 calls?
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * Some gpu related clocks complain about being stuck off during boot,
>>>>>>>>>>> causing stack traces, but the gpu does work. I tried to do some
>>>>>>>>>>> research into this, but quickly got lost in the weeds and I have no
>>>>>>>>>>> idea where to even look.
>>>>>>>>>>> [ 0.367278] gpu_cc_cxo_clk status stuck at 'off'
>>>>>>>>>>> [ 0.367962] gpu_cc_hub_cx_int_clk status stuck at 'off'
>>>>>>>>>>> [ 0.368595] gpu_cc_cx_gmu_clk status stuck at 'off'
>>>>>>>>>>> [ 0.369245] disp_cc_mdss_ahb1_clk status stuck at 'off'
>>>>>>>>>>
>>>>>>>>>> This may be related with the display handoff from ABL, did you add the
>>>>>>>>>> plat region to the reserved memories ?
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * Sometimes when starting rendering, a bandwidth submission times out,
>>>>>>>>>>> then the driver immediately complains that said id was left on the
>>>>>>>>>>> queue. I have tried increasing the timeout, but the same sequence
>>>>>>>>>>> still happens. Timeout happens, immediately followed by a matching
>>>>>>>>>>> unexpected response. Implying that this isn't actually a delay /
>>>>>>>>>>> timeout issue.
>>>>>>>>>>> [ 1848.517020] platform 3d6a000.gmu:
>>>>>>>>>>> [drm:a6xx_hfi_wait_for_msg_interrupt [msm]] *ERROR* Message
>>>>>>>>>>> HFI_H2F_MSG_GX_BW_PERF_VOTE id 1015 timed out waiting for response
>>>>>>>>>>> [ 1848.518020] platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg [msm]]
>>>>>>>>>>> *ERROR* Unexpected message id 1015 on the response queue
>>>>>>>>>>
>>>>>>>>>> Weird, the timeout was extended for this very purpose.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> * Some 3dmark benchmarks such as solar bay cause a gpu crash. I am
>>>>>>>>>>> unsure if this is a kernel problem or userspace, so I'm submitting
>>>>>>>>>>> here first. If the consensus is that it's a userspace issue, I'll
>>>>>>>>>>> submit it to mesa.
>>>>>>>>>>> [ 1860.112008] adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu
>>>>>>>>>>> fault ring 2 fence a261 status 00EF0585 rb 06df/090f ib1
>>>>>>>>>>> 00000001512E9000/003d ib2 00000001512E7000/0000
>>>>>>>>>>> [ 1860.113122] msm_dpu ae01000.display-controller: [drm:recover_worker
>>>>>>>>>>> [msm]] *ERROR* 67.5.10.1: hangcheck recover!
>>>>>>>>>>> [ 1860.113238] msm_dpu ae01000.display-controller: [drm:recover_worker
>>>>>>>>>>> [msm]] *ERROR* 67.5.10.1: offending task: Thread-23
>>>>>>>>>>> (com.futuremark.dmandroid.application)
>>>>>>>>>>> [ 1860.258126] revision: 0 (67.5.10.1)
>>>>>>>>>>> [ 1860.258132] rb 0: fence: 2884/2884
>>>>>>>>>>> [ 1860.258133] rptr: 36
>>>>>>>>>>> [ 1860.258134] rb wptr: 36
>>>>>>>>>>> [ 1860.258135] rb 1: fence: -256/-256
>>>>>>>>>>> [ 1860.258138] rptr: 0
>>>>>>>>>>> [ 1860.258138] rb wptr: 0
>>>>>>>>>>> [ 1860.258139] rb 2: fence: 41563/41569
>>>>>>>>>>> [ 1860.258140] rptr: 1752
>>>>>>>>>>> [ 1860.258140] rb wptr: 2319
>>>>>>>>>>> [ 1860.258141] rb 3: fence: -256/-256
>>>>>>>>>>> [ 1860.258141] rptr: 0
>>>>>>>>>>> [ 1860.258142] rb wptr: 0
>>>>>>>>>>> [ 1860.258146] adreno 3d00000.gpu: [drm:a6xx_recover [msm]] CP_SCRATCH_REG0: 0
>>>>>>>>>>> [ 1860.258220] adreno 3d00000.gpu: [drm:a6xx_recover [msm]] CP_SCRATCH_REG1: 0
>>>>>>>>>>> [ 1860.258266] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>>>>>> CP_SCRATCH_REG2: 41562
>>>>>>>>>>> [ 1860.258310] adreno 3d00000.gpu: [drm:a6xx_recover [msm]] CP_SCRATCH_REG3: 0
>>>>>>>>>>> [ 1860.258354] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>>>>>> CP_SCRATCH_REG4: 3736059565
>>>>>>>>>>> [ 1860.258399] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>>>>>> CP_SCRATCH_REG5: 3736059565
>>>>>>>>>>> [ 1860.258443] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>>>>>> CP_SCRATCH_REG6: 3736059565
>>>>>>>>>>> [ 1860.258487] adreno 3d00000.gpu: [drm:a6xx_recover [msm]]
>>>>>>>>>>> CP_SCRATCH_REG7: 3736059565
>>>>>>>>>>
>>>>>>>>>> @rob do you have any idea how to solve this crash on a740 ?
>>>>>>>>>
>>>>>>>>> The clk and a6xx_hfi_wait_for_msg_interrupt errors indicate that
>>>>>>>>> something is unhappy about gpu pm. I'd focus on that first, since
>>>>>>>>> that is almost certainly the cause of the later issues. If things
>>>>>>>>> _sorta_ work (rendering UI, etc) you could try removing all but the
>>>>>>>>> lowest gpu OPP as an experiment. Could be that power related problems
>>>>>>>>> surface when the GPU ramps up to higher OPPs.
>>>>>>>>
>>>>>>>> Things work amazingly well compared to what I was expecting. Using
>>>>>>>> mesa staging 26.0 as of yesterday, I'm getting roughly 80% performance
>>>>>>>> in the benchmarks that do run, compared to the stock Android. And
>>>>>>>> rendering is correct everywhere that I've seen so far. Mesa 25.3.3
>>>>>>>> gives about 89% compared to stock, but there are graphical glitches in
>>>>>>>> some of the benchmarks.
>>>>>>>>
>>>>>>>> I set gpu max_freq via devfreq to the minimum available frequency and
>>>>>>>> ran the failing benchmark again. It completed once, but failed with a
>>>>>>>> similar stack trace on the second run. And per sysfs, the gpu did stay
>>>>>>>> at that minimum. Of note, that causes the benchmark to fail, but
>>>>>>>> rendering does recover and the unit is still usable afterwards.
>>>>>>>
>>>>>>> In sm8550.dtsi, I see that ACD values are not specified in the GPU OPP
>>>>>>> table. Can we add those (from downstream dt) and try again?
>>>>>>
>>>>>> I don't know what I'm looking for in the downstream dt. But if such a
>>>>>> change gets pushed to lkml, I can grab that and verify.
>>>>>
>>>>> I took a look at the downstream dt and took a guess at importing the
>>>>> acd values. I'm not sure if the gpu here is the baseline kalama or
>>>>> kalama v2. I guessed the former. There were a couple values missing
>>>>> however, that I had to extrapolate based on other frequencies. This
>>>>> however changed nothing about my test results. Still getting crashes.
>>>>
>>>> Please use the values from kalama v2 dtsi. And if the acd property is
>>>> missing in any OPP node, that is a hint to the driver+gmu-fw that
>>>> ACD should be kept disabled for that freq corner. So, please follow the
>>>> same.
>>>
>>> Alright, I updated the change using values from the downstream v2
>>> dtsi. Still getting the same results. Since it's needed regardless,
>>> would you like me to submit the ACD patch?
>>
>> Sorry for the super delayed response.
>>
>> Please go ahead and post the patch.
>
> I sent it here [0].
>
>>>
>>>> ACD configurations are required to meet the hw specifications. We can't
>>>> predict how the hw fails in case of a spec violation. I don't know if
>>>> this issue is ACD related, but we should ensure that all power related
>>>> configurations are accurate first.
>>>>
>>>> Also, could you please try the latest firmwares (sqe and gmu) from here:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/qcom?id=30979b116b5c5857b72c4332db8db0ff1ca2dc08
>>>
>>> These are what I'm already using.
>>>
>>>>>
>>>>> From my perspective, this part does not appear to be a PM or frequency
>>>>> related issue. Some of the 3dmark benchmarks I have never seen crash.
>>>>> Like Wild Life Extreme. I can run the stress variant of that and it
>>>>> beats the unit for 20 minutes at full clocks with a screaming fan and
>>>>> that runs perfectly stable. Solar Bay Extreme also runs completely
>>>>> stable in all of its glorious 3 fps. The two problems are the standard
>>>>> non-extreme Solar Bay and Steel Nomad Light. Both of these
>>>>> intermittently crash with similar traces to what I posted before.
>>>>> There doesn't seem to be consistency in the faults, sometimes it will
>>>>> be almost immediately after starting the benchmark, other times it
>>>>> will get 90% through and then fail. But they virtually always fail to
>>>>> complete. For another point of data, I have never seen GravityMark
>>>>> cause a fault either.
>>>>
>>>> The peak current draw can vary between benchmarks. So we can't rule out
>>>> power issues. And are you able to reproduce the same issue on another
>>>> device?
>>>
>>> The only relevant devices I have are two of the AYN qcs8550 devices, a
>>> Thor and an Odin 2 Mini. The issue happens on both, yes. But I don't
>>> have anything like a phone or devkit with sm8550.
>>
>> I will post a few fixes in the next few days. At least, with that there
>> should be a coredump generated for hfi errors. Please share that.
>
> I posted an issue on the mesa tracker here [1] and attached some
> devcoredumps to one of my replies. I can add more when the new patches
> are available.
>
>> iirc, you are using upstream drivers with downstream kernel (ACK?). Any
>> chance you can try pure upstream kernel?
>
> Yes, that is correct. My current setup is ACK 6.18.13. I unfortunately
> do not have a pure Linux setup. If I had uart access on these devices,
> I could use the minimal busybox setup like I do for tegra, but I do
> not have such access here. As far as I can tell, no closed case debug
> setup is available either. Google does have a mainline tracking branch
> which I could use to get closer to -next for verification, but it's
> still not unmodified upstream.

FWIW you can run AOSP on pure upstream, perhaps not with all the features,
but you can. Try copying the config from ACK and give it a spin.
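
Before reusing the ACK config on an upstream tree, it can help to see which
options actually differ between the two. The following is only a minimal
sketch of such a comparison. The function names `parse_config` and
`config_diff` are made up for illustration, and options absent from a file
are assumed to be equivalent to "not set". The in-tree scripts/diffconfig
helper covers the same ground more thoroughly.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def config_diff(ack_text, upstream_text):
    """List (option, ack_value, upstream_value) where the two configs differ.

    Assumption: an option missing from one file is treated as 'n'.
    """
    ack = parse_config(ack_text)
    upstream = parse_config(upstream_text)
    diff = []
    for name in sorted(set(ack) | set(upstream)):
        a = ack.get(name, "n")
        u = upstream.get(name, "n")
        if a != u:
            diff.append((name, a, u))
    return diff
```

Feed it the contents of the two config files (for example the ACK build's
.config and the .config produced by the upstream arm64 defconfig) and review
the resulting list before carrying options over.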

Konrad