linux-amlogic.lists.infradead.org archive mirror
* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
@ 2017-12-01  0:21 Kevin Hilman
  2017-12-01  7:08 ` Heiner Kallweit
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Kevin Hilman @ 2017-12-01  0:21 UTC
  To: linux-amlogic

Hi Sudeep,

There's been a pretty major regression in v4.15-rc1 compared to v4.14
in SCPI, causing warning splats on Amlogic SoCs when cpufreq starts up
and tries to set the OPP for the first time[1].

I ran out of time to narrow it down further since there have been
quite a few changes since v4.14, but simply reverting
drivers/firmware/arm_scpi.c to its v4.14 state gets things working
again.

This has been happening for a while, and we should've caught it sooner
in kernelCI.org. However, this warning splat still allows the kernel to
finish booting, so it still resulted in a PASS for the boot test.
Combined with the fact that we've been tracking some other regressions,
that meant we didn't notice it until now.
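
For illustration, a boot-log check along the following lines would flag
such a splat separately instead of reporting a plain PASS. This is only a
minimal sketch, not kernelCI's actual implementation: the names
`find_splats` and `boot_result`, the "WARN" result, and the marker list
are assumptions, based on the usual "cut here" and "WARNING: CPU:" lines
the kernel prints for a WARN().

```python
import re

# Markers the kernel typically prints around a WARN() splat.
# The exact list is an assumption made for this sketch.
SPLAT_MARKERS = [
    re.compile(r"-+\[ cut here \]-+"),
    re.compile(r"WARNING: CPU: \d+ PID: \d+ at "),
]


def find_splats(log_text):
    """Return the log lines that look like part of a warning splat."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker.search(line) for marker in SPLAT_MARKERS)
    ]


def boot_result(log_text, reached_prompt):
    """Classify a boot: FAIL if no prompt, WARN if splats, else PASS."""
    if not reached_prompt:
        return "FAIL"
    return "WARN" if find_splats(log_text) else "PASS"
```

A boot that reaches the prompt but contains a splat then shows up as
"WARN" rather than being folded into the passing results.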

Also, is this the expected result for the pre-1.0 firmware:

    scpi_protocol scpi: SCP Protocol 0.0 Firmware 0.0.0 version

Kevin

[1] Here are a few boot logs from v4.15-rc1 with the splat:

https://storage.kernelci.org/mainline/master/v4.15-rc1/arm64/defconfig/lab-baylibre-seattle/boot-meson-gxl-s905x-khadas-vim.html

https://storage.kernelci.org/mainline/master/v4.15-rc1/arm64/defconfig/lab-baylibre-seattle/boot-meson-gxbb-p200.html

https://storage.kernelci.org/mainline/master/v4.15-rc1/arm64/defconfig/lab-baylibre-seattle/boot-meson-gxl-s905d-p230.html


* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
  2017-12-01  0:21 SCPI regressions in v4.15-rc1 on Amlogic SoCs Kevin Hilman
@ 2017-12-01  7:08 ` Heiner Kallweit
  2017-12-01 15:59   ` Kevin Hilman
  2017-12-04  2:24   ` Sudeep Holla
  2017-12-01 16:03 ` Jerome Brunet
  2017-12-04  2:24 ` Sudeep Holla
  2 siblings, 2 replies; 7+ messages in thread
From: Heiner Kallweit @ 2017-12-01  7:08 UTC
  To: linux-amlogic

On 01.12.2017 at 01:21, Kevin Hilman wrote:
> Hi Sudeep,
>
> There's been a pretty major regression in v4.15-rc1 compared to v4.14
> in SCPI causing warning splats on amlogic SoCs when cpufreq starts up
> and tries to set the OPP for the first time[1].
> 
Thanks for the report. Strangely enough, it works perfectly fine on my
Odroid-C2; see the log excerpt below from the latest next kernel.

Your log seems to indicate that due to deferred probing something is
not done in the right order.
Can you bisect the issue? I'd assume that it's commit 931cf0c53e69
("firmware: arm_scpi: pre-populate dvfs info in scpi_probe").
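
Since reverting drivers/firmware/arm_scpi.c alone is reported to fix
things, a bisect could be restricted to that one file. A minimal sketch
of the git invocations, wrapped in Python; the helper names are
hypothetical, and the good/bad tags are the ones mentioned in this
thread:

```python
import subprocess

SCPI_DRIVER = "drivers/firmware/arm_scpi.c"


def bisect_start_cmd(bad, good, path=SCPI_DRIVER):
    """Build the `git bisect start` command, limited to one path."""
    return ["git", "bisect", "start", bad, good, "--", path]


def candidate_commits_cmd(good, bad, path=SCPI_DRIVER):
    """Build the `git log` command listing commits to `path` in good..bad."""
    return ["git", "log", "--oneline", f"{good}..{bad}", "--", path]


def run(cmd, repo="."):
    """Run a git command inside the kernel tree at `repo`, return stdout."""
    return subprocess.run(cmd, cwd=repo, check=True, text=True,
                          capture_output=True).stdout


if __name__ == "__main__":
    # Only print the commands here; call run() from inside a kernel tree.
    print(" ".join(candidate_commits_cmd("v4.14", "v4.15-rc1")))
    print(" ".join(bisect_start_cmd("v4.15-rc1", "v4.14")))
```

Limiting the bisect to one path keeps the number of build-and-boot
steps small, at the cost of missing a culprit outside that file.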

Rgds, Heiner

[    0.034293] soc soc0: Amlogic Meson GXBB (S905) Revision 1f:0 (c:1) Detected
[    0.036666] c81004c0.serial: ttyAML0 at MMIO 0xc81004c0 (irq = 13, base_baud = 1500000) is a meson_uart
[    0.606914] console [ttyAML0] enabled
[    0.615031] loop: module loaded
[    0.616117] meson-gx-mmc d0074000.mmc: allocated mmc-pwrseq
[    0.643406] ledtrig-cpu: registered to indicate activity on CPUs
[    0.644047] meson-sm: secure-monitor enabled
[    0.648156] hidraw: raw HID events driver (C) Jiri Kosina
[    0.653572] platform-mhu c883c404.mailbox: Platform MHU Mailbox registered
[    0.660439] NET: Registered protocol family 17
[    0.665031] registered taskstats version 1
[    0.668689] Loading compiled-in X.509 certificates
[    0.678049] meson-gx-mmc d0072000.mmc: Got CD GPIO
[    0.712946] scpi_protocol scpi: SCP Protocol 0.0 Firmware 0.0.0 version
[    0.715600] cpu cpu0: bL_cpufreq_init: CPU 0 initialized
[    0.719245] arm_big_little: bL_cpufreq_register: Registered platform driver: scpi
[    0.727344] mmc0: new HS400 MMC card at address 0001
[    0.729179] hctosys: unable to open rtc device (rtc0)
[    0.729355] USB_OTG_PWR: disabling
[    0.729358] TFLASH_VDD: disabling
[    0.729361] TF_IO: disabling
[    0.747230] mmcblk0: mmc0:0001 DJNB4R 116 GiB
[    0.752588] mmcblk0boot0: mmc0:0001 DJNB4R partition 1 4.00 MiB
[    0.757220] mmcblk0boot1: mmc0:0001 DJNB4R partition 2 4.00 MiB
[    0.762274] mmcblk0rpmb: mmc0:0001 DJNB4R partition 3 4.00 MiB, chardev (249:0)
[    0.770243]  mmcblk0: p1
[    0.781732] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)



* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
  2017-12-01  7:08 ` Heiner Kallweit
@ 2017-12-01 15:59   ` Kevin Hilman
  2017-12-04  2:24   ` Sudeep Holla
  1 sibling, 0 replies; 7+ messages in thread
From: Kevin Hilman @ 2017-12-01 15:59 UTC
  To: linux-amlogic

On Thu, Nov 30, 2017 at 11:08 PM, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> On 01.12.2017 at 01:21, Kevin Hilman wrote:
>> Hi Sudeep,
>>
>> There's been a pretty major regression in v4.15-rc1 compared to v4.14
>> in SCPI causing warning splats on amlogic SoCs when cpufreq starts up
>> and tries to set the OPP for the first time[1].
>>
> Thanks for the report. Strange enough, it works perfectly fine on my
> Odroid-C2, see below log part from latest next kernel.

There are a lot more Amlogic SoCs out there that should've been tested
with this change, and you didn't Cc linux-amlogic or ask for more help
testing.  Now we're in a position to have major regressions on most
Amlogic boards for v4.15.

> Your log seems to indicate that due to deferred probing something is
> not done in the right order.
> Can you bisect the issue? I'd assume that it's commit 931cf0c53e69
> ("firmware: arm_scpi: pre-populate dvfs info in scpi_probe").

I do not currently have the time to bisect this, and we're in the
"fixes" phase of the release cycle, so there is urgency.

I would much rather see these patches reverted, and actually tested on
affected platforms before they make it into mainline.

Sudeep, any chance of reverting these while we're still in the -rc phase?

Thanks,

Kevin


* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
  2017-12-01  0:21 SCPI regressions in v4.15-rc1 on Amlogic SoCs Kevin Hilman
  2017-12-01  7:08 ` Heiner Kallweit
@ 2017-12-01 16:03 ` Jerome Brunet
  2017-12-04  2:24 ` Sudeep Holla
  2 siblings, 0 replies; 7+ messages in thread
From: Jerome Brunet @ 2017-12-01 16:03 UTC
  To: linux-amlogic

On Thu, 2017-11-30 at 16:21 -0800, Kevin Hilman wrote:
> Hi Sudeep,
> 
> There's been a pretty major regression in v4.15-rc1 compared to v4.14
> in SCPI causing warning splats on amlogic SoCs when cpufreq starts up
> and tries to set the OPP for the first time[1].
> 
> I ran out of time to narrow it down further since there have been
> quite a few changes since v4.14, but simply reverting
> drivers/firmware/arm_scpi.c to its v4.14 state gets things working
> again.

Same thing for me. One of my early libretech-cc boards gets completely stuck
during boot with v4.15-rc1.

reverting drivers/firmware/arm_scpi.c to v4.14 fixes the problem.

On the UART, we see traces that appear to be coming from the FW:
> domain-0 init dvfs: 4 

My platform gets stuck on one of these traces with v4.15-rc1


* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
  2017-12-01  0:21 SCPI regressions in v4.15-rc1 on Amlogic SoCs Kevin Hilman
  2017-12-01  7:08 ` Heiner Kallweit
  2017-12-01 16:03 ` Jerome Brunet
@ 2017-12-04  2:24 ` Sudeep Holla
  2 siblings, 0 replies; 7+ messages in thread
From: Sudeep Holla @ 2017-12-04  2:24 UTC
  To: linux-amlogic

On Thu, Nov 30, 2017 at 04:21:13PM -0800, Kevin Hilman wrote:
> Hi Sudeep,
> 
> There's been a pretty major regression in v4.15-rc1 compared to v4.14
> in SCPI causing warning splats on amlogic SoCs when cpufreq starts up
> and tries to set the OPP for the first time[1].
>

Looks like we are hitting some issue with the firmware. I assume CPUFreq
was never initialised on this platform before (i.e. in v4.14), and these
changes alter the probe path, so that could be what is causing the
regression.

> I ran out of time to narrow it down further since there have been
> quite a few changes since v4.14, but simply reverting
> drivers/firmware/arm_scpi.c to its v4.14 state gets things working
> again.
>

Makes sense.

> This has been happening for awhile, and we should've caught it sooner
> in kernelCI.org, however this warning splat still allows the kernel to
> finish booting, so it still resulted in a PASS for the boot test.
> That combined with the fact that we've been tracking some other
> regressions, we didn't notice it until now.
>

It's unfortunate that it didn't get tested in linux-next, as it was pulled
a while ago.

> Also, is this the expected result for the pre-1.0 firmware:
>
>     scpi_protocol scpi: SCP Protocol 0.0 Firmware 0.0.0 version
>

I think so. Since Amlogic has implemented an unreleased/draft version of
the specification, we did discuss printing something similar in the past.

--
Regards,
Sudeep


* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
  2017-12-01  7:08 ` Heiner Kallweit
  2017-12-01 15:59   ` Kevin Hilman
@ 2017-12-04  2:24   ` Sudeep Holla
  2017-12-04 18:14     ` Kevin Hilman
  1 sibling, 1 reply; 7+ messages in thread
From: Sudeep Holla @ 2017-12-04  2:24 UTC
  To: linux-amlogic

On Fri, Dec 01, 2017 at 08:08:25AM +0100, Heiner Kallweit wrote:
> On 01.12.2017 at 01:21, Kevin Hilman wrote:
> > Hi Sudeep,
> >
> > There's been a pretty major regression in v4.15-rc1 compared to v4.14
> > in SCPI causing warning splats on amlogic SoCs when cpufreq starts up
> > and tries to set the OPP for the first time[1].
> > 
> Thanks for the report. Strange enough, it works perfectly fine on my
> Odroid-C2, see below log part from latest next kernel.
>

OK, I would like to get the list of Amlogic SoCs using SCPI,
so that I can get any changes tested on all of them next time. As Kevin
mentioned, it's always safer to just cc the Amlogic mailing list. I have
done that in the past and failed to do so this time.

> Your log seems to indicate that due to deferred probing something is
> not done in the right order.
> Can you bisect the issue? I'd assume that it's commit 931cf0c53e69
> ("firmware: arm_scpi: pre-populate dvfs info in scpi_probe").
> 

Yes, I suspect the same change too. But I don't fully understand the issue.

I remember asking you to make some changes in the probe path. I think I
wanted you to continue with the SCPI probe even if DVFS fails, as aborting
causes issues on platforms that have only partial DVFS implemented but
other protocols like clocks and sensors working fine.
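
The behaviour described above can be modelled in a few lines. This is a
plain-Python toy, not the code in drivers/firmware/arm_scpi.c; `probe`,
`ProbeError` and the feature names are made up for illustration:

```python
class ProbeError(Exception):
    """Raised by a hypothetical sub-protocol init that fails."""


def probe(inits, optional=("dvfs",)):
    """Run each init callable; failures in `optional` ones are not fatal.

    `inits` maps a feature name to a zero-argument callable.
    Returns the list of features that initialised successfully.
    """
    available = []
    for name, init in inits.items():
        try:
            init()
        except ProbeError:
            if name not in optional:
                raise  # a required feature failed: abort the whole probe
            continue  # optional feature (e.g. DVFS) failed: carry on
        available.append(name)
    return available
```

With a failing DVFS init, the probe still succeeds and reports only the
clock and sensor features as available.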

I guess all platforms with a broken SCPI implementation should have it
disabled in the DT instead of taking special care for that in the driver.

I assume that's the case even with the platform under regression now.
The correct way to fix it would be to disable DVFS node but it may need
some investigation to narrow down to this comment.

--
Regards,
Sudeep


* SCPI regressions in v4.15-rc1 on Amlogic SoCs.
  2017-12-04  2:24   ` Sudeep Holla
@ 2017-12-04 18:14     ` Kevin Hilman
  0 siblings, 0 replies; 7+ messages in thread
From: Kevin Hilman @ 2017-12-04 18:14 UTC
  To: linux-amlogic

Sudeep Holla <sudeep.holla@arm.com> writes:

> On Fri, Dec 01, 2017 at 08:08:25AM +0100, Heiner Kallweit wrote:
>> On 01.12.2017 at 01:21, Kevin Hilman wrote:
>> > Hi Sudeep,
>> >
>> > There's been a pretty major regression in v4.15-rc1 compared to v4.14
>> > in SCPI causing warning splats on amlogic SoCs when cpufreq starts up
>> > and tries to set the OPP for the first time[1].
>> > 
>> Thanks for the report. Strange enough, it works perfectly fine on my
>> Odroid-C2, see below log part from latest next kernel.
>>
>
> OK, I would like to understand/get the list of AmLogic SoC using SCPI,
> so that I get any changes tested on all of them next time.

You can start here: https://kernelci.org/soc/amlogic/

The "Available boards" section lists 8 boards we have fully automated
that are representative (enough) to catch problems.  We have a bunch
more that are not (yet) fully automated in kernelCI.

Anything that is meson-gx* will use SCPI, and it's entirely possible
that they've changed the SCPI firmware between the GXBB SoCs (which Heiner
is testing) and the GXL SoCs, which seem to be the ones failing.  I don't
have much visibility into the firmware, but as you mentioned, this is
starting to look like it could be related to a firmware change.
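
To check that hypothesis against the boot logs, the failing boards could
be grouped by SoC family. A minimal sketch, assuming the family can be
read from the meson-gx* part of the kernelCI log file name;
`group_by_family` is a hypothetical helper, not an existing kernelCI
tool:

```python
import re
from collections import defaultdict

# Matches the board name at the end of a kernelCI boot-log URL such as
# .../boot-meson-gxl-s905x-khadas-vim.html
BOARD_RE = re.compile(r"boot-(meson-(gx[a-z]+)-[\w-]+)\.html$")


def group_by_family(urls):
    """Map SoC family (e.g. 'gxbb', 'gxl') to the boards seen in `urls`."""
    families = defaultdict(list)
    for url in urls:
        match = BOARD_RE.search(url)
        if match:
            board, family = match.groups()
            families[family].append(board)
    return dict(families)
```

Feeding it the log URLs of the boots that show the splat gives a quick
per-family list of affected boards.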

The failing logs are showing some new messages on the console that are
not coming from the kernel.  I'm guessing they're from the firmware
(still using the serial console!) but I have not fully verified that.

Kevin

