linux-arm-kernel.lists.infradead.org archive mirror
* PSCI checker query
@ 2019-12-05 12:38 John Garry
  2019-12-05 13:30 ` Sudeep Holla
  2019-12-05 15:55 ` Marc Zyngier
  0 siblings, 2 replies; 8+ messages in thread
From: John Garry @ 2019-12-05 12:38 UTC
  To: Mark Rutland, Lorenzo Pieralisi
  Cc: wanghuiqiang, Linuxarm, linux-arm-kernel@lists.infradead.org

Hi guys,

I enabled the kernel PSCI checker and it kills my Huawei D05:

[    0.000000] Booting Linux on physical CPU 0x0000010000 [0x410fd082]
[    0.000000] Linux version 5.4.0-00001-gd45a90825ab2-dirty (john@john-ThinkCentre-M93p) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05-rc1 revision 38aec9a676236eaa42ca03ccb3a6c1dd0182c29f] (Linaro GCC 7.3-2018.05-rc1)) #676 SMP PREEMPT Thu Dec 5 12:12:55 GMT 2019
[    0.000000] Machine model: Hisilicon PhosphorV660 Development Board
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.60 by EDK II
[    0.000000] efi:  SMBIOS=0x3eff0000  SMBIOS 3.0=0x39aa0000 ACPI=0x39b70000  ACPI 2.0=0x39b70014  MEMATTR=0x3b86d018 MEMRESERVE=0x3a002e98
[    0.000000] crashkernel reserved: 0x0000000002000000 - 0x0000000012000000 (256 MB)
[    0.000000] cma: Reserved 32 MiB at 0x000000003cc00000
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x0000000039B70014 000024 (v02 HISI  )
[    0.000000] ACPI: XSDT 0x0000000039B600E8 000084 (v01 HISI   HIP07 00000000      01000013)
[    0.000000] ACPI: FACP 0x0000000039A20000 00010C (v05 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: DSDT 0x00000000399E0000 0080C8 (v02 HISI   HIP07 00000000 INTL 20170728)
[    0.000000] ACPI: MCFG 0x0000000039A80000 0000AC (v01 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: SLIT 0x0000000039A70000 00003C (v01 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: SPCR 0x0000000039A60000 000050 (v02 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: SRAT 0x0000000039A50000 0005B0 (v03 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: DBG2 0x0000000039A40000 00005A (v00 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: GTDT 0x0000000039A10000 000098 (v02 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: APIC 0x0000000039A00000 0013E4 (v01 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: IORT 0x00000000399F0000 00080C (v00 HISI   HIP07 00000000 INTL 20170728)
[    0.000000] ACPI: PPTT 0x0000000031870000 001754 (v01 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: SPMI 0x0000000031860000 000041 (v05 HISI   HIP07 00000000 INTL 20151124)
[    0.000000] ACPI: iBFT 0x00000000317C0000 000800 (v01 HISI   HIP07 00000000      00000000)
[    0.000000] ACPI: SPCR: console: pl011,mmio32,0x602b0000,115200
[    0.000000] earlycon: pl11 at MMIO32 0x00000000602b0000 (options '115200')
[    0.000000] printk: bootconsole [pl11] enabled
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x40000000000-0x4003fffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x41000000000-0x41fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe800-0x1ffbffffff]
[    0.000000] NUMA: Initmem setup node 1 [<memory-less node>]
[    0.000000] NUMA: NODE_DATA [mem 0x41febfc1800-0x41febfc2fff]
[    0.000000] NUMA: NODE_DATA(1) on node 2
[    0.000000] NUMA: NODE_DATA [mem 0x41febfc0000-0x41febfc17ff]
[    0.000000] NUMA: Initmem setup node 3 [<memory-less node>]
[    0.000000] NUMA: NODE_DATA [mem 0x41febfbe800-0x41febfbffff]
[    0.000000] NUMA: NODE_DATA(3) on node 2
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000000000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000041ffbffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000003188afff]
[    0.000000]   node   0: [mem 0x000000003188b000-0x000000003188efff]
[    0.000000]   node   0: [mem 0x000000003188f000-0x000000003992ffff]
[    0.000000]   node   0: [mem 0x0000000039930000-0x00000000399dffff]
[    0.000000]   node   0: [mem 0x00000000399e0000-0x0000000039a2ffff]
[    0.000000]   node   0: [mem 0x0000000039a30000-0x0000000039a3ffff]
[    0.000000]   node   0: [mem 0x0000000039a40000-0x0000000039a8ffff]
[    0.000000]   node   0: [mem 0x0000000039a90000-0x0000000039b5ffff]
[    0.000000]   node   0: [mem 0x0000000039b60000-0x0000000039b7ffff]
[    0.000000]   node   0: [mem 0x0000000039b80000-0x0000000039ffffff]
[    0.000000]   node   0: [mem 0x000000003a000000-0x000000003efeffff]
[    0.000000]   node   0: [mem 0x000000003eff0000-0x000000003f01ffff]
[    0.000000]   node   0: [mem 0x000000003f020000-0x000000003fbfffff]
[    0.000000]   node   0: [mem 0x0000001040000000-0x0000001ffbffffff]
[    0.000000]   node   2: [mem 0x0000041000000000-0x0000041ffbffffff]
[    0.000000] Zeroed struct page in unavailable ranges: 548 pages
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000001ffbffffff]
[    0.000000] Could not find start_pfn for node 1
[    0.000000] Initmem setup node 1 [mem 0x0000000000000000-0x0000000000000000]
[    0.000000] Initmem setup node 2 [mem 0x0000041000000000-0x0000041ffbffffff]
[    0.000000] Could not find start_pfn for node 3
[    0.000000] Initmem setup node 3 [mem 0x0000000000000000-0x0000000000000000]
[    0.000000] psci: probing for conduit method from ACPI.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x10000 -> Node 0
[    0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x10001 -> Node 0
[    0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x10002 -> Node 0
[    0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x10003 -> Node 0

[snip]

[   17.970973] hub 1-1:1.0: USB hub found
[   17.974829] hub 1-1:1.0: 4 ports detected
[   18.033941] rtc-efi rtc-efi: setting system clock to 2019-12-05T12:30:06 UTC (1575549006)
[   18.042122] psci_checker: PSCI checker started using 64 CPUs
[   18.047774] psci_checker: Starting hotplug tests
[   18.052387] psci_checker: Trying to turn off and on again all CPUs
[   18.059082] CPU0: shutdown
[   18.061777] psci: CPU0 killed.
[   18.069140] CPU1: shutdown
[   18.071844] psci: CPU1 killed.
[   18.078530] CPU2: shutdown
[   18.081227] psci: CPU2 killed.
[   18.087874] CPU3: shutdown
[   18.090605] psci: CPU3 killed.
[   18.097415] CPU4: shutdown
[   18.100119] psci: CPU4 killed.
[   18.105989] usb 1-2: new high-speed USB device number 3 using ehci-platform
[   18.113286] CPU5: shutdown
[   18.116007] psci: CPU5 killed.
[   18.122432] CPU6: shutdown
[   18.125130] psci: CPU6 killed.
[   18.131525] CPU7: shutdown
[   18.134243] psci: CPU7 killed.
[   18.140625] CPU8: shutdown
[   18.143335] psci: CPU8 killed.
[   18.149755] CPU9: shutdown
[   18.152465] psci: CPU9 killed.
[   18.158867] CPU10: shutdown

[snip]

[   18.521459] CPU52: shutdown
[   18.524256] psci: CPU52 killed.
[   18.528634] CPU53: shutdown
[   18.531461] psci: CPU53 killed.
[   18.535847] CPU54: shutdown
[   18.538645] psci: CPU54 killed.
[   18.542977] CPU55: shutdown
[   18.545761] psci: CPU55 killed.
[   18.550050] CPU56: shutdown
[   18.552836] psci: CPU56 killed.
[   18.557059] CPU57: shutdown
[   18.559855] psci: CPU57 killed.
[   18.564012] CPU58: shutdown
[   18.566809] psci: CPU58 killed.
[   18.570941] CPU59: shutdown
[   18.573725] psci: CPU59 killed.
[   18.577778] CPU60: shutdown
[   18.580576] psci: CPU60 killed.
[   18.584592] CPU61: shutdown
[   18.587400] psci: CPU61 killed.
[   18.591351] CPU62: shutdown
[   18.594148] psci: CPU62 killed.
[   18.597997] usb 1-2.1: new full-speed USB device number 4 using ehci-platform
[garbled serial console output]


[cut remaining garbage]


The console is unresponsive at this point.

My D06 does not have this issue and the test completes successfully:

D06:

root@(none)$ dmesg | grep -i psci
[    0.000000] psci: probing for conduit method from ACPI.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.1
[   24.252657] psci_checker: PSCI checker started using 96 CPUs
[   24.258305] psci_checker: Starting hotplug tests
[   24.262914] psci_checker: Trying to turn off and on again all CPUs
[   24.277545] psci: CPU0 killed.
[   24.298682] psci: CPU1 killed.
[   24.318704] psci: CPU2 killed.
[   24.343580] psci: CPU3 killed.

[snip]

[   46.053433] psci_checker: cpuidle not available on CPU 92, ignoring
[   46.059690] psci_checker: cpuidle not available on CPU 93, ignoring
[   46.065946] psci_checker: cpuidle not available on CPU 94, ignoring
[   46.072203] psci_checker: cpuidle not available on CPU 95, ignoring
[   46.078465] psci_checker: Could not start suspend tests on any CPU
[   46.084635] psci_checker: PSCI checker completed
root@(none)$

Is there anything we can check to know what's going wrong?
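
One quick check on a log like the D05 one above is to list which CPUs were reported killed before the output stopped. What follows is a minimal Python sketch, assuming the console log has been saved as text. The helper name `killed_cpus` and the shortened sample are illustrative and not part of any kernel tooling.

```python
import re

# Matches lines such as "[   18.061777] psci: CPU0 killed."
KILLED_RE = re.compile(r"psci: CPU(\d+) killed\.")


def killed_cpus(log_text):
    """Return the CPU numbers reported as killed, in log order."""
    return [int(m.group(1)) for m in KILLED_RE.finditer(log_text)]


# Shortened excerpt of the D05 log above (hypothetical selection of lines).
sample = """\
[   18.059082] CPU0: shutdown
[   18.061777] psci: CPU0 killed.
[   18.069140] CPU1: shutdown
[   18.071844] psci: CPU1 killed.
[   18.591351] CPU62: shutdown
[   18.594148] psci: CPU62 killed.
"""

cpus = killed_cpus(sample)
print("killed:", cpus, "last:", cpus[-1])
```

In the D05 log the last CPU reported killed is CPU62, on a system that started the checker with 64 CPUs.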

Cheers,
John

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: PSCI checker query
  2019-12-05 12:38 PSCI checker query John Garry
@ 2019-12-05 13:30 ` Sudeep Holla
  2019-12-05 14:22   ` John Garry
  2019-12-05 15:55 ` Marc Zyngier
  1 sibling, 1 reply; 8+ messages in thread
From: Sudeep Holla @ 2019-12-05 13:30 UTC
  To: John Garry
  Cc: Mark Rutland, Lorenzo Pieralisi, Linuxarm, wanghuiqiang,
	Sudeep Holla, linux-arm-kernel@lists.infradead.org

On Thu, Dec 05, 2019 at 12:38:25PM +0000, John Garry wrote:
> Hi guys,
>
> I enabled the kernel PSCI checker and it kills my Huawei D05:
>

[...]

> [   18.042122] psci_checker: PSCI checker started using 64 CPUs
> [   18.047774] psci_checker: Starting hotplug tests
> [   18.052387] psci_checker: Trying to turn off and on again all CPUs
> [   18.059082] CPU0: shutdown
> [   18.061777] psci: CPU0 killed.
> [   18.069140] CPU1: shutdown
> [   18.071844] psci: CPU1 killed.
> [   18.078530] CPU2: shutdown
> [   18.081227] psci: CPU2 killed.
> [   18.087874] CPU3: shutdown
> [   18.090605] psci: CPU3 killed.
> [   18.097415] CPU4: shutdown
> [   18.100119] psci: CPU4 killed.
> [   18.105989] usb 1-2: new high-speed USB device number 3 using
> ehci-platform
> [   18.113286] CPU5: shutdown
> [   18.116007] psci: CPU5 killed.
> [   18.122432] CPU6: shutdown
> [   18.125130] psci: CPU6 killed.
> [   18.131525] CPU7: shutdown
> [   18.134243] psci: CPU7 killed.
> [   18.140625] CPU8: shutdown
> [   18.143335] psci: CPU8 killed.
> [   18.149755] CPU9: shutdown
> [   18.152465] psci: CPU9 killed.
> [   18.158867] CPU10: shutdown
>
> [snip]
>
> [   18.521459] CPU52: shutdown
> [   18.524256] psci: CPU52 killed.
> [   18.528634] CPU53: shutdown
> [   18.531461] psci: CPU53 killed.
> [   18.535847] CPU54: shutdown
> [   18.538645] psci: CPU54 killed.
> [   18.542977] CPU55: shutdown
> [   18.545761] psci: CPU55 killed.
> [   18.550050] CPU56: shutdown
> [   18.552836] psci: CPU56 killed.
> [   18.557059] CPU57: shutdown
> [   18.559855] psci: CPU57 killed.
> [   18.564012] CPU58: shutdown
> [   18.566809] psci: CPU58 killed.
> [   18.570941] CPU59: shutdown
> [   18.573725] psci: CPU59 killed.
> [   18.577778] CPU60: shutdown
> [   18.580576] psci: CPU60 killed.
> [   18.584592] CPU61: shutdown
> [   18.587400] psci: CPU61 killed.
> [   18.591351] CPU62: shutdown
> [   18.594148] psci: CPU62 killed.
> [   18.597997] usb 1-2.1: new full-speed USB device number 4 using ehci-platform
>
> The console is unresponsive at this point.
>
> My D06 does not have this issue and the test completes successfully:
>
> D06:
>
> root@(none)$ dmesg | grep -i psci
> [    0.000000] psci: probing for conduit method from ACPI.
> [    0.000000] psci: PSCIv1.1 detected in firmware.
> [    0.000000] psci: Using standard PSCI v0.2 function IDs
> [    0.000000] psci: MIGRATE_INFO_TYPE not supported.
> [    0.000000] psci: SMC Calling Convention v1.1
> [   24.252657] psci_checker: PSCI checker started using 96 CPUs
> [   24.258305] psci_checker: Starting hotplug tests
> [   24.262914] psci_checker: Trying to turn off and on again all CPUs
> [   24.277545] psci: CPU0 killed.
> [   24.298682] psci: CPU1 killed.
> [   24.318704] psci: CPU2 killed.
> [   24.343580] psci: CPU3 killed.
>
> [snip]
>
> [   46.053433] psci_checker: cpuidle not available on CPU 92, ignoring
> [   46.059690] psci_checker: cpuidle not available on CPU 93, ignoring
> [   46.065946] psci_checker: cpuidle not available on CPU 94, ignoring
> [   46.072203] psci_checker: cpuidle not available on CPU 95, ignoring
> [   46.078465] psci_checker: Could not start suspend tests on any CPU
> [   46.084635] psci_checker: PSCI checker completed
> root@(none)$
>
> Is there anything we can check to know what's going wrong?
>

Do both use the same firmware (or at least the same baseline)? Are there any
significant hardware or firmware changes around the CPU power-off sequence?
If you are running the same kernel image on both, the firmware becomes an
easy target to start with. Have you run any tests on the PSCI firmware?

--
Regards,
Sudeep


* Re: PSCI checker query
  2019-12-05 13:30 ` Sudeep Holla
@ 2019-12-05 14:22   ` John Garry
  2019-12-05 15:48     ` Sudeep Holla
  0 siblings, 1 reply; 8+ messages in thread
From: John Garry @ 2019-12-05 14:22 UTC
  To: Sudeep Holla
  Cc: Mark Rutland, Huangming (Mark), Linuxarm, Lorenzo Pieralisi,
	linux-arm-kernel@lists.infradead.org, wanghuiqiang

>> D06:
>>
>> root@(none)$ dmesg | grep -i psci
>> [    0.000000] psci: probing for conduit method from ACPI.
>> [    0.000000] psci: PSCIv1.1 detected in firmware.
>> [    0.000000] psci: Using standard PSCI v0.2 function IDs
>> [    0.000000] psci: MIGRATE_INFO_TYPE not supported.
>> [    0.000000] psci: SMC Calling Convention v1.1
>> [   24.252657] psci_checker: PSCI checker started using 96 CPUs
>> [   24.258305] psci_checker: Starting hotplug tests
>> [   24.262914] psci_checker: Trying to turn off and on again all CPUs
>> [   24.277545] psci: CPU0 killed.
>> [   24.298682] psci: CPU1 killed.
>> [   24.318704] psci: CPU2 killed.
>> [   24.343580] psci: CPU3 killed.
>>
>> [snip]
>>
>> [   46.053433] psci_checker: cpuidle not available on CPU 92, ignoring
>> [   46.059690] psci_checker: cpuidle not available on CPU 93, ignoring
>> [   46.065946] psci_checker: cpuidle not available on CPU 94, ignoring
>> [   46.072203] psci_checker: cpuidle not available on CPU 95, ignoring
>> [   46.078465] psci_checker: Could not start suspend tests on any CPU
>> [   46.084635] psci_checker: PSCI checker completed
>> root@(none)$
>>
>> Is there anything we can check to know what's going wrong?
>>
> 

Hi Sudeep,

> Both use the same firmware(or at-least the baseline) ? 

Well from the kernel logs provided we have for D05:
psci: PSCIv1.0 detected in firmware.

and for the D06 board:
psci: PSCIv1.1 detected in firmware.

Both also seem to be using ATF v1.4, going by the BIOS logs (which don't
tell much else).

> Are there any
> significant hardware or firmware changes around CPU power-off sequence ?

Not that I know about specifically. I've cc'ed some of our firmware 
guys, who may know more details.

> If you are running same kernel image on both, firmware becomes easy
> target to start with.

Yes, same kernel build. That's v5.4 with some unrelated changes.

> Have you run some tests on PSCI firmware ?

Again, I'll have to refer to our firmware guys. BTW, I'll say now that 
this D05 board is legacy...

Thanks,
John




* Re: PSCI checker query
  2019-12-05 14:22   ` John Garry
@ 2019-12-05 15:48     ` Sudeep Holla
  0 siblings, 0 replies; 8+ messages in thread
From: Sudeep Holla @ 2019-12-05 15:48 UTC
  To: John Garry
  Cc: Mark Rutland, Lorenzo Pieralisi, Linuxarm, Huangming (Mark),
	wanghuiqiang, Sudeep Holla, linux-arm-kernel@lists.infradead.org

Hi John,

On Thu, Dec 05, 2019 at 02:22:01PM +0000, John Garry wrote:
> Hi Sudeep,
>
> > Both use the same firmware(or at-least the baseline) ?
>
> Well from the kernel logs provided we have for D05:
> psci: PSCIv1.0 detected in firmware.
>
> and for the D06 board:
> psci: PSCIv1.1 detected in firmware.
>
> Both seem to be using v1.4 ATF also from the bios logs (they don't tell much
> else).
>

Good, it's the standard TF-A project. You should be able to run some TF-A
tests easily then.

> Are there any
> > significant hardware or firmware changes around CPU power-off sequence ?
>
> Not that I know about specifically. I've cc'ed some of our firmware guys,
> who may know more details.
>

OK

> > If you are running same kernel image on both, firmware becomes easy
> > target to start with.
>
> Yes, same kernel build. That's v5.4 with some unrelated changes.
>
>  Have you run some tests on PSCI firmware ?
>
> Again, I'll have to refer to our firmware guys. BTW, I'll say now that this
> D05 board is legacy...
>

Just for your information, I am referring to these tests in TF-A [1].

I don't have many details myself, as I tried them a long time ago, but I
just had to refer to the user guide [2] back then.

--
Regards,
Sudeep

[1] https://git.trustedfirmware.org/TF-A/tf-a-tests.git
[2] https://git.trustedfirmware.org/TF-A/tf-a-tests.git/about/docs/user-guide.rst


* Re: PSCI checker query
  2019-12-05 12:38 PSCI checker query John Garry
  2019-12-05 13:30 ` Sudeep Holla
@ 2019-12-05 15:55 ` Marc Zyngier
  2019-12-05 16:53   ` Sudeep Holla
  1 sibling, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2019-12-05 15:55 UTC
  To: John Garry
  Cc: Mark Rutland, Lorenzo Pieralisi, Linuxarm, wanghuiqiang,
	linux-arm-kernel

Hi John,

On 2019-12-05 12:38, John Garry wrote:
> Hi guys,
>
> I enabled the kernel PSCI checker and it kills my Huawei D05:

[...]

> [   18.521459] CPU52: shutdown
> [   18.524256] psci: CPU52 killed.
> [   18.528634] CPU53: shutdown
> [   18.531461] psci: CPU53 killed.
> [   18.535847] CPU54: shutdown
> [   18.538645] psci: CPU54 killed.
> [   18.542977] CPU55: shutdown
> [   18.545761] psci: CPU55 killed.
> [   18.550050] CPU56: shutdown
> [   18.552836] psci: CPU56 killed.
> [   18.557059] CPU57: shutdown
> [   18.559855] psci: CPU57 killed.
> [   18.564012] CPU58: shutdown
> [   18.566809] psci: CPU58 killed.
> [   18.570941] CPU59: shutdown
> [   18.573725] psci: CPU59 killed.
> [   18.577778] CPU60: shutdown
> [   18.580576] psci: CPU60 killed.
> [   18.584592] CPU61: shutdown
> [   18.587400] psci: CPU61 killed.
> [   18.591351] CPU62: shutdown
> [   18.594148] psci: CPU62 killed.
> [   18.597997] usb 1-2.1: new full-speed USB device number 4 using
> ehci-platform
> [garbled serial console output]
>
>
> [cut remaining garbage]

I get the same garbage, and a couple of:

[   10.986303] CPU0: failed to come online
[   10.986405] CPU0: failed in unknown state : 0x0
[   10.986585] psci_checker: Error occurred (-5) while trying to power up CPU 0
[...]
[   12.468864] ------------[ cut here ]------------
[   12.468995] WARNING: CPU: 2 PID: 1 at drivers/firmware/psci/psci_checker.c:135 down_and_up_cpus+0x1d4/0x1f4
[   12.469242] Modules linked in:
[   12.469324] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W        5.4.0-00079-g0a881ca5de9a #214
[   12.469556] Hardware name: Huawei Technologies Co., Ltd. D05/D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[   12.469816] pstate: 20000005 (nzCv daif -PAN -UAO)
[   12.469939] pc : down_and_up_cpus+0x1d4/0x1f4
[   12.470051] lr : down_and_up_cpus+0x1b4/0x1f4
[   12.470162] sp : ffff80001172bcd0
[   12.470246] x29: ffff80001172bcd0 x28: ffff800010d73010
[   12.470382] x27: ffff001fb6524660 x26: 0000000000000001
[   12.470518] x25: ffff800010d72eb0 x24: ffff80001134a390
[   12.470654] x23: ffff80001172bd98 x22: 0000000000000100
[   12.470789] x21: 0000000000000000 x20: 0000000000000001
[   12.470925] x19: ffff80001172bd98 x18: 0000000000000001
[   12.471061] x17: 0000000000000000 x16: 0000000000000000
[   12.471196] x15: 0000000000000000 x14: 0000000000000000
[   12.471331] x13: 0000000000000000 x12: 0000000000000000
[   12.471467] x11: 0000000000000000 x10: 0000000000000a60
[   12.471602] x9 : ffff80001172b940 x8 : ffff002fb7e81940
[   12.471737] x7 : 0000000000000000 x6 : 0000000000000001
[   12.471873] x5 : ffff80001135c0e8 x4 : 0000000000000000
[   12.472008] x3 : 0000000000000000 x2 : 0000000000000100
[   12.472143] x1 : 0000000000000040 x0 : 000000000000003f
[   12.472279] Call trace:
[   12.472344]  down_and_up_cpus+0x1d4/0x1f4
[   12.472451]  psci_checker+0x250/0x4cc
[   12.472547]  do_one_initcall+0x54/0x220
[   12.472646]  kernel_init_freeable+0x1ec/0x2b4
[   12.472760]  kernel_init+0x18/0x108
[   12.472851]  ret_from_fork+0x10/0x18
[   12.472942] ---[ end trace c328815eb39fc505 ]---

where the psci checker is unhappy about the number of CPUs. So CPU0 doesn't
come back up, and probably has taken down a few things with it.
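
The `(-5)` in the checker message above is a negated Linux errno value. A short Python check, purely illustrative and not part of the kernel or the checker, decodes it using the standard `errno` module:

```python
import errno
import os

code = -5  # value printed by psci_checker in the log above

# errno.errorcode maps positive errno numbers to their symbolic names.
name = errno.errorcode[-code]
print(name, "-", os.strerror(-code))
```

On Linux this reports `EIO`, the generic I/O error.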

The console seems to be on a rather bizarre baud rate, and I can't manage
to reset it. On reboot, the console recovers though, so the firmware is
able to restore some level of sanity (yay!).

You can also reproduce it as:

root@hot-poop:/home/maz# echo 0 >/sys/devices/system/cpu/cpu0/online
root@hot-poop:/home/maz# echo 1 >/sys/devices/system/cpu/cpu0/online
bash: echo: write error: Input/output error

The kernel log says:

[   47.145006] IRQ 254: no longer affine to CPU0
[   47.149380] IRQ 382: no longer affine to CPU0
[   47.153844] CPU0: shutdown
[   47.156551] psci: CPU0 killed.
[   60.904531] CPU0: failed to come online
[   60.904634] CPU0: failed in unknown state : 0x0

and the console is dead. I guess nobody ever turned CPU0 off... :-/
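
The sysfs reproduction above can be wrapped in a small script. This is a minimal Python sketch, assuming the same `/sys/devices/system/cpu/cpuN/online` interface as the commands above. The helper name `cycle_cpu` and its `sysfs_root` parameter are illustrative. On an affected board, running it against CPU0 as root should be expected to kill the console just as the manual commands do.

```python
import errno
import os


def cycle_cpu(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Offline then online one CPU via sysfs.

    Returns None on success, or the errno name (e.g. "EIO") of the
    first write that failed.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    for value in ("0", "1"):
        try:
            # The write is flushed when the file is closed, so any
            # kernel-side error surfaces inside this try block.
            with open(path, "w") as f:
                f.write(value)
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
    return None

# Example (needs root, and will really offline the CPU):
#   print(cycle_cpu(1) or "ok")
```

Pointing `sysfs_root` at a scratch directory allows a dry run without touching real CPUs.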

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: PSCI checker query
  2019-12-05 15:55 ` Marc Zyngier
@ 2019-12-05 16:53   ` Sudeep Holla
  2019-12-05 16:59     ` Marc Zyngier
  2019-12-05 17:12     ` John Garry
  0 siblings, 2 replies; 8+ messages in thread
From: Sudeep Holla @ 2019-12-05 16:53 UTC
  To: Marc Zyngier, John Garry
  Cc: Mark Rutland, Lorenzo Pieralisi, Linuxarm, wanghuiqiang,
	Sudeep Holla, linux-arm-kernel

On Thu, Dec 05, 2019 at 03:55:22PM +0000, Marc Zyngier wrote:
> Hi John,
>
> On 2019-12-05 12:38, John Garry wrote:
> > Hi guys,
> >
> > I enabled the kernel PSCI checker and it kills my Huawei D05:
>
> [...]
>
> > [   18.521459] CPU52: shutdown
> > [   18.524256] psci: CPU52 killed.
> > [   18.528634] CPU53: shutdown
> > [   18.531461] psci: CPU53 killed.
> > [   18.535847] CPU54: shutdown
> > [   18.538645] psci: CPU54 killed.
> > [   18.542977] CPU55: shutdown
> > [   18.545761] psci: CPU55 killed.
> > [   18.550050] CPU56: shutdown
> > [   18.552836] psci: CPU56 killed.
> > [   18.557059] CPU57: shutdown
> > [   18.559855] psci: CPU57 killed.
> > [   18.564012] CPU58: shutdown
> > [   18.566809] psci: CPU58 killed.
> > [   18.570941] CPU59: shutdown
> > [   18.573725] psci: CPU59 killed.
> > [   18.577778] CPU60: shutdown
> > [   18.580576] psci: CPU60 killed.
> > [   18.584592] CPU61: shutdown
> > [   18.587400] psci: CPU61 killed.
> > [   18.591351] CPU62: shutdown
> > [   18.594148] psci: CPU62 killed.
> > [   18.597997] usb 1-2.1: new full-speed USB device number 4 using
> > ehci-platform
> > [garbled serial console output]
> >
> >
> > [cut remaining garbage]
>
> I get the same garbage, and a couple of:
>
> [   10.986303] CPU0: failed to come online
> [   10.986405] CPU0: failed in unknown state : 0x0
> [   10.986585] psci_checker: Error occurred (-5) while trying to power up
> CPU 0
> [...]
> [   12.468864] ------------[ cut here ]------------
> [   12.468995] WARNING: CPU: 2 PID: 1 at
> drivers/firmware/psci/psci_checker.c:135 down_and_up_cpus+0x1d4/0x1f4
> [   12.469242] Modules linked in:
> [   12.469324] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W
> 5.4.0-00079-g0a881ca5de9a #214
> [   12.469556] Hardware name: Huawei Technologies Co., Ltd. D05/D05, BIOS
> Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
> [   12.469816] pstate: 20000005 (nzCv daif -PAN -UAO)
> [   12.469939] pc : down_and_up_cpus+0x1d4/0x1f4
> [   12.470051] lr : down_and_up_cpus+0x1b4/0x1f4
> [   12.470162] sp : ffff80001172bcd0
> [   12.470246] x29: ffff80001172bcd0 x28: ffff800010d73010
> [   12.470382] x27: ffff001fb6524660 x26: 0000000000000001
> [   12.470518] x25: ffff800010d72eb0 x24: ffff80001134a390
> [   12.470654] x23: ffff80001172bd98 x22: 0000000000000100
> [   12.470789] x21: 0000000000000000 x20: 0000000000000001
> [   12.470925] x19: ffff80001172bd98 x18: 0000000000000001
> [   12.471061] x17: 0000000000000000 x16: 0000000000000000
> [   12.471196] x15: 0000000000000000 x14: 0000000000000000
> [   12.471331] x13: 0000000000000000 x12: 0000000000000000
> [   12.471467] x11: 0000000000000000 x10: 0000000000000a60
> [   12.471602] x9 : ffff80001172b940 x8 : ffff002fb7e81940
> [   12.471737] x7 : 0000000000000000 x6 : 0000000000000001
> [   12.471873] x5 : ffff80001135c0e8 x4 : 0000000000000000
> [   12.472008] x3 : 0000000000000000 x2 : 0000000000000100
> [   12.472143] x1 : 0000000000000040 x0 : 000000000000003f
> [   12.472279] Call trace:
> [   12.472344]  down_and_up_cpus+0x1d4/0x1f4
> [   12.472451]  psci_checker+0x250/0x4cc
> [   12.472547]  do_one_initcall+0x54/0x220
> [   12.472646]  kernel_init_freeable+0x1ec/0x2b4
> [   12.472760]  kernel_init+0x18/0x108
> [   12.472851]  ret_from_fork+0x10/0x18
> [   12.472942] ---[ end trace c328815eb39fc505 ]---
>
> where the psci checker is unhappy about the number of CPUs. So CPU0 doesn't
> come back up, and probably has taken down a few things with it.
>
> The console seems to be on a rather bizarre baud rate, and I can't manage
> to reset it. On reboot, the console recovers though, so the firmware is
> able to restore some level of sanity (yay!).
>
> You can also reproduce it as:
>
> root@hot-poop:/home/maz# echo 0 >/sys/devices/system/cpu/cpu0/online
> root@hot-poop:/home/maz# echo 1 >/sys/devices/system/cpu/cpu0/online
> bash: echo: write error: Input/output error
>
> The kernel log says:
>
> [   47.145006] IRQ 254: no longer affine to CPU0
> [   47.149380] IRQ 382: no longer affine to CPU0
> [   47.153844] CPU0: shutdown
> [   47.156551] psci: CPU0 killed.
> [   60.904531] CPU0: failed to come online
> [   60.904634] CPU0: failed in unknown state : 0x0
>
> and the console is dead. I guess nobody ever turned CPU0 off... :-/
>

For a moment, I thought the PSCI checker had found some issue that normal
hotplug operation didn't. Guess what, I was wrong :). Normal hotplug from
the kernel triggers this too, which is good to know, as not everyone
normally runs the PSCI checker.

Anyway, it looks like the firmware is broken. If there are hardware
limitations, the firmware can fail the power-off request as a workaround.
If it has anything to do with some secure service or OS, PSCI has ways to
convey that, and we now avoid starting the CPU down sequence by marking the
CPU as not hotpluggable. The PSCI tests may not be able to use it, but I
expect the firmware to return an error for CPU_DOWN in that case.
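
For reference when reading firmware return values, the PSCI specification defines a small set of return codes. The table below is a minimal Python sketch of those codes. The helper name `psci_ret_name` is illustrative and not a kernel or TF-A API. As far as I recall, `DENIED` is the code the specification provides for a refused CPU_OFF request, but that should be checked against the specification itself.

```python
# PSCI return codes as defined by the Arm PSCI specification.
PSCI_RET = {
    0: "SUCCESS",
    -1: "NOT_SUPPORTED",
    -2: "INVALID_PARAMETERS",
    -3: "DENIED",
    -4: "ALREADY_ON",
    -5: "ON_PENDING",
    -6: "INTERNAL_FAILURE",
    -7: "NOT_PRESENT",
    -8: "DISABLED",
    -9: "INVALID_ADDRESS",
}


def psci_ret_name(code):
    """Translate a raw PSCI return value into its symbolic name."""
    return PSCI_RET.get(code, f"UNKNOWN({code})")


print(psci_ret_name(-3))
```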

--
Regards,
Sudeep


* Re: PSCI checker query
  2019-12-05 16:53   ` Sudeep Holla
@ 2019-12-05 16:59     ` Marc Zyngier
  2019-12-05 17:12     ` John Garry
  1 sibling, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2019-12-05 16:59 UTC
  To: Sudeep Holla
  Cc: Mark Rutland, Lorenzo Pieralisi, John Garry, Linuxarm,
	linux-arm-kernel, wanghuiqiang

On 2019-12-05 16:53, Sudeep Holla wrote:

> For a moment, I thought PSCI checker found some issue that normal hotplug
> operation didn't. Guess what, I am wrong :). Normal HP tests from the
> kernel triggers this, which is good as not all normally run this PSCI
> tests.
>
> Anyways, looks like the firmware is broken. If there are hardware
> limitations, the firmware can fail to poweroff as a workaround. If
> it is anything to do with some secure service or OS, we have PSCI
> ways to convey the same and we now avoid starting the CPU down sequence
> by marking it not hotpluggable. PSCI tests may not be able to use it
> but I expect the firmware to return error for CPU_DOWN in that case.

Indeed. Failing the CPU_DOWN would have been just fine. Failing CPU_UP
is pretty bad though, and I don't think we can work around it.

Oh well. As John said, this is "legacy"...

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: PSCI checker query
  2019-12-05 16:53   ` Sudeep Holla
  2019-12-05 16:59     ` Marc Zyngier
@ 2019-12-05 17:12     ` John Garry
  1 sibling, 0 replies; 8+ messages in thread
From: John Garry @ 2019-12-05 17:12 UTC
  To: Sudeep Holla, Marc Zyngier
  Cc: Mark Rutland, Lorenzo Pieralisi, Linuxarm, Xiaofei Tan,
	linux-arm-kernel@lists.infradead.org, wanghuiqiang, Shiju Jose

On 05/12/2019 16:53, Sudeep Holla wrote:

Hi Sudeep, Marc,

>>
>> I get the same garbage, and a couple of:

Thanks for testing

>>
>> [   10.986303] CPU0: failed to come online
>> [   10.986405] CPU0: failed in unknown state : 0x0
>> [   10.986585] psci_checker: Error occurred (-5) while trying to power up
>> CPU 0
>> [...]
>> [   12.468864] ------------[ cut here ]------------
>> [   12.468995] WARNING: CPU: 2 PID: 1 at
>> drivers/firmware/psci/psci_checker.c:135 down_and_up_cpus+0x1d4/0x1f4
>> [   12.469242] Modules linked in:
>> [   12.469324] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W
>> 5.4.0-00079-g0a881ca5de9a #214
>> [   12.469556] Hardware name: Huawei Technologies Co., Ltd. D05/D05, BIOS
>> Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [   12.469816] pstate: 20000005 (nzCv daif -PAN -UAO)
>> [   12.469939] pc : down_and_up_cpus+0x1d4/0x1f4
>> [   12.470051] lr : down_and_up_cpus+0x1b4/0x1f4
>> [   12.470162] sp : ffff80001172bcd0
>> [   12.470246] x29: ffff80001172bcd0 x28: ffff800010d73010
>> [   12.470382] x27: ffff001fb6524660 x26: 0000000000000001
>> [   12.470518] x25: ffff800010d72eb0 x24: ffff80001134a390
>> [   12.470654] x23: ffff80001172bd98 x22: 0000000000000100
>> [   12.470789] x21: 0000000000000000 x20: 0000000000000001
>> [   12.470925] x19: ffff80001172bd98 x18: 0000000000000001
>> [   12.471061] x17: 0000000000000000 x16: 0000000000000000
>> [   12.471196] x15: 0000000000000000 x14: 0000000000000000
>> [   12.471331] x13: 0000000000000000 x12: 0000000000000000
>> [   12.471467] x11: 0000000000000000 x10: 0000000000000a60
>> [   12.471602] x9 : ffff80001172b940 x8 : ffff002fb7e81940
>> [   12.471737] x7 : 0000000000000000 x6 : 0000000000000001
>> [   12.471873] x5 : ffff80001135c0e8 x4 : 0000000000000000
>> [   12.472008] x3 : 0000000000000000 x2 : 0000000000000100
>> [   12.472143] x1 : 0000000000000040 x0 : 000000000000003f
>> [   12.472279] Call trace:
>> [   12.472344]  down_and_up_cpus+0x1d4/0x1f4
>> [   12.472451]  psci_checker+0x250/0x4cc
>> [   12.472547]  do_one_initcall+0x54/0x220
>> [   12.472646]  kernel_init_freeable+0x1ec/0x2b4
>> [   12.472760]  kernel_init+0x18/0x108
>> [   12.472851]  ret_from_fork+0x10/0x18
>> [   12.472942] ---[ end trace c328815eb39fc505 ]---
>>
>> where the psci checker is unhappy about the number of CPUs. So CPU0 doesn't
>> come back up, and probably has taken down a few things with it.
>>
>> The console seems to be on a rather bizarre baud rate, and I can't manage
>> to reset it. On reboot, the console recovers though, so the firmware is
>> able to restore some level of sanity (yay!).
>>
>> You can also reproduce it as:
>>
>> root@hot-poop:/home/maz# echo 0 >/sys/devices/system/cpu/cpu0/online
>> root@hot-poop:/home/maz# echo 1 >/sys/devices/system/cpu/cpu0/online
>> bash: echo: write error: Input/output error
>>

For this I just get a little garbage spurted out along with an 
unresponsive console. Not good.

>> The kernel log says:
>>
>> [   47.145006] IRQ 254: no longer affine to CPU0
>> [   47.149380] IRQ 382: no longer affine to CPU0
>> [   47.153844] CPU0: shutdown
>> [   47.156551] psci: CPU0 killed.
>> [   60.904531] CPU0: failed to come online
>> [   60.904634] CPU0: failed in unknown state : 0x0
>>
>> and the console is dead. I guess nobody ever turned CPU0 off... :-/
>>
> 
> For a moment, I thought the PSCI checker had found an issue that normal
> hotplug operation didn't. Guess what, I was wrong :). Normal hotplug tests
> from the kernel trigger this too, which is good, as not everyone normally
> runs these PSCI tests.

OK, but now for my D06 - which passed the PSCI test - I get this:

root@ubuntu:/home/john# echo 0 > /sys/devices/system/cpu/cpu0/online
root@ubuntu:/home/john# echo 1 > /sys/devices/system/cpu/cpu0/online
[   78.537579] CPU0: failed to come online
[   78.541406] CPU0: failed in unknown state : 0x0
bash: echo: write error: Input/output error
root@ubuntu:/home/john# dmesg | tail
[   26.490683] hid-generic 0003:12D1:0003.0002: input: USB HID v1.10 Mouse [Keyboard/Mouse KVM 1.1.0] on usb-0000:7a:01.0-2.1/input1
[   70.758432] CPU0: shutdown
[   70.777581] psci: Retrying again to check for CPU kill
[   70.777585] psci: CPU0 killed.
[   78.537579] CPU0: failed to come online
[   78.541406] CPU0: failed in unknown state : 0x0
root@ubuntu:/home/john#
root@ubuntu:/home/john# echo 1 > /sys/devices/system/cpu/cpu0/online
[  458.026871] psci: failed to boot CPU0 (-22)
[  458.031049] CPU0: failed to boot: -22

I intermittently saw this issue a little while ago - that is, CPU0 does 
not come back online. I meant to revisit...
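
For anyone wanting to repeat the manual test above on several CPUs, here 
is a minimal sketch of the same offline/online cycle as a shell function. 
It is not taken from the kernel tree or from the PSCI checker; the 
function name cycle_cpu and the CPU_SYSFS_ROOT override are hypothetical 
and exist only so the logic can be tried against a scratch directory 
first. On firmware behaving like the D05 above, running it against real 
sysfs may well leave the console unusable until reboot.

```shell
#!/bin/sh
# Sketch only: cycle one CPU offline and back online through sysfs.
# CPU_SYSFS_ROOT is a hypothetical override (not a kernel interface) that
# lets the function be pointed at a scratch directory instead of real sysfs.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

# cycle_cpu N
#   returns 0 if cpuN reports online again after the cycle,
#   1 if either write is rejected or the CPU stays offline,
#   2 if there is no writable "online" control file for cpuN.
cycle_cpu() {
    ctl="$CPU_SYSFS_ROOT/cpu$1/online"
    if [ ! -w "$ctl" ]; then
        echo "cpu$1: no writable online control" >&2
        return 2
    fi
    # Take the CPU down; a rejected write means the request was refused.
    if ! echo 0 > "$ctl" 2>/dev/null; then
        echo "cpu$1: offline request rejected" >&2
        return 1
    fi
    # Bring it back; this is the step that fails with EIO in the logs above.
    if ! echo 1 > "$ctl" 2>/dev/null; then
        echo "cpu$1: online request failed" >&2
        return 1
    fi
    # Confirm the state the kernel reports afterwards.
    [ "$(cat "$ctl")" = "1" ]
}
```

Typical use would be "cycle_cpu 0 || dmesg | tail" as root, which prints 
the last kernel messages only when the CPU fails to come back.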

> 
> Anyways, looks like the firmware is broken. 

Yeah

> If there are hardware
> limitations, the firmware can fail the power-off request as a workaround.
> If it has anything to do with some secure service or OS, we have PSCI
> ways to convey that, and we now avoid starting the CPU down sequence
> by marking the CPU not hotpluggable. The PSCI tests may not be able to use
> it, but I expect the firmware to return an error for CPU_DOWN in that case.
> 
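
On the "not hotpluggable" point, a quick way to see which CPUs the kernel 
currently offers for hotplug is to look for the per-CPU "online" control 
file. The sketch below assumes that a CPU marked not hotpluggable simply 
has no such file under /sys/devices/system/cpu; that is my understanding 
of the sysfs behaviour rather than something stated in this thread. The 
function name list_hotpluggable and its optional root argument are 
hypothetical.

```shell
#!/bin/sh
# Sketch only: print the cpuN entries that expose an "online" control file.
# Assumption: CPUs the kernel treats as non-hotpluggable have no such file.
# The optional ROOT argument is a hypothetical convenience for dry runs.
list_hotpluggable() {
    root="${1:-/sys/devices/system/cpu}"
    for d in "$root"/cpu[0-9]*; do
        # Entries such as cpufreq or cpuidle do not match the glob above.
        [ -e "$d/online" ] && echo "${d##*/}"
    done
    return 0
}
```

If cpu0 is missing from the output, the offline request in the test above 
cannot be issued from userspace in the first place.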
Thanks,
John

> --
> Regards,
> Sudeep
> 




end of thread, other threads:[~2019-12-05 17:12 UTC | newest]

Thread overview: 8+ messages
2019-12-05 12:38 PSCI checker query John Garry
2019-12-05 13:30 ` Sudeep Holla
2019-12-05 14:22   ` John Garry
2019-12-05 15:48     ` Sudeep Holla
2019-12-05 15:55 ` Marc Zyngier
2019-12-05 16:53   ` Sudeep Holla
2019-12-05 16:59     ` Marc Zyngier
2019-12-05 17:12     ` John Garry
