* ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
@ 2020-02-25 0:35 Tim Harvey
2020-02-25 0:50 ` Marc Zyngier
0 siblings, 1 reply; 14+ messages in thread
From: Tim Harvey @ 2020-02-25 0:35 UTC (permalink / raw)
To: linux-arm-kernel, Will Deacon, Catalin Marinas, Sunil Goutham,
Robert Richter
Greetings,
I'm trying to understand why enabling CONFIG_ARM64_SW_TTBR0_PAN on an
OcteonTX (CN80XX) SoC would cause the kernel to hang.
Here's what I'm seeing using arch/arm64/defconfig +
CONFIG_ARM64_SW_TTBR0_PAN=y on a Gateworks Newport board with a
CN8030-1500BG676-SCP-P12-G SoC using the Marvell SDK-10.1.1.0 boot
firmware:
Starting kernel ...
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x430f0a22]
[ 0.000000] Linux version 5.5.0-00001-g2028a3b (tharvey@tharvey)
(gcc version 7.3.0 (Marvell Inc. Version: Marvell GCC7 build 238.0))
#2 SMP PREEMPT Mon Feb 24 16:20:24 PST 2020
[ 0.000000] Machine model: Gateworks Newport CN80XX GW6404
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 64 MiB at 0x000000007c000000
[ 0.000000] NUMA: NODE_DATA [mem 0x7bbe5100-0x7bbe6fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000500000-0x000000003fffffff]
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000500000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000500000-0x000000007fffffff]
[ 0.000000] On node 0 totalpages: 523008
[ 0.000000] DMA zone: 4076 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 260864 pages, LIFO batch:63
[ 0.000000] DMA32 zone: 4096 pages used for memmap
[ 0.000000] DMA32 zone: 262144 pages, LIFO batch:63
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.1 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS resident on physical CPU 0x0
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] percpu: Embedded 22 pages/cpu s53016 r8192 d28904 u90112
[ 0.000000] pcpu-alloc: s53016 r8192 d28904 u90112 alloc=22*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: GIC system register CPU interface
[ 0.000000] CPU features: detected: Cavium erratum 30115
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[ 0.000000] Speculative Store Bypass Disable mitigation not required
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 514836
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: cma=64M coherent_pool=16M
net.ifnames=0 debug
[ 0.000000] Dentry cache hash table entries: 262144 (order: 9,
2097152 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 131072 (order: 8,
1048576 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] software IO TLB: mapped [mem 0x3bfff000-0x3ffff000] (64MB)
[ 0.000000] Memory: 1887632K/2092032K available (12732K kernel
code, 1922K rwdata, 6844K rodata, 10496K init, 455K bss, 138864K
reserved, 65536K cma-reserved)
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[ 0.000000] GICv3: 128 SPIs implemented
[ 0.000000] GICv3: 0 Extended SPIs implemented
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: 16 PPIs implemented
[ 0.000000] GICv3: no VLPI support, direct LPI support
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000801080000000
[ 0.000000] ITS [mem 0x801000020000-0x80100021ffff]
[ 0.000000] ITS@0x0000801000020000: Devices Table too large, reduce
ids 21->19
[ 0.000000] ITS@0x0000801000020000: allocated 524288 Devices
@78c00000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GICv3: using LPI property table @0x000000007b440000
[ 0.000000] GICv3: CPU0: using allocated LPI pending table
@0x000000007b450000
[ 0.000000] random: get_random_bytes called from
start_kernel+0x2b8/0x448 with crng_init=0
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (phys).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff
max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000003] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps
every 4398046511100ns
[ 0.000397] Console: colour dummy device 80x25
[ 0.000899] printk: console [tty0] enabled
[ 0.000973] Calibrating delay loop (skipped), value calculated
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.000992] pid_max: default: 32768 minimum: 301
[ 0.001059] LSM: Security Framework initializing
[ 0.001112] Mount-cache hash table entries: 4096 (order: 3, 32768
bytes, linear)
[ 0.001132] Mountpoint-cache hash table entries: 4096 (order: 3,
32768 bytes, linear)
[ 0.024042] ASID allocator initialised with 32768 entries
[ 0.032035] rcu: Hierarchical SRCU implementation.
[ 0.040106] Platform MSI: gic-its@801000020000 domain created
[ 0.040179] PCI/MSI:
/interrupt-controller@801000000000/gic-its@801000020000 domain created
[ 0.040900] EFI services will not be available.
[ 0.048064] smp: Bringing up secondary CPUs ...
[ 0.080236] Detected VIPT I-cache on CPU1
[ 0.080252] GICv3: CPU1: found redistributor 1 region 0:0x0000801080020000
[ 0.080260] GICv3: CPU1: using allocated LPI pending table
@0x000000007b460000
[ 0.080286] CPU1: Booted secondary processor 0x0000000001 [0x430f0a22]
[ 0.112272] Detected VIPT I-cache on CPU2
[ 0.112284] GICv3: CPU2: found redistributor 2 region 0:0x0000801080040000
[ 0.112290] GICv3: CPU2: using allocated LPI pending table
@0x000000007b470000
[ 0.112308] CPU2: Booted secondary processor 0x0000000002 [0x430f0a22]
[ 0.144330] Detected VIPT I-cache on CPU3
[ 0.144341] GICv3: CPU3: found redistributor 3 region 0:0x0000801080060000
[ 0.144348] GICv3: CPU3: using allocated LPI pending table
@0x000000007b480000
[ 0.144366] CPU3: Booted secondary processor 0x0000000003 [0x430f0a22]
[ 0.144435] smp: Brought up 1 node, 4 CPUs
[ 0.144545] SMP: Total of 4 processors activated.
[ 0.144557] CPU features: detected: Data cache clean to the PoU not
required for I/D coherence
[ 0.144570] CPU features: detected: CRC32 instructions
[ 0.174519] CPU features: emulated: Privileged Access Never (PAN)
using TTBR0_EL1 switching
[ 0.174546] CPU: All CPU(s) started at EL2
[ 0.174584] alternatives: patching kernel code
[ 0.175748] devtmpfs: initialized
[ 0.178091] KASLR disabled due to lack of seed
[ 0.178381] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.178406] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[ 0.179731] pinctrl core: initialized pinctrl subsystem
[ 0.180547] thermal_sys: Registered thermal governor 'step_wise'
[ 0.180549] thermal_sys: Registered thermal governor 'power_allocator'
[ 0.180649] DMI not present or invalid.
[ 0.180927] NET: Registered protocol family 16
[ 0.190295] DMA: preallocated 16384 KiB pool for atomic allocations
[ 0.190324] audit: initializing netlink subsys (disabled)
[ 0.190449] audit: type=2000 audit(0.172:1): state=initialized
audit_enabled=0 res=1
[ 0.191143] cpuidle: using governor menu
[ 0.191351] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[ 0.192244] Serial: AMBA PL011 UART driver
[ 0.193776] 87e028000000.serial: ttyAMA0 at MMIO 0x87e028000000
(irq = 7, base_baud = 0) is a PL011 rev3
[ 0.870244] printk: console [ttyAMA0] enabled
[ 0.875094] 87e029000000.serial: ttyAMA1 at MMIO 0x87e029000000
(irq = 8, base_baud = 0) is a PL011 rev3
[ 0.884998] 87e02a000000.serial: ttyAMA2 at MMIO 0x87e02a000000
(irq = 9, base_baud = 0) is a PL011 rev3
[ 0.894899] 87e02b000000.serial: ttyAMA3 at MMIO 0x87e02b000000
(irq = 10, base_baud = 0) is a PL011 rev3
[ 0.914346] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.921074] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[ 0.927786] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.934490] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[ 0.942887] cryptd: max_cpu_qlen set to 1000
[ 0.950168] ACPI: Interpreter disabled.
[ 0.954674] iommu: Default domain type: Translated
[ 0.959697] vgaarb: loaded
[ 0.962633] SCSI subsystem initialized
[ 0.966531] libata version 3.00 loaded.
[ 0.970558] usbcore: registered new interface driver usbfs
[ 0.976086] usbcore: registered new interface driver hub
[ 0.981451] usbcore: registered new device driver usb
[ 0.986809] pps_core: LinuxPPS API ver. 1 registered
[ 0.991781] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
Rodolfo Giometti <giometti@linux.it>
[ 1.000926] PTP clock support registered
[ 1.004903] EDAC MC: Ver: 3.0.0
[ 1.008604] FPGA manager framework
[ 1.012074] Advanced Linux Sound Architecture Driver Initialized.
[ 1.018680] clocksource: Switched to clocksource arch_sys_counter
[ 1.024959] VFS: Disk quotas dquot_6.6.0
[ 1.028928] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.035947] pnp: PnP ACPI: disabled
[ 1.044258] NET: Registered protocol family 2
[ 1.048877] tcp_listen_portaddr_hash hash table entries: 1024
(order: 2, 16384 bytes, linear)
[ 1.057446] TCP established hash table entries: 16384 (order: 5,
131072 bytes, linear)
[ 1.065456] TCP bind hash table entries: 16384 (order: 6, 262144
bytes, linear)
[ 1.072894] TCP: Hash tables configured (established 16384 bind 16384)
[ 1.079491] UDP hash table entries: 1024 (order: 3, 32768 bytes, linear)
[ 1.086236] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes, linear)
[ 1.093503] NET: Registered protocol family 1
[ 1.098213] RPC: Registered named UNIX socket transport module.
[ 1.104150] RPC: Registered udp transport module.
[ 1.108856] RPC: Registered tcp transport module.
[ 1.113560] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 1.120006] PCI: CLS 0 bytes, default 64
[ 2.543115] hw perfevents: enabled with armv8_cavium_thunder PMU
driver, 7 counters available
[ 2.551759] kvm [1]: IPA Size Limit: 48bits
[ 2.556697] kvm [1]: GICv3: no GICV resource entry
[ 2.561499] kvm [1]: disabling GICv2 emulation
[ 2.565945] kvm [1]: GICv3 sysreg trapping enabled ([G0G1], reduced
performance)
[ 2.573361] kvm [1]: GIC system register CPU interface enabled
[ 2.579277] kvm [1]: vgic interrupt IRQ1
[ 2.583329] kvm [1]: Hyp mode initialized successfully
[ 23.590677] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 23.596780] rcu: 3-...0: (72 GPs behind)
idle=8ea/1/0x4000000000000000 softirq=24/24 fqs=2625
[ 23.605480] (detected by 0, t=5252 jiffies, g=-847, q=4)
[ 23.610874] Task dump for CPU 3:
[ 23.614097] modprobe R running task 0 101 7 0x00000002
[ 23.621146] Call trace:
[ 23.623593] ret_from_fork+0x0/0x1c
[ 86.610673] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 86.616769] rcu: 3-...0: (72 GPs behind)
idle=8ea/1/0x4000000000000000 softirq=24/24 fqs=10498
[ 86.625553] (detected by 0, t=21007 jiffies, g=-847, q=4)
[ 86.631034] Task dump for CPU 3:
[ 86.634256] modprobe R running task 0 101 7 0x00000002
[ 86.641304] Call trace:
[ 86.643746] ret_from_fork+0x0/0x1c
...
With the standard arm64 defconfig (that is, with "Emulate Privileged
Access Never using TTBR0_EL1 switching" disabled) the board continues on
to enumerate PCI etc. without this stall.
Any ideas what's going on here?
Best Regards,
Tim
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 0:35 ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX Tim Harvey
@ 2020-02-25 0:50 ` Marc Zyngier
2020-02-25 1:16 ` Tim Harvey
0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2020-02-25 0:50 UTC (permalink / raw)
To: Tim Harvey
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
Tim,
On 2020-02-25 00:35, Tim Harvey wrote:
> [ 0.000000] CPU features: detected: Cavium erratum 30115
> [ 0.000000] CPU features: detected: Kernel page table isolation
> (KPTI)
If this CPU is just another version of TX1, KPTI shouldn't get enabled on
this HW, as it definitely breaks (see erratum 27456 and its consequences).
Can you please enable CONFIG_CAVIUM_ERRATUM_27456 and report back?
Thanks,
M.
--
Jazz is not dead. It just smells funny...
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 0:50 ` Marc Zyngier
@ 2020-02-25 1:16 ` Tim Harvey
2020-02-25 1:55 ` Marc Zyngier
0 siblings, 1 reply; 14+ messages in thread
From: Tim Harvey @ 2020-02-25 1:16 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On Mon, Feb 24, 2020 at 4:50 PM Marc Zyngier <maz@kernel.org> wrote:
>
> Tim,
>
> If this CPU is just another version of TX1, KPTI shouldn't get enabled on
> this HW, as it definitely breaks (see erratum 27456 and its consequences).
> Can you please enable CONFIG_CAVIUM_ERRATUM_27456 and report back?
>
Marc,
This is a CN8030 Pass 1.2 part so erratum 27456 does appear to be
needed and it is indeed enabled already in the kernel by default.
Tim
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 1:16 ` Tim Harvey
@ 2020-02-25 1:55 ` Marc Zyngier
2020-02-25 16:13 ` Tim Harvey
0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2020-02-25 1:55 UTC (permalink / raw)
To: Tim Harvey
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On 2020-02-25 01:16, Tim Harvey wrote:
> Marc,
>
> This is a CN8030 Pass 1.2 part so erratum 27456 does appear to be
> needed and it is indeed enabled already in the kernel by default.
And yet the kernel doesn't seem to detect an affected silicon.
Can you please apply the following patch and report what happens
(including the full dmesg):
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 703ad0a84f99..c0890d882e56 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -672,7 +672,7 @@ const struct midr_range cavium_erratum_27456_cpus[] = {
 	/* Cavium ThunderX, T88 pass 1.x - 2.1 */
 	MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
 	/* Cavium ThunderX, T81 pass 1.0 */
-	MIDR_REV(MIDR_THUNDERX_81XX, 0, 0),
+	MIDR_ALL_VERSIONS(MIDR_THUNDERX_81XX),
 	{},
 };
 #endif
Thanks,
M.
--
Jazz is not dead. It just smells funny...
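For readers following the patch above, here is a minimal C sketch of why the
stock table may miss this board. It decodes the MIDR value printed in Tim's
boot log (0x430f0a22) and checks its (variant, revision) pair against an
inclusive range, in the same spirit as the MIDR_REV / MIDR_RANGE /
MIDR_ALL_VERSIONS entries in the patch. The struct and the helpers
midr_decode(), midr_is_thunderx_81xx() and midr_rev_in_range() are
hypothetical names used for illustration only, not kernel APIs. The field
layout is the architectural MIDR_EL1 layout. Reading "pass 1.2" as variant 0,
revision 2 is an inference from the comments in the patch, where pass 1.0 is
written as (0, 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded MIDR_EL1 fields (architectural bit positions). */
struct midr_fields {
	unsigned int implementer;  /* bits [31:24] */
	unsigned int variant;      /* bits [23:20] */
	unsigned int architecture; /* bits [19:16] */
	unsigned int partnum;      /* bits [15:4]  */
	unsigned int revision;     /* bits [3:0]   */
};

/* Hypothetical helper: split a raw MIDR value into its fields. */
static struct midr_fields midr_decode(uint32_t midr)
{
	struct midr_fields f = {
		.implementer  = (midr >> 24) & 0xff,
		.variant      = (midr >> 20) & 0xf,
		.architecture = (midr >> 16) & 0xf,
		.partnum      = (midr >> 4) & 0xfff,
		.revision     = midr & 0xf,
	};
	return f;
}

/* Hypothetical helper: Cavium implementer (0x43), ThunderX 81xx part (0x0a2). */
static bool midr_is_thunderx_81xx(uint32_t midr)
{
	struct midr_fields f = midr_decode(midr);

	return f.implementer == 0x43 && f.partnum == 0x0a2;
}

/*
 * Hypothetical helper: true if (variant, revision) lies within the
 * inclusive range [(v_min, r_min), (v_max, r_max)]. This only models the
 * revision comparison; it is not the kernel's implementation.
 */
static bool midr_rev_in_range(uint32_t midr,
			      unsigned int v_min, unsigned int r_min,
			      unsigned int v_max, unsigned int r_max)
{
	struct midr_fields f = midr_decode(midr);
	unsigned int rv = (f.variant << 4) | f.revision;

	return rv >= ((v_min << 4) | r_min) && rv <= ((v_max << 4) | r_max);
}
```

With these helpers, midr_rev_in_range(0x430f0a22, 0, 0, 0, 0) (the
"pass 1.0 only" case) is false, while midr_rev_in_range(0x430f0a22, 0, 0,
0xf, 0xf) (the "all versions" case) is true. That is consistent with the
erratum only being detected on this board once the patch is applied.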
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 1:55 ` Marc Zyngier
@ 2020-02-25 16:13 ` Tim Harvey
2020-02-25 16:27 ` Marc Zyngier
0 siblings, 1 reply; 14+ messages in thread
From: Tim Harvey @ 2020-02-25 16:13 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On Mon, Feb 24, 2020 at 5:55 PM Marc Zyngier <maz@kernel.org> wrote:
>
> And yet the kernel doesn't seem to detect an affected silicon.
> Can you please apply the following patch and report what happens
> (including the full dmesg):
>
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 703ad0a84f99..c0890d882e56 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -672,7 +672,7 @@ const struct midr_range cavium_erratum_27456_cpus[] = {
>  	/* Cavium ThunderX, T88 pass 1.x - 2.1 */
>  	MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
>  	/* Cavium ThunderX, T81 pass 1.0 */
> -	MIDR_REV(MIDR_THUNDERX_81XX, 0, 0),
> +	MIDR_ALL_VERSIONS(MIDR_THUNDERX_81XX),
>  	{},
>  };
>  #endif
>
Marc,
That does enable the erratum, disable KPTI and boot properly, but I
misread the erratum and it shouldn't be needed for T81 pass 1.2... the
erratum is documented as only needed for pass 1.0.
Tim
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 16:13 ` Tim Harvey
@ 2020-02-25 16:27 ` Marc Zyngier
2020-02-25 16:35 ` Robert Richter
0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2020-02-25 16:27 UTC (permalink / raw)
To: Tim Harvey
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On 2020-02-25 16:13, Tim Harvey wrote:
> Marc,
>
> That does enable the erratum, disable KPTI and boot properly but I
> misread the erratum and it shouldn't be needed for T81 pass 1.2... the
> erratum is documented only needed for pass 1.0.

Can you then remove the patch *and* disable KPTI?

TX1 is broken beyond recognition and KPTI is known to explode on this HW
(which is why we disable KPTI on it). We always attributed it to this
erratum, but in the absence of any help from Cavium to identify the
problem, we just keyed it on that.

*IF* this HW is indeed unaffected by it, then it is probably the mix of
KPTI and SWPAN that triggers the issue. If my suspicion is correct,
you'll need to have a chat with Cavium/Marvell to understand what is
happening there.

M.
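
A rough model of the decision described above, written as a standalone C sketch and not taken from the kernel source. The struct, its field names and `kpti_enabled()` are hypothetical. That an active erratum 27456 workaround overrides a command-line request is an assumption of this sketch, and other inputs the kernel may consider (KASLR, for instance) are left out.

```c
#include <stdbool.h>

/* Hypothetical inputs to the "should KPTI be on?" decision. */
struct kpti_inputs {
    bool meltdown_safe;   /* CPU is believed not to need KPTI */
    bool erratum_27456;   /* Cavium erratum 27456 workaround is active */
    int  cmdline_forced;  /* 1: kpti=1, -1: kpti=0, 0: not given */
};

/*
 * Simplified model of the behaviour discussed in the thread:
 * the erratum forces KPTI off, otherwise the command line wins,
 * otherwise KPTI is enabled unless the CPU is considered safe.
 */
static bool kpti_enabled(const struct kpti_inputs *in)
{
    int forced = in->cmdline_forced;

    /* Assumption: the erratum workaround overrides the command line. */
    if (in->erratum_27456)
        forced = -1;

    if (forced != 0)
        return forced > 0;

    return !in->meltdown_safe;
}
```

Under this model, a CPU that is not meltdown-safe and is not matched by the erratum range gets KPTI enabled by default, which is the situation reported for the pass 1.2 part; either matching the erratum or passing `kpti=0` turns it off.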
--
Jazz is not dead. It just smells funny...
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 16:27 ` Marc Zyngier
@ 2020-02-25 16:35 ` Robert Richter
2020-02-25 17:00 ` Marc Zyngier
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Robert Richter @ 2020-02-25 16:35 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Tim Harvey, Will Deacon, Sunil Goutham,
linux-arm-kernel
Marc,
On 25.02.20 16:27:41, Marc Zyngier wrote:
> On 2020-02-25 16:13, Tim Harvey wrote:
> > That does enable the erratum, disable KPTI and boot properly but I
> > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
> > erratum is documented only needed for pass 1.0.
>
> Can you then remove the patch *and* disable KPTI?
>
> TX1 is broken beyond recognition and KPTI is known to explode on this HW
> (which is why we disable KPTI on it). We always attributed it to this
> erratum, but in the absence of any help from Cavium to identify the
> problem, we just keyed it on that.
>
> *IF* this HW is indeed unaffected by it, then it is probably the mix of
> KPTI and SWPAN that triggers the issue. If my suspicion is correct, you'll
> need to have a chat with Cavium/Marvell to understand what is happening
> there.
I checked the docs and Tim is right, this should be only visible on
pass 1.0. Thus, the rev range to enable the workaround as implemented
upstream should be ok. I have asked hw folks regarding this.
-Robert
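
To make the revision question concrete, here is a small C sketch that decodes the MIDR value printed in the original boot log (0x430f0a22) and checks it against a "pass 1.0 only" range and an "all versions" range. The field layout follows the Arm MIDR_EL1 definition. `struct rev_range`, `midr_in_range()` and the two range constants are hypothetical stand-ins, not the kernel's `MIDR_REV`/`MIDR_ALL_VERSIONS` macros, and reading variant 0, revision 2 as "pass 1.2" is an assumption based on how the part is described in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field extraction (Arm architecture register layout). */
#define MIDR_IMPLEMENTER(m) (((m) >> 24) & 0xffu)
#define MIDR_VARIANT(m)     (((m) >> 20) & 0xfu)
#define MIDR_PARTNUM(m)     (((m) >> 4) & 0xfffu)
#define MIDR_REVISION(m)    ((m) & 0xfu)

/* Hypothetical description of an affected model and revision window. */
struct rev_range {
    uint32_t implementer;
    uint32_t partnum;
    unsigned var_min, rev_min;  /* lowest affected variant.revision */
    unsigned var_max, rev_max;  /* highest affected variant.revision */
};

/* Combine variant and revision so ranges compare as a single number. */
static unsigned varrev(unsigned variant, unsigned revision)
{
    return (variant << 4) | revision;
}

static bool midr_in_range(uint32_t midr, const struct rev_range *r)
{
    unsigned vr = varrev(MIDR_VARIANT(midr), MIDR_REVISION(midr));

    return MIDR_IMPLEMENTER(midr) == r->implementer &&
           MIDR_PARTNUM(midr) == r->partnum &&
           vr >= varrev(r->var_min, r->rev_min) &&
           vr <= varrev(r->var_max, r->rev_max);
}

/* T81 (implementer 0x43, part 0x0a2): variant 0, revision 0 only. */
static const struct rev_range t81_pass_1_0 = { 0x43, 0x0a2, 0, 0, 0, 0 };

/* T81: every variant and revision, as in the test patch. */
static const struct rev_range t81_all_versions = { 0x43, 0x0a2, 0, 0, 0xf, 0xf };
```

For `midr = 0x430f0a22` the fields decode to implementer 0x43, part 0x0a2, variant 0, revision 2, so `midr_in_range(midr, &t81_pass_1_0)` is false and `midr_in_range(midr, &t81_all_versions)` is true. That matches the observation that the erratum was only detected once the range was widened.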
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 16:35 ` Robert Richter
@ 2020-02-25 17:00 ` Marc Zyngier
2020-02-25 17:04 ` Tim Harvey
2020-02-25 17:11 ` Tim Harvey
2020-03-10 14:51 ` Tim Harvey
2 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2020-02-25 17:00 UTC (permalink / raw)
To: Robert Richter
Cc: Catalin Marinas, Tim Harvey, Will Deacon, Sunil Goutham,
linux-arm-kernel
On 2020-02-25 16:35, Robert Richter wrote:
> Marc,
>
> On 25.02.20 16:27:41, Marc Zyngier wrote:
>> On 2020-02-25 16:13, Tim Harvey wrote:
>
>> > That does enable the erratum, disable KPTI and boot properly but I
>> > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
>> > erratum is documented only needed for pass 1.0.
>>
>> Can you then remove the patch *and* disable KPTI?
>>
>> TX1 is broken beyond recognition and KPTI is known to explode on this HW
>> (which is why we disable KPTI on it). We always attributed it to this
>> erratum, but in the absence of any help from Cavium to identify the
>> problem, we just keyed it on that.
>>
>> *IF* this HW is indeed unaffected by it, then it is probably the mix of
>> KPTI and SWPAN that triggers the issue. If my suspicion is correct,
>> you'll need to have a chat with Cavium/Marvell to understand what is
>> happening there.
>
> I checked the docs and Tim is right, this should be only visible on
> pass 1.0. Thus, the rev range to enable the workaround as implemented
> upstream should be ok. I have asked hw folks regarding this.

Then it could well be that our disabling of KPTI on TX1 is keyed on the
wrong erratum. In the absence of a clear explanation of what is going on,
we made an educated guess. If you're going to find out about what breaks
this CPU, it'd be good to understand whether this is the same problem that
affects all the other revisions.

M.
--
Jazz is not dead. It just smells funny...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 17:00 ` Marc Zyngier
@ 2020-02-25 17:04 ` Tim Harvey
2020-02-25 17:19 ` Marc Zyngier
0 siblings, 1 reply; 14+ messages in thread
From: Tim Harvey @ 2020-02-25 17:04 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On Tue, Feb 25, 2020 at 9:00 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On 2020-02-25 16:35, Robert Richter wrote:
> > Marc,
> >
> > On 25.02.20 16:27:41, Marc Zyngier wrote:
> >> On 2020-02-25 16:13, Tim Harvey wrote:
> >
> >> > That does enable the erratum, disable KPTI and boot properly but I
> >> > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
> >> > erratum is documented only needed for pass 1.0.
> >>
> >> Can you then remove the patch *and* disable KPTI?
> >>
> >> TX1 is broken beyond recognition and KPTI is known to explode on this HW
> >> (which is why we disable KPTI on it). We always attributed it to this
> >> erratum, but in the absence of any help from Cavium to identify the
> >> problem, we just keyed it on that.
> >>
> >> *IF* this HW is indeed unaffected by it, then it is probably the mix of
> >> KPTI and SWPAN that triggers the issue. If my suspicion is correct,
> >> you'll need to have a chat with Cavium/Marvell to understand what is
> >> happening there.
> >
> > I checked the docs and Tim is right, this should be only visible on
> > pass 1.0. Thus, the rev range to enable the workaround as implemented
> > upstream should be ok. I have asked hw folks regarding this.
>
> Then it could well be that our disabling of KPTI on TX1 is keyed on the
> wrong erratum. In the absence of a clear explanation of what is going on,
> we made an educated guess. If you're going to find out about what breaks
> this CPU, it'd be good to understand whether this is the same problem that
> affects all the other revisions.
>
Marc,
What's the right way to disable KPTI for ARM64? It seems 'nopti' and
'pti=off' are not honored for arm64?
Thanks,
Tim
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 16:35 ` Robert Richter
2020-02-25 17:00 ` Marc Zyngier
@ 2020-02-25 17:11 ` Tim Harvey
2020-03-10 14:51 ` Tim Harvey
2 siblings, 0 replies; 14+ messages in thread
From: Tim Harvey @ 2020-02-25 17:11 UTC (permalink / raw)
To: Robert Richter
Cc: Marc Zyngier, Will Deacon, Sunil Goutham, linux-arm-kernel,
Catalin Marinas
On Tue, Feb 25, 2020 at 8:35 AM Robert Richter <rrichter@marvell.com> wrote:
>
> Marc,
>
> On 25.02.20 16:27:41, Marc Zyngier wrote:
> > On 2020-02-25 16:13, Tim Harvey wrote:
>
> > > That does enable the erratum, disable KPTI and boot properly but I
> > > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
> > > erratum is documented only needed for pass 1.0.
> >
> > Can you then remove the patch *and* disable KPTI?
> >
> > TX1 is broken beyond recognition and KPTI is known to explode on this HW
> > (which is why we disable KPTI on it). We always attributed it to this
> > erratum, but in the absence of any help from Cavium to identify the
> > problem, we just keyed it on that.
> >
> > *IF* this HW is indeed unaffected by it, then it is probably the mix of
> > KPTI and SWPAN that triggers the issue. If my suspicion is correct, you'll
> > need to have a chat with Cavium/Marvell to understand what is happening
> > there.
>
> I checked the docs and Tim is right, this should be only visible on
> pass 1.0. Thus, the rev range to enable the workaround as implemented
> upstream should be ok. I have asked hw folks regarding this.
>
Robert,
Thank you - please keep me updated so we can get this resolved
upstream in a proper fashion. The Marvell SDK currently supports Linux
4.14 where you can easily reproduce this. I don't have any CN81XX
boards handy - our boards all have CN80XX (we use
CN8020/CN8021/CN8030/CN8031), which Marvell is apparently all but
dropping support for: the BDK boot.bin that the current SDK produces
won't even fit into the smaller CN80XX L2 cache without hacking the
code up.
Best Regards,
Tim
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 17:04 ` Tim Harvey
@ 2020-02-25 17:19 ` Marc Zyngier
2020-02-25 17:29 ` Tim Harvey
0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2020-02-25 17:19 UTC (permalink / raw)
To: Tim Harvey
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On 2020-02-25 17:04, Tim Harvey wrote:
> On Tue, Feb 25, 2020 at 9:00 AM Marc Zyngier <maz@kernel.org> wrote:
>>
>> On 2020-02-25 16:35, Robert Richter wrote:
>> > Marc,
>> >
>> > On 25.02.20 16:27:41, Marc Zyngier wrote:
>> >> On 2020-02-25 16:13, Tim Harvey wrote:
>> >
>> >> > That does enable the erratum, disable KPTI and boot properly but I
>> >> > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
>> >> > erratum is documented only needed for pass 1.0.
>> >>
>> >> Can you then remove the patch *and* disable KPTI?
>> >>
>> >> TX1 is broken beyond recognition and KPTI is known to explode on this HW
>> >> (which is why we disable KPTI on it). We always attributed it to this
>> >> erratum, but in the absence of any help from Cavium to identify the
>> >> problem, we just keyed it on that.
>> >>
>> >> *IF* this HW is indeed unaffected by it, then it is probably the mix of
>> >> KPTI and SWPAN that triggers the issue. If my suspicion is correct,
>> >> you'll need to have a chat with Cavium/Marvell to understand what is
>> >> happening there.
>> >
>> > I checked the docs and Tim is right, this should be only visible on
>> > pass 1.0. Thus, the rev range to enable the workaround as implemented
>> > upstream should be ok. I have asked hw folks regarding this.
>>
>> Then it could well be that our disabling of KPTI on TX1 is keyed on the
>> wrong erratum. In the absence of a clear explanation of what is going on,
>> we made an educated guess. If you're going to find out about what breaks
>> this CPU, it'd be good to understand whether this is the same problem that
>> affects all the other revisions.
>>
>
> Marc,
>
> What's the right way to disable KPTI for ARM64? It seems 'nopti' and
> 'pti=off' are not honored for arm64?
kpti=0, as documented in
Documentation/admin-guide/kernel-parameters.txt.
M.
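
As an illustration of why `nopti` and `pti=off` have no effect here while `kpti=0` does, the following userspace C sketch scans a command-line string for a `kpti=` token. It is not the kernel's parser: `parse_kpti_cmdline()` and the enum are hypothetical, and only a first character of `0`, `1`, `y` or `n` is recognised, which is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of looking for "kpti=" on the command line. */
enum kpti_request {
    KPTI_FORCE_OFF = -1,
    KPTI_DEFAULT   = 0,   /* no usable kpti= token found */
    KPTI_FORCE_ON  = 1
};

/* Walk the space-separated tokens; the last valid kpti= token wins. */
static enum kpti_request parse_kpti_cmdline(const char *cmdline)
{
    enum kpti_request req = KPTI_DEFAULT;
    const char *p = cmdline;

    while (*p != '\0') {
        size_t len = strcspn(p, " ");  /* length of the current token */

        /* Only tokens that start with exactly "kpti=" are considered. */
        if (len > 5 && strncmp(p, "kpti=", 5) == 0) {
            char c = p[5];

            if (c == '0' || c == 'n' || c == 'N')
                req = KPTI_FORCE_OFF;
            else if (c == '1' || c == 'y' || c == 'Y')
                req = KPTI_FORCE_ON;
        }

        p += len;             /* skip the token */
        p += strspn(p, " ");  /* skip any separating spaces */
    }

    return req;
}
```

With this sketch, `parse_kpti_cmdline("quiet kpti=0")` returns `KPTI_FORCE_OFF`, while `parse_kpti_cmdline("nopti pti=off")` returns `KPTI_DEFAULT` because neither token begins with `kpti=`.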
--
Jazz is not dead. It just smells funny...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 17:19 ` Marc Zyngier
@ 2020-02-25 17:29 ` Tim Harvey
2020-02-25 17:47 ` Marc Zyngier
0 siblings, 1 reply; 14+ messages in thread
From: Tim Harvey @ 2020-02-25 17:29 UTC (permalink / raw)
To: Marc Zyngier
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On Tue, Feb 25, 2020 at 9:19 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On 2020-02-25 17:04, Tim Harvey wrote:
> > On Tue, Feb 25, 2020 at 9:00 AM Marc Zyngier <maz@kernel.org> wrote:
> >>
> >> On 2020-02-25 16:35, Robert Richter wrote:
> >> > Marc,
> >> >
> >> > On 25.02.20 16:27:41, Marc Zyngier wrote:
> >> >> On 2020-02-25 16:13, Tim Harvey wrote:
> >> >
> >> >> > That does enable the erratum, disable KPTI and boot properly but I
> >> >> > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
> >> >> > erratum is documented only needed for pass 1.0.
> >> >>
> >> >> Can you then remove the patch *and* disable KPTI?
> >> >>
> > >> >> TX1 is broken beyond recognition and KPTI is known to explode on this HW
> > >> >> (which is why we disable KPTI on it). We always attributed it to this
> > >> >> erratum, but in the absence of any help from Cavium to identify the
> > >> >> problem, we just keyed it on that.
> > >> >>
> > >> >> *IF* this HW is indeed unaffected by it, then it is probably the mix of
> > >> >> KPTI and SWPAN that triggers the issue. If my suspicion is correct,
> > >> >> you'll need to have a chat with Cavium/Marvell to understand what is
> > >> >> happening there.
> >> >
> >> > I checked the docs and Tim is right, this should be only visible on
> >> > pass 1.0. Thus, the rev range to enable the workaround as implemented
> >> > upstream should be ok. I have asked hw folks regarding this.
> >>
> >> Then it could well be that our disabling of KPTI on TX1 is keyed on the
> >> wrong erratum. In the absence of a clear explanation of what is going on,
> >> we made an educated guess. If you're going to find out about what breaks
> >> this CPU, it'd be good to understand whether this is the same problem that
> >> affects all the other revisions.
> >>
> >
> > Marc,
> >
> > What's the right way to disable KPTI for ARM64? It seems 'nopti' and
> > 'pti=off' are not honored for arm64?
>
> kpti=0, as documented in
> Documentation/admin-guide/kernel-parameters.txt.
>
Serves me right for Googling it and finding outdated info instead of
reading the right docs!
Yes, disabling KPTI with 'kpti=0' does work around the issue.
Tim
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 17:29 ` Tim Harvey
@ 2020-02-25 17:47 ` Marc Zyngier
0 siblings, 0 replies; 14+ messages in thread
From: Marc Zyngier @ 2020-02-25 17:47 UTC (permalink / raw)
To: Tim Harvey
Cc: Catalin Marinas, Robert Richter, Will Deacon, Sunil Goutham,
linux-arm-kernel
On 2020-02-25 17:29, Tim Harvey wrote:
> Yes, disabling KPTI with 'kpti=0' does work around the issue.
Then this is definitely Marvell's job to tell us what is going wrong here.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX
2020-02-25 16:35 ` Robert Richter
2020-02-25 17:00 ` Marc Zyngier
2020-02-25 17:11 ` Tim Harvey
@ 2020-03-10 14:51 ` Tim Harvey
2 siblings, 0 replies; 14+ messages in thread
From: Tim Harvey @ 2020-03-10 14:51 UTC (permalink / raw)
To: Robert Richter
Cc: Marc Zyngier, Will Deacon, Sunil Goutham, linux-arm-kernel,
Catalin Marinas
On Tue, Feb 25, 2020 at 8:35 AM Robert Richter <rrichter@marvell.com> wrote:
>
> Marc,
>
> On 25.02.20 16:27:41, Marc Zyngier wrote:
> > On 2020-02-25 16:13, Tim Harvey wrote:
>
> > > That does enable the erratum, disable KPTI and boot properly but I
> > > misread the erratum and it shouldn't be needed for T81 pass 1.2... the
> > > erratum is documented only needed for pass 1.0.
> >
> > Can you then remove the patch *and* disable KPTI?
> >
> > TX1 is broken beyond recognition and KPTI is known to explode on this HW
> > (which is why we disable KPTI on it). We always attributed it to this
> > erratum, but in the absence of any help from Cavium to identify the
> > problem, we just keyed it on that.
> >
> > *IF* this HW is indeed unaffected by it, then it is probably the mix of
> > KPTI and SWPAN that triggers the issue. If my suspicion is correct, you'll
> > need to have a chat with Cavium/Marvell to understand what is happening
> > there.
>
> I checked the docs and Tim is right, this should be only visible on
> pass 1.0. Thus, the rev range to enable the workaround as implemented
> upstream should be ok. I have asked hw folks regarding this.
>
Robert,
Any feedback on this TX1 issue from Cavium/Marvell?
Tim
^ permalink raw reply [flat|nested] 14+ messages in thread
Thread overview: 14+ messages
2020-02-25 0:35 ARM64_SW_TTBR0_PAN enabled causing hangs on OcteonTX Tim Harvey
2020-02-25 0:50 ` Marc Zyngier
2020-02-25 1:16 ` Tim Harvey
2020-02-25 1:55 ` Marc Zyngier
2020-02-25 16:13 ` Tim Harvey
2020-02-25 16:27 ` Marc Zyngier
2020-02-25 16:35 ` Robert Richter
2020-02-25 17:00 ` Marc Zyngier
2020-02-25 17:04 ` Tim Harvey
2020-02-25 17:19 ` Marc Zyngier
2020-02-25 17:29 ` Tim Harvey
2020-02-25 17:47 ` Marc Zyngier
2020-02-25 17:11 ` Tim Harvey
2020-03-10 14:51 ` Tim Harvey