* aarch64 ACPI boot regressed by commit 7ba5f605f3a0 ("arm64/numa: remove the limitation that cpu0 must bind to node0")
From: Laszlo Ersek @ 2016-10-13 22:50 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
the following regression is experienced in aarch64 qemu/KVM virtual
machines, using the ArmVirtQemu virtual UEFI firmware platform built
from edk2 (EFI Development Kit II).
(1) When booting current master (b67be92feb48) or the bisected first bad
commit (7ba5f605f3a0) with DT enabled, everything works fine.
(2) When booting the above two commits with DT disabled -- meaning:
either the firmware provides ACPI only and no DT, or "acpi=force" is
passed on the kernel command line --, the boot breaks with one of
the following symptoms:
(2a) no messages are printed after
> EFI stub: Booting Linux Kernel...
> ConvertPages: Incompatible memory types
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
and the kernel seems to be spinning in an infinite loop (no
messages despite "earlycon" and friends),
(2b) or the following crash dump is printed:
> EFI stub: Booting Linux Kernel...
> ConvertPages: Incompatible memory types
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
> Booting Linux on physical CPU 0x0
> Linux version 4.8.0-rc3+ (root at aarch64-vgpu-1) (gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC) ) #19 SMP Thu Oct 13 22:30:27 CEST 2016
> Boot CPU: AArch64 Processor [500f0000]
> earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> bootconsole [pl11] enabled
> debug: ignoring loglevel setting.
> efi: Getting EFI parameters from FDT:
> efi: EFI v2.60 by EDK II
> efi: SMBIOS 3.0=0x23bdb0000 ACPI 2.0=0x2386d0000 MEMATTR=0x23a665018
> cma: Reserved 512 MiB at 0x00000000c0000000
> ACPI: Early table checksum verification disabled
> ACPI: RSDP 0x00000002386D0000 000024 (v02 BOCHS )
> ACPI: XSDT 0x00000002386C0000 00004C (v01 BOCHS BXPCFACP 00000001 01000013)
> ACPI: FACP 0x0000000238410000 00010C (v05 BOCHS BXPCFACP 00000001 BXPC 00000001)
> ACPI: DSDT 0x0000000238420000 001159 (v02 BOCHS BXPCDSDT 00000001 BXPC 00000001)
> ACPI: APIC 0x0000000238400000 0002BC (v03 BOCHS BXPCAPIC 00000001 BXPC 00000001)
> ACPI: GTDT 0x00000002383F0000 000060 (v02 BOCHS BXPCGTDT 00000001 BXPC 00000001)
> ACPI: MCFG 0x00000002383E0000 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001)
> ACPI: SPCR 0x00000002383D0000 000050 (v02 BOCHS BXPCSPCR 00000001 BXPC 00000001)
> ACPI: NUMA: Failed to initialise from firmware
> NUMA: Faking a node at [mem 0x0000000000000000-0x000000023fffffff]
> NUMA: Adding memblock [0x40000000 - 0xfffeffff] on node 0
> NUMA: Adding memblock [0xffff0000 - 0xffffffff] on node 0
> NUMA: Adding memblock [0x100000000 - 0x2383cffff] on node 0
> NUMA: Adding memblock [0x2383d0000 - 0x23874ffff] on node 0
> NUMA: Adding memblock [0x238750000 - 0x23bc1ffff] on node 0
> NUMA: Adding memblock [0x23bc20000 - 0x23bffffff] on node 0
> NUMA: Adding memblock [0x23c000000 - 0x23fffffff] on node 0
> NUMA: Initmem setup node 0 [mem 0x40000000-0x23fffffff]
> NUMA: NODE_DATA [mem 0x23fff2580-0x23fffffff]
> Zone ranges:
> DMA [mem 0x0000000040000000-0x00000000ffffffff]
> Normal [mem 0x0000000100000000-0x000000023fffffff]
> Movable zone start for each node
> Early memory node ranges
> node 0: [mem 0x0000000040000000-0x00000000fffeffff]
> node 0: [mem 0x00000000ffff0000-0x00000000ffffffff]
> node 0: [mem 0x0000000100000000-0x00000002383cffff]
> node 0: [mem 0x00000002383d0000-0x000000023874ffff]
> node 0: [mem 0x0000000238750000-0x000000023bc1ffff]
> node 0: [mem 0x000000023bc20000-0x000000023bffffff]
> node 0: [mem 0x000000023c000000-0x000000023fffffff]
> Initmem setup node 0 [mem 0x0000000040000000-0x000000023fffffff]
> On node 0 totalpages: 131072
> DMA zone: 48 pages used for memmap
> DMA zone: 0 pages reserved
> DMA zone: 49152 pages, LIFO batch:1
> Normal zone: 80 pages used for memmap
> Normal zone: 81920 pages, LIFO batch:1
> psci: probing for conduit method from ACPI.
> psci: PSCIv0.2 detected in firmware.
> psci: Using standard PSCI v0.2 function IDs
> psci: Trusted OS migration not required
> percpu: Embedded 3 pages/cpu @fffffe01ffdb0000 s117320 r8192 d71096 u196608
> pcpu-alloc: s117320 r8192 d71096 u196608 alloc=3*65536
> pcpu-alloc: [0] 0 [1] 1 [2] 2 [3] 3 [4] 4 [5] 5 [6] 6 [7] 7
> Detected PIPT I-cache on CPU0
> Built 1 zonelists in Node order, mobility grouping on. Total pages: 130944
> Policy zone: Normal
> Kernel command line: BOOT_IMAGE=/vmlinuz-4.8.0-rc3+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap console=ttyAMA0 earlyprintk=pl011,0x9000000 earlycon ignore_loglevel LANG=en_US.UTF-8 acpi=force
> PID hash table entries: 4096 (order: -1, 32768 bytes)
> software IO TLB [mem 0xfbfe0000-0xfffe0000] (64MB) mapped at [fffffe00bbfe0000-fffffe00bffdffff]
> Memory: 7717184K/8388608K available (8956K kernel code, 1564K rwdata, 3712K rodata, 1536K init, 15875K bss, 147136K reserved, 524288K cma-reserved)
> Virtual kernel memory layout:
> modules : 0xfffffc0000000000 - 0xfffffc0008000000 ( 128 MB)
> vmalloc : 0xfffffc0008000000 - 0xfffffdff5fff0000 ( 2045 GB)
> .text : 0xfffffc0008080000 - 0xfffffc0008940000 ( 8960 KB)
> .rodata : 0xfffffc0008940000 - 0xfffffc0008cf0000 ( 3776 KB)
> .init : 0xfffffc0008cf0000 - 0xfffffc0008e70000 ( 1536 KB)
> .data : 0xfffffc0008e70000 - 0xfffffc0008ff7200 ( 1565 KB)
> .bss : 0xfffffc0008ff7200 - 0xfffffc0009f78120 ( 15876 KB)
> fixed : 0xfffffdff7e7d0000 - 0xfffffdff7ec00000 ( 4288 KB)
> PCI I/O : 0xfffffdff7ee00000 - 0xfffffdff7fe00000 ( 16 MB)
> vmemmap : 0xfffffdff80000000 - 0xfffffe0000000000 ( 2 GB maximum)
> 0xfffffdff80000000 - 0xfffffdff80800000 ( 8 MB actual)
> memory : 0xfffffe0000000000 - 0xfffffe0200000000 ( 8192 MB)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
> Running RCU self tests
> Hierarchical RCU implementation.
> RCU lockdep checking is enabled.
> Build-time adjustment of leaf fanout to 64.
> RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
> RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=8
> kmemleak: Kernel memory leak detector disabled
> NR_IRQS:64 nr_irqs:64 0
> GICv2m: ACPI overriding V2M MSI_TYPER (base:80, num:64)
> GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
> GIC: PPI11 is secure or misconfigured
> arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
> arm_arch_timer: WARNING: Please fix your firmware
> arm_arch_timer: Architected cp15 timer(s) running at 50.00MHz (virt).
> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> ... MAX_LOCKDEP_SUBCLASSES: 8
> ... MAX_LOCK_DEPTH: 48
> ... MAX_LOCKDEP_KEYS: 8191
> ... CLASSHASH_SIZE: 4096
> ... MAX_LOCKDEP_ENTRIES: 32768
> ... MAX_LOCKDEP_CHAINS: 65536
> ... CHAINHASH_SIZE: 32768
> memory used by lock dependency info: 8159 kB
> per task-struct memory footprint: 1920 bytes
> Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=50000)
> pid_max: default: 32768 minimum: 301
> ACPI: Core revision 20160422
> ACPI: 1 ACPI AML tables successfully acquired and loaded
>
> Security Framework initialized
> Yama: becoming mindful.
> SELinux: Initializing.
> SELinux: Starting in permissive mode
> Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
> Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
> Mount-cache hash table entries: 16384 (order: 1, 131072 bytes)
> Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes)
> ftrace: allocating 28918 entries in 8 pages
> ASID allocator initialised with 65536 entries
> Unable to handle kernel paging request at virtual address e18000009518
> pgd = fffffc0009fb0000
> [e18000009518] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
> Internal error: Oops: 96000004 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc3+ #19
> Hardware name: linux,dummy-virt (DT)
> task: fffffe01dc07b600 task.stack: fffffe01fa680000
> PC is at __ll_sc_atomic_add+0x20/0x40
> LR is at __lock_acquire+0xe8/0x698
> pc : [<fffffc000847a9a8>] lr : [<fffffc00081357e8>] pstate: 800000c5
> sp : fffffe01fa6838c0
> x29: fffffe01fa6838c0 x28: fffffc0008ea3000
> x27: fffffc0008ea2358 x26: fffffc0009c84000
> x25: 0000000000000001 x24: 0000000000000000
> x23: fffffe01dc07b600 x22: 0000000000000000
> x21: fffffe01ffd80818 x20: 0000000000000000
> x19: fffffe01ffd80818 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000
> x15: ffffffffffffffff x14: 00000000000002b7
> x13: 00000000000002b7 x12: 0000000000000038
> x11: 0000000000000005 x10: 0101010101010101
> x9 : 0000000000000001 x8 : 0000e18000009518
> x7 : fffffc000829124c x6 : 0000000000000000
> x5 : 0000000000000080 x4 : 0000e18000009380
> x3 : 0000000000000000 x2 : 000072000000c380
> x1 : 0000e18000009518 x0 : fffffc00081357e8
>
> Process swapper/0 (pid: 1, stack limit = 0xfffffe01fa680020)
> Stack: (0xfffffe01fa6838c0 to 0xfffffe01fa684000)
> 38c0: fffffe01fa6838e0 fffffc00081357e8 fffffe01fa680000 0000000000000001
> 38e0: fffffe01fa683960 fffffc0008136170 fffffe01ffd80818 0000000000000000
> 3900: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
> 3920: fffffc000829124c 00000000000000c0 fffffc0008ea2358 fffffc0008ea3000
> 3940: fffffc000812dde0 00000000000000c0 fffffc0000000000 fffffc0000000000
> 3960: fffffe01fa6839d0 fffffc000892c9ac fffffe01ffd80800 fffffc000829124c
> 3980: fffffe01ffd80800 fffffc000829200c fffffe01f401fc00 000000000000e8e8
> 39a0: fffffe01f401fc00 fffffe01f401fcf8 fffffe01fff1ea90 0000000000000000
> 39c0: fffffe01dc07b680 fffffc0008ea2000 fffffe01fa6839f0 fffffc000829124c
> 39e0: 00000000ffffffff fffffe01ffd80800 fffffe01fa683b10 fffffc0008291ce8
> 3a00: 00000000ffffffff 0000000000000001 00000000024080c0 fffffc000829200c
> 3a20: 0000000000210d00 000000000000e8e8 fffffe01f401fc00 fffffe01f401fcf8
> 3a40: fffffe01fff1ea90 0000000000000000 fffffe01fa683a80 fffffc0008088954
> 3a60: fffffc0009048eb8 fffffe01dc07b600 fffffe01dc07be60 fffffe01fff1eaa0
> 3a80: fffffe01fa683ad0 fffffc00024080c0 fffffc0009048eb8 fffffc00095a3000
> 3aa0: fffffc0008132e80 fffffc0009048eb8 0000000000000000 0000000000000000
> 3ac0: fffffe01fa683eb0 fffffc0008083330 fffffe01fa683b10 fffffc0008291b18
> 3ae0: 7f7f7f7f7f7f7f7f ff1f2877372f2427 0101010101010101 0000000000000005
> 3b00: 0000000000000038 0000000000000000 fffffe01fa683c30 fffffc000829200c
> 3b20: 0000000000000040 fffffe01f401fc00 00000000024080c0 00000000ffffffff
> 3b40: fffffc00084ac9f0 fffffe01fff1ea90 0000000000000000 0000000000000006
> 3b60: 0000000000000043 fffffc0008b89670 fffffc0008f374b8 0000000000000000
> 3b80: fffffe01fa683bd0 fffffc0008135200 fffffe01fff1eaa0 0000000000000000
> 3ba0: fffffe0100000000 fffffc00084ac9f0 fffffe01fa683bf0 fffffc0008131904
> 3bc0: fffffe01fa680000 0000000102000200 fffffc0008bb4cf0 0000000000000189
> 3be0: 0000000000000028 fffffe01dc07b600 fffffe01fa683c10 fffffc00080fff1c
> 3c00: fffffc0008fbb39e 0000000000000000 fffffe01fa683c40 fffffc0008100034
> 3c20: fffffe01fa683c30 fffffc0008291ff4 fffffe01fa683c70 fffffc00082927dc
> 3c40: fffffe01f401fc00 00000000024080c0 fffffc00084ac9f0 fffffe01f401fc00
> 3c60: 0000000000000028 fffffc0008ea4000 fffffe01fa683cd0 fffffc00084ac9f0
> 3c80: fffffc0008ff6090 fffffc0008b89670 fffffc0008fd5f90 0000000000000002
> 3ca0: fffffc0008fd5f10 0000000000000003 00000000000001bd 0000000000000006
> 3cc0: 0000000000000043 fffffc0008b88a10 fffffe01fa683d10 fffffc0008d25090
> 3ce0: fffffc0008ff6090 fffffc0008fd5f90 fffffc0008fd5f90 fffffc0008fd5000
> 3d00: 0000000000000002 fffffc0008ea2398 fffffe01fa683d90 fffffc0008083594
> 3d20: fffffc0008d24fa0 fffffe01fa680000 0000000000000000 0000000000000000
> 3d40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3d60: 0000000000000000 0000000000000000 fffffe01fa683d90 fffffc0008b89d18
> 3d80: 000000000000000e 0000000000000019 fffffe01fa683e00 fffffc0008cf0d2c
> 3da0: fffffc0008e1c288 fffffc0008e1c2c8 0000000000000040 0000000000000000
> 3dc0: fffffe01fa683e00 fffffc0008cf0d1c fffffc0008e1c200 fffffc0008e1c2c8
> 3de0: 0000000000000040 0000000000000000 0000000000000000 fffffc0008e1c2c8
> 3e00: fffffe01fa683ea0 fffffc0008924978 fffffc0008924960 0000000000000000
> 3e20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3e40: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> 3e60: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
> 3e80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3ea0: 0000000000000000 fffffc0008083330 fffffc0008924960 0000000000000000
> 3ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
> 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call trace:
> Exception stack(0xfffffe01fa6836f0 to 0xfffffe01fa683820)
> 36e0: fffffe01ffd80818 0000040000000000
> 3700: fffffe01fa6838c0 fffffc000847a9a8 fffffe01fff1b580 fffffe01fff1b580
> 3720: fffffc0008927614 fffffc0008ea1000 fffffe01fa683740 00000000000000c0
> 3740: fffffe01fa683780 fffffc000811794c fffffe01fa6837e0 fffffc0008135a64
> 3760: 87cee53ad487914d fffffe01dc07be60 0000000000000001 0000000000000000
> 3780: fffffe01dc07b600 0000000000000000 fffffc00081357e8 0000e18000009518
> 37a0: 000072000000c380 0000000000000000 0000e18000009380 0000000000000080
> 37c0: 0000000000000000 fffffc000829124c 0000e18000009518 0000000000000001
> 37e0: 0101010101010101 0000000000000005 0000000000000038 00000000000002b7
> 3800: 00000000000002b7 ffffffffffffffff 0000000000000000 0000000000000000
> [<fffffc000847a9a8>] __ll_sc_atomic_add+0x20/0x40
> [<fffffc00081357e8>] __lock_acquire+0xe8/0x698
> [<fffffc0008136170>] lock_acquire+0xd8/0x2c0
> [<fffffc000892c9ac>] _raw_spin_lock+0x4c/0x60
> [<fffffc000829124c>] get_partial_node.isra.23+0x4c/0x440
> [<fffffc0008291ce8>] ___slab_alloc+0x438/0x708
> [<fffffc000829200c>] __slab_alloc+0x54/0xa0
> [<fffffc00082927dc>] kmem_cache_alloc_trace+0x35c/0x428
> [<fffffc00084ac9f0>] ddebug_add_module+0x38/0xf0
> [<fffffc0008d25090>] dynamic_debug_init+0xf0/0x2a0
> [<fffffc0008083594>] do_one_initcall+0x44/0x138
> [<fffffc0008cf0d2c>] kernel_init_freeable+0x17c/0x2e0
> [<fffffc0008924978>] kernel_init+0x18/0x110
> [<fffffc0008083330>] ret_from_fork+0x10/0x20
> Code: aa1e03e0 aa0103e8 d503201f f9800111 (885f7d00)
> ---[ end trace 0000000000000000 ]---
> note: swapper/0[1] exited with preempt_count 1
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
(3) Please find the bisection log below:
> git bisect start
> # bad: [b67be92feb486f800d80d72c67fd87b47b79b18e] Merge tag 'pwm/for-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
> git bisect bad b67be92feb486f800d80d72c67fd87b47b79b18e
> # good: [c8d2bc9bc39ebea8437fd974fdbc21847bb897a3] Linux 4.8
> git bisect good c8d2bc9bc39ebea8437fd974fdbc21847bb897a3
> # bad: [41844e36206be90cd4d962ea49b0abc3612a99d0] Merge tag 'staging-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect bad 41844e36206be90cd4d962ea49b0abc3612a99d0
> # bad: [d268dbe76a53d72cc41316eb59e7968db60e77ad] Merge tag 'pinctrl-v4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> git bisect bad d268dbe76a53d72cc41316eb59e7968db60e77ad
> # bad: [02bafd96f3a5d8e610b19033ffec55b92459aaae] Merge tag 'docs-4.9' of git://git.lwn.net/linux
> git bisect bad 02bafd96f3a5d8e610b19033ffec55b92459aaae
> # bad: [9929780e86854833e649b39b290b5fe921eb1701] Merge tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> git bisect bad 9929780e86854833e649b39b290b5fe921eb1701
> # bad: [12b7bcb43e6ea834ab2f5dc52d971e379a0ca109] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 12b7bcb43e6ea834ab2f5dc52d971e379a0ca109
> # bad: [72d39926f098b0c4ad95e1461595a8d6d403c14d] Merge tag 'acpi-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect bad 72d39926f098b0c4ad95e1461595a8d6d403c14d
> # bad: [72ec94560d7ee1d3a61d5904fd9a5bf68bf3b11a] Merge tag 'pm-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect bad 72ec94560d7ee1d3a61d5904fd9a5bf68bf3b11a
> # bad: [792d47379f4d4c76692f1795f33d38582f8907fa] arm64: alternative: add auto-nop infrastructure
> git bisect bad 792d47379f4d4c76692f1795f33d38582f8907fa
> # good: [dc00247576fdb97211e1959b4dfd2a7893cf9d0b] arm64: kernel: re-export _cpu_resume() from sleep.S
> git bisect good dc00247576fdb97211e1959b4dfd2a7893cf9d0b
> # good: [9787ed6e5cee7a62320f3014eb5e7b373502c292] of/numa: remove a duplicated warning
> git bisect good 9787ed6e5cee7a62320f3014eb5e7b373502c292
> # bad: [c47a1900ad710fd2c97127e2ba19da1df79cf733] arm64: Rearrange CPU errata workaround checks
> git bisect bad c47a1900ad710fd2c97127e2ba19da1df79cf733
> # good: [7af3a0a992524ffddc342cd1481cc4dcb3f1da71] arm64/numa: support HAVE_SETUP_PER_CPU_AREA
> git bisect good 7af3a0a992524ffddc342cd1481cc4dcb3f1da71
> # bad: [7ba5f605f3a0d9495aad539eeb8346d726dfc183] arm64/numa: remove the limitation that cpu0 must bind to node0
> git bisect bad 7ba5f605f3a0d9495aad539eeb8346d726dfc183
> # good: [df7ffa34cc0c06bfa7206732df78725ff34633ee] arm64/numa: remove some useless code
> git bisect good df7ffa34cc0c06bfa7206732df78725ff34633ee
> # first bad commit: [7ba5f605f3a0d9495aad539eeb8346d726dfc183] arm64/numa: remove the limitation that cpu0 must bind to node0
I repeatedly retested the first bad commit:
> commit 7ba5f605f3a0d9495aad539eeb8346d726dfc183
> Author: Zhen Lei <thunder.leizhen@huawei.com>
> Date: Thu Sep 1 14:55:04 2016 +0800
>
> arm64/numa: remove the limitation that cpu0 must bind to node0
and its direct ancestor:
> commit df7ffa34cc0c06bfa7206732df78725ff34633ee
> Author: Zhen Lei <thunder.leizhen@huawei.com>
> Date: Thu Sep 1 14:55:03 2016 +0800
>
> arm64/numa: remove some useless code
The offending commit consistently fails to boot with the described
symptoms when DT is disabled, and succeeds to boot when DT is enabled.
The predecessor commit consistently succeeds to boot regardless of DT
versus ACPI.
(4) Analysis (well, a lame attempt at that, because I have zero
familiarity with this code). Let me quote the patch:
> commit 7ba5f605f3a0d9495aad539eeb8346d726dfc183
> Author: Zhen Lei <thunder.leizhen@huawei.com>
> Date: Thu Sep 1 14:55:04 2016 +0800
>
> arm64/numa: remove the limitation that cpu0 must bind to node0
>
> 1. Remove the old binding code.
> 2. Read the nid of cpu0 from dts.
> 3. Fallback the nid of cpu0 to 0 when numa=off is set in bootargs.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index c3c08368a685..8b048e6ec34a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -624,6 +624,7 @@ static void __init of_parse_and_init_cpus(void)
> }
>
> bootcpu_valid = true;
> + early_map_cpu_to_node(0, of_node_to_nid(dn));
>
> /*
> * cpu_logical_map has already been
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 0a15f010b64a..778a985c8a70 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -116,16 +116,24 @@ static void __init setup_node_to_cpumask_map(void)
> */
> void numa_store_cpu_info(unsigned int cpu)
> {
> - map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> + map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
> }
>
> void __init early_map_cpu_to_node(unsigned int cpu, int nid)
> {
> /* fallback to node 0 */
> - if (nid < 0 || nid >= MAX_NUMNODES)
> + if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
> nid = 0;
>
> cpu_to_node_map[cpu] = nid;
> +
> + /*
> + * We should set the numa node of cpu0 as soon as possible, because it
> + * has already been set up online before. cpu_to_node(0) will soon be
> + * called.
> + */
> + if (!cpu)
> + set_cpu_numa_node(cpu, nid);
> }
>
> #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
> @@ -393,10 +401,6 @@ static int __init numa_init(int (*init_func)(void))
>
> setup_node_to_cpumask_map();
>
> - /* init boot processor */
> - cpu_to_node_map[0] = 0;
> - map_cpu_to_node(0, 0);
> -
> return 0;
> }
>
The commit message states that the numa-id (nid) for CPU#0 is read from
the DTS. If there is no DT, I think that won't work so well.
Second, the patch replaces the unconditional, static
CPU#0 <-> numa-node#0
mapping in numa_init(), which is independent of ACPI vs. DT, with a
DT-dependent early_map_cpu_to_node() call, in of_parse_and_init_cpus().
The ACPI branch is regressed by this, because on that branch we now
don't create a
CPU#0 <-> numa-node#whatever
mapping at all.
(5) The entry
ACPI: NUMA: Failed to initialise from firmware
in the dmesg doesn't imply an error in the firmware; it just means that
the firmware does not provide an SRAT table. That is valid if there's
only one NUMA node in the system.
(6) For reference, the MADT is:
> /*
> * Intel ACPI Component Architecture
> * AML/ASL+ Disassembler version 20160831-64
> * Copyright (c) 2000 - 2016 Intel Corporation
> *
> * Disassembly of apic.dat, Fri Oct 14 00:21:16 2016
> *
> * ACPI Data Table [APIC]
> *
> * Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
> */
>
> [000h 0000 4] Signature : "APIC" [Multiple APIC Description Table (MADT)]
> [004h 0004 4] Table Length : 000002BC
> [008h 0008 1] Revision : 03
> [009h 0009 1] Checksum : 18
> [00Ah 0010 6] Oem ID : "BOCHS "
> [010h 0016 8] Oem Table ID : "BXPCAPIC"
> [018h 0024 4] Oem Revision : 00000001
> [01Ch 0028 4] Asl Compiler ID : "BXPC"
> [020h 0032 4] Asl Compiler Revision : 00000001
>
> [024h 0036 4] Local Apic Address : 00000000
> [028h 0040 4] Flags (decoded below) : 00000000
> PC-AT Compatibility : 0
>
> [02Ch 0044 1] Subtable Type : 0C [Generic Interrupt Distributor]
> [02Dh 0045 1] Length : 18
> [02Eh 0046 2] Reserved : 0000
> [030h 0048 4] Local GIC Hardware ID : 00000000
> [034h 0052 8] Base Address : 0000000008000000
> [03Ch 0060 4] Interrupt Base : 00000000
> [040h 0064 1] Version : 02
> [041h 0065 3] Reserved : 000000
>
> [044h 0068 1] Subtable Type : 0B [Generic Interrupt Controller]
> [045h 0069 1] Length : 4C
> [046h 0070 2] Reserved : 0000
> [048h 0072 4] CPU Interface Number : 00000000
> [04Ch 0076 4] Processor UID : 00000000
> [050h 0080 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [054h 0084 4] Parking Protocol Version : 00000000
> [058h 0088 4] Performance Interrupt : 00000017
> [05Ch 0092 8] Parked Address : 0000000000000000
> [064h 0100 8] Base Address : 0000000008010000
> [06Ch 0108 8] Virtual GIC Base Address : 0000000000000000
> [074h 0116 8] Hypervisor GIC Base Address : 0000000000000000
> [07Ch 0124 4] Virtual GIC Interrupt : 00000000
> [080h 0128 8] Redistributor Base Address : 0000000000000000
> [088h 0136 8] ARM MPIDR : 0000000000000000
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [090h 0144 1] Subtable Type : 0B [Generic Interrupt Controller]
> [091h 0145 1] Length : 4C
> [092h 0146 2] Reserved : 0000
> [094h 0148 4] CPU Interface Number : 00000001
> [098h 0152 4] Processor UID : 00000001
> [09Ch 0156 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [0A0h 0160 4] Parking Protocol Version : 00000000
> [0A4h 0164 4] Performance Interrupt : 00000017
> [0A8h 0168 8] Parked Address : 0000000000000000
> [0B0h 0176 8] Base Address : 0000000008010000
> [0B8h 0184 8] Virtual GIC Base Address : 0000000000000000
> [0C0h 0192 8] Hypervisor GIC Base Address : 0000000000000000
> [0C8h 0200 4] Virtual GIC Interrupt : 00000000
> [0CCh 0204 8] Redistributor Base Address : 0000000000000000
> [0D4h 0212 8] ARM MPIDR : 0000000000000001
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [0DCh 0220 1] Subtable Type : 0B [Generic Interrupt Controller]
> [0DDh 0221 1] Length : 4C
> [0DEh 0222 2] Reserved : 0000
> [0E0h 0224 4] CPU Interface Number : 00000002
> [0E4h 0228 4] Processor UID : 00000002
> [0E8h 0232 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [0ECh 0236 4] Parking Protocol Version : 00000000
> [0F0h 0240 4] Performance Interrupt : 00000017
> [0F4h 0244 8] Parked Address : 0000000000000000
> [0FCh 0252 8] Base Address : 0000000008010000
> [104h 0260 8] Virtual GIC Base Address : 0000000000000000
> [10Ch 0268 8] Hypervisor GIC Base Address : 0000000000000000
> [114h 0276 4] Virtual GIC Interrupt : 00000000
> [118h 0280 8] Redistributor Base Address : 0000000000000000
> [120h 0288 8] ARM MPIDR : 0000000000000002
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [128h 0296 1] Subtable Type : 0B [Generic Interrupt Controller]
> [129h 0297 1] Length : 4C
> [12Ah 0298 2] Reserved : 0000
> [12Ch 0300 4] CPU Interface Number : 00000003
> [130h 0304 4] Processor UID : 00000003
> [134h 0308 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [138h 0312 4] Parking Protocol Version : 00000000
> [13Ch 0316 4] Performance Interrupt : 00000017
> [140h 0320 8] Parked Address : 0000000000000000
> [148h 0328 8] Base Address : 0000000008010000
> [150h 0336 8] Virtual GIC Base Address : 0000000000000000
> [158h 0344 8] Hypervisor GIC Base Address : 0000000000000000
> [160h 0352 4] Virtual GIC Interrupt : 00000000
> [164h 0356 8] Redistributor Base Address : 0000000000000000
> [16Ch 0364 8] ARM MPIDR : 0000000000000003
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [174h 0372 1] Subtable Type : 0B [Generic Interrupt Controller]
> [175h 0373 1] Length : 4C
> [176h 0374 2] Reserved : 0000
> [178h 0376 4] CPU Interface Number : 00000004
> [17Ch 0380 4] Processor UID : 00000004
> [180h 0384 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [184h 0388 4] Parking Protocol Version : 00000000
> [188h 0392 4] Performance Interrupt : 00000017
> [18Ch 0396 8] Parked Address : 0000000000000000
> [194h 0404 8] Base Address : 0000000008010000
> [19Ch 0412 8] Virtual GIC Base Address : 0000000000000000
> [1A4h 0420 8] Hypervisor GIC Base Address : 0000000000000000
> [1ACh 0428 4] Virtual GIC Interrupt : 00000000
> [1B0h 0432 8] Redistributor Base Address : 0000000000000000
> [1B8h 0440 8] ARM MPIDR : 0000000000000004
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [1C0h 0448 1] Subtable Type : 0B [Generic Interrupt Controller]
> [1C1h 0449 1] Length : 4C
> [1C2h 0450 2] Reserved : 0000
> [1C4h 0452 4] CPU Interface Number : 00000005
> [1C8h 0456 4] Processor UID : 00000005
> [1CCh 0460 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [1D0h 0464 4] Parking Protocol Version : 00000000
> [1D4h 0468 4] Performance Interrupt : 00000017
> [1D8h 0472 8] Parked Address : 0000000000000000
> [1E0h 0480 8] Base Address : 0000000008010000
> [1E8h 0488 8] Virtual GIC Base Address : 0000000000000000
> [1F0h 0496 8] Hypervisor GIC Base Address : 0000000000000000
> [1F8h 0504 4] Virtual GIC Interrupt : 00000000
> [1FCh 0508 8] Redistributor Base Address : 0000000000000000
> [204h 0516 8] ARM MPIDR : 0000000000000005
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [20Ch 0524 1] Subtable Type : 0B [Generic Interrupt Controller]
> [20Dh 0525 1] Length : 4C
> [20Eh 0526 2] Reserved : 0000
> [210h 0528 4] CPU Interface Number : 00000006
> [214h 0532 4] Processor UID : 00000006
> [218h 0536 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [21Ch 0540 4] Parking Protocol Version : 00000000
> [220h 0544 4] Performance Interrupt : 00000017
> [224h 0548 8] Parked Address : 0000000000000000
> [22Ch 0556 8] Base Address : 0000000008010000
> [234h 0564 8] Virtual GIC Base Address : 0000000000000000
> [23Ch 0572 8] Hypervisor GIC Base Address : 0000000000000000
> [244h 0580 4] Virtual GIC Interrupt : 00000000
> [248h 0584 8] Redistributor Base Address : 0000000000000000
> [250h 0592 8] ARM MPIDR : 0000000000000006
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [258h 0600 1] Subtable Type : 0B [Generic Interrupt Controller]
> [259h 0601 1] Length : 4C
> [25Ah 0602 2] Reserved : 0000
> [25Ch 0604 4] CPU Interface Number : 00000007
> [260h 0608 4] Processor UID : 00000007
> [264h 0612 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
> Performance Interrupt Trigger Mode : 0
> Virtual GIC Interrupt Trigger Mode : 0
> [268h 0616 4] Parking Protocol Version : 00000000
> [26Ch 0620 4] Performance Interrupt : 00000017
> [270h 0624 8] Parked Address : 0000000000000000
> [278h 0632 8] Base Address : 0000000008010000
> [280h 0640 8] Virtual GIC Base Address : 0000000000000000
> [288h 0648 8] Hypervisor GIC Base Address : 0000000000000000
> [290h 0656 4] Virtual GIC Interrupt : 00000000
> [294h 0660 8] Redistributor Base Address : 0000000000000000
> [29Ch 0668 8] ARM MPIDR : 0000000000000007
> /**** ACPI subtable terminates early - may be older version (dump table) */
>
> [2A4h 0676 1] Subtable Type : 0D [Generic MSI Frame]
> [2A5h 0677 1] Length : 18
> [2A6h 0678 2] Reserved : 0000
> [2A8h 0680 4] MSI Frame ID : 00000000
> [2ACh 0684 8] Base Address : 0000000008020000
> [2B4h 0692 4] Flags (decoded below) : 00000001
> Select SPI : 1
> [2B8h 0696 2] SPI Count : 0040
> [2BAh 0698 2] SPI Base : 0050
>
> Raw Table Data: Length 700 (0x2BC)
>
> 0000: 41 50 49 43 BC 02 00 00 03 18 42 4F 43 48 53 20 // APIC......BOCHS
> 0010: 42 58 50 43 41 50 49 43 01 00 00 00 42 58 50 43 // BXPCAPIC....BXPC
> 0020: 01 00 00 00 00 00 00 00 00 00 00 00 0C 18 00 00 // ................
> 0030: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 // ................
> 0040: 02 00 00 00 0B 4C 00 00 00 00 00 00 00 00 00 00 // .....L..........
> 0050: 01 00 00 00 00 00 00 00 17 00 00 00 00 00 00 00 // ................
> 0060: 00 00 00 00 00 00 01 08 00 00 00 00 00 00 00 00 // ................
> 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0090: 0B 4C 00 00 01 00 00 00 01 00 00 00 01 00 00 00 // .L..............
> 00A0: 00 00 00 00 17 00 00 00 00 00 00 00 00 00 00 00 // ................
> 00B0: 00 00 01 08 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 00D0: 00 00 00 00 01 00 00 00 00 00 00 00 0B 4C 00 00 // .............L..
> 00E0: 02 00 00 00 02 00 00 00 01 00 00 00 00 00 00 00 // ................
> 00F0: 17 00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 // ................
> 0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0120: 02 00 00 00 00 00 00 00 0B 4C 00 00 03 00 00 00 // .........L......
> 0130: 03 00 00 00 01 00 00 00 00 00 00 00 17 00 00 00 // ................
> 0140: 00 00 00 00 00 00 00 00 00 00 01 08 00 00 00 00 // ................
> 0150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0160: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 // ................
> 0170: 00 00 00 00 0B 4C 00 00 04 00 00 00 04 00 00 00 // .....L..........
> 0180: 01 00 00 00 00 00 00 00 17 00 00 00 00 00 00 00 // ................
> 0190: 00 00 00 00 00 00 01 08 00 00 00 00 00 00 00 00 // ................
> 01A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 01B0: 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 // ................
> 01C0: 0B 4C 00 00 05 00 00 00 05 00 00 00 01 00 00 00 // .L..............
> 01D0: 00 00 00 00 17 00 00 00 00 00 00 00 00 00 00 00 // ................
> 01E0: 00 00 01 08 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 01F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0200: 00 00 00 00 05 00 00 00 00 00 00 00 0B 4C 00 00 // .............L..
> 0210: 06 00 00 00 06 00 00 00 01 00 00 00 00 00 00 00 // ................
> 0220: 17 00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 // ................
> 0230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0250: 06 00 00 00 00 00 00 00 0B 4C 00 00 07 00 00 00 // .........L......
> 0260: 07 00 00 00 01 00 00 00 00 00 00 00 17 00 00 00 // ................
> 0270: 00 00 00 00 00 00 00 00 00 00 01 08 00 00 00 00 // ................
> 0280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
> 0290: 00 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00 // ................
> 02A0: 00 00 00 00 0D 18 00 00 00 00 00 00 00 00 02 08 // ................
> 02B0: 00 00 00 00 01 00 00 00 40 00 50 00 // ........ at .P.
I'm unsure if this table is supposed to lead to the creation of the
(apparently missing)
CPU#0 <-> numa-node#0
mapping, via
smp_init_cpus() [arch/arm64/kernel/smp.c]
acpi_table_parse_madt() [drivers/acpi/tables.c]
acpi_parse_gic_cpu_interface() [arch/arm64/kernel/smp.c]
acpi_map_gic_cpu_interface() [arch/arm64/kernel/smp.c]
early_map_cpu_to_node() [arch/arm64/mm/numa.c]
(7) I also tried "numa=off" in addition to "acpi=force", just in case;
it didn't make a difference.
Thanks
Laszlo
^ permalink raw reply
* [PATCH 4/4] ARM: socfpga: dts: Add Monitor to A10-SR MFD
From: tthayer at opensource.altera.com @ 2016-10-13 21:32 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476394329-31696-1-git-send-email-tthayer@opensource.altera.com>
From: Thor Thayer <tthayer@opensource.altera.com>
Add the Monitor functionality to the Arria10 DevKit
System Resource chip.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
---
arch/arm/boot/dts/socfpga_arria10_socdk.dtsi | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm/boot/dts/socfpga_arria10_socdk.dtsi b/arch/arm/boot/dts/socfpga_arria10_socdk.dtsi
index eb00ae3..183d88b 100644
--- a/arch/arm/boot/dts/socfpga_arria10_socdk.dtsi
+++ b/arch/arm/boot/dts/socfpga_arria10_socdk.dtsi
@@ -121,6 +121,10 @@
gpio-controller;
#gpio-cells = <2>;
};
+
+ a10sr_monitor {
+ compatible = "altr,a10sr-mon";
+ };
};
};
--
1.7.9.5
^ permalink raw reply related
* [PATCH 3/4] mfd: altr-a10sr: Add Arria10 SR Monitor
From: tthayer at opensource.altera.com @ 2016-10-13 21:32 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476394329-31696-1-git-send-email-tthayer@opensource.altera.com>
From: Thor Thayer <tthayer@opensource.altera.com>
Add the Altera Arria10 DevKit System Resource Monitor functionality
to the MFD device.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
---
drivers/mfd/altera-a10sr.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/mfd/altera-a10sr.c b/drivers/mfd/altera-a10sr.c
index 06e1f7f..0942d7d 100644
--- a/drivers/mfd/altera-a10sr.c
+++ b/drivers/mfd/altera-a10sr.c
@@ -33,6 +33,10 @@
.name = "altr_a10sr_gpio",
.of_compatible = "altr,a10sr-gpio",
},
+ {
+ .name = "altr_a10sr_mon",
+ .of_compatible = "altr,a10sr-mon",
+ },
};
static bool altr_a10sr_reg_readable(struct device *dev, unsigned int reg)
--
1.7.9.5
^ permalink raw reply related
* [PATCH 2/4] misc: Add Altera Arria10 System Resource Control
From: tthayer at opensource.altera.com @ 2016-10-13 21:32 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476394329-31696-1-git-send-email-tthayer@opensource.altera.com>
From: Thor Thayer <tthayer@opensource.altera.com>
This patch adds the Altera Arria10 control & monitoring
functions to the Arria10 System Resource chip.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
---
MAINTAINERS | 1 +
drivers/misc/Kconfig | 7 ++
drivers/misc/Makefile | 1 +
drivers/misc/altera-a10sr-mon.c | 184 +++++++++++++++++++++++++++++++++++++++
4 files changed, 193 insertions(+)
create mode 100644 drivers/misc/altera-a10sr-mon.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 1cd38a7..a0919ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -625,6 +625,7 @@ M: Thor Thayer <tthayer@opensource.altera.com>
S: Maintained
F: drivers/gpio/gpio-altera-a10sr.c
F: drivers/mfd/altera-a10sr.c
+F: drivers/misc/altera-a10sr-mon.c
F: include/linux/mfd/altera-a10sr.h
ALTERA TRIPLE SPEED ETHERNET DRIVER
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 64971ba..9dd33c4 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -766,6 +766,13 @@ config PANEL_BOOT_MESSAGE
An empty message will only clear the display at driver init time. Any other
printf()-formatted message is valid with newline and escape codes.
+config ALTERA_A10SR_MON
+ tristate "Altera Arria10 System Resource Monitor"
+ depends on MFD_ALTERA_A10SR
+ help
+ This enables the System Resource monitor driver for the Altera
+ Arria10 DevKit.
+
source "drivers/misc/c2port/Kconfig"
source "drivers/misc/eeprom/Kconfig"
source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 3198336..fd69f0c 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -43,6 +43,7 @@ obj-y += ti-st/
obj-y += lis3lv02d/
obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
+obj-$(CONFIG_ALTERA_A10SR_MON) += altera-a10sr-mon.o
obj-$(CONFIG_INTEL_MEI) += mei/
obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
obj-$(CONFIG_LATTICE_ECP3_CONFIG) += lattice-ecp3-config.o
diff --git a/drivers/misc/altera-a10sr-mon.c b/drivers/misc/altera-a10sr-mon.c
new file mode 100644
index 0000000..c07fb97
--- /dev/null
+++ b/drivers/misc/altera-a10sr-mon.c
@@ -0,0 +1,184 @@
+/*
+ * Copyright Altera Corporation (C) 2014-2016. All Rights Reserved
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Monitor driver for the Altera Arria10 MAX5 System Resource Chip
+ * Adapted from ics932s401.c
+ */
+
+#include <linux/err.h>
+#include <linux/mfd/altera-a10sr.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+struct altr_a10sr_regs {
+ struct regmap *regmap;
+ struct attribute_group attr_grp;
+};
+
+static ssize_t a10sr_show(struct device *dev,
+ struct device_attribute *devattr, char *buf);
+static ssize_t a10sr_store(struct device *dev,
+ struct device_attribute *devattr, const char *buf,
+ size_t count);
+
+/* Define FS entries */
+static DEVICE_ATTR(max5_version, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_led, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_button, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_button_irq, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_pg1, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_pg2, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_pg3, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_fmcab, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_hps_resets, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_per_resets, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_sfpa, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_sfpb, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_i2c_master, 0644, a10sr_show, a10sr_store);
+static DEVICE_ATTR(max5_wm_rst, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_wm_rst_key, 0444, a10sr_show, NULL);
+static DEVICE_ATTR(max5_pmbus, 0644, a10sr_show, a10sr_store);
+
+static struct attribute *altr_a10sr_attr[] = {
+ &dev_attr_max5_version.attr,
+ &dev_attr_max5_led.attr,
+ &dev_attr_max5_button.attr,
+ &dev_attr_max5_button_irq.attr,
+ &dev_attr_max5_pg1.attr,
+ &dev_attr_max5_pg2.attr,
+ &dev_attr_max5_pg3.attr,
+ &dev_attr_max5_fmcab.attr,
+ &dev_attr_max5_hps_resets.attr,
+ &dev_attr_max5_per_resets.attr,
+ &dev_attr_max5_sfpa.attr,
+ &dev_attr_max5_sfpb.attr,
+ &dev_attr_max5_i2c_master.attr,
+ &dev_attr_max5_wm_rst.attr,
+ &dev_attr_max5_wm_rst_key.attr,
+ &dev_attr_max5_pmbus.attr,
+ NULL
+};
+
+static const struct attribute_group a10sr_attr_group = {
+ .attrs = altr_a10sr_attr,
+};
+
+static ssize_t a10sr_show(struct device *dev, struct device_attribute *devattr,
+ char *buf)
+{
+ int i, ret;
+ unsigned int val;
+ struct altr_a10sr_regs *a10sr_regs = dev_get_drvdata(dev);
+
+ for (i = 0; i < ARRAY_SIZE(altr_a10sr_attr); i++) {
+ if (devattr == (struct device_attribute *)altr_a10sr_attr[i])
+ break;
+ }
+
+ if (i >= ARRAY_SIZE(altr_a10sr_attr))
+ return -EINVAL;
+
+ /* Shift because LS bit signifies Read/Write */
+ i <<= 1;
+ ret = regmap_read(a10sr_regs->regmap, i, &val);
+ if (ret < 0)
+ return ret;
+
+ return sprintf(buf, "0x%X\n", val);
+}
+
+static ssize_t a10sr_store(struct device *dev,
+ struct device_attribute *devattr, const char *buf,
+ size_t count)
+{
+ struct altr_a10sr_regs *a10sr_regs = dev_get_drvdata(dev);
+ unsigned long val;
+ int i, ret;
+
+ ret = kstrtol(buf, 0, &val);
+ if (ret < 0)
+ return ret;
+
+ for (i = 0; i < ARRAY_SIZE(altr_a10sr_attr); i++) {
+ if (devattr == (struct device_attribute *)altr_a10sr_attr[i])
+ break;
+ }
+ if (i >= ARRAY_SIZE(altr_a10sr_attr))
+ return -EINVAL;
+
+ /* Shift because LS bit signifies Read/Write */
+ i <<= 1;
+ ret = regmap_write(a10sr_regs->regmap, i, val);
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+static int altr_a10sr_regs_probe(struct platform_device *pdev)
+{
+ int ret;
+ struct altr_a10sr_regs *a10regs;
+ struct altr_a10sr *a10sr = dev_get_drvdata(pdev->dev.parent);
+
+ a10regs = devm_kzalloc(&pdev->dev, sizeof(*a10regs), GFP_KERNEL);
+ if (!a10regs)
+ return -ENOMEM;
+
+ a10regs->regmap = a10sr->regmap;
+ a10regs->attr_grp = a10sr_attr_group;
+
+ platform_set_drvdata(pdev, a10regs);
+
+ ret = sysfs_create_group(&pdev->dev.kobj, &a10sr_attr_group);
+ if (ret)
+ goto err_mem;
+
+ return 0;
+
+err_mem:
+ return ret;
+}
+
+static int altr_a10sr_regs_remove(struct platform_device *pdev)
+{
+ struct altr_a10sr_regs *a10regs = platform_get_drvdata(pdev);
+
+ sysfs_remove_group(&pdev->dev.kobj, &a10regs->attr_grp);
+
+ return 0;
+}
+
+static const struct of_device_id altr_a10sr_regs_of_match[] = {
+ { .compatible = "altr,a10sr-mon" },
+ { },
+};
+MODULE_DEVICE_TABLE(of, altr_a10sr_regs_of_match);
+
+static struct platform_driver altr_a10sr_regs_driver = {
+ .probe = altr_a10sr_regs_probe,
+ .remove = altr_a10sr_regs_remove,
+ .driver = {
+ .name = "altr_a10sr_mon",
+ .of_match_table = altr_a10sr_regs_of_match,
+ },
+};
+
+module_platform_driver(altr_a10sr_regs_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Thor Thayer");
+MODULE_DESCRIPTION("Monitor Driver for Altera Arria10 System Resource Chip");
--
1.7.9.5
^ permalink raw reply related
* [PATCH 1/4] dt-bindings: mfd: Add Altera Arria10 SR Monitor
From: tthayer at opensource.altera.com @ 2016-10-13 21:32 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476394329-31696-1-git-send-email-tthayer@opensource.altera.com>
From: Thor Thayer <tthayer@opensource.altera.com>
Add the Arria10 DevKit System Resource Chip register and state
monitoring module to the MFD.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
---
Note: This needs to be applied to the bindings document that
was Acked & Applied but didn't reach the for-next branch.
See https://patchwork.ozlabs.org/patch/629397/
---
---
.../devicetree/bindings/mfd/altera-a10sr.txt | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/Documentation/devicetree/bindings/mfd/altera-a10sr.txt b/Documentation/devicetree/bindings/mfd/altera-a10sr.txt
index ea151f2..0787ec6 100644
--- a/Documentation/devicetree/bindings/mfd/altera-a10sr.txt
+++ b/Documentation/devicetree/bindings/mfd/altera-a10sr.txt
@@ -18,6 +18,7 @@ The A10SR consists of these sub-devices:
Device Description
------ ----------
a10sr_gpio GPIO Controller
+a10sr_monitor Register and State Monitoring
Arria10 GPIO
Required Properties:
@@ -27,6 +28,10 @@ Required Properties:
the second cell is used to specify flags.
See ../gpio/gpio.txt for more information.
+Arria10 Register and State Monitor
+Required Properties:
+- compatible : Should be "altr,a10sr-mon"
+
Example:
resource-manager at 0 {
@@ -43,4 +48,8 @@ Example:
gpio-controller;
#gpio-cells = <2>;
};
+
+ a10sr_monitor {
+ compatible = "altr,a10sr-mon";
+ };
};
--
1.7.9.5
^ permalink raw reply related
* [PATCH 0/4] Add Altera A10SR Status & Control Monitor
From: tthayer at opensource.altera.com @ 2016-10-13 21:32 UTC (permalink / raw)
To: linux-arm-kernel
From: Thor Thayer <tthayer@opensource.altera.com>
This patch series adds the Altera Arria10 DevKit System Resource
chip's Status and Control Monitor to the A10SR Multi-Function
Device. An earlier patch added this to the hwmon class which
wasn't the proper place so this functionality is added to the
misc directory.
Thor Thayer (4):
dt-bindings: mfd: Add Altera Arria10 SR Monitor
misc: Add Altera Arria10 System Resource Control
mfd: altr-a10sr: Add Arria10 SR Monitor
ARM: socfpga: dts: Add Monitor to A10-SR MFD
.../devicetree/bindings/mfd/altera-a10sr.txt | 9 +
MAINTAINERS | 1 +
arch/arm/boot/dts/socfpga_arria10_socdk.dtsi | 4 +
drivers/mfd/altera-a10sr.c | 4 +
drivers/misc/Kconfig | 7 +
drivers/misc/Makefile | 1 +
drivers/misc/altera-a10sr-mon.c | 184 ++++++++++++++++++++
7 files changed, 210 insertions(+)
create mode 100644 drivers/misc/altera-a10sr-mon.c
--
1.7.9.5
^ permalink raw reply
* [PATCH V2 1/3] Revert "ACPI, PCI, IRQ: reduce static IRQ array size to 16"
From: Rafael J. Wysocki @ 2016-10-13 21:01 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <469134d3-3950-0d89-2599-64f917f09781@codeaurora.org>
On Thu, Oct 13, 2016 at 10:16 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 10/13/2016 4:02 PM, Bjorn Helgaas wrote:
>> On Thu, Oct 13, 2016 at 03:36:11PM -0400, Sinan Kaya wrote:
>>> On 10/13/2016 2:15 PM, Bjorn Helgaas wrote:
>>>> It seems like the problem is that we removed acpi_penalize_sci_irq(),
>>>> which told us the polarity and trigger mode. We tried to get that
>>>> information via irq_get_trigger_type(), but that didn't work in this
>>>> case because we use the acpi_irq_get_penalty() path before the SCI is
>>>> registered.
>>>>
>>>> It makes sense to me to add acpi_penalize_sci_irq() back in, which is
>>>> what patch [3/3] does.
>>>>
>>>> I don't understand how *this* patch, which basically just increases
>>>> the penalty array size from 16 to 256, helps fix the problem. It
>>>> seems like this patch should only matter if the SCI were some IRQ
>>>> between 16 and 255.
>>>
>>>
>>> I see your point. The original code supported 256 interrupts.
>>>
>>> The machine where we had the problem had an SCI interrupt of 11. So,
>>> this patch does not necessarily fix anything for this machine alone.
>>> However, to be safe; I wanted to go back to the old behavior to fix
>>> the SCI issue for all existing platforms.
>>
>> I saw a previous email that said the SCI interrupt could not be
>> greater than 256, but I don't know where that restriction is. I'm
>> pretty sure the FADT field is 2 bytes, which would mean it could be up
>> to 65535.
>>
>> To fix this problem, I think we only need to fix the penalty for the
>> SCI interrupt. It seems better to add a single "sci_penalty"
>> variable, set it to PIRQ_PENALTY_PCI_USING if it's level/low or
>> PIRQ_PENALTY_ISA_ALWAYS otherwise, and add "sci_penalty" in when
>> appropriate. That should fix it for *any* SCI IRQ, not just those
>> less than 256, and we don't have to add these extra penalty table
>> entries that are all unused (except possibly for one entry if we have
>> an SCI in the 16-255 range).
>>
>> Something like this:
>>
>> static int sci_irq, sci_penalty;
>>
>> void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
>> {
>> sci = irq;
>> if (trigger == ACPI_MADT_TRIGGER_LEVEL &&
>> polarity == ACPI_MADT_POLARITY_ACTIVE_LOW)
>> sci_penalty = PIRQ_PENALTY_PCI_USING;
>> else
>> sci_penalty = PIRQ_PENALTY_ISA_ALWAYS;
>> }
>>
>> static int acpi_irq_get_penalty(int irq)
>> {
>> int penalty = 0;
>>
>> if (irq == sci_irq)
>> penalty += sci_penalty;
>> ...
>> }
>>
>> One could argue that ACPI devices can use IRQs above 15, and we should
>> handle penalties for them, too. But the table is the wrong mechanism
>> for that, because it handles penalties for IRQs < 256, but IRQs above
>> that would mysteriously be handled differently.
>>
>> Bjorn
>>
>
> I agree this is a better approach. I think my math was wrong when figuring
> out what a max SCI interrupt could be.
OK
I will be expecting a new patch (or a new series) then.
Thanks,
Rafael
^ permalink raw reply
* [PATCH v5 01/14] drivers: iommu: add FWNODE_IOMMU fwnode type
From: Rafael J. Wysocki @ 2016-10-13 20:53 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161013163229.GA24568@red-moon>
Hi,
On Thu, Oct 13, 2016 at 6:32 PM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> Hi Rafael,
>
> On Fri, Sep 30, 2016 at 05:48:01PM +0200, Rafael J. Wysocki wrote:
>> On Fri, Sep 30, 2016 at 11:07 AM, Lorenzo Pieralisi
>> <lorenzo.pieralisi@arm.com> wrote:
>> > On Thu, Sep 29, 2016 at 10:59:40PM +0200, Rafael J. Wysocki wrote:
>> >> On Thursday, September 29, 2016 03:15:20 PM Lorenzo Pieralisi wrote:
>> >> > Hi Rafael,
>> >> >
>> >> > On Fri, Sep 09, 2016 at 03:23:30PM +0100, Lorenzo Pieralisi wrote:
>> >> > > On systems booting with a device tree, every struct device is
>> >> > > associated with a struct device_node, that represents its DT
>> >> > > representation. The device node can be used in generic kernel
>> >> > > contexts (eg IRQ translation, IOMMU streamid mapping), to
>> >> > > retrieve the properties associated with the device and carry
>> >> > > out kernel operation accordingly. Owing to the 1:1 relationship
>> >> > > between the device and its device_node, the device_node can also
>> >> > > be used as a look-up token for the device (eg looking up a device
>> >> > > through its device_node), to retrieve the device in kernel paths
>> >> > > where the device_node is available.
>> >> > >
>> >> > > On systems booting with ACPI, the same abstraction provided by
>> >> > > the device_node is required to provide look-up functionality.
>> >> > >
>> >> > > Therefore, mirroring the approach implemented in the IRQ domain
>> >> > > kernel layer, this patch adds an additional fwnode type FWNODE_IOMMU.
>> >> > >
>> >> > > This patch also implements a glue kernel layer that allows to
>> >> > > allocate/free FWNODE_IOMMU fwnode_handle structures and associate
>> >> > > them with IOMMU devices.
>> >> > >
>> >> > > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> >> > > Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
>> >> > > Cc: Joerg Roedel <joro@8bytes.org>
>> >> > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>> >> > > ---
>> >> > > include/linux/fwnode.h | 1 +
>> >> > > include/linux/iommu.h | 25 +++++++++++++++++++++++++
>> >> > > 2 files changed, 26 insertions(+)
>> >> > >
>> >> > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
>> >> > > index 8516717..6e10050 100644
>> >> > > --- a/include/linux/fwnode.h
>> >> > > +++ b/include/linux/fwnode.h
>> >> > > @@ -19,6 +19,7 @@ enum fwnode_type {
>> >> > > FWNODE_ACPI_DATA,
>> >> > > FWNODE_PDATA,
>> >> > > FWNODE_IRQCHIP,
>> >> > > + FWNODE_IOMMU,
>> >> >
>> >> > This patch provides groundwork for this series and it is key for
>> >> > the rest of it, basically the point here is that we need a fwnode
>> >> > to differentiate platform devices created out of static ACPI tables
>> >> > entries (ie IORT), that represent IOMMU components.
>> >> >
>> >> > The corresponding device is not an ACPI device (I could fabricate one as
>> >> > it is done for other static tables entries eg FADT power button, but I
>> >> > do not necessarily see the reason for doing that given that all we need
>> >> > the fwnode for is a token identifier), so FWNODE_ACPI does not apply
>> >> > here.
>> >> >
>> >> > Please let me know if it is reasonable how I sorted this out (it
>> >> > is basically identical to IRQCHIP, just another enum entry), the
>> >> > remainder of the code depends on this.
>> >>
>> >> I'm not familiar with the use case, so I don't see anything unreasonable
>> >> in it.
>> >
>> > The use case is pretty simple: on ARM SMMU devices are platform devices.
>> > When booting with DT they are identified through an of_node and related
>> > FWNODE_OF type. When booting with ACPI, the ARM SMMU platform devices,
>> > to be equivalent to DT booting path, should be created out of static
>> > IORT table entries (that's how we describe SMMUs); we need to create
>> > a fwnode "token" to associate with those platform devices and that's
>> > not a FWNODE_ACPI (that is for an ACPI device firmware object, here we
>> > really do not need one), so this patch.
>> >
>> >> If you're asking about whether or not I mind adding more fwnode types in
>> >> principle, then no, I don't. :-)
>> >
>> > Yes, that's what I was asking, the only point that bugs me is that for
>> > both FWNODE_IRQCHIP and FWNODE_IOMMU the fwnode is just a "token" (ie a
>> > valid pointer) used for look-up and the type in the fwnode_handle is
>> > mostly there for error checking, I was wondering if we could create a
>> > specific fwnode_type for this specific usage (eg FWNODE_TAG and then add
>> > a type to it as part of its container struct) instead of adding an enum
>> > value per subsystem - it seems there are other fwnode types in the
>> > pipeline :), so I am asking:
>> >
>> > lkml.kernel.org/r/3D1468514043-21081-3-git-send-email-minyard at acm.org
>>
>> OK, I see your concern now, so thanks for presenting it so clearly.
>>
>> While I don't see anything wrong with having per-subsystem fwnode
>> types in principle, I agree that if the only purpose of them is to
>> mean "this comes from ACPI, but from a static table, not from the
>> namespace", it would be better to have a single fwnode type for that,
>> like FWNODE_ACPI_STATIC or similar.
>
> Coming back to this, I updated the series with new fwnode type
> FWNODE_ACPI_STATIC, which IMHO makes more sense (because that
> represents the FW interface it was obtained from rather than
> its content and plays better with upcoming extension above - DMI is a
> different firmware interface so it will be represented with a different
> fwnode type). Thanks.
OK
> However, I still have a question. The approach I took (create platform
> devices out of static IORT table entries for SMMUs) is common in ACPI
> (eg GHES, ACPI watchdog) even though those subsystems ignore the fwnode,
> but I think that's a detail.
>
> Still, fixed HW like power button and sleep button took a different
> approach, which consists in creating struct acpi_device objects out
> of FADT fixed HW features (with a NULL ACPI handle because there is
> no real ACPI object in the namespace for them).
That's how it was done in the past and the code is not functionally
problematic, so I saw no reason to re-implement it (which might
introduce regressions).
> I would like to understand the reasoning behind the difference (I
> think it is related to notification events and the need for an
> ACPI object for them - and sysfs userspace HID IF exposure ?).
I don't think there is a real need for that model and that's why it is
not used any more. It's legacy mostly.
> In theory (but looks crazy to me that's why I did not do it), I could
> fabricate a Device object in the ACPI namespace (?) (with _HID, _CRS and
> related properties - is that doable ?) out of the static table entry in
> the IORT table that provides the ARM SMMU component data (ie its MMIO
> space, IRQs and SMMU properties like cache coherency), this would
> allow the kernel to create a struct acpi_device (and related fwnode)
> + its physical node platform device but that looks insanely complicated
> (if feasible and more importantly if correct from an ACPI standpoint).
>
> This approach would allow me to match the SMMU driver with an _HID,
> the kernel would create a physical_node (ie platform_device) for
> me out of the namespace ACPI device object and I would get the
> FWNODE_ACPI for free (out of the struct acpi_device) instead of having
> to fiddle about with a new fwnode_handle type.
>
> I think this alternative approach (if doable at all) creates more issues
> than it solves but I wanted to make sure what I am doing is kosher
> from an ACPI perspective so I am asking.
That's most likely how I would do it, so fine by me. :-)
> I definitely think the current approach I took is much better, with its
> own downsides (eg matching the ARM SMMU driver by name instead of
> acpi_device_id/_HID), but I wanted to ask.
>
> The point is: ARM SMMU drivers are platform drivers. In DT the SMMUs
> are represented through DT nodes, in ACPI through _static_ IORT table
> entries, somehow a platform device must be created for them, so
> this whole series (and related fwnode issues).
Right. No disagreement.
Thanks,
Rafael
^ permalink raw reply
* Low network throughput on i.MX28
From: Uwe Kleine-König @ 2016-10-13 20:42 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <A57BCA39-0977-43C4-B22D-ED60F5E4B06D@embedded.rocks>
Hello,
On Thu, Oct 13, 2016 at 09:43:00PM +0200, J?rg Krause wrote:
> Am 13. Oktober 2016 08:48:07 MESZ, schrieb "Lothar Wa?mann" <LW@KARO-electronics.de>:
> >This is the iperf output on a TX28 with current mainline kernel
> >(4.8.0-rc5):
> >------------------------------------------------------------
> >Client connecting to 192.168.100.1, TCP port 5001
> >TCP window size: 43.8 KByte (default)
> >------------------------------------------------------------
> >[ 3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port
> >5001
> >[ ID] Interval Transfer Bandwidth
> >[ 3] 0.0-10.0 sec 57.5 MBytes 48.2 Mbits/sec
Just for the record: I have another i.MX28 system here and got 43
Mbits/sec with PREEMPT_NONE and 37 Mbits/sec with PREEMPT_RT both using
a 3.14 kernel.
> >You might check your kernel DEBUG configs (especially
> >CONFIG_DEBUG_PAGEALLOC).
>
> Thanks for sharing the iperf output. What LAN transceiver does the
> TX28 has assembled?
My system has a Marvell Switch (88e6083) as "transceiver".
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply
* [PATCH v3 0/3] usb/phy: Add Amlogic Meson8b and GXBB USB support
From: Martin Blumenstingl @ 2016-10-13 20:27 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161001121900.1168-1-martin.blumenstingl@googlemail.com>
Hi Kishon,
On Sat, Oct 1, 2016 at 2:18 PM, Martin Blumenstingl
<martin.blumenstingl@googlemail.com> wrote:
> Martin Blumenstingl (3):
> Documentation: dt-bindings: update the meson-usb2-phy example
> Documentation: dt-bindings: rename meson-usb2-phy to meson8b-usb2-phy
these two already got an ACK by the devicetree maintainers
> phy: meson: add USB2 PHY support for Meson8b and GXBB
did you already have time to review the USB2 PHY driver from patch 3/3?
please let me know if there's something we have to change, then I'll
take care of it
would you like to get all patches through the PHY tree or should Kevin
take some of these?
Regards,
Martin
^ permalink raw reply
* [PATCH V2 1/3] Revert "ACPI, PCI, IRQ: reduce static IRQ array size to 16"
From: Sinan Kaya @ 2016-10-13 20:16 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161013200238.GE21529@localhost>
On 10/13/2016 4:02 PM, Bjorn Helgaas wrote:
> On Thu, Oct 13, 2016 at 03:36:11PM -0400, Sinan Kaya wrote:
>> On 10/13/2016 2:15 PM, Bjorn Helgaas wrote:
>>> It seems like the problem is that we removed acpi_penalize_sci_irq(),
>>> which told us the polarity and trigger mode. We tried to get that
>>> information via irq_get_trigger_type(), but that didn't work in this
>>> case because we use the acpi_irq_get_penalty() path before the SCI is
>>> registered.
>>>
>>> It makes sense to me to add acpi_penalize_sci_irq() back in, which is
>>> what patch [3/3] does.
>>>
>>> I don't understand how *this* patch, which basically just increases
>>> the penalty array size from 16 to 256, helps fix the problem. It
>>> seems like this patch should only matter if the SCI were some IRQ
>>> between 16 and 255.
>>
>>
>> I see your point. The original code supported 256 interrupts.
>>
>> The machine where we had the problem had an SCI interrupt of 11. So,
>> this patch does not necessarily fix anything for this machine alone.
>> However, to be safe; I wanted to go back to the old behavior to fix
>> the SCI issue for all existing platforms.
>
> I saw a previous email that said the SCI interrupt could not be
> greater than 256, but I don't know where that restriction is. I'm
> pretty sure the FADT field is 2 bytes, which would mean it could be up
> to 65535.
>
> To fix this problem, I think we only need to fix the penalty for the
> SCI interrupt. It seems better to add a single "sci_penalty"
> variable, set it to PIRQ_PENALTY_PCI_USING if it's level/low or
> PIRQ_PENALTY_ISA_ALWAYS otherwise, and add "sci_penalty" in when
> appropriate. That should fix it for *any* SCI IRQ, not just those
> less than 256, and we don't have to add these extra penalty table
> entries that are all unused (except possibly for one entry if we have
> an SCI in the 16-255 range).
>
> Something like this:
>
> static int sci_irq, sci_penalty;
>
> void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
> {
> sci = irq;
> if (trigger == ACPI_MADT_TRIGGER_LEVEL &&
> polarity == ACPI_MADT_POLARITY_ACTIVE_LOW)
> sci_penalty = PIRQ_PENALTY_PCI_USING;
> else
> sci_penalty = PIRQ_PENALTY_ISA_ALWAYS;
> }
>
> static int acpi_irq_get_penalty(int irq)
> {
> int penalty = 0;
>
> if (irq == sci_irq)
> penalty += sci_penalty;
> ...
> }
>
> One could argue that ACPI devices can use IRQs above 15, and we should
> handle penalties for them, too. But the table is the wrong mechanism
> for that, because it handles penalties for IRQs < 256, but IRQs above
> that would mysteriously be handled differently.
>
> Bjorn
>
I agree this is a better approach. I think my math was wrong when figuring
out what a max SCI interrupt could be.
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH V3 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
From: Baicar, Tyler @ 2016-10-13 20:15 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <87mvi8seng.fsf@e105922-lin.cambridge.arm.com>
Hello Punit,
On 10/13/2016 4:54 AM, Punit Agrawal wrote:
> Hi Tyler,
>
> One last comment...
>
> Tyler Baicar <tbaicar@codeaurora.org> writes:
>
>> UEFI spec allows for non-standard section in Common Platform Error
>> Record. This is defined in section N.2.3 of UEFI version 2.5.
>>
>> Currently if the CPER section's type (UUID) does not match with
>> any section type that the kernel knows how to parse, trace event
>> is not generated for such section. And thus user is not able to know
>> happening of such hardware error, including error record of
>> non-standard section.
>>
>> This commit generates a trace event which contains raw error data
>> for unrecognized CPER section.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>> drivers/acpi/apei/ghes.c | 18 +++++++++++++++++-
>> drivers/ras/ras.c | 1 +
>> include/ras/ras_event.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 36894c8..cb4c7f4 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -49,6 +49,7 @@
>> #include <acpi/ghes.h>
>> #include <acpi/apei.h>
>> #include <asm/tlbflush.h>
>> +#include <ras/ras_event.h>
>>
>> #ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> #include <asm/system_misc.h>
>> @@ -468,12 +469,21 @@ static void ghes_do_proc(struct ghes *ghes,
>> int sev, sec_sev;
>> struct acpi_hest_generic_data *gdata;
>> uuid_le sec_type;
>> + uuid_le *fru_id;
>> + char *fru_text = "";
>>
>> sev = ghes_severity(estatus->error_severity);
>> apei_estatus_for_each_section(estatus, gdata) {
>> sec_sev = ghes_severity(gdata->error_severity);
>> sec_type = *(uuid_le *)gdata->section_type;
>>
>> + if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
>> + fru_id = (uuid_le *)gdata->fru_id;
>> + else
>> + fru_id = &NULL_UUID_LE;
> fru_id can be initialised at declaration and drop the else here. The
> same is already being done for fru_text.
Yes, I will make this change in the next version.
Thanks,
Tyler
> Thanks,
> Punit
>
>> + if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
>> + fru_text = gdata->fru_text;
>> +
>> if (!uuid_le_cmp(sec_type,
>> CPER_SEC_PLATFORM_MEM)) {
>> struct cper_sec_mem_err *mem_err;
> [...]
>
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH V3 10/10] arm64: KVM: add guest SEA support
From: Baicar, Tyler @ 2016-10-13 20:14 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <87h98gs853.fsf@e105922-lin.cambridge.arm.com>
Hello Punit,
On 10/13/2016 7:14 AM, Punit Agrawal wrote:
> Hi Tyler,
>
> I know I've had my last comment already ;), but I thought I'd rather
> raise the question than stay confused...
>
> Tyler Baicar <tbaicar@codeaurora.org> writes:
>
>> Currently external aborts are unsupported by the guest abort
>> handling. Add handling for SEAs so that the host kernel reports
>> SEAs which occur in the guest kernel.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>> arch/arm/include/asm/kvm_arm.h | 1 +
>> arch/arm/include/asm/system_misc.h | 5 +++++
>> arch/arm/kvm/mmu.c | 15 +++++++++++++--
>> arch/arm64/include/asm/kvm_arm.h | 1 +
>> arch/arm64/include/asm/system_misc.h | 2 ++
>> arch/arm64/mm/fault.c | 13 +++++++++++++
>> 6 files changed, 35 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index e22089f..33a77509 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -187,6 +187,7 @@
>> #define FSC_FAULT (0x04)
>> #define FSC_ACCESS (0x08)
>> #define FSC_PERM (0x0c)
>> +#define FSC_EXTABT (0x10)
>>
>> /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>> #define HPFAR_MASK (~0xf)
>> diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
>> index a3d61ad..86e1faa 100644
>> --- a/arch/arm/include/asm/system_misc.h
>> +++ b/arch/arm/include/asm/system_misc.h
>> @@ -24,4 +24,9 @@ extern unsigned int user_debug;
>>
>> #endif /* !__ASSEMBLY__ */
>>
>> +inline int handle_guest_sea(unsigned long addr, unsigned int esr)
>> +{
>> + return -1;
>> +}
>> +
>> #endif /* __ASM_ARM_SYSTEM_MISC_H */
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index e9a5c0e..24cde84 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -29,6 +29,7 @@
>> #include <asm/kvm_asm.h>
>> #include <asm/kvm_emulate.h>
>> #include <asm/virt.h>
>> +#include <asm/system_misc.h>
>>
>> #include "trace.h"
>>
>> @@ -1441,8 +1442,18 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>
>> /* Check the stage-2 fault is trans. fault or write fault */
>> fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>> - if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>> - fault_status != FSC_ACCESS) {
>> +
>> + if (fault_status == FSC_EXTABT) {
>> + if(handle_guest_sea((unsigned long)fault_ipa,
>> + kvm_vcpu_get_hsr(vcpu))) {
>> + kvm_err("Failed to handle guest SEA, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>> + kvm_vcpu_trap_get_class(vcpu),
>> + (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
>> + (unsigned long)kvm_vcpu_get_hsr(vcpu));
>> + return -EFAULT;
>> + }
>> + } else if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>> + fault_status != FSC_ACCESS) {
>> kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>> kvm_vcpu_trap_get_class(vcpu),
>> (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
>> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
>> index 4b5c977..be0efb6 100644
>> --- a/arch/arm64/include/asm/kvm_arm.h
>> +++ b/arch/arm64/include/asm/kvm_arm.h
>> @@ -201,6 +201,7 @@
>> #define FSC_FAULT ESR_ELx_FSC_FAULT
>> #define FSC_ACCESS ESR_ELx_FSC_ACCESS
>> #define FSC_PERM ESR_ELx_FSC_PERM
>> +#define FSC_EXTABT ESR_ELx_FSC_EXTABT
>>
>> /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>> #define HPFAR_MASK (~UL(0xf))
>> diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
>> index 90daf4a..8522fff 100644
>> --- a/arch/arm64/include/asm/system_misc.h
>> +++ b/arch/arm64/include/asm/system_misc.h
>> @@ -77,4 +77,6 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
>> int sea_register_handler_chain(struct notifier_block *nb);
>> void sea_unregister_handler_chain(struct notifier_block *nb);
>>
>> +int handle_guest_sea(unsigned long addr, unsigned int esr);
>> +
>> #endif /* __ASM_SYSTEM_MISC_H */
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 81cb7ad..d714432 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -597,6 +597,19 @@ static const char *fault_name(unsigned int esr)
>> }
>>
>> /*
>> + * Handle Synchronous External Aborts that occur in a guest kernel.
>> + */
>> +int handle_guest_sea(unsigned long addr, unsigned int esr)
>> +{
>> + atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>> +
>> + pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>> + fault_name(esr), esr, addr);
>> +
>> + return 0;
>> +}
> Don't we need to pass the abort to the guest?
This requires some infrastructure to implement virtual "ACPI platform
error interface" to expose the details of the abort to the guest. This
patchset does not cover that and focuses on feature parity with other
architectures that support APEI. There are discussions among Linaro
partners about how this can be achieved in the long term, but that work
is outside the scope of this patchset. This patch will ensure that if a
guest encounters one of these errors then it will be reported before
getting killed. Before this patch we would just get an unsupported FSC
print out and then the guest would be killed.
Thanks,
Tyler
>
> Thanks,
> Punit
>
>> +
>> +/*
>> * Dispatch a data abort to the relevant handler.
>> */
>> asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH V2 1/3] Revert "ACPI,PCI,IRQ: reduce static IRQ array size to 16"
From: Bjorn Helgaas @ 2016-10-13 20:02 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <498097a5-f75c-d2dd-f134-148b3fb8e1ed@codeaurora.org>
On Thu, Oct 13, 2016 at 03:36:11PM -0400, Sinan Kaya wrote:
> On 10/13/2016 2:15 PM, Bjorn Helgaas wrote:
> > It seems like the problem is that we removed acpi_penalize_sci_irq(),
> > which told us the polarity and trigger mode. We tried to get that
> > information via irq_get_trigger_type(), but that didn't work in this
> > case because we use the acpi_irq_get_penalty() path before the SCI is
> > registered.
> >
> > It makes sense to me to add acpi_penalize_sci_irq() back in, which is
> > what patch [3/3] does.
> >
> > I don't understand how *this* patch, which basically just increases
> > the penalty array size from 16 to 256, helps fix the problem. It
> > seems like this patch should only matter if the SCI were some IRQ
> > between 16 and 255.
>
>
> I see your point. The original code supported 256 interrupts.
>
> The machine where we had the problem had an SCI interrupt of 11. So,
> this patch does not necessarily fix anything for this machine alone.
> However, to be safe; I wanted to go back to the old behavior to fix
> the SCI issue for all existing platforms.
I saw a previous email that said the SCI interrupt could not be
greater than 256, but I don't know where that restriction is. I'm
pretty sure the FADT field is 2 bytes, which would mean it could be up
to 65535.
To fix this problem, I think we only need to fix the penalty for the
SCI interrupt. It seems better to add a single "sci_penalty"
variable, set it to PIRQ_PENALTY_PCI_USING if it's level/low or
PIRQ_PENALTY_ISA_ALWAYS otherwise, and add "sci_penalty" in when
appropriate. That should fix it for *any* SCI IRQ, not just those
less than 256, and we don't have to add these extra penalty table
entries that are all unused (except possibly for one entry if we have
an SCI in the 16-255 range).
Something like this:
static int sci_irq, sci_penalty;
void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
{
sci = irq;
if (trigger == ACPI_MADT_TRIGGER_LEVEL &&
polarity == ACPI_MADT_POLARITY_ACTIVE_LOW)
sci_penalty = PIRQ_PENALTY_PCI_USING;
else
sci_penalty = PIRQ_PENALTY_ISA_ALWAYS;
}
static int acpi_irq_get_penalty(int irq)
{
int penalty = 0;
if (irq == sci_irq)
penalty += sci_penalty;
...
}
One could argue that ACPI devices can use IRQs above 15, and we should
handle penalties for them, too. But the table is the wrong mechanism
for that, because it handles penalties for IRQs < 256, but IRQs above
that would mysteriously be handled differently.
Bjorn
^ permalink raw reply
* [RFC] arm64: Enforce observed order for spinlock and data
From: bdegraaf at codeaurora.org @ 2016-10-13 20:00 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161013110256.GA436@arm.com>
On 2016-10-13 07:02, Will Deacon wrote:
> Brent,
>
> On Wed, Oct 12, 2016 at 04:01:06PM -0400, bdegraaf at codeaurora.org
> wrote:
>
> Everything from this point down needs clarification.
>
>> All arm64 lockref accesses that occur without taking the spinlock must
>> behave like true atomics, ensuring successive operations are all done
>> sequentially.
>
> What is a "true atomic"? What do you mean by "successive"? What do you
> mean by "done sequentially"?
>
Despite the use case in dentry, lockref itself is a generic locking API,
and
any problems I describe here are with the generic API itself, not
necessarily
the dentry use case. I'm not patching dentry--I'm fixing lockref.
By necessity, the API must do its update atomically, as keeping things
correct
involves potential spinlock access by other agents which may opt to use
the
spinlock API or the lockref API at their discretion. With the current
arm64
spinlock implementation, it is possible for the lockref to observe the
changed
contents of the protected count without observing the spinlock being
locked,
which could lead to missed changes to the lock_count itself, because any
calculations made on it could be overwritten or completed in a different
sequence.
(Spinlock locked access is obtained with a simple store under certain
scenarios. My attempt to fix this in the spinlock code was met with
resistance
saying it should be addressed in lockref, since that is the API that
would
encounter the issue. These changes were initiated in response to that
request.
Additional ordering problems were uncovered when I looked into lockref
itself.)
The example below involves only a single agent exactly as you explain
the
problem in commit 8e86f0b409a44193f1587e87b69c5dcf8f65be67. Even for a
single
execution agent, this means that the code below could access out of
order.
As lockref is a generic API, it doesn't matter whether dentry does this
or not.
By "done sequentially," I mean "accessed in program order."
As far as "true atomic" goes, I am referring to an atomic in the same
sense you
did in commit 8e86f0b409a44193f1587e87b69c5dcf8f65be67.
> The guarantee provided by lockref is that, if you hold the spinlock,
> then
> you don't need to use atomics to inspect the reference count, as it is
> guaranteed to be stable. You can't just go around replacing spin_lock
> calls with lockref_get -- that's not what this is about.
>
I am not sure where you got the idea that I was referring to replacing
spinlocks
with lockref calls. That is not the foundation for this fix.
>> Currently
>> the lockref accesses, when decompiled, look like the following
>> sequence:
>>
>> <Lockref "unlocked" Access [A]>
>>
>> // Lockref "unlocked" (B)
>> 1: ldxr x0, [B] // Exclusive load
>> <change lock_count B>
>> stxr w1, x0, [B]
>> cbnz w1, 1b
>>
>> <Lockref "unlocked" Access [C]>
>>
>> Even though access to the lock_count is protected by exclusives, this
>> is not
>> enough
>> to guarantee order: The lock_count must change atomically, in order,
>> so the
>> only
>> permitted ordering would be:
>> A -> B -> C
>
> Says who? Please point me at a piece of code that relies on this. I'm
> willing to believe that are bugs in this area, but waving your hands
> around
> and saying certain properties "must" hold is not helpful unless you can
> say *why* they must hold and *where* that is required.
>
The lockref code must access in order, because other agents can observe
it via
spinlock OR lockref APIs. Again, this is a generic API, not an explicit
part of
dentry. Other code will use it, and the manner in which it is used in
dentry is not
relevant. What lock_count is changed to is not proscribed by the
lockref
API. There is no guarantee whether it be an add, subtract, multiply,
divide, set
to some explicit value, etc. But the changes must be done in program
order and
observable in that same order by other agents: Therefore, the spinlock
and lock_count
must be accessed atomically, and observed to change atomically at the
system level.
I am not off base saying lockref is an atomic access. Here are some
references:
Under Documentation/filesystems/path-lookup.md, the dentry->d_lockref
mechanism
is described as an atomic access.
At the time lockref was introduced, The Linux Foundation gave a
presentation at
LinuxCon 2014 that can be found at the following link:
https://events.linuxfoundation.org/sites/events/files/slides/linuxcon-2014-locking-final.pdf
On page 46, it outlines the lockref API. The first lines of the slide
give the
relevant details.
Lockref
? *Generic* mechanism to *atomically* update a reference count that is
protected by a spinlock without actually acquiring the spinlock
itself.
While dentry's use is mentioned, this API is not restricted to the use
case of dentry.
>> Unfortunately, this is not the case by the letter of the architecture
>> and,
>> in fact,
>> the accesses to A and C are not protected by any sort of barrier, and
>> hence
>> are
>> permitted to reorder freely, resulting in orderings such as
>>
>> Bl -> A -> C -> Bs
>
> Again, why is this a problem? It's exactly the same as if you did:
>
> spin_lock(lock);
> inc_ref_cnt();
> spin_unlock(lock);
>
> Accesses outside of the critical section can still be reordered. Big
> deal.
>
Since the current code resembles but actually has *fewer* ordering
effects
compared to the example used by your atomic.h commit, even though
A->B->C is in
program order, it could access out of order according to your own commit
text
on commit 8e86f0b409a44193f1587e87b69c5dcf8f65be67.
Taking spin_lock/spin_unlock, however, includes ordering by nature of
the
load-acquire observing the store-release of a prior unlock, so ordering
is
enforced with the spinlock version of accesses. The lockref itself has
*no*
ordering enforced, unless a locked state is encountered and it falls
back
to the spinlock code. So this is a fundamental difference between
lockref and
spinlock. So, no, lockref ordering is currently not exactly the same as
spinlock--but it should be.
>> In this specific scenario, since "change lock_count" could be an
>> increment, a decrement or even a set to a specific value, there could
>> be
>> trouble.
>
> What trouble?
>
Take, for example, a use case where the ref count is either positive or
zero.
If increments and decrements hit out of order, a decrement that was
supposed
to come after an increment would instead do nothing if the value of the
lock
started at zero. Then when the increment hit later, the ref count would
remain
positive with a net effect of +1 to the ref count instead of +1-1=0.
Again,
however, the lockref does not specify how the contents of lock_count are
manipulated, it was only meant to guarantee that they are done
atomically when
the lock is not held.
>> With more agents accessing the lockref without taking the lock, even
>> scenarios where the cmpxchg passes falsely can be encountered, as
>> there is
>> no guarantee that the the "old" value will not match exactly a newer
>> value
>> due to out-of-order access by a combination of agents that increment
>> and
>> decrement the lock_count by the same amount.
>
> This is the A-B-A problem, but I don't see why it affects us here.
> We're
> dealing with a single reference count.
>
If lockref accesses were to occur on many Pe's, there are all sorts of
things that could happen in terms of who wins what, and what they set
the
lock_count to. My point is simply that each access should be atomic
because
lockref is a generic API and was intended to be a lockless atomic
access.
Leaving this problem open until someone else introduces a use that
exposes
it, which could happen in the main kernel code, is probably not a good
idea, as it could prove difficult to track down.
>> Since multiple agents are accessing this without locking the spinlock,
>> this access must have the same protections in place as atomics do in
>> the
>> arch's atomic.h.
>
> Why? I don't think that it does. Have a look at how lockref is used by
> the dcache code: it's really about keeping a reference to a dentry,
> which may be in the process of being unhashed and removed. The
> interaction with concurrent updaters to the dentry itself is handled
> using a seqlock, which does have the necessary barriers. Yes, the code
> is extremely complicated, but given that you're reporting issues based
> on code inspection, then you'll need to understand what you're
> changing.
>
Again, this is a generic API, not an API married to dentry. If it were
for
dentry's sole use, it should not be accessible outside of the dentry
code.
While the cmpxchg64_relaxed case may be OK for dentry, it is not OK for
the
generic case.
>> Fortunately, the fix is not complicated: merely removing the errant
>> _relaxed option on the cmpxchg64 is enough to introduce exactly the
>> same
>> code sequence justified in commit
>> 8e86f0b409a44193f1587e87b69c5dcf8f65be67
>> to fix arm64 atomics.
>
> I introduced cmpxchg64_relaxed precisely for the lockref case. I still
> don't see a compelling reason to strengthen it. If you think there's a
> bug,
> please spend the effort to describe how it manifests and what can
> actually
> go wrong in the existing codebase. Your previous patches fixing
> so-called
> bugs found by inspection have both turned out to be bogus, so I'm
> sorry,
> but I'm not exactly leaping on your contributions to this.
>
> Will
I have detailed the problems here, and they are with the generic case,
no
hand waving required.
On a further note, it is not accurate to say that my prior patches were
bogus: One called to attention a yet-to-be-corrected problem in the
ARMv8
Programmer's Guide, and the other was sidestepped by a refactor that
addressed the problem I set out to fix with a control flow change. Since
that problem was the fundamental reason I had worked on the gettime code
in the first place, I abandoned my effort. The refactor that fixed the
control-flow problem, however, is still missing on v4.7 and earlier
kernels
(sequence lock logic should be verified prior to the isb that demarcates
the virtual counter register read). I have confirmed this is an issue on
various armv8 hardware, sometimes obtaining identical register values
between multiple reads that were delayed such that they should have
shown
changes, evidence that the register read accessed prior to the seqlock
update having finished (the control flow problem).
Brent
^ permalink raw reply
* [PATCH] arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y
From: Timur Tabi @ 2016-10-13 19:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476376929-29688-1-git-send-email-ard.biesheuvel@linaro.org>
Ard Biesheuvel wrote:
> As it turns out, the KASLR code breaks CONFIG_MODVERSIONS, since the
> kcrctab has an absolute address field that is relocated at runtime
> when the kernel offset is randomized.
>
> This has been fixed already for PowerPC in the past, so simply wire up
> the existing code dealing with this issue.
>
> Signed-off-by: Ard Biesheuvel<ard.biesheuvel@linaro.org>
Tested-by: Timur Tabi <timur@codeaurora.org>
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* Low network throughput on i.MX28
From: Jörg Krause @ 2016-10-13 19:43 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161013084807.6a231fdb@ipc1.ka-ro>
Hi Lothar,
Am 13. Oktober 2016 08:48:07 MESZ, schrieb "Lothar Wa?mann" <LW@KARO-electronics.de>:
>Hi,
>
>On Thu, 13 Oct 2016 01:09:13 +0200 J?rg Krause wrote:
>> Hi,
>>
>> I am using a custom i.MX28 board similar to the i.MX28-EVK. For Wi-Fi
>> the board assembles a BCM43362 from Broadcom and for Ethernet a
>> LAN8720A from Microchip. The board is running mainline Linux 4.7.
>>
>> While both, wireless and wired network interfaces work, the TCP
>> throughput measured with iperf is low.
>>
>> The bandwith for Ethernet is between 20-30 MBits/sec and for WLAN is
>> about 4-5 MBits/sec.
>>
>> There exists an Application Note "i.MX28 Ethernet Performance on
>> Linux" [1] which shows a bandwith of > 60 MBits/sec. A user an the
>NXP
>> forum [2] told he achieved 20 MBits/sec with some Qualcom chip.
>>
>> Note, that these values are most probably measured with the legacy
>> Linux Kernel 2.6.35 from NXP.
>>
>> Does anybody has done throughput tests on i.MX28 with mainline
>Kernel?
>> If so, what are the results? What might be the bottleneck?
>>
>
>This is the iperf output on a TX28 with current mainline kernel
>(4.8.0-rc5):
>------------------------------------------------------------
>Client connecting to 192.168.100.1, TCP port 5001
>TCP window size: 43.8 KByte (default)
>------------------------------------------------------------
>[ 3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port
>5001
>[ ID] Interval Transfer Bandwidth
>[ 3] 0.0-10.0 sec 57.5 MBytes 48.2 Mbits/sec
>
>You might check your kernel DEBUG configs (especially
>CONFIG_DEBUG_PAGEALLOC).
Thanks for sharing the iperf output. What LAN transceiver does the TX28 has assembled?
I checked the config and is has no DEBUG_PAGEALLOC enabled and no DEBUG options related to network.
Best regards
J?rg Krause
^ permalink raw reply
* [PATCH V3 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
From: Baicar, Tyler @ 2016-10-13 19:37 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <064dca7d-0625-c0d0-09ae-72ef7abdc63f@arm.com>
Hello Suzuki,
On 10/13/2016 2:50 AM, Suzuki K Poulose wrote:
> On 12/10/16 23:10, Baicar, Tyler wrote:
>> Hello Suzuki,
>>
>> Thank you for the feedback! Responses below.
>>
>> On 10/11/2016 11:28 AM, Suzuki K Poulose wrote:
>>> On 07/10/16 22:31, Tyler Baicar wrote:
>>>> Currently when a RAS error is reported it is not timestamped.
>>>> The ACPI 6.1 spec adds the timestamp field to the generic error
>>>> data entry v3 structure. The timestamp of when the firmware
>>>> generated the error is now being reported.
>>>>
>>>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>>>> Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
>>>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>>>> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
>>>
>>> Please could you keep the people who reviewed/commented on your
>>> series in the past,
>>> whenever you post a new version ?
>> Do you mean to just send the new version to their e-mail directly in
>> addition to the lists? If so, I will do that next time.
>
> If you send a new version of a series to the list, it is a good idea
> to keep
> the people who commented (significantly) on your previous version in
> Cc, especially
> when you have addressed their feedback. That will help them to keep
> track of the
> series. People can always see the new version in the list, but then it
> is so easy
> to miss something in the 100s of emails you get each day. I am sure
> people have
> special filters for the emails based on if they are in Cc/To etc.
>
Okay, understood. I'll make sure to add those who have commented in the
cc/to list to avoid the e-mail filters.
>>
>> I know you provided good feedback on the previous patchset, but I did
>> not have anyone specifically respond to add "reviewed-by:...". I
>> don't think I should add reviewed-by for someone unless they
>> specifically add it in a response :)
>
> No, I haven't yet "Reviewed-by" your patches. I had some comments on
> it, which means
> I expected it to be addressed as you committed in your response.
>
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index 3021f0e..c8488f1 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>> @@ -80,6 +80,10 @@
>
>> I think that should work to avoid duplication. I will move them to a
>> header file in the next patchset.
>>>> +
>>>> +static void cper_estatus_print_section_v300(const char *pfx,
>>>> + const struct acpi_hest_generic_data_v300 *gdata)
>>>> +{
>>>> + __u8 hour, min, sec, day, mon, year, century, *timestamp;
>>>> +
>>>> + if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
>>>> + timestamp = (__u8 *)&(gdata->time_stamp);
>>>> + memcpy(&sec, timestamp, 1);
>>>> + memcpy(&min, timestamp + 1, 1);
>>>> + memcpy(&hour, timestamp + 2, 1);
>>>> + memcpy(&day, timestamp + 4, 1);
>>>> + memcpy(&mon, timestamp + 5, 1);
>>>> + memcpy(&year, timestamp + 6, 1);
>>>> + memcpy(¢ury, timestamp + 7, 1);
>>>> + printk("%stime: ", pfx);
>>>> + printk("%7s", 0x01 & *(timestamp + 3) ? "precise" : "");
>>>
>>> What format is the (timestamp + 3) stored in ? Does it need
>>> conversion ?
>> The third byte of the timestamp is currently only used to determine
>> if the time is precise or not. Bit 0 is used to specify that and the
>> other bits in this byte are marked as reserved. This is shown in
>> table 247 of the UEFI spec 2.6:
>>
>> Byte 3:
>> Bit 0 ? Timestamp is precise if this bit is set and correlates to
>> the time of the error event.
>> Bit 7:1 ? Reserved
>
> Is it always the same endianness as that of the CPU ?
It is a fair assumption that the firmware populating this record will
use a CPU of the same endianness. There is no mechanism in the spec to
indicate otherwise.
Thanks,
Tyler
>
> Cheers
> Suzuki
>
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH V2 1/3] Revert "ACPI, PCI, IRQ: reduce static IRQ array size to 16"
From: Sinan Kaya @ 2016-10-13 19:36 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161013181535.GB21529@localhost>
On 10/13/2016 2:15 PM, Bjorn Helgaas wrote:
> It seems like the problem is that we removed acpi_penalize_sci_irq(),
> which told us the polarity and trigger mode. We tried to get that
> information via irq_get_trigger_type(), but that didn't work in this
> case because we use the acpi_irq_get_penalty() path before the SCI is
> registered.
>
> It makes sense to me to add acpi_penalize_sci_irq() back in, which is
> what patch [3/3] does.
>
> I don't understand how *this* patch, which basically just increases
> the penalty array size from 16 to 256, helps fix the problem. It
> seems like this patch should only matter if the SCI were some IRQ
> between 16 and 255.
I see your point. The original code supported 256 interrupts.
The machine where we had the problem had an SCI interrupt of 11. So,
this patch does not necessarily fix anything for this machine alone.
However, to be safe; I wanted to go back to the old behavior to fix
the SCI issue for all existing platforms.
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH v2] drm/bridge: analogix: protect power when get_modes or detect
From: Sean Paul @ 2016-10-13 19:15 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476323438-8435-1-git-send-email-mark.yao@rock-chips.com>
On Wed, Oct 12, 2016 at 9:50 PM, Mark Yao <mark.yao@rock-chips.com> wrote:
> The drm callback ->detect and ->get_modes seems is not power safe,
> they may be called when device is power off, do register access on
> detect or get_modes will cause system die.
>
> Here is the path call ->detect before analogix_dp power on
> [<ffffff800843babc>] analogix_dp_detect+0x44/0xdc
> [<ffffff80083fd840>] drm_helper_probe_single_connector_modes_merge_bits+0xe8/0x41c
> [<ffffff80083fdb84>] drm_helper_probe_single_connector_modes+0x10/0x18
> [<ffffff8008418d24>] drm_mode_getconnector+0xf4/0x304
> [<ffffff800840cff0>] drm_ioctl+0x23c/0x390
> [<ffffff80081a8adc>] do_vfs_ioctl+0x4b8/0x58c
> [<ffffff80081a8c10>] SyS_ioctl+0x60/0x88
>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Sean Paul <seanpaul@chromium.org>
> Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> Cc: "Ville Syrj?l?" <ville.syrjala@linux.intel.com>
>
> Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Thanks for revising, this is much better.
Reviewed-by: Sean Paul <seanpaul@chromium.org>
> ---
> Changes in v2:
> - remove sub device power on/off callback, use pm_runtime_get/put is enough
> to fix my problem, so will can avoid race to dpms.
>
> drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> index efac8ab..ff2d328 100644
> --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> @@ -1062,11 +1062,15 @@ int analogix_dp_get_modes(struct drm_connector *connector)
> return 0;
> }
>
> + pm_runtime_get_sync(dp->dev);
> +
> if (analogix_dp_handle_edid(dp) == 0) {
> drm_mode_connector_update_edid_property(&dp->connector, edid);
> num_modes += drm_add_edid_modes(&dp->connector, edid);
> }
>
> + pm_runtime_put(dp->dev);
> +
> if (dp->plat_data->panel)
> num_modes += drm_panel_get_modes(dp->plat_data->panel);
>
> @@ -1106,9 +1110,13 @@ analogix_dp_detect(struct drm_connector *connector, bool force)
> return connector_status_disconnected;
> }
>
> + pm_runtime_get_sync(dp->dev);
> +
> if (!analogix_dp_detect_hpd(dp))
> status = connector_status_connected;
>
> + pm_runtime_put(dp->dev);
> +
> ret = analogix_dp_prepare_panel(dp, false, false);
> if (ret)
> DRM_ERROR("Failed to unprepare panel (%d)\n", ret);
> --
> 1.9.1
>
>
^ permalink raw reply
* [PATCH V2 3/3] Revert "ACPI,PCI,IRQ: remove SCI penalize function"
From: Bjorn Helgaas @ 2016-10-13 18:25 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1475615720-31047-4-git-send-email-okaya@codeaurora.org>
On Tue, Oct 04, 2016 at 05:15:19PM -0400, Sinan Kaya wrote:
> The SCI function was removed in two steps (first refactor and then remove).
> This patch does the revert in one step.
>
> The commit 103544d86976 ("ACPI,PCI,IRQ: reduce resource requirements")
> refactored the original code so that SCI penalty is calculated dynamically
> by the time get_penalty function is called. This patch does a partial
> revert for the SCI functionality only.
>
> The commit 9e5ed6d1fb87 ("ACPI,PCI,IRQ: remove SCI penalize function") is
> for the removal of the function. SCI penalty API was replaced by the
> runtime penalty calculation based on the value of
> acpi_gbl_FADT.sci_interrupt.
>
> The IRQ type does not get updated at the right time for some platforms and
> results in incorrect penalty assignment for PCI IRQs as
> irq_get_trigger_type returns the wrong type.
>
> The register_gsi function delivers the IRQ found in the ACPI table to
> the interrupt controller driver. Penalties are calculated before a
> link object is enabled to find out which interrupt has the least number
> of users. By the time penalties are calculated, the IRQ is not registered
> yet and the API returns the wrong type.
>
> Tested-by: Jonathan Liu <net147@gmail.com>
> Tested-by: Ondrej Zary <linux@rainbow-software.org>
> Link: https://lkml.org/lkml/2016/10/4/283
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> arch/x86/kernel/acpi/boot.c | 1 +
> drivers/acpi/pci_link.c | 34 ++++++++++++++--------------------
> include/linux/acpi.h | 1 +
> 3 files changed, 16 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 90d84c3..0ffd26e 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -453,6 +453,7 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger,
> polarity = acpi_sci_flags & ACPI_MADT_POLARITY_MASK;
>
> mp_override_legacy_irq(bus_irq, polarity, trigger, gsi);
> + acpi_penalize_sci_irq(bus_irq, trigger, polarity);
>
> /*
> * stash over-ride to indicate we've been here
> diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
> index 06c2a11..6a2af19 100644
> --- a/drivers/acpi/pci_link.c
> +++ b/drivers/acpi/pci_link.c
> @@ -495,27 +495,10 @@ static int acpi_irq_pci_sharing_penalty(int irq)
>
> static int acpi_irq_get_penalty(int irq)
> {
> - int penalty = 0;
> -
> - /*
> - * Penalize IRQ used by ACPI SCI. If ACPI SCI pin attributes conflict
> - * with PCI IRQ attributes, mark ACPI SCI as ISA_ALWAYS so it won't be
> - * use for PCI IRQs.
> - */
> - if (irq == acpi_gbl_FADT.sci_interrupt) {
> - u32 type = irq_get_trigger_type(irq) & IRQ_TYPE_SENSE_MASK;
> -
> - if (type != IRQ_TYPE_LEVEL_LOW)
> - penalty += PIRQ_PENALTY_ISA_ALWAYS;
> - else
> - penalty += PIRQ_PENALTY_PCI_USING;
> - }
> -
> - if (irq < ACPI_MAX_ISA_IRQS)
> - return penalty + acpi_irq_penalty[irq];
> + if (irq < ACPI_MAX_IRQS)
> + return acpi_irq_penalty[irq];
>
> - penalty += acpi_irq_pci_sharing_penalty(irq);
> - return penalty;
> + return acpi_irq_pci_sharing_penalty(irq);
> }
>
> int __init acpi_irq_penalty_init(void)
> @@ -886,6 +869,17 @@ bool acpi_isa_irq_available(int irq)
> acpi_irq_get_penalty(irq) < PIRQ_PENALTY_ISA_ALWAYS);
> }
>
> +void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
> +{
> + if (irq >= 0 && irq < ARRAY_SIZE(acpi_irq_penalty)) {
> + if (trigger != ACPI_MADT_TRIGGER_LEVEL ||
> + polarity != ACPI_MADT_POLARITY_ACTIVE_LOW)
> + acpi_irq_penalty[irq] += PIRQ_PENALTY_ISA_ALWAYS;
> + else
> + acpi_irq_penalty[irq] += PIRQ_PENALTY_PCI_USING;
This would be a lot easier to read if the trigger/polarity tests were
positive, e.g.,
if (trigger == ACPI_MADT_TRIGGER_LEVEL &&
polarity == ACPI_MADT_POLARITY_ACTIVE_LOW)
acpi_irq_penalty[irq] += PIRQ_PENALTY_PCI_USING;
else
acpi_irq_penalty[irq] += PIRQ_PENALTY_ISA_ALWAYS;
> + }
> +}
> +
> /*
> * Over-ride default table to reserve additional IRQs for use by ISA
> * e.g. acpi_irq_isa=5
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 4d8452c..85ac7d5 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -318,6 +318,7 @@ struct pci_dev;
> int acpi_pci_irq_enable (struct pci_dev *dev);
> void acpi_penalize_isa_irq(int irq, int active);
> bool acpi_isa_irq_available(int irq);
> +void acpi_penalize_sci_irq(int irq, int trigger, int polarity);
> void acpi_pci_irq_disable (struct pci_dev *dev);
>
> extern int ec_read(u8 addr, u8 *val);
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH v2] perf: xgene: Remove bogus IS_ERR() check
From: Al Viro @ 2016-10-13 18:18 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1476382156-11641-1-git-send-email-ttnguyen@apm.com>
On Thu, Oct 13, 2016 at 11:09:16AM -0700, Tai Nguyen wrote:
> In acpi_get_pmu_hw_inf we pass the address of a local variable to IS_ERR(),
> which doesn't make sense, as the pointer must be a real, valid pointer.
> This doesn't cause a functional problem, as IS_ERR() will evaluate as
> false, but the check is bogus and causes static checkers to complain.
... unless the test is actually a misspelled IS_ERR(res) and the current
code is broken by effectively skipping it.
^ permalink raw reply
* [PATCH V2 1/3] Revert "ACPI,PCI,IRQ: reduce static IRQ array size to 16"
From: Bjorn Helgaas @ 2016-10-13 18:15 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <09f93442-89c8-7b2c-467d-d29b857739ff@codeaurora.org>
On Wed, Oct 12, 2016 at 06:46:11PM -0400, Sinan Kaya wrote:
> Hi Bjorn,
>
> On 10/12/2016 6:13 PM, Bjorn Helgaas wrote:
> > Hi Sinan,
> >
> > I have to apologize because I haven't followed all the discussion and
> > now I'm trying to figure it out from the patches and changelogs. But
> > I guess that's not all bad, because future interested folks *should*
> > be able to figure things out from that :)
>
> Sure, np. I figured you are busy with the new baseline. Then, I saw a
> series of patches coming from you.
>
> >
> > On Tue, Oct 04, 2016 at 05:15:17PM -0400, Sinan Kaya wrote:
> >> This reverts commit 5c5087a55390 ("ACPI,PCI,IRQ: reduce static IRQ array
> >> size to 16").
> >>
> >> The code maintains a fixed size array for IRQ penalties. The array
> >> gets updated by external calls such as acpi_penalize_sci_irq,
> >> acpi_penalize_isa_irq to reflect the actual interrupt usage of the
> >> system. Since the IRQ distribution is platform specific, this is
> >> not known ahead of time. The IRQs get updated based on the SCI
> >> interrupt number BIOS has chosen or the ISA IRQs that were assigned
> >> to existing peripherals.
> >>
> >> By the time ACPI gets initialized, this code tries to determine an
> >> IRQ number based on penalty values in this array. It will try to locate
> >> the IRQ with the least penalty assignment so that interrupt sharing is
> >> avoided if possible.
> >>
> >> A couple of notes about the external APIs:
> >> 1. These API can be called before the ACPI is started. Therefore, one
> >> cannot assume that the PCI link objects are initialized for calculating
> >> penalties.
> >
> > Which API are you thinking about here? pcibios_penalize_isa_irq() is
> > called by ACPI and PNP, which should both be after ACPI is started.
>
> Correct, I was talking about acpi_penalize_sci_irq function here.
>
> >
> > My guess is you're thinking about acpi_penalize_sci_irq() (added back
> > later in this series), which is called here, which is definitely
> > before ACPI objects are available:
> >
> > setup_arch
> > acpi_boot_init
> > acpi_process_madt
> > acpi_parse_madt_ioapic_entries
> > acpi_table_parse_madt
> > acpi_parse_int_src_ovr
> > acpi_sci_ioapic_setup
> > acpi_penalize_sci_irq # <---
> >
> >> 2. The polarity and trigger information passed via the
> >> acpi_penalize_sci_irq from the BIOS may not match what the IRQ subsystem
> >> is reporting as the call might have been placed before the IRQ is
> >> registered by the interrupt subsystem.
> >>
> >> The previous change was in the direction to remove these external API and
> >> try to calculate the penalties at runtime for the ISA path as well. This
> >> didn't work out well with the existing platforms.
> >>
> >> Restoring the old behavior for IRQ < 256 and the new behavior will remain
> >> effective for IRQ >= 256.
> >
> > IIRC, this all started because we needed more than 256 IRQs, but we
> > didn't know how to size a static table to be large enough without
> > being wasteful.
>
> Correct. We only need 1024 for ARM/ARM64. But, we wanted to remove this
> restriction altogether to be arch proof. One of my earlier proposal was
> to just resize the array to 1024. I was asked if I was wasting resources
> by resizing to 1024.
>
> >
> > Prior to 5c5087a55390, we tracked penalties for IRQs 0-255. After it,
> > we only tracked penalties for IRQs 0-15. I think this patch basically
> > makes it so we track 0-255 again.
>
> Yes, we went back to 256 interrupts after the revert.
>
> >
> > *This* patch only increases the range for pcibios_penalize_isa_irq()
> > (and command-line hints, but hopefully nobody cares about those). A
> > subsequent patch increases it for SCI as well.
> >
> > The name "ACPI_MAX_IRQS" is now slightly misleading (because we do
> > support more than 256 IRQs) and the 256 value is sort of an
> > unjustified magic number. 16 is explainable as the number of ISA
> > IRQs, but I don't know what 256 is based on (other than historical
> > practice, of course). ACPI device IRQs can be much larger, and I
> > think the SCI IRQ can be, too (the FADT SCI_INT field is 16 bits).
> >
> > Can you tie this back to the specific problem on the broken machine
> > somehow? Do we need a penalty for an IRQ in the 16-255 range?
>
> The problem on the broken machine was SCI IRQ and PCI IRQ happened to be
> same. It was IRQ 11. When SCI IRQ heavily penalized IRQ 11 due to
> wrong interrupt type detection, PCI IRQs no longer worked as this line
> prohibits using the IRQ.
>
>
> if (acpi_irq_get_penalty(irq) >= PIRQ_PENALTY_ISA_ALWAYS) {
> printk(KERN_ERR PREFIX "No IRQ available for %s [%s]. "
> "Try pci=noacpi or acpi=off\n",
> acpi_device_name(link->device),
> acpi_device_bid(link->device));
> return -ENODEV;
> }
It seems like the problem is that we removed acpi_penalize_sci_irq(),
which told us the polarity and trigger mode. We tried to get that
information via irq_get_trigger_type(), but that didn't work in this
case because we use the acpi_irq_get_penalty() path before the SCI is
registered.
It makes sense to me to add acpi_penalize_sci_irq() back in, which is
what patch [3/3] does.
I don't understand how *this* patch, which basically just increases
the penalty array size from 16 to 256, helps fix the problem. It
seems like this patch should only matter if the SCI were some IRQ
between 16 and 255.
> > In a subsequent patch, I see something about the IRQ type not being
> > updated at the right time, but I can't quite connect the dots.
>
> The reason why PCI IRQ 11 didn't work is above.
>
> When we detected a problem with the SCI IRQ type, we were penalizing
> the IRQ below.
>
> static int acpi_irq_get_penalty(int irq)
> {
> ...
> if (irq == acpi_gbl_FADT.sci_interrupt) {
> u32 type = irq_get_trigger_type(irq) & IRQ_TYPE_SENSE_MASK;
>
> if (type != IRQ_TYPE_LEVEL_LOW)
> penalty += PIRQ_PENALTY_ISA_ALWAYS; <---- here
> else
> penalty += PIRQ_PENALTY_PCI_USING;
> }
>
>
>
> >
> > To be clear, I'm not asking for any changes in the patch; I'm just
> > trying to understand what's going on.
>
> Sure, I hope this makes it clear now.
>
> >
> >> Tested-by: Jonathan Liu <net147@gmail.com>
> >> Tested-by: Ondrej Zary <linux@rainbow-software.org>
> >> Link: http://www.gossamer-threads.com/lists/linux/kernel/2537016#2537016
> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >> ---
> >> drivers/acpi/pci_link.c | 35 ++++++++++++++++++-----------------
> >> 1 file changed, 18 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
> >> index c983bf7..f3792f4 100644
> >> --- a/drivers/acpi/pci_link.c
> >> +++ b/drivers/acpi/pci_link.c
> >> @@ -438,6 +438,7 @@ static int acpi_pci_link_set(struct acpi_pci_link *link, int irq)
> >> * enabled system.
> >> */
> >>
> >> +#define ACPI_MAX_IRQS 256
> >> #define ACPI_MAX_ISA_IRQS 16
> >>
> >> #define PIRQ_PENALTY_PCI_POSSIBLE (16*16)
> >> @@ -446,7 +447,7 @@ static int acpi_pci_link_set(struct acpi_pci_link *link, int irq)
> >> #define PIRQ_PENALTY_ISA_USED (16*16*16*16*16)
> >> #define PIRQ_PENALTY_ISA_ALWAYS (16*16*16*16*16*16)
> >>
> >> -static int acpi_isa_irq_penalty[ACPI_MAX_ISA_IRQS] = {
> >> +static int acpi_irq_penalty[ACPI_MAX_IRQS] = {
> >> PIRQ_PENALTY_ISA_ALWAYS, /* IRQ0 timer */
> >> PIRQ_PENALTY_ISA_ALWAYS, /* IRQ1 keyboard */
> >> PIRQ_PENALTY_ISA_ALWAYS, /* IRQ2 cascade */
> >> @@ -511,7 +512,7 @@ static int acpi_irq_get_penalty(int irq)
> >> }
> >>
> >> if (irq < ACPI_MAX_ISA_IRQS)
> >> - return penalty + acpi_isa_irq_penalty[irq];
> >> + return penalty + acpi_irq_penalty[irq];
> >>
> >> penalty += acpi_irq_pci_sharing_penalty(irq);
> >> return penalty;
> >> @@ -538,14 +539,14 @@ int __init acpi_irq_penalty_init(void)
> >>
> >> for (i = 0; i < link->irq.possible_count; i++) {
> >> if (link->irq.possible[i] < ACPI_MAX_ISA_IRQS)
> >> - acpi_isa_irq_penalty[link->irq.
> >> + acpi_irq_penalty[link->irq.
> >> possible[i]] +=
> >> penalty;
> >> }
> >>
> >> } else if (link->irq.active &&
> >> - (link->irq.active < ACPI_MAX_ISA_IRQS)) {
> >> - acpi_isa_irq_penalty[link->irq.active] +=
> >> + (link->irq.active < ACPI_MAX_IRQS)) {
> >> + acpi_irq_penalty[link->irq.active] +=
> >> PIRQ_PENALTY_PCI_POSSIBLE;
> >> }
> >> }
> >> @@ -828,7 +829,7 @@ static void acpi_pci_link_remove(struct acpi_device *device)
> >> }
> >>
> >> /*
> >> - * modify acpi_isa_irq_penalty[] from cmdline
> >> + * modify acpi_irq_penalty[] from cmdline
> >> */
> >> static int __init acpi_irq_penalty_update(char *str, int used)
> >> {
> >> @@ -837,24 +838,24 @@ static int __init acpi_irq_penalty_update(char *str, int used)
> >> for (i = 0; i < 16; i++) {
> >> int retval;
> >> int irq;
> >> - int new_penalty;
> >>
> >> retval = get_option(&str, &irq);
> >>
> >> if (!retval)
> >> break; /* no number found */
> >>
> >> - /* see if this is a ISA IRQ */
> >> - if ((irq < 0) || (irq >= ACPI_MAX_ISA_IRQS))
> >> + if (irq < 0)
> >> + continue;
> >> +
> >> + if (irq >= ARRAY_SIZE(acpi_irq_penalty))
> >> continue;
> >>
> >> if (used)
> >> - new_penalty = acpi_irq_get_penalty(irq) +
> >> - PIRQ_PENALTY_ISA_USED;
> >> + acpi_irq_penalty[irq] = acpi_irq_get_penalty(irq) +
> >> + PIRQ_PENALTY_ISA_USED;
> >> else
> >> - new_penalty = 0;
> >> + acpi_irq_penalty[irq] = 0;
> >>
> >> - acpi_isa_irq_penalty[irq] = new_penalty;
> >> if (retval != 2) /* no next number */
> >> break;
> >> }
> >> @@ -870,14 +871,14 @@ static int __init acpi_irq_penalty_update(char *str, int used)
> >> */
> >> void acpi_penalize_isa_irq(int irq, int active)
> >> {
> >> - if ((irq >= 0) && (irq < ARRAY_SIZE(acpi_isa_irq_penalty)))
> >> - acpi_isa_irq_penalty[irq] = acpi_irq_get_penalty(irq) +
> >> - (active ? PIRQ_PENALTY_ISA_USED : PIRQ_PENALTY_PCI_USING);
> >> + if (irq >= 0 && irq < ARRAY_SIZE(acpi_irq_penalty))
> >> + acpi_irq_penalty[irq] = acpi_irq_get_penalty(irq) +
> >> + (active ? PIRQ_PENALTY_ISA_USED : PIRQ_PENALTY_PCI_USING);
> >> }
> >>
> >> bool acpi_isa_irq_available(int irq)
> >> {
> >> - return irq >= 0 && (irq >= ARRAY_SIZE(acpi_isa_irq_penalty) ||
> >> + return irq >= 0 && (irq >= ARRAY_SIZE(acpi_irq_penalty) ||
> >> acpi_irq_get_penalty(irq) < PIRQ_PENALTY_ISA_ALWAYS);
> >> }
> >>
> >> --
> >> 1.9.1
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> >> the body of a message to majordomo at vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
>
> --
> Sinan Kaya
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH v2] perf: xgene: Remove bogus IS_ERR() check
From: Tai Nguyen @ 2016-10-13 18:09 UTC (permalink / raw)
To: linux-arm-kernel
In acpi_get_pmu_hw_inf we pass the address of a local variable to IS_ERR(),
which doesn't make sense, as the pointer must be a real, valid pointer.
This doesn't cause a functional problem, as IS_ERR() will evaluate as
false, but the check is bogus and causes static checkers to complain.
Remove the bogus check.
The bug is reported by Dan Carpenter <dan.carpenter@oracle.com> in [1]
[1] https://www.spinics.net/lists/arm-kernel/msg535957.html
Signed-off-by: Tai Nguyen <ttnguyen@apm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
v2:
Add more problem description in the commit message
Add Acked-by: Mark Rutland <mark.rutland@arm.com>
drivers/perf/xgene_pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index c2ac764..a8ac4bc 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1011,7 +1011,7 @@ xgene_pmu_dev_ctx *acpi_get_pmu_hw_inf(struct xgene_pmu *xgene_pmu,
rc = acpi_dev_get_resources(adev, &resource_list,
acpi_pmu_dev_add_resource, &res);
acpi_dev_free_resource_list(&resource_list);
- if (rc < 0 || IS_ERR(&res)) {
+ if (rc < 0) {
dev_err(dev, "PMU type %d: No resource address found\n", type);
goto err;
}
--
1.9.1
^ permalink raw reply related
* [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate
From: Catalin Marinas @ 2016-10-13 17:27 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <CAKv+Gu9WqUvtHRJvgWhsvYwFOgho97EaLV4qojaa0fjGpJT3WQ@mail.gmail.com>
On Thu, Oct 13, 2016 at 05:57:33PM +0100, Ard Biesheuvel wrote:
> On 13 October 2016 at 17:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Oct 12, 2016 at 12:23:43PM +0100, Ard Biesheuvel wrote:
> >> Now that we no longer allow live kernel PMDs to be split, it is safe to
> >> start using the contiguous bit for kernel mappings. So set the contiguous
> >> bit in the kernel page mappings for regions whose size and alignment are
> >> suitable for this.
> >>
> >> This enables the following contiguous range sizes for the virtual mapping
> >> of the kernel image, and for the linear mapping:
> >>
> >> granule size | cont PTE | cont PMD |
> >> -------------+------------+------------+
> >> 4 KB | 64 KB | 32 MB |
> >> 16 KB | 2 MB | 1 GB* |
> >> 64 KB | 2 MB | 16 GB* |
> >>
> >> * only when built for 3 or more levels of translation
> >
> > I assume the limitation to have contiguous PMD only with 3 or move
> > levels is because of the way p*d folding was implemented in the kernel.
> > With nopmd, looping over pmds is done in __create_pgd_mapping() rather
> > than alloc_init_pmd().
> >
> > A potential solution would be to replicate the contiguous pmd code to
> > the pud and pgd level, though we probably won't benefit from any
> > contiguous entries at higher level (when more than 2 levels).
> > Alternatively, with an #ifdef __PGTABLE_PMD_FOLDED, we could set the
> > PMD_CONT in prot in __create_pgd_mapping() directly (if the right
> > addr/phys alignment).
>
> Indeed. See the next patch :-)
I got there eventually ;).
> > Anyway, it's probably not worth the effort given that 42-bit VA with 64K
> > pages is becoming a less likely configuration (36-bit VA with 16K pages
> > is even less likely, also depending on EXPERT).
>
> This is the reason I put it in a separate patch: this one contains the
> most useful combinations, and the next patch adds the missing ones,
> but clutters up the code significantly. I'm perfectly happy to drop 4
> and 5 if you don't think it is worth the trouble.
I'll have a look at patch 4 first.
Both 64KB contiguous pmd and 4K contiguous pud give us a 16GB range
which (AFAIK) is less likely to be optimised in hardware.
--
Catalin
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox