* Re: [PATCH v9 0/6] add support for relative references in special sections
From: Ard Biesheuvel @ 2018-07-01 17:39 UTC
To: Thomas Gleixner, the arch/x86 maintainers, Ingo Molnar
Cc: Linux Kernel Mailing List, Arnd Bergmann, Kees Cook,
Michael Ellerman, Thomas Garnier, Serge E. Hallyn, Bjorn Helgaas,
Benjamin Herrenschmidt, Russell King, Paul Mackerras,
Catalin Marinas, Petr Mladek, James Morris, Andrew Morton,
Nicolas Pitre, Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev,
Will Deacon
In-Reply-To: <20180627151510.GE30631@arm.com>
On 27 June 2018 at 17:15, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> On Tue, Jun 26, 2018 at 08:27:55PM +0200, Ard Biesheuvel wrote:
>> This adds support for emitting special sections such as initcall arrays,
>> PCI fixups and tracepoints as relative references rather than absolute
>> references. This reduces the size by 50% on 64-bit architectures, but
>> more importantly, it removes the need for carrying relocation metadata
>> for these sections in relocatable kernels (e.g., for KASLR) that needs
>> to be fixed up at boot time. On arm64, this reduces the vmlinux footprint
>> of such a reference by 8x (8 byte absolute reference + 24 byte RELA entry
>> vs 4 byte relative reference)
>>
>> Patch #3 was sent out before as a single patch. This series supersedes
>> the previous submission. This version makes relative ksymtab entries
>> dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
>> than trying to infer from kbuild test robot replies for which architectures
>> it should be blacklisted.
>>
>> Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
>> and sets it for the main architectures that are expected to benefit the
>> most from this feature, i.e., 64-bit architectures or ones that use
>> runtime relocations.
>>
>> Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
>> ksymtab/kcrctab sections in decompressor and EFI stub objects when
>> rebuilding existing C files to run in a different context.
>
> I had a small question on patch 3, but it's really for my understanding.
> So, for patches 1-3:
>
> Reviewed-by: Will Deacon <will.deacon@arm.com>
>
Thanks all.
Thomas, Ingo,
Except for the below tweak against patch #3 for powerpc, which may
apparently get confused by an input section called .discard without
any suffixes, this series is good to go, but requires your ack to
proceed, so I would like to ask you to share your comments and/or
objections. Also, any suggestions or recommendations regarding the
route these patches should take are highly appreciated.
Ard.
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 2d9c63f41031..61c844d4ab48 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -287,7 +287,7 @@ unsigned long read_word_at_a_time(const void *addr)
* visible to the compiler.
*/
#define __ADDRESSABLE(sym) \
- static void * __attribute__((section(".discard"), used)) \
+ static void * __attribute__((section(".discard.addressable"), used)) \
__PASTE(__addressable_##sym, __LINE__) = (void *)&sym;
/**
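To illustrate the size argument from the cover letter (8 byte absolute reference + 24 byte RELA entry vs 4 byte relative reference), here is a minimal userspace sketch. It is not the kernel implementation: the names rela64, abs_ref_footprint, rel_ref_footprint, rel_ref_set and rel_ref_get are hypothetical, and the offset is computed at run time here, whereas a real build would have the assembler and linker emit it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of an ELF64 RELA entry (offset, info, addend): 24 bytes. */
struct rela64 {
	uint64_t r_offset;
	uint64_t r_info;
	int64_t  r_addend;
};

/* Footprint of one absolute 64-bit reference plus its RELA entry. */
static size_t abs_ref_footprint(void)
{
	return sizeof(uint64_t) + sizeof(struct rela64);
}

/* Footprint of one place-relative reference: only the 32-bit offset. */
static size_t rel_ref_footprint(void)
{
	return sizeof(int32_t);
}

/* Store the signed distance from the slot itself to the target. */
static void rel_ref_set(int32_t *slot, const void *target)
{
	intptr_t delta = (intptr_t)((uintptr_t)target - (uintptr_t)slot);

	/* A 32-bit relative reference only works if the target is in range. */
	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*slot = (int32_t)delta;
}

/* Recover the target by adding the stored offset to the slot's address. */
static void *rel_ref_get(const int32_t *slot)
{
	return (void *)((uintptr_t)slot + (intptr_t)*slot);
}
```

Because the stored value is a distance rather than an address, it stays valid wherever the image is loaded, as long as the slot and its target move together. That is why no boot-time fixup is needed for such an entry.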
* [PATCH 4.17 057/220] powerpc/e500mc: Set assembler machine type to e500mc
From: Greg Kroah-Hartman @ 2018-07-01 16:21 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Michael Jeanson, Mathieu Desnoyers,
Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Kumar Gala, Vakul Garg, Scott Wood, linuxppc-dev
In-Reply-To: <20180701160908.272447118@linuxfoundation.org>
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Jeanson <mjeanson@efficios.com>
commit 69a8405999aa1c489de4b8d349468f0c2b83f093 upstream.
In binutils 2.26 a new opcode for the "wait" instruction was added for the
POWER9 and has precedence over the one specific to the e500mc. Commit
ebf714ff3756 ("powerpc/e500mc: Add support for the wait instruction in
e500_idle") uses this instruction specifically on the e500mc to work around
an erratum.
This results in an invalid instruction in idle_e500 when we build for the
e500mc on binutils >= 2.26 with the default assembler machine type.
Since multiplatform between e500 and non-e500 is not supported, set the
assembler machine type globally when CONFIG_PPC_E500MC=y.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Kumar Gala <galak@kernel.crashing.org>
CC: Vakul Garg <vakul.garg@nxp.com>
CC: Scott Wood <swood@redhat.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-kernel@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/powerpc/Makefile | 1 +
1 file changed, 1 insertion(+)
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -251,6 +251,7 @@ cpu-as-$(CONFIG_4xx) += -Wa,-m405
cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec)
cpu-as-$(CONFIG_E200) += -Wa,-me200
cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4
+cpu-as-$(CONFIG_PPC_E500MC) += $(call as-option,-Wa$(comma)-me500mc)
KBUILD_AFLAGS += $(cpu-as-y)
KBUILD_CFLAGS += $(cpu-as-y)
* Re: [PATCH 1/3] [v2] powerpc: mac: fix rtc read/write functions
From: Meelis Roos @ 2018-07-01 15:47 UTC
To: Mathieu Malaterre
Cc: Arnd Bergmann, Paul Mackerras, Michael Ellerman,
Geert Uytterhoeven, funaho, Benjamin Herrenschmidt, gerg,
linux-m68k, linuxppc-dev, LKML, y2038, schwab
In-Reply-To: <CA+7wUsyN_gru4rW0y3ZzXzYwhfaA=uHKaOG=z5bprnYq6oVfMg@mail.gmail.com>
A patch for the subject is now upstream. That made me finally take some
time to test it on my PowerMac G4. The date is OK but I get two warnings
with backtrace on bootup. Full dmesg below.
[ 0.000000] Total memory = 1024MB; using 2048kB for hash table (at (ptrval))
[ 0.000000] RAM mapped without BATs
[ 0.000000] Linux version 4.18.0-rc2-00223-g1904148a361a (mroos@pohl) (gcc version 7.3.0 (Debian 7.3.0-24)) #88 Sun Jul 1 01:39:01 EEST 2018
[ 0.000000] Found UniNorth memory controller & host bridge @ 0xf8000000 revision: 0x11
[ 0.000000] Mapped at 0xff7c0000
[ 0.000000] Found a Keylargo mac-io controller, rev: 3, mapped at 0x(ptrval)
[ 0.000000] Processor NAP mode on idle enabled.
[ 0.000000] PowerMac motherboard: PowerMac G4 Silver
[ 0.000000] Using PowerMac machine description
[ 0.000000] bootconsole [udbg0] enabled
[ 0.000000] -----------------------------------------------------
[ 0.000000] Hash_size = 0x200000
[ 0.000000] phys_mem_size = 0x40000000
[ 0.000000] dcache_bsize = 0x20
[ 0.000000] icache_bsize = 0x20
[ 0.000000] cpu_features = 0x000000000401a00a
[ 0.000000] possible = 0x000000002f7ff04b
[ 0.000000] always = 0x0000000000000000
[ 0.000000] cpu_user_features = 0x9c000001 0x00000000
[ 0.000000] mmu_features = 0x00000001
[ 0.000000] Hash = 0x(ptrval)
[ 0.000000] Hash_mask = 0x7fff
[ 0.000000] -----------------------------------------------------
[ 0.000000] Found UniNorth PCI host bridge at 0x00000000f0000000. Firmware bus number: 0->0
[ 0.000000] PCI host bridge /pci@f0000000 ranges:
[ 0.000000] MEM 0x00000000f1000000..0x00000000f1ffffff -> 0x00000000f1000000
[ 0.000000] IO 0x00000000f0000000..0x00000000f07fffff -> 0x0000000000000000
[ 0.000000] MEM 0x0000000090000000..0x000000009fffffff -> 0x0000000090000000
[ 0.000000] Found UniNorth PCI host bridge at 0x00000000f2000000. Firmware bus number: 0->0
[ 0.000000] PCI host bridge /pci@f2000000 (primary) ranges:
[ 0.000000] MEM 0x00000000f3000000..0x00000000f3ffffff -> 0x00000000f3000000
[ 0.000000] IO 0x00000000f2000000..0x00000000f27fffff -> 0x0000000000000000
[ 0.000000] MEM 0x0000000080000000..0x000000008fffffff -> 0x0000000080000000
[ 0.000000] Found UniNorth PCI host bridge at 0x00000000f4000000. Firmware bus number: 0->0
[ 0.000000] PCI host bridge /pci@f4000000 ranges:
[ 0.000000] MEM 0x00000000f5000000..0x00000000f5ffffff -> 0x00000000f5000000
[ 0.000000] IO 0x00000000f4000000..0x00000000f47fffff -> 0x0000000000000000
[ 0.000000] via-pmu: Server Mode is disabled
[ 0.000000] PMU driver v2 initialized for Core99, firmware: 0c
[ 0.000000] nvram: Checking bank 0...
[ 0.000000] nvram: gen0=134, gen1=135
[ 0.000000] nvram: Active bank is: 1
[ 0.000000] nvram: OF partition at 0x210
[ 0.000000] nvram: XP partition at 0x1220
[ 0.000000] nvram: NR partition at 0x1320
[ 0.000000] Top of RAM: 0x40000000, Total RAM: 0x40000000
[ 0.000000] Memory hole size: 0MB
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
[ 0.000000] Normal empty
[ 0.000000] HighMem [mem 0x0000000030000000-0x000000003fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000003fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
[ 0.000000] On node 0 totalpages: 262144
[ 0.000000] DMA zone: 1536 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 196608 pages, LIFO batch:31
[ 0.000000] HighMem zone: 65536 pages, LIFO batch:15
[ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 260608
[ 0.000000] Kernel command line: root=/dev/sda3 ro
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Memory: 1029696K/1048576K available (5136K kernel code, 228K rwdata, 996K rodata, 208K init, 255K bss, 18880K reserved, 0K cma-reserved, 262144K highmem)
[ 0.000000] Kernel virtual memory layout:
[ 0.000000] * 0xfffcf000..0xfffff000 : fixmap
[ 0.000000] * 0xff800000..0xffc00000 : highmem PTEs
[ 0.000000] * 0xfded8000..0xff800000 : early ioremap
[ 0.000000] * 0xf1000000..0xfded8000 : vmalloc & ioremap
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[ 0.000000] mpic: Resetting
[ 0.000000] mpic: Setting up MPIC " MPIC 1 " version 1.2 at 80040000, max 1 CPUs
[ 0.000000] mpic: ISU size: 64, shift: 6, mask: 3f
[ 0.000000] mpic: Initializing for 64 sources
[ 0.000000] GMT Delta read from XPRAM: 120 minutes, DST: on
[ 0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/platforms/powermac/time.c:154 pmu_get_time+0x7c/0xc8
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc2-00223-g1904148a361a #88
[ 0.000000] NIP: c0021354 LR: c0021308 CTR: 00000000
[ 0.000000] REGS: c0659df0 TRAP: 0700 Not tainted (4.18.0-rc2-00223-g1904148a361a)
[ 0.000000] MSR: 00021032 <ME,IR,DR,RI> CR: 48000222 XER: 20000000
[ 0.000000]
GPR00: c0021308 c0659ea0 c0635320 00000000 00d70298 00000004 00000001 00000098
GPR08: 00d70000 00000001 00000200 c06a0000 28000228 00000000 006cf290 006cf738
GPR16: ffbc1700 00000000 00000000 0076b855 006d6594 0013da24 009f0028 40140000
GPR24: 00000000 006ac000 efff44c0 c06293ec c065e020 c06293dc c066c4e4 c0659f78
[ 0.000000] NIP [c0021354] pmu_get_time+0x7c/0xc8
[ 0.000000] LR [c0021308] pmu_get_time+0x30/0xc8
[ 0.000000] Call Trace:
[ 0.000000] [c0659ea0] [c0021308] pmu_get_time+0x30/0xc8 (unreliable)
[ 0.000000] [c0659f10] [c000c428] read_persistent_clock64+0xc8/0x13c
[ 0.000000] [c0659f50] [c060e288] timekeeping_init+0x1c/0x24c
[ 0.000000] [c0659fc0] [c06020d8] start_kernel+0x2a0/0x38c
[ 0.000000] [c0659ff0] [00003444] 0x3444
[ 0.000000] Instruction dump:
[ 0.000000] 8941002e 5484c00e 5508801e 88e1002f 7c844214 554a402e 7c845214 7c843a14
[ 0.000000] 7d244810 7d294910 7d2948f8 552907fe <0f090000> 3d2083da 80010074 38210070
[ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x58/0x7c with crng_init=0
[ 0.000000] ---[ end trace 2e01ad9337fe08fa ]---
[ 0.000000] time_init: decrementer frequency = 33.290001 MHz
[ 0.000000] time_init: processor frequency = 533.333332 MHz
[ 0.000021] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x7ad7d595e, max_idle_ns: 440795202265 ns
[ 0.000697] clocksource: timebase mult[1e09ff2c] shift[24] registered
[ 0.001075] clockevent: decrementer mult[885b18a] shift[32] cpu[0]
[ 0.001234] Console: colour dummy device 80x25
[ 0.001558] console [tty0] enabled
[ 0.001855] bootconsole [udbg0] disabled
[ 0.002335] pid_max: default: 32768 minimum: 301
[ 0.002463] Security Framework initialized
[ 0.002586] AppArmor: AppArmor initialized
[ 0.002676] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.002701] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.004649] devtmpfs: initialized
[ 0.005064] Duplicate name in PowerPC,G4@0, renamed to "l2-cache#1"
[ 0.007902] random: get_random_u32 called from bucket_table_alloc+0x90/0x1dc with crng_init=0
[ 0.008118] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.008143] futex hash table entries: 256 (order: -1, 3072 bytes)
[ 0.008462] NET: Registered protocol family 16
[ 0.008890] audit: initializing netlink subsys (disabled)
[ 0.010008] KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000
[ 0.010033] channel 0 bus <multibus>
[ 0.010042] channel 1 bus <multibus>
[ 0.010154] KeyWest i2c @0x80018000 irq 26 /pci@f2000000/mac-io@17/i2c@18000
[ 0.010183] channel 0 bus <multibus>
[ 0.010228] PMU i2c /pci@f2000000/mac-io@17/via-pmu@16000
[ 0.010240] channel 1 bus <multibus>
[ 0.010249] channel 2 bus <multibus>
[ 0.010670] PCI: Probing PCI hardware
[ 0.010826] PCI host bridge to bus 0000:00
[ 0.010860] pci_bus 0000:00: root bus resource [io 0x802000-0x1001fff] (bus address [0x0000-0x7fffff])
[ 0.010886] pci_bus 0000:00: root bus resource [mem 0xf1000000-0xf1ffffff]
[ 0.010906] pci_bus 0000:00: root bus resource [mem 0x90000000-0x9fffffff]
[ 0.010928] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.010950] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to ff
[ 0.011456] pci 0000:00:0b.0: [106b:002d] type 00 class 0x060000
[ 0.011929] pci 0000:00:10.0: [1002:5046] type 00 class 0x030000
[ 0.011960] pci 0000:00:10.0: reg 0x10: [mem 0x94000000-0x97ffffff pref]
[ 0.011973] pci 0000:00:10.0: reg 0x14: [io 0x802400-0x8024ff]
[ 0.011986] pci 0000:00:10.0: reg 0x18: [mem 0x90000000-0x90003fff]
[ 0.012014] pci 0000:00:10.0: reg 0x30: [mem 0x90020000-0x9003ffff pref]
[ 0.012058] pci 0000:00:10.0: supports D1
[ 0.012948] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
[ 0.013076] PCI host bridge to bus 0001:10
[ 0.013105] pci_bus 0001:10: root bus resource [io 0x0000-0x7fffff]
[ 0.013125] pci_bus 0001:10: root bus resource [mem 0xf3000000-0xf3ffffff]
[ 0.013144] pci_bus 0001:10: root bus resource [mem 0x80000000-0x8fffffff]
[ 0.013193] pci_bus 0001:10: root bus resource [bus 10-ff]
[ 0.013214] pci_bus 0001:10: busn_res: [bus 10-ff] end is updated to ff
[ 0.013706] pci 0001:10:0b.0: [106b:002e] type 00 class 0x060000
[ 0.014424] pci 0001:10:17.0: [106b:0022] type 00 class 0xff0000
[ 0.014449] pci 0001:10:17.0: reg 0x10: [mem 0x80000000-0x8007ffff]
[ 0.014685] pci 0001:10:18.0: [106b:0019] type 00 class 0x0c0310
[ 0.014710] pci 0001:10:18.0: reg 0x10: [mem 0x80081000-0x80081fff]
[ 0.014944] pci 0001:10:19.0: [106b:0019] type 00 class 0x0c0310
[ 0.014970] pci 0001:10:19.0: reg 0x10: [mem 0x80080000-0x80080fff]
[ 0.015553] pci_bus 0001:10: busn_res: [bus 10-ff] end is updated to 10
[ 0.015671] PCI host bridge to bus 0002:20
[ 0.015702] pci_bus 0002:20: root bus resource [io 0xff7fe000-0xffffdfff] (bus address [0x0000-0x7fffff])
[ 0.015728] pci_bus 0002:20: root bus resource [mem 0xf5000000-0xf5ffffff]
[ 0.015749] pci_bus 0002:20: root bus resource [bus 20-ff]
[ 0.015768] pci_bus 0002:20: busn_res: [bus 20-ff] end is updated to ff
[ 0.016256] pci 0002:20:0b.0: [106b:002f] type 00 class 0x060000
[ 0.016581] pci 0002:20:0e.0: [11c1:5811] type 00 class 0x0c0010
[ 0.016606] pci 0002:20:0e.0: reg 0x10: [mem 0xf5000000-0xf5000fff]
[ 0.016658] pci 0002:20:0e.0: supports D1 D2
[ 0.016665] pci 0002:20:0e.0: PME# supported from D0 D1 D2 D3hot
[ 0.016875] pci 0002:20:0f.0: [106b:0021] type 00 class 0x020000
[ 0.016899] pci 0002:20:0f.0: reg 0x10: [mem 0xf5200000-0xf53fffff]
[ 0.016934] pci 0002:20:0f.0: reg 0x30: [mem 0xf5100000-0xf51fffff pref]
[ 0.017919] pci_bus 0002:20: busn_res: [bus 20-ff] end is updated to 20
[ 0.017991] PCI 0000:00 Cannot reserve Legacy IO [io 0x802000-0x802fff]
[ 0.018011] pci_bus 0000:00: resource 4 [io 0x802000-0x1001fff]
[ 0.018018] pci_bus 0000:00: resource 5 [mem 0xf1000000-0xf1ffffff]
[ 0.018025] pci_bus 0000:00: resource 6 [mem 0x90000000-0x9fffffff]
[ 0.018035] pci_bus 0001:10: resource 4 [io 0x0000-0x7fffff]
[ 0.018042] pci_bus 0001:10: resource 5 [mem 0xf3000000-0xf3ffffff]
[ 0.018049] pci_bus 0001:10: resource 6 [mem 0x80000000-0x8fffffff]
[ 0.018058] pci_bus 0002:20: resource 4 [io 0xff7fe000-0xffffdfff]
[ 0.018065] pci_bus 0002:20: resource 5 [mem 0xf5000000-0xf5ffffff]
[ 0.025369] audit: type=2000 audit(0.008:1): state=initialized audit_enabled=0 res=1
[ 0.034974] pci 0000:00:10.0: vgaarb: VGA device added: decodes=io+mem,owns=mem,locks=none
[ 0.035036] pci 0000:00:10.0: vgaarb: bridge control possible
[ 0.035055] pci 0000:00:10.0: vgaarb: setting as boot device (VGA legacy resources not available)
[ 0.035077] vgaarb: loaded
[ 0.035363] SCSI subsystem initialized
[ 0.035944] libata version 3.00 loaded.
[ 0.036069] usbcore: registered new interface driver usbfs
[ 0.036135] usbcore: registered new interface driver hub
[ 0.036205] usbcore: registered new device driver usb
[ 0.038128] clocksource: Switched to clocksource timebase
[ 0.038599] AppArmor: AppArmor Filesystem Enabled
[ 0.054259] NET: Registered protocol family 2
[ 0.054390] random: get_random_u32 called from neigh_hash_alloc+0x84/0xe0 with crng_init=0
[ 0.054948] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes)
[ 0.054995] TCP established hash table entries: 8192 (order: 3, 32768 bytes)
[ 0.055096] TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
[ 0.055183] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.055338] UDP hash table entries: 512 (order: 1, 8192 bytes)
[ 0.055387] UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
[ 0.055564] NET: Registered protocol family 1
[ 0.055606] NET: Registered protocol family 44
[ 0.055700] pci 0001:10:18.0: enabling device (0000 -> 0002)
[ 0.114214] pci 0001:10:18.0: quirk_usb_early_handoff+0x0/0x8cc took 57137 usecs
[ 0.114270] pci 0001:10:19.0: enabling device (0000 -> 0002)
[ 0.174156] pci 0001:10:19.0: quirk_usb_early_handoff+0x0/0x8cc took 58481 usecs
[ 0.174197] PCI: CLS mismatch (32 != 1020), using 32 bytes
[ 0.174448] Thermal assist unit
[ 0.174453] using timers,
[ 0.174473] shrink_timer: 500 jiffies
[ 0.175062] Initialise system trusted keyrings
[ 0.175711] workingset: timestamp_bits=30 max_order=18 bucket_order=0
[ 0.194025] Key type asymmetric registered
[ 0.194079] Asymmetric key parser 'x509' registered
[ 0.194248] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 0.194278] io scheduler noop registered
[ 0.194291] io scheduler deadline registered
[ 0.194481] io scheduler cfq registered (default)
[ 0.195115] aty128fb 0000:00:10.0: enabling device (0086 -> 0087)
[ 0.195588] aty128fb 0000:00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x1111
[ 0.195623] aty128fb: ROM failed to map
[ 0.195636] aty128fb: BIOS not located, guessing timings.
[ 0.195659] aty128fb: Rage128 PF PRO AGP [chip rev 0x1]
[ 0.195663] 16M 128-bit SDR SGRAM (1:1)
[ 0.212238] Console: switching to colour frame buffer device 128x48
[ 0.227164] fb0: ATY Rage128 frame buffer device on Rage128 PF PRO AGP
[ 0.227746] pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
[ 0.228033] Generic non-volatile memory driver v1.1
[ 0.233973] loop: module loaded
[ 0.234259] MacIO PCI driver attached to Keylargo chipset
[ 0.236238] 0.00013020:ch-a: ttyS0 at MMIO 0x80013020 (irq = 22, base_baud = 230400) is a Z85c30 ESCC - Serial port
[ 0.238389] 0.00013000:ch-b: ttyS1 at MMIO 0x80013000 (irq = 50, base_baud = 230400) is a Z85c30 ESCC - Serial port
[ 1.282145] pata-macio 0.0001f000:ata-4: Activating pata-macio chipset KeyLargo ATA-4, Apple bus ID 2
[ 1.283355] scsi host0: pata_macio
[ 1.283766] ata1: PATA max UDMA/66 irq 19
[ 1.445026] ata1.00: ATA-6: Maxtor 92049U3, BAC51JJ0, max UDMA/66
[ 1.445230] ata1.00: 40010544 sectors, multi 0: LBA
[ 1.450748] scsi 0:0:0:0: Direct-Access ATA Maxtor 92049U3 1JJ0 PQ: 0 ANSI: 5
[ 1.452076] sd 0:0:0:0: [sda] 40010544 512-byte logical blocks: (20.5 GB/19.1 GiB)
[ 1.452392] sd 0:0:0:0: [sda] Write Protect is off
[ 1.452557] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.452660] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.472546] sda: [mac] sda1 sda2 sda3 sda4
[ 1.474038] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.338145] pata-macio 0.00020000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 0
[ 2.344877] scsi host1: pata_macio
[ 2.350832] ata2: PATA max MWDMA2 irq 20
[ 2.554418] ata2.00: ATAPI: MATSHITA CD-RW CW-7586, 1A17, max MWDMA2
[ 2.560218] ata2.01: ATAPI: IOMEGA ZIP 250 ATAPI, 41.S, max UDMA/33, CDB intr
[ 2.618851] scsi 1:0:0:0: CD-ROM MATSHITA CD-RW CW-7586 1A17 PQ: 0 ANSI: 5
[ 2.630676] scsi 1:0:1:0: Direct-Access IOMEGA ZIP 250 41.S PQ: 0 ANSI: 5
[ 2.671193] sd 1:0:1:0: [sdb] Attached SCSI removable disk
[ 3.394143] pata-macio 0.00021000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 1
[ 3.401263] scsi host2: pata_macio
[ 3.407591] ata3: PATA max MWDMA2 irq 21
[ 3.413826] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 3.420094] ohci-pci: OHCI PCI platform driver
[ 3.426292] ohci-pci 0001:10:18.0: OHCI PCI host controller
[ 3.432457] ohci-pci 0001:10:18.0: new USB bus registered, assigned bus number 1
[ 3.438790] ohci-pci 0001:10:18.0: irq 27, io mem 0x80081000
[ 3.522135] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 4.18
[ 3.528544] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 3.534935] usb usb1: Product: OHCI PCI host controller
[ 3.541239] usb usb1: Manufacturer: Linux 4.18.0-rc2-00223-g1904148a361a ohci_hcd
[ 3.547669] usb usb1: SerialNumber: 0001:10:18.0
[ 3.554601] hub 1-0:1.0: USB hub found
[ 3.561194] hub 1-0:1.0: 2 ports detected
[ 3.568015] ohci-pci 0001:10:19.0: OHCI PCI host controller
[ 3.574418] ohci-pci 0001:10:19.0: new USB bus registered, assigned bus number 2
[ 3.580866] ohci-pci 0001:10:19.0: irq 28, io mem 0x80080000
[ 3.666142] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 4.18
[ 3.672533] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 3.678911] usb usb2: Product: OHCI PCI host controller
[ 3.685256] usb usb2: Manufacturer: Linux 4.18.0-rc2-00223-g1904148a361a ohci_hcd
[ 3.691597] usb usb2: SerialNumber: 0001:10:19.0
[ 3.698572] hub 2-0:1.0: USB hub found
[ 3.704984] hub 2-0:1.0: 2 ports detected
[ 3.711898] mousedev: PS/2 mouse device common for all mice
[ 3.718339] WARNING: CPU: 0 PID: 1 at arch/powerpc/platforms/powermac/time.c:154 pmu_get_time+0x7c/0xc8
[ 3.724661] Modules linked in:
[ 3.730901] CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.18.0-rc2-00223-g1904148a361a #88
[ 3.737332] NIP: c0021354 LR: c0021308 CTR: 00000000
[ 3.743649] REGS: ef047b50 TRAP: 0700 Tainted: G W (4.18.0-rc2-00223-g1904148a361a)
[ 3.750136] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 44000822 XER: 20000000
[ 3.756639]
GPR00: c0021308 ef047c00 ef048000 00000000 00d7029c 00000004 00000001 0000009c
GPR08: 00d70000 00000001 00000200 c06a0000 24000828 00000000 c0004c9c 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0670000 c0601a38
GPR24: 00000007 00000000 ef0ccf40 00000000 ef047cec ef273800 ef047cec ef047cec
[ 3.788482] NIP [c0021354] pmu_get_time+0x7c/0xc8
[ 3.794736] LR [c0021308] pmu_get_time+0x30/0xc8
[ 3.800944] Call Trace:
[ 3.807004] [ef047c00] [c0021308] pmu_get_time+0x30/0xc8 (unreliable)
[ 3.813217] [ef047c70] [c00213e8] pmac_get_rtc_time+0x28/0x40
[ 3.819357] [ef047c80] [c000bc04] rtc_generic_get_time+0x20/0x34
[ 3.825395] [ef047c90] [c03aca34] __rtc_read_time+0x5c/0xe0
[ 3.831290] [ef047ca0] [c03acafc] rtc_read_time+0x44/0x7c
[ 3.837184] [ef047cc0] [c03ad81c] __rtc_read_alarm+0x28/0x440
[ 3.843035] [ef047d30] [c03ac048] rtc_device_register+0x84/0x1b8
[ 3.848869] [ef047d80] [c03ac1e4] devm_rtc_device_register+0x68/0xe8
[ 3.854718] [ef047da0] [c061e1c8] generic_rtc_probe+0x2c/0x54
[ 3.860538] [ef047db0] [c03209b4] platform_drv_probe+0x4c/0xb8
[ 3.866355] [ef047dd0] [c031ed78] driver_probe_device+0x25c/0x35c
[ 3.872071] [ef047e00] [c031ef8c] __driver_attach+0x114/0x118
[ 3.877738] [ef047e20] [c031c960] bus_for_each_dev+0x80/0xc0
[ 3.883334] [ef047e50] [c031df18] bus_add_driver+0x144/0x258
[ 3.888857] [ef047e70] [c031f81c] driver_register+0x88/0x15c
[ 3.894317] [ef047e80] [c03210e0] __platform_driver_probe+0x84/0x13c
[ 3.899820] [ef047ea0] [c0004aa4] do_one_initcall+0x4c/0x1a8
[ 3.905364] [ef047f00] [c06022f0] kernel_init_freeable+0x12c/0x1f4
[ 3.910893] [ef047f30] [c0004cb4] kernel_init+0x18/0x130
[ 3.916367] [ef047f40] [c00121c4] ret_from_kernel_thread+0x14/0x1c
[ 3.921842] Instruction dump:
[ 3.927168] 8941002e 5484c00e 5508801e 88e1002f 7c844214 554a402e 7c845214 7c843a14
[ 3.932672] 7d244810 7d294910 7d2948f8 552907fe <0f090000> 3d2083da 80010074 38210070
[ 3.938214] ---[ end trace 2e01ad9337fe08fb ]---
[ 3.944429] rtc-generic rtc-generic: rtc core: registered rtc-generic as rtc0
[ 3.950813] NET: Registered protocol family 10
[ 3.956508] _warn_unseeded_randomness: 3 callbacks suppressed
[ 3.956530] random: get_random_u32 called from neigh_hash_alloc+0x84/0xe0 with crng_init=0
[ 3.969365] random: get_random_u32 called from bucket_table_alloc+0x90/0x1dc with crng_init=0
[ 3.975248] Segment Routing with IPv6
[ 3.980920] NET: Registered protocol family 17
[ 3.986625] drmem: No dynamic reconfiguration memory found
[ 3.992809] Loading compiled-in X.509 certificates
[ 3.998321] AppArmor: AppArmor sha1 policy hashing enabled
[ 4.003695] random: get_random_bytes called from prandom_seed_full_state+0x20/0x8c with crng_init=0
[ 4.009807] input: PMU as /devices/virtual/input/input0
[ 4.015320] console [netcon0] enabled
[ 4.020771] netconsole: network logging started
[ 4.026490] WARNING: CPU: 0 PID: 1 at arch/powerpc/platforms/powermac/time.c:154 pmu_get_time+0x7c/0xc8
[ 4.032261] Modules linked in:
[ 4.037878] CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.18.0-rc2-00223-g1904148a361a #88
[ 4.043750] NIP: c0021354 LR: c0021308 CTR: 00000000
[ 4.049585] REGS: ef047cd0 TRAP: 0700 Tainted: G W (4.18.0-rc2-00223-g1904148a361a)
[ 4.055572] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 44000222 XER: 20000000
[ 4.061620]
GPR00: c0021308 ef047d80 ef048000 00000000 00d7029c 00000004 00000001 0000009c
GPR08: 00d70000 00000001 00000200 c06a0000 24000228 00000000 c0004c9c 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0670000 c0601a38
GPR24: 00000008 c0630f18 c062a40c c05fc10c ef047e50 ef273800 ef047e50 ef047e50
[ 4.092393] NIP [c0021354] pmu_get_time+0x7c/0xc8
[ 4.098596] LR [c0021308] pmu_get_time+0x30/0xc8
[ 4.104779] Call Trace:
[ 4.110909] [ef047d80] [c0021308] pmu_get_time+0x30/0xc8 (unreliable)
[ 4.117209] [ef047df0] [c00213e8] pmac_get_rtc_time+0x28/0x40
[ 4.123470] [ef047e00] [c000bc04] rtc_generic_get_time+0x20/0x34
[ 4.129770] [ef047e10] [c03aca34] __rtc_read_time+0x5c/0xe0
[ 4.136060] [ef047e20] [c03acafc] rtc_read_time+0x44/0x7c
[ 4.142356] [ef047e40] [c061e000] rtc_hctosys+0x64/0x11c
[ 4.148616] [ef047ea0] [c0004aa4] do_one_initcall+0x4c/0x1a8
[ 4.154866] [ef047f00] [c06022f0] kernel_init_freeable+0x12c/0x1f4
[ 4.161123] [ef047f30] [c0004cb4] kernel_init+0x18/0x130
[ 4.167359] [ef047f40] [c00121c4] ret_from_kernel_thread+0x14/0x1c
[ 4.173610] Instruction dump:
[ 4.179766] 8941002e 5484c00e 5508801e 88e1002f 7c844214 554a402e 7c845214 7c843a14
[ 4.186076] 7d244810 7d294910 7d2948f8 552907fe <0f090000> 3d2083da 80010074 38210070
[ 4.192388] ---[ end trace 2e01ad9337fe08fd ]---
[ 4.198643] rtc-generic rtc-generic: hctosys: unable to read the hardware clock
[ 4.209046] EXT4-fs (sda3): mounting ext3 file system using the ext4 subsystem
[ 4.236629] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
[ 4.243073] VFS: Mounted root (ext3 filesystem) readonly on device 8:3.
[ 4.260646] devtmpfs: mounted
[ 4.267211] Freeing unused kernel memory: 208K
[ 4.398178] usb 2-1: new full-speed USB device number 2 using ohci-pci
[ 4.629315] usb 2-1: New USB device found, idVendor=05ac, idProduct=1001, bcdDevice= 2.10
[ 4.635736] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 4.642038] usb 2-1: Product: Hub in Apple USB Keyboard
[ 4.648301] usb 2-1: Manufacturer: Alps Electric
[ 4.656332] hub 2-1:1.0: USB hub found
[ 4.663208] hub 2-1:1.0: 3 ports detected
[ 4.974155] usb 2-1.1: new low-speed USB device number 3 using ohci-pci
[ 5.104225] usb 2-1.1: New USB device found, idVendor=05ac, idProduct=0201, bcdDevice= 1.02
[ 5.110383] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 5.116265] usb 2-1.1: Product: Apple USB Keyboard
[ 5.122014] usb 2-1.1: Manufacturer: Alps Electric
[ 5.210159] usb 2-1.2: new low-speed USB device number 4 using ohci-pci
[ 5.242205] random: fast init done
[ 5.338230] usb 2-1.2: New USB device found, idVendor=05ac, idProduct=0301, bcdDevice= 5.02
[ 5.344013] usb 2-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 5.349638] usb 2-1.2: Product: M4848
[ 5.355197] usb 2-1.2: Manufacturer: Logitech
[ 7.600059] systemd[1]: System time before build time, advancing clock.
[ 7.952523] systemd[1]: Failed to insert module 'autofs4': No such file or directory
[ 8.148253] systemd[1]: systemd 239 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[ 8.178244] systemd[1]: Detected architecture ppc.
[ 8.297863] systemd[1]: Set hostname to <pohl>.
[ 8.319596] _warn_unseeded_randomness: 7 callbacks suppressed
[ 8.319629] random: get_random_u32 called from bucket_table_alloc+0x90/0x1dc with crng_init=1
[ 8.456375] random: get_random_u32 called from arch_pick_mmap_layout+0xe8/0x160 with crng_init=1
[ 8.463131] random: get_random_u32 called from load_elf_binary+0x768/0x1214 with crng_init=1
[ 9.348994] _warn_unseeded_randomness: 67 callbacks suppressed
[ 9.349025] random: get_random_u32 called from arch_align_stack+0x44/0x64 with crng_init=1
[ 9.362283] random: get_random_u32 called from arch_randomize_brk+0x20/0x78 with crng_init=1
[ 9.893854] random: get_random_u32 called from arch_pick_mmap_layout+0xe8/0x160 with crng_init=1
[ 9.962756] random: crng init done
[ 9.969134] random: 5 get_random_xx warning(s) missed due to ratelimiting
[ 13.408566] systemd[1]: Listening on Syslog Socket.
[ 13.422964] systemd[1]: Listening on Journal Socket.
[ 13.436231] systemd[1]: Starting of Arbitrary Executable File Formats File System Automount Point not supported.
[ 13.525715] systemd[1]: Listening on udev Kernel Socket.
[ 13.540122] systemd[1]: Reached target Remote File Systems.
[ 13.554279] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[ 13.569207] systemd[1]: Listening on udev Control Socket.
[ 13.587771] systemd[1]: Starting udev Coldplug all Devices...
[ 13.606824] systemd[1]: Listening on Journal Audit Socket.
[ 13.626991] systemd[1]: Mounting Kernel Debug File System...
[ 15.424946] PowerMac i2c bus pmu 2 registered
[ 15.431134] PowerMac i2c bus pmu 1 registered
[ 15.436983] PowerMac i2c bus mac-io 0 registered
[ 15.442651] PowerMac i2c bus uni-n 1 registered
[ 15.448242] PowerMac i2c bus uni-n 0 registered
[ 15.879672] input: PowerMac Beep as /devices/pci0001:10/0001:10:17.0/input/input1
[ 16.008274] EXT4-fs (sda3): re-mounted. Opts: acl
[ 16.381033] systemd-journald[543]: Received request to flush runtime journal from PID 1
[ 19.281274] genirq: Flags mismatch irq 31. 00000001 (i2sbus: i2s-a (tx)) vs. 00000001 (PMac Output)
[ 19.422231] firewire_ohci 0002:20:0e.0: added OHCI v1.0 device as card 0, 8 IR + 8 IT contexts, quirks 0x0
[ 19.487853] Linux agpgart interface v0.103
[ 19.550217] sungem.c:v1.0 David S. Miller <davem@redhat.com>
[ 19.556595] gem 0002:20:0f.0 eth0: Sun GEM (PCI) 10/100/1000BaseT Ethernet 00:03:93:48:0e:fe
[ 19.566755] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 19.610255] scsi 1:0:0:0: Attached scsi generic sg1 type 5
[ 19.617423] agpgart-uninorth 0000:00:0b.0: Apple UniNorth 1.5 chipset
[ 19.642340] agpgart-uninorth 0000:00:0b.0: configuring for size idx: 64
[ 19.665663] sd 1:0:1:0: Attached scsi generic sg2 type 0
[ 19.698818] sr 1:0:0:0: [sr0] scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray
[ 19.704575] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 19.717067] agpgart-uninorth 0000:00:0b.0: AGP aperture is 256M @ 0x0
[ 19.762489] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 19.938493] firewire_core 0002:20:0e.0: created device fw0: GUID 000393fffe480efe, S400
[ 20.294101] hidraw: raw HID events driver (C) Jiri Kosina
[ 20.464381] usbcore: registered new interface driver usbhid
[ 20.470456] usbhid: USB HID core driver
[ 21.029892] input: Alps Electric Apple USB Keyboard as /devices/pci0001:10/0001:10:19.0/usb2/2-1/2-1.1/2-1.1:1.0/0003:05AC:0201.0001/input/input2
[ 21.103209] hid-generic 0003:05AC:0201.0001: input,hidraw0: USB HID v1.00 Keyboard [Alps Electric Apple USB Keyboard] on usb-0001:10:19.0-1.1/input0
[ 21.121248] input: Logitech M4848 as /devices/pci0001:10/0001:10:19.0/usb2/2-1/2-1.2/2-1.2:1.0/0003:05AC:0301.0002/input/input3
[ 21.145442] hid-generic 0003:05AC:0301.0002: input,hidraw1: USB HID v1.00 Mouse [Logitech M4848] on usb-0001:10:19.0-1.2/input0
[ 23.026805] Adding 848984k swap on /dev/sda4. Priority:-2 extents:1 across:848984k
[ 33.034566] sungem_phy: PHY ID: 206053, addr: 0
[ 33.555712] gem 0002:20:0f.0 eth0: Found BCM5401 PHY
[ 33.561204] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 36.002364] gem 0002:20:0f.0 eth0: Link is up at 100 Mbps, full-duplex
[ 36.007833] gem 0002:20:0f.0 eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
[ 36.013215] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 43.522088] aty128fb 0000:00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x1111
--
Meelis Roos (mroos@linux.ee)
* Patch "powerpc/e500mc: Set assembler machine type to e500mc" has been added to the 4.17-stable tree
From: gregkh @ 2018-07-01 10:09 UTC (permalink / raw)
To: benh, galak, gregkh, linuxppc-dev, mathieu.desnoyers, mjeanson,
mpe, paulus, swood, vakul.garg
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
powerpc/e500mc: Set assembler machine type to e500mc
to the 4.17-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
powerpc-e500mc-set-assembler-machine-type-to-e500mc.patch
and it can be found in the queue-4.17 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 69a8405999aa1c489de4b8d349468f0c2b83f093 Mon Sep 17 00:00:00 2001
From: Michael Jeanson <mjeanson@efficios.com>
Date: Thu, 14 Jun 2018 11:27:42 -0400
Subject: powerpc/e500mc: Set assembler machine type to e500mc
From: Michael Jeanson <mjeanson@efficios.com>
commit 69a8405999aa1c489de4b8d349468f0c2b83f093 upstream.
In binutils 2.26 a new opcode for the "wait" instruction was added for the
POWER9 and has precedence over the one specific to the e500mc. Commit
ebf714ff3756 ("powerpc/e500mc: Add support for the wait instruction in
e500_idle") uses this instruction specifically on the e500mc to work around
an erratum.
This results in an invalid instruction in idle_e500 when we build for the
e500mc on binutils >= 2.26 with the default assembler machine type.
Since multiplatform between e500 and non-e500 is not supported, set the
assembler machine type globally when CONFIG_PPC_E500MC=y.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Kumar Gala <galak@kernel.crashing.org>
CC: Vakul Garg <vakul.garg@nxp.com>
CC: Scott Wood <swood@redhat.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-kernel@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/powerpc/Makefile | 1 +
1 file changed, 1 insertion(+)
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -251,6 +251,7 @@ cpu-as-$(CONFIG_4xx) += -Wa,-m405
cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec)
cpu-as-$(CONFIG_E200) += -Wa,-me200
cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4
+cpu-as-$(CONFIG_PPC_E500MC) += $(call as-option,-Wa$(comma)-me500mc)
KBUILD_AFLAGS += $(cpu-as-y)
KBUILD_CFLAGS += $(cpu-as-y)
Patches currently in stable-queue which might be from mjeanson@efficios.com are
queue-4.17/powerpc-e500mc-set-assembler-machine-type-to-e500mc.patch
* Re: [PATCH kernel v2 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize
From: Alex Williamson @ 2018-06-30 19:56 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: linuxppc-dev, David Gibson, kvm-ppc, Paul Mackerras
In-Reply-To: <20180626055926.27703-2-aik@ozlabs.ru>
On Tue, 26 Jun 2018 15:59:25 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> The size is always equal to 1 page so let's use this. Later on this will
> be used for other checks which use page shifts to check the granularity
> of access.
>
> This should cause no behavioral change.
>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
I assume a v3+ will go in through the ppc tree since the bulk of the
series is there. For this,
Acked-by: Alex Williamson <alex.williamson@redhat.com>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 759a5bd..2da5f05 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -457,13 +457,13 @@ static void tce_iommu_unuse_page(struct tce_container *container,
> }
>
> static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
> - unsigned long tce, unsigned long size,
> + unsigned long tce, unsigned long shift,
> unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
> {
> long ret = 0;
> struct mm_iommu_table_group_mem_t *mem;
>
> - mem = mm_iommu_lookup(container->mm, tce, size);
> + mem = mm_iommu_lookup(container->mm, tce, 1ULL << shift);
> if (!mem)
> return -EINVAL;
>
> @@ -487,7 +487,7 @@ static void tce_iommu_unuse_page_v2(struct tce_container *container,
> if (!pua)
> return;
>
> - ret = tce_iommu_prereg_ua_to_hpa(container, *pua, IOMMU_PAGE_SIZE(tbl),
> + ret = tce_iommu_prereg_ua_to_hpa(container, *pua, tbl->it_page_shift,
> &hpa, &mem);
> if (ret)
> pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
> @@ -611,7 +611,7 @@ static long tce_iommu_build_v2(struct tce_container *container,
> entry + i);
>
> ret = tce_iommu_prereg_ua_to_hpa(container,
> - tce, IOMMU_PAGE_SIZE(tbl), &hpa, &mem);
> + tce, tbl->it_page_shift, &hpa, &mem);
> if (ret)
> break;
>
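As a rough illustration of why the change above should be behaviour-neutral, here is a minimal userspace sketch (not kernel code) of the size/shift relationship the patch relies on: a page size of exactly one IOMMU page is recovered as 1ULL << shift. The helper names page_size_from_shift() and is_page_aligned() are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical helper: recover the IOMMU page size from its shift. */
static inline uint64_t page_size_from_shift(unsigned int shift)
{
	return 1ULL << shift;
}

/*
 * Hypothetical helper: the kind of granularity check that becomes
 * easy once the shift (rather than the size) is passed around.
 */
static inline int is_page_aligned(uint64_t addr, unsigned int shift)
{
	return (addr & (page_size_from_shift(shift) - 1)) == 0;
}
```

With a 64K IOMMU page (shift 16), page_size_from_shift(16) is 0x10000, the same value the old size argument would have carried.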
* Re: [PATCH 1/2] powerpc/pkeys: preallocate execute_only key only if the key is available.
From: Gabriel Paubert @ 2018-06-30 16:56 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Ram Pai, fweimer, mhocko, Ulrich.Weigand, bauerman, msuchanek,
linuxppc-dev
In-Reply-To: <8736x5yts2.fsf@morokweng.localdomain>
On Fri, Jun 29, 2018 at 09:58:37PM -0300, Thiago Jung Bauermann wrote:
>
> Gabriel Paubert <paubert@iram.es> writes:
>
> > On Thu, Jun 28, 2018 at 11:56:34PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Hello,
> >>
> >> Ram Pai <linuxram@us.ibm.com> writes:
> >>
> >> > Key 2 is preallocated and reserved for execute-only key. In rare
> >> > cases if key-2 is unavailable, mprotect(PROT_EXEC) will behave
> >> > incorrectly. NOTE: mprotect(PROT_EXEC) uses execute-only key.
> >> >
> >> > Ensure key 2 is available for preallocation before reserving it for
> >> > execute_only purpose. Problem noticed by Michael Ellerman.
> >>
> >> Since "powerpc/pkeys: Preallocate execute-only key" isn't upstream yet,
> >> this patch could be squashed into it.
> >>
> >> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> >> > ---
> >> > arch/powerpc/mm/pkeys.c | 14 +++++++++-----
> >> > 1 files changed, 9 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
> >> > index cec990c..0b03914 100644
> >> > --- a/arch/powerpc/mm/pkeys.c
> >> > +++ b/arch/powerpc/mm/pkeys.c
> >> > @@ -19,6 +19,7 @@
> >> > u64 pkey_amr_mask; /* Bits in AMR not to be touched */
> >> > u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
> >> > u64 pkey_uamor_mask; /* Bits in UMOR not to be touched */
> >> > +int execute_only_key = 2;
> >> >
> >> > #define AMR_BITS_PER_PKEY 2
> >> > #define AMR_RD_BIT 0x1UL
> >> > @@ -26,7 +27,6 @@
> >> > #define IAMR_EX_BIT 0x1UL
> >> > #define PKEY_REG_BITS (sizeof(u64)*8)
> >> > #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
> >> > -#define EXECUTE_ONLY_KEY 2
> >> >
> >> > static void scan_pkey_feature(void)
> >> > {
> >> > @@ -122,8 +122,12 @@ int pkey_initialize(void)
> >> > #else
> >> > os_reserved = 0;
> >> > #endif
> >> > +
> >> > + if ((pkeys_total - os_reserved) <= execute_only_key)
> >> > + execute_only_key = -1;
> >> > +
> >> > /* Bits are in LE format. */
> >> > - reserved_allocation_mask = (0x1 << 1) | (0x1 << EXECUTE_ONLY_KEY);
> >> > + reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
> >>
> >> My understanding is that left-shifting by a negative amount is undefined
> >> behavior in C. A quick test tells me that at least on the couple of
> >> machines I tested, 1 << -1 = 0. Does GCC guarantee that behavior?
> >
> > Not in general. It probably always works on Power because of the definition
> > of the machine instruction for shifts with variable amount (consider the
> > shift amount unsigned and take it modulo twice the width of the operand),
>
> Ok, thanks for confirming.
>
> > but for example it fails on x86 (1<<-1 gives 0x80000000).
>
> Strange, this works on my laptop with an Intel(R) Core(TM) i5-7300U CPU:
>
> $ cat blah.c
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
> printf("1 << -1 = %llx\n", (unsigned long long) 1 << -1);
> return 0;
> }
> $ make blah
> cc blah.c -o blah
> blah.c: In function 'main':
> blah.c:5:52: warning: left shift count is negative [-Wshift-count-negative]
> printf("1 << -1 = %llx\n", (unsigned long long) 1 << -1);
> ^~
> $ ./blah
> 1 << -1 = 0
>
> Even if I change the cast and printf format to int, the result is the
> same. Or am I doing it wrong?
Try something more dynamic, 1 << -1 is evaluated at compile time by gcc,
and when you get a warning, the compile time expression evaluation does
not always give the same result as the run time machine instructions.
To test this I wrote (yes, the error checking is approximate, it would
be better to use strtol):
/***************************************************************************/
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
	int val, amount, valid;
	if (argc != 3) {
		printf("Needs 2 arguments!\n");
		return EXIT_FAILURE;
	}
	valid = sscanf(argv[1], "%d", &val);
	if (valid == 1) {
		valid = sscanf(argv[2], "%d", &amount);
	}
	if (valid == 1) {
		printf("%d shifted by %d is %d\n", val, amount, val<<amount);
		return EXIT_SUCCESS;
	} else {
		printf("Both arguments must be integers!\n");
		return EXIT_FAILURE;
	}
}
/***************************************************************************/
Compile it, and then run it with 1 and -1 as parameters. The result is
not the same on PPC and x86.
>
> >> If so, a comment pointing this out would make this less confusing.
> >
> > Unless I miss something, this code is run once at boot, so its
> > performance is irrelevant.
> >
> > In this case simply rewrite it as:
> >
> > reserved_allocation_mask = 0x1 << 1;
> > if ( (pkeys_total - os_reserved) <= execute_only_key) {
> > execute_only_key = -1;
> > } else {
> > reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
> > }
>
> I agree it's clearer and more robust code (except that the first
> line should be inside the if block).
Indeed, sorry for this.
Gabriel
>
> >
> > Caveat, I have assumed that the code will either:
> > - only run once,
> > - pkeys_total and os_reserved are int, not unsigned
>
> Both of the above are true.
>
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
>
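For reference, the agreed-upon shape of the fix can be modelled in userspace as below. This is only a sketch under assumptions: compute_reserved_mask() is a hypothetical function name, the mask is assumed to be 32 bits wide, and the mask is initialised once and OR-ed rather than assigned in each branch (which gives the same result as moving the first line into the if block). The point it shows is that the shift is never evaluated once execute_only_key has become -1.

```c
#include <stdint.h>

/*
 * Userspace model of the reservation logic discussed in this thread.
 * Returns the reserved-allocation mask and sets *execute_only_key
 * to -1 when the key cannot be preallocated.
 */
static uint32_t compute_reserved_mask(int pkeys_total, int os_reserved,
				      int *execute_only_key)
{
	uint32_t mask = 0x1u << 1;	/* key 1 is reserved in both cases */

	if ((pkeys_total - os_reserved) <= *execute_only_key)
		*execute_only_key = -1;	/* unavailable: do not shift by it */
	else
		mask |= 0x1u << *execute_only_key;

	return mask;
}
```

Because the negative value is only assigned on the branch that does not shift, the result no longer depends on how a given CPU or compiler treats a negative shift count.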
* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
From: Larry Finger @ 2018-06-30 16:25 UTC (permalink / raw)
To: christophe leroy, Matthew Wilcox, Kirill A. Shutemov,
Vlastimil Babka, Christoph Lameter, Dave Hansen,
Jérôme Glisse, Lai Jiangshan, Martin Schwidefsky,
Pekka Enberg, Randy Dunlap, Andrey Ryabinin, Andrew Morton,
Linus Torvalds, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman
Cc: linuxppc-dev, LKML
In-Reply-To: <1a73e92a-f77d-ba1e-ebf8-b469db3c465e@c-s.fr>
On 06/30/2018 04:31 AM, christophe leroy wrote:
>
>
> Le 29/06/2018 à 22:42, Larry Finger a écrit :
>> My PowerBook G4 Aluminum crashes on boot with 4.18-rcX kernels with a kernel
>> BUG at include/linux/page-flags.h:700! The problem was bisected to commit
>> 1d40a5ea01d5 ("mm: mark pages in use for page tables"). It is not possible to
>> capture the bug with anything other than a camera. The first few lines of the
>> traceback are as follows:
>>
>> free_pgd_range+0x19c/0x30c (unreliable)
>> free_pgtables+0xa0/0xb0
>> exit_mmap+0xf4/0x16c
>> mmput+0x64/0xf0
>> do_exit+0x33c/0x89c
>> oops_end+0x13c/0x144
>> _exception_pkey+0x58/0x128
>> ret_from_except_full+0x0/0x4
>> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
>> LR = free_pgd_range+0x19c/0x30c
>> free_pgtables+0xa/0xb
>> exit_mmap+0xf4/0x16c
>> mmput+0x64/0xf0
>> flush_old_exec+0x490/0x550
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page),
>> page) in routine __ClearPageTable(), which is called from pgtable_page_dtor()
>> in include/linux/mm.h. I also added a printk call to PageTable() that logs
>> page->page_type. The routine was called twice. The first had page_type of
>> 0xfffffbff, which would have been expected for a . The second call had
>> 0xffffffff, which led to the BUG.
>>
>
> Oh, seems to be the one I noticed and told Aneesh about
> (https://patchwork.ozlabs.org/patch/922771/)
>
> Aneesh provided the patch https://patchwork.ozlabs.org/patch/934111/ for it,
> does it help ?
Yes, those changes fix the problem.
Larry
* Re: [PATCH v9 4/6] init: allow initcall tables to be emitted using relative references
From: kbuild test robot @ 2018-06-30 10:16 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: kbuild-all, linux-kernel, Ard Biesheuvel, Arnd Bergmann,
Kees Cook, Will Deacon, Michael Ellerman, Thomas Garnier,
Thomas Gleixner, Serge E. Hallyn, Bjorn Helgaas,
Benjamin Herrenschmidt, Russell King, Paul Mackerras,
Catalin Marinas, Petr Mladek, Ingo Molnar, James Morris,
Andrew Morton, Nicolas Pitre, Josh Poimboeuf, Steven Rostedt,
Sergey Senozhatsky, Linus Torvalds, Jessica Yu, linux-arm-kernel,
linuxppc-dev, x86
In-Reply-To: <20180626182802.19932-5-ard.biesheuvel@linaro.org>
[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]
Hi Ard,
I love your patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc2 next-20180629]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Ard-Biesheuvel/arch-enable-relative-relocations-for-arm64-power-and-x86/20180627-025148
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc
All errors (new ones prefixed by >>):
{standard input}: Assembler messages:
>> {standard input}: Error: .size expression for .discard does not evaluate to a constant
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 57629 bytes --]
* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
From: christophe leroy @ 2018-06-30 9:31 UTC (permalink / raw)
To: Larry Finger, Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
Christoph Lameter, Dave Hansen, Jérôme Glisse,
Lai Jiangshan, Martin Schwidefsky, Pekka Enberg, Randy Dunlap,
Andrey Ryabinin, Andrew Morton, Linus Torvalds,
Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, LKML
In-Reply-To: <99169786-61dd-b19c-ac81-84bcd0a67de4@lwfinger.net>
Le 29/06/2018 à 22:42, Larry Finger a écrit :
> My PowerBook G4 Aluminum crashes on boot with 4.18-rcX kernels with a
> kernel BUG at include/linux/page-flags.h:700! The problem was bisected
> to commit 1d40a5ea01d5 ("mm: mark pages in use for page tables"). It is
> not possible to capture the bug with anything other than a camera. The
> first few lines of the traceback are as follows:
>
> free_pgd_range+0x19c/0x30c (unreliable)
> free_pgtables+0xa0/0xb0
> exit_mmap+0xf4/0x16c
> mmput+0x64/0xf0
> do_exit+0x33c/0x89c
> oops_end+0x13c/0x144
> _exception_pkey+0x58/0x128
> ret_from_except_full+0x0/0x4
> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
> LR = free_pgd_range+0x19c/0x30c
> free_pgtables+0xa/0xb
> exit_mmap+0xf4/0x16c
> mmput+0x64/0xf0
> flush_old_exec+0x490/0x550
>
> I have more information regarding this BUG. Line 700 of page-flags.h is
> the macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually
> expanded the macro, and found that the bug line is
> VM_BUG_ON_PAGE(!PageTable(page), page) in routine __ClearPageTable(),
> which is called from pgtable_page_dtor() in include/linux/mm.h. I also
> added a printk call to PageTable() that logs page->page_type. The
> routine was called twice. The first had page_type of 0xfffffbff, which
> would have been expected for a . The second call had 0xffffffff, which
> led to the BUG.
>
Oh, seems to be the one I noticed and told Aneesh about
(https://patchwork.ozlabs.org/patch/922771/)
Aneesh provided the patch https://patchwork.ozlabs.org/patch/934111/ for
it, does it help ?
Christophe
> Larry
* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
From: Aneesh Kumar K.V @ 2018-06-30 6:23 UTC (permalink / raw)
To: Kirill A. Shutemov, Linus Torvalds
Cc: Larry Finger, Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
Christoph Lameter, Dave Hansen, Jerome Glisse, Lai Jiangshan,
Martin Schwidefsky, Pekka Enberg, Randy Dunlap, Andrey Ryabinin,
Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, ppc-dev, Linux Kernel Mailing List
In-Reply-To: <20180629214647.mkgpni6hxj7aore4@kshutemo-mobl1>
On 06/30/2018 03:16 AM, Kirill A. Shutemov wrote:
> On Fri, Jun 29, 2018 at 02:01:46PM -0700, Linus Torvalds wrote:
>> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>
>>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>>> page->page_type. The routine was called twice. The first had page_type of
>>> 0xfffffbff, which would have been expected for a . The second call had
>>> 0xffffffff, which led to the BUG.
>>
>> So it looks to me like the tear-down of the page tables first found a
>> page that is indeed a page table, and cleared the page table bit
>> (well, it set it - the bits are reversed).
>>
>> Then it took an exception (that "interrupt: 700") and that causes
>> do_exit() again, and it tries to free the same page table - and now
>> it's no longer marked as a page table, because it already went through
>> the __ClearPageTable() dance once.
>>
>> So on the second path through, it catches that "the bit already said
>> it wasn't a page table" and does the BUG.
>>
>> But the real question is what the problem was the *first* time around.
>
> +Aneesh.
>
> Looks like pgtable_page_dtor() gets called in __pte_free_tlb() path twice.
> Once in __pte_free_tlb() itself and the second time in pgtable_free().
>
> Would this help?
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> index 6a6673907e45..e7a2f0e6b695 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> @@ -137,7 +137,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
> static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
> unsigned long address)
> {
> - pgtable_page_dtor(table);
> pgtable_free_tlb(tlb, page_address(table), 0);
> }
> #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
> diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> index 1707781d2f20..30a13b80fd58 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> @@ -139,7 +139,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
> unsigned long address)
> {
> tlb_flush_pgtable(tlb, address);
> - pgtable_page_dtor(table);
> pgtable_free_tlb(tlb, page_address(table), 0);
> }
> #endif /* _ASM_POWERPC_PGALLOC_32_H */
>
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-June/175015.html
Also part of pull request from Michael Ellerman
-aneesh
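The "bits are reversed" remark and the two page_type values reported earlier in the thread can be reproduced with a small userspace model. This is a sketch, not the kernel implementation: struct page_model and the three helper functions are hypothetical names, and the constants PAGE_TYPE_BASE (0xf0000000) and PG_table (0x00000400) are assumed to match include/linux/page-flags.h in 4.18; the reported value 0xfffffbff is at least consistent with a flag of 0x400.

```c
#include <stdint.h>

/* Assumed values, see the note above. */
#define PAGE_TYPE_BASE	0xf0000000u
#define PG_table	0x00000400u

/* Hypothetical stand-in for the page_type field of struct page. */
struct page_model {
	uint32_t page_type;
};

/* A type is "set" when its bit is CLEAR and the base bits are intact. */
static int page_is_table(const struct page_model *p)
{
	return (p->page_type & (PAGE_TYPE_BASE | PG_table)) == PAGE_TYPE_BASE;
}

/* Marking a page as a page table clears the bit. */
static void set_page_table(struct page_model *p)
{
	p->page_type &= ~PG_table;
}

/* Un-marking it sets the bit again. */
static void clear_page_table(struct page_model *p)
{
	p->page_type |= PG_table;
}
```

Starting from 0xffffffff, set_page_table() yields 0xfffffbff, and clear_page_table() returns the field to 0xffffffff. A second destructor call on the same page therefore sees 0xffffffff, page_is_table() is false, and that is the condition the VM_BUG_ON_PAGE() check trips on.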
* Re: [PATCH 2/3] powerpc/powernv: DMA operations for discontiguous allocation
From: Benjamin Herrenschmidt @ 2018-06-30 2:52 UTC (permalink / raw)
To: Russell Currey, linuxppc-dev; +Cc: alistair, aik, tpearson
In-Reply-To: <20180629073437.4060-3-ruscur@russell.cc>
On Fri, 2018-06-29 at 17:34 +1000, Russell Currey wrote:
> DMA pseudo-bypass is a new set of DMA operations that solve some issues for
> devices that want to address more than 32 bits but can't address the 59
> bits required to enable direct DMA.
One thing you may need to add (I didn't see it with a cursory glance
but maybe it's there) is some form of handling of allocations or
mapping requests that span a TCE boundary.
For allocations, since they are page orders, that means you only have
to check if they are bigger than a TCE page.
For mappings, you need to check individual sglist entries.
At this stage, if you hit that all you can do is fail, with maybe a
rate limited printk. But it's better than whatever corruption or
misbehaviour will happen if you don't catch them. I don't expect this
to happen much if at all with 1G pages, as most "sg" mappings are
probably in units of pages, but I still want to catch it if it does
happen.
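A minimal sketch of the kind of check meant here, written as plain userspace C rather than against the patch: spans_tce_boundary() is a hypothetical helper, tce_order stands in for phb->ioda.max_tce_order, and overflow of addr + size is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if the range [addr, addr + size) crosses a TCE page
 * boundary, i.e. its first and last byte fall in different TCE pages.
 */
static int spans_tce_boundary(uint64_t addr, size_t size,
			      unsigned int tce_order)
{
	if (size == 0)
		return 0;

	return (addr >> tce_order) != ((addr + size - 1) >> tce_order);
}
```

The same test would apply to each sglist entry on the map_sg path and to the whole buffer on the alloc_coherent path; a hit would fail the request, possibly with a rate-limited printk.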
> The previous implementation for POWER8/PHB3 worked around this by
> configuring a bypass from the default 32-bit address space into 64-bit
> address space. This approach does not work for POWER9/PHB4 because
> regions of memory are discontiguous and many devices will be unable to
> address memory beyond the first node.
>
> Instead, implement a new set of DMA operations that allocate TCEs as DMA
> mappings are requested so that all memory is addressable even when a
> one-to-one mapping between real addresses and DMA addresses isn't
> possible. These TCEs are the maximum size available on the platform,
> which is 256M on PHB3 and 1G on PHB4.
>
> Devices can now map any region of memory up to the maximum amount they can
> address according to the DMA mask set, in chunks of the largest available
> TCE size.
>
> This implementation replaces the need for the existing PHB3 solution and
> should be compatible with future PHB versions.
>
> It is, however, rather naive. There is no unmapping, and as a result
> devices can eventually run out of space if they address their entire
> DMA mask worth of TCEs. An implementation with unmap() will come in
> future (and requires a much more complex implementation), but this is a
> good start due to the drastic performance improvement.
>
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> arch/powerpc/include/asm/dma-mapping.h | 1 +
> arch/powerpc/platforms/powernv/Makefile | 2 +-
> arch/powerpc/platforms/powernv/pci-dma.c | 243 ++++++++++++++++++++++
> arch/powerpc/platforms/powernv/pci-ioda.c | 82 +++-----
> arch/powerpc/platforms/powernv/pci.h | 7 +
> 5 files changed, 281 insertions(+), 54 deletions(-)
> create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c
>
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa394520af6..354f435160f3 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -74,6 +74,7 @@ static inline unsigned long device_to_mask(struct device *dev)
> extern struct dma_map_ops dma_iommu_ops;
> #endif
> extern const struct dma_map_ops dma_nommu_ops;
> +extern const struct dma_map_ops dma_pseudo_bypass_ops;
>
> static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
> {
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index 703a350a7f4e..2467bdab3c13 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -6,7 +6,7 @@ obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
>
> obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
> -obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o
> +obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-dma.o
> obj-$(CONFIG_CXL_BASE) += pci-cxl.o
> obj-$(CONFIG_EEH) += eeh-powernv.o
> obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
> diff --git a/arch/powerpc/platforms/powernv/pci-dma.c b/arch/powerpc/platforms/powernv/pci-dma.c
> new file mode 100644
> index 000000000000..79382627c7be
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/pci-dma.c
> @@ -0,0 +1,243 @@
> +/*
> + * DMA operations supporting pseudo-bypass for PHB3+
> + *
> + * Author: Russell Currey <ruscur@russell.cc>
> + *
> + * Copyright 2018 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2 of the License, or (at your
> + * option) any later version.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/memblock.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/hash.h>
> +
> +#include <asm/pci-bridge.h>
> +#include <asm/ppc-pci.h>
> +#include <asm/pnv-pci.h>
> +#include <asm/tce.h>
> +
> +#include "pci.h"
> +
> +/*
> + * This is a naive implementation that directly operates on TCEs, allocating
> + * on demand. There is no locking or refcounts since no TCEs are ever removed
> + * and unmap does nothing.
> + */
> +static dma_addr_t dma_pseudo_bypass_get_address(struct device *dev,
> + phys_addr_t addr)
> +{
> + struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
> + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> + struct pnv_phb *phb = hose->private_data;
> + struct pnv_ioda_pe *pe;
> + u64 i, tce, ret, offset;
> + __be64 entry;
> +
> + offset = addr & ((1 << phb->ioda.max_tce_order) - 1);
> +
> + pe = &phb->ioda.pe_array[pci_get_pdn(pdev)->pe_number];
> +
> + /* look through the tracking table for a free entry */
> + for (i = 0; i < pe->tce_count; i++) {
> + /* skip between 2GB and 4GB */
> + if ((i << phb->ioda.max_tce_order) >= 0x80000000 &&
> + (i << phb->ioda.max_tce_order) < 0x100000000)
> + continue;
> +
> + tce = be64_to_cpu(pe->tces[i]);
> +
> + /* if the TCE is already valid (read + write) */
> + if ((tce & 3) == 3) {
> + /* check if we're already allocated, if not move on */
> + if (tce >> phb->ioda.max_tce_order ==
> + addr >> phb->ioda.max_tce_order) {
> + /* wait for the lock bit to clear */
> + while (be64_to_cpu(pe->tces[i]) & 4)
> + cpu_relax();
> +
> + return (i << phb->ioda.max_tce_order) | offset;
> + }
> +
> + continue;
> + }
> +
> + /*
> + * The TCE isn't being used, so let's try and allocate it.
> + * Bits 0 and 1 are read/write, and we use bit 2 as a "lock"
> + * bit. This is to prevent any race where the value is set in
> + * the TCE table but the invalidate/mb() hasn't finished yet.
> + */
> + entry = cpu_to_be64((addr - offset) | 7);
> + ret = cmpxchg(&pe->tces[i], tce, entry);
> + if (ret != tce) {
> + /* conflict, start looking again just in case */
> + i--;
> + continue;
> + }
> + pnv_pci_phb3_tce_invalidate(pe, 0, 0, addr - offset, 1);
> + mb();
> + /* clear the lock bit now that we know it's active */
> + ret = cmpxchg(&pe->tces[i], entry, cpu_to_be64((addr - offset) | 3));
> + if (ret != entry) {
> + /* conflict, start looking again just in case */
> + i--;
> + continue;
> + }
> +
> + return (i << phb->ioda.max_tce_order) | offset;
> + }
> + /* If we get here, the table must be full, so error out. */
> + return -1ULL;
> +}
> +
> +/*
> + * For now, don't actually do anything on unmap.
> + */
> +static void dma_pseudo_bypass_unmap_address(struct device *dev, dma_addr_t dma_addr)
> +{
> +}
> +
> +static int dma_pseudo_bypass_dma_supported(struct device *dev, u64 mask)
> +{
> + /*
> + * Normally dma_supported() checks if the mask is capable of addressing
> + * all of memory. Since we map physical memory in chunks that the
> + * device can address, the device will be able to address whatever it
> + * wants - just not all at once.
> + */
> + return 1;
> +}
> +
> +static void *dma_pseudo_bypass_alloc_coherent(struct device *dev,
> + size_t size,
> + dma_addr_t *dma_handle,
> + gfp_t flag,
> + unsigned long attrs)
> +{
> + void *ret;
> + struct page *page;
> + int node = dev_to_node(dev);
> +
> + /* ignore region specifiers */
> + flag &= ~(__GFP_HIGHMEM);
> +
> + page = alloc_pages_node(node, flag, get_order(size));
> + if (page == NULL)
> + return NULL;
> + ret = page_address(page);
> + memset(ret, 0, size);
> + *dma_handle = dma_pseudo_bypass_get_address(dev, __pa(ret));
> +
> + return ret;
> +}
> +
> +static void dma_pseudo_bypass_free_coherent(struct device *dev,
> + size_t size,
> + void *vaddr,
> + dma_addr_t dma_handle,
> + unsigned long attrs)
> +{
> + free_pages((unsigned long)vaddr, get_order(size));
> +}
> +
> +static int dma_pseudo_bypass_mmap_coherent(struct device *dev,
> + struct vm_area_struct *vma,
> + void *cpu_addr,
> + dma_addr_t handle,
> + size_t size,
> + unsigned long attrs)
> +{
> + unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
> +
> + return remap_pfn_range(vma, vma->vm_start,
> + pfn + vma->vm_pgoff,
> + vma->vm_end - vma->vm_start,
> + vma->vm_page_prot);
> +}
> +
> +static inline dma_addr_t dma_pseudo_bypass_map_page(struct device *dev,
> + struct page *page,
> + unsigned long offset,
> + size_t size,
> + enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + BUG_ON(dir == DMA_NONE);
> +
> + return dma_pseudo_bypass_get_address(dev, page_to_phys(page) + offset);
> +}
> +
> +static inline void dma_pseudo_bypass_unmap_page(struct device *dev,
> + dma_addr_t dma_address,
> + size_t size,
> + enum dma_data_direction direction,
> + unsigned long attrs)
> +{
> + dma_pseudo_bypass_unmap_address(dev, dma_address);
> +}
> +
> +
> +static int dma_pseudo_bypass_map_sg(struct device *dev, struct scatterlist *sgl,
> + int nents, enum dma_data_direction direction,
> + unsigned long attrs)
> +{
> + struct scatterlist *sg;
> + int i;
> +
> +
> + for_each_sg(sgl, sg, nents, i) {
> + sg->dma_address = dma_pseudo_bypass_get_address(dev, sg_phys(sg));
> + sg->dma_length = sg->length;
> +
> + __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
> + }
> +
> + return nents;
> +}
> +
> +static void dma_pseudo_bypass_unmap_sg(struct device *dev, struct scatterlist *sgl,
> + int nents, enum dma_data_direction direction,
> + unsigned long attrs)
> +{
> + struct scatterlist *sg;
> + int i;
> +
> + for_each_sg(sgl, sg, nents, i) {
> + dma_pseudo_bypass_unmap_address(dev, sg->dma_address);
> + }
> +}
> +
> +static u64 dma_pseudo_bypass_get_required_mask(struct device *dev)
> +{
> + /*
> + * there's no limitation on our end, the driver should just call
> + * set_mask() with as many bits as the device can address.
> + */
> + return -1ULL;
> +}
> +
> +static int dma_pseudo_bypass_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> + return dma_addr == -1ULL;
> +}
> +
> +
> +const struct dma_map_ops dma_pseudo_bypass_ops = {
> + .alloc = dma_pseudo_bypass_alloc_coherent,
> + .free = dma_pseudo_bypass_free_coherent,
> + .mmap = dma_pseudo_bypass_mmap_coherent,
> + .map_sg = dma_pseudo_bypass_map_sg,
> + .unmap_sg = dma_pseudo_bypass_unmap_sg,
> + .dma_supported = dma_pseudo_bypass_dma_supported,
> + .map_page = dma_pseudo_bypass_map_page,
> + .unmap_page = dma_pseudo_bypass_unmap_page,
> + .get_required_mask = dma_pseudo_bypass_get_required_mask,
> + .mapping_error = dma_pseudo_bypass_mapping_error,
> +};
> +EXPORT_SYMBOL(dma_pseudo_bypass_ops);
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 17c590087279..d2ca214610fd 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1088,6 +1088,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> pe->pbus = NULL;
> pe->mve_number = -1;
> pe->rid = dev->bus->number << 8 | pdn->devfn;
> + pe->tces = NULL;
>
> pe_info(pe, "Associated device to PE\n");
>
> @@ -1569,6 +1570,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> pe->mve_number = -1;
> pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
> pci_iov_virtfn_devfn(pdev, vf_index);
> + pe->tces = NULL;
>
> pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%x\n",
> hose->global_number, pdev->bus->number,
> @@ -1774,43 +1776,22 @@ static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
> return true;
> }
>
> -/*
> - * Reconfigure TVE#0 to be usable as 64-bit DMA space.
> - *
> - * The first 4GB of virtual memory for a PE is reserved for 32-bit accesses.
> - * Devices can only access more than that if bit 59 of the PCI address is set
> - * by hardware, which indicates TVE#1 should be used instead of TVE#0.
> - * Many PCI devices are not capable of addressing that many bits, and as a
> - * result are limited to the 4GB of virtual memory made available to 32-bit
> - * devices in TVE#0.
> - *
> - * In order to work around this, reconfigure TVE#0 to be suitable for 64-bit
> - * devices by configuring the virtual memory past the first 4GB inaccessible
> - * by 64-bit DMAs. This should only be used by devices that want more than
> - * 4GB, and only on PEs that have no 32-bit devices.
> - *
> - * Currently this will only work on PHB3 (POWER8).
> - */
> -static int pnv_pci_ioda_dma_64bit_bypass(struct pnv_ioda_pe *pe)
> +static int pnv_pci_pseudo_bypass_setup(struct pnv_ioda_pe *pe)
> {
> - u64 window_size, table_size, tce_count, addr;
> + u64 tce_count, table_size, window_size;
> + struct pnv_phb *p = pe->phb;
> struct page *table_pages;
> - u64 tce_order = 28; /* 256MB TCEs */
> __be64 *tces;
> - s64 rc;
> + int rc = -ENOMEM;
>
> - /*
> - * Window size needs to be a power of two, but needs to account for
> - * shifting memory by the 4GB offset required to skip 32bit space.
> - */
> - window_size = roundup_pow_of_two(memory_hotplug_max() + (1ULL << 32));
> - tce_count = window_size >> tce_order;
> + window_size = roundup_pow_of_two(memory_hotplug_max());
> + tce_count = window_size >> p->ioda.max_tce_order;
> table_size = tce_count << 3;
>
> if (table_size < PAGE_SIZE)
> table_size = PAGE_SIZE;
>
> - table_pages = alloc_pages_node(pe->phb->hose->node, GFP_KERNEL,
> + table_pages = alloc_pages_node(p->hose->node, GFP_KERNEL,
> get_order(table_size));
> if (!table_pages)
> goto err;
> @@ -1821,26 +1802,23 @@ static int pnv_pci_ioda_dma_64bit_bypass(struct pnv_ioda_pe *pe)
>
> memset(tces, 0, table_size);
>
> - for (addr = 0; addr < memory_hotplug_max(); addr += (1 << tce_order)) {
> - tces[(addr + (1ULL << 32)) >> tce_order] =
> - cpu_to_be64(addr | TCE_PCI_READ | TCE_PCI_WRITE);
> - }
> + pe->tces = tces;
> + pe->tce_count = tce_count;
>
> rc = opal_pci_map_pe_dma_window(pe->phb->opal_id,
> pe->pe_number,
> - /* reconfigure window 0 */
> (pe->pe_number << 1) + 0,
> 1,
> __pa(tces),
> table_size,
> - 1 << tce_order);
> + 1 << p->ioda.max_tce_order);
> if (rc == OPAL_SUCCESS) {
> - pe_info(pe, "Using 64-bit DMA iommu bypass (through TVE#0)\n");
> + pe_info(pe, "TCE tables configured for pseudo-bypass\n");
> return 0;
> }
> err:
> - pe_err(pe, "Error configuring 64-bit DMA bypass\n");
> - return -EIO;
> + pe_err(pe, "error configuring pseudo-bypass\n");
> + return rc;
> }
>
> static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> @@ -1851,7 +1829,6 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> struct pnv_ioda_pe *pe;
> uint64_t top;
> bool bypass = false;
> - s64 rc;
>
> if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> return -ENODEV;
> @@ -1868,21 +1845,15 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> } else {
> /*
> * If the device can't set the TCE bypass bit but still wants
> - * to access 4GB or more, on PHB3 we can reconfigure TVE#0 to
> - * bypass the 32-bit region and be usable for 64-bit DMAs.
> - * The device needs to be able to address all of this space.
> + * to access 4GB or more, we need to use a different set of DMA
> + * operations with an indirect mapping.
> */
> if (dma_mask >> 32 &&
> - dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
> - pnv_pci_ioda_pe_single_vendor(pe) &&
> - phb->model == PNV_PHB_MODEL_PHB3) {
> - /* Configure the bypass mode */
> - rc = pnv_pci_ioda_dma_64bit_bypass(pe);
> - if (rc)
> - return rc;
> - /* 4GB offset bypasses 32-bit space */
> - set_dma_offset(&pdev->dev, (1ULL << 32));
> - set_dma_ops(&pdev->dev, &dma_nommu_ops);
> + phb->model != PNV_PHB_MODEL_P7IOC &&
> + pnv_pci_ioda_pe_single_vendor(pe)) {
> + if (!pe->tces)
> + pnv_pci_pseudo_bypass_setup(pe);
> + set_dma_ops(&pdev->dev, &dma_pseudo_bypass_ops);
> } else if (dma_mask >> 32 && dma_mask != DMA_BIT_MASK(64)) {
> /*
> * Fail the request if a DMA mask between 32 and 64 bits
> @@ -2071,7 +2042,7 @@ static inline void pnv_pci_phb3_tce_invalidate_pe(struct pnv_ioda_pe *pe)
> __raw_writeq_be(val, invalidate);
> }
>
> -static void pnv_pci_phb3_tce_invalidate(struct pnv_ioda_pe *pe, bool rm,
> +void pnv_pci_phb3_tce_invalidate(struct pnv_ioda_pe *pe, bool rm,
> unsigned shift, unsigned long index,
> unsigned long npages)
> {
> @@ -2611,10 +2582,15 @@ static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
> static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
> {
> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> - table_group);
> + table_group);
> +
> /* Store @tbl as pnv_pci_ioda2_unset_window() resets it */
> struct iommu_table *tbl = pe->table_group.tables[0];
>
> + if (pe->tces)
> + free_pages((unsigned long)pe->tces,
> + get_order(pe->tce_count << 3));
> +
> pnv_pci_ioda2_set_bypass(pe, false);
> pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> if (pe->pbus)
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index c9952def5e93..56846ddc76a2 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -70,6 +70,10 @@ struct pnv_ioda_pe {
> bool tce_bypass_enabled;
> uint64_t tce_bypass_base;
>
> + /* TCE tables for DMA pseudo-bypass */
> + __be64 *tces;
> + u64 tce_count;
> +
> /* MSIs. MVE index is identical for for 32 and 64 bit MSI
> * and -1 if not supported. (It's actually identical to the
> * PE number)
> @@ -211,6 +215,9 @@ extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
> extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> unsigned long *hpa, enum dma_data_direction *direction);
> extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
> +extern void pnv_pci_phb3_tce_invalidate(struct pnv_ioda_pe *pe, bool rm,
> + unsigned shift, unsigned long index,
> + unsigned long npages);
>
> void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
> unsigned char *log_buff);
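For readers following the sizing logic in pnv_pci_pseudo_bypass_setup() above, here is a minimal user-space sketch of the same arithmetic. It is an illustration only, not the patch's code: all example_* names are hypothetical, the TCE order of 28 is borrowed from the removed "256MB TCEs" line, and the 64K page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three values the patch computes. */
struct example_tce_layout {
	uint64_t window_size;	/* DMA window, rounded up to a power of two */
	uint64_t tce_count;	/* number of TCE entries covering the window */
	uint64_t table_size;	/* bytes needed for the table */
};

/* User-space stand-in for roundup_pow_of_two(); assumes 0 < v <= 2^63. */
static uint64_t example_roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v > 0 && v <= (UINT64_C(1) << 63));
	while (p < v)
		p <<= 1;
	return p;
}

/* Mirrors: window >> order entries, 8 bytes each, at least one page. */
static struct example_tce_layout example_tce_compute(uint64_t max_mem,
						     unsigned int tce_order,
						     uint64_t page_size)
{
	struct example_tce_layout l;

	assert(tce_order < 64);
	l.window_size = example_roundup_pow2(max_mem);
	l.tce_count = l.window_size >> tce_order;
	l.table_size = l.tce_count << 3;
	if (l.table_size < page_size)
		l.table_size = page_size;
	return l;
}

/* Mirrors "(i << max_tce_order) | offset": the slot index picks the chunk. */
static uint64_t example_dma_addr(uint64_t index, uint64_t offset,
				 unsigned int tce_order)
{
	assert(offset < (UINT64_C(1) << tce_order));
	return (index << tce_order) | offset;
}
```

With 48GB of memory, an order of 28 and 64K pages, this gives a 64GB window, 256 entries and a table padded up to one page.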
* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
From: Larry Finger @ 2018-06-30 2:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
Christoph Lameter, Dave Hansen, Jerome Glisse, Lai Jiangshan,
Martin Schwidefsky, Pekka Enberg, Randy Dunlap, Andrey Ryabinin,
Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, ppc-dev, Linux Kernel Mailing List
In-Reply-To: <CA+55aFzZ7PND2Xvz9wB1jaCmp0rBMTSmJtKiFwSeOWy9iLSd8Q@mail.gmail.com>
On 06/29/2018 04:01 PM, Linus Torvalds wrote:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>> page->page_type. The routine was called twice. The first had page_type of
>> 0xfffffbff, which would have been expected for a page table. The second call had
>> 0xffffffff, which led to the BUG.
>
> So it looks to me like the tear-down of the page tables first found a
> page that is indeed a page table, and cleared the page table bit
> (well, it set it - the bits are reversed).
>
> Then it took an exception (that "interrupt: 700") and that causes
> do_exit() again, and it tries to free the same page table - and now
> it's no longer marked as a page table, because it already went through
> the __ClearPageTable() dance once.
>
> So on the second path through, it catches that "the bit already said
> it wasn't a page table" and does the BUG.
>
> But the real question is what the problem was the *first* time around.
> I assume that has scrolled off the screen? This part:
>
> _exception_pkey+0x58/0x128
> ret_from_except_full+0x0/0x4
> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
> LR = free_pgd_range+0x19c/0x30c
> free_pgtables+0xa/0xb
> exit_mmap+0xf4/0x16c
> mmput+0x64/0xf0
>
> Does reverting that commit 1d40a5ea01d5 make everything work for you?
> Because if so, judging by the deafening silence on this so far, I
> think that's what we should do.
>
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?
The deafening silence may be due to my having an old Microsoft address for
Matthew Wilcox in my first posting. He should now have received the BUG report,
and he may have some suggestions. Yes, reverting commit 1d40a5ea01d5 does permit
the box to boot.
Kirill's patch also works, which seems like a better solution. If any other
architecture bugs on boot, at least we will know where to look. :)
@Kirill: You may add a Reported-by: and Tested-by: Larry Finger
<Larry.Finger@lwfinger.net> to the patch.
Thanks for the help,
Larry
* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
From: Linus Torvalds @ 2018-06-30 2:22 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: aneesh.kumar, Larry Finger, Matthew Wilcox, Kirill A. Shutemov,
Vlastimil Babka, Christoph Lameter, Dave Hansen, Jerome Glisse,
Lai Jiangshan, Martin Schwidefsky, Pekka Enberg, Randy Dunlap,
Andrey Ryabinin, Andrew Morton, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, ppc-dev,
Linux Kernel Mailing List
In-Reply-To: <20180629214647.mkgpni6hxj7aore4@kshutemo-mobl1>
On Fri, Jun 29, 2018 at 2:46 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> Looks like pgtable_page_dtor() gets called in __pte_free_tlb() path twice.
> Once in __pte_free_tlb() itself and the second time in pgtable_free().
Ahh, that would certainly do it, and explains why this hits ppc32 but
not x86, for example.
Linus
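The double-free path described above can be modelled in user space. The following is a simplified sketch, assuming only what the thread states: an untyped page reads 0xffffffff, a page table reads 0xfffffbff, and the bit is inverted (cleared means "is a page table"). The example_* names are hypothetical and the real PageTable() test in page-flags.h is stricter than this.

```c
#include <assert.h>
#include <stdint.h>

/* Bit that distinguishes 0xfffffbff from 0xffffffff in the report. */
#define EXAMPLE_PG_TABLE 0x00000400u

struct example_page {
	uint32_t page_type;	/* 0xffffffff means "no type set" */
};

/* Inverted logic: the page is a page table when the bit is clear. */
static int example_page_is_table(const struct example_page *p)
{
	return (p->page_type & EXAMPLE_PG_TABLE) == 0;
}

/* Stand-in for the constructor side: mark the page as a page table. */
static void example_set_page_table(struct example_page *p)
{
	assert(!example_page_is_table(p));
	p->page_type &= ~EXAMPLE_PG_TABLE;
}

/*
 * Stand-in for the destructor side. Returns 0 on success and -1 where
 * the kernel would trip VM_BUG_ON_PAGE(!PageTable(page), page).
 */
static int example_clear_page_table(struct example_page *p)
{
	if (!example_page_is_table(p))
		return -1;
	p->page_type |= EXAMPLE_PG_TABLE;
	return 0;
}
```

Calling example_clear_page_table() twice on the same page succeeds the first time and fails the second, which matches the two page_type values Larry logged.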
* Re: [PATCH v2 1/1] powerpc/pseries: fix EEH recovery of some IOV devices
From: Bjorn Helgaas @ 2018-06-30 1:53 UTC (permalink / raw)
To: Sam Bobroff; +Cc: linuxppc-dev, linux-pci, mpe, bhelgaas, bryantly
In-Reply-To: <7598ffeb48c16c88a34937ad93b18f806222b8df.1527208281.git.sbobroff@linux.ibm.com>
On Fri, May 25, 2018 at 10:31:36AM +1000, Sam Bobroff wrote:
> EEH recovery currently fails on pSeries for some IOV capable PCI
> devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
> certain device tree properties for the device. (Found on an IOV
> capable device using the ipr driver.)
>
> Recovery fails in pci_enable_resources() at the check on r->parent,
> because r->flags is set and r->parent is not. This state is due to
> sriov_init() setting the start, end and flags members of the IOV BARs
> but the parent not being set later in
> pseries_pci_fixup_iov_resources(), because the
> "ibm,open-sriov-vf-bar-info" property is missing.
>
> Correct this by zeroing the resource flags for IOV BARs when they
> can't be configured.
>
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
I assume this will be handled by the powerpc folks, since it doesn't touch
drivers/pci. Let me know if you need something from me.
> ---
> Hi,
>
> This is a fix to allow EEH recovery to succeed in a specific situation,
> which I've tried to explain in the commit message.
>
> As with the RFC version, the IOV BARs are disabled by setting the resource
> flags to 0 but the other fields are now left as-is because that is what is done
> elsewhere (see sriov_init() and __pci_read_base()).
>
> I've also examined the concern raised by Bjorn Helgaas, that VFs could be
> enabled later after the BARs are disabled, and it already seems safe: enabling
> VFs (on pseries) depends on another device tree property,
> "ibm,number-of-configurable-vfs" as well as support for the RTAS function
> "ibm_map_pes". Since these are all part of the hypervisor's support for IOV it
> seems unlikely that we would ever see some of them but not all. (None are
> currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the EEH
> recovery failure was discovered doesn't even seem to have SR-IOV support so it
> certainly can't enable VFs.)
>
> Cheers,
> Sam.
> ====== v1 -> v2: ======
>
> Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices
> * Moved the BAR disabling code to a function.
> * Also check in pseries_pci_fixup_resources().
>
> ====== v1: ======
>
> Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices
>
> arch/powerpc/platforms/pseries/setup.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index b55ad4286dc7..0a9e4243ae1d 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const int *indexes)
> }
> }
>
> +static void pseries_disable_sriov_resources(struct pci_dev *pdev)
> +{
> + int i;
> +
> + pci_warn(pdev, "No hypervisor support for SR-IOV on this device, IOV BARs disabled.\n");
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
> + pdev->resource[i + PCI_IOV_RESOURCES].flags = 0;
> +}
> +
> static void pseries_pci_fixup_resources(struct pci_dev *pdev)
> {
> const int *indexes;
> @@ -652,10 +661,10 @@ static void pseries_pci_fixup_resources(struct pci_dev *pdev)
>
> /*Firmware must support open sriov otherwise dont configure*/
> indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
> - if (!indexes)
> - return;
> - /* Assign the addresses from device tree*/
> - of_pci_set_vf_bar_size(pdev, indexes);
> + if (indexes)
> + of_pci_set_vf_bar_size(pdev, indexes);
> + else
> + pseries_disable_sriov_resources(pdev);
> }
>
> static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)
> @@ -667,10 +676,10 @@ static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)
> return;
> /*Firmware must support open sriov otherwise dont configure*/
> indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
> - if (!indexes)
> - return;
> - /* Assign the addresses from device tree*/
> - of_pci_parse_iov_addrs(pdev, indexes);
> + if (indexes)
> + of_pci_parse_iov_addrs(pdev, indexes);
> + else
> + pseries_disable_sriov_resources(pdev);
> }
>
> static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
> --
> 2.16.1.74.g9b0b1f47b
>
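To illustrate why zeroing the flags is enough, here is a small user-space model of the failure described in the commit message: a resource with flags set but no parent is rejected, while a resource with flags == 0 is skipped. This is a sketch under those assumptions only; the example_* names and the flag value are hypothetical and it is not the code of pci_enable_resources().

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_RES_MEM 0x200ul	/* hypothetical "memory BAR" flag */

struct example_resource {
	unsigned long flags;
	const struct example_resource *parent;
};

/* Fails (-1) on any resource that is in use (flags != 0) but unclaimed. */
static int example_enable_resources(const struct example_resource *res,
				    size_t n)
{
	size_t i;

	assert(res != NULL);
	for (i = 0; i < n; i++) {
		if (!res[i].flags)
			continue;	/* disabled BAR: nothing to check */
		if (!res[i].parent)
			return -1;	/* flags set but never claimed */
	}
	return 0;
}

/* Same idea as the patch's helper: only the flags are cleared. */
static void example_disable_bars(struct example_resource *res,
				 size_t first, size_t count)
{
	size_t i;

	assert(res != NULL);
	for (i = first; i < first + count; i++)
		res[i].flags = 0;
}
```

An array holding one claimed BAR and one unclaimed IOV BAR fails the check until example_disable_bars() is applied to the IOV slot.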
* [GIT PULL] Please pull powerpc/linux.git powerpc-4.18-3 tag
From: Michael Ellerman @ 2018-06-30 1:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: aneesh.kumar, arnd, hch, leitao, linux-kernel, linuxppc-dev
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hi Linus,
Please pull some more powerpc fixes for 4.18:
The following changes since commit fadd03c615922d8521a2e76d4ba2335891cb2790:
powerpc/mm/hash/4k: Free hugetlb page table caches correctly. (2018-06-20 09:13:25 +1000)
are available in the git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.18-3
for you to fetch changes up to 22db552b50fa11d8c1d171de908a1f9ef62172b7:
powerpc/powermac: Fix rtc read/write functions (2018-06-27 13:48:49 +1000)
- ------------------------------------------------------------------
powerpc fixes for 4.18 #3
Two regression fixes, and a new syscall wire-up.
A fix for the recent conversion to time64_t in the powermac RTC routines, which
caused time to go backward.
Another fix for fallout from the split PMD PTL conversion.
Wire up the new io_pgetevents() syscall.
Thanks to:
Aneesh Kumar K.V, Arnd Bergmann, Breno Leitao, Mathieu Malaterre.
- ------------------------------------------------------------------
Aneesh Kumar K.V (1):
powerpc/mm/32: Fix pgtable_page_dtor call
Arnd Bergmann (1):
powerpc/powermac: Fix rtc read/write functions
Breno Leitao (1):
powerpc: Wire up io_pgetevents
arch/powerpc/include/asm/book3s/32/pgalloc.h | 1 -
arch/powerpc/include/asm/nohash/32/pgalloc.h | 1 -
arch/powerpc/include/asm/systbl.h | 1 +
arch/powerpc/include/asm/unistd.h | 2 +-
arch/powerpc/include/uapi/asm/unistd.h | 1 +
arch/powerpc/platforms/powermac/time.c | 29 +++++++++++++++++++---------
6 files changed, 23 insertions(+), 12 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQIcBAEBCAAGBQJbNuG+AAoJEFHr6jzI4aWA1EAP/RdhBfhErOfaR6CN9hB7JT0f
4ibdfYQzfGLhml40w70BnSR1rJZtQLk3TZUeT4bhjWb9GRLmakXf1iwp0yGle6gX
Y+r54Jy3lqD8dHEn1cONQYAOdADBAv1OFZ54cCnch1yvjOxnFTDn4jR0jjLnM8Jk
fD1VBRgCTJT5lkNpfIP2UtbEcy2Y9E4QCCUDCRaAYTPwJQqjIBcuNmxahObvawhe
U1k9FhPcqwUPD0jjEbEBvwnaPEZzvHOnqU4G2elBmHjBSPCFXVWQ+4AdDlAHK67J
qEx3q6Xycw8ATmonwPxxv8eu7Lq0XgSYXB/zUmjp1pvcOCMdYZ0Lei0yyPJfcFWA
PX9RtlENuMbhz6Pm1INhtcm+yxV9v+nzHAI40kVHh+yZLQTK2zt8jZiqFs3ZHOm5
NY55XIWW1ATnKB9T0Teu/efEDTBHtDtTQ5hmJW88UB3Fd9OVjTspz6aWVlTB4VRr
jxT0VedPt3B/zTwEpIeA6W3/WgveIhK8VZ49ntMqAINw4tsc4gc7a6A98riZ9Oxw
YQ9pPqejx9Jw/2AeYhZ1RA/hV9REICIicX6fxR98fkx2ViqNF8OC3YzvQXNJWe3L
EyRx1TLj/8lb0o5A92rOeeL/5QQK+H5PGoIY9gY6NhsDDSAadtHu1oAhftscVopZ
k94DXR9yFZg+aoU2lWDp
=NS+i
-----END PGP SIGNATURE-----
* Re: [PATCH 1/2] powerpc/pkeys: preallocate execute_only key only if the key is available.
From: Ram Pai @ 2018-06-30 1:40 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Gabriel Paubert, fweimer, mhocko, Ulrich.Weigand, bauerman,
msuchanek, linuxppc-dev
In-Reply-To: <8736x5yts2.fsf@morokweng.localdomain>
On Fri, Jun 29, 2018 at 09:58:37PM -0300, Thiago Jung Bauermann wrote:
>
> Gabriel Paubert <paubert@iram.es> writes:
>
> > On Thu, Jun 28, 2018 at 11:56:34PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Hello,
> >>
> >> Ram Pai <linuxram@us.ibm.com> writes:
> >>
> >> > Key 2 is preallocated and reserved for execute-only key. In rare
> >> > cases if key-2 is unavailable, mprotect(PROT_EXEC) will behave
> >> > incorrectly. NOTE: mprotect(PROT_EXEC) uses execute-only key.
> >> >
> >> > Ensure key 2 is available for preallocation before reserving it for
> >> > execute_only purpose. Problem noticed by Michael Ellerman.
> >>
> >> Since "powerpc/pkeys: Preallocate execute-only key" isn't upstream yet,
> >> this patch could be squashed into it.
> >>
> >> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> >> > ---
> >> > arch/powerpc/mm/pkeys.c | 14 +++++++++-----
> >> > 1 files changed, 9 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
> >> > index cec990c..0b03914 100644
> >> > --- a/arch/powerpc/mm/pkeys.c
> >> > +++ b/arch/powerpc/mm/pkeys.c
> >> > @@ -19,6 +19,7 @@
> >> > u64 pkey_amr_mask; /* Bits in AMR not to be touched */
> >> > u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
> >> > u64 pkey_uamor_mask; /* Bits in UMOR not to be touched */
> >> > +int execute_only_key = 2;
> >> >
> >> > #define AMR_BITS_PER_PKEY 2
> >> > #define AMR_RD_BIT 0x1UL
> >> > @@ -26,7 +27,6 @@
> >> > #define IAMR_EX_BIT 0x1UL
> >> > #define PKEY_REG_BITS (sizeof(u64)*8)
> >> > #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
> >> > -#define EXECUTE_ONLY_KEY 2
> >> >
> >> > static void scan_pkey_feature(void)
> >> > {
> >> > @@ -122,8 +122,12 @@ int pkey_initialize(void)
> >> > #else
> >> > os_reserved = 0;
> >> > #endif
> >> > +
> >> > + if ((pkeys_total - os_reserved) <= execute_only_key)
> >> > + execute_only_key = -1;
> >> > +
> >> > /* Bits are in LE format. */
> >> > - reserved_allocation_mask = (0x1 << 1) | (0x1 << EXECUTE_ONLY_KEY);
> >> > + reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
> >>
> >> My understanding is that left-shifting by a negative amount is undefined
> >> behavior in C. A quick test tells me that at least on the couple of
> >> machines I tested, 1 << -1 = 0. Does GCC guarantee that behavior?
> >
> > Not in general. It probably always works on Power because of the definition
> > of the machine instruction for shifts with variable amount (consider the
> > shift amount unsigned and take it modulo twice the width of the operand),
>
> Ok, thanks for confirming.
>
> > but for example it fails on x86 (1<<-1 gives 0x80000000).
>
> Strange, this works on my laptop with an Intel(R) Core(TM) i5-7300U CPU:
>
> $ cat blah.c
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
> printf("1 << -1 = %llx\n", (unsigned long long) 1 << -1);
> return 0;
> }
> $ make blah
> cc blah.c -o blah
> blah.c: In function 'main':
> blah.c:5:52: warning: left shift count is negative [-Wshift-count-negative]
> printf("1 << -1 = %llx\n", (unsigned long long) 1 << -1);
> ^~
> $ ./blah
> 1 << -1 = 0
My Intel box does the same: the result is zero. So does my powerpc
box. Intuitively, (1 << -1) would be 2^-1, i.e. 1/2, which truncates
to 0 in integer arithmetic.
However, yes, the C standard leaves a negative shift count undefined,
and GCC does warn 'left shift count is negative', so the result cannot
be relied on. Will have to fix it.
Thanks for catching this.
>
> Even if I change the cast and printf format to int, the result is the
> same. Or am I doing it wrong?
>
> >> If so, a comment pointing this out would make this less confusing.
> >
> > Unless I miss something, this code is run once at boot, so its
> > performance is irrelevant.
> >
> > In this case simply rewrite it as:
> >
> > reserved_allocation_mask = 0x1 << 1;
> > if ( (pkeys_total - os_reserved) <= execute_only_key) {
> > execute_only_key = -1;
> > } else {
> > reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
> > }
I tried hard not to sprinkle if-then-else in the code; it makes it less
elegant. But then, correctness is more important than
elegance.
>
> I agree it's clearer and more robust code (except that the first
> line should be inside the if block).
>
> >
> > Caveat, I have assumed that the code will either:
> > - only run once,
> > - pkeys_total and os_reserved are int, not unsigned
>
> Both of the above are true.
yes.
Thanks,
RP
* Re: [PATCH v05 2/9] hotplug/cpu: Add operation queuing function
From: kbuild test robot @ 2018-06-30 1:11 UTC (permalink / raw)
To: Michael Bringmann
Cc: kbuild-all, linuxppc-dev, Nathan Fontenot, Michael Bringmann,
Thomas Falcon, Tyrel Datwyler, John Allen
In-Reply-To: <c5dd8c32-ffef-126a-442a-24b0190641b0@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]
Hi Michael,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18-rc2 next-20180629]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-hotplug-Update-affinity-for-migrated-CPUs/20180630-062238
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc
All errors (new ones prefixed by >>):
arch/powerpc/platforms/pseries/mobility.c: In function 'migration_store':
>> arch/powerpc/platforms/pseries/mobility.c:380:2: error: implicit declaration of function 'dlpar_schedule_delayed_queue'; did you mean 'schedule_delayed_work'? [-Werror=implicit-function-declaration]
dlpar_schedule_delayed_queue();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
schedule_delayed_work
cc1: all warnings being treated as errors
vim +380 arch/powerpc/platforms/pseries/mobility.c
356
357 static ssize_t migration_store(struct class *class,
358 struct class_attribute *attr, const char *buf,
359 size_t count)
360 {
361 u64 streamid;
362 int rc;
363
364 rc = kstrtou64(buf, 0, &streamid);
365 if (rc)
366 return rc;
367
368 do {
369 rc = rtas_ibm_suspend_me(streamid);
370 if (rc == -EAGAIN)
371 ssleep(1);
372 } while (rc == -EAGAIN);
373
374 if (rc)
375 return rc;
376
377 post_mobility_fixup();
378
379 /* Apply any necessary changes identified during fixup */
> 380 dlpar_schedule_delayed_queue();
381
382 return count;
383 }
384
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23378 bytes --]
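One common way to avoid this class of "implicit declaration" error is to declare the function in a shared header and provide an empty inline stub when the feature is compiled out. The sketch below shows that pattern only; whether it applies to dlpar_schedule_delayed_queue() is an assumption, and CONFIG_EXAMPLE_DLPAR_QUEUE and the example_* names are hypothetical.

```c
/* Normally this part would live in a header shared by both files. */
#ifdef CONFIG_EXAMPLE_DLPAR_QUEUE
int example_schedule_delayed_queue(void);	/* real implementation elsewhere */
#else
/* Stub used when the feature is not built: callers still compile. */
static inline int example_schedule_delayed_queue(void)
{
	return 0;
}
#endif

/*
 * Caller modelled loosely on migration_store(): bail out on an earlier
 * error, otherwise apply the queued changes.
 */
static int example_migration_store(int suspend_rc)
{
	if (suspend_rc)
		return suspend_rc;

	return example_schedule_delayed_queue();
}
```

Because the prototype or the stub is always visible, the caller builds in every configuration without an #ifdef at the call site.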
* Re: [PATCH 1/2] powerpc/pkeys: preallocate execute_only key only if the key is available.
From: Thiago Jung Bauermann @ 2018-06-30 0:58 UTC (permalink / raw)
To: Gabriel Paubert
Cc: Ram Pai, fweimer, mhocko, Ulrich.Weigand, bauerman, msuchanek,
linuxppc-dev
In-Reply-To: <20180629060715.36xlai5mayyx6j34@lt-gp.iram.es>
Gabriel Paubert <paubert@iram.es> writes:
> On Thu, Jun 28, 2018 at 11:56:34PM -0300, Thiago Jung Bauermann wrote:
>>
>> Hello,
>>
>> Ram Pai <linuxram@us.ibm.com> writes:
>>
>> > Key 2 is preallocated and reserved for execute-only key. In rare
>> > cases if key-2 is unavailable, mprotect(PROT_EXEC) will behave
>> > incorrectly. NOTE: mprotect(PROT_EXEC) uses execute-only key.
>> >
>> > Ensure key 2 is available for preallocation before reserving it for
>> > execute_only purpose. Problem noticed by Michael Ellerman.
>>
>> Since "powerpc/pkeys: Preallocate execute-only key" isn't upstream yet,
>> this patch could be squashed into it.
>>
>> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
>> > ---
>> > arch/powerpc/mm/pkeys.c | 14 +++++++++-----
>> > 1 files changed, 9 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
>> > index cec990c..0b03914 100644
>> > --- a/arch/powerpc/mm/pkeys.c
>> > +++ b/arch/powerpc/mm/pkeys.c
>> > @@ -19,6 +19,7 @@
>> > u64 pkey_amr_mask; /* Bits in AMR not to be touched */
>> > u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
>> > u64 pkey_uamor_mask; /* Bits in UMOR not to be touched */
>> > +int execute_only_key = 2;
>> >
>> > #define AMR_BITS_PER_PKEY 2
>> > #define AMR_RD_BIT 0x1UL
>> > @@ -26,7 +27,6 @@
>> > #define IAMR_EX_BIT 0x1UL
>> > #define PKEY_REG_BITS (sizeof(u64)*8)
>> > #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
>> > -#define EXECUTE_ONLY_KEY 2
>> >
>> > static void scan_pkey_feature(void)
>> > {
>> > @@ -122,8 +122,12 @@ int pkey_initialize(void)
>> > #else
>> > os_reserved = 0;
>> > #endif
>> > +
>> > + if ((pkeys_total - os_reserved) <= execute_only_key)
>> > + execute_only_key = -1;
>> > +
>> > /* Bits are in LE format. */
>> > - reserved_allocation_mask = (0x1 << 1) | (0x1 << EXECUTE_ONLY_KEY);
>> > + reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
>>
>> My understanding is that left-shifting by a negative amount is undefined
>> behavior in C. A quick test tells me that at least on the couple of
>> machines I tested, 1 << -1 = 0. Does GCC guarantee that behavior?
>
> Not in general. It probably always works on Power because of the definition
> of the machine instruction for shifts with variable amount (consider the
> shift amount unsigned and take it modulo twice the width of the operand),
Ok, thanks for confirming.
> but for example it fails on x86 (1<<-1 gives 0x80000000).
Strange, this works on my laptop with an Intel(R) Core(TM) i5-7300U CPU:
$ cat blah.c
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("1 << -1 = %llx\n", (unsigned long long) 1 << -1);
return 0;
}
$ make blah
cc blah.c -o blah
blah.c: In function 'main':
blah.c:5:52: warning: left shift count is negative [-Wshift-count-negative]
printf("1 << -1 = %llx\n", (unsigned long long) 1 << -1);
^~
$ ./blah
1 << -1 = 0
Even if I change the cast and printf format to int, the result is the
same. Or am I doing it wrong?
>> If so, a comment pointing this out would make this less confusing.
>
> Unless I miss something, this code is run once at boot, so its
> performance is irrelevant.
>
> In this case simply rewrite it as:
>
> reserved_allocation_mask = 0x1 << 1;
> if ( (pkeys_total - os_reserved) <= execute_only_key) {
> execute_only_key = -1;
> } else {
> reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
> }
I agree it's clearer and more robust code (except that the first
line should be inside the if block).
>
> Caveat, I have assumed that the code will either:
> - only run once,
> - pkeys_total and os_reserved are int, not unsigned
Both of the above are true.
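For reference, a minimal sketch of the guarded form discussed above, assuming
pkeys_total and os_reserved are plain ints as confirmed. The helper name
reserved_mask() and its pointer parameter are hypothetical and not part of the
patch; the only point is that the shift is never evaluated with a negative
count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): build the reserved-key mask
 * without ever shifting by a negative amount. Key 1 is always reserved,
 * as in the original code.
 */
uint32_t reserved_mask(int pkeys_total, int os_reserved, int *execute_only_key)
{
	uint32_t mask = UINT32_C(1) << 1;

	/* No room for an execute-only key: disable it and skip the shift. */
	if (*execute_only_key < 0 ||
	    pkeys_total - os_reserved <= *execute_only_key) {
		*execute_only_key = -1;
		return mask;
	}

	/* The shift count must stay below the width of the mask. */
	assert(*execute_only_key < 32);
	return mask | (UINT32_C(1) << *execute_only_key);
}
```

Written this way, the result does not depend on what a given compiler or CPU
happens to do with an out-of-range shift count.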
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
From: Segher Boessenkool @ 2018-06-30 0:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: Larry Finger, Randy Dunlap, Dave Hansen, Lai Jiangshan,
Linux Kernel Mailing List, Matthew Wilcox, Pekka Enberg,
Jerome Glisse, Paul Mackerras, Kirill A. Shutemov,
Martin Schwidefsky, Andrey Ryabinin, Christoph Lameter, ppc-dev,
Andrew Morton, Vlastimil Babka
In-Reply-To: <CA+55aFzZ7PND2Xvz9wB1jaCmp0rBMTSmJtKiFwSeOWy9iLSd8Q@mail.gmail.com>
On Fri, Jun 29, 2018 at 02:01:46PM -0700, Linus Torvalds wrote:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
> But the real question is what the problem was the *first* time around.
> I assume that has scrolled off the screen? This part:
>
> _exception_pkey+0x58/0x128
> ret_from_except_full+0x0/0x4
> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
> LR = free_pgd_range+0x19c/0x30c
> free_pgtables+0xa/0xb
> exit_mmap+0xf4/0x16c
> mmput+0x64/0xf0
>
> Does reverting that commit 1d40a5ea01d5 make everything work for you?
> Because if so, judging by the deafening silence on this so far, I
> think that's what we should do.
>
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?
700 is "program interrupt"; here it probably means a BUG() happened (which
does a trap instruction, which causes a 700). The stuff that scrolled away
should tell more.
Segher
^ permalink raw reply
* [PATCH v02 5/5] migration/memory: Support 'ibm,dynamic-memory-v2'
From: Michael Bringmann @ 2018-06-29 22:13 UTC (permalink / raw)
To: linuxppc-dev
Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
Thomas Falcon
In-Reply-To: <adc67a97-b800-b533-7993-516fc254b6a2@linux.vnet.ibm.com>
migration/memory: This patch adds recognition for changes to the
associativity of memory blocks described by 'ibm,dynamic-memory-v2'.
If the associativity of an LMB has changed, it should be readded to
the system in order to update local and general kernel data structures.
This patch builds upon previous enhancements that scan the device-tree
"ibm,dynamic-memory" properties using the base LMB array, and a copy
derived from the updated properties.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/hotplug-memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 7dcc3e9..5c7d9c0 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -1199,7 +1199,8 @@ static int pseries_memory_notifier(struct notifier_block *nb,
err = pseries_remove_mem_node(rd->dn);
break;
case OF_RECONFIG_UPDATE_PROPERTY:
- if (!strcmp(rd->prop->name, "ibm,dynamic-memory")) {
+ if (!strcmp(rd->prop->name, "ibm,dynamic-memory") ||
+ !strcmp(rd->prop->name, "ibm,dynamic-memory-v2")) {
struct drmem_lmb_info *dinfo =
drmem_lmbs_init(rd->prop);
if (!dinfo)
^ permalink raw reply related
* [PATCH v02 4/5] migration/memory: Evaluate LMB assoc changes
From: Michael Bringmann @ 2018-06-29 22:13 UTC (permalink / raw)
To: linuxppc-dev
Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
Thomas Falcon
In-Reply-To: <adc67a97-b800-b533-7993-516fc254b6a2@linux.vnet.ibm.com>
migration/memory: This patch adds code that recognizes changes to
the associativity of memory blocks described by the device-tree
properties in order to drive equivalent 'hotplug' operations to
update local and general kernel data structures to reflect those
changes. These differences may include:
* Evaluate 'ibm,dynamic-memory' properties when processing the
updated device-tree properties of the system during Post Migration
events (migration_store). The new functionality looks for changes
to the aa_index values for each drc_index/LMB to identify any memory
blocks that should be readded.
* In an LPAR migration scenario, the "ibm,associativity-lookup-arrays"
property may change. In the event that a row of the array differs,
locate all assigned memory blocks with that 'aa_index' and 're-add'
them to the system memory block data structures. In the process of
the 're-add', the system routines will update the corresponding entry
for the memory in the LMB structures and any other relevant kernel
data structures.
A number of previous extensions made to the DRMEM code for scanning
device-tree properties and creating LMB arrays are used here to
ensure that the resulting code is simpler and more usable:
* Use new paired list iterator for the DRMEM LMB info arrays to find
differences in old and new versions of properties.
* Use new iterator for copies of the DRMEM info arrays to evaluate
completely new structures.
* Combine common code for parsing and evaluating memory description
properties based on the DRMEM LMB array model to greatly simplify
extension from the older property 'ibm,dynamic-memory' to the new
property model of 'ibm,dynamic-memory-v2'.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in v02:
-- Modify the code that parses the memory affinity attributes to
mark relevant DRMEM LMB array entries using the internal_flags
mechanism instead of generating unique hotplug actions for each
memory block to be readded. The change is intended to both
simplify the code, and to require fewer resources on systems
with huge amounts of memory.
-- Save up notice of any changed LMB entries until the end of the
'migration_store' operation, at which point a single action is
queued to scan the entire DRMEM array.
-- Add READD_MULTIPLE function for memory that scans the DRMEM
array to identify multiple entries that were marked previously.
The corresponding memory blocks are to be readded to the system
to update relevant data structures outside of the powerpc-
specific code.
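
Note: the row comparison for "ibm,associativity-lookup-arrays" described in
the changelog can be sketched in userspace as follows. This is an
illustration under stated assumptions, not the kernel code: the struct layout
mirrors 'struct assoc_arrays' from the patch, but ala_changed_rows() and its
output buffer are hypothetical names. It follows the same three cases as the
patch (same row width, extra rows, different row width). Note that memcmp()
takes a byte count, so the row width in cells is scaled by the cell size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* View of the property after its two header cells N and M. */
struct assoc_arrays {
	uint32_t n_arrays;      /* N: number of rows */
	uint32_t array_sz;      /* M: cells per row */
	const uint32_t *arrays; /* N * M cells, row-major */
};

/*
 * Hypothetical helper: store the indexes of rows that must be treated as
 * changed in out[] (at most out_cap of them) and return how many were
 * stored.
 */
size_t ala_changed_rows(const struct assoc_arrays *old,
			const struct assoc_arrays *cur,
			uint32_t *out, size_t out_cap)
{
	uint32_t lim, i;
	size_t n = 0;

	assert(old && cur && out);
	lim = old->n_arrays < cur->n_arrays ? old->n_arrays : cur->n_arrays;

	/* Different row widths: no row can be considered unchanged. */
	if (old->array_sz != cur->array_sz) {
		for (i = 0; i < lim && n < out_cap; i++)
			out[n++] = i;
		return n;
	}

	/* Same width: compare each common row over its full byte length. */
	for (i = 0; i < lim && n < out_cap; i++) {
		size_t off = (size_t)i * cur->array_sz;

		if (memcmp(&old->arrays[off], &cur->arrays[off],
			   cur->array_sz * sizeof(uint32_t)) != 0)
			out[n++] = i;
	}

	/* Rows that only exist in the new property count as changed. */
	for (i = lim; i < cur->n_arrays && n < out_cap; i++)
		out[n++] = i;

	return n;
}
```

Each returned index plays the role of an aa_index whose assigned LMBs would
then be marked for re-adding.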
---
arch/powerpc/platforms/pseries/hotplug-memory.c | 216 +++++++++++++++++++----
arch/powerpc/platforms/pseries/mobility.c | 1
arch/powerpc/platforms/pseries/pseries.h | 4
3 files changed, 187 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f5..7dcc3e9 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -561,8 +561,11 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
}
}
- if (!lmb_found)
- rc = -EINVAL;
+ if (!lmb_found) {
+ pr_info("Failed to update memory for drc index %lx\n",
+ (unsigned long) drc_index);
+ return -EINVAL;
+ }
if (rc)
pr_info("Failed to update memory at %llx\n",
@@ -573,6 +576,30 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
return rc;
}
+static int dlpar_memory_readd_multiple(void)
+{
+ struct drmem_lmb *lmb;
+ int rc = 0;
+
+ pr_info("Attempting to update multiple LMBs\n");
+
+ for_each_drmem_lmb(lmb) {
+ if (drmem_lmb_update(lmb)) {
+ rc = dlpar_remove_lmb(lmb);
+
+ if (!rc) {
+ rc = dlpar_add_lmb(lmb);
+ if (rc)
+ dlpar_release_drc(lmb->drc_index);
+ }
+
+ drmem_remove_lmb_update(lmb);
+ }
+ }
+
+ return rc;
+}
+
static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
{
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
@@ -673,6 +700,10 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
{
return -EOPNOTSUPP;
}
+static int dlpar_memory_readd_multiple(void)
+{
+ return -EOPNOTSUPP;
+}
static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
{
@@ -952,6 +983,9 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
drc_index = hp_elog->_drc_u.drc_index;
rc = dlpar_memory_readd_by_index(drc_index);
break;
+ case PSERIES_HP_ELOG_ACTION_READD_MULTIPLE:
+ rc = dlpar_memory_readd_multiple();
+ break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
rc = -EINVAL;
@@ -994,13 +1028,39 @@ static int pseries_add_mem_node(struct device_node *np)
return (ret < 0) ? -EINVAL : 0;
}
-static int pseries_update_drconf_memory(struct of_reconfig_data *pr)
+static int pmt_changes = 0;
+
+void dlpar_memory_pmt_changes_set(void)
+{
+ pmt_changes = 1;
+}
+
+void dlpar_memory_pmt_changes_clear(void)
+{
+ pmt_changes = 0;
+}
+
+int dlpar_memory_pmt_changes(void)
{
- struct of_drconf_cell_v1 *new_drmem, *old_drmem;
+ return pmt_changes;
+}
+
+void dlpar_memory_pmt_changes_action(void)
+{
+ if (dlpar_memory_pmt_changes()) {
+ dlpar_queue_action(
+ PSERIES_HP_ELOG_RESOURCE_MEM,
+ PSERIES_HP_ELOG_ACTION_READD_MULTIPLE,
+ 0);
+ dlpar_memory_pmt_changes_clear();
+ }
+}
+
+static int pseries_update_drconf_memory(struct drmem_lmb_info *new_dinfo)
+{
+ struct drmem_lmb *old_lmb, *new_lmb;
unsigned long memblock_size;
- u32 entries;
- __be32 *p;
- int i, rc = -EINVAL;
+ int rc = 0;
if (rtas_hp_event)
return 0;
@@ -1009,42 +1069,122 @@ static int pseries_update_drconf_memory(struct of_reconfig_data *pr)
if (!memblock_size)
return -EINVAL;
- p = (__be32 *) pr->old_prop->value;
- if (!p)
- return -EINVAL;
+ /* Arrays should have the same size and DRC indexes */
+ for_each_pair_dinfo_lmb(drmem_info, old_lmb, new_dinfo, new_lmb) {
- /* The first int of the property is the number of lmb's described
- * by the property. This is followed by an array of of_drconf_cell
- * entries. Get the number of entries and skip to the array of
- * of_drconf_cell's.
- */
- entries = be32_to_cpu(*p++);
- old_drmem = (struct of_drconf_cell_v1 *)p;
-
- p = (__be32 *)pr->prop->value;
- p++;
- new_drmem = (struct of_drconf_cell_v1 *)p;
+ if (new_lmb->drc_index != old_lmb->drc_index)
+ continue;
- for (i = 0; i < entries; i++) {
- if ((be32_to_cpu(old_drmem[i].flags) & DRCONF_MEM_ASSIGNED) &&
- (!(be32_to_cpu(new_drmem[i].flags) & DRCONF_MEM_ASSIGNED))) {
+ if ((old_lmb->flags & DRCONF_MEM_ASSIGNED) &&
+ (!(new_lmb->flags & DRCONF_MEM_ASSIGNED))) {
rc = pseries_remove_memblock(
- be64_to_cpu(old_drmem[i].base_addr),
- memblock_size);
+ old_lmb->base_addr, memblock_size);
break;
- } else if ((!(be32_to_cpu(old_drmem[i].flags) &
- DRCONF_MEM_ASSIGNED)) &&
- (be32_to_cpu(new_drmem[i].flags) &
- DRCONF_MEM_ASSIGNED)) {
- rc = memblock_add(be64_to_cpu(old_drmem[i].base_addr),
- memblock_size);
+ } else if ((!(old_lmb->flags & DRCONF_MEM_ASSIGNED)) &&
+ (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
+ rc = memblock_add(old_lmb->base_addr,
+ memblock_size);
rc = (rc < 0) ? -EINVAL : 0;
break;
+ } else if ((old_lmb->aa_index != new_lmb->aa_index) &&
+ (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
+ drmem_mark_lmb_update(old_lmb);
+ dlpar_memory_pmt_changes_set();
}
}
return rc;
}
+static void pseries_update_ala_memory_aai(int aa_index)
+{
+ struct drmem_lmb *lmb;
+
+ /* Readd all LMBs which were previously using the
+ * specified aa_index value.
+ */
+ for_each_drmem_lmb(lmb) {
+ if ((lmb->aa_index == aa_index) &&
+ (lmb->flags & DRCONF_MEM_ASSIGNED)) {
+ drmem_mark_lmb_update(lmb);
+ dlpar_memory_pmt_changes_set();
+ }
+ }
+}
+
+struct assoc_arrays {
+ u32 n_arrays;
+ u32 array_sz;
+ const __be32 *arrays;
+};
+
+static int pseries_update_ala_memory(struct of_reconfig_data *pr)
+{
+ struct assoc_arrays new_ala, old_ala;
+ __be32 *p;
+ int i, lim;
+
+ if (rtas_hp_event)
+ return 0;
+
+ /*
+ * The layout of the ibm,associativity-lookup-arrays
+ * property is a number N indicating the number of
+ * associativity arrays, followed by a number M
+ * indicating the size of each associativity array,
+ * followed by a list of N associativity arrays.
+ */
+
+ p = (__be32 *) pr->old_prop->value;
+ if (!p)
+ return -EINVAL;
+ old_ala.n_arrays = of_read_number(p++, 1);
+ old_ala.array_sz = of_read_number(p++, 1);
+ old_ala.arrays = p;
+
+ p = (__be32 *) pr->prop->value;
+ if (!p)
+ return -EINVAL;
+ new_ala.n_arrays = of_read_number(p++, 1);
+ new_ala.array_sz = of_read_number(p++, 1);
+ new_ala.arrays = p;
+
+ lim = (new_ala.n_arrays > old_ala.n_arrays) ? old_ala.n_arrays :
+ new_ala.n_arrays;
+
+ if (old_ala.array_sz == new_ala.array_sz) {
+
+ /* Reset any entries where the old and new rows of
+ * the array differ.
+ */
+ for (i = 0; i < lim; i++) {
+ int index = (i * new_ala.array_sz);
+
+ if (!memcmp(&old_ala.arrays[index],
+ &new_ala.arrays[index],
+ new_ala.array_sz * sizeof(__be32)))
+ continue;
+
+ pseries_update_ala_memory_aai(i);
+ }
+
+ /* Reset any entries representing the extra rows.
+ * There shouldn't be any, but just in case ...
+ */
+ for (i = lim; i < new_ala.n_arrays; i++)
+ pseries_update_ala_memory_aai(i);
+
+ } else {
+ /* Update all entries representing these rows;
+ * as all rows have different sizes, none can
+ * have equivalent values.
+ */
+ for (i = 0; i < lim; i++)
+ pseries_update_ala_memory_aai(i);
+ }
+
+ return 0;
+}
+
static int pseries_memory_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -1059,8 +1199,16 @@ static int pseries_memory_notifier(struct notifier_block *nb,
err = pseries_remove_mem_node(rd->dn);
break;
case OF_RECONFIG_UPDATE_PROPERTY:
- if (!strcmp(rd->prop->name, "ibm,dynamic-memory"))
- err = pseries_update_drconf_memory(rd);
+ if (!strcmp(rd->prop->name, "ibm,dynamic-memory")) {
+ struct drmem_lmb_info *dinfo =
+ drmem_lmbs_init(rd->prop);
+ if (!dinfo)
+ return -EINVAL;
+ err = pseries_update_drconf_memory(dinfo);
+ drmem_lmbs_free(dinfo);
+ } else if (!strcmp(rd->prop->name,
+ "ibm,associativity-lookup-arrays"))
+ err = pseries_update_ala_memory(rd);
break;
}
return notifier_from_errno(err);
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index d0d1cae..e245a88 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -380,6 +380,7 @@ static ssize_t migration_store(struct class *class,
post_mobility_fixup();
/* Apply any necessary changes identified during fixup */
+ dlpar_memory_pmt_changes_action();
dlpar_queued_actions_run();
return count;
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 72ca996..a4e3e08 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -71,6 +71,10 @@ static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
return -EOPNOTSUPP;
}
#endif
+void dlpar_memory_pmt_changes_set(void);
+void dlpar_memory_pmt_changes_clear(void);
+int dlpar_memory_pmt_changes(void);
+void dlpar_memory_pmt_changes_action(void);
#ifdef CONFIG_HOTPLUG_CPU
int dlpar_cpu(struct pseries_hp_errorlog *hp_elog);
^ permalink raw reply related
* [PATCH v02 3/5] migration/memory: Add hotplug READD_MULTIPLE
From: Michael Bringmann @ 2018-06-29 22:13 UTC (permalink / raw)
To: linuxppc-dev
Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
Thomas Falcon
In-Reply-To: <adc67a97-b800-b533-7993-516fc254b6a2@linux.vnet.ibm.com>
migration/memory: This patch adds a new pseries hotplug action
for CPU and memory operations, PSERIES_HP_ELOG_ACTION_READD_MULTIPLE.
This is a variant of the READD operation which performs the action
upon multiple instances of the resource at one time. The operation
is to be triggered by device-tree analysis of updates by RTAS events
analyzed by 'migration_store' during post-migration processing. It
will be used for memory updates, initially.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 1 +
arch/powerpc/mm/drmem.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 4ab605a..bb4294b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -322,6 +322,7 @@ struct pseries_hp_errorlog {
#define PSERIES_HP_ELOG_ACTION_ADD 1
#define PSERIES_HP_ELOG_ACTION_REMOVE 2
#define PSERIES_HP_ELOG_ACTION_READD 3
+#define PSERIES_HP_ELOG_ACTION_READD_MULTIPLE 4
#define PSERIES_HP_ELOG_ID_DRC_NAME 1
#define PSERIES_HP_ELOG_ID_DRC_INDEX 2
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index fd2cae92..2228586 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -422,6 +422,7 @@ static void init_drmem_v2_lmbs(const __be32 *prop,
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
+ lmb->internal_flags = 0;
}
}
}
^ permalink raw reply related
* [PATCH v02 2/5] powerpc/drmem: Add internal_flags feature
From: Michael Bringmann @ 2018-06-29 22:13 UTC (permalink / raw)
To: linuxppc-dev
Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
Thomas Falcon
In-Reply-To: <adc67a97-b800-b533-7993-516fc254b6a2@linux.vnet.ibm.com>
powerpc/drmem: Add internal_flags field to each LMB to allow
marking of kernel software-specific operations that need not
be exported to other users. For instance, if information about
selected LMBs needs to be maintained for subsequent passes
through the system, it can be encoded into the LMB array itself
without requiring the allocation and maintenance of additional
data structures.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
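
Note: the intended use of the new field, marking entries in one pass and
acting on them in a later one, can be sketched in userspace like this. The
struct and function names below are hypothetical stand-ins for the kernel
ones; only the bit value matches DRMEM_LMBINT_UPDATE from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LMBINT_UPDATE 0x00000001u /* same bit as DRMEM_LMBINT_UPDATE */

/* Cut-down stand-in for struct drmem_lmb. */
struct lmb {
	uint32_t drc_index;
	uint32_t flags;          /* firmware-visible flags, left untouched */
	uint32_t internal_flags; /* kernel-private bookkeeping */
};

void lmb_mark_update(struct lmb *l)
{
	l->internal_flags |= LMBINT_UPDATE;
}

void lmb_clear_update(struct lmb *l)
{
	l->internal_flags &= ~LMBINT_UPDATE;
}

bool lmb_needs_update(const struct lmb *l)
{
	return (l->internal_flags & LMBINT_UPDATE) != 0;
}

/*
 * Later pass: visit every marked entry once, clear its mark and return
 * how many entries were handled.
 */
size_t lmb_sweep(struct lmb *lmbs, size_t n)
{
	size_t handled = 0;

	assert(lmbs || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!lmb_needs_update(&lmbs[i]))
			continue;
		/* The real code would remove and re-add the block here. */
		lmb_clear_update(&lmbs[i]);
		handled++;
	}
	return handled;
}
```

Because the mark lives in the array entry itself, no side list has to be
allocated, and a second sweep finds nothing left to do.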
arch/powerpc/include/asm/drmem.h | 18 ++++++++++++++++++
arch/powerpc/mm/drmem.c | 2 ++
2 files changed, 20 insertions(+)
diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index b0e70fd..acb6539 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -17,6 +17,7 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
+ u32 internal_flags;
};
struct drmem_lmb_info {
@@ -101,6 +102,23 @@ static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb)
return lmb->flags & DRMEM_LMB_RESERVED;
}
+#define DRMEM_LMBINT_UPDATE 0x00000001
+
+static inline void drmem_mark_lmb_update(struct drmem_lmb *lmb)
+{
+ lmb->internal_flags |= DRMEM_LMBINT_UPDATE;
+}
+
+static inline void drmem_remove_lmb_update(struct drmem_lmb *lmb)
+{
+ lmb->internal_flags &= ~DRMEM_LMBINT_UPDATE;
+}
+
+static inline bool drmem_lmb_update(struct drmem_lmb *lmb)
+{
+ return lmb->internal_flags & DRMEM_LMBINT_UPDATE;
+}
+
u64 drmem_lmb_memory_max(void);
void __init walk_drmem_lmbs(struct device_node *dn,
void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 13d2abb..fd2cae92 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -207,6 +207,7 @@ static void read_drconf_v1_cell(struct drmem_lmb *lmb,
lmb->aa_index = of_read_number(p++, 1);
lmb->flags = of_read_number(p++, 1);
+ lmb->internal_flags = 0;
*prop = p;
}
@@ -265,6 +266,7 @@ static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
lmb.aa_index = dr_cell.aa_index;
lmb.flags = dr_cell.flags;
+ lmb.internal_flags = 0;
func(&lmb, &usm);
}
^ permalink raw reply related
* [PATCH v02 1/5] powerpc/drmem: Export 'dynamic-memory' loader
From: Michael Bringmann @ 2018-06-29 22:13 UTC (permalink / raw)
To: linuxppc-dev
Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
Thomas Falcon
In-Reply-To: <adc67a97-b800-b533-7993-516fc254b6a2@linux.vnet.ibm.com>
powerpc/drmem: Export many of the functions of DRMEM to parse
"ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during hotplug
operations and for Post Migration events.
Also modify the DRMEM initialization code to allow it to,
* Be called after system initialization
* Provide a separate user copy of the LMB array that it produces
* Free the user copy upon request
In addition, a couple of changes were made to make the creation
of additional copies of the LMB array more useful including,
* Add new iterator to work through a pair of drmem_info arrays.
* Modify DRMEM code to replace usages of dt_root_addr_cells, and
dt_mem_next_cell, as these are only available at first boot.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
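
Note: the paired iterator can be illustrated with a small userspace sketch.
The types and the names for_each_lmb_pair() and count_aa_changes() are
hypothetical stand-ins, not the kernel definitions. One deliberate difference
from the macro in this patch: the sketch bounds the walk with a one-past-end
pointer, so an array with zero entries is simply skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for struct drmem_lmb and struct drmem_lmb_info. */
struct lmb {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t aa_index;
};

struct lmb_info {
	struct lmb *lmbs;
	size_t n_lmbs;
};

/* Walk two LMB arrays in lockstep; stops at the end of the shorter one. */
#define for_each_lmb_pair(a, pa, b, pb)				\
	for ((pa) = (a)->lmbs, (pb) = (b)->lmbs;		\
	     (pa) < (a)->lmbs + (a)->n_lmbs &&			\
	     (pb) < (b)->lmbs + (b)->n_lmbs;			\
	     (pa)++, (pb)++)

/* Count entries with matching DRC index whose aa_index differs. */
size_t count_aa_changes(const struct lmb_info *old,
			const struct lmb_info *cur)
{
	const struct lmb *o, *c;
	size_t changed = 0;

	assert(old && cur);
	for_each_lmb_pair(old, o, cur, c) {
		if (o->drc_index != c->drc_index)
			continue;
		if (o->aa_index != c->aa_index)
			changed++;
	}
	return changed;
}
```

The comparison in the loop body is the kind of old-versus-new check that a
caller holding a separate copy of the LMB array would perform.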
arch/powerpc/include/asm/drmem.h | 15 ++++++++
arch/powerpc/mm/drmem.c | 75 ++++++++++++++++++++++++++++----------
2 files changed, 70 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index ce242b9..b0e70fd 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -35,6 +35,18 @@ struct drmem_lmb_info {
&drmem_info->lmbs[0], \
&drmem_info->lmbs[drmem_info->n_lmbs - 1])
+#define for_each_dinfo_lmb(dinfo, lmb) \
+ for_each_drmem_lmb_in_range((lmb), \
+ &dinfo->lmbs[0], \
+ &dinfo->lmbs[dinfo->n_lmbs - 1])
+
+#define for_each_pair_dinfo_lmb(dinfo1, lmb1, dinfo2, lmb2) \
+ for ((lmb1) = (&dinfo1->lmbs[0]), \
+ (lmb2) = (&dinfo2->lmbs[0]); \
+ ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) && \
+ ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1])); \
+ (lmb1)++, (lmb2)++)
+
/*
* The of_drconf_cell_v1 struct defines the layout of the LMB data
* specified in the ibm,dynamic-memory device tree property.
@@ -94,6 +106,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
void (*func)(struct drmem_lmb *, const __be32 **));
int drmem_update_dt(void);
+struct drmem_lmb_info *drmem_lmbs_init(struct property *prop);
+void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
+
#ifdef CONFIG_PPC_PSERIES
void __init walk_drmem_lmbs_early(unsigned long node,
void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 3f18036..13d2abb 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -20,6 +20,7 @@
static struct drmem_lmb_info __drmem_info;
struct drmem_lmb_info *drmem_info = &__drmem_info;
+static int n_root_addr_cells;
u64 drmem_lmb_memory_max(void)
{
@@ -193,12 +194,13 @@ int drmem_update_dt(void)
return rc;
}
-static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
+static void read_drconf_v1_cell(struct drmem_lmb *lmb,
const __be32 **prop)
{
const __be32 *p = *prop;
- lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+ lmb->base_addr = of_read_number(p, n_root_addr_cells);
+ p += n_root_addr_cells;
lmb->drc_index = of_read_number(p++, 1);
p++; /* skip reserved field */
@@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
*prop = p;
}
-static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
void (*func)(struct drmem_lmb *, const __be32 **))
{
struct drmem_lmb lmb;
@@ -225,13 +227,14 @@ static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
}
}
-static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
+static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
const __be32 **prop)
{
const __be32 *p = *prop;
dr_cell->seq_lmbs = of_read_number(p++, 1);
- dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+ dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
+ p += n_root_addr_cells;
dr_cell->drc_index = of_read_number(p++, 1);
dr_cell->aa_index = of_read_number(p++, 1);
dr_cell->flags = of_read_number(p++, 1);
@@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
*prop = p;
}
-static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
void (*func)(struct drmem_lmb *, const __be32 **))
{
struct of_drconf_cell_v2 dr_cell;
@@ -275,6 +278,9 @@ void __init walk_drmem_lmbs_early(unsigned long node,
const __be32 *prop, *usm;
int len;
+ if (n_root_addr_cells == 0)
+ n_root_addr_cells = dt_root_addr_cells;
+
prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
if (!prop || len < dt_root_size_cells * sizeof(__be32))
return;
@@ -353,24 +359,26 @@ void __init walk_drmem_lmbs(struct device_node *dn,
}
}
-static void __init init_drmem_v1_lmbs(const __be32 *prop)
+static void init_drmem_v1_lmbs(const __be32 *prop,
+ struct drmem_lmb_info *dinfo)
{
struct drmem_lmb *lmb;
- drmem_info->n_lmbs = of_read_number(prop++, 1);
- if (drmem_info->n_lmbs == 0)
+ dinfo->n_lmbs = of_read_number(prop++, 1);
+ if (dinfo->n_lmbs == 0)
return;
- drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
+ dinfo->lmbs = kcalloc(dinfo->n_lmbs, sizeof(*lmb),
GFP_KERNEL);
- if (!drmem_info->lmbs)
+ if (!dinfo->lmbs)
return;
- for_each_drmem_lmb(lmb)
+ for_each_dinfo_lmb(dinfo, lmb)
read_drconf_v1_cell(lmb, &prop);
}
-static void __init init_drmem_v2_lmbs(const __be32 *prop)
+static void init_drmem_v2_lmbs(const __be32 *prop,
+ struct drmem_lmb_info *dinfo)
{
struct drmem_lmb *lmb;
struct of_drconf_cell_v2 dr_cell;
@@ -386,12 +394,12 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
p = prop;
for (i = 0; i < lmb_sets; i++) {
read_drconf_v2_cell(&dr_cell, &p);
- drmem_info->n_lmbs += dr_cell.seq_lmbs;
+ dinfo->n_lmbs += dr_cell.seq_lmbs;
}
- drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
+ dinfo->lmbs = kcalloc(dinfo->n_lmbs, sizeof(*lmb),
GFP_KERNEL);
- if (!drmem_info->lmbs)
+ if (!dinfo->lmbs)
return;
/* second pass, read in the LMB information */
@@ -402,10 +410,10 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
read_drconf_v2_cell(&dr_cell, &p);
for (j = 0; j < dr_cell.seq_lmbs; j++) {
- lmb = &drmem_info->lmbs[lmb_index++];
+ lmb = &dinfo->lmbs[lmb_index++];
lmb->base_addr = dr_cell.base_addr;
- dr_cell.base_addr += drmem_info->lmb_size;
+ dr_cell.base_addr += dinfo->lmb_size;
lmb->drc_index = dr_cell.drc_index;
dr_cell.drc_index++;
@@ -416,11 +424,38 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
}
}
+void drmem_lmbs_free(struct drmem_lmb_info *dinfo)
+{
+ if (dinfo) {
+ kfree(dinfo->lmbs);
+ kfree(dinfo);
+ }
+}
+
+struct drmem_lmb_info *drmem_lmbs_init(struct property *prop)
+{
+ struct drmem_lmb_info *dinfo;
+
+ dinfo = kzalloc(sizeof(*dinfo), GFP_KERNEL);
+ if (!dinfo)
+ return NULL;
+
+ if (!strcmp("ibm,dynamic-memory", prop->name))
+ init_drmem_v1_lmbs(prop->value, dinfo);
+ else if (!strcmp("ibm,dynamic-memory-v2", prop->name))
+ init_drmem_v2_lmbs(prop->value, dinfo);
+
+ return dinfo;
+}
+
static int __init drmem_init(void)
{
struct device_node *dn;
const __be32 *prop;
+ if (n_root_addr_cells == 0)
+ n_root_addr_cells = dt_root_addr_cells;
+
dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
if (!dn) {
pr_info("No dynamic reconfiguration memory found\n");
@@ -434,11 +469,11 @@ static int __init drmem_init(void)
prop = of_get_property(dn, "ibm,dynamic-memory", NULL);
if (prop) {
- init_drmem_v1_lmbs(prop);
+ init_drmem_v1_lmbs(prop, drmem_info);
} else {
prop = of_get_property(dn, "ibm,dynamic-memory-v2", NULL);
if (prop)
- init_drmem_v2_lmbs(prop);
+ init_drmem_v2_lmbs(prop, drmem_info);
}
of_node_put(dn);
^ permalink raw reply related
* [PATCH v02 0/5] powerpc/migration: Affinity fix for memory
From: Michael Bringmann @ 2018-06-29 22:12 UTC (permalink / raw)
To: linuxppc-dev
Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
Thomas Falcon
The migration of LPARs across Power systems affects many attributes
including that of the associativity of memory blocks. The patches
in this set execute when a system is coming up fresh upon a migration
target. They are intended to,
* Recognize changes to the associativity of memory recorded in
internal data structures when compared to the latest copies in
the device tree (e.g. ibm,dynamic-memory, ibm,dynamic-memory-v2).
* Recognize changes to the associativity mapping (e.g.
ibm,associativity-lookup-arrays), locate all assigned memory blocks
corresponding to each changed row, and readd all such blocks.
* Generate calls to other code layers to reset the data structures
related to associativity of memory.
* Re-register the 'changed' entities into the target system.
Re-registration of memory blocks mostly entails acting as if they
have been newly hot-added into the target system.
This code builds upon features introduced in a previous patch set
that updates CPUs for affinity changes that may occur during LPM.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Michael Bringmann (5):
powerpc/drmem: Export 'dynamic-memory' loader
powerpc/drmem: Add internal_flags feature
migration/memory: Add hotplug flags READD_MULTIPLE
migration/memory: Evaluate LMB assoc changes
migration/memory: Support 'ibm,dynamic-memory-v2'
---
Changes in v02:
-- Change operation to tag changed LMBs in DRMEM array instead
of queuing a potentially huge number of structures.
-- Added another hotplug queue event for CPU/memory operations
-- Added internal_flags feature to DRMEM
-- Improve the patch description language for the patch set.
^ permalink raw reply