* Re: [PATCH v4 0/3] powernv/idle: Power9 idle cleanup
From: Michael Ellerman @ 2020-07-24 13:25 UTC
To: npiggin, ego, pratik.r.sampat, Pratik Rajesh Sampat, linuxppc-dev,
mikey, paulus, benh, svaidy, linux-kernel, mpe
In-Reply-To: <20200721153708.89057-1-psampat@linux.ibm.com>
On Tue, 21 Jul 2020 21:07:05 +0530, Pratik Rajesh Sampat wrote:
> v3: https://lkml.org/lkml/2020/7/17/1093
> Changelog v3-->v4:
> Based on comments from Nicholas Piggin and Gautham Shenoy,
> 1. Changed the naming of pnv_first_spr_loss_level from
> pnv_first_fullstate_loss_level to deep_spr_loss_state
> 2. Make the P9 PVR check only on the top level function
> pnv_probe_idle_states and let the rest of the checks be DT based because
> it is faster to do so
>
> [...]
Applied to powerpc/next.
[1/3] powerpc/powernv/idle: Replace CPU feature check with PVR check
https://git.kernel.org/powerpc/c/8747bf36f312356f8a295a0c39ff092d65ce75ae
[2/3] powerpc/powernv/idle: Rename pnv_first_spr_loss_level variable
https://git.kernel.org/powerpc/c/dcbbfa6b05daca94ebcdbce80a7cf05c717d2942
[3/3] powerpc/powernv/idle: Exclude mfspr on HID1, 4, 5 on P9 and above
https://git.kernel.org/powerpc/c/5c92fb1b46102e1efe0eed69e743f711bc1c7d2e
cheers
* Re: [PATCH v2 1/2] powerpc/mce: Add MCE notification chain
From: Michael Ellerman @ 2020-07-24 13:25 UTC
To: Santosh Sivaraj, linuxppc-dev
Cc: Aneesh Kumar K.V, Ganesh Goudar, Oliver, Mahesh Salgaonkar,
Vaibhav Jain
In-Reply-To: <20200709135142.721504-1-santosh@fossix.org>
On Thu, 9 Jul 2020 19:21:41 +0530, Santosh Sivaraj wrote:
> Introduce a notification chain which lets us know about uncorrected memory
> errors (UE). This would help prospective users in the pmem or nvdimm subsystems
> to track bad blocks for better handling of persistent memory allocations.
Applied to powerpc/next.
[1/2] powerpc/mce: Add MCE notification chain
https://git.kernel.org/powerpc/c/c37a63afc429ce959402168f67e4f094ab639ace
[2/2] powerpc/papr/scm: Add bad memory ranges to nvdimm bad ranges
https://git.kernel.org/powerpc/c/85343a8da2d969df1a10ada8f7cb857d52ea70a6
cheers
* Re: [PATCH trivial] ppc64/mm: remove comment that is no longer valid
From: Michael Ellerman @ 2020-07-24 13:25 UTC
To: Santosh Sivaraj, linuxppc-dev
In-Reply-To: <20200721091915.205006-1-santosh@fossix.org>
On Tue, 21 Jul 2020 14:49:15 +0530, Santosh Sivaraj wrote:
> hash_low_64.S was removed in [1], and flush_hash_page is no longer
> called from any assembly routine, so the comment is no longer valid.
>
> [1]: commit a43c0eb8364c0 ("powerpc/mm: Convert 4k insert from asm to C")
Applied to powerpc/next.
[1/1] powerpc/mm/hash64: Remove comment that is no longer valid
https://git.kernel.org/powerpc/c/69507b984ddce803df81215cc7813825189adafa
cheers
* Re: [PATCH -next] powerpc: Remove unneeded inline functions
From: Michael Ellerman @ 2020-07-24 13:25 UTC
To: npiggin, haren, paulus, dave.hansen, benh, YueHaibing, mpe
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200717112714.19304-1-yuehaibing@huawei.com>
On Fri, 17 Jul 2020 19:27:14 +0800, YueHaibing wrote:
> Both of those functions are only called from 64-bit only code, so the
> stubs should not be needed at all.
Applied to powerpc/next.
[1/1] powerpc: Remove unneeded inline functions
https://git.kernel.org/powerpc/c/a3f3f8aa1f72dafe1450ccf8cbdfb1d12d42853a
cheers
* Re: [PATCH 1/1 V4] : PCIE PHB reset
From: Michael Ellerman @ 2020-07-24 13:25 UTC
To: linuxppc-dev, wenxiong@linux.vnet.ibm.com
Cc: brking, oohall, bobroff, wenxiong
In-Reply-To: <1594651173-32166-1-git-send-email-wenxiong@linux.vnet.ibm.com>
On Mon, 13 Jul 2020 09:39:33 -0500, wenxiong@linux.vnet.ibm.com wrote:
> Several device drivers hit EEH (Enhanced Error Handling) when triggering
> kdump on pSeries PowerVM. This patch implements a reset of the PHBs
> in generic PCI code when triggering kdump. The PHB reset stops all PCI
> transactions from the normal kernel. We have tested the patch in several
> environments:
> - direct slot adapters
> - adapters under the switch
> - a VF adapter in PowerVM
> - a VF adapter/adapter in KVM guest.
>
> [...]
Applied to powerpc/next.
[1/1] powerpc/pseries: PCIE PHB reset
https://git.kernel.org/powerpc/c/5a090f7c363fdc09b99222eae679506a58e7cc68
cheers
* Re: [PATCH v4 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
From: Hari Bathini @ 2020-07-24 14:08 UTC
To: Thiago Jung Bauermann
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <875zad6ajx.fsf@morokweng.localdomain>
On 24/07/20 5:36 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
>> Kdump kernel, used for capturing the kernel core image, is supposed
>> to use only specific memory regions to avoid corrupting the image to
>> be captured. The regions are crashkernel range - the memory reserved
>> explicitly for kdump kernel, memory used for the tce-table, the OPAL
>> region and RTAS region as applicable. Restrict kdump kernel memory
>> to use only these regions by setting up usable-memory DT property.
>> Also, tell the kdump kernel to run at the loaded address by setting
>> the magic word at 0x5c.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> Tested-by: Pingfan Liu <piliu@redhat.com>
>> ---
>>
>> v3 -> v4:
>> * Updated get_node_path() to be an iterative function instead of a
>> recursive one.
>> * Added comment explaining why low memory is added to kdump kernel's
>> usable memory ranges though it doesn't fall in crashkernel region.
>> * For correctness, added fdt_add_mem_rsv() for the low memory being
>> added to kdump kernel's usable memory ranges.
>
> Good idea.
>
>> * Fixed prop pointer update in add_usable_mem_property() and changed
>> duple to tuple as suggested by Thiago.
>
> <snip>
>
>> +/**
>> + * get_node_pathlen - Get the full path length of the given node.
>> + * @dn: Node.
>> + *
>> + * Also, counts '/' at the end of the path.
>> + * For example, /memory@0 will be "/memory@0/\0" => 11 bytes.
>
> Wouldn't this function return 10 in the case of /memory@0?
Actually, it does return 11. The +1 on return is for counting the terminating NUL.
On top of that, we count an extra '/' for the root node, so it ends up as 11
('/'memory@0'/''\0'). Note the extra '/' before '\0'. Let me handle the root node
separately. That should avoid the confusion.
>> + *
>> + * Returns the string length of the node's full path.
>> + */
>
> Maybe it's me (by analogy with strlen()), but I would expect "string
> length" to not include the terminating \0. I suggest renaming the
> function to something like get_node_path_size() and do s/length/size/ in
> the comment above if it's supposed to count the terminating \0.
Sure, will update the function name.
Thanks
Hari
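A size helper along the lines agreed on above (root node handled separately, result counting the terminating NUL) can be sketched in C. This is an illustration, not the function from the series: `struct node` is a hypothetical stand-in for `struct device_node`, with the root modelled as a node whose parent is NULL. With the root special-cased, this sketch yields 10 bytes for /memory@0, the value Thiago expected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node: root has parent == NULL. */
struct node {
	const char *name;
	const struct node *parent;
};

/*
 * Size in bytes of the node's full path, including the terminating NUL.
 * The root is handled separately so it contributes only "/".
 */
static size_t get_node_path_size(const struct node *dn)
{
	size_t size = 0;

	assert(dn != NULL);
	if (dn->parent == NULL)
		return 2;               /* "/" + '\0' */

	/* Walk up iteratively: each non-root component adds '/' + name. */
	for (; dn->parent != NULL; dn = dn->parent)
		size += 1 + strlen(dn->name);

	return size + 1;                /* terminating '\0' */
}
```

Walking parents iteratively mirrors the v4 change from a recursive to an iterative path helper and avoids unbounded recursion on deep trees.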
* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Nathan Lynch @ 2020-07-24 16:50 UTC
To: Pingfan Liu
Cc: cheloha, Kexec Mailing List, ldufour, linuxppc-dev, Hari Bathini
In-Reply-To: <CAFgQCTu_QO=50v2J0=aY2iV8P-oM82_Kfw9My600ZARUt01grw@mail.gmail.com>
Pingfan Liu <kernelfans@gmail.com> writes:
> On Thu, Jul 23, 2020 at 9:27 PM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>> Pingfan Liu <kernelfans@gmail.com> writes:
>> > This will introduce extra dt updating payload for each involved lmb when hotplug.
>> > But it should be fine since drmem_update_dt() is memory based operation and
>> > hotplug is not a hot path.
>>
>> This is great analysis but the performance implications of the change
>> are grave. The add/remove paths here are already O(n) where n is the
>> quantity of memory assigned to the LP, this change would make it O(n^2):
>>
>> dlpar_memory_add_by_count
>> for_each_drmem_lmb <--
>> dlpar_add_lmb
>> drmem_update_dt(_v1|_v2)
>> for_each_drmem_lmb <--
>>
>> Memory add/remove isn't a hot path but quadratic runtime complexity
>> isn't acceptable. Its current performance is bad enough that I have
> Yes, the quadratic runtime complexity sounds terrible.
> And I am curious about the bug. Does the system have thousands of lmb?
Yes.
>> Not to mention we leak memory every time drmem_update_dt is called
>> because we can't safely free device tree properties :-(
> Do you know what blocks us from freeing it?
It's a longstanding problem. References to device tree properties aren't
counted or tracked so there's no way to safely free them unless the node
itself is released. But the ibm,dynamic-reconfiguration-memory node does
not ever go away and its properties are only subject to updates.
Maybe there's a way to address the specific case of
ibm,dynamic-reconfiguration-memory and the ibm,dynamic-memory(-v2)
properties, instead of tackling the general problem.
Regardless of all that, the drmem code needs better data structures and
lookup functions.
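The complexity argument above can be made concrete with a toy cost model in C that counts LMB visits. It is not kernel code: `update_dt_cost()` is a hypothetical stand-in for one `drmem_update_dt()` pass over all n LMBs, and the comparison assumes the alternative rewrites the property once after the whole add operation. With n LMBs and k of them added, per-LMB updates cost n + k*n visits, versus 2n for a single update.

```c
#include <assert.h>
#include <stddef.h>

/* One drmem_update_dt()-style pass walks every LMB once. */
static size_t update_dt_cost(size_t nr_lmbs)
{
	return nr_lmbs;
}

/* Proposed change: rewrite the DT property after each LMB is added. */
static size_t cost_update_per_lmb(size_t nr_lmbs, size_t nr_to_add)
{
	size_t visited = 0, added = 0;

	assert(nr_to_add <= nr_lmbs);
	for (size_t i = 0; i < nr_lmbs; i++) {   /* for_each_drmem_lmb */
		visited++;
		if (added < nr_to_add) {
			added++;                          /* dlpar_add_lmb */
			visited += update_dt_cost(nr_lmbs);
		}
	}
	return visited;
}

/* Alternative: one rewrite after the whole operation. */
static size_t cost_update_once(size_t nr_lmbs, size_t nr_to_add)
{
	size_t visited = 0, added = 0;

	assert(nr_to_add <= nr_lmbs);
	for (size_t i = 0; i < nr_lmbs; i++) {
		visited++;
		if (added < nr_to_add)
			added++;
	}
	return visited + (added ? update_dt_cost(nr_lmbs) : 0);
}
```

For 1000 LMBs added in one request, the model gives 1,001,000 visits for per-LMB updates versus 2,000 for a single rewrite, which is the quadratic blow-up described above.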
* Re: [v3 13/15] tools/perf: Add perf tools support for extended register capability in powerpc
From: Athira Rajeev @ 2020-07-24 18:02 UTC
To: Ravi Bangoria
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <7fcf405f-440a-19dc-7c3a-33fc52c9d1ef@linux.ibm.com>
> On 24-Jul-2020, at 4:32 PM, Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
>
> Hi Athira,
>
> On 7/17/20 8:08 PM, Athira Rajeev wrote:
>> From: Anju T Sudhakar <anju@linux.vnet.ibm.com>
>> Add extended regs to sample_reg_mask in the tool side to use
>> with `-I?` option. Perf tools side uses extended mask to display
>> the platform supported register names (with -I? option) to the user
>> and also send this mask to the kernel to capture the extended registers
>> in each sample. Hence decide the mask value based on the processor
>> version.
>> Currently definitions for `mfspr`, `SPRN_PVR` are part of
>> `arch/powerpc/util/header.c`. Move this to a header file so that
>> these definitions can be re-used in other source files as well.
>
> It seems this patch has a regression.
>
> Without this patch:
>
> $ sudo ./perf record -I
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.458 MB perf.data (318 samples) ]
>
> With this patch:
>
> $ sudo ./perf record -I
> Error:
> dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
Hi Ravi,
Thanks for reviewing and testing this patch. The above issue has been present since
commit 0a892c1c9472 ("perf record: Add dummy event during system wide synthesis"), which adds a dummy event.
The fix for this issue is currently discussed here: https://lkml.org/lkml/2020/7/19/413
So once this fix is in, the issue will be resolved.
Thanks
Athira
>
> Ravi
* Re: [v3 12/15] powerpc/perf: Add support for outputting extended regs in perf intr_regs
From: Athira Rajeev @ 2020-07-24 18:13 UTC
To: Ravi Bangoria
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <b7aa5dfb-273b-e0f7-6337-c71094c666cd@linux.ibm.com>
> On 24-Jul-2020, at 5:56 PM, Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
>
> Hi Athira,
>
>> +/* Function to return the extended register values */
>> +static u64 get_ext_regs_value(int idx)
>> +{
>> + switch (idx) {
>> + case PERF_REG_POWERPC_MMCR0:
>> + return mfspr(SPRN_MMCR0);
>> + case PERF_REG_POWERPC_MMCR1:
>> + return mfspr(SPRN_MMCR1);
>> + case PERF_REG_POWERPC_MMCR2:
>> + return mfspr(SPRN_MMCR2);
>> + default: return 0;
>> + }
>> +}
>> +
>> u64 perf_reg_value(struct pt_regs *regs, int idx)
>> {
>> - if (WARN_ON_ONCE(idx >= PERF_REG_POWERPC_MAX))
>> - return 0;
>> + u64 PERF_REG_EXTENDED_MAX;
>
> PERF_REG_EXTENDED_MAX should be initialized. Otherwise ...
>
>> +
>> + if (cpu_has_feature(CPU_FTR_ARCH_300))
>> + PERF_REG_EXTENDED_MAX = PERF_REG_MAX_ISA_300;
>> if (idx == PERF_REG_POWERPC_SIER &&
>> (IS_ENABLED(CONFIG_FSL_EMB_PERF_EVENT) ||
>> @@ -85,6 +103,16 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>> IS_ENABLED(CONFIG_PPC32)))
>> return 0;
>> + if (idx >= PERF_REG_POWERPC_MAX && idx < PERF_REG_EXTENDED_MAX)
>> + return get_ext_regs_value(idx);
>
> On non-P9/P10 machines, PERF_REG_EXTENDED_MAX may contain a random value, which will
> allow the user to pass this if condition unintentionally.
>
> Nit: PERF_REG_EXTENDED_MAX is a local variable, so it should be in lowercase.
> Any specific reason to define it in capitals?
Hi Ravi
There is no specific reason. I will include both these changes in the next version.
Thanks
Athira Rajeev
>
> Ravi
* Re: [PATCH 5/9] powerpc/32s: Fix CONFIG_BOOK3S_601 uses
From: Christophe Leroy @ 2020-07-24 18:42 UTC
To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <20200724131728.1643966-5-mpe@ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> wrote:
> We have two uses of CONFIG_BOOK3S_601, which doesn't exist. Fix them
> to use CONFIG_PPC_BOOK3S_601 which is the correct symbol.
>
> Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>
> I think the bug in get_cycles() at least demonstrates that no one has
> booted a 601 since v5.4. Time to drop 601?
Would be great.
I can submit a patch for that in August.
Christophe
> ---
> arch/powerpc/include/asm/ptrace.h | 2 +-
> arch/powerpc/include/asm/timex.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index f194339cef3b..155a197c0aa1 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -243,7 +243,7 @@ static inline void set_trap_norestart(struct pt_regs *regs)
> }
>
> #define arch_has_single_step() (1)
> -#ifndef CONFIG_BOOK3S_601
> +#ifndef CONFIG_PPC_BOOK3S_601
> #define arch_has_block_step() (true)
> #else
> #define arch_has_block_step() (false)
> diff --git a/arch/powerpc/include/asm/timex.h b/arch/powerpc/include/asm/timex.h
> index d2d2c4bd8435..6047402b0a4d 100644
> --- a/arch/powerpc/include/asm/timex.h
> +++ b/arch/powerpc/include/asm/timex.h
> @@ -17,7 +17,7 @@ typedef unsigned long cycles_t;
>
> static inline cycles_t get_cycles(void)
> {
> - if (IS_ENABLED(CONFIG_BOOK3S_601))
> + if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
> return 0;
>
> return mftb();
> --
> 2.25.1
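Why the misspelt symbol built without any error can be reproduced in user space. The sketch below copies the placeholder trick behind `IS_BUILTIN()` in include/linux/kconfig.h, reduced for brevity (the real `IS_ENABLED()` also checks the `option##_MODULE` variant), and pretends Kconfig set `CONFIG_PPC_BOOK3S_601=y`. An undefined symbol such as `CONFIG_BOOK3S_601` silently evaluates to 0, so before the fix `get_cycles()` never took its 601 early return; likewise `#ifndef` on the misspelt name is always true. The function names are hypothetical.

```c
#include <assert.h>

/* Pretend Kconfig selected 601 support (=y); the misspelt
 * CONFIG_BOOK3S_601 is deliberately left undefined. */
#define CONFIG_PPC_BOOK3S_601 1

/* Reduced copy of the placeholder trick from include/linux/kconfig.h. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x)                 ___is_defined(x)
#define ___is_defined(val)              ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)    __take_second_arg(arg1_or_junk 1, 0)
#define IS_ENABLED(option)              __is_defined(option)

/* A misspelt symbol is not an error: it just evaluates to 0. */
static_assert(IS_ENABLED(CONFIG_BOOK3S_601) == 0, "misspelt symbol is off");
static_assert(IS_ENABLED(CONFIG_PPC_BOOK3S_601) == 1, "real symbol is on");

/* Mirrors the get_cycles() condition: 1 if the 601 path would be taken. */
static int takes_601_path_before_fix(void)
{
	return IS_ENABLED(CONFIG_BOOK3S_601);
}

static int takes_601_path_after_fix(void)
{
	return IS_ENABLED(CONFIG_PPC_BOOK3S_601);
}
```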
* Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
From: Waiman Long @ 2020-07-24 19:10 UTC
To: Will Deacon, peterz
Cc: linux-arch, Boqun Feng, linux-kernel, Nicholas Piggin,
virtualization, Ingo Molnar, kvm-ppc, linuxppc-dev
In-Reply-To: <20200724081647.GA16642@willie-the-truck>
On 7/24/20 4:16 AM, Will Deacon wrote:
> On Thu, Jul 23, 2020 at 08:47:59PM +0200, peterz@infradead.org wrote:
>> On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote:
>>> BTW, do you have any comment on my v2 lock holder cpu info qspinlock patch?
>>> I will have to update the patch to fix the reported 0-day test problem, but
>>> I want to collect other feedback before sending out v3.
>> I want to say I hate it all, it adds instructions to a path we spend an
>> awful lot of time optimizing without really getting anything back for
>> it.
>>
>> Will, how do you feel about it?
> I can see it potentially being useful for debugging, but I hate the
> limitation to 256 CPUs. Even arm64 is hitting that now.
After thinking more about that, I think we can use all the remaining
bits in the 16-bit locked_pending. Reserving 1 bit for locked and 1 bit
for pending, there are 14 bits left. So as long as NR_CPUS < 16k
(requirement for 16-bit locked_pending), we can put all possible cpu
numbers into the lock. We can also just use smp_processor_id() without
additional percpu data.
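The bit budget in this proposal can be checked with a small C sketch. The bit assignment (locked in bit 0, pending in bit 1, owner CPU number in bits 2-15) is an assumption made for illustration; the real qspinlock layout and the actual patch may place these fields differently. With 14 bits, CPU numbers 0 through 16383 fit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit locked_pending layout (assumption, not the kernel's):
 *   bit 0      locked
 *   bit 1      pending
 *   bits 2-15  owner CPU number (14 bits)
 */
#define LP_LOCKED     (1u << 0)
#define LP_PENDING    (1u << 1)
#define LP_CPU_SHIFT  2
#define LP_CPU_BITS   14
#define LP_MAX_CPUS   (1u << LP_CPU_BITS)   /* 16384 */

/* Take the lock as @cpu, preserving the pending bit already in @lp. */
static uint16_t lp_encode_owner(unsigned int cpu, uint16_t lp)
{
	assert(cpu < LP_MAX_CPUS);
	return (uint16_t)((lp & LP_PENDING) | LP_LOCKED | (cpu << LP_CPU_SHIFT));
}

/* Recover the owner CPU, e.g. for a debugger or a paravirt yield target. */
static unsigned int lp_owner_cpu(uint16_t lp)
{
	return lp >> LP_CPU_SHIFT;
}
```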
>
> Also, you're talking ~1% gains here. I think our collective time would
> be better spent reviewing the CNA series and trying to make it more
> deterministic.
I thought you guys were not interested in CNA. I do want to get CNA
merged, if possible. Let me review the current version again and see if
there are ways we can further improve it.
Cheers,
Longman
* [PATCH v5 00/11] ppc64: enable kdump support for kexec_file_load syscall
From: Hari Bathini @ 2020-07-24 21:00 UTC
To: Michael Ellerman, Andrew Morton
Cc: Pingfan Liu, Kexec-ml, Nayna Jain, Petr Tesarik,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Vivek Goyal, Dave Young, Thiago Jung Bauermann, Eric Biederman
This patch series enables kdump support for the kexec_file_load system
call (kexec -s -p) on PPC64. The changes are inspired by kexec-tools
code but heavily modified for kernel consumption.
The first patch adds a weak arch_kexec_locate_mem_hole() function that
architectures can override to adapt the memory hole lookup to their needs.
There are some special regions in ppc64 which should be avoided while
loading buffers, and with multiple callers of kexec_add_buffer() it is
hard to keep the ranges sane while also using the generic lookup.
The second patch marks ppc64 specific code within arch/powerpc/kexec
and arch/powerpc/purgatory to make the subsequent code changes easy
to understand.
The next patch adds helper functions to set up the different memory ranges
needed for loading the kdump kernel, booting into it and exporting the
crashing kernel's elfcore.
The fourth patch overrides arch_kexec_locate_mem_hole() function to
locate memory hole for kdump segments by accounting for the special
memory regions, referred to as excluded memory ranges, and sets
kbuf->mem when a suitable memory region is found.
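The kind of search the fourth patch plugs into arch_kexec_locate_mem_hole() can be illustrated with a simplified C sketch. It is not the ppc64 implementation: it assumes a single candidate window [lo, hi], a sorted, non-overlapping list of excluded ranges and a bottom-up search only, whereas the kernel code also honours `kbuf->top_down` and walks the system's memory ranges. `struct range`, `align_up()` and `locate_mem_hole_excluding()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bottom-up search for @size bytes inside [lo, hi] that avoids every
 * excluded range. @excl must be sorted by start and non-overlapping.
 * On success, stores the hole's start in *mem (like kbuf->mem), returns 0.
 */
static int locate_mem_hole_excluding(uint64_t lo, uint64_t hi,
				     uint64_t size, uint64_t align,
				     const struct range *excl, size_t nr_excl,
				     uint64_t *mem)
{
	uint64_t start = align_up(lo, align);

	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < nr_excl; i++) {
		if (start + size - 1 < excl[i].start)
			break;          /* fits before this and every later range */
		if (start <= excl[i].end)
			start = align_up(excl[i].end + 1, align); /* skip past it */
	}

	if (start + size - 1 > hi)
		return -1;              /* no suitable hole in the window */
	*mem = start;
	return 0;
}
```

Sorting the excluded ranges lets the search stop at the first range that lies entirely above the candidate, so each range is examined at most once.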
The fifth patch moves walk_drmem_lmbs() out of .init section with
a few changes to reuse it for setting up kdump kernel's usable memory
ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs
and set linux,drconf-usable-memory & linux,usable-memory properties
in order to restrict kdump kernel's memory usage.
The seventh patch updates purgatory to set up r8 & r9 with the OPAL base
and OPAL entry addresses respectively, to aid kernels built with
CONFIG_PPC_EARLY_DEBUG_OPAL enabled. The next patch sets up a backup
region as a kexec segment while loading the kdump kernel and teaches
purgatory to copy data from source to destination.
Patch 09 builds the elfcore header for the running kernel & passes
the info to the kdump kernel via the "elfcorehdr=" parameter, to be
exported as the /proc/vmcore file. The next patch sets up the memory
reserve map for the kexec kernel and also enables kdump support, as
all the necessary changes are now in place.
The last patch fixes a lookup issue for `kexec -l -s` case when
memory is reserved for crashkernel.
Tested the changes successfully on P8 and P9 LPARs, a couple of OpenPOWER
boxes (one with secure boot enabled), a KVM guest and a simulator.
v4 -> v5:
* Dropped patches 07/12 & 08/12 and updated purgatory to do everything
in assembly.
* Added a new patch (which was part of patch 08/12 in v4) to update
r8 & r9 registers with opal base & opal entry addresses as it is
expected on kernels built with CONFIG_PPC_EARLY_DEBUG_OPAL enabled.
* Fixed kexec load issue on KVM guest.
v3 -> v4:
* Updated get_node_path() function to be iterative instead of a recursive one.
* Added comment explaining why low memory is added to kdump kernel's usable
memory ranges though it doesn't fall in crashkernel region.
* Fixed stack_buf to be quadword aligned in accordance with ABI.
* Added missing of_node_put() in setup_purgatory_ppc64().
* Added a FIXME tag to indicate issue in adding opal/rtas regions to
core image.
v2 -> v3:
* Fixed TOC pointer calculation for purgatory by using section info
that has relocations applied.
* Fixed arch_kexec_locate_mem_hole() function to fall back to the generic
kexec_locate_mem_hole() lookup if the exclude ranges list is empty.
* Dropped check for backup_start in trampoline_64.S as purgatory()
function takes care of it anyway.
v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
weak arch_kexec_add_buffer().
* Addressed warnings reported by lkp.
* Added patch to address kexec load issue when memory is reserved
for crashkernel.
* Used the appropriate license header for the new files added.
* Added an option to merge ranges to minimize reallocations while
adding memory ranges.
* Dropped within_crashkernel parameter for add_opal_mem_range() &
add_rtas_mem_range() functions as it is not really needed.
---
Hari Bathini (11):
kexec_file: allow archs to handle special regions while locating memory hole
powerpc/kexec_file: mark PPC64 specific code
powerpc/kexec_file: add helper functions for getting memory ranges
ppc64/kexec_file: avoid stomping memory used by special regions
powerpc/drmem: make lmb walk a bit more flexible
ppc64/kexec_file: restrict memory usage of kdump kernel
ppc64/kexec_file: enable early kernel's OPAL calls
ppc64/kexec_file: setup backup region for kdump kernel
ppc64/kexec_file: prepare elfcore header for crashing kernel
ppc64/kexec_file: add appropriate regions for memory reserve map
ppc64/kexec_file: fix kexec load failure with lack of memory hole
arch/powerpc/include/asm/crashdump-ppc64.h | 19
arch/powerpc/include/asm/drmem.h | 9
arch/powerpc/include/asm/kexec.h | 29 +
arch/powerpc/include/asm/kexec_ranges.h | 25 +
arch/powerpc/kernel/prom.c | 13
arch/powerpc/kexec/Makefile | 2
arch/powerpc/kexec/elf_64.c | 36 +
arch/powerpc/kexec/file_load.c | 60 +
arch/powerpc/kexec/file_load_64.c | 1209 ++++++++++++++++++++++++++++
arch/powerpc/kexec/ranges.c | 417 ++++++++++
arch/powerpc/mm/drmem.c | 87 +-
arch/powerpc/mm/numa.c | 13
arch/powerpc/purgatory/Makefile | 4
arch/powerpc/purgatory/trampoline.S | 117 ---
arch/powerpc/purgatory/trampoline_64.S | 162 ++++
include/linux/kexec.h | 29 -
kernel/kexec_file.c | 16
17 files changed, 2052 insertions(+), 195 deletions(-)
create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
create mode 100644 arch/powerpc/include/asm/kexec_ranges.h
create mode 100644 arch/powerpc/kexec/file_load_64.c
create mode 100644 arch/powerpc/kexec/ranges.c
delete mode 100644 arch/powerpc/purgatory/trampoline.S
create mode 100644 arch/powerpc/purgatory/trampoline_64.S
* [PATCH v5 01/11] kexec_file: allow archs to handle special regions while locating memory hole
From: Hari Bathini @ 2020-07-24 21:00 UTC
To: Michael Ellerman, Andrew Morton
Cc: kernel test robot, Pingfan Liu, Kexec-ml, Nayna Jain,
Petr Tesarik, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
Sourabh Jain, Thiago Jung Bauermann, Dave Young, Vivek Goyal,
Eric Biederman
In-Reply-To: <159562433305.7836.11742488792509689660.stgit@hbathini.in.ibm.com>
Some architectures may have special memory regions, within the given
memory range, which can't be used for the buffer in a kexec segment.
Implement weak arch_kexec_locate_mem_hole() definition which arch code
may override, to take care of special regions, while trying to locate
a memory hole.
Also, add the missing declarations for arch overridable functions and
drop the __weak descriptors in the declarations to avoid non-weak
definitions from becoming weak.
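The hook pattern described above, a weak default that simply forwards to the generic lookup, looks roughly like this in a single-file C sketch. The names `struct kexec_buf_model`, `generic_locate_mem_hole()`, `arch_locate_mem_hole()` and `add_buffer()` are hypothetical, and the fixed address is illustrative; `__attribute__((weak))` is the GCC/Clang attribute behind the kernel's `__weak`. The attribute sits only on the default definition, never on a shared prototype, because a weak declaration visible in the architecture's translation unit would turn its strong override weak too.

```c
#include <assert.h>

/* Hypothetical stand-in for struct kexec_buf. */
struct kexec_buf_model {
	unsigned long mem;
};

/* Generic lookup: picks a fixed address purely for illustration. */
static int generic_locate_mem_hole(struct kexec_buf_model *kbuf)
{
	kbuf->mem = 0x1000;
	return 0;
}

/*
 * Weak default. An architecture that links in its own (strong)
 * definition of this symbol replaces it at link time.
 */
__attribute__((weak)) int arch_locate_mem_hole(struct kexec_buf_model *kbuf)
{
	assert(kbuf != 0);
	return generic_locate_mem_hole(kbuf);
}

/* Like kexec_add_buffer(): always calls through the arch hook. */
static int add_buffer(struct kexec_buf_model *kbuf)
{
	return arch_locate_mem_hole(kbuf);
}
```

With no override linked in, `add_buffer()` behaves exactly like the generic lookup, which is why the change is a no-op for architectures that do not provide their own definition.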
Reported-by: kernel test robot <lkp@intel.com>
[lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing]
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
v4 -> v5:
* Unchanged.
v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.
v2 -> v3:
* Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan.
v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
weak arch_kexec_add_buffer().
* Dropped __weak identifier for arch overridable functions.
* Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
reported by lkp. lkp report for reference:
- https://lore.kernel.org/patchwork/patch/1264418/
include/linux/kexec.h | 29 ++++++++++++++++++-----------
kernel/kexec_file.c | 16 ++++++++++++++--
2 files changed, 32 insertions(+), 13 deletions(-)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ea67910..9e93bef 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
bool get_value);
void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name);
-int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
- unsigned long buf_len);
-void * __weak arch_kexec_kernel_image_load(struct kimage *image);
-int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi,
- Elf_Shdr *section,
- const Elf_Shdr *relsec,
- const Elf_Shdr *symtab);
-int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
- Elf_Shdr *section,
- const Elf_Shdr *relsec,
- const Elf_Shdr *symtab);
+/* Architectures may override the below functions */
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+ unsigned long buf_len);
+void *arch_kexec_kernel_image_load(struct kimage *image);
+int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
+ Elf_Shdr *section,
+ const Elf_Shdr *relsec,
+ const Elf_Shdr *symtab);
+int arch_kexec_apply_relocations(struct purgatory_info *pi,
+ Elf_Shdr *section,
+ const Elf_Shdr *relsec,
+ const Elf_Shdr *symtab);
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#ifdef CONFIG_KEXEC_SIG
+int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
+ unsigned long buf_len);
+#endif
+int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
extern int kexec_add_buffer(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 09cc78d..e89912d 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
}
/**
+ * arch_kexec_locate_mem_hole - Find free memory to place the segments.
+ * @kbuf: Parameters for the memory search.
+ *
+ * On success, kbuf->mem will have the start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
+{
+ return kexec_locate_mem_hole(kbuf);
+}
+
+/**
* kexec_add_buffer - place a buffer in a kexec segment
* @kbuf: Buffer contents and memory parameters.
*
@@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
*/
int kexec_add_buffer(struct kexec_buf *kbuf)
{
-
struct kexec_segment *ksegment;
int ret;
@@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
/* Walk the RAM ranges and allocate a suitable range for the buffer */
- ret = kexec_locate_mem_hole(kbuf);
+ ret = arch_kexec_locate_mem_hole(kbuf);
if (ret)
return ret;
* [PATCH v5 02/11] powerpc/kexec_file: mark PPC64 specific code
From: Hari Bathini @ 2020-07-24 21:00 UTC
To: Michael Ellerman, Andrew Morton
Cc: Pingfan Liu, Kexec-ml, Nayna Jain, Petr Tesarik,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Vivek Goyal, Laurent Dufour, Dave Young, Thiago Jung Bauermann,
Eric Biederman
In-Reply-To: <159562433305.7836.11742488792509689660.stgit@hbathini.in.ibm.com>
Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
rename purgatory/trampoline.S to purgatory/trampoline_64.S in the
same spirit. No functional changes.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
v4 -> v5:
* Unchanged.
v3 -> v4:
* Moved common code back to setup_new_fdt() from setup_new_fdt_ppc64()
function. Added Reviewed-by tags from Laurent & Thiago.
v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.
v1 -> v2:
* No changes.
arch/powerpc/include/asm/kexec.h | 9 ++
arch/powerpc/kexec/Makefile | 2 -
arch/powerpc/kexec/elf_64.c | 7 +-
arch/powerpc/kexec/file_load.c | 19 +----
arch/powerpc/kexec/file_load_64.c | 87 ++++++++++++++++++++++++
arch/powerpc/purgatory/Makefile | 4 +
arch/powerpc/purgatory/trampoline.S | 117 --------------------------------
arch/powerpc/purgatory/trampoline_64.S | 117 ++++++++++++++++++++++++++++++++
8 files changed, 222 insertions(+), 140 deletions(-)
create mode 100644 arch/powerpc/kexec/file_load_64.c
delete mode 100644 arch/powerpc/purgatory/trampoline.S
create mode 100644 arch/powerpc/purgatory/trampoline_64.S
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index c684768..ac8fd48 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -116,6 +116,15 @@ int setup_new_fdt(const struct kimage *image, void *fdt,
unsigned long initrd_load_addr, unsigned long initrd_len,
const char *cmdline);
int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size);
+
+#ifdef CONFIG_PPC64
+int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
+ const void *fdt, unsigned long kernel_load_addr,
+ unsigned long fdt_load_addr);
+int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
+ unsigned long initrd_load_addr,
+ unsigned long initrd_len, const char *cmdline);
+#endif /* CONFIG_PPC64 */
#endif /* CONFIG_KEXEC_FILE */
#else /* !CONFIG_KEXEC_CORE */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 86380c6..67c3553 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -7,7 +7,7 @@ obj-y += core.o crash.o core_$(BITS).o
obj-$(CONFIG_PPC32) += relocate_32.o
-obj-$(CONFIG_KEXEC_FILE) += file_load.o elf_$(BITS).o
+obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o
ifdef CONFIG_HAVE_IMA_KEXEC
ifdef CONFIG_IMA
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 3072fd6..23ad04c 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -88,7 +88,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
goto out;
}
- ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
+ ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
+ initrd_len, cmdline);
if (ret)
goto out;
@@ -107,8 +108,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
pr_debug("Loaded device tree at 0x%lx\n", fdt_load_addr);
slave_code = elf_info.buffer + elf_info.proghdrs[0].p_offset;
- ret = setup_purgatory(image, slave_code, fdt, kernel_load_addr,
- fdt_load_addr);
+ ret = setup_purgatory_ppc64(image, slave_code, fdt, kernel_load_addr,
+ fdt_load_addr);
if (ret)
pr_err("Error setting up the purgatory.\n");
diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 143c917..38439ab 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * ppc64 code to implement the kexec_file_load syscall
+ * powerpc code to implement the kexec_file_load syscall
*
* Copyright (C) 2004 Adam Litke (agl@us.ibm.com)
* Copyright (C) 2004 IBM Corp.
@@ -20,22 +20,7 @@
#include <linux/libfdt.h>
#include <asm/ima.h>
-#define SLAVE_CODE_SIZE 256
-
-const struct kexec_file_ops * const kexec_file_loaders[] = {
- &kexec_elf64_ops,
- NULL
-};
-
-int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
- unsigned long buf_len)
-{
- /* We don't support crash kernels yet. */
- if (image->type == KEXEC_TYPE_CRASH)
- return -EOPNOTSUPP;
-
- return kexec_image_probe_default(image, buf, buf_len);
-}
+#define SLAVE_CODE_SIZE 256 /* First 0x100 bytes */
/**
* setup_purgatory - initialize the purgatory's global variables
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
new file mode 100644
index 0000000..41fe8b6
--- /dev/null
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ppc64 code to implement the kexec_file_load syscall
+ *
+ * Copyright (C) 2004 Adam Litke (agl@us.ibm.com)
+ * Copyright (C) 2004 IBM Corp.
+ * Copyright (C) 2004,2005 Milton D Miller II, IBM Corporation
+ * Copyright (C) 2005 R Sharada (sharada@in.ibm.com)
+ * Copyright (C) 2006 Mohan Kumar M (mohan@in.ibm.com)
+ * Copyright (C) 2020 IBM Corporation
+ *
+ * Based on kexec-tools' kexec-ppc64.c, kexec-elf-rel-ppc64.c, fs2dt.c.
+ * Heavily modified for the kernel by
+ * Hari Bathini <hbathini@linux.ibm.com>.
+ */
+
+#include <linux/kexec.h>
+#include <linux/of_fdt.h>
+#include <linux/libfdt.h>
+
+const struct kexec_file_ops * const kexec_file_loaders[] = {
+ &kexec_elf64_ops,
+ NULL
+};
+
+/**
+ * setup_purgatory_ppc64 - initialize PPC64 specific purgatory's global
+ * variables and call setup_purgatory() to initialize
+ * common global variables.
+ * @image: kexec image.
+ * @slave_code: Slave code for the purgatory.
+ * @fdt: Flattened device tree for the next kernel.
+ * @kernel_load_addr: Address where the kernel is loaded.
+ * @fdt_load_addr: Address where the flattened device tree is loaded.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
+ const void *fdt, unsigned long kernel_load_addr,
+ unsigned long fdt_load_addr)
+{
+ int ret;
+
+ ret = setup_purgatory(image, slave_code, fdt, kernel_load_addr,
+ fdt_load_addr);
+ if (ret)
+ pr_err("Failed to setup purgatory symbols\n");
+ return ret;
+}
+
+/**
+ * setup_new_fdt_ppc64 - Update the flattened device-tree of the kernel
+ * being loaded.
+ * @image: kexec image being loaded.
+ * @fdt: Flattened device tree for the next kernel.
+ * @initrd_load_addr: Address where the next initrd will be loaded.
+ * @initrd_len: Size of the next initrd, or 0 if there will be none.
+ * @cmdline: Command line for the next kernel, or NULL if there will
+ * be none.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
+ unsigned long initrd_load_addr,
+ unsigned long initrd_len, const char *cmdline)
+{
+ return setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
+}
+
+/**
+ * arch_kexec_kernel_image_probe - Does additional handling needed to set up
+ * kexec segments.
+ * @image: kexec image being loaded.
+ * @buf: Buffer pointing to elf data.
+ * @buf_len: Length of the buffer.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+ unsigned long buf_len)
+{
+ /* We don't support crash kernels yet. */
+ if (image->type == KEXEC_TYPE_CRASH)
+ return -EOPNOTSUPP;
+
+ return kexec_image_probe_default(image, buf, buf_len);
+}
diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile
index 7c6d8b1..348f5958 100644
--- a/arch/powerpc/purgatory/Makefile
+++ b/arch/powerpc/purgatory/Makefile
@@ -2,11 +2,11 @@
KASAN_SANITIZE := n
-targets += trampoline.o purgatory.ro kexec-purgatory.c
+targets += trampoline_$(BITS).o purgatory.ro kexec-purgatory.c
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined
-$(obj)/purgatory.ro: $(obj)/trampoline.o FORCE
+$(obj)/purgatory.ro: $(obj)/trampoline_$(BITS).o FORCE
$(call if_changed,ld)
quiet_cmd_bin2c = BIN2C $@
diff --git a/arch/powerpc/purgatory/trampoline.S b/arch/powerpc/purgatory/trampoline.S
deleted file mode 100644
index a5a83c3..0000000
--- a/arch/powerpc/purgatory/trampoline.S
+++ /dev/null
@@ -1,117 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * kexec trampoline
- *
- * Based on code taken from kexec-tools and kexec-lite.
- *
- * Copyright (C) 2004 - 2005, Milton D Miller II, IBM Corporation
- * Copyright (C) 2006, Mohan Kumar M, IBM Corporation
- * Copyright (C) 2013, Anton Blanchard, IBM Corporation
- */
-
-#include <asm/asm-compat.h>
-
- .machine ppc64
- .balign 256
- .globl purgatory_start
-purgatory_start:
- b master
-
- /* ABI: possible run_at_load flag at 0x5c */
- .org purgatory_start + 0x5c
- .globl run_at_load
-run_at_load:
- .long 0
- .size run_at_load, . - run_at_load
-
- /* ABI: slaves start at 60 with r3=phys */
- .org purgatory_start + 0x60
-slave:
- b .
- /* ABI: end of copied region */
- .org purgatory_start + 0x100
- .size purgatory_start, . - purgatory_start
-
-/*
- * The above 0x100 bytes at purgatory_start are replaced with the
- * code from the kernel (or next stage) by setup_purgatory().
- */
-
-master:
- or %r1,%r1,%r1 /* low priority to let other threads catchup */
- isync
- mr %r17,%r3 /* save cpu id to r17 */
- mr %r15,%r4 /* save physical address in reg15 */
-
- or %r3,%r3,%r3 /* ok now to high priority, lets boot */
- lis %r6,0x1
- mtctr %r6 /* delay a bit for slaves to catch up */
- bdnz . /* before we overwrite 0-100 again */
-
- bl 0f /* Work out where we're running */
-0: mflr %r18
-
- /* load device-tree address */
- ld %r3, (dt_offset - 0b)(%r18)
- mr %r16,%r3 /* save dt address in reg16 */
- li %r4,20
- LWZX_BE %r6,%r3,%r4 /* fetch __be32 version number at byte 20 */
- cmpwi %cr0,%r6,2 /* v2 or later? */
- blt 1f
- li %r4,28
- STWX_BE %r17,%r3,%r4 /* Store my cpu as __be32 at byte 28 */
-1:
- /* load the kernel address */
- ld %r4,(kernel - 0b)(%r18)
-
- /* load the run_at_load flag */
- /* possibly patched by kexec */
- ld %r6,(run_at_load - 0b)(%r18)
- /* and patch it into the kernel */
- stw %r6,(0x5c)(%r4)
-
- mr %r3,%r16 /* restore dt address */
-
- li %r5,0 /* r5 will be 0 for kernel */
-
- mfmsr %r11
- andi. %r10,%r11,1 /* test MSR_LE */
- bne .Little_endian
-
- mtctr %r4 /* prepare branch to */
- bctr /* start kernel */
-
-.Little_endian:
- mtsrr0 %r4 /* prepare branch to */
-
- clrrdi %r11,%r11,1 /* clear MSR_LE */
- mtsrr1 %r11
-
- rfid /* update MSR and start kernel */
-
-
- .balign 8
- .globl kernel
-kernel:
- .8byte 0x0
- .size kernel, . - kernel
-
- .balign 8
- .globl dt_offset
-dt_offset:
- .8byte 0x0
- .size dt_offset, . - dt_offset
-
-
- .data
- .balign 8
-.globl purgatory_sha256_digest
-purgatory_sha256_digest:
- .skip 32
- .size purgatory_sha256_digest, . - purgatory_sha256_digest
-
- .balign 8
-.globl purgatory_sha_regions
-purgatory_sha_regions:
- .skip 8 * 2 * 16
- .size purgatory_sha_regions, . - purgatory_sha_regions
diff --git a/arch/powerpc/purgatory/trampoline_64.S b/arch/powerpc/purgatory/trampoline_64.S
new file mode 100644
index 0000000..a5a83c3
--- /dev/null
+++ b/arch/powerpc/purgatory/trampoline_64.S
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * kexec trampoline
+ *
+ * Based on code taken from kexec-tools and kexec-lite.
+ *
+ * Copyright (C) 2004 - 2005, Milton D Miller II, IBM Corporation
+ * Copyright (C) 2006, Mohan Kumar M, IBM Corporation
+ * Copyright (C) 2013, Anton Blanchard, IBM Corporation
+ */
+
+#include <asm/asm-compat.h>
+
+ .machine ppc64
+ .balign 256
+ .globl purgatory_start
+purgatory_start:
+ b master
+
+ /* ABI: possible run_at_load flag at 0x5c */
+ .org purgatory_start + 0x5c
+ .globl run_at_load
+run_at_load:
+ .long 0
+ .size run_at_load, . - run_at_load
+
+ /* ABI: slaves start at 60 with r3=phys */
+ .org purgatory_start + 0x60
+slave:
+ b .
+ /* ABI: end of copied region */
+ .org purgatory_start + 0x100
+ .size purgatory_start, . - purgatory_start
+
+/*
+ * The above 0x100 bytes at purgatory_start are replaced with the
+ * code from the kernel (or next stage) by setup_purgatory().
+ */
+
+master:
+ or %r1,%r1,%r1 /* low priority to let other threads catchup */
+ isync
+ mr %r17,%r3 /* save cpu id to r17 */
+ mr %r15,%r4 /* save physical address in reg15 */
+
+ or %r3,%r3,%r3 /* ok now to high priority, lets boot */
+ lis %r6,0x1
+ mtctr %r6 /* delay a bit for slaves to catch up */
+ bdnz . /* before we overwrite 0-100 again */
+
+ bl 0f /* Work out where we're running */
+0: mflr %r18
+
+ /* load device-tree address */
+ ld %r3, (dt_offset - 0b)(%r18)
+ mr %r16,%r3 /* save dt address in reg16 */
+ li %r4,20
+ LWZX_BE %r6,%r3,%r4 /* fetch __be32 version number at byte 20 */
+ cmpwi %cr0,%r6,2 /* v2 or later? */
+ blt 1f
+ li %r4,28
+ STWX_BE %r17,%r3,%r4 /* Store my cpu as __be32 at byte 28 */
+1:
+ /* load the kernel address */
+ ld %r4,(kernel - 0b)(%r18)
+
+ /* load the run_at_load flag */
+ /* possibly patched by kexec */
+ ld %r6,(run_at_load - 0b)(%r18)
+ /* and patch it into the kernel */
+ stw %r6,(0x5c)(%r4)
+
+ mr %r3,%r16 /* restore dt address */
+
+ li %r5,0 /* r5 will be 0 for kernel */
+
+ mfmsr %r11
+ andi. %r10,%r11,1 /* test MSR_LE */
+ bne .Little_endian
+
+ mtctr %r4 /* prepare branch to */
+ bctr /* start kernel */
+
+.Little_endian:
+ mtsrr0 %r4 /* prepare branch to */
+
+ clrrdi %r11,%r11,1 /* clear MSR_LE */
+ mtsrr1 %r11
+
+ rfid /* update MSR and start kernel */
+
+
+ .balign 8
+ .globl kernel
+kernel:
+ .8byte 0x0
+ .size kernel, . - kernel
+
+ .balign 8
+ .globl dt_offset
+dt_offset:
+ .8byte 0x0
+ .size dt_offset, . - dt_offset
+
+
+ .data
+ .balign 8
+.globl purgatory_sha256_digest
+purgatory_sha256_digest:
+ .skip 32
+ .size purgatory_sha256_digest, . - purgatory_sha256_digest
+
+ .balign 8
+.globl purgatory_sha_regions
+purgatory_sha_regions:
+ .skip 8 * 2 * 16
+ .size purgatory_sha_regions, . - purgatory_sha_regions
* Re: [PATCH v4 6/6] powerpc: implement smp_cond_load_relaxed
From: Waiman Long @ 2020-07-24 21:10 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
Cc: linux-arch, Peter Zijlstra, Boqun Feng, linux-kernel, kvm-ppc,
virtualization, Ingo Molnar, Michal Suchánek, Will Deacon
In-Reply-To: <20200724131423.1362108-7-npiggin@gmail.com>
On 7/24/20 9:14 AM, Nicholas Piggin wrote:
> This implements smp_cond_load_relaed with the slowpath busy loop using the
Nit: "smp_cond_load_relaxed"
Cheers,
Longman
* Re: [PATCH v4 0/6] powerpc: queued spinlocks and rwlocks
From: Waiman Long @ 2020-07-24 21:11 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
Cc: linux-arch, Peter Zijlstra, Boqun Feng, linux-kernel, kvm-ppc,
virtualization, Ingo Molnar, Michal Suchánek, Will Deacon
In-Reply-To: <20200724131423.1362108-1-npiggin@gmail.com>
On 7/24/20 9:14 AM, Nicholas Piggin wrote:
> Updated with everybody's feedback (thanks all), and more performance
> results.
>
> What I've found is I might have been measuring the worst load point for
> the paravirt case, and by looking at a range of loads it's clear that
> queued spinlocks are overall better even on PV, doubly so when you look
> at the generally much improved worst case latencies.
>
> I have defaulted it to N even though I'm less concerned about the PV
> numbers now, just because I think it needs more stress testing. But
> it's very nicely selectable so should be low risk to include.
>
> All in all this is a very cool technology and great results especially
> on the big systems but even on smaller ones there are nice gains. Thanks
> Waiman and everyone who developed it.
>
> Thanks,
> Nick
>
> Nicholas Piggin (6):
> powerpc/pseries: move some PAPR paravirt functions to their own file
> powerpc: move spinlock implementation to simple_spinlock
> powerpc/64s: implement queued spinlocks and rwlocks
> powerpc/pseries: implement paravirt qspinlocks for SPLPAR
> powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
> lock hint
> powerpc: implement smp_cond_load_relaxed
>
> arch/powerpc/Kconfig | 15 +
> arch/powerpc/include/asm/Kbuild | 1 +
> arch/powerpc/include/asm/atomic.h | 28 ++
> arch/powerpc/include/asm/barrier.h | 14 +
> arch/powerpc/include/asm/paravirt.h | 87 +++++
> arch/powerpc/include/asm/qspinlock.h | 91 ++++++
> arch/powerpc/include/asm/qspinlock_paravirt.h | 7 +
> arch/powerpc/include/asm/simple_spinlock.h | 288 ++++++++++++++++
> .../include/asm/simple_spinlock_types.h | 21 ++
> arch/powerpc/include/asm/spinlock.h | 308 +-----------------
> arch/powerpc/include/asm/spinlock_types.h | 17 +-
> arch/powerpc/lib/Makefile | 3 +
> arch/powerpc/lib/locks.c | 12 +-
> arch/powerpc/platforms/pseries/Kconfig | 9 +-
> arch/powerpc/platforms/pseries/setup.c | 4 +-
> include/asm-generic/qspinlock.h | 4 +
> 16 files changed, 588 insertions(+), 321 deletions(-)
> create mode 100644 arch/powerpc/include/asm/paravirt.h
> create mode 100644 arch/powerpc/include/asm/qspinlock.h
> create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
> create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
> create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h
>
That patch series looks good to me. Thanks for working on this.
For the series,
Acked-by: Waiman Long <longman@redhat.com>
* Re: [PATCH v 1/1] powerpc/64s: allow for clang's objdump differences
From: Bill Wendling @ 2020-07-24 22:40 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <87sgdhp4t7.fsf@mpe.ellerman.id.au>
On Fri, Jul 24, 2020 at 3:48 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Hi Bill,
>
> Bill Wendling <morbo@google.com> writes:
> > Clang's objdump emits slightly different output from GNU's objdump,
> > causing a list of warnings to be emitted during relocatable builds.
> > E.g., clang's objdump emits this:
> >
> > c000000000000004: 2c 00 00 48 b 0xc000000000000030
> > ...
> > c000000000005c6c: 10 00 82 40 bf 2, 0xc000000000005c7c
> >
> > while GNU objdump emits:
> >
> > c000000000000004: 2c 00 00 48 b c000000000000030 <__start+0x30>
> > ...
> > c000000000005c6c: 10 00 82 40 bne c000000000005c7c <masked_interrupt+0x3c>
> >
> > Adjust llvm-objdump's output to remove the extraneous '0x' and convert
> > 'bf' and 'bt' to 'bne' and 'beq' resp. to more closely match GNU
> > objdump's output.
> >
> > Note that clang's objdump doesn't yet output the relocation symbols on
> > PPC.
> >
> > Signed-off-by: Bill Wendling <morbo@google.com>
> > ---
> > arch/powerpc/tools/unrel_branch_check.sh | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/powerpc/tools/unrel_branch_check.sh b/arch/powerpc/tools/unrel_branch_check.sh
> > index 77114755dc6f..71ce86b68d18 100755
> > --- a/arch/powerpc/tools/unrel_branch_check.sh
> > +++ b/arch/powerpc/tools/unrel_branch_check.sh
> > @@ -31,6 +31,9 @@ grep -e "^c[0-9a-f]*:[[:space:]]*\([0-9a-f][0-9a-f][[:space:]]\)\{4\}[[:space:]]
> > grep -v '\<__start_initialization_multiplatform>' |
> > grep -v -e 'b.\?.\?ctr' |
> > grep -v -e 'b.\?.\?lr' |
> > +sed 's/\bbt.\?[[:space:]]*[[:digit:]][[:digit:]]*,/beq/' |
> > +sed 's/\bbf.\?[[:space:]]*[[:digit:]][[:digit:]]*,/bne/' |
> > +sed 's/[[:space:]]0x/ /' |
> > sed 's/://' |
>
> I know you followed the example in the script of just doing everything
> as a separate entry in the pipeline, but I think we could consolidate
> all the seds into one?
>
> eg:
>
> sed -e 's/\bbt.\?[[:space:]]*[[:digit:]][[:digit:]]*,/beq/' \
> -e 's/\bbf.\?[[:space:]]*[[:digit:]][[:digit:]]*,/bne/' \
> -e 's/[[:space:]]0x/ /' \
> -e 's/://' |
>
> Does that work?
>
I'm fine with that. I separated them mostly for my benefit while
creating the patch to keep things simple. :-) I'll send out an update.
-bw
* [PATCH v2] powerpc/64s: allow for clang's objdump differences
From: Bill Wendling @ 2020-07-24 22:49 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
Cc: linuxppc-dev, Bill Wendling
In-Reply-To: <CAGG=3QW4=SmOEY=9mdtZUPBBvHHzVD4UN7hAz9wC83ctr8XsXQ@mail.gmail.com>
Clang's objdump emits slightly different output from GNU's objdump,
causing a list of warnings to be emitted during relocatable builds.
E.g., clang's objdump emits this:
c000000000000004: 2c 00 00 48 b 0xc000000000000030
...
c000000000005c6c: 10 00 82 40 bf 2, 0xc000000000005c7c
while GNU objdump emits:
c000000000000004: 2c 00 00 48 b c000000000000030 <__start+0x30>
...
c000000000005c6c: 10 00 82 40 bne c000000000005c7c <masked_interrupt+0x3c>
Adjust llvm-objdump's output to remove the extraneous '0x' and convert
'bf' and 'bt' to 'bne' and 'beq' resp. to more closely match GNU
objdump's output.
Note that clang's objdump doesn't yet output the relocation symbols on
PPC.
Signed-off-by: Bill Wendling <morbo@google.com>
---
arch/powerpc/tools/unrel_branch_check.sh | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/tools/unrel_branch_check.sh b/arch/powerpc/tools/unrel_branch_check.sh
index 77114755dc6f..6e6a30aea3ed 100755
--- a/arch/powerpc/tools/unrel_branch_check.sh
+++ b/arch/powerpc/tools/unrel_branch_check.sh
@@ -31,7 +31,10 @@ grep -e "^c[0-9a-f]*:[[:space:]]*\([0-9a-f][0-9a-f][[:space:]]\)\{4\}[[:space:]]
grep -v '\<__start_initialization_multiplatform>' |
grep -v -e 'b.\?.\?ctr' |
grep -v -e 'b.\?.\?lr' |
-sed 's/://' |
+sed -e 's/\bbt.\?[[:space:]]*[[:digit:]][[:digit:]]*,/beq/' \
+ -e 's/\bbf.\?[[:space:]]*[[:digit:]][[:digit:]]*,/bne/' \
+ -e 's/[[:space:]]0x/ /' \
+ -e 's/://' |
awk '{ print $1 ":" $6 ":0x" $7 ":" $8 " "}'
)
* Re: [PATCH v4 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
From: Thiago Jung Bauermann @ 2020-07-25 0:32 UTC (permalink / raw)
To: Hari Bathini
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <77c606da-8eb2-d831-147b-a204b498c7d7@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 24/07/20 5:36 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>>> Kdump kernel, used for capturing the kernel core image, is supposed
>>> to use only specific memory regions to avoid corrupting the image to
>>> be captured. The regions are crashkernel range - the memory reserved
>>> explicitly for kdump kernel, memory used for the tce-table, the OPAL
>>> region and RTAS region as applicable. Restrict kdump kernel memory
>>> to use only these regions by setting up usable-memory DT property.
>>> Also, tell the kdump kernel to run at the loaded address by setting
>>> the magic word at 0x5c.
>>>
>>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>>> Tested-by: Pingfan Liu <piliu@redhat.com>
>>> ---
>>>
>>> v3 -> v4:
>>> * Updated get_node_path() to be an iterative function instead of a
>>> recursive one.
>>> * Added comment explaining why low memory is added to kdump kernel's
>>> usable memory ranges though it doesn't fall in crashkernel region.
>>> * For correctness, added fdt_add_mem_rsv() for the low memory being
>>> added to kdump kernel's usable memory ranges.
>>
>> Good idea.
>>
>>> * Fixed prop pointer update in add_usable_mem_property() and changed
>>> duple to tuple as suggested by Thiago.
>>
>> <snip>
>>
>>> +/**
>>> + * get_node_pathlen - Get the full path length of the given node.
>>> + * @dn: Node.
>>> + *
>>> + * Also, counts '/' at the end of the path.
>>> + * For example, /memory@0 will be "/memory@0/\0" => 11 bytes.
>>
>> Wouldn't this function return 10 in the case of /memory@0?
>
> Actually, it does return 11. +1 while returning is for counting %NUL.
> On top of that we count an extra '/' for root node.. so, it ends up as 11.
> ('/'memory@0'/''\0'). Note the extra '/' before '\0'. Let me handle root node
> separately. That should avoid the confusion.
Ah, that is true. I forgot to count the iteration for the root node.
Sorry about that.
--
Thiago Jung Bauermann
IBM Linux Technology Center
* Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
From: Waiman Long @ 2020-07-25 3:02 UTC (permalink / raw)
To: Will Deacon, peterz
Cc: linux-arch, Boqun Feng, linux-kernel, Nicholas Piggin,
virtualization, Ingo Molnar, kvm-ppc, linuxppc-dev
In-Reply-To: <8532332b-85dd-661b-cf72-81a8ceb70747@redhat.com>
On 7/24/20 3:10 PM, Waiman Long wrote:
> On 7/24/20 4:16 AM, Will Deacon wrote:
>> On Thu, Jul 23, 2020 at 08:47:59PM +0200, peterz@infradead.org wrote:
>>> On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote:
>>>> BTW, do you have any comment on my v2 lock holder cpu info
>>>> qspinlock patch?
>>>> I will have to update the patch to fix the reported 0-day test
>>>> problem, but
>>>> I want to collect other feedback before sending out v3.
>>> I want to say I hate it all, it adds instructions to a path we spend an
>>> awful lot of time optimizing without really getting anything back for
>>> it.
>>>
>>> Will, how do you feel about it?
>> I can see it potentially being useful for debugging, but I hate the
>> limitation to 256 CPUs. Even arm64 is hitting that now.
>
> After thinking more about that, I think we can use all the remaining
> bits in the 16-bit locked_pending. Reserving 1 bit for locked and 1
> bit for pending, there are 14 bits left. So as long as NR_CPUS < 16k
> (requirement for 16-bit locked_pending), we can put all possible cpu
> numbers into the lock. We can also just use smp_processor_id() without
> additional percpu data.
Sorry, that doesn't work. The extra bits in the pending byte won't get
cleared on unlock. That will have a noticeable performance impact.
Clearing the pending byte on unlock will cause other performance
problems. So I guess we will have to limit the cpu number in the locked byte.
Regards,
Longman
* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
From: David Gibson @ 2020-07-25 7:37 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev, bharata
In-Reply-To: <87mu3pp1u9.fsf@mpe.ellerman.id.au>
On Fri, Jul 24, 2020 at 09:52:14PM +1000, Michael Ellerman wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> >> Bharata B Rao <bharata@linux.ibm.com> writes:
> >> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> >> >> Nathan Lynch <nathanl@linux.ibm.com> writes:
> >> >> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >> >> >> This is the next version of the fixes for memory unplug on radix.
> >> >> >> The issues and the fix are described in the actual patches.
> >> >> >
> >> >> > I guess this isn't actually causing problems at runtime right now, but I
> >> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> >> >> > arch_remove_memory(), which ought to be mmu-agnostic:
> >> >> >
> >> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >> >> > struct mhp_params *params)
> >> >> > {
> >> >> > unsigned long start_pfn = start >> PAGE_SHIFT;
> >> >> > unsigned long nr_pages = size >> PAGE_SHIFT;
> >> >> > int rc;
> >> >> >
> >> >> > resize_hpt_for_hotplug(memblock_phys_mem_size());
> >> >> >
> >> >> > start = (unsigned long)__va(start);
> >> >> > rc = create_section_mapping(start, start + size, nid,
> >> >> > params->pgprot);
> >> >> > ...
> >> >>
> >> >> Hmm well spotted.
> >> >>
> >> >> That does return early if the ops are not setup:
> >> >>
> >> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> >> >> {
> >> >> unsigned target_hpt_shift;
> >> >>
> >> >> if (!mmu_hash_ops.resize_hpt)
> >> >> return 0;
> >> >>
> >> >>
> >> >> And:
> >> >>
> >> >> void __init hpte_init_pseries(void)
> >> >> {
> >> >> ...
> >> >> if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> >> >> mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> >> >>
> >> >> And that comes in via ibm,hypertas-functions:
> >> >>
> >> >> {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
> >> >>
> >> >>
> >> >> But firmware is not necessarily going to add/remove that call based on
> >> >> whether we're using hash/radix.
> >> >
> >> > Correct but hpte_init_pseries() will not be called for radix guests.
> >>
> >> Yeah, duh. You'd think the function name would have been a sufficient
> >> clue for me :)
> >>
> >> >> So I think a follow-up patch is needed to make this more robust.
> >> >>
> >> >> Aneesh/Bharata what platform did you test this series on? I'm curious
> >> >> how this didn't break.
> >> >
> >> > I have tested memory hotplug/unplug for radix guest on zz platform and
> >> > sanity-tested this for hash guest on P8.
> >> >
> >> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> >> > guest and hence we won't see any breakage.
> >>
> >> OK.
> >>
> >> That's probably fine as it is then. Or maybe just a comment in
> >> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> >> we're using radix.
> >
> > Or we could move these calls to hpt-only routines like below?
>
> That looks like it would be equivalent, and would nicely isolate those
> calls in hash specific code. So yeah I think that's worth sending as a
> proper patch, even better if you can test it.
>
> > David - Do you remember if there was any particular reason to have
> > these two hpt-resize calls within powerpc-generic memory hotplug code?
>
> I think the HPT resizing was developed before or concurrently with the
> radix support, so I would guess it was just not something we thought
> about at the time.
Sounds about right; I don't remember for certain.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [PATCH v3 01/14] powerpc/eeh: Remove eeh_dev_phb_init_dynamic()
From: Oliver O'Halloran @ 2020-07-25 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Oliver O'Halloran
This function is a one-line wrapper around eeh_phb_pe_create() and, despite
the name, it doesn't create any eeh_dev structures. Replace it with direct
calls to eeh_phb_pe_create(), which does what it says on the tin, and
remove a layer of indirection.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
v2: Added stub prototype of eeh_phb_pe_create() for the !CONFIG_EEH case.
v3: Same as above, but done properly.
---
arch/powerpc/include/asm/eeh.h | 2 +-
arch/powerpc/kernel/eeh.c | 2 +-
arch/powerpc/kernel/eeh_dev.c | 13 -------------
arch/powerpc/kernel/of_platform.c | 4 ++--
arch/powerpc/platforms/pseries/pci_dlpar.c | 2 +-
5 files changed, 5 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 964a54292b36..1a19b1bb74c0 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -294,7 +294,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe);
struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
-void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
void eeh_show_enabled(void);
int __init eeh_ops_register(struct eeh_ops *ops);
int __exit eeh_ops_unregister(const char *name);
@@ -362,6 +361,7 @@ static inline void eeh_remove_device(struct pci_dev *dev) { }
#define EEH_POSSIBLE_ERROR(val, type) (0)
#define EEH_IO_ERROR_VALUE(size) (-1UL)
+static inline int eeh_phb_pe_create(struct pci_controller *phb) { return 0; }
#endif /* CONFIG_EEH */
#if defined(CONFIG_PPC_PSERIES) && defined(CONFIG_EEH)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d407981dec76..859f76020256 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1096,7 +1096,7 @@ static int eeh_init(void)
/* Initialize PHB PEs */
list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
- eeh_dev_phb_init_dynamic(hose);
+ eeh_phb_pe_create(hose);
eeh_addr_cache_init();
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index 7370185c7a05..8e159a12f10c 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -52,16 +52,3 @@ struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
return edev;
}
-
-/**
- * eeh_dev_phb_init_dynamic - Create EEH devices for devices included in PHB
- * @phb: PHB
- *
- * Scan the PHB OF node and its child association, then create the
- * EEH devices accordingly
- */
-void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
-{
- /* EEH PE for PHB */
- eeh_phb_pe_create(phb);
-}
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 71a3f97dc988..f89376ff633e 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -62,8 +62,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
/* Init pci_dn data structures */
pci_devs_phb_init_dynamic(phb);
- /* Create EEH PEs for the PHB */
- eeh_dev_phb_init_dynamic(phb);
+ /* Create EEH PE for the PHB */
+ eeh_phb_pe_create(phb);
/* Scan the bus */
pcibios_scan_phb(phb);
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index b3a38f5a6b68..f9ae17e8a0f4 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,7 +34,7 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
pci_devs_phb_init_dynamic(phb);
/* Create EEH devices for the PHB */
- eeh_dev_phb_init_dynamic(phb);
+ eeh_phb_pe_create(phb);
if (dn->child)
pseries_eeh_init_edev_recursive(PCI_DN(dn));
--
2.26.2
* [PATCH v3 02/14] powerpc/eeh: Remove eeh_dev.c
From: Oliver O'Halloran @ 2020-07-25 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Oliver O'Halloran
In-Reply-To: <20200725081231.39076-1-oohall@gmail.com>
The only thing in this file is eeh_dev_init(), which allocates and
initialises an eeh_dev based on a pci_dn. This is only ever called from
pci_dn.c, so move it into there and remove the file.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
v2: no change
v3: no change
---
arch/powerpc/include/asm/eeh.h | 6 ----
arch/powerpc/kernel/Makefile | 2 +-
arch/powerpc/kernel/eeh_dev.c | 54 ----------------------------------
arch/powerpc/kernel/pci_dn.c | 20 +++++++++++++
4 files changed, 21 insertions(+), 61 deletions(-)
delete mode 100644 arch/powerpc/kernel/eeh_dev.c
diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 1a19b1bb74c0..dd7dd55db7dc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -293,7 +293,6 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
const char *eeh_pe_loc_get(struct eeh_pe *pe);
struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
-struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
void eeh_show_enabled(void);
int __init eeh_ops_register(struct eeh_ops *ops);
int __exit eeh_ops_unregister(const char *name);
@@ -339,11 +338,6 @@ static inline bool eeh_enabled(void)
static inline void eeh_show_enabled(void) { }
-static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
-{
- return NULL;
-}
-
static inline void eeh_dev_phb_init_dynamic(struct pci_controller *phb) { }
static inline int eeh_check_failure(const volatile void __iomem *token)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 244542ae2a91..c5211bdcf1b6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -71,7 +71,7 @@ obj-$(CONFIG_PPC_RTAS_DAEMON) += rtasd.o
obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o
obj-$(CONFIG_RTAS_PROC) += rtas-proc.o
obj-$(CONFIG_PPC_DT_CPU_FTRS) += dt_cpu_ftrs.o
-obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
+obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_cache.o \
eeh_driver.o eeh_event.o eeh_sysfs.o
obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
deleted file mode 100644
index 8e159a12f10c..000000000000
--- a/arch/powerpc/kernel/eeh_dev.c
+++ /dev/null
@@ -1,54 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * The file intends to implement dynamic creation of EEH device, which will
- * be bound with OF node and PCI device simutaneously. The EEH devices would
- * be foundamental information for EEH core components to work proerly. Besides,
- * We have to support multiple situations where dynamic creation of EEH device
- * is required:
- *
- * 1) Before PCI emunation starts, we need create EEH devices according to the
- * PCI sensitive OF nodes.
- * 2) When PCI emunation is done, we need do the binding between PCI device and
- * the associated EEH device.
- * 3) DR (Dynamic Reconfiguration) would create PCI sensitive OF node. EEH device
- * will be created while PCI sensitive OF node is detected from DR.
- * 4) PCI hotplug needs redoing the binding between PCI device and EEH device. If
- * PHB is newly inserted, we also need create EEH devices accordingly.
- *
- * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
- */
-
-#include <linux/export.h>
-#include <linux/gfp.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/pci.h>
-#include <linux/string.h>
-
-#include <asm/pci-bridge.h>
-#include <asm/ppc-pci.h>
-
-/**
- * eeh_dev_init - Create EEH device according to OF node
- * @pdn: PCI device node
- *
- * It will create EEH device according to the given OF node. The function
- * might be called by PCI emunation, DR, PHB hotplug.
- */
-struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
-{
- struct eeh_dev *edev;
-
- /* Allocate EEH device */
- edev = kzalloc(sizeof(*edev), GFP_KERNEL);
- if (!edev)
- return NULL;
-
- /* Associate EEH device with OF node */
- pdn->edev = edev;
- edev->pdn = pdn;
- edev->bdfn = (pdn->busno << 8) | pdn->devfn;
- edev->controller = pdn->phb;
-
- return edev;
-}
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 4e654df55969..f790a8d06f50 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -124,6 +124,26 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
return NULL;
}
+#ifdef CONFIG_EEH
+static struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
+{
+ struct eeh_dev *edev;
+
+ /* Allocate EEH device */
+ edev = kzalloc(sizeof(*edev), GFP_KERNEL);
+ if (!edev)
+ return NULL;
+
+ /* Associate EEH device with OF node */
+ pdn->edev = edev;
+ edev->pdn = pdn;
+ edev->bdfn = (pdn->busno << 8) | pdn->devfn;
+ edev->controller = pdn->phb;
+
+ return edev;
+}
+#endif /* CONFIG_EEH */
+
#ifdef CONFIG_PCI_IOV
static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn *parent,
int vf_index,
--
2.26.2
* [PATCH v3 03/14] powerpc/eeh: Move vf_index out of pci_dn and into eeh_dev
From: Oliver O'Halloran @ 2020-07-25 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Oliver O'Halloran
In-Reply-To: <20200725081231.39076-1-oohall@gmail.com>
Drivers that do not support the PCI error handling callbacks are handled by
tearing down the device and re-probing it. If the device being removed is
a virtual function then we need to know the VF index so it can be removed
using the pci_iov_{add|remove}_virtfn() API.
Currently this is handled by looking up the pci_dn, and using the vf_index
that was stashed there when the pci_dn for the VF was created in
pcibios_sriov_enable(). We would like to eliminate the use of pci_dn
outside of pseries, though, so we need to provide the generic EEH code with
some other way to find the vf_index.
The easiest thing to do here is move the vf_index field out of pci_dn and
into eeh_dev. Currently pci_dn and eeh_dev are allocated and initialized
together, so this is a fairly minimal change in preparation for splitting
pci_dn and eeh_dev in the future.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
v2: Commit message fixup
v3: no change
---
arch/powerpc/include/asm/eeh.h | 3 +++
arch/powerpc/include/asm/pci-bridge.h | 1 -
arch/powerpc/kernel/eeh_driver.c | 6 ++----
arch/powerpc/kernel/pci_dn.c | 7 ++++---
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index dd7dd55db7dc..2a935db72198 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -148,7 +148,10 @@ struct eeh_dev {
struct pci_dn *pdn; /* Associated PCI device node */
struct pci_dev *pdev; /* Associated PCI device */
bool in_error; /* Error flag for edev */
+
+ /* VF specific properties */
struct pci_dev *physfn; /* Associated SRIOV PF */
+ int vf_index; /* Index of this VF */
};
/* "fmt" must be a simple literal string */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b92e81b256e5..d2a2a14e56f9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -202,7 +202,6 @@ struct pci_dn {
#define IODA_INVALID_PE 0xFFFFFFFF
unsigned int pe_number;
#ifdef CONFIG_PCI_IOV
- int vf_index; /* VF index in the PF */
u16 vfs_expanded; /* number of VFs IOV BAR expanded */
u16 num_vfs; /* number of VFs enabled*/
unsigned int *pe_num_map; /* PE# for the first VF PE or array */
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7b048cee767c..b70b9273f45a 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -477,7 +477,7 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
}
#ifdef CONFIG_PCI_IOV
- pci_iov_add_virtfn(edev->physfn, eeh_dev_to_pdn(edev)->vf_index);
+ pci_iov_add_virtfn(edev->physfn, edev->vf_index);
#endif
return NULL;
}
@@ -521,9 +521,7 @@ static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
if (edev->physfn) {
#ifdef CONFIG_PCI_IOV
- struct pci_dn *pdn = eeh_dev_to_pdn(edev);
-
- pci_iov_remove_virtfn(edev->physfn, pdn->vf_index);
+ pci_iov_remove_virtfn(edev->physfn, edev->vf_index);
edev->pdev = NULL;
#endif
if (rmv_data)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f790a8d06f50..bf11ac8427ac 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -146,7 +146,6 @@ static struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
#ifdef CONFIG_PCI_IOV
static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn *parent,
- int vf_index,
int busno, int devfn)
{
struct pci_dn *pdn;
@@ -163,7 +162,6 @@ static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn *parent,
pdn->parent = parent;
pdn->busno = busno;
pdn->devfn = devfn;
- pdn->vf_index = vf_index;
pdn->pe_number = IODA_INVALID_PE;
INIT_LIST_HEAD(&pdn->child_list);
INIT_LIST_HEAD(&pdn->list);
@@ -194,7 +192,7 @@ struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev)
for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
struct eeh_dev *edev __maybe_unused;
- pdn = add_one_sriov_vf_pdn(parent, i,
+ pdn = add_one_sriov_vf_pdn(parent,
pci_iov_virtfn_bus(pdev, i),
pci_iov_virtfn_devfn(pdev, i));
if (!pdn) {
@@ -207,7 +205,10 @@ struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev)
/* Create the EEH device for the VF */
edev = eeh_dev_init(pdn);
BUG_ON(!edev);
+
+ /* FIXME: these should probably be populated by the EEH probe */
edev->physfn = pdev;
+ edev->vf_index = i;
#endif /* CONFIG_EEH */
}
return pci_get_pdn(pdev);
--
2.26.2
* [PATCH v3 04/14] powerpc/pseries: Stop using pdn->pe_number
From: Oliver O'Halloran @ 2020-07-25 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Oliver O'Halloran
In-Reply-To: <20200725081231.39076-1-oohall@gmail.com>
The pci_dn->pe_number field is mainly used to track the IODA PE number of a
device on PowerNV. At some point it grew a user in the pseries SR-IOV
support, which muddies the waters a bit, so remove that user.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
v2: no change
v3: no change
---
arch/powerpc/platforms/pseries/eeh_pseries.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index ace117f99d94..18a2522b9b5e 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -52,8 +52,6 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
dev_dbg(&pdev->dev, "EEH: Setting up device\n");
#ifdef CONFIG_PCI_IOV
if (pdev->is_virtfn) {
- struct pci_dn *physfn_pdn;
-
pdn->device_id = pdev->device;
pdn->vendor_id = pdev->vendor;
pdn->class_code = pdev->class;
@@ -63,8 +61,6 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
* completion from platform.
*/
pdn->last_allow_rc = 0;
- physfn_pdn = pci_get_pdn(pdev->physfn);
- pdn->pe_number = physfn_pdn->pe_num_map[pdn->vf_index];
}
#endif
pseries_eeh_init_edev(pdn);
@@ -772,8 +768,8 @@ int pseries_send_allow_unfreeze(struct pci_dn *pdn,
static int pseries_call_allow_unfreeze(struct eeh_dev *edev)
{
+ int cur_vfs = 0, rc = 0, vf_index, bus, devfn, vf_pe_num;
struct pci_dn *pdn, *tmp, *parent, *physfn_pdn;
- int cur_vfs = 0, rc = 0, vf_index, bus, devfn;
u16 *vf_pe_array;
vf_pe_array = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
@@ -806,8 +802,10 @@ static int pseries_call_allow_unfreeze(struct eeh_dev *edev)
}
} else {
pdn = pci_get_pdn(edev->pdev);
- vf_pe_array[0] = cpu_to_be16(pdn->pe_number);
physfn_pdn = pci_get_pdn(edev->physfn);
+
+ vf_pe_num = physfn_pdn->pe_num_map[edev->vf_index];
+ vf_pe_array[0] = cpu_to_be16(vf_pe_num);
rc = pseries_send_allow_unfreeze(physfn_pdn,
vf_pe_array, 1);
pdn->last_allow_rc = rc;
--
2.26.2