LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 5/6] powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR
From: Nicholas Piggin @ 2021-02-12  0:36 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman; +Cc: aneesh.kumar
In-Reply-To: <20210211135130.3474832-5-mpe@ellerman.id.au>

Excerpts from Michael Ellerman's message of February 11, 2021 11:51 pm:
> When we enabled STRICT_KERNEL_RWX we received some reports of boot
> failures when using the Hash MMU and running under phyp. The crashes
> are intermittent, and often exhibit as a completely unresponsive
> system, or possibly an oops.
> 
> One example, which was caught in xmon:
> 
>   [   14.068327][    T1] devtmpfs: mounted
>   [   14.069302][    T1] Freeing unused kernel memory: 5568K
>   [   14.142060][  T347] BUG: Unable to handle kernel instruction fetch
>   [   14.142063][    T1] Run /sbin/init as init process
>   [   14.142074][  T347] Faulting instruction address: 0xc000000000004400
>   cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
>       pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
>       lr: c0000000001862d4: update_rq_clock+0x44/0x110
>       sp: c00000000c747880
>      msr: 8000000040001031
>     current = 0xc00000000c60d380
>     paca    = 0xc00000001ec9de80   irqmask: 0x03   irq_happened: 0x01
>       pid   = 347, comm = kworker/2:1
>   ...
>   enter ? for help
>   [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
>   [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
>   [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
>   [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
>   [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
>   [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
>   [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
>   [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
>   [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
>   [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
>   [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c
> 
> This shows that CPU 2, which was idle, woke up and then appears to
> randomly take an instruction fault on a completely valid area of
> kernel text.
> 
> The cause turns out to be the call to hash__mark_rodata_ro(), late in
> boot. Due to the way we layout text and rodata, that function actually
> changes the permissions for all of text and rodata to read-only plus
> execute.
> 
> To do the permission change we use a hypervisor call, H_PROTECT. On
> phyp that appears to be implemented by briefly removing the mapping of
> the kernel text, before putting it back with the updated permissions.
> If any other CPU is executing during that window, it will see spurious
> faults on the kernel text and/or data, leading to crashes.
> 
> To fix it we use stop machine to collect all other CPUs, and then have
> them drop into real mode (MMU off), while we change the mapping. That
> way they are unaffected by the mapping temporarily disappearing.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/mm/book3s64/hash_pgtable.c | 105 +++++++++++++++++++++++-
>  1 file changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> index 3663d3cdffac..01de985df2c4 100644
> --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> @@ -8,6 +8,7 @@
>  #include <linux/sched.h>
>  #include <linux/mm_types.h>
>  #include <linux/mm.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/sections.h>
>  #include <asm/mmu.h>
> @@ -400,6 +401,19 @@ EXPORT_SYMBOL_GPL(hash__has_transparent_hugepage);
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> +
> +struct change_memory_parms {
> +	unsigned long start, end, newpp;
> +	unsigned int step, nr_cpus, master_cpu;
> +	atomic_t cpu_counter;
> +};
> +
> +// We'd rather this was on the stack but it has to be in the RMO
> +static struct change_memory_parms chmem_parms;
> +
> +// And therefore we need a lock to protect it from concurrent use
> +static DEFINE_MUTEX(chmem_lock);
> +
>  static void change_memory_range(unsigned long start, unsigned long end,
>  				unsigned int step, unsigned long newpp)
>  {
> @@ -414,6 +428,73 @@ static void change_memory_range(unsigned long start, unsigned long end,
>  							mmu_kernel_ssize);
>  }
>  
> +static int notrace chmem_secondary_loop(struct change_memory_parms *parms)
> +{
> +	unsigned long msr, tmp, flags;
> +	int *p;
> +
> +	p = &parms->cpu_counter.counter;
> +
> +	local_irq_save(flags);
> +	__hard_EE_RI_disable();
> +
> +	asm volatile (
> +	// Switch to real mode and leave interrupts off
> +	"mfmsr	%[msr]			;"
> +	"li	%[tmp], %[MSR_IR_DR]	;"
> +	"andc	%[tmp], %[msr], %[tmp]	;"
> +	"mtmsrd %[tmp]			;"
> +
> +	// Tell the master we are in real mode
> +	"1:				"
> +	"lwarx	%[tmp], 0, %[p]		;"
> +	"addic	%[tmp], %[tmp], -1	;"
> +	"stwcx.	%[tmp], 0, %[p]		;"
> +	"bne-	1b			;"
> +
> +	// Spin until the counter goes to zero
> +	"2:				;"
> +	"lwz	%[tmp], 0(%[p])		;"
> +	"cmpwi	%[tmp], 0		;"
> +	"bne-	2b			;"
> +
> +	// Switch back to virtual mode
> +	"mtmsrd %[msr]			;"

Pity we don't have something that can switch to emergency stack and
so we can write this stuff in C.

How's something like this suit you?

---
 arch/powerpc/kernel/misc_64.S | 22 +++++++++++++++++++++
 arch/powerpc/kernel/process.c | 37 +++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 070465825c21..5e911d0b0b16 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -27,6 +27,28 @@
 
 	.text
 
+#ifdef CONFIG_PPC_BOOK3S_64
+_GLOBAL(__call_realmode)
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r5)
+	mr	r1,r5
+	mtctr	r3
+	mr	r3,r4
+	mfmsr	r4
+	xori	r4,r4,(MSR_IR|MSR_DR)
+	mtmsrd	r4
+	bctrl
+	mfmsr	r4
+	xori	r4,r4,(MSR_IR|MSR_DR)
+	mtmsrd	r4
+	ld	r1,0(r1)
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+
+#endif
+
 _GLOBAL(call_do_softirq)
 	mflr	r0
 	std	r0,16(r1)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a66f435dabbf..260d60f665a3 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2197,6 +2197,43 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
 	put_task_stack(tsk);
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+int __call_realmode(int (*fn)(void *arg), void *arg, void *sp);
+
+/* XXX: find a better place for this
+ * Executing C code in real-mode in general Book3S-64 code can only be done
+ * via this function that switches the stack to one inside the real-mode-area,
+ * which may cover only a small first part of real memory on hash guest LPARs.
+ * fn must be NOKPROBES, must not access vmalloc or anything outside the RMA,
+ * probably shouldn't enable the MMU or interrupts, etc, and be very careful
+ * about calling other generic kernel or powerpc functions.
+ */
+int call_realmode(int (*fn)(void *arg), void *arg)
+{
+	unsigned long flags;
+	void *cursp, *emsp;
+	int ret;
+
+	/* Stack switch is only really required for HPT LPAR, but do it for all to help test coverage of tricky code */
+	cursp = (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
+	emsp = (void *)(local_paca->emergency_sp - THREAD_SIZE);
+
+	/* XXX check_stack_overflow(); */
+
+	if (WARN_ON_ONCE(cursp == emsp))
+		return -EBUSY;
+
+	local_irq_save(flags);
+	hard_irq_disable();
+
+	ret = __call_realmode(fn, arg, emsp);
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+#endif
+
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
-- 
2.23.0


^ permalink raw reply related

* Re: Fwd: Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Lakshmi Ramasubramanian @ 2021-02-12  1:09 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Rob Herring, linux-integrity, linuxppc-dev, linux-arm-kernel,
	Mimi Zohar
In-Reply-To: <87mtwap35f.fsf@manicouagan.localdomain>

On 2/11/21 3:59 PM, Thiago Jung Bauermann wrote:
> 
> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> 
>> On 2/11/21 9:42 AM, Lakshmi Ramasubramanian wrote:
>>> Hi Rob,
>>> [PATCH] powerpc: Rename kexec elfcorehdr_addr to elf_headers_mem
>>> This change causes build problem for x86_64 architecture (please see the
>>> mail from kernel test bot below) since arch/x86/include/asm/kexec.h uses
>>> "elf_load_addr" for the ELF header buffer address and not
>>> "elf_headers_mem".
>>> struct kimage_arch {
>>>       ...
>>>       /* Core ELF header buffer */
>>>       void *elf_headers;
>>>       unsigned long elf_headers_sz;
>>>       unsigned long elf_load_addr;
>>> };
>>> I am thinking of limiting of_kexec_alloc_and_setup_fdt() to ARM64 and
>>> PPC64 since they are the only ones using this function now.
>>> #if defined(CONFIG_ARM64) && defined(CONFIG_PPC64)
>> Sorry - I meant to say
>> #if defined(CONFIG_ARM64) || defined(CONFIG_PPC64)
>>
> 
> Does it build correctly if you rename elf_headers_mem to elf_load_addr?
> Or the other way around, renaming x86's elf_load_addr to
> elf_headers_mem. I don't really have a preference.

Yes - changing arm64 and ppc from "elf_headers_mem" to "elf_load_addr" 
builds fine.

But I am concerned about a few other architectures that also define 
"struct kimage_arch" such as "parisc", "arm" which do not have any ELF 
related fields. They would not build if the config defines 
CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE.

Do you think that could be an issue?

thanks,
  -lakshmi

> 
> That would be better than adding an #if condition.
> 
>>> void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>>>                      unsigned long initrd_load_addr,
>>>                      unsigned long initrd_len,
>>>                      const char *cmdline)
>>> {
>>>       ...
>>> }
>>> #endif /* defined(CONFIG_ARM64) && defined(CONFIG_PPC64) */
>>> Please let me know if you have any concerns.
>>> thanks,
>>>    -lakshmi
> 


^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Thiago Jung Bauermann @ 2021-02-12  1:09 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: mark.rutland, tao.li, zohar, paulus, vincenzo.frascino,
	frowand.list, sashal, robh, masahiroy, jmorris, takahiro.akashi,
	linux-arm-kernel, catalin.marinas, serge, devicetree,
	pasha.tatashin, will, prsriva, hsinyi, allison, christophe.leroy,
	mbrugger, balajib, dmitry.kasatkin, linux-kernel, james.morse,
	gregkh, joe, linux-integrity, linuxppc-dev
In-Reply-To: <20210209182200.30606-3-nramas@linux.microsoft.com>


There's actually a complication that I just noticed and needs to be
addressed. More below.

Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:

> From: Rob Herring <robh@kernel.org>
>
> Both arm64 and powerpc do essentially the same FDT /chosen setup for
> kexec.  The differences are either omissions that arm64 should have
> or additional properties that will be ignored.  The setup code can be
> combined and shared by both powerpc and arm64.
>
> The differences relative to the arm64 version:
>  - If /chosen doesn't exist, it will be created (should never happen).
>  - Any old dtb and initrd reserved memory will be released.
>  - The new initrd and elfcorehdr are marked reserved.
>  - "linux,booted-from-kexec" is set.
>
> The differences relative to the powerpc version:
>  - "kaslr-seed" and "rng-seed" may be set.
>  - "linux,elfcorehdr" is set.
>  - Any existing "linux,usable-memory-range" is removed.
>
> Combine the code for setting up the /chosen node in the FDT and updating
> the memory reservation for kexec, for powerpc and arm64, in
> of_kexec_alloc_and_setup_fdt() and move it to "drivers/of/kexec.c".
>
> Signed-off-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> ---
>  drivers/of/Makefile |   6 ++
>  drivers/of/kexec.c  | 258 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/of.h  |  13 +++
>  3 files changed, 277 insertions(+)
>  create mode 100644 drivers/of/kexec.c
>
> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> index 6e1e5212f058..c13b982084a3 100644
> --- a/drivers/of/Makefile
> +++ b/drivers/of/Makefile
> @@ -14,4 +14,10 @@ obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>  obj-$(CONFIG_OF_OVERLAY) += overlay.o
>  obj-$(CONFIG_OF_NUMA) += of_numa.o
>  
> +ifdef CONFIG_KEXEC_FILE
> +ifdef CONFIG_OF_FLATTREE
> +obj-y	+= kexec.o
> +endif
> +endif
> +
>  obj-$(CONFIG_OF_UNITTEST) += unittest-data/
> diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
> new file mode 100644
> index 000000000000..469e09613cdd
> --- /dev/null
> +++ b/drivers/of/kexec.c
> @@ -0,0 +1,258 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Arm Limited
> + *
> + * Based on arch/arm64/kernel/machine_kexec_file.c:
> + *  Copyright (C) 2018 Linaro Limited
> + *
> + * And arch/powerpc/kexec/file_load.c:
> + *  Copyright (C) 2016  IBM Corporation
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/kexec.h>
> +#include <linux/libfdt.h>
> +#include <linux/of.h>
> +#include <linux/of_fdt.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +/* relevant device tree properties */
> +#define FDT_PROP_KEXEC_ELFHDR	"linux,elfcorehdr"
> +#define FDT_PROP_MEM_RANGE	"linux,usable-memory-range"
> +#define FDT_PROP_INITRD_START	"linux,initrd-start"
> +#define FDT_PROP_INITRD_END	"linux,initrd-end"
> +#define FDT_PROP_BOOTARGS	"bootargs"
> +#define FDT_PROP_KASLR_SEED	"kaslr-seed"
> +#define FDT_PROP_RNG_SEED	"rng-seed"
> +#define RNG_SEED_SIZE		128
> +
> +/**
> + * fdt_find_and_del_mem_rsv - delete memory reservation with given address and size
> + *
> + * @fdt:	Flattened device tree for the current kernel.
> + * @start:	Starting address of the reserved memory.
> + * @size:	Size of the reserved memory.
> + *
> + * Return: 0 on success, or negative errno on error.
> + */
> +static int fdt_find_and_del_mem_rsv(void *fdt, unsigned long start, unsigned long size)
> +{
> +	int i, ret, num_rsvs = fdt_num_mem_rsv(fdt);
> +
> +	for (i = 0; i < num_rsvs; i++) {
> +		u64 rsv_start, rsv_size;
> +
> +		ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
> +		if (ret) {
> +			pr_err("Malformed device tree.\n");
> +			return -EINVAL;
> +		}
> +
> +		if (rsv_start == start && rsv_size == size) {
> +			ret = fdt_del_mem_rsv(fdt, i);
> +			if (ret) {
> +				pr_err("Error deleting device tree reservation.\n");
> +				return -EINVAL;
> +			}
> +
> +			return 0;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> +
> +/*
> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
> + *
> + * @image:		kexec image being loaded.
> + * @initrd_load_addr:	Address where the next initrd will be loaded.
> + * @initrd_len:		Size of the next initrd, or 0 if there will be none.
> + * @cmdline:		Command line for the next kernel, or NULL if there will
> + *			be none.
> + *
> + * Return: fdt on success, or NULL errno on error.
> + */
> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
> +				   unsigned long initrd_load_addr,
> +				   unsigned long initrd_len,
> +				   const char *cmdline)
> +{
> +	void *fdt;
> +	int ret, chosen_node;
> +	const void *prop;
> +	unsigned long fdt_size;
> +
> +	fdt_size = fdt_totalsize(initial_boot_params) +
> +		   (cmdline ? strlen(cmdline) : 0) +
> +		   FDT_EXTRA_SPACE;

Just adding 4 KB to initial_boot_params won't be enough for crash
kernels on ppc64. The current powerpc code doubles the size of
initial_boot_params (which is normally larger than 4 KB) and even that
isn't enough. A patch was added to powerpc/next today which uses a more
precise (but arch-specific) formula:

https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/

So I believe we need a hook here where architectures can provide their
own specific calculation for the size of the fdt. Perhaps a weakly
defined function providing a default implementation which an
arch-specific file can override (a la arch_kexec_kernel_image_load())?

Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
function from the patch I linked above.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Lakshmi Ramasubramanian @ 2021-02-12  1:17 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: mark.rutland, tao.li, zohar, paulus, vincenzo.frascino,
	frowand.list, sashal, robh, masahiroy, jmorris, takahiro.akashi,
	linux-arm-kernel, catalin.marinas, serge, devicetree,
	pasha.tatashin, will, prsriva, hsinyi, allison, christophe.leroy,
	mbrugger, balajib, dmitry.kasatkin, linux-kernel, james.morse,
	gregkh, joe, linux-integrity, linuxppc-dev
In-Reply-To: <87k0reozwh.fsf@manicouagan.localdomain>

On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
> 
> There's actually a complication that I just noticed and needs to be
> addressed. More below.
> 

<...>

>> +
>> +/*
>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
>> + *
>> + * @image:		kexec image being loaded.
>> + * @initrd_load_addr:	Address where the next initrd will be loaded.
>> + * @initrd_len:		Size of the next initrd, or 0 if there will be none.
>> + * @cmdline:		Command line for the next kernel, or NULL if there will
>> + *			be none.
>> + *
>> + * Return: fdt on success, or NULL errno on error.
>> + */
>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>> +				   unsigned long initrd_load_addr,
>> +				   unsigned long initrd_len,
>> +				   const char *cmdline)
>> +{
>> +	void *fdt;
>> +	int ret, chosen_node;
>> +	const void *prop;
>> +	unsigned long fdt_size;
>> +
>> +	fdt_size = fdt_totalsize(initial_boot_params) +
>> +		   (cmdline ? strlen(cmdline) : 0) +
>> +		   FDT_EXTRA_SPACE;
> 
> Just adding 4 KB to initial_boot_params won't be enough for crash
> kernels on ppc64. The current powerpc code doubles the size of
> initial_boot_params (which is normally larger than 4 KB) and even that
> isn't enough. A patch was added to powerpc/next today which uses a more
> precise (but arch-specific) formula:
> 
> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
> 
> So I believe we need a hook here where architectures can provide their
> own specific calculation for the size of the fdt. Perhaps a weakly
> defined function providing a default implementation which an
> arch-specific file can override (a la arch_kexec_kernel_image_load())?
> 
> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
> function from the patch I linked above.
> 

Do you think it'd better to add "fdt_size" parameter to 
of_kexec_alloc_and_setup_fdt() so that the caller can provide the 
desired FDT buffer size?

thanks,
  -lakshmi

^ permalink raw reply

* [RFC PATCH] powerpc/64s: introduce call_realmode to run C code in real-mode
From: Nicholas Piggin @ 2021-02-12  1:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The regular kernel stack can not be accessed in real mode in hash
guest kernels, which prevents the MMU from being disabled in general
C code. Provide a helper that can call a function pointer in real
mode using the emergency stack (accessable in real mode).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/asm-prototypes.h |  1 +
 arch/powerpc/include/asm/book3s/64/mmu.h  |  2 +
 arch/powerpc/include/asm/thread_info.h    | 16 ++++++++
 arch/powerpc/kernel/irq.c                 | 16 --------
 arch/powerpc/kernel/misc_64.S             | 22 +++++++++++
 arch/powerpc/mm/book3s64/pgtable.c        | 48 +++++++++++++++++++++++
 6 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index d0b832cbbec8..a973023c390a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -126,6 +126,7 @@ extern s64 __ashldi3(s64, int);
 extern s64 __ashrdi3(s64, int);
 extern int __cmpdi2(s64, s64);
 extern int __ucmpdi2(u64, u64);
+int __call_realmode(int (*fn)(void *arg), void *arg, void *sp);
 
 /* tracing */
 void _mcount(void);
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 995bbcdd0ef8..80b0d24415ac 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -274,5 +274,7 @@ static inline unsigned long get_user_vsid(mm_context_t *ctx,
 	return get_vsid(context, ea, ssize);
 }
 
+int call_realmode(int (*fn)(void *arg), void *arg);
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 3d8a47af7a25..9279e472d51e 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -172,6 +172,22 @@ static inline bool test_thread_local_flags(unsigned int flags)
 #define is_elf2_task() (0)
 #endif
 
+static inline void check_stack_overflow(void)
+{
+	long sp;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW))
+		return;
+
+	sp = current_stack_pointer & (THREAD_SIZE - 1);
+
+	/* check for stack overflow: is there less than 2KB free? */
+	if (unlikely(sp < 2048)) {
+		pr_err("do_IRQ: stack overflow: %ld\n", sp);
+		dump_stack();
+	}
+}
+
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6b1eca53e36c..193b47b5b6a5 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -620,22 +620,6 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 	return sum;
 }
 
-static inline void check_stack_overflow(void)
-{
-	long sp;
-
-	if (!IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW))
-		return;
-
-	sp = current_stack_pointer & (THREAD_SIZE - 1);
-
-	/* check for stack overflow: is there less than 2KB free? */
-	if (unlikely(sp < 2048)) {
-		pr_err("do_IRQ: stack overflow: %ld\n", sp);
-		dump_stack();
-	}
-}
-
 void __do_irq(struct pt_regs *regs)
 {
 	unsigned int irq;
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 070465825c21..5e911d0b0b16 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -27,6 +27,28 @@
 
 	.text
 
+#ifdef CONFIG_PPC_BOOK3S_64
+_GLOBAL(__call_realmode)
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r5)
+	mr	r1,r5
+	mtctr	r3
+	mr	r3,r4
+	mfmsr	r4
+	xori	r4,r4,(MSR_IR|MSR_DR)
+	mtmsrd	r4
+	bctrl
+	mfmsr	r4
+	xori	r4,r4,(MSR_IR|MSR_DR)
+	mtmsrd	r4
+	ld	r1,0(r1)
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+
+#endif
+
 _GLOBAL(call_do_softirq)
 	mflr	r0
 	std	r0,16(r1)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 5b3a3bae21aa..aad0e2059305 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -474,6 +474,54 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 	return true;
 }
 
+/*
+ * Executing C code in real-mode in general Book3S-64 code can only be done
+ * via this function that switches the stack to one inside the real-mode-area,
+ * which may cover only a small first part of real memory on hash guest LPARs.
+ * fn must be NOKPROBES, must not access vmalloc or anything outside the RMA,
+ * probably shouldn't enable the MMU or interrupts, etc, and be very careful
+ * about calling other generic kernel or powerpc functions.
+ */
+int call_realmode(int (*fn)(void *arg), void *arg)
+{
+	unsigned long flags;
+	void *cursp, *emsp;
+	int ret;
+
+	if (WARN_ON_ONCE(!(mfmsr() & MSR_DR)))
+		return -EINVAL;
+	if (WARN_ON_ONCE(!(mfmsr() & MSR_IR)))
+		return -EINVAL;
+
+	/*
+	 * The switch to emergency stack is only really required for HPT LPAR,
+	 * but do it for all to help test coverage of tricky code.
+	 */
+	cursp = (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
+	emsp = (void *)(local_paca->emergency_sp - THREAD_SIZE);
+
+	/*
+	 * It's probably okay to go to real-mode and call directly in case we
+	 * are already on the emergency stack, so allow it. But we may want to
+	 * prevent callers from doing this in future though, so warn.
+	 */
+	WARN_ON_ONCE(cursp == emsp);
+
+	check_stack_overflow();
+
+	local_irq_save(flags);
+	hard_irq_disable();
+
+	if (cursp == emsp)
+		ret = fn(arg);
+	else
+		ret = __call_realmode(fn, arg, emsp);
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+
 /*
  * Does the CPU support tlbie?
  */
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Thiago Jung Bauermann @ 2021-02-12  1:39 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: mark.rutland, tao.li, zohar, paulus, vincenzo.frascino,
	frowand.list, sashal, robh, masahiroy, jmorris, takahiro.akashi,
	linux-arm-kernel, catalin.marinas, serge, devicetree,
	pasha.tatashin, will, prsriva, hsinyi, allison, christophe.leroy,
	mbrugger, balajib, dmitry.kasatkin, linux-kernel, james.morse,
	gregkh, joe, linux-integrity, linuxppc-dev
In-Reply-To: <8a3aa3d2-2eba-549a-9970-a2b0fe3586c9@linux.microsoft.com>


Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:

> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
>> There's actually a complication that I just noticed and needs to be
>> addressed. More below.
>> 
>
> <...>
>
>>> +
>>> +/*
>>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
>>> + *
>>> + * @image:		kexec image being loaded.
>>> + * @initrd_load_addr:	Address where the next initrd will be loaded.
>>> + * @initrd_len:		Size of the next initrd, or 0 if there will be none.
>>> + * @cmdline:		Command line for the next kernel, or NULL if there will
>>> + *			be none.
>>> + *
>>> + * Return: fdt on success, or NULL errno on error.
>>> + */
>>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>>> +				   unsigned long initrd_load_addr,
>>> +				   unsigned long initrd_len,
>>> +				   const char *cmdline)
>>> +{
>>> +	void *fdt;
>>> +	int ret, chosen_node;
>>> +	const void *prop;
>>> +	unsigned long fdt_size;
>>> +
>>> +	fdt_size = fdt_totalsize(initial_boot_params) +
>>> +		   (cmdline ? strlen(cmdline) : 0) +
>>> +		   FDT_EXTRA_SPACE;
>> Just adding 4 KB to initial_boot_params won't be enough for crash
>> kernels on ppc64. The current powerpc code doubles the size of
>> initial_boot_params (which is normally larger than 4 KB) and even that
>> isn't enough. A patch was added to powerpc/next today which uses a more
>> precise (but arch-specific) formula:
>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
>> So I believe we need a hook here where architectures can provide their
>> own specific calculation for the size of the fdt. Perhaps a weakly
>> defined function providing a default implementation which an
>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
>> function from the patch I linked above.
>> 
>
> Do you think it'd better to add "fdt_size" parameter to
> of_kexec_alloc_and_setup_fdt() so that the caller can provide the 
> desired FDT buffer size?

Yes, that is actually simpler and better than my idea. :-)

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: Fwd: Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Thiago Jung Bauermann @ 2021-02-12  2:11 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Rob Herring, linux-integrity, linuxppc-dev, linux-arm-kernel,
	Mimi Zohar
In-Reply-To: <b4ebf962-4210-5d17-2149-6b152d587f95@linux.microsoft.com>


Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:

> On 2/11/21 3:59 PM, Thiago Jung Bauermann wrote:
>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>> 
>>> On 2/11/21 9:42 AM, Lakshmi Ramasubramanian wrote:
>>>> Hi Rob,
>>>> [PATCH] powerpc: Rename kexec elfcorehdr_addr to elf_headers_mem
>>>> This change causes build problem for x86_64 architecture (please see the
>>>> mail from kernel test bot below) since arch/x86/include/asm/kexec.h uses
>>>> "elf_load_addr" for the ELF header buffer address and not
>>>> "elf_headers_mem".
>>>> struct kimage_arch {
>>>>       ...
>>>>       /* Core ELF header buffer */
>>>>       void *elf_headers;
>>>>       unsigned long elf_headers_sz;
>>>>       unsigned long elf_load_addr;
>>>> };
>>>> I am thinking of limiting of_kexec_alloc_and_setup_fdt() to ARM64 and
>>>> PPC64 since they are the only ones using this function now.
>>>> #if defined(CONFIG_ARM64) && defined(CONFIG_PPC64)
>>> Sorry - I meant to say
>>> #if defined(CONFIG_ARM64) || defined(CONFIG_PPC64)
>>>
>> Does it build correctly if you rename elf_headers_mem to elf_load_addr?
>> Or the other way around, renaming x86's elf_load_addr to
>> elf_headers_mem. I don't really have a preference.
>
> Yes - changing arm64 and ppc from "elf_headers_mem" to "elf_load_addr" builds
> fine.
>
> But I am concerned about a few other architectures that also define "struct
> kimage_arch" such as "parisc", "arm" which do not have any ELF related fields.
> They would not build if the config defines CONFIG_KEXEC_FILE and
> CONFIG_OF_FLATTREE.
>
> Do you think that could be an issue?

That's a good point. But in practice, arm doesn't support
CONFIG_KEXEC_FILE. And while parisc does support CONFIG_KEXEC_FILE, as
far as I could determine it doesn't support CONFIG_OF.

So IMHO we don't need to worry about them. We'll cross that bridge if we
get there. If they ever implement KEXEC_FILE or OF_FLATTREE support,
then (again, IMHO) the natural solution would be for them to name the
ELF header member the same way the other arches do.

And since no other architecture defines struct kimage_arch, those are
the only ones we need to consider.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: Fwd: Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Lakshmi Ramasubramanian @ 2021-02-12  2:28 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Rob Herring, linux-integrity, linuxppc-dev, linux-arm-kernel,
	Mimi Zohar
In-Reply-To: <87eehmox08.fsf@manicouagan.localdomain>

On 2/11/21 6:11 PM, Thiago Jung Bauermann wrote:
> 
> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> 
>> On 2/11/21 3:59 PM, Thiago Jung Bauermann wrote:
>>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>>>
>>>> On 2/11/21 9:42 AM, Lakshmi Ramasubramanian wrote:
>>>>> Hi Rob,
>>>>> [PATCH] powerpc: Rename kexec elfcorehdr_addr to elf_headers_mem
>>>>> This change causes build problem for x86_64 architecture (please see the
>>>>> mail from kernel test bot below) since arch/x86/include/asm/kexec.h uses
>>>>> "elf_load_addr" for the ELF header buffer address and not
>>>>> "elf_headers_mem".
>>>>> struct kimage_arch {
>>>>>        ...
>>>>>        /* Core ELF header buffer */
>>>>>        void *elf_headers;
>>>>>        unsigned long elf_headers_sz;
>>>>>        unsigned long elf_load_addr;
>>>>> };
>>>>> I am thinking of limiting of_kexec_alloc_and_setup_fdt() to ARM64 and
>>>>> PPC64 since they are the only ones using this function now.
>>>>> #if defined(CONFIG_ARM64) && defined(CONFIG_PPC64)
>>>> Sorry - I meant to say
>>>> #if defined(CONFIG_ARM64) || defined(CONFIG_PPC64)
>>>>
>>> Does it build correctly if you rename elf_headers_mem to elf_load_addr?
>>> Or the other way around, renaming x86's elf_load_addr to
>>> elf_headers_mem. I don't really have a preference.
>>
>> Yes - changing arm64 and ppc from "elf_headers_mem" to "elf_load_addr" builds
>> fine.
>>
>> But I am concerned about a few other architectures that also define "struct
>> kimage_arch" such as "parisc", "arm" which do not have any ELF related fields.
>> They would not build if the config defines CONFIG_KEXEC_FILE and
>> CONFIG_OF_FLATTREE.
>>
>> Do you think that could be an issue?
> 
> That's a good point. But in practice, arm doesn't support
> CONFIG_KEXEC_FILE. And while parisc does support CONFIG_KEXEC_FILE, as
> far as I could determine it doesn't support CONFIG_OF.
> 
> So IMHO we don't need to worry about them. We'll cross that bridge if we
> get there. If they ever implement KEXEC_FILE or OF_FLATTREE support,
> then (again, IMHO) the natural solution would be for them to name the
> ELF header member the same way the other arches do.
> 
> And since no other architecture defines struct kimage_arch, those are
> the only ones we need to consider.
> 

Sounds good Thiago.

I'll rename arm64 and ppc kimage_arch ELF address field to match that 
defined for x86/x64.

Also, will add "fdt_size" param to of_kexec_alloc_and_setup_fdt(). For 
now, I'll use 2*fdt_totalsize(initial_boot_params) for ppc.

Will send the updated patches shortly.

  -lakshmi


^ permalink raw reply

* Re: Fwd: Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Thiago Jung Bauermann @ 2021-02-12  3:21 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Rob Herring, linux-integrity, linuxppc-dev, linux-arm-kernel,
	Mimi Zohar
In-Reply-To: <e7f3ae2e-20bc-9901-fb8d-80a3163e7d5e@linux.microsoft.com>


Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:

> On 2/11/21 6:11 PM, Thiago Jung Bauermann wrote:
>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>> 
>>> On 2/11/21 3:59 PM, Thiago Jung Bauermann wrote:
>>>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>>>>
>>>>> On 2/11/21 9:42 AM, Lakshmi Ramasubramanian wrote:
>>>>>> Hi Rob,
>>>>>> [PATCH] powerpc: Rename kexec elfcorehdr_addr to elf_headers_mem
>>>>>> This change causes build problem for x86_64 architecture (please see the
>>>>>> mail from kernel test bot below) since arch/x86/include/asm/kexec.h uses
>>>>>> "elf_load_addr" for the ELF header buffer address and not
>>>>>> "elf_headers_mem".
>>>>>> struct kimage_arch {
>>>>>>        ...
>>>>>>        /* Core ELF header buffer */
>>>>>>        void *elf_headers;
>>>>>>        unsigned long elf_headers_sz;
>>>>>>        unsigned long elf_load_addr;
>>>>>> };
>>>>>> I am thinking of limiting of_kexec_alloc_and_setup_fdt() to ARM64 and
>>>>>> PPC64 since they are the only ones using this function now.
>>>>>> #if defined(CONFIG_ARM64) && defined(CONFIG_PPC64)
>>>>> Sorry - I meant to say
>>>>> #if defined(CONFIG_ARM64) || defined(CONFIG_PPC64)
>>>>>
>>>> Does it build correctly if you rename elf_headers_mem to elf_load_addr?
>>>> Or the other way around, renaming x86's elf_load_addr to
>>>> elf_headers_mem. I don't really have a preference.
>>>
>>> Yes - changing arm64 and ppc from "elf_headers_mem" to "elf_load_addr" builds
>>> fine.
>>>
>>> But I am concerned about a few other architectures that also define "struct
>>> kimage_arch" such as "parisc", "arm" which do not have any ELF related fields.
>>> They would not build if the config defines CONFIG_KEXEC_FILE and
>>> CONFIG_OF_FLATTREE.
>>>
>>> Do you think that could be an issue?
>> That's a good point. But in practice, arm doesn't support
>> CONFIG_KEXEC_FILE. And while parisc does support CONFIG_KEXEC_FILE, as
>> far as I could determine it doesn't support CONFIG_OF.
>> So IMHO we don't need to worry about them. We'll cross that bridge if we
>> get there. If they ever implement KEXEC_FILE or OF_FLATTREE support,
>> then (again, IMHO) the natural solution would be for them to name the
>> ELF header member the same way the other arches do.
>> And since no other architecture defines struct kimage_arch, those are
>> the only ones we need to consider.
>> 
>
> Sounds good Thiago.
>
> I'll rename arm64 and ppc kimage_arch ELF address field to match that defined
> for x86/x64.
>
> Also, will add "fdt_size" param to of_kexec_alloc_and_setup_fdt(). For now, I'll
> use 2*fdt_totalsize(initial_boot_params) for ppc.
>
> Will send the updated patches shortly.

Sounds good. There will be a small conflict with powerpc/next because of
the patch I mentioned, but it's simple to fix by whoever merges the
series.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* linux-next: manual merge of the spi tree with the powerpc tree
From: Stephen Rothwell @ 2021-02-12  4:31 UTC (permalink / raw)
  To: Mark Brown, Michael Ellerman, PowerPC
  Cc: Linux Next Mailing List, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1454 bytes --]

Hi all,

Today's linux-next merge of the spi tree got a conflict in:

  drivers/spi/spi-mpc52xx.c

between commit:

  e10656114d32 ("spi: mpc52xx: Avoid using get_tbl()")

from the powerpc tree and commit:

  258ea99fe25a ("spi: spi-mpc52xx: Use new structure for SPI transfer delays")

from the spi tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

BTW Mark: the author's address in 258ea99fe25a uses a non existent domain :-(
-- 
Cheers,
Stephen Rothwell

diff --cc drivers/spi/spi-mpc52xx.c
index e6a30f232370,36f941500676..000000000000
--- a/drivers/spi/spi-mpc52xx.c
+++ b/drivers/spi/spi-mpc52xx.c
@@@ -247,8 -247,10 +247,10 @@@ static int mpc52xx_spi_fsmstate_transfe
  	/* Is the transfer complete? */
  	ms->len--;
  	if (ms->len == 0) {
 -		ms->timestamp = get_tbl();
 +		ms->timestamp = mftb();
- 		ms->timestamp += ms->transfer->delay_usecs * tb_ticks_per_usec;
+ 		if (ms->transfer->delay.unit == SPI_DELAY_UNIT_USECS)
+ 			ms->timestamp += ms->transfer->delay.value *
+ 					 tb_ticks_per_usec;
  		ms->state = mpc52xx_spi_fsmstate_wait;
  		return FSM_CONTINUE;
  	}

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v5 04/10] powerpc: Reference param in MSR_TM_ACTIVE() macro
From: Daniel Axtens @ 2021-02-12  4:52 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev
In-Reply-To: <20210203184323.20792-5-cmr@codefail.de>

Hi Chris,

(totally bikeshedding, but I'd use the full word parameter in the
subject if you have enough spare characters.)

> Unlike the other MSR_TM_* macros, MSR_TM_ACTIVE does not reference or
> use its parameter unless CONFIG_PPC_TRANSACTIONAL_MEM is defined. This
> causes an 'unused variable' compile warning unless the variable is also
> guarded with CONFIG_PPC_TRANSACTIONAL_MEM.
>
> Reference but do nothing with the argument in the macro to avoid a
> potential compile warning.

Andrew asked why we weren't seeing these warnings already.

I was able to reproduce them with CONFIG_PPC_TRANSACTIONAL_MEM off, but
only if I compiled with W=1:

arch/powerpc/kernel/process.c: In function ‘enable_kernel_fp’:
arch/powerpc/kernel/process.c:210:16: error: variable ‘cpumsr’ set but not used [-Werror=unused-but-set-variable]
  210 |  unsigned long cpumsr;
      |                ^~~~~~

Having said that, this change seems like a good idea, squashing warnings
at W=1 is still valuable.

Reviewed-by: Daniel Axtens <dja@axtens.net>

Kind regards,
Daniel

> Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> ---
>  arch/powerpc/include/asm/reg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index e40a921d78f9..c5a3e856191c 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -124,7 +124,7 @@
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
>  #else
> -#define MSR_TM_ACTIVE(x) 0
> +#define MSR_TM_ACTIVE(x) ((void)(x), 0)
>  #endif
>  
>  #if defined(CONFIG_PPC_BOOK3S_64)
> -- 
> 2.26.1

^ permalink raw reply

* Re: [PATCH v5 05/10] powerpc/signal64: Remove TM ifdefery in middle of if/else block
From: Daniel Axtens @ 2021-02-12  5:21 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev
In-Reply-To: <20210203184323.20792-6-cmr@codefail.de>

Hi Chris,

> Rework the messy ifdef breaking up the if-else for TM similar to
> commit f1cf4f93de2f ("powerpc/signal32: Remove ifdefery in middle of if/else").

I'm not sure what 'the messy ifdef' and 'the if-else for TM' is (yet):
perhaps you could start the commit message with a tiny bit of
background.

> Unlike that commit for ppc32, the ifdef can't be removed entirely since
> uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.
>
> Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> ---
>  arch/powerpc/kernel/signal_64.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index b211a8ea4f6e..8e1d804ce552 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -710,9 +710,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
>  	struct pt_regs *regs = current_pt_regs();
>  	struct ucontext __user *uc = (struct ucontext __user *)regs->gpr[1];
>  	sigset_t set;
> -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  	unsigned long msr;
> -#endif
>  
>  	/* Always make any pending restarted system calls return -EINTR */
>  	current->restart_block.fn = do_no_restart_syscall;
> @@ -765,7 +763,10 @@ SYSCALL_DEFINE0(rt_sigreturn)
>  
>  	if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
>  		goto badframe;
> +#endif
> +
>  	if (MSR_TM_ACTIVE(msr)) {
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  		/* We recheckpoint on return. */
>  		struct ucontext __user *uc_transact;
>  
> @@ -778,9 +779,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
>  		if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
>  					   &uc_transact->uc_mcontext))
>  			goto badframe;
> -	} else
>  #endif
> -	{
> +	} else {
>  		/*
>  		 * Fall through, for non-TM restore
>  		 *

I think you can get rid of all the ifdefs in _this function_ by
providing a couple of stubs:

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a66f435dabbf..19059a4b798f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1120,6 +1120,7 @@ void restore_tm_state(struct pt_regs *regs)
 #else
 #define tm_recheckpoint_new_task(new)
 #define __switch_to_tm(prev, new)
+void tm_reclaim_current(uint8_t cause) {}
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 static inline void save_sprs(struct thread_struct *t)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 8e1d804ce552..ed58ca019ad9 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -594,6 +594,13 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 
 	return err;
 }
+#else
+static long restore_tm_sigcontexts(struct task_struct *tsk,
+				   struct sigcontext __user *sc,
+				   struct sigcontext __user *tm_sc)
+{
+	return -EINVAL;
+}
 #endif
 
 /*
@@ -722,7 +729,6 @@ SYSCALL_DEFINE0(rt_sigreturn)
 		goto badframe;
 	set_current_blocked(&set);
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	/*
 	 * If there is a transactional state then throw it away.
 	 * The purpose of a sigreturn is to destroy all traces of the
@@ -763,10 +769,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
 
 	if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
 		goto badframe;
-#endif
 
 	if (MSR_TM_ACTIVE(msr)) {
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 		/* We recheckpoint on return. */
 		struct ucontext __user *uc_transact;
 
@@ -779,7 +783,6 @@ SYSCALL_DEFINE0(rt_sigreturn)
 		if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
 					   &uc_transact->uc_mcontext))
 			goto badframe;
-#endif
 	} else {
 		/*
 		 * Fall through, for non-TM restore

My only concern here was whether it was valid to access
 	if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
if CONFIG_PPC_TRANSACTIONAL_MEM was not defined, but I didn't think of
any obvious reason why it wouldn't be...

> @@ -818,10 +818,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
>  	unsigned long newsp = 0;
>  	long err = 0;
>  	struct pt_regs *regs = tsk->thread.regs;
> -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  	/* Save the thread's msr before get_tm_stackpointer() changes it */
>  	unsigned long msr = regs->msr;
> -#endif

I wondered if that comment still makes sense in the absence of the
ifdef. It is preserved in commit f1cf4f93de2f ("powerpc/signal32: Remove
ifdefery in middle of if/else"), and I can't figure out how you would
improve it, so I guess it's probably good as it is.

>  	frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
>  	if (!access_ok(frame, sizeof(*frame)))
> @@ -836,8 +834,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
>  	/* Create the ucontext.  */
>  	err |= __put_user(0, &frame->uc.uc_flags);
>  	err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]);
> -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +
>  	if (MSR_TM_ACTIVE(msr)) {
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  		/* The ucontext_t passed to userland points to the second
>  		 * ucontext_t (for transactional state) with its uc_link ptr.
>  		 */
> @@ -847,9 +846,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
>  					    tsk, ksig->sig, NULL,
>  					    (unsigned long)ksig->ka.sa.sa_handler,
>  					    msr);
> -	} else
>  #endif
> -	{
> +	} else {
>  		err |= __put_user(0, &frame->uc.uc_link);
>  		prepare_setup_sigcontext(tsk, 1);
>  		err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,

That seems reasonable to me.

Kind regards,
Daniel


^ permalink raw reply related

* Re: [PATCH v5 06/10] powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
From: Daniel Axtens @ 2021-02-12  5:41 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev
In-Reply-To: <20210203184323.20792-7-cmr@codefail.de>

Hi Chris,

> Previously setup_sigcontext() performed a costly KUAP switch on every
> uaccess operation. These repeated uaccess switches cause a significant
> drop in signal handling performance.
>
> Rewrite setup_sigcontext() to assume that a userspace write access window
> is open. Replace all uaccess functions with their 'unsafe' versions
> which avoid the repeated uaccess switches.

Just to clarify the commit message a bit: you're also changing the
callers of the old safe versions to first open the window, then call the
unsafe variants, then close the window again.

> Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> ---
>  arch/powerpc/kernel/signal_64.c | 70 ++++++++++++++++++++-------------
>  1 file changed, 43 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 8e1d804ce552..4248e4489ff1 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -101,9 +101,13 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re
>   * Set up the sigcontext for the signal frame.
>   */
>  
> -static long setup_sigcontext(struct sigcontext __user *sc,
> -		struct task_struct *tsk, int signr, sigset_t *set,
> -		unsigned long handler, int ctx_has_vsx_region)
> +#define unsafe_setup_sigcontext(sc, tsk, signr, set, handler,		\
> +				ctx_has_vsx_region, e)			\
> +	unsafe_op_wrap(__unsafe_setup_sigcontext(sc, tsk, signr, set,	\
> +			handler, ctx_has_vsx_region), e)
> +static long notrace __unsafe_setup_sigcontext(struct sigcontext __user *sc,
> +					struct task_struct *tsk, int signr, sigset_t *set,
> +					unsigned long handler, int ctx_has_vsx_region)
>  {
>  	/* When CONFIG_ALTIVEC is set, we _always_ setup v_regs even if the
>  	 * process never used altivec yet (MSR_VEC is zero in pt_regs of
> @@ -118,20 +122,19 @@ static long setup_sigcontext(struct sigcontext __user *sc,
>  #endif
>  	struct pt_regs *regs = tsk->thread.regs;
>  	unsigned long msr = regs->msr;
> -	long err = 0;
>  	/* Force usr to alway see softe as 1 (interrupts enabled) */
>  	unsigned long softe = 0x1;
>  
>  	BUG_ON(tsk != current);
>  
>  #ifdef CONFIG_ALTIVEC
> -	err |= __put_user(v_regs, &sc->v_regs);
> +	unsafe_put_user(v_regs, &sc->v_regs, efault_out);
>  
>  	/* save altivec registers */
>  	if (tsk->thread.used_vr) {
>  		/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
> -		err |= __copy_to_user(v_regs, &tsk->thread.vr_state,
> -				      33 * sizeof(vector128));
> +		unsafe_copy_to_user(v_regs, &tsk->thread.vr_state,
> +				    33 * sizeof(vector128), efault_out);
>  		/* set MSR_VEC in the MSR value in the frame to indicate that sc->v_reg)
>  		 * contains valid data.
>  		 */
> @@ -140,12 +143,12 @@ static long setup_sigcontext(struct sigcontext __user *sc,
>  	/* We always copy to/from vrsave, it's 0 if we don't have or don't
>  	 * use altivec.
>  	 */
> -	err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
> +	unsafe_put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33], efault_out);
>  #else /* CONFIG_ALTIVEC */
> -	err |= __put_user(0, &sc->v_regs);
> +	unsafe_put_user(0, &sc->v_regs, efault_out);
>  #endif /* CONFIG_ALTIVEC */
>  	/* copy fpr regs and fpscr */
> -	err |= copy_fpr_to_user(&sc->fp_regs, tsk);
> +	unsafe_copy_fpr_to_user(&sc->fp_regs, tsk, efault_out);
>  
>  	/*
>  	 * Clear the MSR VSX bit to indicate there is no valid state attached
> @@ -160,24 +163,27 @@ static long setup_sigcontext(struct sigcontext __user *sc,
>  	 */
>  	if (tsk->thread.used_vsr && ctx_has_vsx_region) {
>  		v_regs += ELF_NVRREG;
> -		err |= copy_vsx_to_user(v_regs, tsk);
> +		unsafe_copy_vsx_to_user(v_regs, tsk, efault_out);
>  		/* set MSR_VSX in the MSR value in the frame to
>  		 * indicate that sc->vs_reg) contains valid data.
>  		 */
>  		msr |= MSR_VSX;
>  	}
>  #endif /* CONFIG_VSX */
> -	err |= __put_user(&sc->gp_regs, &sc->regs);
> +	unsafe_put_user(&sc->gp_regs, &sc->regs, efault_out);
>  	WARN_ON(!FULL_REGS(regs));
> -	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
> -	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
> -	err |= __put_user(softe, &sc->gp_regs[PT_SOFTE]);
> -	err |= __put_user(signr, &sc->signal);
> -	err |= __put_user(handler, &sc->handler);
> +	unsafe_copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE, efault_out);
> +	unsafe_put_user(msr, &sc->gp_regs[PT_MSR], efault_out);
> +	unsafe_put_user(softe, &sc->gp_regs[PT_SOFTE], efault_out);
> +	unsafe_put_user(signr, &sc->signal, efault_out);
> +	unsafe_put_user(handler, &sc->handler, efault_out);
>  	if (set != NULL)
> -		err |=  __put_user(set->sig[0], &sc->oldmask);
> +		unsafe_put_user(set->sig[0], &sc->oldmask, efault_out);
>  
> -	return err;
> +	return 0;
> +
> +efault_out:
> +	return -EFAULT;
>  }
>  
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> @@ -664,12 +670,15 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>  
>  	if (old_ctx != NULL) {
>  		prepare_setup_sigcontext(current, ctx_has_vsx_region);
> -		if (!access_ok(old_ctx, ctx_size)
> -		    || setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0,
> -					ctx_has_vsx_region)
> -		    || __copy_to_user(&old_ctx->uc_sigmask,
> -				      &current->blocked, sizeof(sigset_t)))
> +		if (!user_write_access_begin(old_ctx, ctx_size))
>  			return -EFAULT;

I notice we get rid of access_ok, but that's fine because
user_write_access_begin includes access_ok since Linus decided that was
a good idea.

> +
> +		unsafe_setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL,
> +					0, ctx_has_vsx_region, efault_out);
> +		unsafe_copy_to_user(&old_ctx->uc_sigmask, &current->blocked,
> +				    sizeof(sigset_t), efault_out);
> +
> +		user_write_access_end();
>  	}
>  	if (new_ctx == NULL)
>  		return 0;
> @@ -698,6 +707,10 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>  	/* This returns like rt_sigreturn */
>  	set_thread_flag(TIF_RESTOREALL);
>  	return 0;
> +
> +efault_out:
> +	user_write_access_end();
> +	return -EFAULT;
>  }
>  
>  
> @@ -850,9 +863,12 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
>  	} else {
>  		err |= __put_user(0, &frame->uc.uc_link);
>  		prepare_setup_sigcontext(tsk, 1);
> -		err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
> -					NULL, (unsigned long)ksig->ka.sa.sa_handler,
> -					1);
> +		if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
> +			return -EFAULT;

Here you're opening a window for all of `frame`...

> +		err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk,
... but here you're only passing in frame->uc.uc_mcontext for writing.

ISTR that the underlying AMR switch is fully on / fully off so I don't
think it really matters, but in this case should you be calling
user_write_access_begin() with &frame->uc.uc_mcontext and the size of
that?

> +						ksig->sig, NULL,
> +						(unsigned long)ksig->ka.sa.sa_handler, 1);
> +		user_write_access_end();
>  	}
>  	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
>  	if (err)


Apart from the size thing, everything looks good to me. I checked that
all the things you've changed from safe to unsafe pass the same
parameters, and they all looked good to me.

With those caveats,
 Reviewed-by: Daniel Axtens <dja@axtens.net>

Kind regards,
Daniel

^ permalink raw reply

* Re: [PATCH v2 0/2] of: of_device.h cleanups
From: Greg Kroah-Hartman @ 2021-02-12  7:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: cocci, devicetree, Michal Marek, Rafael J. Wysocki, linuxppc-dev,
	Nicolas Palix, Patrice Chotard, linux-kernel, Julia Lawall,
	Gilles Muller, linux-usb, Paul Mackerras, netdev, Jakub Kicinski,
	Frank Rowand, David S. Miller, linux-arm-kernel, Felipe Balbi
In-Reply-To: <20210211232745.1498137-1-robh@kernel.org>

On Thu, Feb 11, 2021 at 05:27:43PM -0600, Rob Herring wrote:
> This is a couple of cleanups for of_device.h. They fell out from my
> attempt at decoupling of_device.h and of_platform.h which is a mess
> and I haven't finished, but there's no reason to wait on these.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* [PATCH for 5.10] powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
From: Christophe Leroy @ 2021-02-12  8:57 UTC (permalink / raw)
  To: fedora.dm0, stable; +Cc: linuxppc-dev, linux-kernel

This is backport of 3642eb21256a ("powerpc/32: Preserve cr1 in
exception prolog stack check to fix build error") for kernel 5.10

It fixes the build failure on v5.10 reported by kernel test robot
and by David Michael.

This fix is not in Linux tree yet, it is in next branch in powerpc tree.

(cherry picked from commit 3642eb21256a317ac14e9ed560242c6d20cf06d9)

THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1
Maximum PAGE_SHIFT is 18 for 256k pages so
THREAD_ALIGN_SHIFT is 19 at the maximum.

No need to clobber cr1, it can be preserved when moving r1
into CR when we check stack overflow.

This reduces the number of instructions in Machine Check Exception
prolog and fixes a build failure reported by the kernel test robot
on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That
build failure is due to too many instructions in the prolog hence
not fitting between 0x200 and 0x300. Allthough the problem doesn't
show up in mainline, it is still worth the change.

Fixes: 98bf2d3f4970 ("powerpc/32s: Fix RTAS machine check with VMAP stack")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.1612768623.git.christophe.leroy@csgroup.eu
---
 arch/powerpc/kernel/head_32.h        | 2 +-
 arch/powerpc/kernel/head_book3s_32.S | 6 ------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index c88e66adecb5..fef0b34a77c9 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -56,7 +56,7 @@
 1:
 	tophys_novmstack r11, r11
 #ifdef CONFIG_VMAP_STACK
-	mtcrf	0x7f, r1
+	mtcrf	0x3f, r1
 	bt	32 - THREAD_ALIGN_SHIFT, stack_overflow
 #endif
 .endm
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index d66da35f2e8d..2729d8fa6e77 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -280,12 +280,6 @@ MachineCheck:
 7:	EXCEPTION_PROLOG_2
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_CHRP
-#ifdef CONFIG_VMAP_STACK
-	mfspr	r4, SPRN_SPRG_THREAD
-	tovirt(r4, r4)
-	lwz	r4, RTAS_SP(r4)
-	cmpwi	cr1, r4, 0
-#endif
 	beq	cr1, machine_check_tramp
 	twi	31, 0, 0
 #else
-- 
2.25.0


^ permalink raw reply related

* Re: linux-next: manual merge of the spi tree with the powerpc tree
From: Mark Brown @ 2021-02-12 12:27 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20210212152755.5c82563a@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 237 bytes --]

On Fri, Feb 12, 2021 at 03:31:42PM +1100, Stephen Rothwell wrote:

> BTW Mark: the author's address in 258ea99fe25a uses a non existent domain :-(

Ugh, I think that's something gone wrong with b4 :(  A bit late now to
try to fix it up.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Rob Herring @ 2021-02-12 14:38 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Masahiro Yamada,
	James Morris, AKASHI, Takahiro, linux-arm-kernel, Catalin Marinas,
	Serge E. Hallyn, devicetree, Pavel Tatashin, Will Deacon,
	Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <8a3aa3d2-2eba-549a-9970-a2b0fe3586c9@linux.microsoft.com>

On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
> >
> > There's actually a complication that I just noticed and needs to be
> > addressed. More below.
> >
>
> <...>
>
> >> +
> >> +/*
> >> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
> >> + *
> >> + * @image:          kexec image being loaded.
> >> + * @initrd_load_addr:       Address where the next initrd will be loaded.
> >> + * @initrd_len:             Size of the next initrd, or 0 if there will be none.
> >> + * @cmdline:                Command line for the next kernel, or NULL if there will
> >> + *                  be none.
> >> + *
> >> + * Return: fdt on success, or NULL errno on error.
> >> + */
> >> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
> >> +                               unsigned long initrd_load_addr,
> >> +                               unsigned long initrd_len,
> >> +                               const char *cmdline)
> >> +{
> >> +    void *fdt;
> >> +    int ret, chosen_node;
> >> +    const void *prop;
> >> +    unsigned long fdt_size;
> >> +
> >> +    fdt_size = fdt_totalsize(initial_boot_params) +
> >> +               (cmdline ? strlen(cmdline) : 0) +
> >> +               FDT_EXTRA_SPACE;
> >
> > Just adding 4 KB to initial_boot_params won't be enough for crash
> > kernels on ppc64. The current powerpc code doubles the size of
> > initial_boot_params (which is normally larger than 4 KB) and even that
> > isn't enough. A patch was added to powerpc/next today which uses a more
> > precise (but arch-specific) formula:
> >
> > https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
> >
> > So I believe we need a hook here where architectures can provide their
> > own specific calculation for the size of the fdt. Perhaps a weakly
> > defined function providing a default implementation which an
> > arch-specific file can override (a la arch_kexec_kernel_image_load())?
> >
> > Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
> > function from the patch I linked above.
> >
>
> Do you think it'd better to add "fdt_size" parameter to
> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
> desired FDT buffer size?

Yes, I guess so. But please define the param as extra size, not total
size. The kernel command line size addition can be in the common code.

The above change is also going to conflict, so I think this may have
to wait. Or I'll take the common and arm bits and powerpc can be
converted next cycle (or after the merge window).

Rob

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/uaccess: Move might_fault() into user_access_begin()
From: Christophe Leroy @ 2021-02-12 16:10 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: aik
In-Reply-To: <20210208135717.2618798-2-mpe@ellerman.id.au>



Le 08/02/2021 à 14:57, Michael Ellerman a écrit :
> We have a might_fault() check in __unsafe_put_user_goto(), but that is
> dangerous as it potentially calls lots of code while user access is
> enabled.
> 
> It also triggers the check Alexey added to the irq restore path to
> catch cases like that:
> 
>    WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 arch_local_irq_restore+0x160/0x190
>    NIP arch_local_irq_restore+0x160/0x190
>    LR  lock_is_held_type+0x140/0x200
>    Call Trace:
>      0xc00000007f392ff8 (unreliable)
>      ___might_sleep+0x180/0x320
>      __might_fault+0x50/0xe0
>      filldir64+0x2d0/0x5d0
>      call_filldir+0xc8/0x180
>      ext4_readdir+0x948/0xb40
>      iterate_dir+0x1ec/0x240
>      sys_getdents64+0x80/0x290
>      system_call_exception+0x160/0x280
>      system_call_common+0xf0/0x27c
> 
> So remove the might fault check from unsafe_put_user().
> 
> Any call to unsafe_put_user() must be inside a region that's had user
> access enabled with user_access_begin(), so move the might_fault() in
> there. That also allows us to drop the is_kernel_addr() test, because
> there should be no code using user_access_begin() in order to access a
> kernel address.

x86 and mips only have might_fault() on get_user() and put_user(), neither on __get_user() nor on 
__put_user() nor on the unsafe alternative.

When have might_fault() in __get_user_nocheck() that is used by __get_user() and 
__get_user_allowed() ie by unsafe_get_user().

Shoudln't those be dropped as well ?

Christophe

> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>   arch/powerpc/include/asm/uaccess.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 70347ee34c94..71640eca7341 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -214,8 +214,6 @@ do {								\
>   #define __unsafe_put_user_goto(x, ptr, size, label)		\
>   do {								\
>   	__typeof__(*(ptr)) __user *__pu_addr = (ptr);		\
> -	if (!is_kernel_addr((unsigned long)__pu_addr))		\
> -		might_fault();					\
>   	__chk_user_ptr(ptr);					\
>   	__put_user_size_goto((x), __pu_addr, (size), label);	\
>   } while (0)
> @@ -494,6 +492,8 @@ extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
>   
>   static __must_check inline bool user_access_begin(const void __user *ptr, size_t len)
>   {
> +	might_fault();
> +
>   	if (unlikely(!access_ok(ptr, len)))
>   		return false;
>   	allow_read_write_user((void __user *)ptr, ptr, len);
> 

^ permalink raw reply

* [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
From: Greg Kurz @ 2021-02-12 16:41 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: lvivier, Greg Kurz, stable, linux-kernel, Cédric Le Goater,
	linuxppc-dev

Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per-design with managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq :
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue
  block driver, that was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeed with CPU#0)

Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach : fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.

Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: lvivier@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index b3ac2455faad..29d04b83288d 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}

-		virq = irq_create_mapping_affinity(NULL, hwirq,
-						   entry->affinity);
+		/*
+		 * Depending on the number of online CPUs in the original
+		 * kernel, it is likely for CPU #0 to be offline in a kdump
+		 * kernel. The associated IRQs in the affinity mappings
+		 * provided by irq_create_affinity_masks() are thus not
+		 * started by irq_startup(), as per-design for managed IRQs.
+		 * This can be a problem with multi-queue block devices driven
+		 * by blk-mq : such a non-started IRQ is very likely paired
+		 * with the single queue enforced by blk-mq during kdump (see
+		 * blk_mq_alloc_tag_set()). This causes the device to remain
+		 * silent and likely hangs the guest at some point.
+		 *
+		 * We don't really care for fine-grained affinity when doing
+		 * kdump actually : simply ignore the pre-computed affinity
+		 * masks in this case and let the default mask with all CPUs
+		 * be used when creating the IRQ mappings.
+		 */
+		if (is_kdump_kernel())
+			virq = irq_create_mapping(NULL, hwirq);
+		else
+			virq = irq_create_mapping_affinity(NULL, hwirq,
+							   entry->affinity);

 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.26.2

^ permalink raw reply related

* Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
From: Laurent Vivier @ 2021-02-12 16:51 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, stable, Cédric Le Goater
In-Reply-To: <20210212164132.821332-1-groug@kaod.org>

On 12/02/2021 17:41, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
> 
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
> 
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
> 
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeed with CPU#0)
> 
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
> 
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
> Cc: lvivier@redhat.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>  
> -		virq = irq_create_mapping_affinity(NULL, hwirq,
> -						   entry->affinity);
> +		/*
> +		 * Depending on the number of online CPUs in the original
> +		 * kernel, it is likely for CPU #0 to be offline in a kdump
> +		 * kernel. The associated IRQs in the affinity mappings
> +		 * provided by irq_create_affinity_masks() are thus not
> +		 * started by irq_startup(), as per-design for managed IRQs.
> +		 * This can be a problem with multi-queue block devices driven
> +		 * by blk-mq : such a non-started IRQ is very likely paired
> +		 * with the single queue enforced by blk-mq during kdump (see
> +		 * blk_mq_alloc_tag_set()). This causes the device to remain
> +		 * silent and likely hangs the guest at some point.
> +		 *
> +		 * We don't really care for fine-grained affinity when doing
> +		 * kdump actually : simply ignore the pre-computed affinity
> +		 * masks in this case and let the default mask with all CPUs
> +		 * be used when creating the IRQ mappings.
> +		 */
> +		if (is_kdump_kernel())
> +			virq = irq_create_mapping(NULL, hwirq);
> +		else
> +			virq = irq_create_mapping_affinity(NULL, hwirq,
> +							   entry->affinity);
>  
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> 

Reviewed-by: Laurent Vivier <lvivier@redhat.com>


^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Lakshmi Ramasubramanian @ 2021-02-12 17:19 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Masahiro Yamada,
	James Morris, AKASHI, Takahiro, linux-arm-kernel, Catalin Marinas,
	Serge E. Hallyn, devicetree, Pavel Tatashin, Will Deacon,
	Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <CAL_JsqJ3sDzjsJXtb6EzE77BL+PhUxDJYUngLTqcm0popd7Ajw@mail.gmail.com>

On 2/12/21 6:38 AM, Rob Herring wrote:
> On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
>>
>> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
>>>
>>> There's actually a complication that I just noticed and needs to be
>>> addressed. More below.
>>>
>>
>> <...>
>>
>>>> +
>>>> +/*
>>>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
>>>> + *
>>>> + * @image:          kexec image being loaded.
>>>> + * @initrd_load_addr:       Address where the next initrd will be loaded.
>>>> + * @initrd_len:             Size of the next initrd, or 0 if there will be none.
>>>> + * @cmdline:                Command line for the next kernel, or NULL if there will
>>>> + *                  be none.
>>>> + *
>>>> + * Return: fdt on success, or NULL errno on error.
>>>> + */
>>>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>>>> +                               unsigned long initrd_load_addr,
>>>> +                               unsigned long initrd_len,
>>>> +                               const char *cmdline)
>>>> +{
>>>> +    void *fdt;
>>>> +    int ret, chosen_node;
>>>> +    const void *prop;
>>>> +    unsigned long fdt_size;
>>>> +
>>>> +    fdt_size = fdt_totalsize(initial_boot_params) +
>>>> +               (cmdline ? strlen(cmdline) : 0) +
>>>> +               FDT_EXTRA_SPACE;
>>>
>>> Just adding 4 KB to initial_boot_params won't be enough for crash
>>> kernels on ppc64. The current powerpc code doubles the size of
>>> initial_boot_params (which is normally larger than 4 KB) and even that
>>> isn't enough. A patch was added to powerpc/next today which uses a more
>>> precise (but arch-specific) formula:
>>>
>>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
>>>
>>> So I believe we need a hook here where architectures can provide their
>>> own specific calculation for the size of the fdt. Perhaps a weakly
>>> defined function providing a default implementation which an
>>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
>>>
>>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
>>> function from the patch I linked above.
>>>
>>
>> Do you think it'd better to add "fdt_size" parameter to
>> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
>> desired FDT buffer size?
> 
> Yes, I guess so. But please define the param as extra size, not total
> size. The kernel command line size addition can be in the common code.

Will do. Just to clarify -

The common code will do:

fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size

The caller will pass "extra_fdt_size"
ARM64 => 4KB
PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when 
the patch Thiago had referred to is merged.

> 
> The above change is also going to conflict, so I think this may have
> to wait. Or I'll take the common and arm bits and powerpc can be
> converted next cycle (or after the merge window).
> 

thanks.

  -lakshmi



^ permalink raw reply

* Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
From: Cédric Le Goater @ 2021-02-12 17:27 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman; +Cc: lvivier, linuxppc-dev, linux-kernel, stable
In-Reply-To: <20210212164132.821332-1-groug@kaod.org>

On 2/12/21 5:41 PM, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
> 
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
> 
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
> 
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeed with CPU#0)
> 
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
> 
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
> Cc: lvivier@redhat.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Greg Kurz <groug@kaod.org>


Reviewed-by: Cédric Le Goater <clg@kaod.org>

Thanks for tracking this issue. 

This layer needs a rework. Patches adding a MSI domain should be ready 
in a couple of releases. Hopefully. 

C. 

> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>  
> -		virq = irq_create_mapping_affinity(NULL, hwirq,
> -						   entry->affinity);
> +		/*
> +		 * Depending on the number of online CPUs in the original
> +		 * kernel, it is likely for CPU #0 to be offline in a kdump
> +		 * kernel. The associated IRQs in the affinity mappings
> +		 * provided by irq_create_affinity_masks() are thus not
> +		 * started by irq_startup(), as per-design for managed IRQs.
> +		 * This can be a problem with multi-queue block devices driven
> +		 * by blk-mq : such a non-started IRQ is very likely paired
> +		 * with the single queue enforced by blk-mq during kdump (see
> +		 * blk_mq_alloc_tag_set()). This causes the device to remain
> +		 * silent and likely hangs the guest at some point.
> +		 *
> +		 * We don't really care for fine-grained affinity when doing
> +		 * kdump actually : simply ignore the pre-computed affinity
> +		 * masks in this case and let the default mask with all CPUs
> +		 * be used when creating the IRQ mappings.
> +		 */
> +		if (is_kdump_kernel())
> +			virq = irq_create_mapping(NULL, hwirq);
> +		else
> +			virq = irq_create_mapping_affinity(NULL, hwirq,
> +							   entry->affinity);
>  
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> 


^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Rob Herring @ 2021-02-12 18:24 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Masahiro Yamada,
	James Morris, AKASHI, Takahiro, linux-arm-kernel, Catalin Marinas,
	Serge E. Hallyn, devicetree, Pavel Tatashin, Will Deacon,
	Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <55685b61-dac0-2f24-f74a-939acf74a4f2@linux.microsoft.com>

On Fri, Feb 12, 2021 at 11:19 AM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On 2/12/21 6:38 AM, Rob Herring wrote:
> > On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
> > <nramas@linux.microsoft.com> wrote:
> >>
> >> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
> >>>
> >>> There's actually a complication that I just noticed and needs to be
> >>> addressed. More below.
> >>>
> >>
> >> <...>
> >>
> >>>> +
> >>>> +/*
> >>>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
> >>>> + *
> >>>> + * @image:          kexec image being loaded.
> >>>> + * @initrd_load_addr:       Address where the next initrd will be loaded.
> >>>> + * @initrd_len:             Size of the next initrd, or 0 if there will be none.
> >>>> + * @cmdline:                Command line for the next kernel, or NULL if there will
> >>>> + *                  be none.
> >>>> + *
> >>>> + * Return: fdt on success, or NULL errno on error.
> >>>> + */
> >>>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
> >>>> +                               unsigned long initrd_load_addr,
> >>>> +                               unsigned long initrd_len,
> >>>> +                               const char *cmdline)
> >>>> +{
> >>>> +    void *fdt;
> >>>> +    int ret, chosen_node;
> >>>> +    const void *prop;
> >>>> +    unsigned long fdt_size;
> >>>> +
> >>>> +    fdt_size = fdt_totalsize(initial_boot_params) +
> >>>> +               (cmdline ? strlen(cmdline) : 0) +
> >>>> +               FDT_EXTRA_SPACE;
> >>>
> >>> Just adding 4 KB to initial_boot_params won't be enough for crash
> >>> kernels on ppc64. The current powerpc code doubles the size of
> >>> initial_boot_params (which is normally larger than 4 KB) and even that
> >>> isn't enough. A patch was added to powerpc/next today which uses a more
> >>> precise (but arch-specific) formula:
> >>>
> >>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
> >>>
> >>> So I believe we need a hook here where architectures can provide their
> >>> own specific calculation for the size of the fdt. Perhaps a weakly
> >>> defined function providing a default implementation which an
> >>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
> >>>
> >>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
> >>> function from the patch I linked above.
> >>>
> >>
> >> Do you think it'd better to add "fdt_size" parameter to
> >> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
> >> desired FDT buffer size?
> >
> > Yes, I guess so. But please define the param as extra size, not total
> > size. The kernel command line size addition can be in the common code.
>
> Will do. Just to clarify -
>
> The common code will do:
>
> fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size
>
> The caller will pass "extra_fdt_size"
> ARM64 => 4KB
> PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when
> the patch Thiago had referred to is merged.

Yes, I'd leave the 4KB in there by default and arm64 use 0.

Rob

^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Lakshmi Ramasubramanian @ 2021-02-12 18:27 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Masahiro Yamada,
	James Morris, AKASHI, Takahiro, linux-arm-kernel, Catalin Marinas,
	Serge E. Hallyn, devicetree, Pavel Tatashin, Will Deacon,
	Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <CAL_JsqKDCgtJngxqMCRdC9evEQpHnryEaMvfgYEh0Mcto6dLHA@mail.gmail.com>

On 2/12/21 10:24 AM, Rob Herring wrote:
> On Fri, Feb 12, 2021 at 11:19 AM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
>>
>> On 2/12/21 6:38 AM, Rob Herring wrote:
>>> On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
>>> <nramas@linux.microsoft.com> wrote:
>>>>
>>>> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
>>>>>
>>>>> There's actually a complication that I just noticed and needs to be
>>>>> addressed. More below.
>>>>>
>>>>
>>>> <...>
>>>>
>>>>>> +
>>>>>> +/*
>>>>>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
>>>>>> + *
>>>>>> + * @image:          kexec image being loaded.
>>>>>> + * @initrd_load_addr:       Address where the next initrd will be loaded.
>>>>>> + * @initrd_len:             Size of the next initrd, or 0 if there will be none.
>>>>>> + * @cmdline:                Command line for the next kernel, or NULL if there will
>>>>>> + *                  be none.
>>>>>> + *
>>>>>> + * Return: fdt on success, or NULL errno on error.
>>>>>> + */
>>>>>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>>>>>> +                               unsigned long initrd_load_addr,
>>>>>> +                               unsigned long initrd_len,
>>>>>> +                               const char *cmdline)
>>>>>> +{
>>>>>> +    void *fdt;
>>>>>> +    int ret, chosen_node;
>>>>>> +    const void *prop;
>>>>>> +    unsigned long fdt_size;
>>>>>> +
>>>>>> +    fdt_size = fdt_totalsize(initial_boot_params) +
>>>>>> +               (cmdline ? strlen(cmdline) : 0) +
>>>>>> +               FDT_EXTRA_SPACE;
>>>>>
>>>>> Just adding 4 KB to initial_boot_params won't be enough for crash
>>>>> kernels on ppc64. The current powerpc code doubles the size of
>>>>> initial_boot_params (which is normally larger than 4 KB) and even that
>>>>> isn't enough. A patch was added to powerpc/next today which uses a more
>>>>> precise (but arch-specific) formula:
>>>>>
>>>>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
>>>>>
>>>>> So I believe we need a hook here where architectures can provide their
>>>>> own specific calculation for the size of the fdt. Perhaps a weakly
>>>>> defined function providing a default implementation which an
>>>>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
>>>>>
>>>>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
>>>>> function from the patch I linked above.
>>>>>
>>>>
>>>> Do you think it'd better to add "fdt_size" parameter to
>>>> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
>>>> desired FDT buffer size?
>>>
>>> Yes, I guess so. But please define the param as extra size, not total
>>> size. The kernel command line size addition can be in the common code.
>>
>> Will do. Just to clarify -
>>
>> The common code will do:
>>
>> fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size
>>
>> The caller will pass "extra_fdt_size"
>> ARM64 => 4KB
>> PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when
>> the patch Thiago had referred to is merged.
> 
> Yes, I'd leave the 4KB in there by default and arm64 use 0.
> 

Sounds good.

common:
fdt_totalsize(initial_boot_params) + strlen(cmdline) + 0x1000 + extra

arm64 => 0 for extra
ppc => fdt_totalsize(initial_boot_params) for extra.

  -lakshmi


^ permalink raw reply

* Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Thiago Jung Bauermann @ 2021-02-12 19:39 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Rob Herring,
	Masahiro Yamada, James Morris, AKASHI, Takahiro, linux-arm-kernel,
	Catalin Marinas, Serge E. Hallyn, devicetree, Pavel Tatashin,
	Will Deacon, Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev
In-Reply-To: <11aff288-feaa-8e43-fcda-12bc12fbf4cf@linux.microsoft.com>


Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:

> On 2/12/21 10:24 AM, Rob Herring wrote:
>> On Fri, Feb 12, 2021 at 11:19 AM Lakshmi Ramasubramanian
>> <nramas@linux.microsoft.com> wrote:
>>>
>>> On 2/12/21 6:38 AM, Rob Herring wrote:
>>>> On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
>>>> <nramas@linux.microsoft.com> wrote:
>>>>>
>>>>> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
>>>>>>
>>>>>> There's actually a complication that I just noticed and needs to be
>>>>>> addressed. More below.
>>>>>>
>>>>>
>>>>> <...>
>>>>>
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
>>>>>>> + *
>>>>>>> + * @image:          kexec image being loaded.
>>>>>>> + * @initrd_load_addr:       Address where the next initrd will be loaded.
>>>>>>> + * @initrd_len:             Size of the next initrd, or 0 if there will be none.
>>>>>>> + * @cmdline:                Command line for the next kernel, or NULL if there will
>>>>>>> + *                  be none.
>>>>>>> + *
>>>>>>> + * Return: fdt on success, or NULL errno on error.
>>>>>>> + */
>>>>>>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>>>>>>> +                               unsigned long initrd_load_addr,
>>>>>>> +                               unsigned long initrd_len,
>>>>>>> +                               const char *cmdline)
>>>>>>> +{
>>>>>>> +    void *fdt;
>>>>>>> +    int ret, chosen_node;
>>>>>>> +    const void *prop;
>>>>>>> +    unsigned long fdt_size;
>>>>>>> +
>>>>>>> +    fdt_size = fdt_totalsize(initial_boot_params) +
>>>>>>> +               (cmdline ? strlen(cmdline) : 0) +
>>>>>>> +               FDT_EXTRA_SPACE;
>>>>>>
>>>>>> Just adding 4 KB to initial_boot_params won't be enough for crash
>>>>>> kernels on ppc64. The current powerpc code doubles the size of
>>>>>> initial_boot_params (which is normally larger than 4 KB) and even that
>>>>>> isn't enough. A patch was added to powerpc/next today which uses a more
>>>>>> precise (but arch-specific) formula:
>>>>>>
>>>>>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
>>>>>>
>>>>>> So I believe we need a hook here where architectures can provide their
>>>>>> own specific calculation for the size of the fdt. Perhaps a weakly
>>>>>> defined function providing a default implementation which an
>>>>>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
>>>>>>
>>>>>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
>>>>>> function from the patch I linked above.
>>>>>>
>>>>>
>>>>> Do you think it'd better to add "fdt_size" parameter to
>>>>> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
>>>>> desired FDT buffer size?
>>>>
>>>> Yes, I guess so. But please define the param as extra size, not total
>>>> size. The kernel command line size addition can be in the common code.
>>>
>>> Will do. Just to clarify -
>>>
>>> The common code will do:
>>>
>>> fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size
>>>
>>> The caller will pass "extra_fdt_size"
>>> ARM64 => 4KB
>>> PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when
>>> the patch Thiago had referred to is merged.
>> Yes, I'd leave the 4KB in there by default and arm64 use 0.
>> 
>
> Sounds good.
>
> common:
> fdt_totalsize(initial_boot_params) + strlen(cmdline) + 0x1000 + extra
>
> arm64 => 0 for extra
> ppc => fdt_totalsize(initial_boot_params) for extra.

Looks good to me.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox