* [PATCH v17 01/10] powerpc: Rename kexec elfcorehdr_addr to elf_headers_mem
From: Lakshmi Ramasubramanian @ 2021-02-09 18:21 UTC (permalink / raw)
To: zohar, bauerman, robh, takahiro.akashi, gregkh, will, joe,
catalin.marinas, mpe
Cc: mark.rutland, tao.li, paulus, vincenzo.frascino, frowand.list,
sashal, masahiroy, jmorris, allison, serge, devicetree,
pasha.tatashin, prsriva, hsinyi, linux-arm-kernel,
christophe.leroy, mbrugger, balajib, dmitry.kasatkin,
linux-kernel, james.morse, linux-integrity, linuxppc-dev
In-Reply-To: <20210209182200.30606-1-nramas@linux.microsoft.com>
From: Rob Herring <robh@kernel.org>
The architecture specific field, elfcorehdr_addr in struct kimage_arch,
that holds the address of the buffer in memory for ELF core header for
powerpc has a different name than the one used for arm64. This makes
it hard to have a common code for setting up the device tree for
kexec system call.
Rename elfcorehdr_addr to elf_headers_mem to align with arm64 name so
common code can use it.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
---
arch/powerpc/include/asm/kexec.h | 2 +-
arch/powerpc/kexec/file_load.c | 4 ++--
arch/powerpc/kexec/file_load_64.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 55d6ede30c19..dbf09d2f36d0 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -108,7 +108,7 @@ struct kimage_arch {
unsigned long backup_start;
void *backup_buf;
- unsigned long elfcorehdr_addr;
+ unsigned long elf_headers_mem;
unsigned long elf_headers_sz;
void *elf_headers;
diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 9a232bc36c8f..e452b11df631 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -45,7 +45,7 @@ char *setup_kdump_cmdline(struct kimage *image, char *cmdline,
return NULL;
elfcorehdr_strlen = sprintf(cmdline_ptr, "elfcorehdr=0x%lx ",
- image->arch.elfcorehdr_addr);
+ image->arch.elf_headers_mem);
if (elfcorehdr_strlen + cmdline_len > COMMAND_LINE_SIZE) {
pr_err("Appending elfcorehdr=<addr> exceeds cmdline size\n");
@@ -263,7 +263,7 @@ int setup_new_fdt(const struct kimage *image, void *fdt,
* Avoid elfcorehdr from being stomped on in kdump kernel by
* setting up memory reserve map.
*/
- ret = fdt_add_mem_rsv(fdt, image->arch.elfcorehdr_addr,
+ ret = fdt_add_mem_rsv(fdt, image->arch.elf_headers_mem,
image->arch.elf_headers_sz);
if (ret) {
pr_err("Error reserving elfcorehdr memory: %s\n",
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index c69bcf9b547a..a05c19b3cc60 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -815,7 +815,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
goto out;
}
- image->arch.elfcorehdr_addr = kbuf->mem;
+ image->arch.elf_headers_mem = kbuf->mem;
image->arch.elf_headers_sz = headers_sz;
image->arch.elf_headers = headers;
out:
@@ -851,7 +851,7 @@ int load_crashdump_segments_ppc64(struct kimage *image,
return ret;
}
pr_debug("Loaded elf core header at 0x%lx, bufsz=0x%lx memsz=0x%lx\n",
- image->arch.elfcorehdr_addr, kbuf->bufsz, kbuf->memsz);
+ image->arch.elf_headers_mem, kbuf->bufsz, kbuf->memsz);
return 0;
}
--
2.30.0
^ permalink raw reply related
* [PATCH v17 02/10] of: Add a common kexec FDT setup function
From: Lakshmi Ramasubramanian @ 2021-02-09 18:21 UTC (permalink / raw)
To: zohar, bauerman, robh, takahiro.akashi, gregkh, will, joe,
catalin.marinas, mpe
Cc: mark.rutland, tao.li, paulus, vincenzo.frascino, frowand.list,
sashal, masahiroy, jmorris, allison, serge, devicetree,
pasha.tatashin, prsriva, hsinyi, linux-arm-kernel,
christophe.leroy, mbrugger, balajib, dmitry.kasatkin,
linux-kernel, james.morse, linux-integrity, linuxppc-dev
In-Reply-To: <20210209182200.30606-1-nramas@linux.microsoft.com>
From: Rob Herring <robh@kernel.org>
Both arm64 and powerpc do essentially the same FDT /chosen setup for
kexec. The differences are either omissions that arm64 should have
or additional properties that will be ignored. The setup code can be
combined and shared by both powerpc and arm64.
The differences relative to the arm64 version:
- If /chosen doesn't exist, it will be created (should never happen).
- Any old dtb and initrd reserved memory will be released.
- The new initrd and elfcorehdr are marked reserved.
- "linux,booted-from-kexec" is set.
The differences relative to the powerpc version:
- "kaslr-seed" and "rng-seed" may be set.
- "linux,elfcorehdr" is set.
- Any existing "linux,usable-memory-range" is removed.
Combine the code for setting up the /chosen node in the FDT and updating
the memory reservation for kexec, for powerpc and arm64, in
of_kexec_alloc_and_setup_fdt() and move it to "drivers/of/kexec.c".
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
---
drivers/of/Makefile | 6 ++
drivers/of/kexec.c | 258 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/of.h | 13 +++
3 files changed, 277 insertions(+)
create mode 100644 drivers/of/kexec.c
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index 6e1e5212f058..c13b982084a3 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -14,4 +14,10 @@ obj-$(CONFIG_OF_RESOLVE) += resolver.o
obj-$(CONFIG_OF_OVERLAY) += overlay.o
obj-$(CONFIG_OF_NUMA) += of_numa.o
+ifdef CONFIG_KEXEC_FILE
+ifdef CONFIG_OF_FLATTREE
+obj-y += kexec.o
+endif
+endif
+
obj-$(CONFIG_OF_UNITTEST) += unittest-data/
diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
new file mode 100644
index 000000000000..469e09613cdd
--- /dev/null
+++ b/drivers/of/kexec.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Arm Limited
+ *
+ * Based on arch/arm64/kernel/machine_kexec_file.c:
+ * Copyright (C) 2018 Linaro Limited
+ *
+ * And arch/powerpc/kexec/file_load.c:
+ * Copyright (C) 2016 IBM Corporation
+ */
+
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/libfdt.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/random.h>
+#include <linux/types.h>
+
+/* relevant device tree properties */
+#define FDT_PROP_KEXEC_ELFHDR "linux,elfcorehdr"
+#define FDT_PROP_MEM_RANGE "linux,usable-memory-range"
+#define FDT_PROP_INITRD_START "linux,initrd-start"
+#define FDT_PROP_INITRD_END "linux,initrd-end"
+#define FDT_PROP_BOOTARGS "bootargs"
+#define FDT_PROP_KASLR_SEED "kaslr-seed"
+#define FDT_PROP_RNG_SEED "rng-seed"
+#define RNG_SEED_SIZE 128
+
+/**
+ * fdt_find_and_del_mem_rsv - delete memory reservation with given address and size
+ *
+ * @fdt: Flattened device tree for the current kernel.
+ * @start: Starting address of the reserved memory.
+ * @size: Size of the reserved memory.
+ *
+ * Return: 0 on success, or negative errno on error.
+ */
+static int fdt_find_and_del_mem_rsv(void *fdt, unsigned long start, unsigned long size)
+{
+ int i, ret, num_rsvs = fdt_num_mem_rsv(fdt);
+
+ for (i = 0; i < num_rsvs; i++) {
+ u64 rsv_start, rsv_size;
+
+ ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
+ if (ret) {
+ pr_err("Malformed device tree.\n");
+ return -EINVAL;
+ }
+
+ if (rsv_start == start && rsv_size == size) {
+ ret = fdt_del_mem_rsv(fdt, i);
+ if (ret) {
+ pr_err("Error deleting device tree reservation.\n");
+ return -EINVAL;
+ }
+
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+/*
+ * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
+ *
+ * @image: kexec image being loaded.
+ * @initrd_load_addr: Address where the next initrd will be loaded.
+ * @initrd_len: Size of the next initrd, or 0 if there will be none.
+ * @cmdline: Command line for the next kernel, or NULL if there will
+ * be none.
+ *
+ * Return: fdt on success, or NULL errno on error.
+ */
+void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
+ unsigned long initrd_load_addr,
+ unsigned long initrd_len,
+ const char *cmdline)
+{
+ void *fdt;
+ int ret, chosen_node;
+ const void *prop;
+ unsigned long fdt_size;
+
+ fdt_size = fdt_totalsize(initial_boot_params) +
+ (cmdline ? strlen(cmdline) : 0) +
+ FDT_EXTRA_SPACE;
+
+ fdt = kvmalloc(fdt_size, GFP_KERNEL);
+ if (!fdt)
+ return NULL;
+
+ ret = fdt_open_into(initial_boot_params, fdt, fdt_size);
+ if (ret < 0) {
+ pr_err("Error %d setting up the new device tree.\n", ret);
+ goto out;
+ }
+
+ /* Remove memory reservation for the current device tree. */
+ ret = fdt_find_and_del_mem_rsv(fdt, __pa(initial_boot_params),
+ fdt_totalsize(initial_boot_params));
+ if (ret == -EINVAL) {
+ pr_err("Error removing memory reservation.\n");
+ goto out;
+ }
+
+ chosen_node = fdt_path_offset(fdt, "/chosen");
+ if (chosen_node == -FDT_ERR_NOTFOUND)
+ chosen_node = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"),
+ "chosen");
+ if (chosen_node < 0) {
+ ret = chosen_node;
+ goto out;
+ }
+
+ ret = fdt_delprop(fdt, chosen_node, FDT_PROP_KEXEC_ELFHDR);
+ if (ret && ret != -FDT_ERR_NOTFOUND)
+ goto out;
+ ret = fdt_delprop(fdt, chosen_node, FDT_PROP_MEM_RANGE);
+ if (ret && ret != -FDT_ERR_NOTFOUND)
+ goto out;
+
+ /* Did we boot using an initrd? */
+ prop = fdt_getprop(fdt, chosen_node, "linux,initrd-start", NULL);
+ if (prop) {
+ u64 tmp_start, tmp_end, tmp_size;
+
+ tmp_start = fdt64_to_cpu(*((const fdt64_t *) prop));
+
+ prop = fdt_getprop(fdt, chosen_node, "linux,initrd-end", NULL);
+ if (!prop) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ tmp_end = fdt64_to_cpu(*((const fdt64_t *) prop));
+
+ /*
+ * kexec reserves exact initrd size, while firmware may
+ * reserve a multiple of PAGE_SIZE, so check for both.
+ */
+ tmp_size = tmp_end - tmp_start;
+ ret = fdt_find_and_del_mem_rsv(fdt, tmp_start, tmp_size);
+ if (ret == -ENOENT)
+ ret = fdt_find_and_del_mem_rsv(fdt, tmp_start,
+ round_up(tmp_size, PAGE_SIZE));
+ if (ret == -EINVAL)
+ goto out;
+ }
+
+ /* add initrd-* */
+ if (initrd_load_addr) {
+ ret = fdt_setprop_u64(fdt, chosen_node, FDT_PROP_INITRD_START,
+ initrd_load_addr);
+ if (ret)
+ goto out;
+
+ ret = fdt_setprop_u64(fdt, chosen_node, FDT_PROP_INITRD_END,
+ initrd_load_addr + initrd_len);
+ if (ret)
+ goto out;
+
+ ret = fdt_add_mem_rsv(fdt, initrd_load_addr, initrd_len);
+ if (ret)
+ goto out;
+
+ } else {
+ ret = fdt_delprop(fdt, chosen_node, FDT_PROP_INITRD_START);
+ if (ret && (ret != -FDT_ERR_NOTFOUND))
+ goto out;
+
+ ret = fdt_delprop(fdt, chosen_node, FDT_PROP_INITRD_END);
+ if (ret && (ret != -FDT_ERR_NOTFOUND))
+ goto out;
+ }
+
+ if (image->type == KEXEC_TYPE_CRASH) {
+ /* add linux,elfcorehdr */
+ ret = fdt_appendprop_addrrange(fdt, 0, chosen_node,
+ FDT_PROP_KEXEC_ELFHDR,
+ image->arch.elf_headers_mem,
+ image->arch.elf_headers_sz);
+ if (ret)
+ goto out;
+
+ /*
+ * Avoid elfcorehdr from being stomped on in kdump kernel by
+ * setting up memory reserve map.
+ */
+ ret = fdt_add_mem_rsv(fdt, image->arch.elf_headers_mem,
+ image->arch.elf_headers_sz);
+ if (ret)
+ goto out;
+
+ /* add linux,usable-memory-range */
+ ret = fdt_appendprop_addrrange(fdt, 0, chosen_node,
+ FDT_PROP_MEM_RANGE,
+ crashk_res.start,
+ crashk_res.end - crashk_res.start + 1);
+ if (ret)
+ goto out;
+ }
+
+ /* add bootargs */
+ if (cmdline) {
+ ret = fdt_setprop_string(fdt, chosen_node, FDT_PROP_BOOTARGS, cmdline);
+ if (ret)
+ goto out;
+ } else {
+ ret = fdt_delprop(fdt, chosen_node, FDT_PROP_BOOTARGS);
+ if (ret && (ret != -FDT_ERR_NOTFOUND))
+ goto out;
+ }
+
+ /* add kaslr-seed */
+ ret = fdt_delprop(fdt, chosen_node, FDT_PROP_KASLR_SEED);
+ if (ret == -FDT_ERR_NOTFOUND)
+ ret = 0;
+ else if (ret)
+ goto out;
+
+ if (rng_is_initialized()) {
+ u64 seed = get_random_u64();
+
+ ret = fdt_setprop_u64(fdt, chosen_node, FDT_PROP_KASLR_SEED, seed);
+ if (ret)
+ goto out;
+ } else {
+ pr_notice("RNG is not initialised: omitting \"%s\" property\n",
+ FDT_PROP_KASLR_SEED);
+ }
+
+ /* add rng-seed */
+ if (rng_is_initialized()) {
+ void *rng_seed;
+
+ ret = fdt_setprop_placeholder(fdt, chosen_node, FDT_PROP_RNG_SEED,
+ RNG_SEED_SIZE, &rng_seed);
+ if (ret)
+ goto out;
+ get_random_bytes(rng_seed, RNG_SEED_SIZE);
+ } else {
+ pr_notice("RNG is not initialised: omitting \"%s\" property\n",
+ FDT_PROP_RNG_SEED);
+ }
+
+ ret = fdt_setprop(fdt, chosen_node, "linux,booted-from-kexec", NULL, 0);
+
+out:
+ if (ret) {
+ kvfree(fdt);
+ fdt = NULL;
+ }
+
+ return fdt;
+}
diff --git a/include/linux/of.h b/include/linux/of.h
index 4b27c9a27df3..f0eff5e84353 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -560,6 +560,19 @@ int of_map_id(struct device_node *np, u32 id,
phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
+/*
+ * Additional space needed for the buffer to build the new FDT
+ * so that we can add initrd, bootargs, kaslr-seed, rng-seed,
+ * userable-memory-range and elfcorehdr.
+ */
+#define FDT_EXTRA_SPACE 0x1000
+
+struct kimage;
+void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
+ unsigned long initrd_load_addr,
+ unsigned long initrd_len,
+ const char *cmdline);
+
#else /* CONFIG_OF */
static inline void of_core_init(void)
--
2.30.0
^ permalink raw reply related
* [PATCH v17 00/10] Carry forward IMA measurement log on kexec on ARM64
From: Lakshmi Ramasubramanian @ 2021-02-09 18:21 UTC (permalink / raw)
To: zohar, bauerman, robh, takahiro.akashi, gregkh, will, joe,
catalin.marinas, mpe
Cc: mark.rutland, tao.li, paulus, vincenzo.frascino, frowand.list,
sashal, masahiroy, jmorris, allison, serge, devicetree,
pasha.tatashin, prsriva, hsinyi, linux-arm-kernel,
christophe.leroy, mbrugger, balajib, dmitry.kasatkin,
linux-kernel, james.morse, linux-integrity, linuxppc-dev
On kexec file load Integrity Measurement Architecture (IMA) subsystem
may verify the IMA signature of the kernel and initramfs, and measure
it. The command line parameters passed to the kernel in the kexec call
may also be measured by IMA. A remote attestation service can verify
a TPM quote based on the TPM event log, the IMA measurement list, and
the TPM PCR data. This can be achieved only if the IMA measurement log
is carried over from the current kernel to the next kernel across
the kexec call.
powerpc already supports carrying forward the IMA measurement log on
kexec. This patch set adds support for carrying forward the IMA
measurement log on kexec on ARM64.
This patch set moves the platform independent code defined for powerpc
such that it can be reused for other platforms as well. A chosen node
"linux,ima-kexec-buffer" is added to the DTB for ARM64 to hold
the address and the size of the memory reserved to carry
the IMA measurement log.
This patch set has been tested for ARM64 platform using QEMU.
I would like help from the community for testing this change on powerpc.
Thanks.
This patch set is based on
commit 96acc833dec8 ("ima: Free IMA measurement buffer after kexec syscall")
in https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
"next-integrity" branch.
Changelog:
v17
- Renamed of_kexec_setup_new_fdt() to of_kexec_alloc_and_setup_fdt(),
and moved memory allocation for the new FDT to this function.
v16
- Defined functions to allocate and free buffer for FDT for powerpc
and arm64.
- Moved ima_buffer_addr and ima_buffer_size fields from
"struct kimage_arch" in powerpc to "struct kimage"
v15
- Included Rob's patches in the patch set, and rebased
the changes to "next-integrity" branch.
- Allocate memory for DTB, on arm64, using kmalloc() instead of
vmalloc() to keep it consistent with powerpc implementation.
- Call of_kexec_setup_new_fdt() from setup_new_fdt_ppc64() and
remove setup_new_fdt() in the same patch to keep it bisect safe.
v14
- Select CONFIG_HAVE_IMA_KEXEC for CONFIG_KEXEC_FILE, for powerpc
and arm64, if CONFIG_IMA is enabled.
- Use IS_ENABLED() macro instead of "#ifdef" in remove_ima_buffer(),
ima_get_kexec_buffer(), and ima_free_kexec_buffer().
- Call of_kexec_setup_new_fdt() from setup_new_fdt_ppc64() and
remove setup_new_fdt() in "arch/powerpc/kexec/file_load.c".
v13
- Moved the arch independent functions to drivers/of/kexec.c
and then refactored the code.
- Moved arch_ima_add_kexec_buffer() to
security/integrity/ima/ima_kexec.c
v12
- Use fdt_appendprop_addrrange() in setup_ima_buffer()
to setup the IMA measurement list property in
the device tree.
- Moved architecture independent functions from
"arch/powerpc/kexec/ima.c" to "drivers/of/kexec."
- Deleted "arch/powerpc/kexec/ima.c" and
"arch/powerpc/include/asm/ima.h".
v11
- Rebased the changes on the kexec code refactoring done by
Rob Herring in his "dt/kexec" branch
- Removed "extern" keyword in function declarations
- Removed unnecessary header files included in C files
- Updated patch descriptions per Thiago's comments
v10
- Moved delete_fdt_mem_rsv(), remove_ima_buffer(),
get_ima_kexec_buffer, and get_root_addr_size_cells()
to drivers/of/kexec.c
- Moved arch_ima_add_kexec_buffer() to
security/integrity/ima/ima_kexec.c
- Conditionally define IMA buffer fields in struct kimage_arch
v9
- Moved delete_fdt_mem_rsv() to drivers/of/kexec_fdt.c
- Defined a new function get_ima_kexec_buffer() in
drivers/of/ima_kexec.c to replace do_get_kexec_buffer()
- Changed remove_ima_kexec_buffer() to the original function name
remove_ima_buffer()
- Moved remove_ima_buffer() to drivers/of/ima_kexec.c
- Moved ima_get_kexec_buffer() and ima_free_kexec_buffer()
to security/integrity/ima/ima_kexec.c
v8:
- Moved remove_ima_kexec_buffer(), do_get_kexec_buffer(), and
delete_fdt_mem_rsv() to drivers/of/fdt.c
- Moved ima_dump_measurement_list() and ima_add_kexec_buffer()
back to security/integrity/ima/ima_kexec.c
v7:
- Renamed remove_ima_buffer() to remove_ima_kexec_buffer() and moved
this function definition to kernel.
- Moved delete_fdt_mem_rsv() definition to kernel
- Moved ima_dump_measurement_list() and ima_add_kexec_buffer() to
a new file namely ima_kexec_fdt.c in IMA
v6:
- Remove any existing FDT_PROP_IMA_KEXEC_BUFFER property in the device
tree and also its corresponding memory reservation in the currently
running kernel.
- Moved the function remove_ima_buffer() defined for powerpc to IMA
and renamed the function to ima_remove_kexec_buffer(). Also, moved
delete_fdt_mem_rsv() from powerpc to IMA.
v5:
- Merged get_addr_size_cells() and do_get_kexec_buffer() into a single
function when moving the arch independent code from powerpc to IMA
- Reverted the change to use FDT functions in powerpc code and added
back the original code in get_addr_size_cells() and
do_get_kexec_buffer() for powerpc.
- Added fdt_add_mem_rsv() for ARM64 to reserve the memory for
the IMA log buffer during kexec.
- Fixed the warning reported by kernel test bot for ARM64
arch_ima_add_kexec_buffer() - moved this function to a new file
namely arch/arm64/kernel/ima_kexec.c
v4:
- Submitting the patch series on behalf of the original author
Prakhar Srivastava <prsriva@linux.microsoft.com>
- Moved FDT_PROP_IMA_KEXEC_BUFFER ("linux,ima-kexec-buffer") to
libfdt.h so that it can be shared by multiple platforms.
v3:
Breakup patches further into separate patches.
- Refactoring non architecture specific code out of powerpc
- Update powerpc related code to use fdt functions
- Update IMA buffer read related code to use of functions
- Add support to store the memory information of the IMA
measurement logs to be carried forward.
- Update the property strings to align with documented nodes
https://github.com/devicetree-org/dt-schema/pull/46
v2:
Break patches into separate patches.
- Powerpc related Refactoring
- Updating the docuemntation for chosen node
- Updating arm64 to support IMA buffer pass
v1:
Refactoring carrying over IMA measuremnet logs over Kexec. This patch
moves the non-architecture specific code out of powerpc and adds to
security/ima.(Suggested by Thiago)
Add Documentation regarding the ima-kexec-buffer node in the chosen
node documentation
v0:
Add a layer of abstraction to use the memory reserved by device tree
for ima buffer pass.
Add support for ima buffer pass using reserved memory for arm64 kexec.
Update the arch sepcific code path in kexec file load to store the
ima buffer in the reserved memory. The same reserved memory is read
on kexec or cold boot.
Lakshmi Ramasubramanian (6):
powerpc: Move ima buffer fields to struct kimage
powerpc: Enable passing IMA log to next kernel on kexec
powerpc: Move arch independent ima kexec functions to
drivers/of/kexec.c
kexec: Use fdt_appendprop_addrrange() to add ima buffer to FDT
powerpc: Delete unused function delete_fdt_mem_rsv()
arm64: Enable passing IMA log to next kernel on kexec
Rob Herring (4):
powerpc: Rename kexec elfcorehdr_addr to elf_headers_mem
of: Add a common kexec FDT setup function
arm64: Use common of_kexec_alloc_and_setup_fdt()
powerpc: Use common of_kexec_alloc_and_setup_fdt()
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/machine_kexec_file.c | 180 +---------
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/ima.h | 30 --
arch/powerpc/include/asm/kexec.h | 12 +-
arch/powerpc/kexec/Makefile | 7 -
arch/powerpc/kexec/elf_64.c | 29 +-
arch/powerpc/kexec/file_load.c | 183 +---------
arch/powerpc/kexec/file_load_64.c | 11 +-
arch/powerpc/kexec/ima.c | 219 ------------
drivers/of/Makefile | 6 +
drivers/of/kexec.c | 473 +++++++++++++++++++++++++
include/linux/kexec.h | 3 +
include/linux/of.h | 20 ++
security/integrity/ima/ima.h | 4 -
security/integrity/ima/ima_kexec.c | 3 +-
16 files changed, 539 insertions(+), 644 deletions(-)
delete mode 100644 arch/powerpc/include/asm/ima.h
delete mode 100644 arch/powerpc/kexec/ima.c
create mode 100644 drivers/of/kexec.c
--
2.30.0
^ permalink raw reply
* RE: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer
From: David Laight @ 2021-02-09 17:16 UTC (permalink / raw)
To: 'Christophe Leroy', 'Segher Boessenkool',
Nicholas Piggin
Cc: Paul Mackerras, msuchanek@suse.de, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org
In-Reply-To: <f6ae2e77-3a89-c294-9a6e-58d84fbb46b1@csgroup.eu>
From: Christophe Leroy <christophe.leroy@csgroup.eu>
> Sent: 09 February 2021 17:04
>
> Le 09/02/2021 à 15:31, David Laight a écrit :
> > From: Segher Boessenkool
> >> Sent: 09 February 2021 13:51
> >>
> >> On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> >>> What if you did this?
> >>
> >>> +static inline struct task_struct *get_current(void)
> >>> +{
> >>> + register struct task_struct *task asm ("r2");
> >>> +
> >>> + return task;
> >>> +}
> >>
> >> Local register asm variables are *only* guaranteed to live in that
> >> register as operands to an asm. See
> >> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> >> ("The only supported use" etc.)
> >>
> >> You can do something like
> >>
> >> static inline struct task_struct *get_current(void)
> >> {
> >> register struct task_struct *task asm ("r2");
> >>
> >> asm("" : "+r"(task));
> >>
> >> return task;
> >> }
> >>
> >> which makes sure that "task" actually is in r2 at the point of that asm.
> >
> > If "r2" always contains current (and is never assigned by the compiler)
> > why not use a global register variable for it?
> >
>
>
> The change proposed by Nick doesn't solve the issue.
>
> The problem is that at the begining of the function we have:
>
> unsigned long *ti_flagsp = ¤t_thread_info()->flags;
>
> When the function uses ti_flagsp for the first time, it does use 112(r2)
>
> Then the function calls some other functions.
>
> Most likely because the function could update 'current', GCC copies r2 into r30, so that if r2 get
> changed by the called function, ti_flagsp is still based on the previous value of current.
>
> Allthough we know r2 wont change, GCC doesn't know it. And in order to save r2 into r30, it needs to
> save r30 in the stack.
>
>
> By using ¤t_thread_info()->flags directly instead of this intermediaite ti_flagsp pointer, GCC
> uses r2 instead instead of doing a copy.
Does marking current_thread_info() 'pure' (I think that the right one)
work - so that gcc knows its result doesn't depend on external data
and that it doesn't change external data.
Although I'm not 100% how well those attributes actually work.
> Nick, I don't understand the reason why you need that 'ti_flagsp' local var.
Probably to save typing.
I sometimes reload locals after function calls.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer
From: Christophe Leroy @ 2021-02-09 17:03 UTC (permalink / raw)
To: David Laight, 'Segher Boessenkool', Nicholas Piggin
Cc: Paul Mackerras, msuchanek@suse.de, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org
In-Reply-To: <d35cc46eca474b2c9f94a4de269321e7@AcuMS.aculab.com>
Le 09/02/2021 à 15:31, David Laight a écrit :
> From: Segher Boessenkool
>> Sent: 09 February 2021 13:51
>>
>> On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
>>> What if you did this?
>>
>>> +static inline struct task_struct *get_current(void)
>>> +{
>>> + register struct task_struct *task asm ("r2");
>>> +
>>> + return task;
>>> +}
>>
>> Local register asm variables are *only* guaranteed to live in that
>> register as operands to an asm. See
>> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
>> ("The only supported use" etc.)
>>
>> You can do something like
>>
>> static inline struct task_struct *get_current(void)
>> {
>> register struct task_struct *task asm ("r2");
>>
>> asm("" : "+r"(task));
>>
>> return task;
>> }
>>
>> which makes sure that "task" actually is in r2 at the point of that asm.
>
> If "r2" always contains current (and is never assigned by the compiler)
> why not use a global register variable for it?
>
The change proposed by Nick doesn't solve the issue.
The problem is that at the begining of the function we have:
unsigned long *ti_flagsp = ¤t_thread_info()->flags;
When the function uses ti_flagsp for the first time, it does use 112(r2)
Then the function calls some other functions.
Most likely because the function could update 'current', GCC copies r2 into r30, so that if r2 get
changed by the called function, ti_flagsp is still based on the previous value of current.
Allthough we know r2 wont change, GCC doesn't know it. And in order to save r2 into r30, it needs to
save r30 in the stack.
By using ¤t_thread_info()->flags directly instead of this intermediaite ti_flagsp pointer, GCC
uses r2 instead instead of doing a copy.
Nick, I don't understand the reason why you need that 'ti_flagsp' local var.
Christophe
^ permalink raw reply
* [PATCH 3/5] powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
From: Cédric Le Goater @ 2021-02-09 16:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater
In-Reply-To: <20210209161936.377760-1-clg@kaod.org>
The IPI interrupt has its own domain now. Testing the HW interrupt
number is not needed anymore.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 4aceac0f3046..21e8f7ef5cc4 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1397,13 +1397,12 @@ static void xive_flush_cpu_queue(unsigned int cpu, struct xive_cpu *xc)
struct irq_desc *desc = irq_to_desc(irq);
struct irq_data *d = irq_desc_get_irq_data(desc);
struct xive_irq_data *xd;
- unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
/*
* Ignore anything that isn't a XIVE irq and ignore
* IPIs, so can just be dropped.
*/
- if (d->domain != xive_irq_domain || hw_irq == XIVE_IPI_HW_IRQ)
+ if (d->domain != xive_irq_domain)
continue;
/*
--
2.26.2
^ permalink raw reply related
* [PATCH 4/5] powerpc/xive: Simplify xive_core_debug_show()
From: Cédric Le Goater @ 2021-02-09 16:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater
In-Reply-To: <20210209161936.377760-1-clg@kaod.org>
Now that the IPI interrupt has its own domain, the checks on the HW
interrupt number XIVE_IPI_HW_IRQ and on the chip can be replaced by a
check on the domain.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 21e8f7ef5cc4..30034af4462b 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1580,17 +1580,14 @@ static void xive_debug_show_cpu(struct seq_file *m, int cpu)
seq_puts(m, "\n");
}
-static void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d)
+static void xive_debug_show_irq(struct seq_file *m, struct irq_data *d)
{
- struct irq_chip *chip = irq_data_get_irq_chip(d);
+ unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
int rc;
u32 target;
u8 prio;
u32 lirq;
- if (!is_xive_irq(chip))
- return;
-
rc = xive_ops->get_irq_config(hw_irq, &target, &prio, &lirq);
if (rc) {
seq_printf(m, "IRQ 0x%08x : no config rc=%d\n", hw_irq, rc);
@@ -1628,16 +1625,9 @@ static int xive_core_debug_show(struct seq_file *m, void *private)
for_each_irq_desc(i, desc) {
struct irq_data *d = irq_desc_get_irq_data(desc);
- unsigned int hw_irq;
-
- if (!d)
- continue;
-
- hw_irq = (unsigned int)irqd_to_hwirq(d);
- /* IPIs are special (HW number 0) */
- if (hw_irq != XIVE_IPI_HW_IRQ)
- xive_debug_show_irq(m, hw_irq, d);
+ if (d->domain == xive_irq_domain)
+ xive_debug_show_irq(m, d);
}
return 0;
}
--
2.26.2
^ permalink raw reply related
* [PATCH 1/5] powerpc/xive: Use cpu_to_node() instead of ibm, chip-id property
From: Cédric Le Goater @ 2021-02-09 16:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater
In-Reply-To: <20210209161936.377760-1-clg@kaod.org>
The 'chip_id' field of the XIVE CPU structure is used to choose a
target for a source located on the same chip when possible. This field
is assigned on the PowerNV platform using the "ibm,chip-id" property
on pSeries under KVM when NUMA nodes are defined but it is undefined
under PowerVM. The XIVE source structure has a similar field
'src_chip' which is only assigned on the PowerNV platform.
cpu_to_node() returns a compatible value on all platforms, 0 being the
default node. It will also give us the opportunity to set the affinity
of a source on pSeries when we can localize them.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 7e08be5e5e4a..776871274b69 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1336,16 +1336,11 @@ static int xive_prepare_cpu(unsigned int cpu)
xc = per_cpu(xive_cpu, cpu);
if (!xc) {
- struct device_node *np;
-
xc = kzalloc_node(sizeof(struct xive_cpu),
GFP_KERNEL, cpu_to_node(cpu));
if (!xc)
return -ENOMEM;
- np = of_get_cpu_node(cpu, NULL);
- if (np)
- xc->chip_id = of_get_ibm_chip_id(np);
- of_node_put(np);
+ xc->chip_id = cpu_to_node(cpu);
xc->hw_ipi = XIVE_BAD_IRQ;
per_cpu(xive_cpu, cpu) = xc;
--
2.26.2
^ permalink raw reply related
* [PATCH 5/5] powerpc/xive: Map one IPI interrupt per node
From: Cédric Le Goater @ 2021-02-09 16:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater
In-Reply-To: <20210209161936.377760-1-clg@kaod.org>
ipistorm [*] can be used to benchmark the raw interrupt rate of an
interrupt controller by measuring the number of IPIs a system can
sustain. When applied to the XIVE interrupt controller of POWER9 and
POWER10 systems, a significant drop of the interrupt rate can be
observed when crossing the second node boundary.
This is due to the fact that a single IPI interrupt is used for all
CPUs of the system. The structure is shared and the cache line updates
impact greatly the traffic between nodes and the overall IPI
performance.
As a workaround, the impact can be reduced by deactivating the IRQ
lockup detector ("noirqdebug") which does a lot of accounting in the
Linux IRQ descriptor structure and is responsible for most of the
performance penalty.
As a fix, this proposal allocates an IPI interrupt per node, to be
shared by all CPUs of that node. It solves the scaling issue, the IRQ
lockup detector still has an impact but the XIVE interrupt rate scales
linearly. It also improves the "noirqdebug" case as showed in the
tables below.
* P9 DD2.2 - 2s * 64 threads
"noirqdebug"
Mint/s Mint/s
chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
--------------------------------------------------------------
1 0-15 4.984023 4.875405 4.996536 5.048892
0-31 10.879164 10.544040 10.757632 11.037859
0-47 15.345301 14.688764 14.926520 15.310053
0-63 17.064907 17.066812 17.613416 17.874511
2 0-79 11.768764 21.650749 22.689120 22.566508
0-95 10.616812 26.878789 28.434703 28.320324
0-111 10.151693 31.397803 31.771773 32.388122
0-127 9.948502 33.139336 34.875716 35.224548
* P10 DD1 - 4s (not homogeneous) 352 threads
"noirqdebug"
Mint/s Mint/s
chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
--------------------------------------------------------------
1 0-15 2.409402 2.364108 2.383303 2.395091
0-31 6.028325 6.046075 6.089999 6.073750
0-47 8.655178 8.644531 8.712830 8.724702
0-63 11.629652 11.735953 12.088203 12.055979
0-79 14.392321 14.729959 14.986701 14.973073
0-95 12.604158 13.004034 17.528748 17.568095
2 0-111 9.767753 13.719831 19.968606 20.024218
0-127 6.744566 16.418854 22.898066 22.995110
0-143 6.005699 19.174421 25.425622 25.417541
0-159 5.649719 21.938836 27.952662 28.059603
0-175 5.441410 24.109484 31.133915 31.127996
3 0-191 5.318341 24.405322 33.999221 33.775354
0-207 5.191382 26.449769 36.050161 35.867307
0-223 5.102790 29.356943 39.544135 39.508169
0-239 5.035295 31.933051 42.135075 42.071975
0-255 4.969209 34.477367 44.655395 44.757074
4 0-271 4.907652 35.887016 47.080545 47.318537
0-287 4.839581 38.076137 50.464307 50.636219
0-303 4.786031 40.881319 53.478684 53.310759
0-319 4.743750 43.448424 56.388102 55.973969
0-335 4.709936 45.623532 59.400930 58.926857
0-351 4.681413 45.646151 62.035804 61.830057
[*] https://github.com/antonblanchard/ipistorm
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/xive-internal.h | 2 --
arch/powerpc/sysdev/xive/common.c | 39 ++++++++++++++++++------
2 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/xive-internal.h b/arch/powerpc/sysdev/xive/xive-internal.h
index 9cf57c722faa..b3a456fdd3a5 100644
--- a/arch/powerpc/sysdev/xive/xive-internal.h
+++ b/arch/powerpc/sysdev/xive/xive-internal.h
@@ -5,8 +5,6 @@
#ifndef __XIVE_INTERNAL_H
#define __XIVE_INTERNAL_H
-#define XIVE_IPI_HW_IRQ 0 /* interrupt source # for IPIs */
-
/*
* A "disabled" interrupt should never fire, to catch problems
* we set its logical number to this
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 30034af4462b..a1e61a5cf927 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -65,8 +65,16 @@ static struct irq_domain *xive_irq_domain;
#ifdef CONFIG_SMP
static struct irq_domain *xive_ipi_irq_domain;
-/* The IPIs all use the same logical irq number */
-static u32 xive_ipi_irq;
+/* The IPIs use the same logical irq number when on the same chip */
+static struct xive_ipi_desc {
+ unsigned int irq;
+ char name[8]; /* enough bytes to fit IPI-XXX */
+} *xive_ipis;
+
+static unsigned int xive_ipi_cpu_to_irq(unsigned int cpu)
+{
+ return xive_ipis[cpu_to_node(cpu)].irq;
+}
#endif
/* Xive state for each CPU */
@@ -1087,25 +1095,36 @@ static const struct irq_domain_ops xive_ipi_irq_domain_ops = {
static void __init xive_request_ipi(void)
{
- unsigned int virq;
+ unsigned int node;
- xive_ipi_irq_domain = irq_domain_add_linear(NULL, 1,
+ xive_ipi_irq_domain = irq_domain_add_linear(NULL, nr_node_ids,
&xive_ipi_irq_domain_ops, NULL);
if (WARN_ON(xive_ipi_irq_domain == NULL))
return;
- /* Initialize it */
- virq = irq_create_mapping(xive_ipi_irq_domain, XIVE_IPI_HW_IRQ);
- xive_ipi_irq = virq;
+ xive_ipis = kcalloc(nr_node_ids, sizeof(*xive_ipis), GFP_KERNEL | __GFP_NOFAIL);
+ for_each_node(node) {
+ struct xive_ipi_desc *xid = &xive_ipis[node];
+ irq_hw_number_t node_ipi_hwirq = node;
+
+ /*
+ * Map one IPI interrupt per node for all cpus of that node.
+ * Since the HW interrupt number doesn't have any meaning,
+ * simply use the node number.
+ */
+ xid->irq = irq_create_mapping(xive_ipi_irq_domain, node_ipi_hwirq);
+ snprintf(xid->name, sizeof(xid->name), "IPI-%d", node);
- WARN_ON(request_irq(virq, xive_muxed_ipi_action,
- IRQF_PERCPU | IRQF_NO_THREAD, "IPI", NULL));
+ WARN_ON(request_irq(xid->irq, xive_muxed_ipi_action,
+ IRQF_PERCPU | IRQF_NO_THREAD, xid->name, NULL));
+ }
}
static int xive_setup_cpu_ipi(unsigned int cpu)
{
struct xive_cpu *xc;
int rc;
+ unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
pr_debug("Setting up IPI for CPU %d\n", cpu);
@@ -1146,6 +1165,8 @@ static int xive_setup_cpu_ipi(unsigned int cpu)
static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
{
+ unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
+
/* Disable the IPI and free the IRQ data */
/* Already cleaned up ? */
--
2.26.2
^ permalink raw reply related
* [PATCH 0/5] powerpc/xive: Map one IPI interrupt per node
From: Cédric Le Goater @ 2021-02-09 16:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater
Hello,
ipistorm [*] can be used to benchmark the raw interrupt rate of an
interrupt controller by measuring the number of IPIs a system can
sustain. When applied to the XIVE interrupt controller of POWER9 and
POWER10 systems, a significant drop of the interrupt rate can be
observed when crossing the second node boundary.
This is due to the fact that a single IPI interrupt is used for all
CPUs of the system. The structure is shared and the cache line updates
impact greatly the traffic between nodes and the overall IPI
performance.
As a workaround, the impact can be reduced by deactivating the IRQ
lockup detector ("noirqdebug") which does a lot of accounting in the
Linux IRQ descriptor structure and is responsible for most of the
performance penalty.
As a fix, this proposal allocates an IPI interrupt per node, to be
shared by all CPUs of that node. It solves the scaling issue, the IRQ
lockup detector still has an impact but the XIVE interrupt rate scales
linearly. It also improves the "noirqdebug" case as showed in the
tables below.
* P9 DD2.2 - 2s * 64 threads
"noirqdebug"
Mint/s Mint/s
chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
--------------------------------------------------------------
1 0-15 4.984023 4.875405 4.996536 5.048892
0-31 10.879164 10.544040 10.757632 11.037859
0-47 15.345301 14.688764 14.926520 15.310053
0-63 17.064907 17.066812 17.613416 17.874511
2 0-79 11.768764 21.650749 22.689120 22.566508
0-95 10.616812 26.878789 28.434703 28.320324
0-111 10.151693 31.397803 31.771773 32.388122
0-127 9.948502 33.139336 34.875716 35.224548
* P10 DD1 - 4s (not homogeneous) 352 threads
"noirqdebug"
Mint/s Mint/s
chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
--------------------------------------------------------------
1 0-15 2.409402 2.364108 2.383303 2.395091
0-31 6.028325 6.046075 6.089999 6.073750
0-47 8.655178 8.644531 8.712830 8.724702
0-63 11.629652 11.735953 12.088203 12.055979
0-79 14.392321 14.729959 14.986701 14.973073
0-95 12.604158 13.004034 17.528748 17.568095
2 0-111 9.767753 13.719831 19.968606 20.024218
0-127 6.744566 16.418854 22.898066 22.995110
0-143 6.005699 19.174421 25.425622 25.417541
0-159 5.649719 21.938836 27.952662 28.059603
0-175 5.441410 24.109484 31.133915 31.127996
3 0-191 5.318341 24.405322 33.999221 33.775354
0-207 5.191382 26.449769 36.050161 35.867307
0-223 5.102790 29.356943 39.544135 39.508169
0-239 5.035295 31.933051 42.135075 42.071975
0-255 4.969209 34.477367 44.655395 44.757074
4 0-271 4.907652 35.887016 47.080545 47.318537
0-287 4.839581 38.076137 50.464307 50.636219
0-303 4.786031 40.881319 53.478684 53.310759
0-319 4.743750 43.448424 56.388102 55.973969
0-335 4.709936 45.623532 59.400930 58.926857
0-351 4.681413 45.646151 62.035804 61.830057
[*] https://github.com/antonblanchard/ipistorm
Thanks,
C.
Cédric Le Goater (5):
powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property
powerpc/xive: Introduce an IPI interrupt domain
powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
powerpc/xive: Simplify xive_core_debug_show()
powerpc/xive: Map one IPI interrupt per node
arch/powerpc/sysdev/xive/xive-internal.h | 2 -
arch/powerpc/sysdev/xive/common.c | 114 +++++++++++------------
2 files changed, 56 insertions(+), 60 deletions(-)
--
2.26.2
^ permalink raw reply
* [PATCH 2/5] powerpc/xive: Introduce an IPI interrupt domain
From: Cédric Le Goater @ 2021-02-09 16:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater
In-Reply-To: <20210209161936.377760-1-clg@kaod.org>
The IPI interrupt is a special case of the XIVE IRQ domain. When
mapping and unmapping the interrupts in the Linux interrupt number
space, the HW interrupt number 0 (XIVE_IPI_HW_IRQ) is checked to
distinguish the IPI interrupt from other interrupts of the system.
Simplify the XIVE interrupt domain by introducing a specific domain
for the IPI.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 51 +++++++++++++------------------
1 file changed, 22 insertions(+), 29 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 776871274b69..4aceac0f3046 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -63,6 +63,8 @@ static const struct xive_ops *xive_ops;
static struct irq_domain *xive_irq_domain;
#ifdef CONFIG_SMP
+static struct irq_domain *xive_ipi_irq_domain;
+
/* The IPIs all use the same logical irq number */
static u32 xive_ipi_irq;
#endif
@@ -1068,20 +1070,32 @@ static struct irq_chip xive_ipi_chip = {
.irq_unmask = xive_ipi_do_nothing,
};
+/*
+ * IPIs are marked per-cpu. We use separate HW interrupts under the
+ * hood but associated with the same "linux" interrupt
+ */
+static int xive_ipi_irq_domain_map(struct irq_domain *h, unsigned int virq,
+ irq_hw_number_t hw)
+{
+ irq_set_chip_and_handler(virq, &xive_ipi_chip, handle_percpu_irq);
+ return 0;
+}
+
+static const struct irq_domain_ops xive_ipi_irq_domain_ops = {
+ .map = xive_ipi_irq_domain_map,
+};
+
static void __init xive_request_ipi(void)
{
unsigned int virq;
- /*
- * Initialization failed, move on, we might manage to
- * reach the point where we display our errors before
- * the system falls appart
- */
- if (!xive_irq_domain)
+ xive_ipi_irq_domain = irq_domain_add_linear(NULL, 1,
+ &xive_ipi_irq_domain_ops, NULL);
+ if (WARN_ON(xive_ipi_irq_domain == NULL))
return;
/* Initialize it */
- virq = irq_create_mapping(xive_irq_domain, XIVE_IPI_HW_IRQ);
+ virq = irq_create_mapping(xive_ipi_irq_domain, XIVE_IPI_HW_IRQ);
xive_ipi_irq = virq;
WARN_ON(request_irq(virq, xive_muxed_ipi_action,
@@ -1179,19 +1193,6 @@ static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq,
*/
irq_clear_status_flags(virq, IRQ_LEVEL);
-#ifdef CONFIG_SMP
- /* IPIs are special and come up with HW number 0 */
- if (hw == XIVE_IPI_HW_IRQ) {
- /*
- * IPIs are marked per-cpu. We use separate HW interrupts under
- * the hood but associated with the same "linux" interrupt
- */
- irq_set_chip_and_handler(virq, &xive_ipi_chip,
- handle_percpu_irq);
- return 0;
- }
-#endif
-
rc = xive_irq_alloc_data(virq, hw);
if (rc)
return rc;
@@ -1203,15 +1204,7 @@ static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq,
static void xive_irq_domain_unmap(struct irq_domain *d, unsigned int virq)
{
- struct irq_data *data = irq_get_irq_data(virq);
- unsigned int hw_irq;
-
- /* XXX Assign BAD number */
- if (!data)
- return;
- hw_irq = (unsigned int)irqd_to_hwirq(data);
- if (hw_irq != XIVE_IPI_HW_IRQ)
- xive_irq_free_data(virq);
+ xive_irq_free_data(virq);
}
static int xive_irq_domain_xlate(struct irq_domain *h, struct device_node *ct,
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
From: Christophe Leroy @ 2021-02-09 16:13 UTC (permalink / raw)
To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
msuchanek, Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612834634.qle1lc7n6y.astroid@bobo.none>
Le 09/02/2021 à 02:55, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> When r3 is not modified, reload it from regs->orig_r3 to free
>> volatile registers. This avoids a stack frame for the likely part
>> of system_call_exception()
>
> This doesn't on my 64s build, but it does reduce one non volatile
> register save/restore. With quite a bit more register pressure
> reduction 64s can avoid the stack frame as well.
The stack frame is not due to the registers because on PPC64 you have the redzone that you don't
have on PPC32.
As far as I can see, this is due to a call to .arch_local_irq_restore().
On ppc32 arch_local_irq_restore() is just a write to MSR.
>
> It's a cool trick but quite code and compiler specific so I don't know
> how worthwhile it is to keep considering we're calling out into random
> kernel C code after this.
>
> Maybe just keep it PPC32 specific for the moment, will have to do more
> tuning for 64 and we have other stuff to do there first.
>
> If you are happy to make it 32-bit only then
I think we can leave without this, that's only one or two cycles won.
>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>
^ permalink raw reply
* Re: [PATCH v5 19/22] powerpc/syscall: Optimise checks in beginning of system_call_exception()
From: Christophe Leroy @ 2021-02-09 14:32 UTC (permalink / raw)
To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
msuchanek, Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612836170.502t0sssvi.astroid@bobo.none>
Le 09/02/2021 à 03:06, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> Combine all tests of regs->msr into a single logical one.
>
> Okay by me unless we choose to do the config option and put these all
> under it. I think I would prefer that because sometimes the registers
> are in a state you can't easily see what the values in the expression
> were. In this case it doesn't matter so much because they should be in
> regs in the interrupt frame.
Yes indeed. I reword the commit log and tell that.
>
> Thanks,
> Nick
>
>>
>> Before the patch:
>>
>> 0: 81 6a 00 84 lwz r11,132(r10)
>> 4: 90 6a 00 88 stw r3,136(r10)
>> 8: 69 60 00 02 xori r0,r11,2
>> c: 54 00 ff fe rlwinm r0,r0,31,31,31
>> 10: 0f 00 00 00 twnei r0,0
>> 14: 69 63 40 00 xori r3,r11,16384
>> 18: 54 63 97 fe rlwinm r3,r3,18,31,31
>> 1c: 0f 03 00 00 twnei r3,0
>> 20: 69 6b 80 00 xori r11,r11,32768
>> 24: 55 6b 8f fe rlwinm r11,r11,17,31,31
>> 28: 0f 0b 00 00 twnei r11,0
>>
>> After the patch:
>>
>> 0: 81 6a 00 84 lwz r11,132(r10)
>> 4: 90 6a 00 88 stw r3,136(r10)
>> 8: 7d 6b 58 f8 not r11,r11
>> c: 71 6b c0 02 andi. r11,r11,49154
>> 10: 0f 0b 00 00 twnei r11,0
>>
>> 6 cycles less on powerpc 8xx (328 => 322 cycles).
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> arch/powerpc/kernel/interrupt.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index 55e1aa18cdb9..8c38e8c95be2 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -28,6 +28,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>> unsigned long r0, struct pt_regs *regs)
>> {
>> syscall_fn f;
>> + unsigned long expected_msr;
>>
>> regs->orig_gpr3 = r3;
>>
>> @@ -39,10 +40,13 @@ notrace long system_call_exception(long r3, long r4, long r5,
>>
>> trace_hardirqs_off(); /* finish reconciling */
>>
>> + expected_msr = MSR_PR;
>> if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>> - BUG_ON(!(regs->msr & MSR_RI));
>> - BUG_ON(!(regs->msr & MSR_PR));
>> - BUG_ON(arch_irq_disabled_regs(regs));
>> + expected_msr |= MSR_RI;
>> + if (IS_ENABLED(CONFIG_PPC32))
>> + expected_msr |= MSR_EE;
>> + BUG_ON((regs->msr & expected_msr) ^ expected_msr);
>> + BUG_ON(IS_ENABLED(CONFIG_PPC64) && arch_irq_disabled_regs(regs));
>>
>> #ifdef CONFIG_PPC_PKEY
>> if (mmu_has_feature(MMU_FTR_PKEY)) {
>> --
>> 2.25.0
>>
>>
^ permalink raw reply
* RE: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer
From: David Laight @ 2021-02-09 14:31 UTC (permalink / raw)
To: 'Segher Boessenkool', Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org, msuchanek@suse.de, Paul Mackerras,
linux-kernel@vger.kernel.org
In-Reply-To: <20210209135053.GD27854@gate.crashing.org>
From: Segher Boessenkool
> Sent: 09 February 2021 13:51
>
> On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> > What if you did this?
>
> > +static inline struct task_struct *get_current(void)
> > +{
> > + register struct task_struct *task asm ("r2");
> > +
> > + return task;
> > +}
>
> Local register asm variables are *only* guaranteed to live in that
> register as operands to an asm. See
> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> ("The only supported use" etc.)
>
> You can do something like
>
> static inline struct task_struct *get_current(void)
> {
> register struct task_struct *task asm ("r2");
>
> asm("" : "+r"(task));
>
> return task;
> }
>
> which makes sure that "task" actually is in r2 at the point of that asm.
If "r2" always contains current (and is never assigned by the compiler)
why not use a global register variable for it?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [PATCH v5 18/22] powerpc/syscall: Remove FULL_REGS verification in system_call_exception
From: Christophe Leroy @ 2021-02-09 14:31 UTC (permalink / raw)
To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
msuchanek, Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612836023.l122pe2n2b.astroid@bobo.none>
Le 09/02/2021 à 03:02, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> For book3s/64, FULL_REGS() is 'true' at all time, so the test voids.
>> For others, non volatile registers are saved inconditionally.
>>
>> So the verification is pointless.
>>
>> Should one fail to do it, it would anyway be caught by the
>> CHECK_FULL_REGS() in copy_thread() as we have removed the
>> special versions ppc_fork() and friends.
>>
>> null_syscall benchmark reduction 4 cycles (332 => 328 cycles)
>
> I wonder if we rather make a CONFIG option for a bunch of these simpler
> debug checks here (and also in interrupt exit, wrappers, etc) rather
> than remove them entirely.
We can drop this patch if you prefer. Anyway, like book3s/64, once ppc32 also do interrupt
entry/exit in C, FULL_REGS() will already return true.
Christophe
>
> Thanks,
> Nick
>
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> arch/powerpc/kernel/interrupt.c | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index 8fafca727b8b..55e1aa18cdb9 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -42,7 +42,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
>> if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>> BUG_ON(!(regs->msr & MSR_RI));
>> BUG_ON(!(regs->msr & MSR_PR));
>> - BUG_ON(!FULL_REGS(regs));
>> BUG_ON(arch_irq_disabled_regs(regs));
>>
>> #ifdef CONFIG_PPC_PKEY
>> --
>> 2.25.0
>>
>>
^ permalink raw reply
* [PATCH] powerpc/64: Fix stack trace not displaying final frame
From: Michael Ellerman @ 2021-02-09 14:16 UTC (permalink / raw)
To: linuxppc-dev; +Cc: npiggin
In commit bf13718bc57a ("powerpc: show registers when unwinding
interrupt frames") we changed our stack dumping logic to show the full
registers whenever we find an interrupt frame on the stack.
However we didn't notice that on 64-bit this doesn't show the final
frame, ie. the interrupt that brought us in from userspace, whereas on
32-bit it does.
That is due to confusion about the size of that last frame. The code
in show_stack() calls validate_sp(), passing it STACK_INT_FRAME_SIZE
to check the sp is at least that far below the top of the stack.
However on 64-bit that size is too large for the final frame, because
it includes the red zone, but we don't allocate a red zone for the
first frame.
So add a new define that encodes the correct size for 32-bit and
64-bit, and use it in show_stack().
This results in the full trace being shown on 64-bit, eg:
sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 83 Comm: sh Not tainted 5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty #649
Call Trace:
[c00000000a1c3ac0] [c000000000897b70] dump_stack+0xc4/0x114 (unreliable)
[c00000000a1c3b00] [c00000000014334c] panic+0x178/0x41c
[c00000000a1c3ba0] [c00000000094e600] sysrq_handle_crash+0x40/0x50
[c00000000a1c3c00] [c00000000094ef98] __handle_sysrq+0xd8/0x210
[c00000000a1c3ca0] [c00000000094f820] write_sysrq_trigger+0x100/0x188
[c00000000a1c3ce0] [c0000000005559dc] proc_reg_write+0x10c/0x1b0
[c00000000a1c3d10] [c000000000479950] vfs_write+0xf0/0x360
[c00000000a1c3d60] [c000000000479d9c] ksys_write+0x7c/0x140
[c00000000a1c3db0] [c00000000002bf5c] system_call_exception+0x19c/0x2c0
[c00000000a1c3e10] [c00000000000d35c] system_call_common+0xec/0x278
--- interrupt: c00 at 0x7fff9fbab428
NIP: 00007fff9fbab428 LR: 000000001000b724 CTR: 0000000000000000
REGS: c00000000a1c3e80 TRAP: 0c00 Not tainted (5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty)
MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 22002884 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000004 00007fffc3cb8960 00007fff9fc59900 0000000000000001
GPR04: 000000002a4b32d0 0000000000000002 0000000000000063 0000000000000063
GPR08: 000000002a4b32d0 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff9fcca9a0 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0
GPR20: 000000002a4b3485 00000000100b8f90 0000000000000000 0000000000000000
GPR24: 000000002a4b0440 00000000100e77b8 0000000000000020 000000002a4b32d0
GPR28: 0000000000000001 0000000000000002 000000002a4b32d0 0000000000000001
NIP [00007fff9fbab428] 0x7fff9fbab428
LR [000000001000b724] 0x1000b724
--- interrupt: c00
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
arch/powerpc/include/asm/ptrace.h | 3 +++
arch/powerpc/kernel/asm-offsets.c | 2 +-
arch/powerpc/kernel/process.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 58f9dc060a7b..8236c5e749e4 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -70,6 +70,9 @@ struct pt_regs
};
#endif
+
+#define STACK_FRAME_WITH_PT_REGS (STACK_FRAME_OVERHEAD + sizeof(struct pt_regs))
+
#ifdef __powerpc64__
/*
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 31edd9bbce75..6109496e5fdf 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -308,7 +308,7 @@ int main(void)
/* Interrupt register frame */
DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE);
- DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs));
+ DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_WITH_PT_REGS);
STACK_PT_REGS_OFFSET(GPR0, gpr[0]);
STACK_PT_REGS_OFFSET(GPR1, gpr[1]);
STACK_PT_REGS_OFFSET(GPR2, gpr[2]);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e296440e9d16..924d023dad0a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2179,7 +2179,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
* See if this is an exception frame.
* We look for the "regshere" marker in the current frame.
*/
- if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
+ if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
&& stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_FRAME_OVERHEAD);
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v3 1/8] powerpc/uaccess: Add unsafe_copy_from_user
From: Christophe Leroy @ 2021-02-09 14:09 UTC (permalink / raw)
To: Michael Ellerman, Christopher M. Riedl, linuxppc-dev
In-Reply-To: <87pn21r7yr.fsf@mpe.ellerman.id.au>
Le 19/01/2021 à 03:11, Michael Ellerman a écrit :
> "Christopher M. Riedl" <cmr@codefail.de> writes:
>> On Mon Jan 11, 2021 at 7:22 AM CST, Christophe Leroy wrote:
>>> Le 09/01/2021 à 04:25, Christopher M. Riedl a écrit :
>>>> Implement raw_copy_from_user_allowed() which assumes that userspace read
>>>> access is open. Use this new function to implement raw_copy_from_user().
>>>> Finally, wrap the new function to follow the usual "unsafe_" convention
>>>> of taking a label argument.
>>>
>>> I think there is no point implementing raw_copy_from_user_allowed(), see
>>> https://github.com/linuxppc/linux/commit/4b842e4e25b1 and
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/8c74fc9ce8131cabb10b3e95dc0e430f396ee83e.1610369143.git.christophe.leroy@csgroup.eu/
>>>
>>> You should simply do:
>>>
>>> #define unsafe_copy_from_user(d, s, l, e) \
>>> unsafe_op_wrap(__copy_tofrom_user((__force void __user *)d, s, l), e)
>>>
>>
>> I gave this a try and the signal ops decreased by ~8K. Now, to be
>> honest, I am not sure what an "acceptable" benchmark number here
>> actually is - so maybe this is ok? Same loss with both radix and hash:
>>
>> | | hash | radix |
>> | ------------------------------------ | ------ | ------ |
>> | linuxppc/next | 118693 | 133296 |
>> | linuxppc/next w/o KUAP+KUEP | 228911 | 228654 |
>> | unsafe-signal64 | 200480 | 234067 |
>> | unsafe-signal64 (__copy_tofrom_user) | 192467 | 225119 |
>>
>> To put this into perspective, prior to KUAP and uaccess flush, signal
>> performance in this benchmark was ~290K on hash.
>
> If I'm doing the math right 8K is ~4% of the best number.
>
> It seems like 4% is worth a few lines of code to handle these constant
> sizes. It's not like we have performance to throw away.
>
> Or, we should chase down where the call sites are that are doing small
> constant copies with copy_to/from_user() and change them to use
> get/put_user().
>
I have built pmac32_defconfig and ppc64_defconfig with a BUILD_BUG_ON(__builtin_constant_p(n) && (n
== 1 || n == 2 || n == 4 || n == 8) in raw_copy_from_user() and raw_copy_to_user():
On pmac32_defconfig, no hit.
On ppc64_defconfig, two hits:
- copies of sigset_t in signal64. This problem is only on linux/next. On next-test we don't have
this problem anymore thanks to the series from Christopher.
- in pkey_set() in arch/powerpc/kernel/ptrace/ptrace-view.c, in the copy of new_amr. This is not a
hot path I think so we can live with it.
Christophe
^ permalink raw reply
* [PATCH v2 1/3] powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
From: Christophe Leroy @ 2021-02-09 14:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
Copied from commit 4b842e4e25b1 ("x86: get rid of small
constant size cases in raw_copy_{to,from}_user()")
Very few call sites where that would be triggered remain, and none
of those is anywhere near hot enough to bother.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/uaccess.h | 41 ------------------------------
1 file changed, 41 deletions(-)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 93d33f7e8b53..a4d2569173ac 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -398,26 +398,6 @@ static inline unsigned long raw_copy_from_user(void *to,
const void __user *from, unsigned long n)
{
unsigned long ret;
- if (__builtin_constant_p(n) && (n <= 8)) {
- ret = 1;
-
- switch (n) {
- case 1:
- __get_user_size(*(u8 *)to, from, 1, ret);
- break;
- case 2:
- __get_user_size(*(u16 *)to, from, 2, ret);
- break;
- case 4:
- __get_user_size(*(u32 *)to, from, 4, ret);
- break;
- case 8:
- __get_user_size(*(u64 *)to, from, 8, ret);
- break;
- }
- if (ret == 0)
- return 0;
- }
allow_read_from_user(from, n);
ret = __copy_tofrom_user((__force void __user *)to, from, n);
@@ -428,27 +408,6 @@ static inline unsigned long raw_copy_from_user(void *to,
static inline unsigned long
raw_copy_to_user_allowed(void __user *to, const void *from, unsigned long n)
{
- if (__builtin_constant_p(n) && (n <= 8)) {
- unsigned long ret = 1;
-
- switch (n) {
- case 1:
- __put_user_size_allowed(*(u8 *)from, (u8 __user *)to, 1, ret);
- break;
- case 2:
- __put_user_size_allowed(*(u16 *)from, (u16 __user *)to, 2, ret);
- break;
- case 4:
- __put_user_size_allowed(*(u32 *)from, (u32 __user *)to, 4, ret);
- break;
- case 8:
- __put_user_size_allowed(*(u64 *)from, (u64 __user *)to, 8, ret);
- break;
- }
- if (ret == 0)
- return 0;
- }
-
return __copy_tofrom_user(to, (__force const void __user *)from, n);
}
--
2.25.0
^ permalink raw reply related
* [PATCH v2 2/3] powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
From: Christophe Leroy @ 2021-02-09 14:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <99d4ccb58a20d8408d0e19874393655ad5b40822.1612879284.git.christophe.leroy@csgroup.eu>
__put_user_size_allowed() is only called from __put_user_size() now.
Merge them together.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/uaccess.h | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index a4d2569173ac..2fb1d95f10d3 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -108,22 +108,18 @@ static inline bool __access_ok(unsigned long addr, unsigned long size)
extern long __put_user_bad(void);
-#define __put_user_size_allowed(x, ptr, size, retval) \
+#define __put_user_size(x, ptr, size, retval) \
do { \
__label__ __pu_failed; \
\
retval = 0; \
+ allow_write_to_user(ptr, size); \
__put_user_size_goto(x, ptr, size, __pu_failed); \
+ prevent_write_to_user(ptr, size); \
break; \
\
__pu_failed: \
retval = -EFAULT; \
-} while (0)
-
-#define __put_user_size(x, ptr, size, retval) \
-do { \
- allow_write_to_user(ptr, size); \
- __put_user_size_allowed(x, ptr, size, retval); \
prevent_write_to_user(ptr, size); \
} while (0)
--
2.25.0
^ permalink raw reply related
* [PATCH v2 3/3] powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
From: Christophe Leroy @ 2021-02-09 14:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <99d4ccb58a20d8408d0e19874393655ad5b40822.1612879284.git.christophe.leroy@csgroup.eu>
Since commit 17bc43367fc2 ("powerpc/uaccess: Implement
unsafe_copy_to_user() as a simple loop"), raw_copy_to_user_allowed()
is only used by raw_copy_to_user().
Merge raw_copy_to_user_allowed() into raw_copy_to_user().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/uaccess.h | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 2fb1d95f10d3..33b2de642120 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -401,19 +401,13 @@ static inline unsigned long raw_copy_from_user(void *to,
return ret;
}
-static inline unsigned long
-raw_copy_to_user_allowed(void __user *to, const void *from, unsigned long n)
-{
- return __copy_tofrom_user(to, (__force const void __user *)from, n);
-}
-
static inline unsigned long
raw_copy_to_user(void __user *to, const void *from, unsigned long n)
{
unsigned long ret;
allow_write_to_user(to, n);
- ret = raw_copy_to_user_allowed(to, from, n);
+ ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
prevent_write_to_user(to, n);
return ret;
}
--
2.25.0
^ permalink raw reply related
* Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer
From: Segher Boessenkool @ 2021-02-09 13:50 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, msuchanek, linuxppc-dev
In-Reply-To: <1612838134.rvncv9kzls.astroid@bobo.none>
On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> What if you did this?
> +static inline struct task_struct *get_current(void)
> +{
> + register struct task_struct *task asm ("r2");
> +
> + return task;
> +}
Local register asm variables are *only* guaranteed to live in that
register as operands to an asm. See
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
("The only supported use" etc.)
You can do something like
static inline struct task_struct *get_current(void)
{
register struct task_struct *task asm ("r2");
asm("" : "+r"(task));
return task;
}
which makes sure that "task" actually is in r2 at the point of that asm.
Segher
^ permalink raw reply
* Re: [PATCH] tools/perf: Fix powerpc gap between kernel end and module start
From: Arnaldo Carvalho de Melo @ 2021-02-09 12:47 UTC (permalink / raw)
To: Athira Rajeev
Cc: linuxppc-dev, Madhavan Srinivasan, Jiri Olsa, Jiri Olsa,
Kajol Jain
In-Reply-To: <20210203153148.GC854763@kernel.org>
Em Wed, Feb 03, 2021 at 12:31:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Feb 02, 2021 at 04:02:36PM +0530, Athira Rajeev escreveu:
> >
> >
> > On 18-Jan-2021, at 3:51 PM, kajoljain <kjain@linux.ibm.com> wrote:
> >
> >
> >
> > On 1/12/21 3:08 PM, Jiri Olsa wrote:
> >
> > On Mon, Dec 28, 2020 at 09:14:14PM -0500, Athira Rajeev wrote:
> >
> > SNIP
> >
> >
> > c000000002799370 b backtrace_flag
> > c000000002799378 B radix_tree_node_cachep
> > c000000002799380 B __bss_stop
> > c0000000027a0000 B _end
> > c008000003890000 t icmp_checkentry [ip_tables]
> > c008000003890038 t ipt_alloc_initial_table [ip_tables]
> > c008000003890468 T ipt_do_table [ip_tables]
> > c008000003890de8 T ipt_unregister_table_pre_exit [ip_tables]
> > ...
> >
> > Perf calls function symbols__fixup_end() which sets the end of
> > symbol
> > to 0xc008000003890000, which is the next address and this is the
> > start
> > address of first module (icmp_checkentry in above) which will make
> > the
> > huge symbol size of 0x80000010f0000.
> >
> > After symbols__fixup_end:
> > symbols__fixup_end: sym->name: _end, sym->start:
> > 0xc0000000027a0000,
> > sym->end: 0xc008000003890000
> >
> > On powerpc, kernel text segment is located at 0xc000000000000000
> > whereas the modules are located at very high memory addresses,
> > 0xc00800000xxxxxxx. Since the gap between end of kernel text
> > segment
> > and beginning of first module's address is high, histogram
> > allocation
> > using calloc fails.
> >
> > Fix this by detecting the kernel's last symbol and limiting
> > the range of last kernel symbol to pagesize.
> >
> >
> > Patch looks good to me.
> >
> > Tested-By: Kajol Jain<kjain@linux.ibm.com>
> >
> > Thanks,
> > Kajol Jain
> >
> >
> > Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
> >
> >
> > I can't test, but since the same approach works for arm and s390,
> > this also looks ok
> >
> > Acked-by: Jiri Olsa <jolsa@redhat.com>
> >
> > thanks,
> > jirka
> >
> >
> > Hi Arnaldo,
> >
> > Can you please help review this patch and merge if this looks good..
>
> Thanks, collected the Tested-by from Kajol and the Acked-by from Jiri
> and applied to my local tree for testing, then up to my perf/core
> branch.
Had to apply this on top.
- Arnaldo
commit 0f000f9c89182950cd3500226729977251529364
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Tue Feb 9 09:41:21 2021 -0300
perf powerpc: Fix printf conversion specifier for IP addresses
We need to use "%#" PRIx64 for u64 values, not "%lx", fixing this build
problem on powerpc 32-bit:
72 13.69 ubuntu:18.04-x-powerpc : FAIL powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
arch/powerpc/util/machine.c: In function 'arch__symbols__fixup_end':
arch/powerpc/util/machine.c:23:12: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'u64 {aka long long unsigned int}' [-Werror=format=]
pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
^
/git/linux/tools/perf/util/debug.h:18:21: note: in definition of macro 'pr_fmt'
#define pr_fmt(fmt) fmt
^~~
/git/linux/tools/perf/util/debug.h:33:29: note: in expansion of macro 'pr_debugN'
#define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~
/git/linux/tools/perf/util/debug.h:33:42: note: in expansion of macro 'pr_fmt'
#define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~
arch/powerpc/util/machine.c:23:2: note: in expansion of macro 'pr_debug4'
pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
^~~~~~~~~
cc1: all warnings being treated as errors
/git/linux/tools/build/Makefile.build:139: recipe for target 'util' failed
make[5]: *** [util] Error 2
/git/linux/tools/build/Makefile.build:139: recipe for target 'powerpc' failed
make[4]: *** [powerpc] Error 2
/git/linux/tools/build/Makefile.build:139: recipe for target 'arch' failed
make[3]: *** [arch] Error 2
73 30.47 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Fixes: 557c3eadb7712741 ("perf powerpc: Fix gap between kernel end and module start")
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff --git a/tools/perf/arch/powerpc/util/machine.c b/tools/perf/arch/powerpc/util/machine.c
index c30e5cc88c1673d6..e652a1aa8132274f 100644
--- a/tools/perf/arch/powerpc/util/machine.c
+++ b/tools/perf/arch/powerpc/util/machine.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
+#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <internal/lib.h> // page_size
@@ -20,5 +21,5 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
p->end += page_size;
else
p->end = c->start;
- pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
+ pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
}
^ permalink raw reply related
* [PATCH 3/3] powerpc/time: Remove get_tbl()
From: Christophe Leroy @ 2021-02-09 10:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, broonie
Cc: linuxppc-dev, linux-kernel, linux-spi
In-Reply-To: <99bf008e2970de7f8ed3225cda69a6d06ae1a644.1612866360.git.christophe.leroy@csgroup.eu>
There are no more users of get_tbl(). Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/vdso/timebase.h | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/powerpc/include/asm/vdso/timebase.h b/arch/powerpc/include/asm/vdso/timebase.h
index 881f655caa0a..891c9d5eaabe 100644
--- a/arch/powerpc/include/asm/vdso/timebase.h
+++ b/arch/powerpc/include/asm/vdso/timebase.h
@@ -43,12 +43,6 @@
#define mttbl(v) asm volatile("mttbl %0":: "r"(v))
#define mttbu(v) asm volatile("mttbu %0":: "r"(v))
-/* For compatibility, get_tbl() is defined as get_tb() on ppc64 */
-static inline unsigned long get_tbl(void)
-{
- return mftb();
-}
-
static __always_inline u64 get_tb(void)
{
unsigned int tbhi, tblo, tbhi2;
--
2.25.0
^ permalink raw reply related
* [PATCH 1/3] spi: mpc52xx: Avoid using get_tbl()
From: Christophe Leroy @ 2021-02-09 10:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, broonie
Cc: linuxppc-dev, linux-kernel, linux-spi
get_tbl() is confusing as it returns the content TBL register
on PPC32 but the concatenation of TBL and TBU on PPC64.
Use mftb() instead.
This will allow the removal of get_tbl() in a following patch.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
drivers/spi/spi-mpc52xx.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/spi/spi-mpc52xx.c b/drivers/spi/spi-mpc52xx.c
index ef2f24420460..e6a30f232370 100644
--- a/drivers/spi/spi-mpc52xx.c
+++ b/drivers/spi/spi-mpc52xx.c
@@ -120,7 +120,7 @@ static void mpc52xx_spi_start_transfer(struct mpc52xx_spi *ms)
ms->cs_change = ms->transfer->cs_change;
/* Write out the first byte */
- ms->wcol_tx_timestamp = get_tbl();
+ ms->wcol_tx_timestamp = mftb();
if (ms->tx_buf)
out_8(ms->regs + SPI_DATA, *ms->tx_buf++);
else
@@ -221,8 +221,8 @@ static int mpc52xx_spi_fsmstate_transfer(int irq, struct mpc52xx_spi *ms,
* but it can also be worked around simply by retrying the
* transfer which is what we do here. */
ms->wcol_count++;
- ms->wcol_ticks += get_tbl() - ms->wcol_tx_timestamp;
- ms->wcol_tx_timestamp = get_tbl();
+ ms->wcol_ticks += mftb() - ms->wcol_tx_timestamp;
+ ms->wcol_tx_timestamp = mftb();
data = 0;
if (ms->tx_buf)
data = *(ms->tx_buf - 1);
@@ -247,14 +247,14 @@ static int mpc52xx_spi_fsmstate_transfer(int irq, struct mpc52xx_spi *ms,
/* Is the transfer complete? */
ms->len--;
if (ms->len == 0) {
- ms->timestamp = get_tbl();
+ ms->timestamp = mftb();
ms->timestamp += ms->transfer->delay_usecs * tb_ticks_per_usec;
ms->state = mpc52xx_spi_fsmstate_wait;
return FSM_CONTINUE;
}
/* Write out the next byte */
- ms->wcol_tx_timestamp = get_tbl();
+ ms->wcol_tx_timestamp = mftb();
if (ms->tx_buf)
out_8(ms->regs + SPI_DATA, *ms->tx_buf++);
else
@@ -276,7 +276,7 @@ mpc52xx_spi_fsmstate_wait(int irq, struct mpc52xx_spi *ms, u8 status, u8 data)
dev_err(&ms->master->dev, "spurious irq, status=0x%.2x\n",
status);
- if (((int)get_tbl()) - ms->timestamp < 0)
+ if (((int)mftb()) - ms->timestamp < 0)
return FSM_POLL;
ms->message->actual_length += ms->transfer->len;
--
2.25.0
^ permalink raw reply related
* [PATCH 2/3] powerpc/time: Avoid using get_tbl()
From: Christophe Leroy @ 2021-02-09 10:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, broonie
Cc: linuxppc-dev, linux-kernel, linux-spi
In-Reply-To: <99bf008e2970de7f8ed3225cda69a6d06ae1a644.1612866360.git.christophe.leroy@csgroup.eu>
get_tbl() is confusing as it returns the content TBL register
on PPC32 but the concatenation of TBL and TBU on PPC64.
Use mftb() instead.
This will allow the removal of get_tbl() in a following patch.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c b/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
index 05e19470d523..b91ebebd9ff2 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
@@ -229,7 +229,7 @@ static irqreturn_t mpc52xx_lpbfifo_irq(int irq, void *dev_id)
int dma, write, poll_dma;
spin_lock_irqsave(&lpbfifo.lock, flags);
- ts = get_tbl();
+ ts = mftb();
req = lpbfifo.req;
if (!req) {
@@ -307,7 +307,7 @@ static irqreturn_t mpc52xx_lpbfifo_irq(int irq, void *dev_id)
if (irq != 0) /* don't increment on polled case */
req->irq_count++;
- req->irq_ticks += get_tbl() - ts;
+ req->irq_ticks += mftb() - ts;
spin_unlock_irqrestore(&lpbfifo.lock, flags);
/* Spinlock is released; it is now safe to call the callback */
@@ -330,7 +330,7 @@ static irqreturn_t mpc52xx_lpbfifo_bcom_irq(int irq, void *dev_id)
u32 ts;
spin_lock_irqsave(&lpbfifo.lock, flags);
- ts = get_tbl();
+ ts = mftb();
req = lpbfifo.req;
if (!req || (req->flags & MPC52XX_LPBFIFO_FLAG_NO_DMA)) {
@@ -361,7 +361,7 @@ static irqreturn_t mpc52xx_lpbfifo_bcom_irq(int irq, void *dev_id)
lpbfifo.req = NULL;
/* Release the lock before calling out to the callback. */
- req->irq_ticks += get_tbl() - ts;
+ req->irq_ticks += mftb() - ts;
spin_unlock_irqrestore(&lpbfifo.lock, flags);
if (req->callback)
--
2.25.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox