* Re: [PATCHv5 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Pingfan Liu @ 2020-08-27 6:36 UTC
To: linuxppc-dev
Cc: Nathan Lynch, Kexec Mailing List, Nathan Fontenot, Laurent Dufour,
Hari Bathini
In-Reply-To: <1597049570-19536-2-git-send-email-kernelfans@gmail.com>
Hello guys. Do you have further comments on this version?
Thanks,
Pingfan
On Mon, Aug 10, 2020 at 4:53 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> A bug is observed on pseries by taking the following steps on rhel:
> -1. drmgr -c mem -r -q 5
> -2. echo c > /proc/sysrq-trigger
>
> And then, the failure looks like:
> kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
> kdump: saving vmcore-dmesg.txt
> kdump: saving vmcore-dmesg.txt complete
> kdump: saving vmcore
> Checking for memory holes : [ 0.0 %] /
> Checking for memory holes : [100.0 %] |
> Excluding unnecessary pages : [100.0 %] \
> Copying data : [ 0.3 %] - eta: 38s
> [ 44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [ 44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [ 44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [ 44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [ 44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
> [ 44.338548] Core dump to |/bin/false pipe failed
> /lib/kdump-lib-initramfs.sh: line 98: 469 Bus error $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
> kdump: saving vmcore failed
>
> * Root cause *
> Analysis shows that, in the current implementation, the KOBJ_REMOVE
> event for a hot-removed LMB is emitted before the device tree (dt) is
> updated, because __remove_memory() is called before drmem_update_dt().
> As a result, in the kdump kernel, read_from_oldmem() resorts to
> pSeries_lpar_hpte_insert() to install an HPTE, which fails with -2
> because the pfn does not exist. Finally, low_hash_fault() raises SIGBUS
> to the process, which is observed as "Bus error".
>
> From a publisher/listener viewpoint, the publisher notifies the
> listener before the data is ready. udev launches kexec-tools (in
> response to KOBJ_REMOVE) and loads a stale dt before it has been
> updated. In the capture kernel, makedumpfile then accesses memory based
> on the stale dt info and hits SIGBUS because the LMB no longer exists.
>
> * Fix *
> This bug is introduced by commit 063b8b1251fd
> ("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
> request"), which tried to combine all the dt updating into one.
>
> The fix must not reintroduce the quadratic runtime complexity of the
> following model:
>   dlpar_memory_add_by_count
>     for_each_drmem_lmb             <--
>       dlpar_add_lmb
>         drmem_update_dt(_v1|_v2)
>           for_each_drmem_lmb       <--
> So the dt should still be updated only once, just before the last memory
> online/offline event is emitted to user space. Achieve this by tracking
> the number of LMBs added or removed.
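
To make the counting idea concrete, here is a minimal userspace sketch
in C. It is not the kernel code: struct dt_model and
remove_lmbs_batched() are hypothetical names invented for illustration,
and the "device tree" is reduced to a single counter. The only thing
taken from the patch is the "remaining <= 1" check that triggers the
single dt rewrite just before the last event.

```c
#include <assert.h>

/* Hypothetical model state; none of these names come from the patch. */
struct dt_model {
	int dt_updates;   /* how many times the "device tree" was rewritten */
	int lmbs_in_dt;   /* LMB count recorded by the most recent rewrite */
	int stale_events; /* events emitted while the tree was out of date */
};

/*
 * Remove `count` LMBs out of `total`. The tree is rewritten exactly once,
 * right before the event for the last LMB of the request is emitted.
 * Returns the number of LMBs removed.
 */
static int remove_lmbs_batched(struct dt_model *m, int total, int count)
{
	int removed = 0;
	int present = total;

	assert(count >= 0 && count <= total);

	m->dt_updates = 0;
	m->lmbs_in_dt = total;
	m->stale_events = 0;

	while (removed < count) {
		present--;                       /* LMB marked as unassigned */

		if (count - removed <= 1) {      /* last LMB of this request */
			m->lmbs_in_dt = present; /* the single O(n) rewrite */
			m->dt_updates++;
		}

		/* The remove event for this LMB would be emitted here. */
		if (m->lmbs_in_dt != present)
			m->stale_events++;

		removed++;
	}

	return removed;
}
```

Removing 5 of 10 LMBs in this model rewrites the tree once. The four
earlier events still see the old tree, but the final event, which is the
last chance for a listener to reload, sees the updated one.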
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> To: linuxppc-dev@lists.ozlabs.org
> Cc: kexec@lists.infradead.org
> ---
> v4 -> v5: change dlpar_add_lmb()/dlpar_remove_lmb() prototype to report
> whether dt is updated successfully.
> Fix a condition boundary check bug
> v3 -> v4: resolve a quadratic runtime complexity issue.
> This series is applied on next-test branch
> arch/powerpc/platforms/pseries/hotplug-memory.c | 102 +++++++++++++++++++-----
> 1 file changed, 80 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 46cbcd1..1567d9f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -350,13 +350,22 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
> return true;
> }
>
> -static int dlpar_add_lmb(struct drmem_lmb *);
> +enum dt_update_status {
> + DT_NOUPDATE,
> + DT_TOUPDATE,
> + DT_UPDATED,
> +};
> +
> +/* "*dt_update" returns DT_UPDATED if updated */
> +static int dlpar_add_lmb(struct drmem_lmb *lmb,
> + enum dt_update_status *dt_update);
>
> -static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> +static int dlpar_remove_lmb(struct drmem_lmb *lmb,
> + enum dt_update_status *dt_update)
> {
> unsigned long block_sz;
> phys_addr_t base_addr;
> - int rc, nid;
> + int rc, ret, nid;
>
> if (!lmb_is_removable(lmb))
> return -EINVAL;
> @@ -372,6 +381,13 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> + if (*dt_update) {
> + ret = drmem_update_dt();
> + if (ret)
> + pr_warn("%s fail to update dt, but continue\n", __func__);
> + else
> + *dt_update = DT_UPDATED;
> + }
>
> __remove_memory(nid, base_addr, block_sz);
>
> @@ -387,6 +403,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
> int lmbs_removed = 0;
> int lmbs_available = 0;
> int rc;
> + enum dt_update_status dt_update = DT_NOUPDATE;
>
> pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
>
> @@ -409,7 +426,11 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
> }
>
> for_each_drmem_lmb(lmb) {
> - rc = dlpar_remove_lmb(lmb);
> +
> + /* combine dt updating for all LMBs */
> + if (lmbs_to_remove - lmbs_removed <= 1)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_remove_lmb(lmb, &dt_update);
> if (rc)
> continue;
>
> @@ -424,13 +445,17 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
> }
>
> if (lmbs_removed != lmbs_to_remove) {
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
> +
> pr_err("Memory hot-remove failed, adding LMB's back\n");
>
> for_each_drmem_lmb(lmb) {
> if (!drmem_lmb_reserved(lmb))
> continue;
>
> - rc = dlpar_add_lmb(lmb);
> + if (--lmbs_removed == 0 && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to add LMB back, drc index %x\n",
> lmb->drc_index);
> @@ -458,6 +483,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
>
> static int dlpar_memory_remove_by_index(u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_TOUPDATE;
> struct drmem_lmb *lmb;
> int lmb_found;
> int rc;
> @@ -468,7 +494,7 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
> for_each_drmem_lmb(lmb) {
> if (lmb->drc_index == drc_index) {
> lmb_found = 1;
> - rc = dlpar_remove_lmb(lmb);
> + rc = dlpar_remove_lmb(lmb, &dt_update);
> if (!rc)
> dlpar_release_drc(lmb->drc_index);
>
> @@ -490,6 +516,7 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
>
> static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_NOUPDATE;
> struct drmem_lmb *lmb, *start_lmb, *end_lmb;
> int lmbs_available = 0;
> int rc;
> @@ -519,7 +546,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
> continue;
>
> - rc = dlpar_remove_lmb(lmb);
> + if (lmb == end_lmb)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_remove_lmb(lmb, &dt_update);
> if (rc)
> break;
>
> @@ -527,14 +556,16 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> }
>
> if (rc) {
> - pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
>
> + pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");
>
> for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> if (!drmem_lmb_reserved(lmb))
> continue;
> -
> - rc = dlpar_add_lmb(lmb);
> + if (lmb == end_lmb && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to add LMB, drc index %x\n",
> lmb->drc_index);
> @@ -572,7 +603,7 @@ static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
> {
> return -EOPNOTSUPP;
> }
> -static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> +static int dlpar_remove_lmb(struct drmem_lmb *lmb, bool dt_update)
> {
> return -EOPNOTSUPP;
> }
> @@ -591,10 +622,11 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
> -static int dlpar_add_lmb(struct drmem_lmb *lmb)
> +static int dlpar_add_lmb(struct drmem_lmb *lmb,
> + enum dt_update_status *dt_update)
> {
> unsigned long block_sz;
> - int rc;
> + int rc, ret;
>
> if (lmb->flags & DRCONF_MEM_ASSIGNED)
> return -EINVAL;
> @@ -607,6 +639,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
> lmb_set_nid(lmb);
> lmb->flags |= DRCONF_MEM_ASSIGNED;
> + if (*dt_update) {
> + ret = drmem_update_dt();
> + if (ret)
> + pr_warn("%s fail to update dt, but continue\n", __func__);
> + else
> + *dt_update = DT_UPDATED;
> + }
>
> block_sz = memory_block_size_bytes();
>
> @@ -616,6 +655,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> + if (*dt_update == DT_UPDATED)
> + drmem_update_dt();
> return rc;
> }
>
> @@ -627,7 +668,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> -
> + if (*dt_update == DT_UPDATED) {
> + ret = drmem_update_dt();
> + if (ret)
> + pr_warn("%s fail to update dt during rollback, but continue\n", __func__);
> + }
> __remove_memory(nid, base_addr, block_sz);
> }
>
> @@ -636,6 +681,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
> static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> {
> + enum dt_update_status dt_update = DT_NOUPDATE;
> struct drmem_lmb *lmb;
> int lmbs_available = 0;
> int lmbs_added = 0;
> @@ -666,7 +712,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> if (rc)
> continue;
>
> - rc = dlpar_add_lmb(lmb);
> + if (lmbs_to_add - lmbs_added <= 1)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &dt_update);
> if (rc) {
> dlpar_release_drc(lmb->drc_index);
> continue;
> @@ -683,13 +731,18 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> }
>
> if (lmbs_added != lmbs_to_add) {
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
> +
> pr_err("Memory hot-add failed, removing any added LMBs\n");
>
> for_each_drmem_lmb(lmb) {
> if (!drmem_lmb_reserved(lmb))
> continue;
>
> - rc = dlpar_remove_lmb(lmb);
> + if (--lmbs_added == 0 && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> +
> + rc = dlpar_remove_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to remove LMB, drc index %x\n",
> lmb->drc_index);
> @@ -716,6 +769,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>
> static int dlpar_memory_add_by_index(u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_TOUPDATE;
> struct drmem_lmb *lmb;
> int rc, lmb_found;
>
> @@ -727,7 +781,7 @@ static int dlpar_memory_add_by_index(u32 drc_index)
> lmb_found = 1;
> rc = dlpar_acquire_drc(lmb->drc_index);
> if (!rc) {
> - rc = dlpar_add_lmb(lmb);
> + rc = dlpar_add_lmb(lmb, &dt_update);
> if (rc)
> dlpar_release_drc(lmb->drc_index);
> }
> @@ -750,6 +804,7 @@ static int dlpar_memory_add_by_index(u32 drc_index)
>
> static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_NOUPDATE;
> struct drmem_lmb *lmb, *start_lmb, *end_lmb;
> int lmbs_available = 0;
> int rc;
> @@ -783,7 +838,9 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
> if (rc)
> break;
>
> - rc = dlpar_add_lmb(lmb);
> + if (lmb == end_lmb)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &dt_update);
> if (rc) {
> dlpar_release_drc(lmb->drc_index);
> break;
> @@ -796,10 +853,14 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
> pr_err("Memory indexed-count-add failed, removing any added LMBs\n");
>
> for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
> +
> if (!drmem_lmb_reserved(lmb))
> continue;
>
> - rc = dlpar_remove_lmb(lmb);
> + if (lmb == end_lmb && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> + rc = dlpar_remove_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to remove LMB, drc index %x\n",
> lmb->drc_index);
> @@ -879,9 +940,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> break;
> }
>
> - if (!rc)
> - rc = drmem_update_dt();
> -
> unlock_device_hotplug();
> return rc;
> }
> --
> 2.7.5
>
* Re: [PATCH v1 4/9] powerpc/vdso: Remove unnecessary ifdefs in vdso_pagelist initialization
From: Christophe Leroy @ 2020-08-27 6:47 UTC
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <87ft89h2st.fsf@mpe.ellerman.id.au>
On 08/26/2020 02:58 PM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
>> index daef14a284a3..bbb69832fd46 100644
>> --- a/arch/powerpc/kernel/vdso.c
>> +++ b/arch/powerpc/kernel/vdso.c
>> @@ -718,16 +710,14 @@ static int __init vdso_init(void)
> ...
>>
>> -
>> -#ifdef CONFIG_VDSO32
>> vdso32_kbase = &vdso32_start;
>>
>> /*
>> @@ -735,8 +725,6 @@ static int __init vdso_init(void)
>> */
>> vdso32_pages = (&vdso32_end - &vdso32_start) >> PAGE_SHIFT;
>> DBG("vdso32_kbase: %p, 0x%x pages\n", vdso32_kbase, vdso32_pages);
>> -#endif
>
> This didn't build for ppc64le:
>
> /opt/cross/gcc-8.20_binutils-2.32/powerpc64-unknown-linux-gnu/bin/powerpc64-unknown-linux-gnu-ld: arch/powerpc/kernel/vdso.o:(.toc+0x0): undefined reference to `vdso32_end'
> /opt/cross/gcc-8.20_binutils-2.32/powerpc64-unknown-linux-gnu/bin/powerpc64-unknown-linux-gnu-ld: arch/powerpc/kernel/vdso.o:(.toc+0x8): undefined reference to `vdso32_start'
> make[1]: *** [/scratch/michael/build/maint/Makefile:1166: vmlinux] Error 1
> make: *** [Makefile:185: __sub-make] Error 2
>
> So I just put that ifdef back.
>
The problem is that is_32bit_task() can still return true even when
CONFIG_VDSO32 is not set.
The change below fixes the problem:
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index bbb69832fd46..38abff60cbe2 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -132,11 +132,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
if (!vdso_ready)
return 0;
- if (is_32bit_task()) {
- vdso_pagelist = vdso32_pagelist;
- vdso_pages = vdso32_pages;
- vdso_base = VDSO32_MBASE;
- } else {
+ if (!is_32bit_task()) {
vdso_pagelist = vdso64_pagelist;
vdso_pages = vdso64_pages;
/*
@@ -145,6 +141,12 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
* and most likely share a SLB entry.
*/
vdso_base = 0;
+ } else if (IS_ENABLED(CONFIG_VDSO32)) {
+ vdso_pagelist = vdso32_pagelist;
+ vdso_pages = vdso32_pages;
+ vdso_base = VDSO32_MBASE;
+ } else {
+ vdso_pages = 0;
}
current->mm->context.vdso_base = 0;
With this change all vdso32 static objects (functions and vars) go away
as expected.
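
For readers unfamiliar with the pattern, the branch structure of the
change above can be sketched in plain userspace C. This is only an
illustration: MODEL_VDSO32_ENABLED stands in for
IS_ENABLED(CONFIG_VDSO32), and the struct, function name, page counts
and base address are all made up. Because the middle condition is a
compile-time constant, an optimising compiler can drop that branch and
everything only it references when the option is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for IS_ENABLED(CONFIG_VDSO32); build with -DMODEL_VDSO32_ENABLED=1
 * to model a kernel that has the 32-bit vDSO. */
#ifndef MODEL_VDSO32_ENABLED
#define MODEL_VDSO32_ENABLED 0
#endif

/* Hypothetical result type; the values filled in below are illustrative. */
struct model_vdso_choice {
	int pages;          /* 0 means "map no vDSO for this task" */
	unsigned long base; /* preferred mapping base, 0 means "anywhere" */
};

static void model_select_vdso(bool task_is_32bit, struct model_vdso_choice *out)
{
	assert(out != NULL);

	if (!task_is_32bit) {
		/* 64-bit task: always use the 64-bit vDSO. */
		out->pages = 2;
		out->base = 0;
	} else if (MODEL_VDSO32_ENABLED) {
		/* 32-bit task and the 32-bit vDSO was built in. */
		out->pages = 1;
		out->base = 0x100000UL;
	} else {
		/* 32-bit task but no 32-bit vDSO: map nothing. */
		out->pages = 0;
		out->base = 0;
	}
}
```

With the default MODEL_VDSO32_ENABLED of 0, a 32-bit task ends up with
pages == 0, which mirrors the new "vdso_pages = 0" fallback.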
We get a simple conflict with the following patch.
Would you prefer an updated series or a follow-up patch, or will you
take the above change yourself?
Christophe
* [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc, x86_64)
From: Christopher M. Riedl @ 2020-08-27 5:26 UTC
To: linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <20200827052659.24922-1-cmr@codefail.de>
When live patching with STRICT_KERNEL_RWX, the CPU doing the patching
must temporarily remap the page(s) containing the patch site with +W
permissions. While this temporary mapping is in use another CPU could
write to the same mapping and maliciously alter kernel text. Implement an
LKDTM test to attempt to exploit such an opening from another (i.e. not
the patching) CPU. The test is implemented on x86_64 and powerpc only.
The LKDTM "hijack" test works as follows:
1. A CPU executes an infinite loop to patch an instruction.
This is the "patching" CPU.
2. Another CPU attempts to write to the address of the temporary
mapping used by the "patching" CPU. This other CPU is the
"hijacker" CPU. The hijack either fails with a segfault or
succeeds, in which case some kernel text is now overwritten.
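
For step 2 the hijacker needs the address inside the temporary mapping
that aliases the patch site. The patch below derives it by OR-ing the
in-page offset of the patch site into the page-aligned base of the
patching CPU's mapping. A minimal userspace sketch of that arithmetic
follows; the function names and the 4K page size are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch; powerpc kernels may also use 64K pages. */
#define MODEL_PAGE_SIZE 4096UL

/* Byte offset of `addr` within its page, in the spirit of offset_in_page(). */
static uintptr_t model_offset_in_page(uintptr_t addr)
{
	return addr & (MODEL_PAGE_SIZE - 1);
}

/*
 * Address inside a page-aligned temporary mapping that refers to the same
 * bytes as `patch_site` does in the normal kernel text mapping.
 */
static uintptr_t model_alias_addr(uintptr_t mapping_base, uintptr_t patch_site)
{
	/* OR is only equivalent to addition when the base is page aligned. */
	assert(model_offset_in_page(mapping_base) == 0);

	return mapping_base | model_offset_in_page(patch_site);
}
```

For example, a mapping at 0x7000 and a patch site at 0xc0001234 give the
alias 0x7234 in this model.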
How to run the test:
mount -t debugfs none /sys/kernel/debug
(echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT)
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
drivers/misc/lkdtm/core.c | 1 +
drivers/misc/lkdtm/lkdtm.h | 1 +
drivers/misc/lkdtm/perms.c | 146 +++++++++++++++++++++++++++++++++++++
3 files changed, 148 insertions(+)
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..482e72f6a1e1 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
CRASHTYPE(WRITE_KERN),
+ CRASHTYPE(HIJACK_PATCH),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 8878538b2c13..8bd98e8f0443 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -60,6 +60,7 @@ void lkdtm_EXEC_USERSPACE(void);
void lkdtm_EXEC_NULL(void);
void lkdtm_ACCESS_USERSPACE(void);
void lkdtm_ACCESS_NULL(void);
+void lkdtm_HIJACK_PATCH(void);
/* lkdtm_refcount.c */
void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 2dede2ef658f..0ed32aba5216 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
#include <linux/vmalloc.h>
#include <linux/mman.h>
#include <linux/uaccess.h>
+#include <linux/kthread.h>
#include <asm/cacheflush.h>
/* Whether or not to fill the target memory area with do_nothing(). */
@@ -222,6 +223,151 @@ void lkdtm_ACCESS_NULL(void)
pr_err("FAIL: survived bad write\n");
}
+#if defined(CONFIG_PPC) || defined(CONFIG_X86_64)
+#if defined(CONFIG_STRICT_KERNEL_RWX) && defined(CONFIG_SMP)
+/*
+ * This is just a dummy location to patch-over.
+ */
+static void patching_target(void)
+{
+ return;
+}
+
+#ifdef CONFIG_PPC
+#include <asm/code-patching.h>
+struct ppc_inst * const patch_site = (struct ppc_inst *)&patching_target;
+#endif
+
+#ifdef CONFIG_X86_64
+#include <asm/text-patching.h>
+int * const patch_site = (int *)&patching_target;
+#endif
+
+static inline int lkdtm_do_patch(int data)
+{
+#ifdef CONFIG_PPC
+ return patch_instruction(patch_site, ppc_inst(data));
+#endif
+#ifdef CONFIG_X86_64
+ text_poke(patch_site, &data, sizeof(int));
+ return 0;
+#endif
+}
+
+static inline bool lkdtm_verify_patch(int data)
+{
+#ifdef CONFIG_PPC
+ return ppc_inst_equal(ppc_inst_read(READ_ONCE(patch_site)),
+ ppc_inst(data));
+#endif
+#ifdef CONFIG_X86_64
+ return READ_ONCE(*patch_site) == data;
+#endif
+}
+
+static int lkdtm_patching_cpu(void *data)
+{
+ int err = 0;
+ int val = 0xdeadbeef;
+
+ pr_info("starting patching_cpu=%d\n", smp_processor_id());
+ do {
+ err = lkdtm_do_patch(val);
+ } while (lkdtm_verify_patch(val) && !err && !kthread_should_stop());
+
+ if (err)
+ pr_warn("patch_instruction returned error: %d\n", err);
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ while (!kthread_should_stop()) {
+ schedule();
+ set_current_state(TASK_INTERRUPTIBLE);
+ }
+
+ return err;
+}
+
+void lkdtm_HIJACK_PATCH(void)
+{
+#ifdef CONFIG_PPC
+ struct ppc_inst original_insn = ppc_inst_read(READ_ONCE(patch_site));
+#endif
+#ifdef CONFIG_X86_64
+ int original_insn = READ_ONCE(*patch_site);
+#endif
+ struct task_struct *patching_kthrd;
+ int patching_cpu, hijacker_cpu, attempts;
+ unsigned long addr;
+ bool hijacked;
+ const int bad_data = 0xbad00bad;
+
+ if (num_online_cpus() < 2) {
+ pr_warn("need at least two cpus\n");
+ return;
+ }
+
+ hijacker_cpu = smp_processor_id();
+ patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu);
+
+ patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL,
+ cpu_to_node(patching_cpu),
+ "lkdtm_patching_cpu");
+ kthread_bind(patching_kthrd, patching_cpu);
+ wake_up_process(patching_kthrd);
+
+ addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu);
+
+ pr_info("starting hijacker_cpu=%d\n", hijacker_cpu);
+ for (attempts = 0; attempts < 100000; ++attempts) {
+ /* Use __put_user to catch faults without an Oops */
+ hijacked = !__put_user(bad_data, (int *)addr);
+
+ if (hijacked) {
+ if (kthread_stop(patching_kthrd))
+ pr_err("error trying to stop patching thread\n");
+ break;
+ }
+ }
+ pr_info("hijack attempts: %d\n", attempts);
+
+ if (hijacked) {
+ if (lkdtm_verify_patch(bad_data))
+ pr_err("overwrote kernel text\n");
+ /*
+ * There are window conditions where the hijacker cpu manages to
+ * write to the patch site but the site gets overwritten again by
+ * the patching cpu. We still consider that a "successful" hijack
+ * since the hijacker cpu did not fault on the write.
+ */
+ pr_err("FAIL: wrote to another cpu's patching area\n");
+ } else {
+ kthread_stop(patching_kthrd);
+ }
+
+ /* Restore the original insn for any future lkdtm tests */
+#ifdef CONFIG_PPC
+ patch_instruction(patch_site, original_insn);
+#endif
+#ifdef CONFIG_X86_64
+ lkdtm_do_patch(original_insn);
+#endif
+}
+
+#else
+
+void lkdtm_HIJACK_PATCH(void)
+{
+ if (!IS_ENABLED(CONFIG_PPC) && !IS_ENABLED(CONFIG_X86_64))
+ pr_err("XFAIL: this test only runs on x86_64 or powerpc\n");
+ if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+ pr_err("XFAIL: this test requires CONFIG_STRICT_KERNEL_RWX\n");
+ if (!IS_ENABLED(CONFIG_SMP))
+ pr_err("XFAIL: this test requires CONFIG_SMP\n");
+}
+
+#endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_SMP */
+#endif /* CONFIG_PPC || CONFIG_X86_64 */
+
void __init lkdtm_perms_init(void)
{
/* Make sure we can write to __ro_after_init values during __init */
--
2.28.0
* [PATCH v3 4/6] powerpc: Introduce temporary mm
From: Christopher M. Riedl @ 2020-08-27 5:26 UTC
To: linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <20200827052659.24922-1-cmr@codefail.de>
x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.
Mappings in the temporary mm can be set in the userspace portion of the
address-space.
Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.
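
The save/disable/restore handling of HW breakpoints described above can
be sketched in userspace C as follows. This is a simplified model, not
the kernel code: model_hw stands in for the per-CPU breakpoint state,
and the slot count, struct layout and function names are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_SLOTS 2 /* hypothetical number of watchpoint slots */

struct model_brk {
	unsigned long address;
	int type; /* 0 means the slot is unused */
};

/* Stand-in for the per-CPU hardware breakpoint state. */
static struct model_brk model_hw[MODEL_NR_SLOTS];

/* Save every slot into `saved`, then clear the slots that were armed. */
static void model_save_and_disable(struct model_brk saved[MODEL_NR_SLOTS])
{
	const struct model_brk null_brk = {0};

	assert(saved != NULL);

	for (int i = 0; i < MODEL_NR_SLOTS; i++) {
		saved[i] = model_hw[i];
		if (saved[i].type != 0)
			model_hw[i] = null_brk;
	}
}

/* Re-arm only the slots that were armed at save time. */
static void model_restore(const struct model_brk saved[MODEL_NR_SLOTS])
{
	assert(saved != NULL);

	for (int i = 0; i < MODEL_NR_SLOTS; i++) {
		if (saved[i].type != 0)
			model_hw[i] = saved[i];
	}
}
```

Between the two calls no slot is armed, so a watchpoint set by userspace
cannot fire on an address that now belongs to the temporary mm.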
Based on x86 implementation:
commit cefa929c034e
("x86/mm: Introduce temporary mm structs")
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/include/asm/debug.h | 1 +
arch/powerpc/kernel/process.c | 5 +++
arch/powerpc/lib/code-patching.c | 65 ++++++++++++++++++++++++++++++++
3 files changed, 71 insertions(+)
diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index ec57daf87f40..827350c9bcf3 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
#endif
void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
bool ppc_breakpoint_available(void);
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
extern void do_send_trap(struct pt_regs *regs, unsigned long address,
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..0758a8db6342 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -843,6 +843,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
return 0;
}
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
+{
+ memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
+}
+
void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
{
memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 85d3fdca9452..89b37ece6d2f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,6 +17,7 @@
#include <asm/code-patching.h>
#include <asm/setup.h>
#include <asm/inst.h>
+#include <asm/mmu_context.h>
static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
struct ppc_inst *patch_addr)
@@ -44,6 +45,70 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
}
#ifdef CONFIG_STRICT_KERNEL_RWX
+
+struct temp_mm {
+ struct mm_struct *temp;
+ struct mm_struct *prev;
+ bool is_kernel_thread;
+ struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+ temp_mm->temp = mm;
+ temp_mm->prev = NULL;
+ temp_mm->is_kernel_thread = false;
+ memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+ lockdep_assert_irqs_disabled();
+
+ temp_mm->is_kernel_thread = current->mm == NULL;
+ if (temp_mm->is_kernel_thread)
+ temp_mm->prev = current->active_mm;
+ else
+ temp_mm->prev = current->mm;
+
+ /*
+ * Hash requires a non-NULL current->mm to allocate a userspace address
+ * when handling a page fault. Does not appear to hurt in Radix either.
+ */
+ current->mm = temp_mm->temp;
+ switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+ if (ppc_breakpoint_available()) {
+ struct arch_hw_breakpoint null_brk = {0};
+ int i = 0;
+
+ for (; i < nr_wp_slots(); ++i) {
+ __get_breakpoint(i, &temp_mm->brk[i]);
+ if (temp_mm->brk[i].type != 0)
+ __set_breakpoint(i, &null_brk);
+ }
+ }
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+ lockdep_assert_irqs_disabled();
+
+ if (temp_mm->is_kernel_thread)
+ current->mm = NULL;
+ else
+ current->mm = temp_mm->prev;
+ switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+ if (ppc_breakpoint_available()) {
+ int i = 0;
+
+ for (; i < nr_wp_slots(); ++i)
+ if (temp_mm->brk[i].type != 0)
+ __set_breakpoint(i, &temp_mm->brk[i]);
+ }
+}
+
static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
#ifdef CONFIG_LKDTM
--
2.28.0
* Re: fsl_espi errors on v5.7.15
From: Nicholas Piggin @ 2020-08-27 7:12 UTC
To: benh@kernel.crashing.org, broonie@kernel.org, Chris Packham,
Heiner Kallweit, mpe@ellerman.id.au, paulus@samba.org
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
linux-spi@vger.kernel.org
In-Reply-To: <1020029e-4cb9-62ba-c6d6-e6b9bdf93aac@gmail.com>
Excerpts from Heiner Kallweit's message of August 26, 2020 4:38 pm:
> On 26.08.2020 08:07, Chris Packham wrote:
>>
>> On 26/08/20 1:48 pm, Chris Packham wrote:
>>>
>>> On 26/08/20 10:22 am, Chris Packham wrote:
>>>> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>>>>
>>>> <snip>
>>>>>> I've been staring at spi-fsl-espi.c for a while now and I think I've
>>>>>> identified a couple of deficiencies that may or may not be related
>>>>>> to my issue.
>>>>>>
>>>>>> First I think the 'Transfer done but SPIE_DON isn't set' message
>>>>>> can be generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE
>>>>>> register. We also write back to it to clear the current events. We
>>>>>> re-read it in fsl_espi_cpu_irq() and complain when SPIE_DON is not
>>>>>> set. But we can naturally end up in that situation if we're doing a
>>>>>> large read. Consider the messages for reading a block of data from a
>>>>>> spi-nor chip
>>>>>>
>>>>>> tx = READ_OP + ADDR
>>>>>> rx = data
>>>>>>
>>>>>> We setup the transfer and pump out the tx_buf. The first interrupt
>>>>>> goes off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx
>>>>>> fifo, clear ESPI_SPIE and wait for the next interrupt. The next
>>>>>> interrupt fires and this time we have ESPI_SPIE with just SPIM_RXT
>>>>>> set. This continues until we've received all the data and we finish
>>>>>> with ESPI_SPIE having only SPIM_RXT set. When we re-read it we
>>>>>> complain that SPIE_DON isn't set.
>>>>>>
>>>>>> The other deficiency is that we only get an interrupt when the
>>>>>> amount of data in the rx fifo is above FSL_ESPI_RXTHR. If there are
>>>>>> fewer than FSL_ESPI_RXTHR left to be received we will never pull them
>>>>>> out of the fifo.
>>>>>>
>>>>>>
>>>>> SPIM_DON will trigger an interrupt once the last characters have been
>>>>> transferred, and read the remaining characters from the FIFO.
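
The two positions in this exchange can be compared with a toy model in
userspace C. It is not derived from spi-fsl-espi.c: the threshold value,
the struct and the function name are all hypothetical, and the model
fires the threshold interrupt when the FIFO level reaches the threshold.
It only shows that a threshold interrupt alone leaves a tail of bytes in
the FIFO, while an additional "done" interrupt drains them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold; stands in for FSL_ESPI_RXTHR in the discussion. */
#define MODEL_RX_THRESHOLD 4

struct model_rx {
	int fifo;    /* bytes currently sitting in the RX FIFO */
	int drained; /* bytes the interrupt handler has copied out */
	int irqs;    /* number of interrupts taken */
};

/*
 * Receive `total` bytes one at a time. An interrupt fires whenever the
 * FIFO level reaches the threshold. If `done_irq` is non-zero, one more
 * interrupt fires after the last byte and drains whatever is left.
 */
static void model_receive(struct model_rx *rx, int total, int done_irq)
{
	assert(rx != NULL && total >= 0);

	rx->fifo = 0;
	rx->drained = 0;
	rx->irqs = 0;

	for (int i = 0; i < total; i++) {
		rx->fifo++;
		if (rx->fifo >= MODEL_RX_THRESHOLD) {
			rx->irqs++;
			rx->drained += rx->fifo;
			rx->fifo = 0;
		}
	}

	if (done_irq) {
		rx->irqs++;
		rx->drained += rx->fifo;
		rx->fifo = 0;
	}
}
```

With 10 bytes and no "done" interrupt, the model drains 8 bytes and
leaves 2 behind; with the "done" interrupt all 10 are drained.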
>>>>
>>>> The T2080RM that I have says the following about the DON bit
>>>>
>>>> "Last character was transmitted. The last character was transmitted
>>>> and a new command can be written for the next frame."
>>>>
>>>> That does at least seem to fit with my assertion that it's all about
>>>> the TX direction. But the fact that it doesn't happen all the time
>>>> throws some doubt on it.
>>>>
>>>>>> I think the reason I'm seeing some variability is because of how fast
>>>>>> (or slow) the interrupts get processed and how fast the spi-nor
>>>>>> chip can fill the CPU's rx fifo.
>>>>>>
>>>>> To rule out timing issues at high bus frequencies I initially asked
>>>>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>>>>> or even less, then timing shouldn't be an issue.
>>>> Yes I've currently got spi-max-frequency = <1000000>; in my dts. I
>>>> would also expect a slower frequency would fit my "DON is for TX"
>>>> narrative.
>>>>> Last relevant functional changes have been done almost 4 years ago.
>>>>> And yours is the first such report I see. So question is what could
>>>>> be so special with your setup that it seems you're the only one being
>>>>> affected. The scenarios you describe are standard, therefore much more
>>>>> people should be affected in case of a driver bug.
>>>> Agreed. But even on my hardware (which may have a latent issue
>>>> despite being in the field for going on 5 years) the issue only
>>>> triggers under some fairly specific circumstances.
>>>>> You said that kernel config impacts how frequently the issue happens.
>>>>> Therefore question is what's the diff in kernel config, and how could
>>>>> the differences be related to SPI.
>>>>
>>>> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
>>>> impact but every time I found something that seemed to be having an
>>>> impact I've been able to disprove it. I actually think its about how
>>>> busy the system is which may or may not affect when we get round to
>>>> processing the interrupts.
>>>>
>>>> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to
>>>> occur on the T2080RDB.
>>>>
>>>> I've had to add the following to expose the environment as a mtd
>>>> partition
>>>>
>>>> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>>> index ff87e67c70da..fbf95fc1fd68 100644
>>>> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>>> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>>> @@ -116,6 +116,15 @@ flash@0 {
>>>> compatible = "micron,n25q512ax3", "jedec,spi-nor";
>>>> reg = <0>;
>>>> spi-max-frequency = <10000000>; /* input clock */
>>>> +
>>>> + partition@u-boot {
>>>> + reg = <0x00000000 0x00100000>;
>>>> + label = "u-boot";
>>>> + };
>>>> + partition@u-boot-env {
>>>> + reg = <0x00100000 0x00010000>;
>>>> + label = "u-boot-env";
>>>> + };
>>>> };
>>>> };
>>>>
>>>> And I'm using the following script to poke at the environment
>>>> (warning if anyone does try this and the bug hits it can render your
>>>> u-boot environment invalid).
>>>>
>>>> cat flash/fw_env_test.sh
>>>> #!/bin/sh
>>>>
>>>> generate_fw_env_config()
>>>> {
>>>> cat /proc/mtd | sed 's/[:"]//g' | while read dev size erasesize name ; do
>>>> echo "$dev $size $erasesize $name"
>>>> [ "$name" = "u-boot-env" ] && echo "/dev/$dev 0x0000 0x2000 $erasesize" >/flash/fw_env.config
>>>> done
>>>> }
>>>>
>>>> cycles=10
>>>> [ $# -ge 1 ] && cycles=$1
>>>>
>>>> generate_fw_env_config
>>>>
>>>> fw_printenv -c /flash/fw_env.config
>>>>
>>>> dmesg -c >/dev/null
>>>> x=0
>>>> while [ $x -lt $cycles ]; do
>>>> fw_printenv -c /flash/fw_env.config >/dev/null || break
>>>> fw_setenv -c /flash/fw_env.config foo $RANDOM || break;
>>>> dmesg -c | grep -q fsl_espi && break;
>>>> let x=x+1
>>>> done
>>>>
>>>> echo "Ran $x cycles"
>>>
>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>
>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>
>>> With my current workaround of emptying the RX FIFO. It seems
>>> survivable. Interestingly it only ever seems to be 1 extra byte in the
>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>
>>> fsl_espi ffe110000.spi: tx 70
>>> fsl_espi ffe110000.spi: rx 03
>>> fsl_espi ffe110000.spi: Extra RX 00
>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe110000.spi: tx 05
>>> fsl_espi ffe110000.spi: rx 00
>>> fsl_espi ffe110000.spi: Extra RX 03
>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe110000.spi: tx 05
>>> fsl_espi ffe110000.spi: rx 00
>>> fsl_espi ffe110000.spi: Extra RX 03
>>>
>>> From all the Micron SPI-NOR datasheets I've got access to it is
>>> possible to continually read the SR/FSR. But I've no idea why it
>>> happens some times and not others.
>>
>> So I think I've got a reproduction and I think I've bisected the problem
>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>> C"). My day is just finishing now so I haven't applied too much scrutiny
>> to this result. Given the various rabbit holes I've been down on this
>> issue already I'd take this information with a good degree of skepticism.
>>
> OK, so an easy test should be to re-test with a 5.4 kernel.
> It doesn't have yet the change you're referring to, and the fsl-espi driver
> is basically the same as in 5.7 (just two small changes in 5.7).
There's 6cc0c16d82f88 and maybe also other interrupt related patches
around this time that could affect book E, so it's good if that exact
patch is confirmed.
I've been staring at 3282a3da25bd for a while and nothing immediately
stands out. It doesn't look like the low level handlers do anything
special (well 0x900 does ack the decrementer, but so does the masked
handler).
Can you try this patch and also enable CONFIG_PPC_IRQ_SOFT_MASK_DEBUG?
Thanks,
Nick
---
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bf21ebd36190..10d339042330 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -214,7 +214,7 @@ void replay_soft_interrupts(void)
struct pt_regs regs;
ppc_save_regs(&regs);
- regs.softe = IRQS_ALL_DISABLED;
+ regs.softe = IRQS_ENABLED;
again:
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
@@ -349,6 +349,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
WARN_ON_ONCE(!(mfmsr() & MSR_EE));
__hard_irq_disable();
+ local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
} else {
/*
* We should already be hard disabled here. We had bugs
@@ -368,6 +369,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
}
}
+ preempt_disable();
irq_soft_mask_set(IRQS_ALL_DISABLED);
trace_hardirqs_off();
@@ -377,6 +379,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
trace_hardirqs_on();
irq_soft_mask_set(IRQS_ENABLED);
__hard_irq_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(arch_local_irq_restore);
^ permalink raw reply related
* Re: [PATCH] powerpc/perf: Fix reading of MSR[HV PR] bits in trace-imc
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: mpe, Athira Rajeev; +Cc: maddy, linuxppc-dev
In-Reply-To: <1598424029-1662-1-git-send-email-atrajeev@linux.vnet.ibm.com>
On Wed, 26 Aug 2020 02:40:29 -0400, Athira Rajeev wrote:
> IMC trace-mode uses MSR[HV PR] bits to set the cpumode
> for the instruction pointer captured in each sample.
> The bits are fetched from third DW of the trace record.
> Reading third DW from IMC trace record should use be64_to_cpu
> along with READ_ONCE in order to fetch the correct MSR[HV PR] bits.
> Patch addresses this change.
>
> [...]
Applied to powerpc/fixes.
[1/1] powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
https://git.kernel.org/powerpc/c/82715a0f332843d3a1830d7ebc9ac7c99a00c880
cheers
^ permalink raw reply
* Re: [PATCH kernel] powerpc/perf: Stop crashing with generic_compat_pmu
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: linuxppc-dev, Alexey Kardashevskiy; +Cc: Madhavan Srinivasan
In-Reply-To: <20200602025612.62707-1-aik@ozlabs.ru>
On Tue, 2 Jun 2020 12:56:12 +1000, Alexey Kardashevskiy wrote:
> The bhrb_filter_map ("The Branch History Rolling Buffer") callback is
> only defined in raw CPUs' power_pmu structs. The "architected" CPUs use
> generic_compat_pmu which does not have this callback and crashes occur.
>
> This adds a NULL pointer check for bhrb_filter_map() which behaves as if
> the callback returned an error.
>
> [...]
Applied to powerpc/fixes.
[1/1] powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
https://git.kernel.org/powerpc/c/b460b512417ae9c8b51a3bdcc09020cd6c60ff69
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: linuxppc-dev, mpe, Kajol Jain; +Cc: suka, maddy
In-Reply-To: <20200821080610.123997-1-kjain@linux.ibm.com>
On Fri, 21 Aug 2020 13:36:10 +0530, Kajol Jain wrote:
> Commit 792f73f747b8 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7
> device to show cpumask") added cpumask file as part of hv-24x7 driver
> inside the interface folder. The cpumask file is supposed to be in the top
> folder of the pmu driver in order to make hotplug work.
>
> This patch fixes that issue and creates a new group 'cpumask_attr_group'
> to add the cpumask file and make sure it is added to the top folder.
>
> [...]
Applied to powerpc/fixes.
[1/1] powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
https://git.kernel.org/powerpc/c/64ef8f2c4791940d7f3945507b6a45c20d959260
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: Paul Mackerras, Michael Ellerman, schwab, Christophe Leroy,
Benjamin Herrenschmidt
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <09fc73fe9c7423c6b4cf93f93df9bb0ed8eefab5.1597994047.git.christophe.leroy@csgroup.eu>
On Fri, 21 Aug 2020 07:15:25 +0000 (UTC), Christophe Leroy wrote:
> In is_module_segment(), when VMALLOC_END is over 0xf0000000,
> ALIGN(VMALLOC_END, SZ_256M) has value 0.
>
> In that case, addr >= ALIGN(VMALLOC_END, SZ_256M) is always
> true, so is_module_segment() always returns false.
>
> Use (ALIGN(VMALLOC_END, SZ_256M) - 1) which will have
> value 0xffffffff and will be suitable for the comparison.
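The wrap-around described above can be reproduced in user space. The following is only a minimal sketch, not the kernel code: ALIGN32, vmalloc_end and the two helper functions are hypothetical names, and the value 0xf8000000 is an assumed example of a VMALLOC_END above 0xf0000000.

```c
#include <stdint.h>

#define SZ_256M 0x10000000u

/*
 * Round x up to a multiple of a (a power of two) using 32-bit
 * arithmetic, mimicking what ALIGN() does on a 32-bit kernel.
 */
#define ALIGN32(x, a) \
	((uint32_t)(((uint32_t)(x) + ((uint32_t)(a) - 1)) & ~((uint32_t)(a) - 1)))

/* Assumed example value, chosen to be above 0xf0000000. */
static const uint32_t vmalloc_end = 0xf8000000u;

/* The sum overflows 32 bits, so the aligned value wraps to 0. */
static uint32_t aligned_end(void)
{
	return ALIGN32(vmalloc_end, SZ_256M);
}

/* Subtracting 1 from the wrapped 0 gives the highest 32-bit address. */
static uint32_t fixed_bound(void)
{
	return ALIGN32(vmalloc_end, SZ_256M) - 1;
}
```

With the wrapped value 0 every address satisfies addr >= bound, while comparing against the "- 1" form only rejects addresses above 0xffffffff, which cannot exist on a 32-bit system.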
Applied to powerpc/fixes.
[1/1] powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
https://git.kernel.org/powerpc/c/541cebb51f3422d4f2c6cb95c1e5cc3dcc9e5021
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: miltonm, npiggin
In-Reply-To: <20200825093424.3967813-1-mpe@ellerman.id.au>
On Tue, 25 Aug 2020 19:34:24 +1000, Michael Ellerman wrote:
> The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math
> unnecessarily changing MSR") changed some of the handling of floating
> point/vector restore.
>
> In particular it caused current->thread.fpexc_mode to be copied into
> the current MSR (via msr_check_and_set()), rather than just into
> regs->msr (which is moved into MSR on return to userspace).
>
> [...]
Applied to powerpc/fixes.
[1/1] powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
https://git.kernel.org/powerpc/c/b91eb5182405b01a8aeb42e9b5207831767e97ee
cheers
^ permalink raw reply
* Re: [PATCH] video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
Cc: linux-fbdev, b.zolnierkie, daniel.vetter, linux-kernel, dri-devel,
sam
In-Reply-To: <20200821104910.3363818-1-mpe@ellerman.id.au>
On Fri, 21 Aug 2020 20:49:10 +1000, Michael Ellerman wrote:
> The build is currently broken, if COMPILE_TEST=y and PPC_PMAC=n:
>
> linux/drivers/video/fbdev/controlfb.c: In function ‘control_set_hardware’:
> linux/drivers/video/fbdev/controlfb.c:276:2: error: implicit declaration of function ‘btext_update_display’
> 276 | btext_update_display(p->frame_buffer_phys + CTRLFB_OFF,
> | ^~~~~~~~~~~~~~~~~~~~
>
> [...]
Applied to powerpc/fixes.
[1/1] video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
https://git.kernel.org/powerpc/c/4d618b9f3fcab84e9ec28c180de46fb2c929d096
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/64s: scv entry should set PPR
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20200825075309.224184-1-npiggin@gmail.com>
On Tue, 25 Aug 2020 17:53:09 +1000, Nicholas Piggin wrote:
> Kernel entry sets PPR to HMT_MEDIUM by convention. The scv entry
> path missed this.
Applied to powerpc/fixes.
[1/1] powerpc/64s: scv entry should set PPR
https://git.kernel.org/powerpc/c/e5fe56092e753c50093c60e757561984abff335e
cheers
^ permalink raw reply
* Re: [PATCH] Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: pratik.r.sampat, linux-kernel, mpe, mikey, svaidy, ego, npiggin,
Pratik Rajesh Sampat, linuxppc-dev
In-Reply-To: <20200826082918.89306-1-psampat@linux.ibm.com>
On Wed, 26 Aug 2020 13:59:18 +0530, Pratik Rajesh Sampat wrote:
> Cpuidle stop state implementation has minor optimizations for P10
> where hardware preserves more SPR registers compared to P9.
> The current P9 driver works for P10, although it does a few extra
> save-restores. P9 driver can provide the required power management
> features like SMT thread folding and core level power savings
> on a P10 platform.
>
> [...]
Applied to powerpc/fixes.
[1/1] Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
https://git.kernel.org/powerpc/c/16d83a540ca4e7f1ebb2b3756869b77451d31414
cheers
^ permalink raw reply
* Re: [PATCH] Documentation/powerpc: fix malformed table in syscall64-abi
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: Michael Ellerman, LKML, linux-doc@vger.kernel.org,
Nicholas Piggin, Randy Dunlap, linuxppc-dev
In-Reply-To: <e06de4d3-a36f-2745-9775-467e125436cc@infradead.org>
On Sun, 23 Aug 2020 17:31:16 -0700, Randy Dunlap wrote:
> Fix malformed table warning in powerpc/syscall64-abi.rst by making
> two tables and moving the headings.
>
> Documentation/powerpc/syscall64-abi.rst:53: WARNING: Malformed table.
> Text in column margin in table line 2.
>
> =========== ============= ========================================
> --- For the sc instruction, differences with the ELF ABI ---
> r0 Volatile (System call number.)
> r3 Volatile (Parameter 1, and return value.)
> r4-r8 Volatile (Parameters 2-6.)
> cr0 Volatile (cr0.SO is the return error condition.)
> cr1, cr5-7 Nonvolatile
> lr Nonvolatile
>
> [...]
Applied to powerpc/fixes.
[1/1] Documentation/powerpc: fix malformed table in syscall64-abi
https://git.kernel.org/powerpc/c/aa661d7fab436d8a782618b3138da1a84ca28a31
cheers
^ permalink raw reply
* Re: [PATCH v2 0/3] Reintroduce PROT_SAO
From: Michael Ellerman @ 2020-08-27 7:46 UTC (permalink / raw)
To: linuxppc-dev, Shawn Anastasio; +Cc: npiggin
In-Reply-To: <20200821185558.35561-1-shawn@anastas.io>
On Fri, 21 Aug 2020 13:55:55 -0500, Shawn Anastasio wrote:
> Changes in v2:
> - Update prot_sao selftest to skip ISA 3.1
>
> This set re-introduces the PROT_SAO prot flag removed in
> Commit 5c9fa16e8abd ("powerpc/64s: Remove PROT_SAO support").
>
> To address concerns regarding live migration of guests using SAO
> to P10 hosts without SAO support, the flag is disabled by default
> in LPARs. A new config option, PPC_PROT_SAO_LPAR was added to
> allow users to explicitly enable it if they will not be running
> in an environment where this is a concern.
>
> [...]
Applied to powerpc/fixes.
[1/3] Revert "powerpc/64s: Remove PROT_SAO support"
https://git.kernel.org/powerpc/c/12564485ed8caac3c18572793ec01330792c7191
[2/3] powerpc/64s: Disallow PROT_SAO in LPARs by default
https://git.kernel.org/powerpc/c/9b725a90a8f127802e19466d4e336e701bcea0d2
[3/3] selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
https://git.kernel.org/powerpc/c/24ded46f53f954b9cf246c5d4e3770c7a8aa84ce
cheers
^ permalink raw reply
* Re: [PATCHv5 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Laurent Dufour @ 2020-08-27 7:53 UTC (permalink / raw)
To: Pingfan Liu, linuxppc-dev
Cc: Nathan Lynch, kexec, Hari Bathini, Nathan Fontenot
In-Reply-To: <1597049570-19536-2-git-send-email-kernelfans@gmail.com>
On 10/08/2020 at 10:52, Pingfan Liu wrote:
> A bug is observed on pseries by taking the following steps on rhel:
> -1. drmgr -c mem -r -q 5
> -2. echo c > /proc/sysrq-trigger
>
> And then, the failure looks like:
> kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
> kdump: saving vmcore-dmesg.txt
> kdump: saving vmcore-dmesg.txt complete
> kdump: saving vmcore
> Checking for memory holes : [ 0.0 %] / Checking for memory holes : [100.0 %] | Excluding unnecessary pages : [100.0 %] \ Copying data : [ 0.3 %] - eta: 38s[ 44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [ 44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [ 44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [ 44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [ 44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
> [ 44.338548] Core dump to |/bin/false pipe failed
> /lib/kdump-lib-initramfs.sh: line 98: 469 Bus error $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
> kdump: saving vmcore failed
>
> * Root cause *
> After analyzing, it turns out that in the current implementation,
> when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating as
> the code __remove_memory() comes before drmem_update_dt().
> So in kdump kernel, when read_from_oldmem() resorts to
> pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to
> non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it
> can be observed "Bus error"
>
> From a viewpoint of listener and publisher, the publisher notifies the
> listener before data is ready. This introduces a problem where udev
> launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
> updating. And in capture kernel, makedumpfile will access the memory based
> on the stale dt info, and hit a SIGBUS error due to an un-existed lmb.
>
> * Fix *
> This bug is introduced by commit 063b8b1251fd
> ("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
> request"), which tried to combine all the dt updating into one.
>
> To fix this issue, meanwhile not to introduce a quadratic runtime
> complexity by the model:
> dlpar_memory_add_by_count
> for_each_drmem_lmb <--
> dlpar_add_lmb
> drmem_update_dt(_v1|_v2)
> for_each_drmem_lmb <--
> The dt should still be only updated once, and just before the last memory
> online/offline event is ejected to user space. Achieve this by tracing the
> num of lmb added or removed.
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> To: linuxppc-dev@lists.ozlabs.org
> Cc: kexec@lists.infradead.org
> ---
> v4 -> v5: change dlpar_add_lmb()/dlpar_remove_lmb() prototype to report
> whether dt is updated successfully.
> Fix a condition boundary check bug
> v3 -> v4: resolve a quadratic runtime complexity issue.
> This series is applied on next-test branch
> arch/powerpc/platforms/pseries/hotplug-memory.c | 102 +++++++++++++++++++-----
> 1 file changed, 80 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 46cbcd1..1567d9f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -350,13 +350,22 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
> return true;
> }
>
> -static int dlpar_add_lmb(struct drmem_lmb *);
> +enum dt_update_status {
> + DT_NOUPDATE,
> + DT_TOUPDATE,
> + DT_UPDATED,
> +};
> +
> +/* "*dt_update" returns DT_UPDATED if updated */
> +static int dlpar_add_lmb(struct drmem_lmb *lmb,
> + enum dt_update_status *dt_update);
>
> -static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> +static int dlpar_remove_lmb(struct drmem_lmb *lmb,
> + enum dt_update_status *dt_update)
> {
> unsigned long block_sz;
> phys_addr_t base_addr;
> - int rc, nid;
> + int rc, ret, nid;
>
> if (!lmb_is_removable(lmb))
> return -EINVAL;
> @@ -372,6 +381,13 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> + if (*dt_update) {
That test is wrong, you should do:
if (*dt_update && *dt_update == DT_TOUPDATE) {
With the current code, the device tree is updated all the time.
Another option would be to pass a valid pointer (!= NULL) only when a DT update is
required; this way you don't need the DT_TOUPDATE value. The caller would have
to set the pointer accordingly. The advantage of this option is that the caller is
guaranteed that its variable is not touched by the callee when no device tree update is
requested. A simple boolean pointer would be enough, without the need for this enum.
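A rough user-space sketch of that second option, only to illustrate the calling convention. Every name below (drmem_update_dt_stub, remove_lmb_sketch, remove_by_count_sketch) is hypothetical and stands in for the real kernel functions; nothing here is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts how often the (stubbed) device tree update ran. */
static int dt_updates;

/* Stand-in for drmem_update_dt(); always succeeds in this sketch. */
static int drmem_update_dt_stub(void)
{
	dt_updates++;
	return 0;
}

/*
 * Hypothetical callee: the DT is updated only when the caller passes a
 * non-NULL pointer, and success is reported back through that pointer.
 * With a NULL pointer the caller's state is never touched.
 */
static int remove_lmb_sketch(bool *dt_updated)
{
	if (dt_updated)
		*dt_updated = (drmem_update_dt_stub() == 0);

	/* The real code would call __remove_memory() at this point. */
	return 0;
}

/*
 * Hypothetical caller: request the DT update only for the last LMB.
 * Returns the number of DT updates that were performed.
 */
static int remove_by_count_sketch(int count)
{
	bool dt_updated = false;
	int i;

	dt_updates = 0;
	for (i = 0; i < count; i++)
		remove_lmb_sketch(i == count - 1 ? &dt_updated : NULL);

	return dt_updates;
}
```

In this sketch, removing five LMBs triggers exactly one DT update, issued just before the last removal.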
> + ret = drmem_update_dt();
> + if (ret)
> + pr_warn("%s fail to update dt, but continue\n", __func__);
> + else
> + *dt_update = DT_UPDATED;
> + }
>
> __remove_memory(nid, base_addr, block_sz);
>
> @@ -387,6 +403,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
> int lmbs_removed = 0;
> int lmbs_available = 0;
> int rc;
> + enum dt_update_status dt_update = DT_NOUPDATE;
>
> pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
>
> @@ -409,7 +426,11 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
> }
>
> for_each_drmem_lmb(lmb) {
> - rc = dlpar_remove_lmb(lmb);
> +
> + /* combine dt updating for all LMBs */
> + if (lmbs_to_remove - lmbs_removed <= 1)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_remove_lmb(lmb, &dt_update);
> if (rc)
> continue;
>
> @@ -424,13 +445,17 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
> }
>
> if (lmbs_removed != lmbs_to_remove) {
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
> +
> pr_err("Memory hot-remove failed, adding LMB's back\n");
>
> for_each_drmem_lmb(lmb) {
> if (!drmem_lmb_reserved(lmb))
> continue;
>
> - rc = dlpar_add_lmb(lmb);
> + if (--lmbs_removed == 0 && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to add LMB back, drc index %x\n",
> lmb->drc_index);
> @@ -458,6 +483,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
>
> static int dlpar_memory_remove_by_index(u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_TOUPDATE;
> struct drmem_lmb *lmb;
> int lmb_found;
> int rc;
> @@ -468,7 +494,7 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
> for_each_drmem_lmb(lmb) {
> if (lmb->drc_index == drc_index) {
> lmb_found = 1;
> - rc = dlpar_remove_lmb(lmb);
> + rc = dlpar_remove_lmb(lmb, &dt_update);
> if (!rc)
> dlpar_release_drc(lmb->drc_index);
>
> @@ -490,6 +516,7 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
>
> static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_NOUPDATE;
> struct drmem_lmb *lmb, *start_lmb, *end_lmb;
> int lmbs_available = 0;
> int rc;
> @@ -519,7 +546,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
> continue;
>
> - rc = dlpar_remove_lmb(lmb);
> + if (lmb == end_lmb)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_remove_lmb(lmb, &dt_update);
> if (rc)
> break;
>
> @@ -527,14 +556,16 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> }
>
> if (rc) {
> - pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
>
> + pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");
>
> for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> if (!drmem_lmb_reserved(lmb))
> continue;
> -
> - rc = dlpar_add_lmb(lmb);
> + if (lmb == end_lmb && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to add LMB, drc index %x\n",
> lmb->drc_index);
> @@ -572,7 +603,7 @@ static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
> {
> return -EOPNOTSUPP;
> }
> -static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> +static int dlpar_remove_lmb(struct drmem_lmb *lmb, bool dt_update)
> {
> return -EOPNOTSUPP;
> }
> @@ -591,10 +622,11 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
> -static int dlpar_add_lmb(struct drmem_lmb *lmb)
> +static int dlpar_add_lmb(struct drmem_lmb *lmb,
> + enum dt_update_status *dt_update)
> {
> unsigned long block_sz;
> - int rc;
> + int rc, ret;
>
> if (lmb->flags & DRCONF_MEM_ASSIGNED)
> return -EINVAL;
> @@ -607,6 +639,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
> lmb_set_nid(lmb);
> lmb->flags |= DRCONF_MEM_ASSIGNED;
> + if (*dt_update) {
> + ret = drmem_update_dt();
> + if (ret)
> + pr_warn("%s fail to update dt, but continue\n", __func__);
> + else
> + *dt_update = DT_UPDATED;
> + }
>
> block_sz = memory_block_size_bytes();
>
> @@ -616,6 +655,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> + if (*dt_update == DT_UPDATED)
> + drmem_update_dt();
> return rc;
> }
>
> @@ -627,7 +668,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> -
> + if (*dt_update == DT_UPDATED) {
> + ret = drmem_update_dt();
> + if (ret)
> + pr_warn("%s fail to update dt during rollback, but continue\n", __func__);
> + }
> __remove_memory(nid, base_addr, block_sz);
> }
>
> @@ -636,6 +681,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
> static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> {
> + enum dt_update_status dt_update = DT_NOUPDATE;
> struct drmem_lmb *lmb;
> int lmbs_available = 0;
> int lmbs_added = 0;
> @@ -666,7 +712,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> if (rc)
> continue;
>
> - rc = dlpar_add_lmb(lmb);
> + if (lmbs_to_add - lmbs_added <= 1)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &dt_update);
> if (rc) {
> dlpar_release_drc(lmb->drc_index);
> continue;
> @@ -683,13 +731,18 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> }
>
> if (lmbs_added != lmbs_to_add) {
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
> +
> pr_err("Memory hot-add failed, removing any added LMBs\n");
>
> for_each_drmem_lmb(lmb) {
> if (!drmem_lmb_reserved(lmb))
> continue;
>
> - rc = dlpar_remove_lmb(lmb);
> + if (--lmbs_added == 0 && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> +
> + rc = dlpar_remove_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to remove LMB, drc index %x\n",
> lmb->drc_index);
> @@ -716,6 +769,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>
> static int dlpar_memory_add_by_index(u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_TOUPDATE;
> struct drmem_lmb *lmb;
> int rc, lmb_found;
>
> @@ -727,7 +781,7 @@ static int dlpar_memory_add_by_index(u32 drc_index)
> lmb_found = 1;
> rc = dlpar_acquire_drc(lmb->drc_index);
> if (!rc) {
> - rc = dlpar_add_lmb(lmb);
> + rc = dlpar_add_lmb(lmb, &dt_update);
> if (rc)
> dlpar_release_drc(lmb->drc_index);
> }
> @@ -750,6 +804,7 @@ static int dlpar_memory_add_by_index(u32 drc_index)
>
> static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
> {
> + enum dt_update_status dt_update = DT_NOUPDATE;
> struct drmem_lmb *lmb, *start_lmb, *end_lmb;
> int lmbs_available = 0;
> int rc;
> @@ -783,7 +838,9 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
> if (rc)
> break;
>
> - rc = dlpar_add_lmb(lmb);
> + if (lmb == end_lmb)
> + dt_update = DT_TOUPDATE;
> + rc = dlpar_add_lmb(lmb, &dt_update);
> if (rc) {
> dlpar_release_drc(lmb->drc_index);
> break;
> @@ -796,10 +853,14 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
> pr_err("Memory indexed-count-add failed, removing any added LMBs\n");
>
> for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> + enum dt_update_status rollback_dt_update = DT_NOUPDATE;
> +
> if (!drmem_lmb_reserved(lmb))
> continue;
>
> - rc = dlpar_remove_lmb(lmb);
> + if (lmb == end_lmb && dt_update == DT_UPDATED)
> + rollback_dt_update = DT_TOUPDATE;
> + rc = dlpar_remove_lmb(lmb, &rollback_dt_update);
> if (rc)
> pr_err("Failed to remove LMB, drc index %x\n",
> lmb->drc_index);
> @@ -879,9 +940,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> break;
> }
>
> - if (!rc)
> - rc = drmem_update_dt();
> -
> unlock_device_hotplug();
> return rc;
> }
>
^ permalink raw reply
* [PATCH v3 5/6] powerpc: Initialize a temporary mm for code patching
From: Christopher M. Riedl @ 2020-08-27 5:26 UTC (permalink / raw)
To: linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <20200827052659.24922-1-cmr@codefail.de>
When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.
Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.
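As an illustration only, the address selection can be sketched in user space as follows. The SK_* constants are assumptions (64K pages, a 128TB map window) and pick_patching_addr() is a hypothetical helper taking the random value as a parameter; the patch itself uses get_random_long(), PAGE_SIZE and DEFAULT_MAP_WINDOW.

```c
#include <stdint.h>

#define SK_PAGE_SIZE	0x10000ULL		/* assumed 64K pages */
#define SK_PAGE_MASK	(~(SK_PAGE_SIZE - 1))
#define SK_MAP_WINDOW	(1ULL << 47)		/* assumed 128TB window */

/*
 * Map a random value to a page-aligned address. Masking makes the
 * value a multiple of the page size, and the modulus is itself a
 * multiple of the page size, so the remainder stays page-aligned.
 * Adding one page keeps the result away from the zero page.
 */
static uint64_t pick_patching_addr(uint64_t rnd)
{
	return SK_PAGE_SIZE +
	       ((rnd & SK_PAGE_MASK) % (SK_MAP_WINDOW - 2 * SK_PAGE_SIZE));
}
```

Under these assumptions the result always lies between SK_PAGE_SIZE and SK_MAP_WINDOW - 2 * SK_PAGE_SIZE, which is inside the range given in the patch comment.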
Based on x86 implementation:
commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/lib/code-patching.c | 40 ++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 89b37ece6d2f..051d7ae6d8ee 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
#include <linux/cpuhotplug.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
+#include <linux/sched/task.h>
+#include <linux/random.h>
#include <asm/tlbflush.h>
#include <asm/page.h>
@@ -109,6 +111,44 @@ static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
}
}
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+ spinlock_t *ptl; /* for protecting pte table */
+ pte_t *ptep;
+
+ /*
+ * Some parts of the kernel (static keys for example) depend on
+ * successful code patching. Code patching under STRICT_KERNEL_RWX
+ * requires this setup - otherwise we cannot patch at all. We use
+ * BUG_ON() here and later since an early failure is preferred to
+ * buggy behavior and/or strange crashes later.
+ */
+ patching_mm = copy_init_mm();
+ BUG_ON(!patching_mm);
+
+ /*
+ * Choose a randomized, page-aligned address from the range:
+ * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
+ * The lower address bound is PAGE_SIZE to avoid the zero-page.
+ * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
+ * under DEFAULT_MAP_WINDOW in hash.
+ */
+ patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
+ % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
+
+ /*
+ * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
+ * the PTE here. We cannot do the allocation during patching with IRQs
+ * disabled (ie. "atomic" context).
+ */
+ ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+ BUG_ON(!ptep);
+ pte_unmap_unlock(ptep, ptl);
+}
+
static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
#ifdef CONFIG_LKDTM
--
2.28.0
^ permalink raw reply related
* [PATCH v3 00/13] mm/debug_vm_pgtable fixes
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
This patch series includes fixes for the debug_vm_pgtable test code so that
it follows the page table update rules correctly. The first two patches introduce
changes w.r.t. ppc64. The patches are included in this series for completeness. We can
merge them via the ppc64 tree if required.
The hugetlb test is disabled on ppc64 because it needs a larger change to satisfy
the page table update rules.
The patches are on top of 15bc20c6af4ceee97a1f90b43c0e386643c071b4 (linus/master)
Changes from v2:
* Fix build failure with different configs and architecture.
Changes from v1:
* Address review feedback
* drop test specific pfn_pte and pfn_pmd.
* Update ppc64 page table helper to add _PAGE_PTE
Aneesh Kumar K.V (13):
powerpc/mm: Add DEBUG_VM WARN for pmd_clear
powerpc/mm: Move setting pte specific flags to pfn_pte
mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value
mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
vmap support.
mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
CONFIG_NUMA_BALANCING
mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
set_pmd/pud_at
mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
existing pte entry
mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP
mm/debug_vm_pgtable/locks: Move non page table modifying test together
mm/debug_vm_pgtable/locks: Take correct page table lock
mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
mm/debug_vm_pgtable: populate a pte entry before fetching it
arch/powerpc/include/asm/book3s/64/pgtable.h | 29 +++-
arch/powerpc/include/asm/nohash/pgtable.h | 5 -
arch/powerpc/mm/pgtable.c | 5 -
mm/debug_vm_pgtable.c | 170 ++++++++++++-------
4 files changed, 131 insertions(+), 78 deletions(-)
--
2.26.2
^ permalink raw reply
* [PATCH v3 01/13] powerpc/mm: Add DEBUG_VM WARN for pmd_clear
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
With the hash page table, the kernel should not use pmd_clear for clearing
huge pte entries. Add a DEBUG_VM WARN to catch the wrong usage.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..079211968987 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -868,6 +868,13 @@ static inline bool pte_ci(pte_t pte)
static inline void pmd_clear(pmd_t *pmdp)
{
+ if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+ /*
+ * Don't use this if we can possibly have a hash page table
+ * entry mapping this.
+ */
+ WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
+ }
*pmdp = __pmd(0);
}
@@ -916,6 +923,13 @@ static inline int pmd_bad(pmd_t pmd)
static inline void pud_clear(pud_t *pudp)
{
+ if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+ /*
+ * Don't use this if we can possibly have a hash page table
+ * entry mapping this.
+ */
+ WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
+ }
*pudp = __pud(0);
}
--
2.26.2
^ permalink raw reply related
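The WARN_ON() condition in the patch above fires only when both flag bits are set at once, which is different from testing for either bit. A small sketch of that check, assuming placeholder definitions: bit 62 for the pte marker follows the description elsewhere in this series, while the DEMO_PAGE_HASHPTE position and all demo_* names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits; not the kernel's actual definitions. */
#define DEMO_PAGE_PTE      (UINT64_C(1) << 62)
#define DEMO_PAGE_HASHPTE  (UINT64_C(1) << 4)

static_assert((DEMO_PAGE_PTE & DEMO_PAGE_HASHPTE) == 0,
	      "the two flags must be distinct bits");

/* True only when the entry is a leaf pte AND has a hash page table entry. */
static bool demo_clear_would_warn(uint64_t entry)
{
	const uint64_t both = DEMO_PAGE_HASHPTE | DEMO_PAGE_PTE;

	return (entry & both) == both;
}
```

An entry with only one of the two bits set, such as a page table pointer without the pte marker, would not trigger the warning in this model.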
* [PATCH v3 02/13] powerpc/mm: Move setting pte specific flags to pfn_pte
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
powerpc used to set the pte specific flags in set_pte_at(). This is different
from other architectures. To be consistent with other architectures, update
pfn_pte() to set _PAGE_PTE on ppc64. Also, drop the now unused pte_mkpte().
We add a VM_WARN_ON() to catch callers of set_pte_at() that have not set the
_PAGE_PTE bit. We will remove that after a few releases.
With respect to huge pmd entries, pmd_mkhuge() takes care of adding the
_PAGE_PTE bit.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +++++++++------
arch/powerpc/include/asm/nohash/pgtable.h | 5 -----
arch/powerpc/mm/pgtable.c | 5 -----
3 files changed, 9 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 079211968987..2382fd516f6b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -619,7 +619,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
VM_BUG_ON(pfn >> (64 - PAGE_SHIFT));
VM_BUG_ON((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK);
- return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot));
+ return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | _PAGE_PTE);
}
static inline unsigned long pte_pfn(pte_t pte)
@@ -655,11 +655,6 @@ static inline pte_t pte_mkexec(pte_t pte)
return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC));
}
-static inline pte_t pte_mkpte(pte_t pte)
-{
- return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
-}
-
static inline pte_t pte_mkwrite(pte_t pte)
{
/*
@@ -823,6 +818,14 @@ static inline int pte_none(pte_t pte)
static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
{
+
+ VM_WARN_ON(!(pte_raw(pte) & cpu_to_be64(_PAGE_PTE)));
+ /*
+ * Keep the _PAGE_PTE added till we are sure we handle _PAGE_PTE
+ * in all the callers.
+ */
+ pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
+
if (radix_enabled())
return radix__set_pte_at(mm, addr, ptep, pte, percpu);
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..6277e7596ae5 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -140,11 +140,6 @@ static inline pte_t pte_mkold(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
}
-static inline pte_t pte_mkpte(pte_t pte)
-{
- return pte;
-}
-
static inline pte_t pte_mkspecial(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_SPECIAL);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9c0547d77af3..ab57b07ef39a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -184,9 +184,6 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
*/
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
- /* Add the pte bit when trying to set a pte */
- pte = pte_mkpte(pte);
-
/* Note: mm->context.id might not yet have been assigned as
* this context might not have been activated yet when this
* is called.
@@ -275,8 +272,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
*/
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
- pte = pte_mkpte(pte);
-
pte = set_pte_filter(pte);
val = pte_val(pte);
--
2.26.2
^ permalink raw reply related
* [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in
the random value.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/debug_vm_pgtable.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
* entry type. But these bits might affect the ability to clear entries with
* pxx_clear() because of how dynamic page table folding works on s390. So
* while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
*/
-#define S390_MASK_BITS 4
-#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASK GENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK 0x0
+#endif
+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
#define RANDOM_NZVALUE GENMASK(7, 0)
static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
--
2.26.2
^ permalink raw reply related
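The mask arithmetic introduced by the patch above can be reproduced in userspace to see which bits end up in RANDOM_ORVALUE. This is a sketch under stated assumptions: DEMO_GENMASK() is a 64-bit stand-in for the kernel's GENMASK(), BITS_PER_LONG is taken to be 64, and the build is assumed to have CONFIG_PPC_BOOK3S_64 set so that PPC64_SKIP_MASK is non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BITS 64

/* Stand-in for GENMASK(h, l): bits l..h set, with 0 <= l <= h <= 63. */
#define DEMO_GENMASK(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (DEMO_BITS - 1 - (h))))

#define S390_SKIP_MASK   DEMO_GENMASK(3, 0)      /* low 4 bits */
#define PPC64_SKIP_MASK  DEMO_GENMASK(62, 62)    /* bit 62 only */
#define ARCH_SKIP_MASK   (S390_SKIP_MASK | PPC64_SKIP_MASK)
#define RANDOM_ORVALUE   (DEMO_GENMASK(DEMO_BITS - 1, 0) & ~ARCH_SKIP_MASK)

/* Every bit except 0..3 and 62 is set. */
static_assert(RANDOM_ORVALUE == UINT64_C(0xBFFFFFFFFFFFFFF0),
	      "unexpected RANDOM_ORVALUE");

/* Hypothetical helper: load an entry with the random bits, as the tests do. */
static uint64_t demo_or_random(uint64_t entry)
{
	return entry | RANDOM_ORVALUE;
}
```

On a configuration where PPC64_SKIP_MASK is 0x0, the same expression reduces to the previous value, GENMASK(BITS_PER_LONG - 1, 4).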
* [PATCH v3 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
ppc64 supports huge vmap only with radix translation. Hence use the arch helper
to determine huge vmap support.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/debug_vm_pgtable.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index bbf9df0e64c6..28f9d0558c20 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -28,6 +28,7 @@
#include <linux/swapops.h>
#include <linux/start_kernel.h>
#include <linux/sched/mm.h>
+#include <linux/io.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
@@ -206,11 +207,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(!pmd_leaf(pmd));
}
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
{
pmd_t pmd;
- if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+ if (!arch_ioremap_pmd_supported())
return;
pr_debug("Validating PMD huge\n");
@@ -224,6 +226,10 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
}
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
{
@@ -320,11 +326,12 @@ static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(!pud_leaf(pud));
}
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
{
pud_t pud;
- if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+ if (!arch_ioremap_pud_supported())
return;
pr_debug("Validating PUD huge\n");
@@ -338,6 +345,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
}
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
#else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
static void __init pud_advanced_tests(struct mm_struct *mm,
--
2.26.2
^ permalink raw reply related
* [PATCH v3 05/13] mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
Saved write support was added to track the write bit of a pte after marking the
pte protnone. This was done so that AUTONUMA can convert a write pte to protnone
and still track the old write bit. When converting it back we set the pte write
bit correctly thereby avoiding a write fault again. Hence enable the test only
when CONFIG_NUMA_BALANCING is enabled and use protnone protflags.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/debug_vm_pgtable.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 28f9d0558c20..5c0680836fe9 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -119,10 +119,14 @@ static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
{
pte_t pte = pfn_pte(pfn, prot);
+ if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+ return;
+
pr_debug("Validating PTE saved write\n");
WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte))));
WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte))));
}
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
{
@@ -235,6 +239,9 @@ static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
{
pmd_t pmd = pfn_pmd(pfn, prot);
+ if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+ return;
+
pr_debug("Validating PMD saved write\n");
WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd))));
WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd))));
@@ -1020,8 +1027,8 @@ static int __init debug_vm_pgtable(void)
pmd_huge_tests(pmdp, pmd_aligned, prot);
pud_huge_tests(pudp, pud_aligned, prot);
- pte_savedwrite_tests(pte_aligned, prot);
- pmd_savedwrite_tests(pmd_aligned, prot);
+ pte_savedwrite_tests(pte_aligned, protnone);
+ pmd_savedwrite_tests(pmd_aligned, protnone);
pte_unmap_unlock(ptep, ptl);
--
2.26.2
^ permalink raw reply related
* [PATCH v3 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
The kernel expects entries to be marked huge before set_pmd_at()/set_pud_at() is used.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/debug_vm_pgtable.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 5c0680836fe9..de83a20c1b30 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -155,7 +155,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
unsigned long pfn, unsigned long vaddr,
pgprot_t prot)
{
- pmd_t pmd = pfn_pmd(pfn, prot);
+ pmd_t pmd;
if (!has_transparent_hugepage())
return;
@@ -164,19 +164,19 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
- pmd = pfn_pmd(pfn, prot);
+ pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
- pmd = pfn_pmd(pfn, prot);
+ pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
- pmd = pfn_pmd(pfn, prot);
+ pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_wrprotect(pmd);
pmd = pmd_mkclean(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
@@ -237,7 +237,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
{
- pmd_t pmd = pfn_pmd(pfn, prot);
+ pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
return;
@@ -277,7 +277,7 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
unsigned long pfn, unsigned long vaddr,
pgprot_t prot)
{
- pud_t pud = pfn_pud(pfn, prot);
+ pud_t pud;
if (!has_transparent_hugepage())
return;
@@ -286,25 +286,28 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
/* Align the address wrt HPAGE_PUD_SIZE */
vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
+ pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_set_wrprotect(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_write(pud));
#ifndef __PAGETABLE_PMD_FOLDED
- pud = pfn_pud(pfn, prot);
+
+ pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
- pud = pfn_pud(pfn, prot);
+ pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
#endif /* __PAGETABLE_PMD_FOLDED */
- pud = pfn_pud(pfn, prot);
+
+ pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_wrprotect(pud);
pud = pud_mkclean(pud);
set_pud_at(mm, vaddr, pudp, pud);
--
2.26.2
^ permalink raw reply related
* [PATCH v3 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry
From: Aneesh Kumar K.V @ 2020-08-27 8:04 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-arch, linux-s390, Anshuman Khandual, Aneesh Kumar K.V, x86,
Mike Rapoport, Qian Cai, Gerald Schaefer, Christophe Leroy,
Vineet Gupta, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-1-aneesh.kumar@linux.ibm.com>
set_pte_at() should not be used to set a pte entry at a location that
already holds a valid pte entry. Architectures like ppc64 don't do a TLB
invalidate in set_pte_at() and hence expect it to be used only on locations
that do not hold a valid PTE.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/debug_vm_pgtable.c | 35 +++++++++++++++--------------------
1 file changed, 15 insertions(+), 20 deletions(-)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index de83a20c1b30..f9f6358899a8 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -79,15 +79,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
{
pte_t pte = pfn_pte(pfn, prot);
+ /*
+ * Architectures optimize set_pte_at by avoiding TLB flush.
+ * This requires set_pte_at to be not used to update an
+ * existing pte entry. Clear pte before we do set_pte_at
+ */
+
pr_debug("Validating PTE advanced\n");
pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);
ptep_set_wrprotect(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(pte_write(pte));
-
- pte = pfn_pte(pfn, prot);
- set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
@@ -101,13 +104,11 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
pte = ptep_get(ptep);
WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
-
- pte = pfn_pte(pfn, prot);
- set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear_full(mm, vaddr, ptep, 1);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
+ pte = pfn_pte(pfn, prot);
pte = pte_mkyoung(pte);
set_pte_at(mm, vaddr, ptep, pte);
ptep_test_and_clear_young(vma, vaddr, ptep);
@@ -169,9 +170,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
-
- pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
- set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
@@ -185,13 +183,11 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
-
- pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
- set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
+ pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_mkyoung(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_test_and_clear_young(vma, vaddr, pmdp);
@@ -293,18 +289,10 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
WARN_ON(pud_write(pud));
#ifndef __PAGETABLE_PMD_FOLDED
-
- pud = pud_mkhuge(pfn_pud(pfn, prot));
- set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
- pud = pud_mkhuge(pfn_pud(pfn, prot));
- set_pud_at(mm, vaddr, pudp, pud);
- pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
- pud = READ_ONCE(*pudp);
- WARN_ON(!pud_none(pud));
#endif /* __PAGETABLE_PMD_FOLDED */
pud = pud_mkhuge(pfn_pud(pfn, prot));
@@ -317,6 +305,13 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pud = READ_ONCE(*pudp);
WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
+#ifndef __PAGETABLE_PMD_FOLDED
+ pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
+ pud = READ_ONCE(*pudp);
+ WARN_ON(!pud_none(pud));
+#endif /* __PAGETABLE_PMD_FOLDED */
+
+ pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_mkyoung(pud);
set_pud_at(mm, vaddr, pudp, pud);
pudp_test_and_clear_young(vma, vaddr, pudp);
--
2.26.2
^ permalink raw reply related
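The rule behind the patch above (install with set_*_at() only into an empty slot, and clear through a get_and_clear helper before installing again) can be illustrated with a toy model. This is only a sketch: the demo_slot type, the demo_* functions, and the misuse counter are hypothetical and do not correspond to any kernel API; the counter merely stands in for "a stale TLB entry could now exist".

```c
#include <assert.h>
#include <stdint.h>

/* Toy page table slot; a value of 0 means "none". */
typedef struct {
	uint64_t val;
	unsigned int misuse;   /* times a live entry was overwritten */
} demo_slot;

/*
 * Models a set_pte_at() that does no TLB invalidate: overwriting a
 * live entry is recorded as misuse because nothing flushes the old one.
 */
static void demo_set_at(demo_slot *s, uint64_t new_val)
{
	assert(new_val != 0);  /* 0 is reserved for "none" in this model */
	if (s->val != 0)
		s->misuse++;
	s->val = new_val;
}

/* Models a get_and_clear helper: returns the old value, leaves "none". */
static uint64_t demo_get_and_clear(demo_slot *s)
{
	uint64_t old = s->val;

	s->val = 0;
	return old;
}
```

In this model the reordered test sequence (set, modify, get-and-clear, then set again) never increments the counter, while two back-to-back demo_set_at() calls do.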