* [PATCH v5 22/26] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
From: Aneesh Kumar K.V @ 2020-06-19 13:58 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram, bauerman
In-Reply-To: <20200619135850.47155-1-aneesh.kumar@linux.ibm.com>
The next set of patches adds support for kuap with hash translation.
Hence make KUAP a BOOK3S_64 feature. Also make it a subfeature of
PPC_MEM_KEYS. Hash translation is going to use pkeys to support
KUAP/KUEP. Adding this dependency reduces the code complexity and
enables us to move some of the initialization code to pkeys.c
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/kup.h | 33 ++++++++++++++----------
arch/powerpc/include/asm/ptrace.h | 2 +-
arch/powerpc/kernel/asm-offsets.c | 2 +-
arch/powerpc/platforms/Kconfig.cputype | 4 +--
4 files changed, 23 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 3cecd964a63f..3a0e138d2735 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -62,7 +62,7 @@
#else /* !__ASSEMBLY__ */
-#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_MEM_KEYS
#include <asm/mmu.h>
#include <asm/ptrace.h>
@@ -97,6 +97,24 @@ static inline void kuap_check_amr(void)
WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
}
+#else /* CONFIG_PPC_MEM_KEYS */
+
+static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
+{
+}
+
+static inline void kuap_check_amr(void)
+{
+}
+
+static inline unsigned long kuap_get_and_check_amr(void)
+{
+ return 0;
+}
+#endif /* CONFIG_PPC_MEM_KEYS */
+
+
+#ifdef CONFIG_PPC_KUAP
/*
* We support individually allowing read or write, but we don't support nesting
* because that would require an expensive read/modify write of the AMR.
@@ -166,19 +184,6 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
"Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
}
-#else /* CONFIG_PPC_KUAP */
-static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
-{
-}
-
-static inline void kuap_check_amr(void)
-{
-}
-
-static inline unsigned long kuap_get_and_check_amr(void)
-{
- return 0;
-}
#endif /* CONFIG_PPC_KUAP */
#define reset_kuap reset_kuap
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index ac3970fff0d5..1a6cadf63d14 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -53,7 +53,7 @@ struct pt_regs
#ifdef CONFIG_PPC64
unsigned long ppr;
#endif
-#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_HAVE_KUAP
unsigned long kuap;
#endif
};
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9b9cde07e396..1694c4f531b9 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -354,7 +354,7 @@ int main(void)
STACK_PT_REGS_OFFSET(_PPR, ppr);
#endif /* CONFIG_PPC64 */
-#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_HAVE_KUAP
STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap);
#endif
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index d349603fb889..053c46aecf80 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -99,6 +99,8 @@ config PPC_BOOK3S_64
select ARCH_SUPPORTS_NUMA_BALANCING
select IRQ_WORK
select PPC_MM_SLICES
+ select PPC_HAVE_KUAP if PPC_MEM_KEYS
+ select PPC_HAVE_KUEP if PPC_MEM_KEYS
config PPC_BOOK3E_64
bool "Embedded processors"
@@ -350,8 +352,6 @@ config PPC_RADIX_MMU
bool "Radix MMU Support"
depends on PPC_BOOK3S_64
select ARCH_HAS_GIGANTIC_PAGE
- select PPC_HAVE_KUEP
- select PPC_HAVE_KUAP
default y
help
Enable support for the Power ISA 3.0 Radix style MMU. Currently this
--
2.26.2
^ permalink raw reply related
* [PATCH v5 23/26] powerpc/book3s64/kuap: Move UAMOR setup to key init function
From: Aneesh Kumar K.V @ 2020-06-19 13:58 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram, bauerman
In-Reply-To: <20200619135850.47155-1-aneesh.kumar@linux.ibm.com>
UAMOR values are not application-specific. The kernel initializes
its value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.
Move UAMOR initialization to key initialization code. Now that
KUAP/KUEP feature depends on PPC_MEM_KEYS, we can start to consolidate
all register initialization to keys init.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/kup.h | 2 ++
arch/powerpc/include/asm/processor.h | 1 -
arch/powerpc/kernel/ptrace/ptrace-view.c | 17 ++++++++----
arch/powerpc/kernel/smp.c | 5 ++++
arch/powerpc/mm/book3s64/pkeys.c | 35 ++++++++++++++----------
5 files changed, 39 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 3a0e138d2735..942594745dfa 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -67,6 +67,8 @@
#include <asm/mmu.h>
#include <asm/ptrace.h>
+extern u64 default_uamor;
+
static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
{
if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) {
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 52a67835057a..6ac12168f1fe 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -237,7 +237,6 @@ struct thread_struct {
#ifdef CONFIG_PPC_MEM_KEYS
unsigned long amr;
unsigned long iamr;
- unsigned long uamor;
#endif
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void* kvm_shadow_vcpu; /* KVM internal data */
diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c
index caeb5822a8f4..689711eb018a 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -488,14 +488,22 @@ static int pkey_active(struct task_struct *target, const struct user_regset *reg
static int pkey_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf)
{
+ int ret;
+
BUILD_BUG_ON(TSO(amr) + sizeof(unsigned long) != TSO(iamr));
- BUILD_BUG_ON(TSO(iamr) + sizeof(unsigned long) != TSO(uamor));
if (!arch_pkeys_enabled())
return -ENODEV;
- return user_regset_copyout(&pos, &count, &kbuf, &ubuf, &target->thread.amr,
- 0, ELF_NPKEY * sizeof(unsigned long));
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &target->thread.amr,
+ 0, 2 * sizeof(unsigned long));
+ if (ret)
+ goto err_out;
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &default_uamor,
+ 2 * sizeof(unsigned long), 3 * sizeof(unsigned long));
+err_out:
+ return ret;
}
static int pkey_set(struct task_struct *target, const struct user_regset *regset,
@@ -518,8 +526,7 @@ static int pkey_set(struct task_struct *target, const struct user_regset *regset
return ret;
/* UAMOR determines which bits of the AMR can be set from userspace. */
- target->thread.amr = (new_amr & target->thread.uamor) |
- (target->thread.amr & ~target->thread.uamor);
+ target->thread.amr = (new_amr & default_uamor) | (target->thread.amr & ~default_uamor);
return 0;
}
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c820c95162ff..eec40082599f 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -59,6 +59,7 @@
#include <asm/asm-prototypes.h>
#include <asm/cpu_has_feature.h>
#include <asm/ftrace.h>
+#include <asm/kup.h>
#ifdef DEBUG
#include <asm/udbg.h>
@@ -1256,6 +1257,10 @@ void start_secondary(void *unused)
mmgrab(&init_mm);
current->active_mm = &init_mm;
+#ifdef CONFIG_PPC_MEM_KEYS
+ mtspr(SPRN_UAMOR, default_uamor);
+#endif
+
smp_store_cpu_info(cpu);
set_dec(tb_ticks_per_jiffy);
preempt_disable();
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index aeecc8b8e11c..3f3593f85358 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -24,7 +24,7 @@ static u32 initial_allocation_mask; /* Bits set for the initially allocated k
static u64 default_amr;
static u64 default_iamr;
/* Allow all keys to be modified by default */
-static u64 default_uamor = ~0x0UL;
+u64 default_uamor = ~0x0UL;
/*
* Key used to implement PROT_EXEC mmap. Denies READ/WRITE
* We pick key 2 because 0 is special key and 1 is reserved as per ISA.
@@ -113,8 +113,16 @@ void __init pkey_early_init_devtree(void)
/* scan the device tree for pkey feature */
pkeys_total = scan_pkey_feature();
if (!pkeys_total) {
- /* No support for pkey. Mark it disabled */
- return;
+ /*
+ * No key support but on radix we can use key 0
+ * to implement kuap.
+ */
+ if (early_radix_enabled())
+ /*
+ * Make sure userspace can't change the AMR
+ */
+ default_uamor = 0;
+ goto err_out;
}
cur_cpu_spec->mmu_features |= MMU_FTR_PKEY;
@@ -197,6 +205,12 @@ void __init pkey_early_init_devtree(void)
initial_allocation_mask |= reserved_allocation_mask;
pr_info("Enabling Memory keys with max key count %d", max_pkey);
+err_out:
+ /*
+ * Setup uamor on boot cpu
+ */
+ mtspr(SPRN_UAMOR, default_uamor);
+
return;
}
@@ -232,8 +246,9 @@ void __init setup_kuap(bool disabled)
cur_cpu_spec->mmu_features |= MMU_FTR_KUAP;
}
- /* Make sure userspace can't change the AMR */
- mtspr(SPRN_UAMOR, 0);
+ /*
+ * Set the default kernel AMR values on all cpus.
+ */
mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
isync();
}
@@ -278,11 +293,6 @@ static inline u64 read_uamor(void)
return mfspr(SPRN_UAMOR);
}
-static inline void write_uamor(u64 value)
-{
- mtspr(SPRN_UAMOR, value);
-}
-
static bool is_pkey_enabled(int pkey)
{
u64 uamor = read_uamor();
@@ -353,7 +363,6 @@ void thread_pkey_regs_save(struct thread_struct *thread)
*/
thread->amr = read_amr();
thread->iamr = read_iamr();
- thread->uamor = read_uamor();
}
void thread_pkey_regs_restore(struct thread_struct *new_thread,
@@ -366,8 +375,6 @@ void thread_pkey_regs_restore(struct thread_struct *new_thread,
write_amr(new_thread->amr);
if (old_thread->iamr != new_thread->iamr)
write_iamr(new_thread->iamr);
- if (old_thread->uamor != new_thread->uamor)
- write_uamor(new_thread->uamor);
}
void thread_pkey_regs_init(struct thread_struct *thread)
@@ -377,11 +384,9 @@ void thread_pkey_regs_init(struct thread_struct *thread)
thread->amr = default_amr;
thread->iamr = default_iamr;
- thread->uamor = default_uamor;
write_amr(default_amr);
write_iamr(default_iamr);
- write_uamor(default_uamor);
}
int execute_only_pkey(struct mm_struct *mm)
--
2.26.2
^ permalink raw reply related
* [PATCH v5 24/26] powerpc/selftest/ptrave-pkey: Rename variables to make it easier to follow code
From: Aneesh Kumar K.V @ 2020-06-19 13:58 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram, bauerman
In-Reply-To: <20200619135850.47155-1-aneesh.kumar@linux.ibm.com>
Rename variable to indicate that they are invalid values which we will use to
test ptrace update of pkeys.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
.../selftests/powerpc/ptrace/ptrace-pkey.c | 26 +++++++++----------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index bdbbbe8431e0..f9216c7a1829 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -44,7 +44,7 @@ struct shared_info {
unsigned long amr2;
/* AMR value that ptrace should refuse to write to the child. */
- unsigned long amr3;
+ unsigned long invalid_amr;
/* IAMR value the parent expects to read from the child. */
unsigned long expected_iamr;
@@ -57,8 +57,8 @@ struct shared_info {
* (even though they're valid ones) because userspace doesn't have
* access to those registers.
*/
- unsigned long new_iamr;
- unsigned long new_uamor;
+ unsigned long invalid_iamr;
+ unsigned long invalid_uamor;
};
static int sys_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
@@ -100,7 +100,7 @@ static int child(struct shared_info *info)
info->amr1 |= 3ul << pkeyshift(pkey1);
info->amr2 |= 3ul << pkeyshift(pkey2);
- info->amr3 |= info->amr2 | 3ul << pkeyshift(pkey3);
+ info->invalid_amr |= info->amr2 | 3ul << pkeyshift(pkey3);
if (disable_execute)
info->expected_iamr |= 1ul << pkeyshift(pkey1);
@@ -111,8 +111,8 @@ static int child(struct shared_info *info)
info->expected_uamor |= 3ul << pkeyshift(pkey1) |
3ul << pkeyshift(pkey2);
- info->new_iamr |= 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
- info->new_uamor |= 3ul << pkeyshift(pkey1);
+ info->invalid_iamr |= 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
+ info->invalid_uamor |= 3ul << pkeyshift(pkey1);
/*
* We won't use pkey3. We just want a plausible but invalid key to test
@@ -196,9 +196,9 @@ static int parent(struct shared_info *info, pid_t pid)
PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync);
PARENT_FAIL_IF(ret, &info->child_sync);
- info->amr1 = info->amr2 = info->amr3 = regs[0];
- info->expected_iamr = info->new_iamr = regs[1];
- info->expected_uamor = info->new_uamor = regs[2];
+ info->amr1 = info->amr2 = info->invalid_amr = regs[0];
+ info->expected_iamr = info->invalid_iamr = regs[1];
+ info->expected_uamor = info->invalid_uamor = regs[2];
/* Wake up child so that it can set itself up. */
ret = prod_child(&info->child_sync);
@@ -234,10 +234,10 @@ static int parent(struct shared_info *info, pid_t pid)
return ret;
/* Write invalid AMR value in child. */
- ret = ptrace_write_regs(pid, NT_PPC_PKEY, &info->amr3, 1);
+ ret = ptrace_write_regs(pid, NT_PPC_PKEY, &info->invalid_amr, 1);
PARENT_FAIL_IF(ret, &info->child_sync);
- printf("%-30s AMR: %016lx\n", ptrace_write_running, info->amr3);
+ printf("%-30s AMR: %016lx\n", ptrace_write_running, info->invalid_amr);
/* Wake up child so that it can verify it didn't change. */
ret = prod_child(&info->child_sync);
@@ -249,7 +249,7 @@ static int parent(struct shared_info *info, pid_t pid)
/* Try to write to IAMR. */
regs[0] = info->amr1;
- regs[1] = info->new_iamr;
+ regs[1] = info->invalid_iamr;
ret = ptrace_write_regs(pid, NT_PPC_PKEY, regs, 2);
PARENT_FAIL_IF(!ret, &info->child_sync);
@@ -257,7 +257,7 @@ static int parent(struct shared_info *info, pid_t pid)
ptrace_write_running, regs[0], regs[1]);
/* Try to write to IAMR and UAMOR. */
- regs[2] = info->new_uamor;
+ regs[2] = info->invalid_uamor;
ret = ptrace_write_regs(pid, NT_PPC_PKEY, regs, 3);
PARENT_FAIL_IF(!ret, &info->child_sync);
--
2.26.2
^ permalink raw reply related
* [PATCH v5 25/26] powerpc/selftest/ptrace-pkey: Update the test to mark an invalid pkey correctly
From: Aneesh Kumar K.V @ 2020-06-19 13:58 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram, bauerman
In-Reply-To: <20200619135850.47155-1-aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
.../selftests/powerpc/ptrace/ptrace-pkey.c | 30 ++++++++-----------
1 file changed, 12 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index f9216c7a1829..bc33d748d95b 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -66,11 +66,6 @@ static int sys_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
return syscall(__NR_pkey_alloc, flags, init_access_rights);
}
-static int sys_pkey_free(int pkey)
-{
- return syscall(__NR_pkey_free, pkey);
-}
-
static int child(struct shared_info *info)
{
unsigned long reg;
@@ -100,7 +95,11 @@ static int child(struct shared_info *info)
info->amr1 |= 3ul << pkeyshift(pkey1);
info->amr2 |= 3ul << pkeyshift(pkey2);
- info->invalid_amr |= info->amr2 | 3ul << pkeyshift(pkey3);
+ /*
+ * invalid amr value where we try to force write
+ * things which are deined by a uamor setting.
+ */
+ info->invalid_amr = info->amr2 | (~0x0UL & ~info->expected_uamor);
if (disable_execute)
info->expected_iamr |= 1ul << pkeyshift(pkey1);
@@ -111,17 +110,12 @@ static int child(struct shared_info *info)
info->expected_uamor |= 3ul << pkeyshift(pkey1) |
3ul << pkeyshift(pkey2);
- info->invalid_iamr |= 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
- info->invalid_uamor |= 3ul << pkeyshift(pkey1);
-
/*
- * We won't use pkey3. We just want a plausible but invalid key to test
- * whether ptrace will let us write to AMR bits we are not supposed to.
- *
- * This also tests whether the kernel restores the UAMOR permissions
- * after a key is freed.
+ * Create an IAMR value different from expected value.
+ * Kernel will reject an IAMR and UAMOR change.
*/
- sys_pkey_free(pkey3);
+ info->invalid_iamr = info->expected_iamr | (1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2));
+ info->invalid_uamor = info->expected_uamor & ~(0x3ul << pkeyshift(pkey1));
printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
user_write, info->amr1, pkey1, pkey2, pkey3);
@@ -196,9 +190,9 @@ static int parent(struct shared_info *info, pid_t pid)
PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync);
PARENT_FAIL_IF(ret, &info->child_sync);
- info->amr1 = info->amr2 = info->invalid_amr = regs[0];
- info->expected_iamr = info->invalid_iamr = regs[1];
- info->expected_uamor = info->invalid_uamor = regs[2];
+ info->amr1 = info->amr2 = regs[0];
+ info->expected_iamr = regs[1];
+ info->expected_uamor = regs[2];
/* Wake up child so that it can set itself up. */
ret = prod_child(&info->child_sync);
--
2.26.2
^ permalink raw reply related
* [PATCH v5 26/26] powerpc/selftest/ptrace-pkey: IAMR and uamor cannot be updated by ptrace
From: Aneesh Kumar K.V @ 2020-06-19 13:58 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram, bauerman
In-Reply-To: <20200619135850.47155-1-aneesh.kumar@linux.ibm.com>
Both IAMR and uamor are privileged and cannot be updated by userspace. Hence
we also don't allow ptrace interface to update them. Don't update them in the
test. Also expected_iamr is only changed if we can allocate a DISABLE_EXECUTE
pkey.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index bc33d748d95b..5c3c8222de46 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -101,15 +101,12 @@ static int child(struct shared_info *info)
*/
info->invalid_amr = info->amr2 | (~0x0UL & ~info->expected_uamor);
+ /*
+ * if PKEY_DISABLE_EXECUTE succeeded we should update the expected_iamr
+ */
if (disable_execute)
info->expected_iamr |= 1ul << pkeyshift(pkey1);
- else
- info->expected_iamr &= ~(1ul << pkeyshift(pkey1));
-
- info->expected_iamr &= ~(1ul << pkeyshift(pkey2) | 1ul << pkeyshift(pkey3));
- info->expected_uamor |= 3ul << pkeyshift(pkey1) |
- 3ul << pkeyshift(pkey2);
/*
* Create an IAMR value different from expected value.
* Kernel will reject an IAMR and UAMOR change.
--
2.26.2
^ permalink raw reply related
* Re: linux-next: manual merge of the pidfd tree with the powerpc-fixes tree
From: Christian Brauner @ 2020-06-19 14:01 UTC (permalink / raw)
To: Michael Ellerman
Cc: Stephen Rothwell, Linux Next Mailing List, PowerPC,
Linux Kernel Mailing List, Christian Brauner
In-Reply-To: <878sgjcnjp.fsf@mpe.ellerman.id.au>
On Fri, Jun 19, 2020 at 09:17:30PM +1000, Michael Ellerman wrote:
> Stephen Rothwell <sfr@canb.auug.org.au> writes:
> > Hi all,
> >
> > Today's linux-next merge of the pidfd tree got a conflict in:
> >
> > arch/powerpc/kernel/syscalls/syscall.tbl
> >
> > between commit:
> >
> > 35e32a6cb5f6 ("powerpc/syscalls: Split SPU-ness out of ABI")
> >
> > from the powerpc-fixes tree and commit:
> >
> > 9b4feb630e8e ("arch: wire-up close_range()")
> >
> > from the pidfd tree.
> >
> > I fixed it up (see below) and can carry the fix as necessary. This
> > is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging. You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
>
> Thanks.
>
> I thought the week between rc1 and rc2 would be a safe time to do that
> conversion of the syscall table, but I guess I was wrong :)
:)
>
> I'm planning to send those changes to Linus for rc2, so the conflict
> will then be vs mainline. But I guess it's pretty trivial so it doesn't
> really matter.
close_range() is targeted for the v5.9 merge window. I always do
test-merges with mainline at the time I'm creating a pr and I'll just
mention to Linus that there's conflict with ppc. :)
Thanks!
Christian
^ permalink raw reply
* [PATCH v5] ocxl: control via sysfs whether the FPGA is reloaded on a link reset
From: Frederic Barrat @ 2020-06-19 14:04 UTC (permalink / raw)
To: linuxppc-dev, clombard, ajd, alastair
From: Philippe Bergheaud <felix@linux.ibm.com>
Some opencapi FPGA images allow to control if the FPGA should be reloaded
on the next adapter reset. If it is supported, the image specifies it
through a Vendor Specific DVSEC in the config space of function 0.
Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
Changelog:
v2:
- refine ResetReload debug message
- do not call get_function_0() if pci_dev is for function 0
v3:
- avoid get_function_0() in ocxl_config_set_reset_reload also
v4:
- simplify parsing of Vendor Specific DVSEC during AFU init
- only set/unset bit 0 of the config space register
- commonize code to fetch the right PCI function and DVSEC offset
- use kstrtoint() when parsing the sysfs buffer
v5:
- update documentation (Andrew)
Documentation/ABI/testing/sysfs-class-ocxl | 11 +++
drivers/misc/ocxl/config.c | 81 ++++++++++++++++++++--
drivers/misc/ocxl/ocxl_internal.h | 6 ++
drivers/misc/ocxl/sysfs.c | 35 ++++++++++
include/misc/ocxl-config.h | 1 +
5 files changed, 129 insertions(+), 5 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-class-ocxl b/Documentation/ABI/testing/sysfs-class-ocxl
index b5b1fa197592..ae1276efa45a 100644
--- a/Documentation/ABI/testing/sysfs-class-ocxl
+++ b/Documentation/ABI/testing/sysfs-class-ocxl
@@ -33,3 +33,14 @@ Date: January 2018
Contact: linuxppc-dev@lists.ozlabs.org
Description: read/write
Give access the global mmio area for the AFU
+
+What: /sys/class/ocxl/<afu name>/reload_on_reset
+Date: February 2020
+Contact: linuxppc-dev@lists.ozlabs.org
+Description: read/write
+ Control whether the FPGA is reloaded on a link reset. Enabled
+ through a vendor-specific logic block on the FPGA.
+ 0 Do not reload FPGA image from flash
+ 1 Reload FPGA image from flash
+ unavailable
+ The device does not support this capability
diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..42f7a1298775 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -71,6 +71,20 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
return 0;
}
+/**
+ * get_function_0() - Find a related PCI device (function 0)
+ * @device: PCI device to match
+ *
+ * Returns a pointer to the related device, or null if not found
+ */
+static struct pci_dev *get_function_0(struct pci_dev *dev)
+{
+ unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
+
+ return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
+ dev->bus->number, devfn);
+}
+
static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
{
u16 val;
@@ -159,14 +173,15 @@ static int read_dvsec_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn)
static int read_dvsec_vendor(struct pci_dev *dev)
{
int pos;
- u32 cfg, tlx, dlx;
+ u32 cfg, tlx, dlx, reset_reload;
/*
- * vendor specific DVSEC is optional
+ * vendor specific DVSEC, for IBM images only. Some older
+ * images may not have it
*
- * It's currently only used on function 0 to specify the
- * version of some logic blocks. Some older images may not
- * even have it so we ignore any errors
+ * It's only used on function 0 to specify the version of some
+ * logic blocks and to give access to special registers to
+ * enable host-based flashing.
*/
if (PCI_FUNC(dev->devfn) != 0)
return 0;
@@ -178,11 +193,67 @@ static int read_dvsec_vendor(struct pci_dev *dev)
pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_CFG_VERS, &cfg);
pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_TLX_VERS, &tlx);
pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_DLX_VERS, &dlx);
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_RESET_RELOAD,
+ &reset_reload);
dev_dbg(&dev->dev, "Vendor specific DVSEC:\n");
dev_dbg(&dev->dev, " CFG version = 0x%x\n", cfg);
dev_dbg(&dev->dev, " TLX version = 0x%x\n", tlx);
dev_dbg(&dev->dev, " DLX version = 0x%x\n", dlx);
+ dev_dbg(&dev->dev, " ResetReload = 0x%x\n", reset_reload);
+ return 0;
+}
+
+static int get_dvsec_vendor0(struct pci_dev *dev, struct pci_dev **dev0,
+ int *out_pos)
+{
+ int pos;
+
+ if (PCI_FUNC(dev->devfn) != 0) {
+ dev = get_function_0(dev);
+ if (!dev)
+ return -1;
+ }
+ pos = find_dvsec(dev, OCXL_DVSEC_VENDOR_ID);
+ if (!pos)
+ return -1;
+ *dev0 = dev;
+ *out_pos = pos;
+ return 0;
+}
+
+int ocxl_config_get_reset_reload(struct pci_dev *dev, int *val)
+{
+ struct pci_dev *dev0;
+ u32 reset_reload;
+ int pos;
+
+ if (get_dvsec_vendor0(dev, &dev0, &pos))
+ return -1;
+
+ pci_read_config_dword(dev0, pos + OCXL_DVSEC_VENDOR_RESET_RELOAD,
+ &reset_reload);
+ *val = !!(reset_reload & BIT(0));
+ return 0;
+}
+
+int ocxl_config_set_reset_reload(struct pci_dev *dev, int val)
+{
+ struct pci_dev *dev0;
+ u32 reset_reload;
+ int pos;
+
+ if (get_dvsec_vendor0(dev, &dev0, &pos))
+ return -1;
+
+ pci_read_config_dword(dev0, pos + OCXL_DVSEC_VENDOR_RESET_RELOAD,
+ &reset_reload);
+ if (val)
+ reset_reload |= BIT(0);
+ else
+ reset_reload &= ~BIT(0);
+ pci_write_config_dword(dev0, pos + OCXL_DVSEC_VENDOR_RESET_RELOAD,
+ reset_reload);
return 0;
}
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 345bf843a38e..af9a84aeee6f 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -112,6 +112,12 @@ void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
*/
int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
+/*
+ * Control whether the FPGA is reloaded on a link reset
+ */
+int ocxl_config_get_reset_reload(struct pci_dev *dev, int *val);
+int ocxl_config_set_reset_reload(struct pci_dev *dev, int val);
+
/*
* Check if an AFU index is valid for the given function.
*
diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
index 58f1ba264206..25c78df8055d 100644
--- a/drivers/misc/ocxl/sysfs.c
+++ b/drivers/misc/ocxl/sysfs.c
@@ -51,11 +51,46 @@ static ssize_t contexts_show(struct device *device,
afu->pasid_count, afu->pasid_max);
}
+static ssize_t reload_on_reset_show(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct ocxl_afu *afu = to_afu(device);
+ struct ocxl_fn *fn = afu->fn;
+ struct pci_dev *pci_dev = to_pci_dev(fn->dev.parent);
+ int val;
+
+ if (ocxl_config_get_reset_reload(pci_dev, &val))
+ return scnprintf(buf, PAGE_SIZE, "unavailable\n");
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t reload_on_reset_store(struct device *device,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ocxl_afu *afu = to_afu(device);
+ struct ocxl_fn *fn = afu->fn;
+ struct pci_dev *pci_dev = to_pci_dev(fn->dev.parent);
+ int rc, val;
+
+ rc = kstrtoint(buf, 0, &val);
+ if (rc || (val != 0 && val != 1))
+ return -EINVAL;
+
+ if (ocxl_config_set_reset_reload(pci_dev, val))
+ return -ENODEV;
+
+ return count;
+}
+
static struct device_attribute afu_attrs[] = {
__ATTR_RO(global_mmio_size),
__ATTR_RO(pp_mmio_size),
__ATTR_RO(afu_version),
__ATTR_RO(contexts),
+ __ATTR_RW(reload_on_reset),
};
static ssize_t global_mmio_read(struct file *filp, struct kobject *kobj,
diff --git a/include/misc/ocxl-config.h b/include/misc/ocxl-config.h
index 3526fa996a22..ccfd3b463517 100644
--- a/include/misc/ocxl-config.h
+++ b/include/misc/ocxl-config.h
@@ -41,5 +41,6 @@
#define OCXL_DVSEC_VENDOR_CFG_VERS 0x0C
#define OCXL_DVSEC_VENDOR_TLX_VERS 0x10
#define OCXL_DVSEC_VENDOR_DLX_VERS 0x20
+#define OCXL_DVSEC_VENDOR_RESET_RELOAD 0x38
#endif /* _OCXL_CONFIG_H_ */
--
2.26.2
^ permalink raw reply related
* [PATCH v1 3/8] powerpc: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
User space stops at TASK_SIZE. At the moment, kernel space starts
at PAGE_OFFSET.
In order to use space between TASK_SIZE and PAGE_OFFSET for modules,
make TASK_SIZE the limit between user and kernel space.
Note that fault.c already considers TASK_SIZE as the boundary between
user and kernel space.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/page.h | 2 +-
arch/powerpc/mm/ptdump/ptdump.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index a63fe6f3a0ff..352a2b80d505 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -256,7 +256,7 @@ static inline bool pfn_valid(unsigned long pfn)
#ifdef CONFIG_PPC_BOOK3E_64
#define is_kernel_addr(x) ((x) >= 0x8000000000000000ul)
#else
-#define is_kernel_addr(x) ((x) >= PAGE_OFFSET)
+#define is_kernel_addr(x) ((x) >= TASK_SIZE)
#endif
#ifndef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index b71cc628facd..e995f2e9e9f7 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -351,7 +351,7 @@ static void populate_markers(void)
{
int i = 0;
- address_markers[i++].start_address = PAGE_OFFSET;
+ address_markers[i++].start_address = TASK_SIZE;
address_markers[i++].start_address = VMALLOC_START;
address_markers[i++].start_address = VMALLOC_END;
#ifdef CONFIG_PPC64
@@ -388,7 +388,7 @@ static int ptdump_show(struct seq_file *m, void *v)
struct pg_state st = {
.seq = m,
.marker = address_markers,
- .start_address = PAGE_OFFSET,
+ .start_address = TASK_SIZE,
};
#ifdef CONFIG_PPC64
@@ -432,7 +432,7 @@ void ptdump_check_wx(void)
.seq = NULL,
.marker = address_markers,
.check_wx = true,
- .start_address = PAGE_OFFSET,
+ .start_address = TASK_SIZE,
};
#ifdef CONFIG_PPC64
--
2.25.0
^ permalink raw reply related
* [PATCH v1 1/8] powerpc/ptdump: Refactor update of st->last_pa
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
st->last_pa is always updated in note_page() so it can
be done outside the if/elseif/else block.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/ptdump/ptdump.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index de6e05ef871c..d5e42b958e86 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -207,7 +207,6 @@ static void note_page(struct pg_state *st, unsigned long addr,
st->current_flags = flag;
st->start_address = addr;
st->start_pa = pa;
- st->last_pa = pa;
st->page_size = page_size;
pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
/*
@@ -247,13 +246,11 @@ static void note_page(struct pg_state *st, unsigned long addr,
}
st->start_address = addr;
st->start_pa = pa;
- st->last_pa = pa;
st->page_size = page_size;
st->current_flags = flag;
st->level = level;
- } else {
- st->last_pa = pa;
}
+ st->last_pa = pa;
}
static void walk_pte(struct pg_state *st, pmd_t *pmd, unsigned long start)
--
2.25.0
^ permalink raw reply related
* [PATCH v1 0/8] powerpc/32s: Allocate modules outside of vmalloc space for STRICT_KERNEL_RWX
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
On book3s32 (hash), exec protection is set per 256Mb segments with NX bit.
Instead of clearing NX bit on vmalloc space when CONFIG_MODULES is selected,
allocate modules in a dedicated segment (0xb0000000-0xbfffffff by default).
This allows to keep exec protection on vmalloc space while allowing exec
on modules.
Christophe Leroy (8):
powerpc/ptdump: Refactor update of st->last_pa
powerpc/ptdump: Refactor update of pg_state
powerpc: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET
powerpc/lib: Prepare code-patching for modules allocated outside
vmalloc space
powerpc: Use MODULES_VADDR if defined
powerpc/32s: Only leave NX unset on segments used for modules
powerpc/32s: Kernel space starts at TASK_SIZE
powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/book3s/32/pgtable.h | 15 ++----
arch/powerpc/include/asm/page.h | 2 +-
arch/powerpc/kernel/head_32.S | 12 ++---
arch/powerpc/kernel/module.c | 11 ++++
arch/powerpc/lib/code-patching.c | 2 +-
arch/powerpc/mm/book3s32/hash_low.S | 2 +-
arch/powerpc/mm/book3s32/mmu.c | 17 +++++--
arch/powerpc/mm/kasan/kasan_init_32.c | 6 +++
arch/powerpc/mm/ptdump/ptdump.c | 53 ++++++++++++--------
10 files changed, 78 insertions(+), 43 deletions(-)
--
2.25.0
^ permalink raw reply
* [PATCH v1 4/8] powerpc/lib: Prepare code-patching for modules allocated outside vmalloc space
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
Use is_vmalloc_or_module_addr() instead of is_vmalloc_addr()
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/lib/code-patching.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 0a051dfeb177..8c3934ea6220 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -93,7 +93,7 @@ static int map_patch_area(void *addr, unsigned long text_poke_addr)
unsigned long pfn;
int err;
- if (is_vmalloc_addr(addr))
+ if (is_vmalloc_or_module_addr(addr))
pfn = vmalloc_to_pfn(addr);
else
pfn = __pa_symbol(addr) >> PAGE_SHIFT;
--
2.25.0
^ permalink raw reply related
* [PATCH v1 2/8] powerpc/ptdump: Refactor update of pg_state
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
In note_page(), the pg_state is updated the same way in two places.
Add note_page_update_state() to do it.
Also include the display of boundary markers there as it is missing
"no level" leg, leading to a mismatch when the first two markers
are at the same address and the first displayed area uses that
address.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/ptdump/ptdump.c | 34 +++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index d5e42b958e86..b71cc628facd 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -195,6 +195,24 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
}
+static void note_page_update_state(struct pg_state *st, unsigned long addr,
+ unsigned int level, u64 val, unsigned long page_size)
+{
+ u64 flag = val & pg_level[level].mask;
+ u64 pa = val & PTE_RPN_MASK;
+
+ st->level = level;
+ st->current_flags = flag;
+ st->start_address = addr;
+ st->start_pa = pa;
+ st->page_size = page_size;
+
+ while (addr >= st->marker[1].start_address) {
+ st->marker++;
+ pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
+ }
+}
+
static void note_page(struct pg_state *st, unsigned long addr,
unsigned int level, u64 val, unsigned long page_size)
{
@@ -203,12 +221,8 @@ static void note_page(struct pg_state *st, unsigned long addr,
/* At first no level is set */
if (!st->level) {
- st->level = level;
- st->current_flags = flag;
- st->start_address = addr;
- st->start_pa = pa;
- st->page_size = page_size;
pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
+ note_page_update_state(st, addr, level, val, page_size);
/*
* Dump the section of virtual memory when:
* - the PTE flags from one entry to the next differs.
@@ -240,15 +254,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
* Address indicates we have passed the end of the
* current section of virtual memory
*/
- while (addr >= st->marker[1].start_address) {
- st->marker++;
- pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
- }
- st->start_address = addr;
- st->start_pa = pa;
- st->page_size = page_size;
- st->current_flags = flag;
- st->level = level;
+ note_page_update_state(st, addr, level, val, page_size);
}
st->last_pa = pa;
}
--
2.25.0
^ permalink raw reply related
* [PATCH v1 8/8] powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
When STRICT_KERNEL_RWX is set, we want to set NX bit on vmalloc
segments. But modules require exec.
Use a dedicated segment for modules. There is not much space
above kernel, and we don't waste vmalloc space to do alignment.
Therefore, we take the segment before PAGE_OFFSET for modules.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/book3s/32/pgtable.h | 15 +++++----------
arch/powerpc/mm/ptdump/ptdump.c | 8 ++++++++
3 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..2ba6ac9da46f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1198,6 +1198,7 @@ config TASK_SIZE_BOOL
config TASK_SIZE
hex "Size of user task space" if TASK_SIZE_BOOL
default "0x80000000" if PPC_8xx
+ default "0xb0000000" if PPC_BOOK3S_32 && STRICT_KERNEL_RWX
default "0xc0000000"
endmenu
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 224912432821..36443cda8dcf 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -184,17 +184,7 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
*/
#define VMALLOC_OFFSET (0x1000000) /* 16M */
-/*
- * With CONFIG_STRICT_KERNEL_RWX, kernel segments are set NX. But when modules
- * are used, NX cannot be set on VMALLOC space. So vmalloc VM space and linear
- * memory shall not share segments.
- */
-#if defined(CONFIG_STRICT_KERNEL_RWX) && defined(CONFIG_MODULES)
-#define VMALLOC_START ((ALIGN((long)high_memory, 256L << 20) + VMALLOC_OFFSET) & \
- ~(VMALLOC_OFFSET - 1))
-#else
#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
-#endif
#ifdef CONFIG_KASAN_VMALLOC
#define VMALLOC_END ALIGN_DOWN(ioremap_bot, PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
@@ -202,6 +192,11 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
#define VMALLOC_END ioremap_bot
#endif
+#ifdef CONFIG_STRICT_KERNEL_RWX
+#define MODULES_END ALIGN_DOWN(PAGE_OFFSET, SZ_256M)
+#define MODULES_VADDR (MODULES_END - SZ_256M)
+#endif
+
#ifndef __ASSEMBLY__
#include <linux/sched.h>
#include <linux/threads.h>
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index e995f2e9e9f7..51aab1b7be31 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -74,6 +74,10 @@ struct addr_marker {
static struct addr_marker address_markers[] = {
{ 0, "Start of kernel VM" },
+#ifdef MODULES_VADDR
+ { 0, "modules start" },
+ { 0, "modules end" },
+#endif
{ 0, "vmalloc() Area" },
{ 0, "vmalloc() End" },
#ifdef CONFIG_PPC64
@@ -352,6 +356,10 @@ static void populate_markers(void)
int i = 0;
address_markers[i++].start_address = TASK_SIZE;
+#ifdef MODULES_VADDR
+ address_markers[i++].start_address = MODULES_VADDR;
+ address_markers[i++].start_address = MODULES_END;
+#endif
address_markers[i++].start_address = VMALLOC_START;
address_markers[i++].start_address = VMALLOC_END;
#ifdef CONFIG_PPC64
--
2.25.0
^ permalink raw reply related
* [PATCH v1 7/8] powerpc/32s: Kernel space starts at TASK_SIZE
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
Kernel space starts at TASK_SIZE. Select kernel page table
when address is over TASK_SIZE.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/head_32.S | 12 ++++++------
arch/powerpc/mm/book3s32/hash_low.S | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 705c042309d8..bbef6ce8322b 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -474,7 +474,7 @@ InstructionTLBMiss:
/* Get PTE (linux-style) and check access */
mfspr r3,SPRN_IMISS
#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
- lis r1,PAGE_OFFSET@h /* check if kernel address */
+ lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
#endif
mfspr r2, SPRN_SPRG_PGDIR
@@ -484,7 +484,7 @@ InstructionTLBMiss:
li r1,_PAGE_PRESENT | _PAGE_EXEC
#endif
#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
- bge- 112f
+ bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
#endif
@@ -541,7 +541,7 @@ DataLoadTLBMiss:
*/
/* Get PTE (linux-style) and check access */
mfspr r3,SPRN_DMISS
- lis r1,PAGE_OFFSET@h /* check if kernel address */
+ lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
mfspr r2, SPRN_SPRG_PGDIR
#ifdef CONFIG_SWAP
@@ -549,7 +549,7 @@ DataLoadTLBMiss:
#else
li r1, _PAGE_PRESENT
#endif
- bge- 112f
+ bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
@@ -621,7 +621,7 @@ DataStoreTLBMiss:
*/
/* Get PTE (linux-style) and check access */
mfspr r3,SPRN_DMISS
- lis r1,PAGE_OFFSET@h /* check if kernel address */
+ lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
mfspr r2, SPRN_SPRG_PGDIR
#ifdef CONFIG_SWAP
@@ -629,7 +629,7 @@ DataStoreTLBMiss:
#else
li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT
#endif
- bge- 112f
+ bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 923ad8f374eb..1690d369688b 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -62,7 +62,7 @@ _GLOBAL(hash_page)
isync
#endif
/* Get PTE (linux-style) and check access */
- lis r0,KERNELBASE@h /* check if kernel address */
+ lis r0, TASK_SIZE@h /* check if kernel address */
cmplw 0,r4,r0
ori r3,r3,_PAGE_USER|_PAGE_PRESENT /* test low addresses as user */
mfspr r5, SPRN_SPRG_PGDIR /* phys page-table root */
--
2.25.0
^ permalink raw reply related
* [PATCH v1 5/8] powerpc: Use MODULES_VADDR if defined
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
In order to allow allocation of modules outside of vmalloc space,
use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined.
Redefine module_alloc() when MODULES_VADDR defined.
Unmap corresponding KASAN shadow memory.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/module.c | 11 +++++++++++
arch/powerpc/mm/kasan/kasan_init_32.c | 6 ++++++
2 files changed, 17 insertions(+)
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index df649acb5631..a211b0253cdb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -86,3 +86,14 @@ int module_finalize(const Elf_Ehdr *hdr,
return 0;
}
+
+#ifdef MODULES_VADDR
+void *module_alloc(unsigned long size)
+{
+ BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
+
+ return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, GFP_KERNEL,
+ PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+ __builtin_return_address(0));
+}
+#endif
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 0760e1e754e4..f1bc267d42af 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -115,6 +115,12 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
unsigned long k_end = (unsigned long)kasan_mem_to_shadow((void *)VMALLOC_END);
kasan_update_early_region(k_start, k_end, __pte(0));
+
+#ifdef MODULES_VADDR
+ k_start = (unsigned long)kasan_mem_to_shadow((void *)MODULES_VADDR);
+ k_end = (unsigned long)kasan_mem_to_shadow((void *)MODULES_END);
+ kasan_update_early_region(k_start, k_end, __pte(0));
+#endif
}
static void __init kasan_mmu_init(void)
--
2.25.0
^ permalink raw reply related
* [PATCH v1 6/8] powerpc/32s: Only leave NX unset on segments used for modules
From: Christophe Leroy @ 2020-06-19 15:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1592578278.git.christophe.leroy@csgroup.eu>
Instead of leaving NX unset on all segments above the start
of vmalloc space, only leave NX unset on segments used for
modules.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/book3s32/mmu.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 03b6ba54460e..c0162911f6cb 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -187,6 +187,17 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
return __mmu_mapin_ram(border, top);
}
+static bool is_module_segment(unsigned long addr)
+{
+ if (!IS_ENABLED(CONFIG_MODULES))
+ return false;
+ if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
+ return false;
+ if (addr >= ALIGN(VMALLOC_END, SZ_256M))
+ return false;
+ return true;
+}
+
void mmu_mark_initmem_nx(void)
{
int nb = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
@@ -223,9 +234,9 @@ void mmu_mark_initmem_nx(void)
for (i = TASK_SIZE >> 28; i < 16; i++) {
/* Do not set NX on VM space for modules */
- if (IS_ENABLED(CONFIG_MODULES) &&
- (VMALLOC_START & 0xf0000000) == i << 28)
- break;
+ if (is_module_segment(i << 28))
+ continue;
+
mtsrin(mfsrin(i << 28) | 0x10000000, i << 28);
}
}
--
2.25.0
^ permalink raw reply related
* [PATCH] powerpc/pseries: new lparcfg key/value pair: partition_affinity_score
From: Scott Cheloha @ 2020-06-19 15:34 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch, Tyrel Datwyler
The H_GetPerformanceCounterInfo PHYP hypercall has a subcall,
Affinity_Domain_Info_By_Partition, which returns, among other things,
a "partition affinity score" for a given LPAR. This score, a value on
[0-100], represents the processor-memory affinity for the LPAR in
question. A score of 0 indicates the worst possible affinity while a
score of 100 indicates perfect affinity. The score can be used to
reason about performance.
This patch adds the score for the local LPAR to the lparcfg procfile
under a new 'partition_affinity_score' key.
The H_GetPerformanceCounterInfo hypercall is already used elsewhere in
the kernel, in powerpc/perf/hv-gpci.c. Refactoring that code and this
code into a more general API might be worthwhile if additional modules
require the hypercall in the future.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
---
arch/powerpc/platforms/pseries/lparcfg.c | 53 ++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index b8d28ab88178..b75151eee0f0 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -136,6 +136,57 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
return rc;
}
+/*
+ * Based on H_GetPerformanceCounterInfo v1.10.
+ */
+static void show_gpci_data(struct seq_file *m)
+{
+ struct perf_counter_info_params {
+ __be32 counter_request;
+ __be32 starting_index;
+ __be16 secondary_index;
+ __be16 returned_values;
+ __be32 detail_rc;
+ __be16 counter_value_element_size;
+ u8 counter_info_version_in;
+ u8 counter_info_version_out;
+ u8 reserved[0xC];
+ } __packed;
+ struct hv_gpci_request_buffer {
+ struct perf_counter_info_params params;
+ u8 output[4096 - sizeof(struct perf_counter_info_params)];
+ } __packed;
+ struct hv_gpci_request_buffer *buf;
+ long ret;
+ unsigned int affinity_score;
+
+ buf = kmalloc(sizeof(*buf), GFP_KERNEL);
+ if (buf == NULL)
+ return;
+
+ /*
+ * Show the local LPAR's affinity score.
+ *
+ * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall.
+ * The score is at byte 0xB in the output buffer.
+ */
+ memset(&buf->params, 0, sizeof(buf->params));
+ buf->params.counter_request = cpu_to_be32(0xB1);
+ buf->params.starting_index = cpu_to_be32(-1); /* local LPAR */
+ buf->params.counter_info_version_in = 0x5; /* v5+ for score */
+ ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf),
+ sizeof(*buf));
+ if (ret != H_SUCCESS) {
+ pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n",
+ ret, be32_to_cpu(buf->params.detail_rc));
+ goto out;
+ }
+ affinity_score = buf->output[0xB];
+ seq_printf(m, "partition_affinity_score=%u\n", affinity_score);
+out:
+ kfree(buf);
+}
+
static unsigned h_pic(unsigned long *pool_idle_time,
unsigned long *num_procs)
{
@@ -487,6 +538,8 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
partition_active_processors * 100);
}
+ show_gpci_data(m);
+
seq_printf(m, "partition_active_processors=%d\n",
partition_active_processors);
--
2.24.1
^ permalink raw reply related
* [PATCH 16/26] mm/powerpc: Use general page fault accounting
From: Peter Xu @ 2020-06-19 16:13 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Andrea Arcangeli, linuxppc-dev, Peter Xu, Linus Torvalds,
Paul Mackerras, Andrew Morton, Will Deacon, Gerald Schaefer
In-Reply-To: <20200619160538.8641-1-peterx@redhat.com>
Use the general page fault accounting by passing regs into handle_mm_fault().
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/powerpc/mm/fault.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 992b10c3761c..e325d13efaf5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -563,7 +563,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);
#ifdef CONFIG_PPC_MEM_KEYS
/*
@@ -604,14 +604,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
* Major/minor page fault accounting.
*/
- if (major) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+ if (major)
cmo_account_page_fault();
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
+
return 0;
}
NOKPROBE_SYMBOL(__do_page_fault);
--
2.26.2
^ permalink raw reply related
* Re: [PATCH] pci: pcie: AER: Fix logging of Correctable errors
From: Sinan Kaya @ 2020-06-19 17:17 UTC (permalink / raw)
To: Matt Jolly, Russell Currey, Sam Bobroff, Oliver O'Halloran,
Bjorn Helgaas, linuxppc-dev, linux-pci, linux-kernel
In-Reply-To: <20200618155511.16009-1-Kangie@footclan.ninja>
On 6/18/2020 11:55 AM, Matt Jolly wrote:
> + pci_warn(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
> + dev->vendor, dev->device,
> + info->status, info->mask);
> + } else {
<snip>
> + pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
> + dev->vendor, dev->device,
> + info->status, info->mask);
Function pointers for pci_warn vs. pci_err ?
This looks like a lot of copy/paste.
^ permalink raw reply
* [PATCH] hvc: unify console setup naming
From: Sergey Senozhatsky @ 2020-06-19 17:22 UTC (permalink / raw)
To: Petr Mladek
Cc: Greg Kroah-Hartman, linux-kernel, Steven Rostedt,
Sergey Senozhatsky, Jiri Slaby, Andy Shevchenko, linuxppc-dev
Use the 'common' foo_console_setup() naming scheme. There are 71
foo_console_setup() callbacks and only one foo_setup_console().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
---
drivers/tty/hvc/hvc_xen.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 5ef08905fe05..2a0e51a20e34 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -603,7 +603,7 @@ static void xen_hvm_early_write(uint32_t vtermno, const char *str, int len) { }
#endif
#ifdef CONFIG_EARLY_PRINTK
-static int __init xenboot_setup_console(struct console *console, char *string)
+static int __init xenboot_console_setup(struct console *console, char *string)
{
static struct xencons_info xenboot;
@@ -647,7 +647,7 @@ static void xenboot_write_console(struct console *console, const char *string,
struct console xenboot_console = {
.name = "xenboot",
.write = xenboot_write_console,
- .setup = xenboot_setup_console,
+ .setup = xenboot_console_setup,
.flags = CON_PRINTBUFFER | CON_BOOT | CON_ANYTIME,
.index = -1,
};
--
2.27.0
^ permalink raw reply related
* Re: [PATCH] pci: pcie: AER: Fix logging of Correctable errors
From: Joe Perches @ 2020-06-19 18:09 UTC (permalink / raw)
To: Sinan Kaya, Matt Jolly, Russell Currey, Sam Bobroff,
Oliver O'Halloran, Bjorn Helgaas, linuxppc-dev, linux-pci,
linux-kernel
In-Reply-To: <d611dabd-b943-8492-a29e-0b7fb1980de8@kernel.org>
On Fri, 2020-06-19 at 13:17 -0400, Sinan Kaya wrote:
> On 6/18/2020 11:55 AM, Matt Jolly wrote:
>
> > + pci_warn(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
> > + dev->vendor, dev->device,
> > + info->status, info->mask);
> > + } else {
>
> <snip>
>
> > + pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
> > + dev->vendor, dev->device,
> > + info->status, info->mask);
>
> Function pointers for pci_warn vs. pci_err ?
Not really possible as both are function-like macros.
^ permalink raw reply
* Re: [PATCH 00/22] ReST conversion patches (final?)
From: Jonathan Corbet @ 2020-06-19 20:20 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Rich Felker, Linux Doc Mailing List, David Airlie,
Catalin Marinas, Dragan Cvetic, linux-pci, Jarkko Sakkinen,
Bjorn Andersson, David Howells, linux-mm, Harry Wei,
Paul Mackerras, Alex Shi, Will Deacon, Javi Merino, Herbert Xu,
Yoshinori Sato, linux-sh, Anil S Keshavamurthy, Viresh Kumar,
Naveen N. Rao, Derek Kiernan, linux-crypto, Ohad Ben-Cohen,
devicetree, Daniel Vetter, Michael Hennerich, linux-pm,
linux-remoteproc, Maarten Lankhorst, Maxime Ripard, Rob Herring,
dri-devel, Bjorn Helgaas, Dan Williams, linux-arm-kernel,
Daniel Lezcano, Greg Kroah-Hartman, Amit Daniel Kachhap,
linux-kernel, David S. Miller, tee-dev, Vinod Koul, keyrings,
Arnd Bergmann, Masami Hiramatsu, Thomas Zimmermann, dmaengine,
Andrew Morton, linuxppc-dev, Jens Wiklander
In-Reply-To: <cover.1592203650.git.mchehab+huawei@kernel.org>
On Mon, 15 Jun 2020 08:50:05 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> That's my final(*) series of conversion patches from .txt to ReST.
>
> (*) Well, running the script I'm using to check, I noticed a couple of new *.txt files.
> If I have some time, I'll try to address those last pending things for v5.9.
OK, I've applied the set except for parts:
1: |copy| as mentioned before
18: because of the license boilerplate
19: doesn't apply at all (perhaps because of one of the above)
22: because I don't like the latex markup.
Also, I took the liberty of just reverting the |copy| change in #10.
Getting there..!
Thanks,
jon
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: new lparcfg key/value pair: partition_affinity_score
From: Tyrel Datwyler @ 2020-06-19 20:38 UTC (permalink / raw)
To: Scott Cheloha, linuxppc-dev; +Cc: Nathan Lynch
In-Reply-To: <20200619153419.3676392-1-cheloha@linux.ibm.com>
On 6/19/20 8:34 AM, Scott Cheloha wrote:
> The H_GetPerformanceCounterInfo PHYP hypercall has a subcall,
> Affinity_Domain_Info_By_Partition, which returns, among other things,
> a "partition affinity score" for a given LPAR. This score, a value on
> [0-100], represents the processor-memory affinity for the LPAR in
> question. A score of 0 indicates the worst possible affinity while a
> score of 100 indicates perfect affinity. The score can be used to
> reason about performance.
>
> This patch adds the score for the local LPAR to the lparcfg procfile
> under a new 'partition_affinity_score' key.
I expect that you will probably get a NACK from Michael on this. The overall
desire is to move away from these dated /proc interfaces. While its true that I
did add a new value recently it was strictly to facilitate and correct the
calculation of a derived value that was already dependent on a couple other
existing values in lparcfg.
With that said I would expect that you would likely be advised to expose this as
a sysfs attribute. The question is where? We probably should put some thought in
to this as I would like to port each lparcfg value over to sysfs so that we can
move to deprecating lparcfg. Putting everything under something like
/sys/kernel/lparcfg/* maybe. Michael may have a better suggestion.
>
> The H_GetPerformanceCounterInfo hypercall is already used elsewhere in
> the kernel, in powerpc/perf/hv-gpci.c. Refactoring that code and this
> code into a more general API might be worthwhile if additional modules
> require the hypercall in the future.
If you are duplicating code its likely you should already be doing this. See the
rest of my comments about below.
>
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> ---
> arch/powerpc/platforms/pseries/lparcfg.c | 53 ++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
> index b8d28ab88178..b75151eee0f0 100644
> --- a/arch/powerpc/platforms/pseries/lparcfg.c
> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
> @@ -136,6 +136,57 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
> return rc;
> }
>
> +/*
> + * Based on H_GetPerformanceCounterInfo v1.10.
> + */
> +static void show_gpci_data(struct seq_file *m)
> +{
> + struct perf_counter_info_params {
> + __be32 counter_request;
> + __be32 starting_index;
> + __be16 secondary_index;
> + __be16 returned_values;
> + __be32 detail_rc;
> + __be16 counter_value_element_size;
> + u8 counter_info_version_in;
> + u8 counter_info_version_out;
> + u8 reserved[0xC];
> + } __packed;
This looks to duplicate the hv_get_perf_counter_info_params struct from
arch/powerpc/perf/hv-gpci.h. Maybe this include file needs to move to
arch/powerpc/asm/inlcude so you don't have to redefine this struct.
> + struct hv_gpci_request_buffer {
> + struct perf_counter_info_params params;
> + u8 output[4096 - sizeof(struct perf_counter_info_params)];
> + } __packed;
This struct is code duplication of the one defined in
arch/powerpc/perf/hv-gpci.c and could be moved into hv-gpci.h along with
HGPCI_MAX_DATA_BYTES so that you can use those versions here.
> + struct hv_gpci_request_buffer *buf;
> + long ret;
> + unsigned int affinity_score;
> +
> + buf = kmalloc(sizeof(*buf), GFP_KERNEL);
> + if (buf == NULL)
> + return;
> +
> + /*
> + * Show the local LPAR's affinity score.
> + *
> + * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall.
> + * The score is at byte 0xB in the output buffer.
> + */
> + memset(&buf->params, 0, sizeof(buf->params));
> + buf->params.counter_request = cpu_to_be32(0xB1);
> + buf->params.starting_index = cpu_to_be32(-1); /* local LPAR */
> + buf->params.counter_info_version_in = 0x5; /* v5+ for score */
> + ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf),
> + sizeof(*buf));
> + if (ret != H_SUCCESS) {
> + pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n",
> + ret, be32_to_cpu(buf->params.detail_rc));
> + goto out;
> + }
> + affinity_score = buf->output[0xB];
> + seq_printf(m, "partition_affinity_score=%u\n", affinity_score);
> +out:
> + kfree(buf);
> +}
> +
IIUC we should already be able to get this value from userspace using perf tool,
right? If thats the case can't we also programatically retrieve it via the
perf_event interface in userspace as well?
-Tyrel
> static unsigned h_pic(unsigned long *pool_idle_time,
> unsigned long *num_procs)
> {
> @@ -487,6 +538,8 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
> partition_active_processors * 100);
> }
>
> + show_gpci_data(m);
> +
> seq_printf(m, "partition_active_processors=%d\n",
> partition_active_processors);
>
>
^ permalink raw reply
* Re: [PATCH 6/6] kernel: add a kernel_wait helper
From: Luis Chamberlain @ 2020-06-19 21:17 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton, Greg Kroah-Hartman
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Al Viro, sparclinux,
linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-7-hch@lst.de>
On Thu, Jun 18, 2020 at 04:46:27PM +0200, Christoph Hellwig wrote:
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1626,6 +1626,22 @@ long kernel_wait4(pid_t upid, int __user *stat_addr, int options,
> return ret;
> }
>
> +int kernel_wait(pid_t pid, int *stat)
> +{
> + struct wait_opts wo = {
> + .wo_type = PIDTYPE_PID,
> + .wo_pid = find_get_pid(pid),
> + .wo_flags = WEXITED,
> + };
> + int ret;
> +
> + ret = do_wait(&wo);
> + if (ret > 0 && wo.wo_stat)
> + *stat = wo.wo_stat;
Since all we care about is WEXITED, that could be simplified
to something like this:
if (ret > 0 && KWIFEXITED(wo.wo_stat)
*stat = KWEXITSTATUS(wo.wo_stat)
Otherwise callers have to use W*() wrappers.
> + put_pid(wo.wo_pid);
> + return ret;
> +}
Then we don't get *any* in-kernel code dealing with the W*() crap.
I just unwrapped this for the umh [0], given that otherwise we'd
have to use KW*() callers elsewhere. Doing it upshot one level
further would be even better.
[0] https://lkml.kernel.org/r/20200610154923.27510-1-mcgrof@kernel.org
Luis
^ permalink raw reply
* [PATCH v3 0/4] Migrate non-migrated pages of a SVM.
From: Ram Pai @ 2020-06-19 22:43 UTC (permalink / raw)
To: kvm-ppc, linuxppc-dev
Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
sukadev, bauerman, david
The time taken to switch a VM to Secure-VM, increases by the size of the VM. A
100GB VM takes about 7minutes. This is unacceptable. This linear increase is
caused by a suboptimal behavior by the Ultravisor and the Hypervisor. The
Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
secure-memory. It has to just migrate the necessary and sufficient GFNs.
However when the optimization is incorporated in the Ultravisor, the Hypervisor
starts misbehaving. The Hypervisor has a inbuilt assumption that the Ultravisor
will explicitly request to migrate, each and every GFN of the VM. If only
necessary and sufficient GFNs are requested for migration, the Hypervisor
continues to manage the remaining GFNs as normal GFNs. This leads of memory
corruption, manifested consistently when the SVM reboots.
The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
expects the ultravisor to request migration of all GFNs to secure-GFN. But at
the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall. This
problem manifests as random errors in the SVM, when a memory-slot is
hotplugged.
This patch series automatically migrates the non-migrated pages of a SVM,
and thus solves the problem.
Testing: Passed rigorous SVM test using various sized SVMs.
Changelog:
v2: . fixed a bug observed by Laurent. The state of the GFN's associated
with Secure-VMs were not reset during memslot flush.
. Re-organized the code, for easier review.
. Better description of the patch series.
v1: fixed a bug observed by Bharata. Pages that where paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.
Laurent Dufour (1):
KVM: PPC: Book3S HV: migrate hot plugged memory
Ram Pai (3):
KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
H_SVM_INIT_DONE
Documentation/powerpc/ultravisor.rst | 2 +
arch/powerpc/include/asm/kvm_book3s_uvmem.h | 2 +
arch/powerpc/kvm/book3s_hv.c | 10 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 360 +++++++++++++++++++++++-----
4 files changed, 315 insertions(+), 59 deletions(-)
--
1.8.3.1
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox