* Re: [PATCH] hw_breakpoint: Fix build warnings with clang
From: Christophe Leroy @ 2020-06-02 5:10 UTC (permalink / raw)
To: Ravi Bangoria, mpe
Cc: christophe.leroy, apopple, mikey, linux-kernel, npiggin, paulus,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200602041208.128913-1-ravi.bangoria@linux.ibm.com>
Le 02/06/2020 à 06:12, Ravi Bangoria a écrit :
> kbuild test robot reported few build warnings with hw_breakpoint code
> when compiled with clang[1]. Fix those.
>
> [1]: https://lore.kernel.org/linuxppc-dev/202005192233.oi9CjRtA%25lkp@intel.com/
>
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
> Note: Prepared on top of powerpc/next.
>
> arch/powerpc/include/asm/hw_breakpoint.h | 3 ---
> include/linux/hw_breakpoint.h | 4 ++++
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
> index f42a55eb77d2..cb424799da0d 100644
> --- a/arch/powerpc/include/asm/hw_breakpoint.h
> +++ b/arch/powerpc/include/asm/hw_breakpoint.h
> @@ -70,9 +70,6 @@ extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
> unsigned long val, void *data);
> int arch_install_hw_breakpoint(struct perf_event *bp);
> void arch_uninstall_hw_breakpoint(struct perf_event *bp);
> -int arch_reserve_bp_slot(struct perf_event *bp);
> -void arch_release_bp_slot(struct perf_event *bp);
> -void arch_unregister_hw_breakpoint(struct perf_event *bp);
> void hw_breakpoint_pmu_read(struct perf_event *bp);
> extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk);
>
> diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
> index 6058c3844a76..521481f0d320 100644
> --- a/include/linux/hw_breakpoint.h
> +++ b/include/linux/hw_breakpoint.h
> @@ -80,6 +80,10 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp);
> extern int dbg_release_bp_slot(struct perf_event *bp);
> extern int reserve_bp_slot(struct perf_event *bp);
> extern void release_bp_slot(struct perf_event *bp);
> +extern int hw_breakpoint_weight(struct perf_event *bp);
> +extern int arch_reserve_bp_slot(struct perf_event *bp);
> +extern void arch_release_bp_slot(struct perf_event *bp);
> +extern void arch_unregister_hw_breakpoint(struct perf_event *bp);
Please no new 'extern'. In the old days 'extern' keyword was used, but
new code shall not introduce new unnecessary usage of 'extern' keyword.
See report from Checkpatch below:
WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description
(prefer a maximum 75 chars per line)
#9:
[1]:
https://lore.kernel.org/linuxppc-dev/202005192233.oi9CjRtA%25lkp@intel.com/
CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#40: FILE: include/linux/hw_breakpoint.h:83:
+extern int hw_breakpoint_weight(struct perf_event *bp);
CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#41: FILE: include/linux/hw_breakpoint.h:84:
+extern int arch_reserve_bp_slot(struct perf_event *bp);
CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#42: FILE: include/linux/hw_breakpoint.h:85:
+extern void arch_release_bp_slot(struct perf_event *bp);
CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#43: FILE: include/linux/hw_breakpoint.h:86:
+extern void arch_unregister_hw_breakpoint(struct perf_event *bp);
total: 0 errors, 1 warnings, 4 checks, 19 lines checked
NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or
--fix-inplace.
the.patch has style problems, please review.
NOTE: Ignored message types: ARCH_INCLUDE_LINUX BIT_MACRO
COMPARISON_TO_NULL DT_SPLIT_BINDING_PATCH EMAIL_SUBJECT
FILE_PATH_CHANGES GLOBAL_INITIALISERS LINE_SPACING MULTIPLE_ASSIGNMENTS
NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Christophe
^ permalink raw reply
* Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)
From: Bhupesh Sharma @ 2020-06-02 5:24 UTC (permalink / raw)
To: linux-arm-kernel, x86
Cc: Mark Rutland, Linux Doc Mailing List, Paul Mackerras,
Amit Kachhap, Will Deacon, Ingo Molnar, Jonathan Corbet,
Catalin Marinas, John Donnelly, scott.branden, Boris Petkov,
Thomas Gleixner, Bhupesh SHARMA, Kazuhito Hagio, Ard Biesheuvel,
Steve Capper, kexec mailing list, Linux Kernel Mailing List,
James Morse, Dave Anderson, linuxppc-dev
In-Reply-To: <1589395957-24628-1-git-send-email-bhsharma@redhat.com>
Hello,
On Thu, May 14, 2020 at 12:22 AM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>
> Apologies for the delayed update. Its been quite some time since I
> posted the last version (v5), but I have been really caught up in some
> other critical issues.
>
> Changes since v5:
> ----------------
> - v5 can be viewed here:
> http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
> - Addressed review comments from James Morse and Boris.
> - Added Tested-by received from John on v5 patchset.
> - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
> patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
> applied.
>
> Changes since v4:
> ----------------
> - v4 can be seen here:
> http://lists.infradead.org/pipermail/kexec/2019-November/023961.html
> - Addressed comments from Dave and added patches for documenting
> new variables appended to vmcoreinfo documentation.
> - Added testing report shared by Akashi for PATCH 2/5.
>
> Changes since v3:
> ----------------
> - v3 can be seen here:
> http://lists.infradead.org/pipermail/kexec/2019-March/022590.html
> - Addressed comments from James and exported TCR_EL1.T1SZ in vmcoreinfo
> instead of PTRS_PER_PGD.
> - Added a new patch (via [PATCH 3/3]), which fixes a simple typo in
> 'Documentation/arm64/memory.rst'
>
> Changes since v2:
> ----------------
> - v2 can be seen here:
> http://lists.infradead.org/pipermail/kexec/2019-March/022531.html
> - Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM
> ifdef sections, as suggested by Kazu.
> - Updated vmcoreinfo documentation to add description about
> 'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]).
>
> Changes since v1:
> ----------------
> - v1 was sent out as a single patch which can be seen here:
> http://lists.infradead.org/pipermail/kexec/2019-February/022411.html
>
> - v2 breaks the single patch into two independent patches:
> [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
> [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code (all archs)
>
> This patchset primarily fixes the regression reported in user-space
> utilities like 'makedumpfile' and 'crash-utility' on arm64 architecture
> with the availability of 52-bit address space feature in underlying
> kernel. These regressions have been reported both on CPUs which don't
> support ARMv8.2 extensions (i.e. LVA, LPA) and are running newer kernels
> and also on prototype platforms (like ARMv8 FVP simulator model) which
> support ARMv8.2 extensions and are running newer kernels.
>
> The reason for these regressions is that right now user-space tools
> have no direct access to these values (since these are not exported
> from the kernel) and hence need to rely on a best-guess method of
> determining value of 'vabits_actual' and 'MAX_PHYSMEM_BITS' supported
> by underlying kernel.
>
> Exporting these values via vmcoreinfo will help user-land in such cases.
> In addition, as per suggestion from makedumpfile maintainer (Kazu),
> it makes more sense to append 'MAX_PHYSMEM_BITS' to
> vmcoreinfo in the core code itself rather than in arm64 arch-specific
> code, so that the user-space code for other archs can also benefit from
> this addition to the vmcoreinfo and use it as a standard way of
> determining 'SECTIONS_SHIFT' value in user-land.
>
> Cc: Boris Petkov <bp@alien8.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: James Morse <james.morse@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Dave Anderson <anderson@redhat.com>
> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> Cc: John Donnelly <john.p.donnelly@oracle.com>
> Cc: scott.branden@broadcom.com
> Cc: Amit Kachhap <amit.kachhap@arm.com>
> Cc: x86@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: kexec@lists.infradead.org
>
> Bhupesh Sharma (2):
> crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
> arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
>
> Documentation/admin-guide/kdump/vmcoreinfo.rst | 16 ++++++++++++++++
> arch/arm64/include/asm/pgtable-hwdef.h | 1 +
> arch/arm64/kernel/crash_core.c | 10 ++++++++++
> kernel/crash_core.c | 1 +
> 4 files changed, 28 insertions(+)
Ping. @James Morse , Others
Please share if you have some comments regarding this patchset.
Thanks,
Bhupesh
^ permalink raw reply
* Re: [PATCH] powerpc/64s: Fix restore of NV GPRs after facility unavailable exception
From: Michael Ellerman @ 2020-06-02 5:25 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: npiggin
In-Reply-To: <20200526061808.2472279-1-mpe@ellerman.id.au>
On Tue, 2020-05-26 at 06:18:08 UTC, Michael Ellerman wrote:
> Commit 702f09805222 ("powerpc/64s/exception: Remove lite interrupt
> return") changed the interrupt return path to not restore non-volatile
> registers by default, and explicitly restore them in paths where it is
> required.
>
> But it missed that the facility unavailable exception can sometimes
> modify user registers, ie. when it does emulation of move from DSCR.
>
> This is seen as a failure of the dscr_sysfs_thread_test:
> test: dscr_sysfs_thread_test
> [cpu 0] User DSCR should be 1 but is 0
> failure: dscr_sysfs_thread_test
>
> So restore non-volatile GPRs after facility unavailable exceptions.
>
> Currently the hypervisor facility unavailable exception is also wired
> up to call facility_unavailable_exception().
>
> In practice we should never take a hypervisor facility unavailable
> exception for the DSCR. On older bare metal systems we set HFSCR_DSCR
> unconditionally in __init_HFSCR, or on newer systems it should be
> enabled via the "data-stream-control-register" device tree CPU
> feature.
>
> Even if it's not, since commit f3c99f97a3cd ("KVM: PPC: Book3S HV:
> Don't access HFSCR, LPIDR or LPCR when running nested"), the KVM code
> has unconditionally set HFSCR_DSCR when running guests.
>
> So we should only get a hypervisor facility unavailable for the DSCR
> if skiboot has disabled the "data-stream-control-register" feature,
> and we are somehow in guest context but not via KVM.
>
> Given all that, it should be unnecessary to add a restore of
> non-volatile GPRs after the hypervisor facility exception, because we
> never expect to hit that path. But equally we may as well add the
> restore, because we never expect to hit that path, and if we ever did,
> at least we would correctly restore the registers to their post
> emulation state.
>
> In future we can split the non-HV and HV facility unavailable handling
> so that there is no emulation in the HV handler, and then remove the
> restore for the HV case.
>
> Fixes: 702f09805222 ("powerpc/64s/exception: Remove lite interrupt return")
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Applied to powerpc fixes.
https://git.kernel.org/powerpc/c/595d153dd1022392083ac93a1550382cbee127e0
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code
From: Michael Ellerman @ 2020-06-02 5:25 UTC (permalink / raw)
To: Daniel Axtens, linuxppc-dev, npiggin; +Cc: ajd, Daniel Axtens
In-Reply-To: <20200529061446.2773-1-dja@axtens.net>
On Fri, 2020-05-29 at 06:14:46 UTC, Daniel Axtens wrote:
> syzkaller is picking up a bunch of crashes that look like this:
>
> Unrecoverable exception 380 at c00000000037ed60 (msr=8000000000001031)
> Oops: Unrecoverable exception, sig: 6 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
> NIP: c00000000037ed60 LR: c00000000004bac8 CTR: c000000000030990
> REGS: c0000000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
> MSR: 8000000000001031 <SF,ME,IR,DR,LE> CR: 48222882 XER: 20000000
> CFAR: c00000000004bac4 IRQMASK: 0
> GPR00: c00000000004bb68 c0000000555a74c0 c0000000024b3500 0000000000000005
> GPR04: 0000000000000000 0000000000000000 c00000000004bb88 c008000000910000
> GPR08: 00000000000b0000 c00000000004bac8 0000000000016000 c000000002503500
> GPR12: c000000000030990 c000000003190000 00000000106a5898 00000000106a0000
> GPR16: 00000000106a5890 c000000007a92000 c000000008180e00 c000000007a8f700
> GPR20: c000000007a904b0 0000000010110000 c00000000259d318 5deadbeef0000100
> GPR24: 5deadbeef0000122 c000000078422700 c000000009ee88b8 c000000078422778
> GPR28: 0000000000000001 800000000280b033 0000000000000000 c0000000555a75a0
> NIP [c00000000037ed60] __sanitizer_cov_trace_pc+0x40/0x50
> LR [c00000000004bac8] interrupt_exit_kernel_prepare+0x118/0x310
> Call Trace:
> [c0000000555a74c0] [c00000000004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable)
> [c0000000555a7530] [c00000000000f9a8] interrupt_return+0x118/0x1c0
> --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
> ...<random previous call chain>...
>
> That looks like the KCOV helper accessing memory that's not safe to
> access in the interrupt handling context.
>
> Do not instrument the new syscall entry/exit code with KCOV, GCOV or
> UBSAN.
>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
> Signed-off-by: Daniel Axtens <dja@axtens.net>
Applied to powerpc fixes, thanks.
https://git.kernel.org/powerpc/c/2f26ed1764b42a8c40d9c48441c73a70d805decf
cheers
^ permalink raw reply
* [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper
From: Jordan Niethe @ 2020-06-02 5:27 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Jordan Niethe, alistair
There are quite a few places where instructions are printed, this is
done using a '%x' format specifier. With the introduction of prefixed
instructions, this does not work well. Currently in these places,
ppc_inst_val() is used for the value for %x so only the first word of
prefixed instructions are printed.
When the instructions are word instructions, only a single word should
be printed. For prefixed instructions both the prefix and suffix should
be printed. To accommodate both of these situations, instead of a '%x'
specifier use '%s' and introduce a helper, __ppc_inst_as_str() which
returns a char *. The char * __ppc_inst_as_str() returns is buffer that
is passed to it by the caller.
It is cumbersome to require every caller of __ppc_inst_as_str() to now
declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
wrap it in a macro that uses a compound statement to allocate a buffer
on the caller's stack before calling it.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
arch/powerpc/include/asm/inst.h | 19 +++++++++++++++++++
arch/powerpc/kernel/kprobes.c | 2 +-
arch/powerpc/kernel/trace/ftrace.c | 26 +++++++++++++-------------
arch/powerpc/lib/test_emulate_step.c | 4 ++--
arch/powerpc/xmon/xmon.c | 2 +-
5 files changed, 36 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 45f3ec868258..3df7806e6dc3 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -122,6 +122,25 @@ static inline u64 ppc_inst_as_u64(struct ppc_inst x)
#endif
}
+#define PPC_INST_STR_LEN sizeof("0x00000000 0x00000000")
+
+static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_inst x)
+{
+ if (ppc_inst_prefixed(x))
+ sprintf(str, "0x%08x 0x%08x", ppc_inst_val(x), ppc_inst_suffix(x));
+ else
+ sprintf(str, "0x%08x", ppc_inst_val(x));
+
+ return str;
+}
+
+#define ppc_inst_as_str(x) \
+({ \
+ char __str[PPC_INST_STR_LEN]; \
+ __ppc_inst_as_str(__str, x); \
+ __str; \
+})
+
int probe_user_read_inst(struct ppc_inst *inst,
struct ppc_inst __user *nip);
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 227510df8c55..d0797171dba3 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -244,7 +244,7 @@ static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
* So, we should never get here... but, its still
* good to catch them, just in case...
*/
- printk("Can't step on instruction %x\n", ppc_inst_val(insn));
+ printk("Can't step on instruction %s\n", ppc_inst_as_str(insn));
BUG();
} else {
/*
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 5e399628f51a..da11a26d8213 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -73,8 +73,8 @@ ftrace_modify_code(unsigned long ip, struct ppc_inst old, struct ppc_inst new)
/* Make sure it is what we expect it to be */
if (!ppc_inst_equal(replaced, old)) {
- pr_err("%p: replaced (%#x) != old (%#x)",
- (void *)ip, ppc_inst_val(replaced), ppc_inst_val(old));
+ pr_err("%p: replaced (%s) != old (%s)",
+ (void *)ip, ppc_inst_as_str(replaced), ppc_inst_as_str(old));
return -EINVAL;
}
@@ -137,7 +137,7 @@ __ftrace_make_nop(struct module *mod,
/* Make sure that that this is still a 24bit jump */
if (!is_bl_op(op)) {
- pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
+ pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
return -EINVAL;
}
@@ -172,8 +172,8 @@ __ftrace_make_nop(struct module *mod,
/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
if (!ppc_inst_equal(op, ppc_inst(PPC_INST_MFLR)) &&
!ppc_inst_equal(op, ppc_inst(PPC_INST_STD_LR))) {
- pr_err("Unexpected instruction %08x around bl _mcount\n",
- ppc_inst_val(op));
+ pr_err("Unexpected instruction %s around bl _mcount\n",
+ ppc_inst_as_str(op));
return -EINVAL;
}
#else
@@ -203,7 +203,7 @@ __ftrace_make_nop(struct module *mod,
}
if (!ppc_inst_equal(op, ppc_inst(PPC_INST_LD_TOC))) {
- pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, ppc_inst_val(op));
+ pr_err("Expected %08x found %s\n", PPC_INST_LD_TOC, ppc_inst_as_str(op));
return -EINVAL;
}
#endif /* CONFIG_MPROFILE_KERNEL */
@@ -231,7 +231,7 @@ __ftrace_make_nop(struct module *mod,
/* Make sure that that this is still a 24bit jump */
if (!is_bl_op(op)) {
- pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
+ pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
return -EINVAL;
}
@@ -406,7 +406,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
/* Make sure that that this is still a 24bit jump */
if (!is_bl_op(op)) {
- pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
+ pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
return -EINVAL;
}
@@ -533,8 +533,8 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
return -EFAULT;
if (!expected_nop_sequence(ip, op[0], op[1])) {
- pr_err("Unexpected call sequence at %p: %x %x\n",
- ip, ppc_inst_val(op[0]), ppc_inst_val(op[1]));
+ pr_err("Unexpected call sequence at %p: %s %s\n",
+ ip, ppc_inst_as_str(op[0]), ppc_inst_as_str(op[1]));
return -EINVAL;
}
@@ -597,7 +597,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
/* It should be pointing to a nop */
if (!ppc_inst_equal(op, ppc_inst(PPC_INST_NOP))) {
- pr_err("Expected NOP but have %x\n", ppc_inst_val(op));
+ pr_err("Expected NOP but have %s\n", ppc_inst_as_str(op));
return -EINVAL;
}
@@ -654,7 +654,7 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace *rec, unsigned long addr)
}
if (!ppc_inst_equal(op, ppc_inst(PPC_INST_NOP))) {
- pr_err("Unexpected call sequence at %p: %x\n", ip, ppc_inst_val(op));
+ pr_err("Unexpected call sequence at %p: %s\n", ip, ppc_inst_as_str(op));
return -EINVAL;
}
@@ -733,7 +733,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
/* Make sure that that this is still a 24bit jump */
if (!is_bl_op(op)) {
- pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
+ pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
return -EINVAL;
}
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 46af80279ebc..5fb6f5267d70 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -852,7 +852,7 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
if (analyse_instr(&op, regs, instr) != 1 ||
GETTYPE(op.type) != COMPUTE) {
- pr_info("emulation failed, instruction = 0x%08x\n", ppc_inst_val(instr));
+ pr_info("execution failed, instruction = %s\n", ppc_inst_as_str(instr));
return -EFAULT;
}
@@ -872,7 +872,7 @@ static int __init execute_compute_instr(struct pt_regs *regs,
/* Patch the NOP with the actual instruction */
patch_instruction_site(&patch__exec_instr, instr);
if (exec_instr(regs)) {
- pr_info("execution failed, instruction = 0x%08x\n", ppc_inst_val(instr));
+ pr_info("execution failed, instruction = %s\n", ppc_inst_as_str(instr));
return -EFAULT;
}
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 16ee6639a60c..1dd3bf02021b 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2958,7 +2958,7 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
dotted = 0;
last_inst = inst;
if (praddr)
- printf(REG" %.8x", adr, ppc_inst_val(inst));
+ printf(REG" %s", adr, ppc_inst_as_str(inst));
printf("\t");
dump_func(ppc_inst_val(inst), adr);
printf("\n");
--
2.17.1
^ permalink raw reply related
* [PATCH 2/4] powerpc/xmon: Improve dumping prefixed instructions
From: Jordan Niethe @ 2020-06-02 5:27 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Jordan Niethe, alistair
In-Reply-To: <20200602052728.18227-1-jniethe5@gmail.com>
Currently prefixed instructions are dumped as two separate word
instructions. Use mread_instr() so that prefixed instructions are read
as such and update the incrementor in the loop to take this into
account.
'dump_func' is print_insn_powerpc() which comes from ppc-dis.c which is
taken from binutils. When this is updated prefixed instructions will be
disassembled.
Currently dumping prefixed instructions looks like this:
0:mon> di c000000000094168
c000000000094168 0x06000000 .long 0x6000000
c00000000009416c 0x392a0003 addi r9,r10,3
c000000000094170 0x913f0028 stw r9,40(r31)
c000000000094174 0xe93f002a lwa r9,40(r31)
c000000000094178 0x7d234b78 mr r3,r9
c00000000009417c 0x383f0040 addi r1,r31,64
c000000000094180 0xebe1fff8 ld r31,-8(r1)
c000000000094184 0x4e800020 blr
c000000000094188 0x60000000 nop
...
c000000000094190 0x3c4c0121 addis r2,r12,289
c000000000094194 0x38429670 addi r2,r2,-27024
c000000000094198 0x7c0802a6 mflr r0
c00000000009419c 0x60000000 nop
c0000000000941a0 0xe9240100 ld r9,256(r4)
c0000000000941a4 0x39400001 li r10,1
After this it looks like:
0:mon> di c000000000094168
c000000000094168 0x06000000 0x392a0003 .long 0x392a000306000000
c000000000094170 0x913f0028 stw r9,40(r31)
c000000000094174 0xe93f002a lwa r9,40(r31)
c000000000094178 0x7d234b78 mr r3,r9
c00000000009417c 0x383f0040 addi r1,r31,64
c000000000094180 0xebe1fff8 ld r31,-8(r1)
c000000000094184 0x4e800020 blr
c000000000094188 0x60000000 nop
...
c000000000094190 0x3c4c0121 addis r2,r12,289
c000000000094194 0x38429570 addi r2,r2,-27280
c000000000094198 0x7c0802a6 mflr r0
c00000000009419c 0x60000000 nop
c0000000000941a0 0xe9240100 ld r9,256(r4)
c0000000000941a4 0x39400001 li r10,1
c0000000000941a8 0x3d02000b addis r8,r2,11
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
arch/powerpc/xmon/xmon.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 1dd3bf02021b..548571536bd1 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2935,11 +2935,10 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
int nr, dotted;
unsigned long first_adr;
struct ppc_inst inst, last_inst = ppc_inst(0);
- unsigned char val[4];
dotted = 0;
- for (first_adr = adr; count > 0; --count, adr += 4) {
- nr = mread(adr, val, 4);
+ for (first_adr = adr; count > 0; --count, adr += ppc_inst_len(inst)) {
+ nr = mread_instr(adr, &inst);
if (nr == 0) {
if (praddr) {
const char *x = fault_chars[fault_type];
@@ -2947,7 +2946,6 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
}
break;
}
- inst = ppc_inst(GETWORD(val));
if (adr > first_adr && ppc_inst_equal(inst, last_inst)) {
if (!dotted) {
printf(" ...\n");
@@ -2960,7 +2958,10 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
if (praddr)
printf(REG" %s", adr, ppc_inst_as_str(inst));
printf("\t");
- dump_func(ppc_inst_val(inst), adr);
+ if (!ppc_inst_prefixed(inst))
+ dump_func(ppc_inst_val(inst), adr);
+ else
+ dump_func(ppc_inst_as_u64(inst), adr);
printf("\n");
}
return adr - first_adr;
--
2.17.1
^ permalink raw reply related
* [PATCH 3/4] powerpc: Handle prefixed instructions in show_user_instructions()
From: Jordan Niethe @ 2020-06-02 5:27 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Jordan Niethe, alistair
In-Reply-To: <20200602052728.18227-1-jniethe5@gmail.com>
Currently prefixed instructions are treated as two word instructions by
show_user_instructions(), treat them as a single instruction. '<' and
'>' are placed around the instruction at the NIP, and for prefixed
instructions this is placed around the prefix only. Make the '<' and '>'
wrap the prefix and suffix.
Currently showing a prefixed instruction looks like:
fbe1fff8 39200000 06000000 a3e30000 <04000000> f7e40000 ebe1fff8 4e800020
Make it look like:
0xfbe1fff8 0x39200000 0x06000000 0xa3e30000 <0x04000000 0xf7e40000> 0xebe1fff8 0x4e800020 0x00000000 0x00000000
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
arch/powerpc/kernel/process.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 048d64c4e115..b3f73e398d00 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1292,7 +1292,8 @@ void show_user_instructions(struct pt_regs *regs)
unsigned long pc;
int n = NR_INSN_TO_PRINT;
struct seq_buf s;
- char buf[96]; /* enough for 8 times 9 + 2 chars */
+ char buf[8 * sizeof("0x00000000 0x00000000") + 2];
+ struct ppc_inst instr;
pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int));
@@ -1303,14 +1304,15 @@ void show_user_instructions(struct pt_regs *regs)
seq_buf_clear(&s);
- for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) {
- int instr;
+ for (i = 0; i < 8 && n; i++, n--, pc += ppc_inst_len(instr)) {
- if (probe_user_read(&instr, (void __user *)pc, sizeof(instr))) {
+ if (probe_user_read_inst(&instr, (void __user *)pc)) {
seq_buf_printf(&s, "XXXXXXXX ");
+ instr = ppc_inst(PPC_INST_NOP);
continue;
}
- seq_buf_printf(&s, regs->nip == pc ? "<%08x> " : "%08x ", instr);
+ seq_buf_printf(&s, regs->nip == pc ? "<%s> " : "%s ",
+ ppc_inst_as_str(instr));
}
if (!seq_buf_has_overflowed(&s))
--
2.17.1
^ permalink raw reply related
* [PATCH 4/4] powerpc: Handle prefixed instructions in show_instructions()
From: Jordan Niethe @ 2020-06-02 5:27 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Jordan Niethe, alistair
In-Reply-To: <20200602052728.18227-1-jniethe5@gmail.com>
Currently show_instructions() treats prefixed instructions as two
separate word instructions. '<' and '>' are placed around the
instruction at the NIP, but as a result they only wrap around the
prefix. Make '<' and '>' straddle the whole prefixed instruction.
Currently showing a prefixed instruction looks like:
Instruction dump:
60000000 60000000 60000000 60000000 60000000 60000000 60000000 60000000
60000000 60000000 60000000 60000000 <04000000> 00000000 60000000 60000000
Make it look like:
Instruction dump:
0x60000000 0x60000000 0x60000000 0x60000000 0x60000000 0x60000000 0x60000000 0x60000000
0x60000000 0x60000000 0x60000000 0x60000000 <0x04000000 0x00000000> 0x60000000 0x60000000 0x60000000
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
arch/powerpc/kernel/process.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b3f73e398d00..bcd7277a9395 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1258,7 +1258,7 @@ static void show_instructions(struct pt_regs *regs)
printk("Instruction dump:");
for (i = 0; i < NR_INSN_TO_PRINT; i++) {
- int instr;
+ struct ppc_inst instr;
if (!(i % 8))
pr_cont("\n");
@@ -1272,16 +1272,17 @@ static void show_instructions(struct pt_regs *regs)
#endif
if (!__kernel_text_address(pc) ||
- probe_kernel_address((const void *)pc, instr)) {
+ probe_kernel_read_inst(&instr, (struct ppc_inst *)pc)) {
+ instr = ppc_inst(PPC_INST_NOP);
pr_cont("XXXXXXXX ");
} else {
if (regs->nip == pc)
- pr_cont("<%08x> ", instr);
+ pr_cont("<%s> ", ppc_inst_as_str(instr));
else
- pr_cont("%08x ", instr);
+ pr_cont("%s ", ppc_inst_as_str(instr));
}
- pc += sizeof(int);
+ pc += ppc_inst_len(instr);
}
pr_cont("\n");
--
2.17.1
^ permalink raw reply related
* [PATCH] powerpc/kvm: Enable support for ISA v3.1 guests
From: Alistair Popple @ 2020-06-02 5:53 UTC (permalink / raw)
To: linuxppc-dev; +Cc: ravi.bangoria, mikey, kvm-ppc, Alistair Popple
Adds support for emulating ISAv3.1 guests by adding the appropriate PCR
and FSCR bits.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 773f76402392..d77040d0588a 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1348,6 +1348,7 @@
#define PVR_ARCH_206p 0x0f100003
#define PVR_ARCH_207 0x0f000004
#define PVR_ARCH_300 0x0f000005
+#define PVR_ARCH_31 0x0f000006
/* Macros for setting and retrieving special purpose registers */
#ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 93493f0cbfe8..359bb2ed43e1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -345,7 +345,7 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
}
/* Dummy value used in computing PCR value below */
-#define PCR_ARCH_300 (PCR_ARCH_207 << 1)
+#define PCR_ARCH_31 (PCR_ARCH_300 << 1)
static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
{
@@ -353,7 +353,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
struct kvmppc_vcore *vc = vcpu->arch.vcore;
/* We can (emulate) our own architecture version and anything older */
- if (cpu_has_feature(CPU_FTR_ARCH_300))
+ if (cpu_has_feature(CPU_FTR_ARCH_31))
+ host_pcr_bit = PCR_ARCH_31;
+ else if (cpu_has_feature(CPU_FTR_ARCH_300))
host_pcr_bit = PCR_ARCH_300;
else if (cpu_has_feature(CPU_FTR_ARCH_207S))
host_pcr_bit = PCR_ARCH_207;
@@ -379,6 +381,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
case PVR_ARCH_300:
guest_pcr_bit = PCR_ARCH_300;
break;
+ case PVR_ARCH_31:
+ guest_pcr_bit = PCR_ARCH_31;
+ break;
default:
return -EINVAL;
}
@@ -2318,7 +2323,7 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
* to trap and then we emulate them.
*/
vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
- HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
+ HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX;
if (cpu_has_feature(CPU_FTR_HVMODE)) {
vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
--
2.20.1
^ permalink raw reply related
* Re: [PATCH] powerpc/kvm: Enable support for ISA v3.1 guests
From: Jordan Niethe @ 2020-06-02 6:26 UTC (permalink / raw)
To: Alistair Popple; +Cc: ravi.bangoria, mikey, linuxppc-dev, kvm-ppc
In-Reply-To: <20200602055325.6102-1-alistair@popple.id.au>
On Tue, Jun 2, 2020 at 3:55 PM Alistair Popple <alistair@popple.id.au> wrote:
>
> Adds support for emulating ISAv3.1 guests by adding the appropriate PCR
> and FSCR bits.
>
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
> arch/powerpc/include/asm/reg.h | 1 +
> arch/powerpc/kvm/book3s_hv.c | 11 ++++++++---
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 773f76402392..d77040d0588a 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1348,6 +1348,7 @@
> #define PVR_ARCH_206p 0x0f100003
> #define PVR_ARCH_207 0x0f000004
> #define PVR_ARCH_300 0x0f000005
> +#define PVR_ARCH_31 0x0f000006
>
> /* Macros for setting and retrieving special purpose registers */
> #ifndef __ASSEMBLY__
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 93493f0cbfe8..359bb2ed43e1 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -345,7 +345,7 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
> }
>
> /* Dummy value used in computing PCR value below */
> -#define PCR_ARCH_300 (PCR_ARCH_207 << 1)
> +#define PCR_ARCH_31 (PCR_ARCH_300 << 1)
>
> static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
> {
> @@ -353,7 +353,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
> struct kvmppc_vcore *vc = vcpu->arch.vcore;
>
> /* We can (emulate) our own architecture version and anything older */
> - if (cpu_has_feature(CPU_FTR_ARCH_300))
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + host_pcr_bit = PCR_ARCH_31;
> + else if (cpu_has_feature(CPU_FTR_ARCH_300))
> host_pcr_bit = PCR_ARCH_300;
> else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> host_pcr_bit = PCR_ARCH_207;
> @@ -379,6 +381,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
> case PVR_ARCH_300:
> guest_pcr_bit = PCR_ARCH_300;
> break;
> + case PVR_ARCH_31:
> + guest_pcr_bit = PCR_ARCH_31;
> + break;
> default:
> return -EINVAL;
> }
> @@ -2318,7 +2323,7 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
> * to trap and then we emulate them.
> */
The comment above this:
"...
* Set the default HFSCR for the guest from the host value.
* This value is only used on POWER9..."
would need to be updated.
> vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
> - HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
> + HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX;
> if (cpu_has_feature(CPU_FTR_HVMODE)) {
> vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
> if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
> --
> 2.20.1
>
^ permalink raw reply
* [PATCH v2] cxl: Remove dead Kconfig option
From: Andrew Donnellan @ 2020-06-02 7:05 UTC (permalink / raw)
To: linuxppc-dev; +Cc: fbarrat
The CXL_AFU_DRIVER_OPS Kconfig option was added to coordinate merging of
new features. It no longer serves any purpose, so remove it.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
v1->v2:
- keep CXL_LIB for now to avoid breaking a driver that is currently out of
tree
---
drivers/misc/cxl/Kconfig | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 39eec9031487..984114f7cee9 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -7,9 +7,6 @@ config CXL_BASE
bool
select PPC_COPRO_BASE
-config CXL_AFU_DRIVER_OPS
- bool
-
config CXL_LIB
bool
@@ -17,7 +14,6 @@ config CXL
tristate "Support for IBM Coherent Accelerators (CXL)"
depends on PPC_POWERNV && PCI_MSI && EEH
select CXL_BASE
- select CXL_AFU_DRIVER_OPS
select CXL_LIB
default m
help
--
2.20.1
^ permalink raw reply related
* Re: [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper
From: Joel Stanley @ 2020-06-02 7:43 UTC (permalink / raw)
To: Jordan Niethe; +Cc: linuxppc-dev, Alistair Popple
In-Reply-To: <20200602052728.18227-1-jniethe5@gmail.com>
On Tue, 2 Jun 2020 at 05:31, Jordan Niethe <jniethe5@gmail.com> wrote:
>
> There are quite a few places where instructions are printed, this is
> done using a '%x' format specifier. With the introduction of prefixed
> instructions, this does not work well. Currently in these places,
> ppc_inst_val() is used for the value for %x so only the first word of
> prefixed instructions are printed.
>
> When the instructions are word instructions, only a single word should
> be printed. For prefixed instructions both the prefix and suffix should
> be printed. To accommodate both of these situations, instead of a '%x'
> specifier use '%s' and introduce a helper, __ppc_inst_as_str() which
> returns a char *. The char * __ppc_inst_as_str() returns is buffer that
> is passed to it by the caller.
>
> It is cumbersome to require every caller of __ppc_inst_as_str() to now
> declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
> wrap it in a macro that uses a compound statement to allocate a buffer
> on the caller's stack before calling it.
>
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
> ---
> arch/powerpc/include/asm/inst.h | 19 +++++++++++++++++++
> arch/powerpc/kernel/kprobes.c | 2 +-
> arch/powerpc/kernel/trace/ftrace.c | 26 +++++++++++++-------------
> arch/powerpc/lib/test_emulate_step.c | 4 ++--
> arch/powerpc/xmon/xmon.c | 2 +-
> 5 files changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index 45f3ec868258..3df7806e6dc3 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -122,6 +122,25 @@ static inline u64 ppc_inst_as_u64(struct ppc_inst x)
> #endif
> }
>
> +#define PPC_INST_STR_LEN sizeof("0x00000000 0x00000000")
> +
> +static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_inst x)
> +{
> + if (ppc_inst_prefixed(x))
> + sprintf(str, "0x%08x 0x%08x", ppc_inst_val(x), ppc_inst_suffix(x));
> + else
> + sprintf(str, "0x%08x", ppc_inst_val(x));
> +
> + return str;
> +}
> +
> +#define ppc_inst_as_str(x) \
> +({ \
> + char __str[PPC_INST_STR_LEN]; \
> + __ppc_inst_as_str(__str, x); \
> + __str; \
> +})
> +
> int probe_user_read_inst(struct ppc_inst *inst,
> struct ppc_inst __user *nip);
>
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 227510df8c55..d0797171dba3 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -244,7 +244,7 @@ static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
> * So, we should never get here... but, its still
> * good to catch them, just in case...
> */
> - printk("Can't step on instruction %x\n", ppc_inst_val(insn));
> + printk("Can't step on instruction %s\n", ppc_inst_as_str(insn));
> BUG();
> } else {
> /*
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 5e399628f51a..da11a26d8213 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -73,8 +73,8 @@ ftrace_modify_code(unsigned long ip, struct ppc_inst old, struct ppc_inst new)
>
> /* Make sure it is what we expect it to be */
> if (!ppc_inst_equal(replaced, old)) {
> - pr_err("%p: replaced (%#x) != old (%#x)",
> - (void *)ip, ppc_inst_val(replaced), ppc_inst_val(old));
> + pr_err("%p: replaced (%s) != old (%s)",
> + (void *)ip, ppc_inst_as_str(replaced), ppc_inst_as_str(old));
> return -EINVAL;
> }
>
> @@ -137,7 +137,7 @@ __ftrace_make_nop(struct module *mod,
>
> /* Make sure that that this is still a 24bit jump */
> if (!is_bl_op(op)) {
> - pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
> + pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> @@ -172,8 +172,8 @@ __ftrace_make_nop(struct module *mod,
> /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
> if (!ppc_inst_equal(op, ppc_inst(PPC_INST_MFLR)) &&
> !ppc_inst_equal(op, ppc_inst(PPC_INST_STD_LR))) {
> - pr_err("Unexpected instruction %08x around bl _mcount\n",
> - ppc_inst_val(op));
> + pr_err("Unexpected instruction %s around bl _mcount\n",
> + ppc_inst_as_str(op));
> return -EINVAL;
> }
> #else
> @@ -203,7 +203,7 @@ __ftrace_make_nop(struct module *mod,
> }
>
> if (!ppc_inst_equal(op, ppc_inst(PPC_INST_LD_TOC))) {
> - pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, ppc_inst_val(op));
> + pr_err("Expected %08x found %s\n", PPC_INST_LD_TOC, ppc_inst_as_str(op));
> return -EINVAL;
> }
> #endif /* CONFIG_MPROFILE_KERNEL */
> @@ -231,7 +231,7 @@ __ftrace_make_nop(struct module *mod,
>
> /* Make sure that that this is still a 24bit jump */
> if (!is_bl_op(op)) {
> - pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
> + pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> @@ -406,7 +406,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
>
> /* Make sure that that this is still a 24bit jump */
> if (!is_bl_op(op)) {
> - pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
> + pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> @@ -533,8 +533,8 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> return -EFAULT;
>
> if (!expected_nop_sequence(ip, op[0], op[1])) {
> - pr_err("Unexpected call sequence at %p: %x %x\n",
> - ip, ppc_inst_val(op[0]), ppc_inst_val(op[1]));
> + pr_err("Unexpected call sequence at %p: %s %s\n",
> + ip, ppc_inst_as_str(op[0]), ppc_inst_as_str(op[1]));
> return -EINVAL;
> }
>
> @@ -597,7 +597,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>
> /* It should be pointing to a nop */
> if (!ppc_inst_equal(op, ppc_inst(PPC_INST_NOP))) {
> - pr_err("Expected NOP but have %x\n", ppc_inst_val(op));
> + pr_err("Expected NOP but have %s\n", ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> @@ -654,7 +654,7 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace *rec, unsigned long addr)
> }
>
> if (!ppc_inst_equal(op, ppc_inst(PPC_INST_NOP))) {
> - pr_err("Unexpected call sequence at %p: %x\n", ip, ppc_inst_val(op));
> + pr_err("Unexpected call sequence at %p: %s\n", ip, ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> @@ -733,7 +733,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>
> /* Make sure that that this is still a 24bit jump */
> if (!is_bl_op(op)) {
> - pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
> + pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
> index 46af80279ebc..5fb6f5267d70 100644
> --- a/arch/powerpc/lib/test_emulate_step.c
> +++ b/arch/powerpc/lib/test_emulate_step.c
> @@ -852,7 +852,7 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
>
> if (analyse_instr(&op, regs, instr) != 1 ||
> GETTYPE(op.type) != COMPUTE) {
> - pr_info("emulation failed, instruction = 0x%08x\n", ppc_inst_val(instr));
> + pr_info("execution failed, instruction = %s\n", ppc_inst_as_str(instr));
> return -EFAULT;
> }
>
> @@ -872,7 +872,7 @@ static int __init execute_compute_instr(struct pt_regs *regs,
> /* Patch the NOP with the actual instruction */
> patch_instruction_site(&patch__exec_instr, instr);
> if (exec_instr(regs)) {
> - pr_info("execution failed, instruction = 0x%08x\n", ppc_inst_val(instr));
> + pr_info("execution failed, instruction = %s\n", ppc_inst_as_str(instr));
> return -EFAULT;
> }
>
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index 16ee6639a60c..1dd3bf02021b 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -2958,7 +2958,7 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
> dotted = 0;
> last_inst = inst;
> if (praddr)
> - printf(REG" %.8x", adr, ppc_inst_val(inst));
> + printf(REG" %s", adr, ppc_inst_as_str(inst));
> printf("\t");
> dump_func(ppc_inst_val(inst), adr);
> printf("\n");
> --
> 2.17.1
>
^ permalink raw reply
* [RFC PATCH v2 1/5] libnvdimm/dax: Add a dax flag to control synchronous fault support
From: Aneesh Kumar K.V @ 2020-06-02 7:49 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
Cc: Jan Kara, Jeff Moyer, msuchanek, oohall, Aneesh Kumar K.V
With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware
This patch adds a dax attribute (/sys/bus/nd/devices/region0/pfn0.1/block/pmem0/dax/sync_fault)
which can be used to control this flag. If the device supports synchronous flush
then userspace can update this attribute to enable/disable the synchronous
fault. The attribute is only visible if there is write cache enabled on the device.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
drivers/dax/super.c | 73 ++++++++++++++++++++++++++++++++++++++++++++-
mm/Kconfig | 3 ++
2 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 8e32345be0f7..980f7be7e56d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -198,6 +198,12 @@ enum dax_device_flags {
DAXDEV_WRITE_CACHE,
/* flag to check if device supports synchronous flush */
DAXDEV_SYNC,
+ /*
+ * flag to indicate whether synchronous flush is enabled.
+ * Some platform may want to disable synchronous flush support
+ * even though device supports the same.
+ */
+ DAXDEV_SYNC_ENABLED,
};
/**
@@ -254,6 +260,60 @@ static ssize_t write_cache_store(struct device *dev,
}
static DEVICE_ATTR_RW(write_cache);
+static bool dax_synchronous_enabled(struct dax_device *dax_dev)
+{
+ return test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+
+static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
+{
+ if (!test_bit(DAXDEV_SYNC, &dax_dev->flags))
+ return;
+
+ if (enable)
+ set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+ else
+ clear_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+
+
+static ssize_t sync_fault_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+ ssize_t rc;
+
+ WARN_ON_ONCE(!dax_dev);
+ if (!dax_dev)
+ return -ENXIO;
+
+ rc = sprintf(buf, "%d\n", !!__dax_synchronous(dax_dev));
+ put_dax(dax_dev);
+ return rc;
+}
+
+static ssize_t sync_fault_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ bool enable_sync;
+ int rc = strtobool(buf, &enable_sync);
+ struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+
+ WARN_ON_ONCE(!dax_dev);
+ if (!dax_dev)
+ return -ENXIO;
+
+ if (rc)
+ len = rc;
+ else
+ set_dax_synchronous_enable(dax_dev, enable_sync);
+
+ put_dax(dax_dev);
+ return len;
+}
+
+static DEVICE_ATTR_RW(sync_fault);
+
static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = container_of(kobj, typeof(*dev), kobj);
@@ -267,11 +327,18 @@ static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
if (a == &dev_attr_write_cache.attr)
return 0;
#endif
+ if (a == &dev_attr_sync_fault.attr) {
+ if (dax_write_cache_enabled(dax_dev))
+ return a->mode;
+ return 0;
+ }
+
return a->mode;
}
static struct attribute *dax_attributes[] = {
&dev_attr_write_cache.attr,
+ &dev_attr_sync_fault.attr,
NULL,
};
@@ -394,13 +461,17 @@ EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
bool __dax_synchronous(struct dax_device *dax_dev)
{
- return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+ return test_bit(DAXDEV_SYNC, &dax_dev->flags) &&
+ test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
}
EXPORT_SYMBOL_GPL(__dax_synchronous);
void __set_dax_synchronous(struct dax_device *dax_dev)
{
set_bit(DAXDEV_SYNC, &dax_dev->flags);
+#ifndef CONFIG_ARCH_MAP_SYNC_DISABLE
+ set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+#endif
}
EXPORT_SYMBOL_GPL(__set_dax_synchronous);
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..38fd7cfbfca8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
config MAPPING_DIRTY_HELPERS
bool
+config ARCH_MAP_SYNC_DISABLE
+ bool
+
endmenu
--
2.26.2
^ permalink raw reply related
* [RFC PATCH v2 3/5] libnvdimm/dax: Make DAXDEV_SYNC_ENABLED flag region-specific
From: Aneesh Kumar K.V @ 2020-06-02 7:49 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
Cc: Jan Kara, Jeff Moyer, msuchanek, oohall, Aneesh Kumar K.V
In-Reply-To: <20200602074909.36738-1-aneesh.kumar@linux.ibm.com>
This patch makes sync fault enable/disable feature more fine-grained
by allowing region-wise control of the same.
In a followup patch on ppc64 only device with compat string "ibm,pmemory-v2"
will disable the sync fault feature.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
drivers/dax/bus.c | 2 +-
drivers/dax/super.c | 16 +++++++++-------
drivers/nvdimm/pmem.c | 4 ++++
drivers/nvdimm/region_devs.c | 16 ++++++++++++++++
include/linux/dax.h | 16 ++++++++++++++++
include/linux/libnvdimm.h | 4 ++++
6 files changed, 50 insertions(+), 8 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index df238c8b6ef2..8a825ecff49b 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -420,7 +420,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
* No 'host' or dax_operations since there is no access to this
* device outside of mmap of the resulting character device.
*/
- dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+ dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC | DAXDEV_F_SYNC_ENABLED);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 980f7be7e56d..f93e6649d452 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -260,10 +260,11 @@ static ssize_t write_cache_store(struct device *dev,
}
static DEVICE_ATTR_RW(write_cache);
-static bool dax_synchronous_enabled(struct dax_device *dax_dev)
+bool __dax_synchronous_enabled(struct dax_device *dax_dev)
{
return test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
}
+EXPORT_SYMBOL_GPL(__dax_synchronous_enabled);
static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
{
@@ -280,6 +281,7 @@ static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
static ssize_t sync_fault_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
+ int enabled;
struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
ssize_t rc;
@@ -287,7 +289,8 @@ static ssize_t sync_fault_show(struct device *dev,
if (!dax_dev)
return -ENXIO;
- rc = sprintf(buf, "%d\n", !!__dax_synchronous(dax_dev));
+ enabled = (dax_synchronous(dax_dev) && dax_synchronous_enabled(dax_dev));
+ rc = sprintf(buf, "%d\n", enabled);
put_dax(dax_dev);
return rc;
}
@@ -461,17 +464,13 @@ EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
bool __dax_synchronous(struct dax_device *dax_dev)
{
- return test_bit(DAXDEV_SYNC, &dax_dev->flags) &&
- test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+ return test_bit(DAXDEV_SYNC, &dax_dev->flags);
}
EXPORT_SYMBOL_GPL(__dax_synchronous);
void __set_dax_synchronous(struct dax_device *dax_dev)
{
set_bit(DAXDEV_SYNC, &dax_dev->flags);
-#ifndef CONFIG_ARCH_MAP_SYNC_DISABLE
- set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
-#endif
}
EXPORT_SYMBOL_GPL(__set_dax_synchronous);
@@ -665,6 +664,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
if (flags & DAXDEV_F_SYNC)
set_dax_synchronous(dax_dev);
+ if (flags & DAXDEV_F_SYNC_ENABLED)
+ set_dax_synchronous_enable(dax_dev, true);
+
return dax_dev;
err_dev:
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 2df6994acf83..dc9c269eb50d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -485,6 +485,10 @@ static int pmem_attach_disk(struct device *dev,
if (is_nvdimm_sync(nd_region))
flags = DAXDEV_F_SYNC;
+
+ if (is_nvdimm_sync_enabled(nd_region))
+ flags |= DAXDEV_F_SYNC_ENABLED;
+
dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);
if (IS_ERR(dax_dev)) {
put_disk(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..2181560cf655 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1283,6 +1283,22 @@ bool is_nvdimm_sync(struct nd_region *nd_region)
}
EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+bool is_nvdimm_sync_enabled(struct nd_region *nd_region)
+{
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+ if (is_nd_volatile(&nd_region->dev))
+ return true;
+
+ return is_nd_pmem(&nd_region->dev) &&
+ test_bit(ND_REGION_SYNC_ENABLED, &nd_region->flags);
+#else
+ return true;
+#endif
+
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync_enabled);
+
+
struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d7af5d243f24..c4a3551557de 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -10,6 +10,9 @@
/* Flag for synchronous flush */
#define DAXDEV_F_SYNC (1UL << 0)
+/* flag for platform forcing synchronous flush disable */
+#define DAXDEV_F_SYNC_ENABLED (1UL << 1)
+
typedef unsigned long dax_entry_t;
struct iomap_ops;
@@ -59,6 +62,13 @@ static inline void set_dax_synchronous(struct dax_device *dax_dev)
{
__set_dax_synchronous(dax_dev);
}
+
+bool __dax_synchronous_enabled(struct dax_device *dax_dev);
+static inline bool dax_synchronous_enabled(struct dax_device *dax_dev)
+{
+ return __dax_synchronous_enabled(dax_dev);
+}
+
/*
* Check if given mapping is supported by the file / underlying device.
*/
@@ -69,6 +79,12 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
return true;
if (!IS_DAX(file_inode(vma->vm_file)))
return false;
+ /*
+ * check MAP_SYNC is disabled by platform for this device.
+ */
+ if (!dax_synchronous_enabled(dax_dev))
+ return false;
+
return dax_synchronous(dax_dev);
}
#else
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..cc3962c4978f 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -63,6 +63,9 @@ enum {
/* Platform provides asynchronous flush mechanism */
ND_REGION_ASYNC = 3,
+ /* Platform wants to disable synchronous flush mechanism */
+ ND_REGION_SYNC_ENABLED= 4,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
};
@@ -262,6 +265,7 @@ int nvdimm_has_flush(struct nd_region *nd_region);
int nvdimm_has_cache(struct nd_region *nd_region);
int nvdimm_in_overwrite(struct nvdimm *nvdimm);
bool is_nvdimm_sync(struct nd_region *nd_region);
+bool is_nvdimm_sync_enabled(struct nd_region *nd_region);
static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
--
2.26.2
^ permalink raw reply related
* [RFC PATCH v2 4/5] powerpc/papr_scm: disable MAP_SYNC for newer hardware
From: Aneesh Kumar K.V @ 2020-06-02 7:49 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
Cc: Jan Kara, Jeff Moyer, msuchanek, oohall, Aneesh Kumar K.V
In-Reply-To: <20200602074909.36738-1-aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/platforms/pseries/papr_scm.c | 17 ++++++++++++++++-
drivers/nvdimm/of_pmem.c | 7 +++++++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..30474d4cd109 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -30,6 +30,7 @@ struct papr_scm_priv {
uint64_t block_size;
int metadata_size;
bool is_volatile;
+ bool disable_map_sync;
uint64_t bound_addr;
@@ -340,11 +341,18 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.mapping = &mapping;
ndr_desc.num_mappings = 1;
ndr_desc.nd_set = &p->nd_set;
+ set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
else {
set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+ /*
+ * for a persistent region, check if the platform needs to
+ * force MAP_SYNC disable.
+ */
+ if (p->disable_map_sync)
+ clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
}
if (!p->region) {
@@ -365,7 +373,7 @@ err: nvdimm_bus_unregister(p->bus);
static int papr_scm_probe(struct platform_device *pdev)
{
- struct device_node *dn = pdev->dev.of_node;
+ struct device_node *dn;
u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
@@ -373,6 +381,10 @@ static int papr_scm_probe(struct platform_device *pdev)
u64 uuid[2];
int rc;
+ dn = dev_of_node(&pdev->dev);
+ if (!dn)
+ return -ENXIO;
+
/* check we have all the required DT properties */
if (of_property_read_u32(dn, "ibm,my-drc-index", &drc_index)) {
dev_err(&pdev->dev, "%pOF: missing drc-index!\n", dn);
@@ -402,6 +414,9 @@ static int papr_scm_probe(struct platform_device *pdev)
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
+ if (of_device_is_compatible(dn, "ibm,pmemory-v2"))
+ p->disable_map_sync = true;
+
p->dn = dn;
p->drc_index = drc_index;
p->block_size = block_size;
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..a6cc3488e552 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -59,12 +59,19 @@ static int of_pmem_region_probe(struct platform_device *pdev)
ndr_desc.res = &pdev->resource[i];
ndr_desc.of_node = np;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+ set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
if (is_volatile)
region = nvdimm_volatile_region_create(bus, &ndr_desc);
else {
set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+ /*
+ * for a persistent region, check for newer device
+ */
+ if (of_device_is_compatible(np, "pmem-region-v2"))
+ clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
region = nvdimm_pmem_region_create(bus, &ndr_desc);
+
}
if (!region)
--
2.26.2
^ permalink raw reply related
* [RFC PATCH v2 5/5] libnvdimm: Add prctl control for disabling synchronous fault support
From: Aneesh Kumar K.V @ 2020-06-02 7:49 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
Cc: Jan Kara, Jeff Moyer, msuchanek, oohall, Aneesh Kumar K.V
In-Reply-To: <20200602074909.36738-1-aneesh.kumar@linux.ibm.com>
With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.
This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. This is in addition to the namespace specific control
already added (/sys/bus/nd/devices/region0/pfn0.1/block/pmem0/dax/sync_fault)
With this patch, if the device supports synchronous fault, then an application
can enable the synchronous fault support using the prctl() interface even if
the platform disabled it for the namespace.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
include/linux/dax.h | 5 +++--
include/linux/sched/coredump.h | 13 ++++++++++---
include/uapi/linux/prctl.h | 3 +++
kernel/fork.c | 8 +++++++-
kernel/sys.c | 18 ++++++++++++++++++
5 files changed, 41 insertions(+), 6 deletions(-)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index c4a3551557de..0733aae23828 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -80,9 +80,10 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
if (!IS_DAX(file_inode(vma->vm_file)))
return false;
/*
- * check MAP_SYNC is disabled by platform for this device.
+ * MAP_SYNC is disabled by platform for this device.
+ * check for prctl.
*/
- if (!dax_synchronous_enabled(dax_dev))
+ if (!dax_synchronous_enabled(dax_dev) && !map_sync_enabled(vma->vm_mm))
return false;
return dax_synchronous(dax_dev);
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..35698adc3d13 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */
-#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_ENABLE_MAP_SYNC 27 /* disable THP for all VMAs */
+#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_ENABLE_MAP_SYNC_MASK (1 << MMF_ENABLE_MAP_SYNC)
-#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
- MMF_DISABLE_THP_MASK)
+#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
+ MMF_DISABLE_THP_MASK | MMF_ENABLE_MAP_SYNC_MASK)
+
+static inline bool map_sync_enabled(struct mm_struct *mm)
+{
+ return !!(mm->flags & MMF_ENABLE_MAP_SYNC_MASK);
+}
#endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..ee4cde32d5cf 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
#define PR_SET_IO_FLUSHER 57
#define PR_GET_IO_FLUSHER 58
+#define PR_SET_MAP_SYNC_ENABLE 59
+#define PR_GET_MAP_SYNC_ENABLE 60
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d50cac15ef41 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+#ifndef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_ENABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
static int __init coredump_filter_setup(char *s)
{
default_dump_filter =
@@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->flags = current->mm->flags & MMF_INIT_MASK;
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
- mm->flags = default_dump_filter;
+ mm->flags = default_dump_filter | default_map_sync_mask;
mm->def_flags = 0;
}
diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..5011912831b0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
clear_bit(MMF_DISABLE_THP, &me->mm->flags);
up_write(&me->mm->mmap_sem);
break;
+
+ case PR_GET_MAP_SYNC_ENABLE:
+ if (arg2 || arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = !!test_bit(MMF_ENABLE_MAP_SYNC, &me->mm->flags);
+ break;
+ case PR_SET_MAP_SYNC_ENABLE:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ if (down_write_killable(&me->mm->mmap_sem))
+ return -EINTR;
+ if (arg2)
+ set_bit(MMF_ENABLE_MAP_SYNC, &me->mm->flags);
+ else
+ clear_bit(MMF_ENABLE_MAP_SYNC, &me->mm->flags);
+ up_write(&me->mm->mmap_sem);
+ break;
+
case PR_MPX_ENABLE_MANAGEMENT:
case PR_MPX_DISABLE_MANAGEMENT:
/* No longer implemented: */
--
2.26.2
^ permalink raw reply related
* [RFC PATCH v2 2/5] powerpc/pmem: Disable synchronous fault by default
From: Aneesh Kumar K.V @ 2020-06-02 7:49 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
Cc: Jan Kara, Jeff Moyer, msuchanek, oohall, Aneesh Kumar K.V
In-Reply-To: <20200602074909.36738-1-aneesh.kumar@linux.ibm.com>
This adds a kernel config option that controls whether MAP_SYNC is enabled by
default. With POWER10, architecture is adding new pmem flush and sync
instructions. The kernel should prevent the usage of MAP_SYNC if applications
are not using the new instructions on newer hardware.
This config allows user to control whether MAP_SYNC should be enabled by
default or not.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/platforms/Kconfig.cputype | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..f8694838ad4e 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
If you're unsure, say Y.
+config ARCH_MAP_SYNC_DISABLE
+ bool "Disable synchronous fault support (MAP_SYNC)"
+ default y
+ help
+ Disable support for synchronous fault with nvdimm namespaces.
+
+ If you're unsure, say Y.
+
+
config PPC_HAVE_KUAP
bool
--
2.26.2
^ permalink raw reply related
* Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
From: Aneesh Kumar K.V @ 2020-06-02 7:57 UTC (permalink / raw)
To: Michal Suchánek; +Cc: linuxppc-dev, jack, linux-nvdimm, Jan Kara
In-Reply-To: <af150987-156f-71dc-a4cd-e6f8e670839a@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> On 6/1/20 5:37 PM, Michal Suchánek wrote:
>> On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote:
>>> On 6/1/20 3:39 PM, Jan Kara wrote:
>>>> On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
>>>>> On 5/29/20 3:22 PM, Jan Kara wrote:
>>>>>> On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
>>>>>>> Thanks Michal. I also missed Jeff in this email thread.
>>>>>>
>>>>>> And I think you'll also need some of the sched maintainers for the prctl
>>>>>> bits...
>>>>>>
>>>>>>> On 5/29/20 3:03 PM, Michal Suchánek wrote:
>>>>>>>> Adding Jan
>>>>>>>>
>>>>>>>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
>>>>>>>>> With POWER10, architecture is adding new pmem flush and sync instructions.
>>>>>>>>> The kernel should prevent the usage of MAP_SYNC if applications are not using
>>>>>>>>> the new instructions on newer hardware.
>>>>>>>>>
>>>>>>>>> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
>>>>>>>>> the usage of MAP_SYNC. The kernel config option is added to allow the user
>>>>>>>>> to control whether MAP_SYNC should be enabled by default or not.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>> ...
>>>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>>>>> index 8c700f881d92..d5a9a363e81e 100644
>>>>>>>>> --- a/kernel/fork.c
>>>>>>>>> +++ b/kernel/fork.c
>>>>>>>>> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
>>>>>>>>> static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>>>>>>>>> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
>>>>>>>>> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
>>>>>>>>> +#else
>>>>>>>>> +unsigned long default_map_sync_mask = 0;
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>
>>>>>> I'm not sure CONFIG is really the right approach here. For a distro that would
>>>>>> basically mean to disable MAP_SYNC for all PPC kernels unless application
>>>>>> explicitly uses the right prctl. Shouldn't we rather initialize
>>>>>> default_map_sync_mask on boot based on whether the CPU we run on requires
>>>>>> new flush instructions or not? Otherwise the patch looks sensible.
>>>>>>
>>>>>
>>>>> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
>>>>> But on a virtualized platform there is no easy way to detect that. We could
>>>>> ideally hook this into the nvdimm driver where we look at the new compat
>>>>> string ibm,persistent-memory-v2 and then disable MAP_SYNC
>>>>> if we find a device with the specific value.
>>>>
>>>> Hum, couldn't we set some flag for nvdimm devices with
>>>> "ibm,persistent-memory-v2" property and then check it during mmap(2) time
>>>> and when the device has this propery and the mmap(2) caller doesn't have
>>>> the prctl set, we'd disallow MAP_SYNC? That should make things mostly
>>>> seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
>>>> devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
>>>> applications need to be aware of new instructions so this isn't that much
>>>> additional burden...
>>>
>>> I am not sure application would want to add that much details/knowledge
>>> about a platform in their code. I was expecting application to do
>>>
>>> #ifdef __ppc64__
>>> prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0));
>>> #endif
>>> a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
>>> MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
>>>
>>>
>>> For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
>>> is not useful. Do you see a value in making all these device specific rather
>>> than a conditional on __ppc64__?
>
>> If the vpmem devices continue to work with the old instruction on
>> POWER10 then it makes sense to make this per-device.
>
> vPMEM doesn't have write_cache and hence it is synchronous even without
> using any specific flush instruction. The question is do we want to have
> different programming steps when running on vPMEM vs a persistent PMEM
> device on ppc64.
>
> I will work on the device specific ENABLE flag and then we can compare
> the kernel complexity against the added benefit.
I have posted an RFC v2 [1] that implements a device-specific MAP_SYNC
enable/disable feature. The Posted changes also add a dax flag suggested
by Dan. With device-specific MAP_SYNC enable/disable, it was just a sysfs
file export of the same flag.
1. https://lore.kernel.org/linuxppc-dev/20200602074909.36738-1-aneesh.kumar@linux.ibm.com/
-aneesh
^ permalink raw reply
* Re: [PATCH v1 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory
From: Laurent Dufour @ 2020-06-02 8:31 UTC (permalink / raw)
To: Ram Pai, kvm-ppc, linuxppc-dev
Cc: cclaudio, bharata, aneesh.kumar, sukadev, bauerman, david
In-Reply-To: <1590892071-25549-5-git-send-email-linuxram@us.ibm.com>
Le 31/05/2020 à 04:27, Ram Pai a écrit :
> From: Laurent Dufour <ldufour@linux.ibm.com>
>
> When a memory slot is hot plugged to a SVM, GFNs associated with that
> memory slot automatically default to secure GFN. Hence migrate the
> PFNs associated with these GFNs to device-PFNs.
>
> uv_migrate_mem_slot() is called to achieve that. It will not call
> UV_PAGE_IN since this request is ignored by the Ultravisor.
> NOTE: Ultravisor does not trust any page content provided by
> the Hypervisor, ones the VM turns secure.
>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> (fixed merge conflicts. Modified the commit message)
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
> arch/powerpc/include/asm/kvm_book3s_uvmem.h | 4 ++++
> arch/powerpc/kvm/book3s_hv.c | 11 +++++++----
> arch/powerpc/kvm/book3s_hv_uvmem.c | 3 +--
> 3 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index f0c5708..2ec2e5afb 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -23,6 +23,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
> void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> struct kvm *kvm, bool skip_page_out,
> bool purge_gfn);
> +int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot);
> #else
> static inline int kvmppc_uvmem_init(void)
> {
> @@ -78,5 +79,8 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
> kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> struct kvm *kvm, bool skip_page_out,
> bool purge_gfn) { }
> +
> +static int uv_migrate_mem_slot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot);
That line was not part of the patch I sent to you!
> #endif /* CONFIG_PPC_UV */
> #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 4c62bfe..604d062 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4516,13 +4516,16 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
> case KVM_MR_CREATE:
> if (kvmppc_uvmem_slot_init(kvm, new))
> return;
> - uv_register_mem_slot(kvm->arch.lpid,
> - new->base_gfn << PAGE_SHIFT,
> - new->npages * PAGE_SIZE,
> - 0, new->id);
> + if (uv_register_mem_slot(kvm->arch.lpid,
> + new->base_gfn << PAGE_SHIFT,
> + new->npages * PAGE_SIZE,
> + 0, new->id))
> + return;
> + uv_migrate_mem_slot(kvm, new);
> break;
> case KVM_MR_DELETE:
> uv_unregister_mem_slot(kvm->arch.lpid, old->id);
> + kvmppc_uvmem_drop_pages(old, kvm, true, true);
Again that line has been changed from the patch I sent to you. The last 'true'
argument has nothing to do here.
Is that series really building?
> kvmppc_uvmem_slot_free(kvm, old);
> break;
> default:
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 36dda1d..1fa5f2a 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -377,8 +377,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
> return ret;
> }
>
> -static int uv_migrate_mem_slot(struct kvm *kvm,
> - const struct kvm_memory_slot *memslot)
> +int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot)
> {
> unsigned long gfn = memslot->base_gfn;
> unsigned long end;
>
^ permalink raw reply
* Re: [PATCH v1 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE
From: Bharata B Rao @ 2020-06-02 10:06 UTC (permalink / raw)
To: Ram Pai
Cc: rcampbell, ldufour, cclaudio, kvm-ppc, aneesh.kumar, sukadev,
linuxppc-dev, bauerman, david
In-Reply-To: <20200601190535.GA6925@oc0525413822.ibm.com>
On Mon, Jun 01, 2020 at 12:05:35PM -0700, Ram Pai wrote:
> On Mon, Jun 01, 2020 at 05:25:18PM +0530, Bharata B Rao wrote:
> > On Sat, May 30, 2020 at 07:27:50PM -0700, Ram Pai wrote:
> > > H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
> > > called H_SVM_PAGE_IN for all secure pages.
> >
> > I don't think that is quite true. HV doesn't assume anything about
> > secure pages by itself.
>
> Yes. Currently, it does not assume anything about secure pages. But I am
> proposing that it should consider all pages (except the shared pages) as
> secure pages, when H_SVM_INIT_DONE is called.
Ok, then may be also add the proposed changes to H_SVM_INIT_DONE
documentation.
>
> In other words, HV should treat all pages; except shared pages, as
> secure pages once H_SVM_INIT_DONE is called. And this includes pages
> added subsequently through memory hotplug.
So after H_SVM_INIT_DONE, if HV touches a secure page for any
reason and gets encrypted contents via page-out, HV drops the
device pfn at that time. So what state we would be in that case? We
have completed H_SVM_INIT_DONE, but still have a normal (but encrypted)
page in HV?
>
> Yes. the Ultravisor can explicitly request the HV to move the pages
> individually. But that will slow down the transition too significantly.
> It takes above 20min to transition them, for a SVM of size 100G.
>
> With this proposed enhancement, the switch completes in a few seconds.
I think, many pages during initial switch and most pages for hotplugged
memory are zero pages, for which we don't anyway issue UV page-in calls.
So the 20min saving you are observing is purely due to hcall overhead?
How about extending H_SVM_PAGE_IN interface or a new hcall to request
multiple pages in one request?
Also, how about requesting for bigger page sizes (2M)? Ralph Campbell
had patches that added THP support for migrate_vma_* calls.
>
> >
> > > These GFNs continue to be
> > > normal GFNs associated with normal PFNs; when infact, these GFNs should
> > > have been secure GFNs associated with device PFNs.
> >
> > Transition to secure state is driven by SVM/UV and HV just responds to
> > hcalls by issuing appropriate uvcalls. SVM/UV is in the best position to
> > determine the required pages that need to be moved into secure side.
> > HV just responds to it and tracks such pages as device private pages.
> >
> > If SVM/UV doesn't get in all the pages to secure side by the time
> > of H_SVM_INIT_DONE, the remaining pages are just normal (shared or
> > otherwise) pages as far as HV is concerned. Why should HV assume that
> > SVM/UV didn't ask for a few pages and hence push those pages during
> > H_SVM_INIT_DONE?
>
> By definition, SVM is a VM backed by secure pages.
> Hence all pages(except shared) must turn secure when a VM switches to SVM.
>
> UV is interested in only a certain pages for the VM, which it will
> request explicitly through H_SVM_PAGE_IN. All other pages, need not
> be paged-in through UV_PAGE_IN. They just need to be switched to
> device-pages.
>
> >
> > I think UV should drive the movement of pages into secure side both
> > of boot-time SVM memory and hot-plugged memory. HV does memslot
> > registration uvcall when new memory is plugged in, UV should explicitly
> > get the required pages in at that time instead of expecting HV to drive
> > the same.
> >
> > > +static int uv_migrate_mem_slot(struct kvm *kvm,
> > > + const struct kvm_memory_slot *memslot)
> > > +{
> > > + unsigned long gfn = memslot->base_gfn;
> > > + unsigned long end;
> > > + bool downgrade = false;
> > > + struct vm_area_struct *vma;
> > > + int i, ret = 0;
> > > + unsigned long start = gfn_to_hva(kvm, gfn);
> > > +
> > > + if (kvm_is_error_hva(start))
> > > + return H_STATE;
> > > +
> > > + end = start + (memslot->npages << PAGE_SHIFT);
> > > +
> > > + down_write(&kvm->mm->mmap_sem);
> > > +
> > > + mutex_lock(&kvm->arch.uvmem_lock);
> > > + vma = find_vma_intersection(kvm->mm, start, end);
> > > + if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > > + ret = H_STATE;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > > + MADV_UNMERGEABLE, &vma->vm_flags);
> > > + downgrade_write(&kvm->mm->mmap_sem);
> > > + downgrade = true;
> > > + if (ret) {
> > > + ret = H_STATE;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + for (i = 0; i < memslot->npages; i++, ++gfn) {
> > > + /* skip paged-in pages and shared pages */
> > > + if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
> > > + kvmppc_gfn_is_uvmem_shared(gfn, kvm))
> > > + continue;
> > > +
> > > + start = gfn_to_hva(kvm, gfn);
> > > + end = start + (1UL << PAGE_SHIFT);
> > > + ret = kvmppc_svm_migrate_page(vma, start, end,
> > > + (gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > > +
> > > + if (ret)
> > > + goto out_unlock;
> > > + }
> >
> > Is there a guarantee that the vma you got for the start address remains
> > valid for all the addresses till end in a memslot? If not, you should
> > re-get the vma for the current address in each iteration I suppose.
>
>
> mm->mmap_sem is the semaphore that guards the vma. right? If that
> semaphore is held, can the vma change?
I am not sure if the vma you obtained would span the entire address range
in the memslot.
Regards,
Bharata.
^ permalink raw reply
* [RESEND PATCH v9 0/5] powerpc/papr_scm: Add support for reporting nvdimm health
From: Vaibhav Jain @ 2020-06-02 10:14 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
Changes since v9 [1]:
* Added acks from Aneesh and Steven Steven Rostedt.
Changes since v8 [2]:
* Updated proposed changes to remove usage of term 'SCM' due to
ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
[ Aneesh ]
* Renamed NVDIMM health defines from PAPR_SCM_DIMM_* to PAPR_PMEM_* .
* Updates to various newly introduced identifiers in 'papr_scm.c'
removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH PAPR_PDSM_HEALTH
* Renamed uapi header 'papr_scm_pdsm.h' to 'papr_pdsm.h'.
* Renamed sysfs ABI document from sysfs-bus-papr-scm to
sysfs-bus-papr-pmem.
* No behavioural changes from v8 [1].
[1] https://lore.kernel.org/linux-nvdimm/20200529214719.223344-1-vaibhav@linux.ibm.com
[2] https://lore.kernel.org/linux-nvdimm/20200527041244.37821-1-vaibhav@linux.ibm.com/
---
The PAPR standard[3][5] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[4]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[4]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[6]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patches deal with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[3] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[4] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[5] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[6] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v9
---
Vaibhav Jain (5):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 175 +++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 363 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 600 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
^ permalink raw reply
* [RESEND PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall
From: Vaibhav Jain @ 2020-06-02 10:14 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200602101438.73929-1-vaibhav@linux.ibm.com>
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
Resend:
* None
v8..v9:
* s/SCM/PMEM device. [ Dan Williams, Aneesh ]
v7..v8:
* Added a clarification on bit-ordering of Health Bitmap
Resend:
* None
v6..v7:
* None
v5..v6:
* New patch in the series
---
Documentation/powerpc/papr_hcalls.rst | 46 ++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..48fcf1255a33 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,51 @@ from the LPAR memory.
**H_SCM_HEALTH**
| Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*
Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the PMEM device. The asserted bits in the health-bitmap indicate one or more states
+(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
+which bits in health-bitmap are valid. The bits are reported in
+reverse bit ordering for example a value of 0xC400000000000000
+indicates bits 0, 1, and 5 are valid.
+
+Health Bitmap Flags:
+
++------+-----------------------------------------------------------------------+
+| Bit | Definition |
++======+=======================================================================+
+| 00 | PMEM device is unable to persist memory contents. |
+| | If the system is powered down, nothing will be saved. |
++------+-----------------------------------------------------------------------+
+| 01 | PMEM device failed to persist memory contents. Either contents were |
+| | not saved successfully on power down or were not restored properly on |
+| | power up. |
++------+-----------------------------------------------------------------------+
+| 02 | PMEM device contents are persisted from previous IPL. The data from |
+| | the last boot were successfully restored. |
++------+-----------------------------------------------------------------------+
+| 03 | PMEM device contents are not persisted from previous IPL. There was no|
+| | data to restore from the last boot. |
++------+-----------------------------------------------------------------------+
+| 04 | PMEM device memory life remaining is critically low |
++------+-----------------------------------------------------------------------+
+| 05 | PMEM device will be garded off next IPL due to failure |
++------+-----------------------------------------------------------------------+
+| 06 | PMEM device contents cannot persist due to current platform health |
+| | status. A hardware failure may prevent data from being saved or |
+| | restored. |
++------+-----------------------------------------------------------------------+
+| 07 | PMEM device is unable to persist memory contents in certain conditions|
++------+-----------------------------------------------------------------------+
+| 08 | PMEM device is encrypted |
++------+-----------------------------------------------------------------------+
+| 09 | PMEM device has successfully completed a requested erase or secure |
+| | erase procedure. |
++------+-----------------------------------------------------------------------+
+|10:63 | Reserved / Unused |
++------+-----------------------------------------------------------------------+
**H_SCM_PERFORMANCE_STATS**
--
2.26.2
^ permalink raw reply related
* [RESEND PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Vaibhav Jain @ 2020-06-02 10:14 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200602101438.73929-1-vaibhav@linux.ibm.com>
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmap, bitwise-and of which is then stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via newly introduced dimm specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried, 60s after the previous successful hcall.
The patch also adds a documentation text describing flags reported by
the the new sysfs attribute 'papr/flags' is also introduced at
Documentation/ABI/testing/sysfs-bus-papr-pmem.
[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
Resend:
* Added ack from Aneesh.
v8..v9:
* Rename some variables and defines to reduce usage of term SCM
replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
bitmap/ [ Ira, Aneesh ].
Resend:
* None
v6..v7 :
* Used the exported buf_seq_printf() function to generate content for
'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
and removed the papr_scm.h file [Mpe]
* Some minor consistency issued in sysfs-bus-papr-scm
documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two function one of which takes
care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
READ_ONCE(). [Mpe]
v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
[Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
integer [Mpe]
* Updated patch description to reflect the changes made in this
version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
flags_show() [Dan Williams]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
NVDIMM unarmed [Aneesh]
v1..v2 :
* New patch in the series.
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
arch/powerpc/platforms/pseries/papr_scm.c | 169 +++++++++++++++++-
2 files changed, 194 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index 000000000000..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What: /sys/bus/nd/devices/nmemX/papr/flags
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:
+ (RO) Report flags indicating various states of a
+ papr-pmem NVDIMM device. Each flag maps to a one or
+ more bits set in the dimm-health-bitmap retrieved in
+ response to H_SCM_HEALTH hcall. The details of the bit
+ flags returned in response to this hcall is available
+ at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+ the flags reported in this sysfs file:
+
+ * "not_armed" : Indicates that NVDIMM contents will not
+ survive a power cycle.
+ * "flush_fail" : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+ * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+ * "encrypted" : NVDIMM contents are encrypted.
+ * "smart_notify": There is health event for the NVDIMM.
+ * "scrubbed" : Indicating that contents of the
+ NVDIMM have been scrubbed.
+ * "locked" : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..149431594839 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -12,6 +12,7 @@
#include <linux/libnvdimm.h>
#include <linux/platform_device.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
@@ -22,6 +23,44 @@
(1ul << ND_CMD_GET_CONFIG_DATA) | \
(1ul << ND_CMD_SET_CONFIG_DATA))
+/* DIMM health bitmap bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_PMEM_UNARMED (1ULL << (63 - 0))
+/* SCM device failed to persist memory contents */
+#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_PMEM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_PMEM_EMPTY (1ULL << (63 - 3))
+/* SCM device memory life remaining is critically low */
+#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4))
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5))
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_PMEM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
+/* SCM device is encrypted */
+#define PAPR_PMEM_ENCRYPTED (1ULL << (63 - 8))
+/* SCM device has been scrubbed and locked */
+#define PAPR_PMEM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
+
+/* Bits status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* Bits status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
+
+/* Bits status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
+ PAPR_PMEM_HEALTH_FATAL | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -39,6 +78,15 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+ /* Protect dimm health data from concurrent read/writes */
+ struct mutex health_mutex;
+
+ /* Last time the health information of the dimm was updated */
+ unsigned long lasthealth_jiffies;
+
+ /* Health information for the dimm */
+ u64 health_bitmap;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
+ * health information.
+ */
+static int __drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ long rc;
+
+ /* issue the hcall */
+ rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query health information, Err:%ld\n", rc);
+ rc = -ENXIO;
+ goto out;
+ }
+
+ p->lasthealth_jiffies = jiffies;
+ p->health_bitmap = ret[0] & ret[1];
+
+ dev_dbg(&p->pdev->dev,
+ "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
+ ret[0], ret[1]);
+out:
+ return rc;
+}
+
+/* Min interval in seconds for assuming stable dimm health */
+#define MIN_HEALTH_QUERY_INTERVAL 60
+
+/* Query cached health info and if needed call drc_pmem_query_health */
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long cache_timeout;
+ int rc;
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ /* Jiffies offset for which the health data is assumed to be same */
+ cache_timeout = p->lasthealth_jiffies +
+ msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
+
+ /* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
+ if (time_after(jiffies, cache_timeout))
+ rc = __drc_pmem_query_health(p);
+ else
+ /* Assume cached health data is valid */
+ rc = 0;
+
+ mutex_unlock(&p->health_mutex);
+ return rc;
+}
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
@@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t flags_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+ struct seq_buf s;
+ u64 health;
+ int rc;
+
+ rc = drc_pmem_query_health(p);
+ if (rc)
+ return rc;
+
+ /* Copy health_bitmap locally, check masks & update out buffer */
+ health = READ_ONCE(p->health_bitmap);
+
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ seq_buf_printf(&s, "not_armed ");
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ seq_buf_printf(&s, "flush_fail ");
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ seq_buf_printf(&s, "restore_fail ");
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ seq_buf_printf(&s, "encrypted ");
+
+ if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ seq_buf_printf(&s, "smart_notify ");
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
+ seq_buf_printf(&s, "scrubbed locked ");
+
+ if (seq_buf_used(&s))
+ seq_buf_printf(&s, "\n");
+
+ return seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(flags);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_nd_attributes[] = {
+ &dev_attr_flags.attr,
+ NULL,
+};
+
+static struct attribute_group papr_nd_attribute_group = {
+ .name = "papr",
+ .attrs = papr_nd_attributes,
+};
+
+static const struct attribute_group *papr_nd_attr_groups[] = {
+ &papr_nd_attribute_group,
+ NULL,
+};
+
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
struct device *dev = &p->pdev->dev;
@@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_LABELING, &dimm_flags);
- p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+ p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
@@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
if (!p)
return -ENOMEM;
+ /* Initialize the dimm mutex */
+ mutex_init(&p->health_mutex);
+
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
--
2.26.2
^ permalink raw reply related
* [RESEND PATCH v9 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-02 10:14 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200602101438.73929-1-vaibhav@linux.ibm.com>
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
to communicate the PDSM request via member
'nd_cmd_pkg.nd_command' and size of payload that need to be
sent/received for servicing the PDSM.
A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
Resend:
* Added ack from Aneesh.
v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
* Minor update to patch description.
v7..v8:
* Removed the 'payload_offset' field from 'struct
nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
[ Ira ]
Resend:
* None
v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
[Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
[Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
[ Dan Williams ]
* Removed redundant license text from the papr_scm_psdm.h file.
[ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to gaurd
against different compiler adding paddings between the fields.
[ Dan Williams]
* Converted various pr_debug to dev_debug [ Dan Williams ]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]
v1..v2 :
* None
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 232 insertions(+), 6 deletions(-)
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index 000000000000..6407fefcc007
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include <linux/types.h>
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is reserved field that can used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at a 8-byte boundary.
+ *
+ * +-------------+---------------------+---------------------------+
+ * | 64-Bytes | 16-Bytes | Max 176-Bytes |
+ * +-------------+---------------------+---------------------------+
+ * | nd_pdsm_cmd_pkg | |
+ * |-------------+ | |
+ * | nd_cmd_pkg | | |
+ * +-------------+---------------------+---------------------------+
+ * | nd_family | | |
+ * | nd_size_out | cmd_status | |
+ * | nd_size_in | payload_version | payload |
+ * | nd_command | reserved | |
+ * | nd_fw_size | | |
+ * +-------------+---------------------+---------------------------+
+ *
+ * PDSM Header:
+ *
+ * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
+ * contained in 'struct nd_cmd_pkg', the header also has members following
+ * members:
+ *
+ * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
+ * 'payload_version' : (In/Out) Version number associated with the payload.
+ * 'reserved' : Not used and reserved for future.
+ *
+ * PDSM Payload:
+ *
+ * The layout of the PDSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a PDSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer based on the
+ * PDSM command. Similarly the output of servicing the PDSM command will be
+ * copied to the payload field using the same struct.
+ *
+ * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
+ * leaves around 176 bytes for the envelope payload (ignoring any padding that
+ * the compiler may silently introduce).
+ *
+ * Payload Version:
+ *
+ * A 'payload_version' field is present in PDSM header that indicates a specific
+ * version of the structure present in PDSM Payload for a given PDSM command.
+ * This provides backward compatibility in case the PDSM Payload structure
+ * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
+ *
+ * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
+ * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
+ * module when servicing the PDSM envelope checks the 'payload_version' and then
+ * uses 'payload struct version' == MIN('payload_version field',
+ * 'max payload-struct-version supported by papr_scm') to service the PDSM.
+ * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
+ * struct in returned 'payload_version' field.
+ *
+ * Libndctl on receiving the envelope back from papr_scm again checks the
+ * 'payload_version' field and based on it use the appropriate version dsm
+ * struct to parse the results.
+ *
+ * Backward Compatibility:
+ *
+ * Above scheme of exchanging different versioned PDSM struct between libndctl
+ * and papr_scm should provide backward compatibility until following two
+ * assumptions/conditions when defining new PDSM structs hold:
+ *
+ * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
+ *
+ * 1. T(X) is a proper subset of T(Y) if Y > X.
+ * i.e Each new version of PDSM struct should retain existing struct
+ * attributes from previous version
+ *
+ * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
+ * it should also support T(1), T(2)...T(X - 1).
+ * i.e When adding support for new version of a PDSM struct, libndctl
+ * and papr_scm should retain support of the existing PDSM struct
+ * version they support.
+ */
+
+/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
+struct nd_pdsm_cmd_pkg {
+ struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
+ __s32 cmd_status; /* Out: Sub-cmd status returned back */
+ __u16 reserved[5]; /* Ignored and to be used in future */
+ __u16 payload_version; /* In/Out: version of the payload */
+ __u8 payload[]; /* In/Out: Sub-cmd data buffer */
+} __packed;
+
+/*
+ * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
+ * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
+ */
+enum papr_pdsm {
+ PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_MAX,
+};
+
+/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
+static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
+{
+ return (struct nd_pdsm_cmd_pkg *) cmd;
+}
+
+/* Return the payload pointer for a given pcmd */
+static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
+{
+ if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
+ return NULL;
+ else
+ return (void *)(pcmd->payload);
+}
+
+#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 149431594839..5e2237e7ec08 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -15,13 +15,15 @@
#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
+#include <asm/papr_pdsm.h>
#define BIND_ANY_ADDR (~0ul)
#define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
(1ul << ND_CMD_GET_CONFIG_DATA) | \
- (1ul << ND_CMD_SET_CONFIG_DATA))
+ (1ul << ND_CMD_SET_CONFIG_DATA) | \
+ (1ul << ND_CMD_CALL))
/* DIMM health bitmap bitmap indicators */
/* SCM device is unable to persist memory contents */
@@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
}
+/*
+ * Validate the inputs args to dimm-control function and return '0' if valid.
+ * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
+ */
+static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+ struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
+ struct papr_scm_priv *p;
+
+ /* Only dimm-specific calls are supported atm */
+ if (!nvdimm)
+ return -EINVAL;
+
+ /* get the provider date from struct nvdimm */
+ p = nvdimm_provider_data(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask)) {
+ dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
+ return -EINVAL;
+ } else if (cmd == ND_CMD_CALL) {
+
+ /* Verify the envelope package */
+ if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
+ buf_len);
+ return -EINVAL;
+ }
+
+ /* Verify that the PDSM family is valid */
+ if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
+ pkg->hdr.nd_family);
+ return -EINVAL;
+
+ }
+
+ /* We except a payload with all PDSM commands */
+ if (pdsm_cmd_to_payload(pkg) == NULL) {
+ dev_dbg(&p->pdev->dev,
+ "Empty payload for sub-command=0x%llx\n",
+ pkg->hdr.nd_command);
+ return -EINVAL;
+ }
+ }
+
+ /* Command looks valid */
+ return 0;
+}
+
+static int papr_scm_service_pdsm(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *call_pkg)
+{
+ /* unknown subcommands return error in packages */
+ if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
+ call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
+ dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
+ call_pkg->hdr.nd_command);
+ call_pkg->cmd_status = -EINVAL;
+ return 0;
+ }
+
+ /* Depending on the DSM command call appropriate service routine */
+ switch (call_pkg->hdr.nd_command) {
+ default:
+ dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
+ call_pkg->hdr.nd_command);
+ call_pkg->cmd_status = -ENOENT;
+ return 0;
+ }
+}
+
static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
{
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+ struct nd_pdsm_cmd_pkg *call_pkg = NULL;
+ int rc;
- /* Only dimm-specific calls are supported atm */
- if (!nvdimm)
- return -EINVAL;
+ /* Use a local variable in case cmd_rc pointer is NULL */
+ if (cmd_rc == NULL)
+ cmd_rc = &rc;
+
+ *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
+ if (*cmd_rc) {
+ pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
+ return *cmd_rc;
+ }
p = nvdimm_provider_data(nvdimm);
@@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_meta_set(p, buf);
break;
+ case ND_CMD_CALL:
+ call_pkg = nd_to_pdsm_cmd_pkg(buf);
+ *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
+ break;
+
default:
- return -EINVAL;
+ dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
+ *cmd_rc = -EINVAL;
}
dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
- return 0;
+ return *cmd_rc;
}
static ssize_t flags_show(struct device *dev,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..0e09dc5cec19 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR 5
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.26.2
^ permalink raw reply related
* [RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
From: Vaibhav Jain @ 2020-06-02 10:14 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200602101438.73929-1-vaibhav@linux.ibm.com>
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.
The patch also introduces a new member 'struct papr_scm_priv.health'
thats an instance of 'struct nd_papr_pdsm_health' to cache the health
information of a nvdimm. As a result functions drc_pmem_query_health()
and flags_show() are updated to populate and use this new struct
instead of a u64 integer that was earlier used.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
Resend:
* Added ack from Aneesh.
v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_psdm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.
v7..v8:
* None
Resend:
* None
v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
__drc_pmem_query_health() bypassing the cache [Mpe].
v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
gaurd against possibility of different compilers adding different
paddings to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
'bool' and also updated drc_pmem_query_health() to take this into
account. [ Dan Williams ]
v4..v5:
* None
v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
from enum to #defines [Aneesh]
v1..v2:
* New patch in the series
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 39 +++++++
arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
2 files changed, 147 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 6407fefcc007..411725a91591 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
*/
enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
};
@@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
return (void *)(pcmd->payload);
}
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY 0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL 2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * dimm_unarmed : Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown : Previous shutdown did not persist contents.
+ * dimm_bad_restore : Contents from previous shutdown werent restored.
+ * dimm_scrubbed : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ */
+struct nd_papr_pdsm_health_v1 {
+ __u8 dimm_unarmed;
+ __u8 dimm_bad_shutdown;
+ __u8 dimm_bad_restore;
+ __u8 dimm_scrubbed;
+ __u8 dimm_locked;
+ __u8 dimm_encrypted;
+ __u16 dimm_health;
+} __packed;
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_PDSM_HEALTH_VERSION 1
+
#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 5e2237e7ec08..c0606c0c659c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -88,7 +88,7 @@ struct papr_scm_priv {
unsigned long lasthealth_jiffies;
/* Health information for the dimm */
- u64 health_bitmap;
+ struct nd_papr_pdsm_health health;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
static int __drc_pmem_query_health(struct papr_scm_priv *p)
{
unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ u64 health;
long rc;
/* issue the hcall */
@@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
if (rc != H_SUCCESS) {
dev_err(&p->pdev->dev,
"Failed to query health information, Err:%ld\n", rc);
- rc = -ENXIO;
- goto out;
+ return -ENXIO;
}
p->lasthealth_jiffies = jiffies;
- p->health_bitmap = ret[0] & ret[1];
+ health = ret[0] & ret[1];
dev_dbg(&p->pdev->dev,
"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
ret[0], ret[1]);
-out:
- return rc;
+
+ memset(&p->health, 0, sizeof(p->health));
+
+ /* Check for various masks in bitmap and set the buffer */
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ p->health.dimm_unarmed = 1;
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ p->health.dimm_bad_shutdown = 1;
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ p->health.dimm_bad_restore = 1;
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ p->health.dimm_encrypted = 1;
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
+ p->health.dimm_locked = 1;
+ p->health.dimm_scrubbed = 1;
+ }
+
+ if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
+ p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+ if (health & PAPR_PMEM_HEALTH_CRITICAL)
+ p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+
+ if (health & PAPR_PMEM_HEALTH_FATAL)
+ p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
+
+ return 0;
}
/* Min interval in seconds for assuming stable dimm health */
@@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+/* Fetch the DIMM health info and populate it in provided package. */
+static int papr_pdsm_health(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ int rc;
+ size_t copysize = sizeof(p->health);
+
+ /* Ensure dimm health mutex is taken preventing concurrent access */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ goto out;
+
+ /* Always fetch upto date dimm health data ignoring cached values */
+ rc = __drc_pmem_query_health(p);
+ if (rc)
+ goto out_unlock;
+ /*
+ * If the requested payload version is greater than one we know
+ * about, return the payload version we know about and let
+ * caller/userspace handle.
+ */
+ if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
+ pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
+
+ if (pkg->hdr.nd_size_out < copysize) {
+ dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
+ pkg->hdr.nd_size_out, copysize);
+ rc = -ENOSPC;
+ goto out_unlock;
+ }
+
+ dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
+ copysize, pkg->payload_version);
+
+ /* Copy the health struct to the payload */
+ memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
+ pkg->hdr.nd_fw_size = copysize;
+
+out_unlock:
+ mutex_unlock(&p->health_mutex);
+
+out:
+ /*
+ * Put the error in out package and return success from function
+ * so that errors if any are propogated back to userspace.
+ */
+ pkg->cmd_status = rc;
+ dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
+
+ return 0;
+}
+
static int papr_scm_service_pdsm(struct papr_scm_priv *p,
struct nd_pdsm_cmd_pkg *call_pkg)
{
@@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
/* Depending on the DSM command call appropriate service routine */
switch (call_pkg->hdr.nd_command) {
+ case PAPR_PDSM_HEALTH:
+ return papr_pdsm_health(p, call_pkg);
+
default:
dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
call_pkg->hdr.nd_command);
@@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
struct nvdimm *dimm = to_nvdimm(dev);
struct papr_scm_priv *p = nvdimm_provider_data(dimm);
struct seq_buf s;
- u64 health;
int rc;
rc = drc_pmem_query_health(p);
if (rc)
return rc;
- /* Copy health_bitmap locally, check masks & update out buffer */
- health = READ_ONCE(p->health_bitmap);
-
seq_buf_init(&s, buf, PAGE_SIZE);
- if (health & PAPR_PMEM_UNARMED_MASK)
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ if (p->health.dimm_unarmed)
seq_buf_printf(&s, "not_armed ");
- if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ if (p->health.dimm_bad_shutdown)
seq_buf_printf(&s, "flush_fail ");
- if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ if (p->health.dimm_bad_restore)
seq_buf_printf(&s, "restore_fail ");
- if (health & PAPR_PMEM_ENCRYPTED)
+ if (p->health.dimm_encrypted)
seq_buf_printf(&s, "encrypted ");
- if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ if (p->health.dimm_health)
seq_buf_printf(&s, "smart_notify ");
- if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
- seq_buf_printf(&s, "scrubbed locked ");
+ if (p->health.dimm_scrubbed)
+ seq_buf_printf(&s, "scrubbed ");
+
+ if (p->health.dimm_locked)
+ seq_buf_printf(&s, "locked ");
+
+ mutex_unlock(&p->health_mutex);
if (seq_buf_used(&s))
seq_buf_printf(&s, "\n");
--
2.26.2
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox