* [PATCH 03/13] hpet: use new sysctl subdir helper register_sysctl_subdir()
From: Luis Chamberlain @ 2020-05-29 7:40 UTC (permalink / raw)
To: keescook, yzaikin, nixiaoming, ebiederm, axboe, clemens, arnd,
gregkh, jani.nikula, joonas.lahtinen, rodrigo.vivi, airlied,
daniel, benh, rdna, viro, mark, jlbec, joseph.qi, vbabka, sfr,
jack, amir73il, rafael, tytso
Cc: intel-gfx, linux-kernel, dri-devel, julia.lawall,
Luis Chamberlain, akpm, linuxppc-dev, ocfs2-devel
In-Reply-To: <20200529074108.16928-1-mcgrof@kernel.org>
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.
// pycocci sysctl-subdir.cocci drivers/char/hpet.c
@c1@
expression E1;
identifier subdir, sysctls;
@@
static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};
@c2@
identifier c1.subdir;
expression E2;
identifier base;
@@
static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};
@c3@
identifier c2.base;
identifier header;
@@
header = register_sysctl_table(base);
@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@
-static struct ctl_table subdir[] = {
- {
- .procname = E1,
- .maxlen = 0,
- .mode = 0555,
- .child = sysctls,
- },
- { }
-};
@r2 depends on c1 && c2 && c3@
identifier c1.subdir;
expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
- {
- .procname = E2,
- .maxlen = 0,
- .mode = 0555,
- .child = subdir,
- },
- { }
-};
@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@
header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
drivers/char/hpet.c | 22 +---------------------
1 file changed, 1 insertion(+), 21 deletions(-)
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index ed3b7dab678d..169c970d5ff8 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -746,26 +746,6 @@ static struct ctl_table hpet_table[] = {
{}
};
-static struct ctl_table hpet_root[] = {
- {
- .procname = "hpet",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_table,
- },
- {}
-};
-
-static struct ctl_table dev_root[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_root,
- },
- {}
-};
-
static struct ctl_table_header *sysctl_header;
/*
@@ -1059,7 +1039,7 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;
- sysctl_header = register_sysctl_table(dev_root);
+ sysctl_header = register_sysctl_subdir("dev", "hpet", hpet_table);
result = acpi_bus_register_driver(&hpet_acpi_driver);
if (result < 0) {
--
2.26.2
^ permalink raw reply related
* [PATCH 00/13] sysctl: spring cleaning
From: Luis Chamberlain @ 2020-05-29 7:40 UTC (permalink / raw)
To: keescook, yzaikin, nixiaoming, ebiederm, axboe, clemens, arnd,
gregkh, jani.nikula, joonas.lahtinen, rodrigo.vivi, airlied,
daniel, benh, rdna, viro, mark, jlbec, joseph.qi, vbabka, sfr,
jack, amir73il, rafael, tytso
Cc: intel-gfx, linux-kernel, dri-devel, julia.lawall,
Luis Chamberlain, akpm, linuxppc-dev, ocfs2-devel
Me and Xiaoming are working on some kernel/sysctl.c spring cleaning.
During a recent linux-next merge conflict it became clear that
the kitchen sink on kernel/sysctl.c creates too many conflicts,
and so we need to do away with stuffing everyone's knobs on this
one file.
This is part of that work. This is not expected to get merged yet, but
since our delta is pretty considerable at this point, we need to piece
meal this and collect reviews for what we have so far. This follows up
on some of his recent work.
This series focuses on a new helper to deal with subdirectories and
empty subdirectories. The terminology that we will embrace will be
that things like "fs", "kernel", "debug" are based directories, and
directories underneath this are subdirectories.
In this case, the cleanup ends up also trimming the amount of
code we have for sysctls.
If this seems reasonable we'll kdocify this a bit too.
This code has been boot tested without issues, and I'm letting 0day do
its thing to test against many kconfig builds. If you however spot
any issues please let us know.
Luis Chamberlain (9):
sysctl: add new register_sysctl_subdir() helper
cdrom: use new sysctl subdir helper register_sysctl_subdir()
hpet: use new sysctl subdir helper register_sysctl_subdir()
i915: use new sysctl subdir helper register_sysctl_subdir()
macintosh/mac_hid.c: use new sysctl subdir helper
register_sysctl_subdir()
ocfs2: use new sysctl subdir helper register_sysctl_subdir()
test_sysctl: use new sysctl subdir helper register_sysctl_subdir()
sysctl: add helper to register empty subdir
fs: move binfmt_misc sysctl to its own file
Xiaoming Ni (4):
inotify: simplify sysctl declaration with register_sysctl_subdir()
firmware_loader: simplify sysctl declaration with
register_sysctl_subdir()
eventpoll: simplify sysctl declaration with register_sysctl_subdir()
random: simplify sysctl declaration with register_sysctl_subdir()
drivers/base/firmware_loader/fallback.c | 4 +
drivers/base/firmware_loader/fallback.h | 11 +++
drivers/base/firmware_loader/fallback_table.c | 22 ++++-
drivers/cdrom/cdrom.c | 23 +----
drivers/char/hpet.c | 22 +----
drivers/char/random.c | 14 +++-
drivers/gpu/drm/i915/i915_perf.c | 22 +----
drivers/macintosh/mac_hid.c | 25 +-----
fs/binfmt_misc.c | 1 +
fs/eventpoll.c | 10 ++-
fs/notify/inotify/inotify_user.c | 11 ++-
fs/ocfs2/stackglue.c | 27 +-----
include/linux/inotify.h | 3 -
include/linux/poll.h | 2 -
include/linux/sysctl.h | 21 ++++-
kernel/sysctl.c | 84 +++++++++++--------
lib/test_sysctl.c | 23 +----
17 files changed, 144 insertions(+), 181 deletions(-)
--
2.26.2
^ permalink raw reply
* [PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code
From: Daniel Axtens @ 2020-05-29 6:14 UTC (permalink / raw)
To: linuxppc-dev, npiggin; +Cc: ajd, Daniel Axtens
syzkaller is picking up a bunch of crashes that look like this:
Unrecoverable exception 380 at c00000000037ed60 (msr=8000000000001031)
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
NIP: c00000000037ed60 LR: c00000000004bac8 CTR: c000000000030990
REGS: c0000000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
MSR: 8000000000001031 <SF,ME,IR,DR,LE> CR: 48222882 XER: 20000000
CFAR: c00000000004bac4 IRQMASK: 0
GPR00: c00000000004bb68 c0000000555a74c0 c0000000024b3500 0000000000000005
GPR04: 0000000000000000 0000000000000000 c00000000004bb88 c008000000910000
GPR08: 00000000000b0000 c00000000004bac8 0000000000016000 c000000002503500
GPR12: c000000000030990 c000000003190000 00000000106a5898 00000000106a0000
GPR16: 00000000106a5890 c000000007a92000 c000000008180e00 c000000007a8f700
GPR20: c000000007a904b0 0000000010110000 c00000000259d318 5deadbeef0000100
GPR24: 5deadbeef0000122 c000000078422700 c000000009ee88b8 c000000078422778
GPR28: 0000000000000001 800000000280b033 0000000000000000 c0000000555a75a0
NIP [c00000000037ed60] __sanitizer_cov_trace_pc+0x40/0x50
LR [c00000000004bac8] interrupt_exit_kernel_prepare+0x118/0x310
Call Trace:
[c0000000555a74c0] [c00000000004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable)
[c0000000555a7530] [c00000000000f9a8] interrupt_return+0x118/0x1c0
--- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
...<random previous call chain>...
That looks like the KCOV helper accessing memory that's not safe to
access in the interrupt handling context.
Do not instrument the new syscall entry/exit code with KCOV, GCOV or
UBSAN.
Cc: Nicholas Piggin <npiggin@gmail.com>
Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
be warned: I haven't attempted to reproduce the crash yet,
nor have I been able to test that this fixes it. I will attempt to do
that soon. Logically though, it does seem like this would be a
good thing to do regardless.
---
arch/powerpc/kernel/Makefile | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1c4385852d3d..1d443a7dc8a7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -156,12 +156,19 @@ obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
GCOV_PROFILE_prom_init.o := n
KCOV_INSTRUMENT_prom_init.o := n
UBSAN_SANITIZE_prom_init.o := n
+
GCOV_PROFILE_kprobes.o := n
KCOV_INSTRUMENT_kprobes.o := n
UBSAN_SANITIZE_kprobes.o := n
+
GCOV_PROFILE_kprobes-ftrace.o := n
KCOV_INSTRUMENT_kprobes-ftrace.o := n
UBSAN_SANITIZE_kprobes-ftrace.o := n
+
+GCOV_PROFILE_syscall_64.o := n
+KCOV_INSTRUMENT_syscall_64.o := n
+UBSAN_SANITIZE_syscall_64.o := n
+
UBSAN_SANITIZE_vdso.o := n
# Necessary for booting with kcov enabled on book3e machines
--
2.20.1
^ permalink raw reply related
* [RFC PATCH 2/2] powerpc/pmem: Disable synchronous fault by default.
From: Aneesh Kumar K.V @ 2020-05-29 5:41 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams; +Cc: Aneesh Kumar K.V, oohall
In-Reply-To: <20200529054141.156384-1-aneesh.kumar@linux.ibm.com>
This adds a kernel config option that controls whether MAP_SYNC is enabled by
default. With POWER10, architecture is adding new pmem flush and sync
instructions. The kernel should prevent the usage of MAP_SYNC if applications
are not using the new instructions on newer hardware.
This config allows user to control whether MAP_SYNC should be enabled by
default or not.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/platforms/Kconfig.cputype | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..f8694838ad4e 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
If you're unsure, say Y.
+config ARCH_MAP_SYNC_DISABLE
+ bool "Disable synchronous fault support (MAP_SYNC)"
+ default y
+ help
+ Disable support for synchronous fault with nvdimm namespaces.
+
+ If you're unsure, say Y.
+
+
config PPC_HAVE_KUAP
bool
--
2.26.2
^ permalink raw reply related
* [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
From: Aneesh Kumar K.V @ 2020-05-29 5:41 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams; +Cc: Aneesh Kumar K.V, oohall
With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.
This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. The kernel config option is added to allow the user
to control whether MAP_SYNC should be enabled by default or not.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
include/linux/sched/coredump.h | 13 ++++++++++---
include/uapi/linux/prctl.h | 3 +++
kernel/fork.c | 8 +++++++-
kernel/sys.c | 18 ++++++++++++++++++
mm/Kconfig | 3 +++
mm/mmap.c | 4 ++++
6 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..9ba6b3d5f991 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */
-#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC 27 /* disable THP for all VMAs */
+#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC_MASK (1 << MMF_DISABLE_MAP_SYNC)
-#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
- MMF_DISABLE_THP_MASK)
+#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
+ MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK)
+
+static inline bool map_sync_enabled(struct mm_struct *mm)
+{
+ return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
+}
#endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..ee4cde32d5cf 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
#define PR_SET_IO_FLUSHER 57
#define PR_GET_IO_FLUSHER 58
+#define PR_SET_MAP_SYNC_ENABLE 59
+#define PR_GET_MAP_SYNC_ENABLE 60
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
static int __init coredump_filter_setup(char *s)
{
default_dump_filter =
@@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->flags = current->mm->flags & MMF_INIT_MASK;
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
- mm->flags = default_dump_filter;
+ mm->flags = default_dump_filter | default_map_sync_mask;
mm->def_flags = 0;
}
diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..f6127cf4128b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
clear_bit(MMF_DISABLE_THP, &me->mm->flags);
up_write(&me->mm->mmap_sem);
break;
+
+ case PR_GET_MAP_SYNC_ENABLE:
+ if (arg2 || arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = !test_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+ break;
+ case PR_SET_MAP_SYNC_ENABLE:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ if (down_write_killable(&me->mm->mmap_sem))
+ return -EINTR;
+ if (arg2)
+ clear_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+ else
+ set_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+ up_write(&me->mm->mmap_sem);
+ break;
+
case PR_MPX_ENABLE_MANAGEMENT:
case PR_MPX_DISABLE_MANAGEMENT:
/* No longer implemented: */
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..38fd7cfbfca8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
config MAPPING_DIRTY_HELPERS
bool
+config ARCH_MAP_SYNC_DISABLE
+ bool
+
endmenu
diff --git a/mm/mmap.c b/mm/mmap.c
index f609e9ec4a25..613e5894f178 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1464,6 +1464,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
case MAP_SHARED_VALIDATE:
if (flags & ~flags_mask)
return -EOPNOTSUPP;
+
+ if ((flags & MAP_SYNC) && !map_sync_enabled(mm))
+ return -EOPNOTSUPP;
+
if (prot & PROT_WRITE) {
if (!(file->f_mode & FMODE_WRITE))
return -EACCES;
--
2.26.2
^ permalink raw reply related
* [PATCH v4 8/8] powerpc/pmem: Initialize pmem device on newer hardware
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
With kernel now supporting new pmem flush/sync instructions, we can now
enable the kernel to initialize the device. On P10 these devices would
appear with a new compatible string. For PAPR device we have
compatible "ibm,pmemory-v2"
and for OF pmem device we have
compatible "pmem-region-v2"
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/platforms/pseries/papr_scm.c | 1 +
drivers/nvdimm/of_pmem.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index ad506e7003c9..407e08f08157 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -483,6 +483,7 @@ static int papr_scm_remove(struct platform_device *pdev)
static const struct of_device_id papr_scm_match[] = {
{ .compatible = "ibm,pmemory" },
+ { .compatible = "ibm,pmemory-v2" },
{ },
};
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..10dbdcdfb9ce 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -90,6 +90,7 @@ static int of_pmem_region_remove(struct platform_device *pdev)
static const struct of_device_id of_pmem_region_match[] = {
{ .compatible = "pmem-region" },
+ { .compatible = "pmem-region-v2" },
{ },
};
--
2.26.2
^ permalink raw reply related
* [PATCH v4 7/8] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions.
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
We only support persistent memory on P8 and above. This is enforced by the
firmware and further checked on virtualzied platform during platform init.
Add WARN_ONCE in pmem flush routines to catch the wrong usage of these.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/cacheflush.h | 2 ++
arch/powerpc/lib/pmem.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index bc3ea009cf14..865fae8a226e 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -125,6 +125,8 @@ static inline void arch_pmem_flush_barrier(void)
{
if (cpu_has_feature(CPU_FTR_ARCH_207S))
asm volatile(PPC_PHWSYNC ::: "memory");
+ else
+ WARN_ONCE(1, "Using pmem flush on older hardware.");
}
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 21210fa676e5..f40bd908d28d 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -37,12 +37,14 @@ static inline void clean_pmem_range(unsigned long start, unsigned long stop)
{
if (cpu_has_feature(CPU_FTR_ARCH_207S))
return __clean_pmem_range(start, stop);
+ WARN_ONCE(1, "Using pmem flush on older hardware.");
}
static inline void flush_pmem_range(unsigned long start, unsigned long stop)
{
if (cpu_has_feature(CPU_FTR_ARCH_207S))
return __flush_pmem_range(start, stop);
+ WARN_ONCE(1, "Using pmem flush on older hardware.");
}
/*
--
2.26.2
^ permalink raw reply related
* [PATCH v4 6/8] powerpc/pmem: Avoid the barrier in flush routines
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
nvdimm expect the flush routines to just mark the cache clean. The barrier
that mark the store globally visible is done in nvdimm_flush().
Update the papr_scm driver to a simplified nvdim_flush callback that do
only the required barrier.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/lib/pmem.c | 6 ------
arch/powerpc/platforms/pseries/papr_scm.c | 13 +++++++++++++
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 5a61aaeb6930..21210fa676e5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
for (i = 0; i < size >> shift; i++, addr += bytes)
asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
- asm volatile(PPC_PHWSYNC ::: "memory");
}
static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
@@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
for (i = 0; i < size >> shift; i++, addr += bytes)
asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
- asm volatile(PPC_PHWSYNC ::: "memory");
}
static inline void clean_pmem_range(unsigned long start, unsigned long stop)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..ad506e7003c9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -285,6 +285,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * are have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+ arch_pmem_flush_barrier();
+ return 0;
+}
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
@@ -340,6 +352,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.mapping = &mapping;
ndr_desc.num_mappings = 1;
ndr_desc.nd_set = &p->nd_set;
+ ndr_desc.flush = papr_scm_flush_sync;
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
--
2.26.2
^ permalink raw reply related
* [PATCH v4 4/8] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()
Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
drivers/md/dm-writecache.c | 2 +-
drivers/nvdimm/region_devs.c | 8 ++++----
include/linux/libnvdimm.h | 4 ++++
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 613c171b1b6d..904fdbf2b089 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -540,7 +540,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
{
if (WC_MODE_PMEM(wc))
- wmb();
+ arch_pmem_flush_barrier();
else
ssd_commit_flushed(wc, wait_for_ios);
}
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..88ea34a9c7fd 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
/*
- * The first wmb() is needed to 'sfence' all previous writes
- * such that they are architecturally visible for the platform
- * buffer flush. Note that we've already arranged for pmem
+ * The first arch_pmem_flush_barrier() is needed to 'sfence' all
+ * previous writes such that they are architecturally visible for
+ * the platform buffer flush. Note that we've already arranged for pmem
* writes to avoid the cache via memcpy_flushcache(). The final
* wmb() ensures ordering for the NVDIMM flush write.
*/
- wmb();
+ arch_pmem_flush_barrier();
for (i = 0; i < nd_region->ndr_mappings; i++)
if (ndrd_get_flush_wpq(ndrd, i, 0))
writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
}
#endif
+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif
+
#endif /* __LIBNVDIMM_H__ */
--
2.26.2
^ permalink raw reply related
* [PATCH v4 5/8] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
of_pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/cacheflush.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index e92191b390f3..bc3ea009cf14 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -119,6 +119,13 @@ static inline void invalidate_dcache_range(unsigned long start,
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
+
+#define arch_pmem_flush_barrier arch_pmem_flush_barrier
+static inline void arch_pmem_flush_barrier(void)
+{
+ if (cpu_has_feature(CPU_FTR_ARCH_207S))
+ asm volatile(PPC_PHWSYNC ::: "memory");
+}
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_CACHEFLUSH_H */
--
2.26.2
^ permalink raw reply related
* [PATCH v4 3/8] powerpc/pmem: Add flush routines using new pmem store and sync instruction
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
Start using dcbstps; phwsync; sequence for flushing persistent memory range.
The new instructions are implemented as a variant of dcbf and hwsync and on
P8 and P9 they will be executed as those instructions. We avoid using them on
older hardware. This helps to avoid difficult to debug bugs.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/lib/pmem.c | 50 +++++++++++++++++++++++++++++++++++++----
1 file changed, 46 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..5a61aaeb6930 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,20 +9,62 @@
#include <asm/cacheflush.h>
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
+{
+ unsigned long shift = l1_dcache_shift();
+ unsigned long bytes = l1_dcache_bytes();
+ void *addr = (void *)(start & ~(bytes - 1));
+ unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+ unsigned long i;
+
+ for (i = 0; i < size >> shift; i++, addr += bytes)
+ asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+ asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+ unsigned long shift = l1_dcache_shift();
+ unsigned long bytes = l1_dcache_bytes();
+ void *addr = (void *)(start & ~(bytes - 1));
+ unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+ unsigned long i;
+
+ for (i = 0; i < size >> shift; i++, addr += bytes)
+ asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+ asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void clean_pmem_range(unsigned long start, unsigned long stop)
+{
+ if (cpu_has_feature(CPU_FTR_ARCH_207S))
+ return __clean_pmem_range(start, stop);
+}
+
+static inline void flush_pmem_range(unsigned long start, unsigned long stop)
+{
+ if (cpu_has_feature(CPU_FTR_ARCH_207S))
+ return __flush_pmem_range(start, stop);
+}
+
/*
* CONFIG_ARCH_HAS_PMEM_API symbols
*/
void arch_wb_cache_pmem(void *addr, size_t size)
{
unsigned long start = (unsigned long) addr;
- flush_dcache_range(start, start + size);
+ clean_pmem_range(start, start + size);
}
EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
void arch_invalidate_pmem(void *addr, size_t size)
{
unsigned long start = (unsigned long) addr;
- flush_dcache_range(start, start + size);
+ flush_pmem_range(start, start + size);
}
EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
@@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
unsigned long copied, start = (unsigned long) dest;
copied = __copy_from_user(dest, src, size);
- flush_dcache_range(start, start + size);
+ clean_pmem_range(start, start + size);
return copied;
}
@@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
unsigned long start = (unsigned long) dest;
memcpy(dest, src, size);
- flush_dcache_range(start, start + size);
+ clean_pmem_range(start, start + size);
return dest;
}
--
2.26.2
^ permalink raw reply related
* [PATCH v4 2/8] powerpc/pmem: Add new instructions for persistent storage and sync
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.
Additionally, POWER10 also introduce phwsync and plwsync which can be used
to establish order of these writes to persistent storage.
This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/ppc-opcode.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..1ad014e4633e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -219,6 +219,8 @@
#define PPC_INST_STWCX 0x7c00012d
#define PPC_INST_LWSYNC 0x7c2004ac
#define PPC_INST_SYNC 0x7c0004ac
+#define PPC_INST_PHWSYNC 0x7c8004ac
+#define PPC_INST_PLWSYNC 0x7ca004ac
#define PPC_INST_SYNC_MASK 0xfc0007fe
#define PPC_INST_ISYNC 0x4c00012c
#define PPC_INST_LXVD2X 0x7c000698
@@ -284,6 +286,8 @@
#define PPC_INST_TABORT 0x7c00071d
#define PPC_INST_TSR 0x7c0005dd
+#define PPC_INST_DCBF 0x7c0000ac
+
#define PPC_INST_NAP 0x4c000364
#define PPC_INST_SLEEP 0x4c0003a4
#define PPC_INST_WINKLE 0x4c0003e4
@@ -532,6 +536,14 @@
#define STBCIX(s,a,b) stringify_in_c(.long PPC_INST_STBCIX | \
__PPC_RS(s) | __PPC_RA(a) | __PPC_RB(b))
+#define PPC_DCBFPS(a, b) stringify_in_c(.long PPC_INST_DCBF | \
+ ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
+#define PPC_DCBSTPS(a, b) stringify_in_c(.long PPC_INST_DCBF | \
+ ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
+
+#define PPC_PHWSYNC stringify_in_c(.long PPC_INST_PHWSYNC)
+#define PPC_PLWSYNC stringify_in_c(.long PPC_INST_PLWSYNC)
+
/*
* Define what the VSX XX1 form instructions will look like, then add
* the 128 bit load store instructions based on that.
--
2.26.2
^ permalink raw reply related
* [PATCH v4 0/8] Support new pmem flush and sync instructions for POWER
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
This patch series enables the usage os new pmem flush and sync instructions on POWER
architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage. Additionally,
POWER10 also introduce phwsync and plwsync which can be used to establish order of these
writes to persistent storage.
This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
are added as a variant of the old ones that old hardware won't differentiate.
On POWER10, pmem devices will be represented by a different device tree compat
strings. This ensures that older kernels won't initialize pmem devices on POWER10.
Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.
Aneesh Kumar K.V (8):
powerpc/pmem: Restrict papr_scm to P8 and above.
powerpc/pmem: Add new instructions for persistent storage and sync
powerpc/pmem: Add flush routines using new pmem store and sync
instruction
libnvdimm/nvdimm/flush: Allow architecture to override the flush
barrier
powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
instruction.
powerpc/pmem: Avoid the barrier in flush routines
powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
flush functions.
powerpc/pmem: Initialize pmem device on newer hardware
arch/powerpc/include/asm/cacheflush.h | 9 +++++
arch/powerpc/include/asm/ppc-opcode.h | 12 ++++++
arch/powerpc/lib/pmem.c | 46 +++++++++++++++++++++--
arch/powerpc/platforms/pseries/papr_scm.c | 14 +++++++
arch/powerpc/platforms/pseries/pmem.c | 6 +++
drivers/md/dm-writecache.c | 2 +-
drivers/nvdimm/of_pmem.c | 1 +
drivers/nvdimm/region_devs.c | 8 ++--
include/linux/libnvdimm.h | 4 ++
9 files changed, 93 insertions(+), 9 deletions(-)
--
2.26.2
^ permalink raw reply
* [PATCH v4 1/8] powerpc/pmem: Restrict papr_scm to P8 and above.
From: Aneesh Kumar K.V @ 2020-05-29 5:28 UTC (permalink / raw)
To: linuxppc-dev, mpe, linux-nvdimm; +Cc: Aneesh Kumar K.V, dan.j.williams, oohall
In-Reply-To: <20200529052820.151651-1-aneesh.kumar@linux.ibm.com>
The PAPR based virtualized persistent memory devices are only supported on
POWER9 and above. In the followup patch, the kernel will switch the persistent
memory cache flush functions to use a new `dcbf` variant instruction. The new
instructions even though added in ISA 3.1 works even on P8 and P9 because these
are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and
P9 behaves as such.
Considering these devices are only supported on P8 and above, update the driver
to prevent a P7-compat guest from using persistent memory devices.
We don't update of_pmem driver with the same condition, because, on bare-metal,
the firmware enables pmem support only on P9 and above. There the kernel depends
on OPAL firmware to restrict exposing persistent memory related device tree
entries on older hardware. of_pmem.ko is written without any arch dependency and
we don't want to add ppc64 specific cpu feature check in of_pmem driver.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/platforms/pseries/pmem.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
index f860a897a9e0..2347e1038f58 100644
--- a/arch/powerpc/platforms/pseries/pmem.c
+++ b/arch/powerpc/platforms/pseries/pmem.c
@@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
static int pseries_pmem_init(void)
{
+ /*
+ * Only supported on POWER8 and above.
+ */
+ if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+ return 0;
+
pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory");
if (!pmem_node)
return 0;
--
2.26.2
^ permalink raw reply related
* Re: [PATCH] powerpc/xive: Enforce load-after-store ordering when StoreEOI is active
From: Michael Ellerman @ 2020-05-29 4:30 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Alistair Popple, linuxppc-dev, Greg Kurz, Paul Mackerras,
Cédric Le Goater
In-Reply-To: <20200220081506.31209-1-clg@kaod.org>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 2040 bytes --]
On Thu, 2020-02-20 at 08:15:06 UTC, =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= wrote:
> When an interrupt has been handled, the OS notifies the interrupt
> controller with a EOI sequence. On a POWER9 system using the XIVE
> interrupt controller, this can be done with a load or a store
> operation on the ESB interrupt management page of the interrupt. The
> StoreEOI operation has less latency and improves interrupt handling
> performance but it was deactivated during the POWER9 DD2.0 timeframe
> because of ordering issues. We use the LoadEOI today but we plan to
> reactivate StoreEOI in future architectures.
>
> There is usually no need to enforce ordering between ESB load and
> store operations as they should lead to the same result. E.g. a store
> trigger and a load EOI can be executed in any order. Assuming the
> interrupt state is PQ=10, a store trigger followed by a load EOI will
> return a Q bit. In the reverse order, it will create a new interrupt
> trigger from HW. In both cases, the handler processing interrupts is
> notified.
>
> In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
> disable temporarily the interrupt source (mask/unmask). When the
> source is reenabled, the OS can detect if interrupts were received
> while the source was disabled and reinject them. This process needs
> special care when StoreEOI is activated. The ESB load and store
> operations should be correctly ordered because a XIVE_ESB_STORE_EOI
> operation could leave the source enabled if it has not completed
> before the loads.
>
> For those cases, we enforce Load-after-Store ordering with a special
> load operation offset. To avoid performance impact, this ordering is
> only enforced when really needed, that is when interrupt sources are
> temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
> be needed for other loads.
>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
Applied to powerpc topic/ppc-kvm, thanks.
https://git.kernel.org/powerpc/c/b1f9be9392f090f08e4ad9e2c68963aeff03bd67
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/uaccess: Don't use "m<>" constraint
From: Michael Ellerman @ 2020-05-29 4:24 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20200507123324.2250024-1-mpe@ellerman.id.au>
On Thu, 2020-05-07 at 12:33:24 UTC, Michael Ellerman wrote:
> The "m<>" constraint breaks compilation with GCC 4.6.x era compilers.
>
> The use of the constraint allows the compiler to use update-form
> instructions, however in practice current compilers never generate
> those forms for any of the current uses of __put_user_asm_goto().
>
> We anticipate that GCC 4.6 will be declared unsupported for building
> the kernel in the not too distant future. So for now just switch to
> the "m" constraint.
>
> Fixes: 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'")
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Applied to powerpc topic/uaccess-ppc.
https://git.kernel.org/powerpc/c/e2a8b49e79553bd8ec48f73cead84e6146c09408
cheers
^ permalink raw reply
* Re: [PATCH v4 2/2] powerpc/uaccess: Implement unsafe_copy_to_user() as a simple loop
From: Michael Ellerman @ 2020-05-29 4:24 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, npiggin,
segher
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <fe952112c29bf6a0a2778c9e6bbb4f4afd2c4258.1587143308.git.christophe.leroy@c-s.fr>
On Fri, 2020-04-17 at 17:08:52 UTC, Christophe Leroy wrote:
> At the time being, unsafe_copy_to_user() is based on
> raw_copy_to_user() which calls __copy_tofrom_user().
>
> __copy_tofrom_user() is a big optimised function to copy big amount
> of data. It aligns destinations to cache line in order to use
> dcbz instruction.
>
> Today unsafe_copy_to_user() is called only from filldir().
> It is used to mainly copy small amount of data like filenames,
> so __copy_tofrom_user() is not fit.
>
> Also, unsafe_copy_to_user() is used within user_access_begin/end
> sections. In those section, it is preferable to not call functions.
>
> Rewrite unsafe_copy_to_user() as a macro that uses __put_user_goto().
> We first perform a loop of long, then we finish with necessary
> complements.
>
> unsafe_copy_to_user() might be used in the near future to copy
> fixed-size data, like pt_regs structs during signal processing.
> Having it as a macro allows GCC to optimise it for instead when
> it knows the size in advance, it can unloop loops, drop complements
> when the size is a multiple of longs, etc ...
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Applied to powerpc topic/uaccess-ppc, thanks.
https://git.kernel.org/powerpc/c/17bc43367fc2a720400d21c745db641c654c1e6b
cheers
^ permalink raw reply
* Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'
From: Michael Ellerman @ 2020-05-29 4:24 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, npiggin,
segher
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <23e680624680a9a5405f4b88740d2596d4b17c26.1587143308.git.christophe.leroy@c-s.fr>
On Fri, 2020-04-17 at 17:08:51 UTC, Christophe Leroy wrote:
> unsafe_put_user() is designed to take benefit of 'asm goto'.
>
> Instead of using the standard __put_user() approach and branch
> based on the returned error, use 'asm goto' and make the
> exception code branch directly to the error label. There is
> no code anymore in the fixup section.
>
> This change significantly simplifies functions using
> unsafe_put_user()
...
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Applied to powerpc topic/uaccess-ppc, thanks.
https://git.kernel.org/powerpc/c/334710b1496af8a0960e70121f850e209c20958f
cheers
^ permalink raw reply
* Re: [PATCH v3] powerpc/uaccess: evaluate macro arguments once, before user access is allowed
From: Michael Ellerman @ 2020-05-29 4:24 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200407041245.600651-1-npiggin@gmail.com>
On Tue, 2020-04-07 at 04:12:45 UTC, Nicholas Piggin wrote:
> get/put_user can be called with nontrivial arguments. fs/proc/page.c
> has a good example:
>
> if (put_user(stable_page_flags(ppage), out)) {
>
> stable_page_flags is quite a lot of code, including spin locks in the
> page allocator.
>
> Ensure these arguments are evaluated before user access is allowed.
> This improves security by reducing code with access to userspace, but
> it also fixes a PREEMPT bug with KUAP on powerpc/64s:
> stable_page_flags is currently called with AMR set to allow writes,
> it ends up calling spin_unlock(), which can call preempt_schedule. But
> the task switch code can not be called with AMR set (it relies on
> interrupts saving the register), so this blows up.
>
> It's fine if the code inside allow_user_access is preemptible, because
> a timer or IPI will save the AMR, but it's not okay to explicitly
> cause a reschedule.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Applied to powerpc topic/uaccess-ppc, thanks.
https://git.kernel.org/powerpc/c/d02f6b7dab8228487268298ea1f21081c0b4b3eb
cheers
^ permalink raw reply
* Re: [PATCH v2 4/5] powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin
From: Michael Ellerman @ 2020-05-29 4:24 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, airlied,
daniel, torvalds, viro, akpm, keescook, hpa
Cc: linux-arch, linux-mm, intel-gfx, linuxppc-dev, linux-kernel
In-Reply-To: <6c83af0f0809ef2a955c39ac622767f6cbede035.1585898438.git.christophe.leroy@c-s.fr>
On Fri, 2020-04-03 at 07:20:53 UTC, Christophe Leroy wrote:
> Add support for selective read or write user access with
> user_read_access_begin/end and user_write_access_begin/end.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Kees Cook <keescook@chromium.org>
Applied to powerpc topic/uaccess-ppc, thanks.
https://git.kernel.org/powerpc/c/4fe5cda9f89d0aea8e915b7c96ae34bda4e12e51
cheers
^ permalink raw reply
* Re: [PATCH v2 3/5] drm/i915/gem: Replace user_access_begin by user_write_access_begin
From: Michael Ellerman @ 2020-05-29 4:20 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, airlied,
daniel, torvalds, viro, akpm, keescook, hpa
Cc: linux-arch, linux-mm, intel-gfx, linuxppc-dev, linux-kernel
In-Reply-To: <ebf1250b6d4f351469fb339e5399d8b92aa8a1c1.1585898438.git.christophe.leroy@c-s.fr>
On Fri, 2020-04-03 at 07:20:52 UTC, Christophe Leroy wrote:
> When i915_gem_execbuffer2_ioctl() is using user_access_begin(),
> that's only to perform unsafe_put_user() so use
> user_write_access_begin() in order to only open write access.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Kees Cook <keescook@chromium.org>
Applied to powerpc topic/uaccess, thanks.
https://git.kernel.org/powerpc/c/b44f687386875b714dae2afa768e73401e45c21c
cheers
^ permalink raw reply
* Re: [PATCH v2 2/5] uaccess: Selectively open read or write user access
From: Michael Ellerman @ 2020-05-29 4:20 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, airlied,
daniel, torvalds, viro, akpm, keescook, hpa
Cc: linux-arch, linux-mm, intel-gfx, linuxppc-dev, linux-kernel
In-Reply-To: <2e73bc57125c2c6ab12a587586a4eed3a47105fc.1585898438.git.christophe.leroy@c-s.fr>
On Fri, 2020-04-03 at 07:20:51 UTC, Christophe Leroy wrote:
> When opening user access to only perform reads, only open read access.
> When opening user access to only perform writes, only open write
> access.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Kees Cook <keescook@chromium.org>
Applied to powerpc topic/uaccess, thanks.
https://git.kernel.org/powerpc/c/41cd780524674082b037e7c8461f90c5e42103f0
cheers
^ permalink raw reply
* Re: [PATCH v2 1/5] uaccess: Add user_read_access_begin/end and user_write_access_begin/end
From: Michael Ellerman @ 2020-05-29 4:20 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, airlied,
daniel, torvalds, viro, akpm, keescook, hpa
Cc: linux-arch, linux-mm, intel-gfx, linuxppc-dev, linux-kernel
In-Reply-To: <36e43241c7f043a24b5069e78c6a7edd11043be5.1585898438.git.christophe.leroy@c-s.fr>
On Fri, 2020-04-03 at 07:20:50 UTC, Christophe Leroy wrote:
> Some architectures like powerpc64 have the capability to separate
> read access and write access protection.
> For get_user() and copy_from_user(), powerpc64 only open read access.
> For put_user() and copy_to_user(), powerpc64 only open write access.
> But when using unsafe_get_user() or unsafe_put_user(),
> user_access_begin open both read and write.
>
> Other architectures like powerpc book3s 32 bits only allow write
> access protection. And on this architecture protection is an heavy
> operation as it requires locking/unlocking per segment of 256Mbytes.
> On those architecture it is therefore desirable to do the unlocking
> only for write access. (Note that book3s/32 ranges from very old
> powermac from the 90's with powerpc 601 processor, till modern
> ADSL boxes with PowerQuicc II processors for instance so it
> is still worth considering.)
>
> In order to avoid any risk based of hacking some variable parameters
> passed to user_access_begin/end that would allow hacking and
> leaving user access open or opening too much, it is preferable to
> use dedicated static functions that can't be overridden.
>
> Add a user_read_access_begin and user_read_access_end to only open
> read access.
>
> Add a user_write_access_begin and user_write_access_end to only open
> write access.
>
> By default, when undefined, those new access helpers default on the
> existing user_access_begin and user_access_end.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Kees Cook <keescook@chromium.org>
Applied to powerpc topic/uaccess, thanks.
https://git.kernel.org/powerpc/c/999a22890cb183b918e4372395d24426a755cef2
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message
From: Michael Ellerman @ 2020-05-29 4:04 UTC (permalink / raw)
To: Yi Wang
Cc: wang.yi59, tony.luck, keescook, wang.liang82, gregkh, anton,
linux-kernel, paulus, Liao Pingfang, xue.zhihong, ccross, tglx,
linuxppc-dev, allison
In-Reply-To: <1590714135-15818-1-git-send-email-wang.yi59@zte.com.cn>
Yi Wang <wang.yi59@zte.com.cn> writes:
> From: Liao Pingfang <liao.pingfang@zte.com.cn>
>
> Use kzalloc instead of kmalloc in the error message according to
> the previous kzalloc() call.
Please just remove the message instead, it's a tiny allocation that's
unlikely to ever fail, and the caller will print an error anyway.
cheers
> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
> index fb4f610..c3a0c8d 100644
> --- a/arch/powerpc/kernel/nvram_64.c
> +++ b/arch/powerpc/kernel/nvram_64.c
> @@ -892,7 +892,7 @@ loff_t __init nvram_create_partition(const char *name, int sig,
> /* Create our OS partition */
> new_part = kzalloc(sizeof(*new_part), GFP_KERNEL);
> if (!new_part) {
> - pr_err("%s: kmalloc failed\n", __func__);
> + pr_err("%s: kzalloc failed\n", __func__);
> return -ENOMEM;
> }
>
> --
> 2.9.5
^ permalink raw reply
* Re: powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset
From: Sam Bobroff @ 2020-05-29 3:17 UTC (permalink / raw)
To: wenxiong; +Cc: brking, oohall, linuxppc-dev, wenxiong
In-Reply-To: <1590499319-6472-1-git-send-email-wenxiong@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 6543 bytes --]
On Tue, May 26, 2020 at 08:21:59AM -0500, wenxiong@linux.vnet.ibm.com wrote:
> From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
>
> Several device drivers hit EEH(Extended Error handling) when triggering
> kdump on Pseries PowerVM. This patch implemented a reset of the PHBs
> in pci general code when triggering kdump. PHB reset stop all PCI
> transactions from normal kernel. We have tested the patch in several
> enviroments:
> - direct slot adapters
> - adapters under the switch
> - a VF adapter in PowerVM
> - a VF adapter/adapter in KVM guest.
>
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Looks good to me.
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
(It would be easier to review if you included a patchset change log and
CC'd people who reviewed earlier versions.)
Cheers,
Sam.
>
> ---
> arch/powerpc/platforms/pseries/pci.c | 152 +++++++++++++++++++++++++++
> 1 file changed, 152 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
> index 911534b89c85..cb7e4276cf04 100644
> --- a/arch/powerpc/platforms/pseries/pci.c
> +++ b/arch/powerpc/platforms/pseries/pci.c
> @@ -11,6 +11,8 @@
> #include <linux/kernel.h>
> #include <linux/pci.h>
> #include <linux/string.h>
> +#include <linux/crash_dump.h>
> +#include <linux/delay.h>
>
> #include <asm/eeh.h>
> #include <asm/pci-bridge.h>
> @@ -354,3 +356,153 @@ int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
>
> return 0;
> }
> +
> +/**
> + * pseries_get_pdn_addr - Retrieve PHB address
> + * @pe: EEH PE
> + *
> + * Retrieve the assocated PHB address. Actually, there're 2 RTAS
> + * function calls dedicated for the purpose. We need implement
> + * it through the new function and then the old one. Besides,
> + * you should make sure the config address is figured out from
> + * FDT node before calling the function.
> + *
> + */
> +static int pseries_get_pdn_addr(struct pci_controller *phb)
> +{
> + int ret = -1;
> + int rets[3];
> + int ibm_get_config_addr_info;
> + int ibm_get_config_addr_info2;
> + int config_addr = 0;
> + struct pci_dn *root_pdn, *pdn;
> +
> + ibm_get_config_addr_info2 = rtas_token("ibm,get-config-addr-info2");
> + ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info");
> +
> + root_pdn = PCI_DN(phb->dn);
> + pdn = list_first_entry(&root_pdn->child_list, struct pci_dn, list);
> + config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
> +
> + if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
> + /*
> + * First of all, we need to make sure there has one PE
> + * associated with the device. If option is 1, it
> + * queries if config address is supported in a PE or not.
> + * If option is 0, it returns PE config address or config
> + * address for the PE primary bus.
> + */
> + ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
> + config_addr, BUID_HI(pdn->phb->buid),
> + BUID_LO(pdn->phb->buid), 1);
> + if (ret || (rets[0] == 0)) {
> + pr_warn("%s: Failed to get address for PHB#%x-PE# option=%d config_addr=%x\n",
> + __func__, pdn->phb->global_number, 1, rets[0]);
> + return -1;
> + }
> +
> + /* Retrieve the associated PE config address */
> + ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
> + config_addr, BUID_HI(pdn->phb->buid),
> + BUID_LO(pdn->phb->buid), 0);
> + if (ret) {
> + pr_warn("%s: Failed to get address for PHB#%x-PE# option=%d config_addr=%x\n",
> + __func__, pdn->phb->global_number, 0, rets[0]);
> + return -1;
> + }
> + return rets[0];
> + }
> +
> + if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
> + ret = rtas_call(ibm_get_config_addr_info, 4, 2, rets,
> + config_addr, BUID_HI(pdn->phb->buid),
> + BUID_LO(pdn->phb->buid), 0);
> + if (ret || rets[0]) {
> + pr_warn("%s: Failed to get address for PHB#%x-PE# config_addr=%x\n",
> + __func__, pdn->phb->global_number, rets[0]);
> + return -1;
> + }
> + return rets[0];
> + }
> +
> + return ret;
> +}
> +
> +static int __init pseries_phb_reset(void)
> +{
> + struct pci_controller *phb;
> + int config_addr;
> + int ibm_set_slot_reset;
> + int ibm_configure_pe;
> + int ret;
> +
> + if (is_kdump_kernel() || reset_devices) {
> + pr_info("Issue PHB reset ...\n");
> + ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
> + ibm_configure_pe = rtas_token("ibm,configure-pe");
> +
> + if (ibm_set_slot_reset == RTAS_UNKNOWN_SERVICE ||
> + ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
> + pr_info("%s: EEH functionality not supported\n",
> + __func__);
> + }
> +
> + list_for_each_entry(phb, &hose_list, list_node) {
> + config_addr = pseries_get_pdn_addr(phb);
> + if (config_addr == -1)
> + continue;
> +
> + ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> + config_addr, BUID_HI(phb->buid),
> + BUID_LO(phb->buid), EEH_RESET_FUNDAMENTAL);
> +
> + /* If fundamental-reset not supported, try hot-reset */
> + if (ret == -8)
> + ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> + config_addr, BUID_HI(phb->buid),
> + BUID_LO(phb->buid), EEH_RESET_HOT);
> +
> + if (ret) {
> + pr_err("%s: PHB#%x-PE# failed with rtas_call activate reset=%d\n",
> + __func__, phb->global_number, ret);
> + continue;
> + }
> + }
> + msleep(EEH_PE_RST_SETTLE_TIME);
> +
> + list_for_each_entry(phb, &hose_list, list_node) {
> + config_addr = pseries_get_pdn_addr(phb);
> + if (config_addr == -1)
> + continue;
> +
> + ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> + config_addr, BUID_HI(phb->buid),
> + BUID_LO(phb->buid), EEH_RESET_DEACTIVATE);
> + if (ret) {
> + pr_err("%s: PHB#%x-PE# failed with rtas_call deactive reset=%d\n",
> + __func__, phb->global_number, ret);
> + continue;
> + }
> + }
> + msleep(EEH_PE_RST_SETTLE_TIME);
> +
> + list_for_each_entry(phb, &hose_list, list_node) {
> + config_addr = pseries_get_pdn_addr(phb);
> + if (config_addr == -1)
> + continue;
> +
> + ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
> + config_addr, BUID_HI(phb->buid),
> + BUID_LO(phb->buid));
> + if (ret) {
> + pr_err("%s: PHB#%x-PE# failed with rtas_call configure_pe =%d\n",
> + __func__, phb->global_number, ret);
> + continue;
> + }
> + }
> + }
> +
> + return 0;
> +}
> +machine_postcore_initcall(pseries, pseries_phb_reset);
> +
> --
> 2.18.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox