* [PATCH seccomp 5/8] s390: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:43 UTC (permalink / raw)
To: containers
Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>
From: YiFei Zhu <yifeifz2@illinois.edu>
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for s390.
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
arch/s390/include/asm/seccomp.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/s390/include/asm/seccomp.h b/arch/s390/include/asm/seccomp.h
index 795bbe0d7ca6..71d46f0ba97b 100644
--- a/arch/s390/include/asm/seccomp.h
+++ b/arch/s390/include/asm/seccomp.h
@@ -16,4 +16,13 @@
#include <asm-generic/seccomp.h>
+#define SECCOMP_ARCH_NATIVE AUDIT_ARCH_S390X
+#define SECCOMP_ARCH_NATIVE_NR NR_syscalls
+#define SECCOMP_ARCH_NATIVE_NAME "s390x"
+#ifdef CONFIG_COMPAT
+# define SECCOMP_ARCH_COMPAT AUDIT_ARCH_S390
+# define SECCOMP_ARCH_COMPAT_NR NR_syscalls
+# define SECCOMP_ARCH_COMPAT_NAME "s390"
+#endif
+
#endif /* _ASM_S390_SECCOMP_H */
--
2.29.2
^ permalink raw reply related
* [PATCH seccomp 2/8] parisc: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:42 UTC (permalink / raw)
To: containers
Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>
From: YiFei Zhu <yifeifz2@illinois.edu>
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for parisc.
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
arch/parisc/include/asm/Kbuild | 1 -
arch/parisc/include/asm/seccomp.h | 22 ++++++++++++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
create mode 100644 arch/parisc/include/asm/seccomp.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index e3ee5c0bfe80..f16c4db80116 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -5,5 +5,4 @@ generated-y += syscall_table_c32.h
generic-y += kvm_para.h
generic-y += local64.h
generic-y += mcs_spinlock.h
-generic-y += seccomp.h
generic-y += user.h
diff --git a/arch/parisc/include/asm/seccomp.h b/arch/parisc/include/asm/seccomp.h
new file mode 100644
index 000000000000..b058b2220322
--- /dev/null
+++ b/arch/parisc/include/asm/seccomp.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include <asm-generic/seccomp.h>
+
+#ifdef CONFIG_64BIT
+# define SECCOMP_ARCH_NATIVE AUDIT_ARCH_PARISC64
+# define SECCOMP_ARCH_NATIVE_NR NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME "parisc64"
+# ifdef CONFIG_COMPAT
+# define SECCOMP_ARCH_COMPAT AUDIT_ARCH_PARISC
+# define SECCOMP_ARCH_COMPAT_NR NR_syscalls
+# define SECCOMP_ARCH_COMPAT_NAME "parisc"
+# endif
+#else /* !CONFIG_64BIT */
+# define SECCOMP_ARCH_NATIVE AUDIT_ARCH_PARISC
+# define SECCOMP_ARCH_NATIVE_NR NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME "parisc"
+#endif
+
+#endif /* _ASM_SECCOMP_H */
--
2.29.2
^ permalink raw reply related
* [PATCH seccomp 3/8] powerpc: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:42 UTC (permalink / raw)
To: containers
Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>
From: YiFei Zhu <yifeifz2@illinois.edu>
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for powerpc.
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
arch/powerpc/include/asm/seccomp.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/powerpc/include/asm/seccomp.h b/arch/powerpc/include/asm/seccomp.h
index 51209f6071c5..3efcc83e9cc6 100644
--- a/arch/powerpc/include/asm/seccomp.h
+++ b/arch/powerpc/include/asm/seccomp.h
@@ -8,4 +8,25 @@
#include <asm-generic/seccomp.h>
+#ifdef __LITTLE_ENDIAN__
+#define __SECCOMP_ARCH_LE_BIT __AUDIT_ARCH_LE
+#else
+#define __SECCOMP_ARCH_LE_BIT 0
+#endif
+
+#ifdef CONFIG_PPC64
+# define SECCOMP_ARCH_NATIVE (AUDIT_ARCH_PPC64 | __SECCOMP_ARCH_LE)
+# define SECCOMP_ARCH_NATIVE_NR NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME "ppc64"
+# ifdef CONFIG_COMPAT
+# define SECCOMP_ARCH_COMPAT (AUDIT_ARCH_PPC | __SECCOMP_ARCH_LE)
+# define SECCOMP_ARCH_COMPAT_NR NR_syscalls
+# define SECCOMP_ARCH_COMPAT_NAME "powerpc"
+# endif
+#else /* !CONFIG_PPC64 */
+# define SECCOMP_ARCH_NATIVE (AUDIT_ARCH_PPC | __SECCOMP_ARCH_LE)
+# define SECCOMP_ARCH_NATIVE_NR NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME "powerpc"
+#endif
+
#endif /* _ASM_POWERPC_SECCOMP_H */
--
2.29.2
^ permalink raw reply related
* [PATCH seccomp 0/8] seccomp: add bitmap cache support on remaining arches and report cache in procfs
From: YiFei Zhu @ 2020-11-03 13:42 UTC (permalink / raw)
To: containers
Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
From: YiFei Zhu <yifeifz2@illinois.edu>
This patch series enables bitmap cache for the remaining arches with
SECCOMP_FILTER, other than MIPS.
I was unable to find any of the arches having subarch-specific NR_syscalls
macros, so generic NR_syscalls is used. SH's syscall_get_arch seems to
only have the 32-bit subarch implementation. I'm not sure if this is
expected.
This series has not been tested; I have not built all the cross compilers
necessary to build test, let alone run the kernel or benchmark the
performance, so help on making sure the bitmap cache works as expected
would be appreciated. The series applies on top of Kees's for-next/seccomp
branch.
YiFei Zhu (8):
csky: Enable seccomp architecture tracking
parisc: Enable seccomp architecture tracking
powerpc: Enable seccomp architecture tracking
riscv: Enable seccomp architecture tracking
s390: Enable seccomp architecture tracking
sh: Enable seccomp architecture tracking
xtensa: Enable seccomp architecture tracking
seccomp/cache: Report cache data through /proc/pid/seccomp_cache
arch/Kconfig | 15 ++++++++
arch/csky/include/asm/Kbuild | 1 -
arch/csky/include/asm/seccomp.h | 11 ++++++
arch/parisc/include/asm/Kbuild | 1 -
arch/parisc/include/asm/seccomp.h | 22 +++++++++++
arch/powerpc/include/asm/seccomp.h | 21 +++++++++++
arch/riscv/include/asm/seccomp.h | 10 +++++
arch/s390/include/asm/seccomp.h | 9 +++++
arch/sh/include/asm/seccomp.h | 10 +++++
arch/xtensa/include/asm/Kbuild | 1 -
arch/xtensa/include/asm/seccomp.h | 11 ++++++
fs/proc/base.c | 6 +++
include/linux/seccomp.h | 7 ++++
kernel/seccomp.c | 59 ++++++++++++++++++++++++++++++
14 files changed, 181 insertions(+), 3 deletions(-)
create mode 100644 arch/csky/include/asm/seccomp.h
create mode 100644 arch/parisc/include/asm/seccomp.h
create mode 100644 arch/xtensa/include/asm/seccomp.h
base-commit: 38c37e8fd3d2590c4234d8cfbc22158362f0eb04
--
2.29.2
^ permalink raw reply
* [PATCH seccomp 1/8] csky: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:42 UTC (permalink / raw)
To: containers
Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>
From: YiFei Zhu <yifeifz2@illinois.edu>
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for csky.
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
arch/csky/include/asm/Kbuild | 1 -
arch/csky/include/asm/seccomp.h | 11 +++++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
create mode 100644 arch/csky/include/asm/seccomp.h
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 64876e59e2ef..93372255984d 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -4,6 +4,5 @@ generic-y += gpio.h
generic-y += kvm_para.h
generic-y += local64.h
generic-y += qrwlock.h
-generic-y += seccomp.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/csky/include/asm/seccomp.h b/arch/csky/include/asm/seccomp.h
new file mode 100644
index 000000000000..d33e758126fb
--- /dev/null
+++ b/arch/csky/include/asm/seccomp.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include <asm-generic/seccomp.h>
+
+#define SECCOMP_ARCH_NATIVE AUDIT_ARCH_CSKY
+#define SECCOMP_ARCH_NATIVE_NR NR_syscalls
+#define SECCOMP_ARCH_NATIVE_NAME "csky"
+
+#endif /* _ASM_SECCOMP_H */
--
2.29.2
^ permalink raw reply related
* Re: [patch V3 05/37] asm-generic: Provide kmap_size.h
From: Arnd Bergmann @ 2020-11-03 12:25 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Juri Lelli, linux-aio, Peter Zijlstra, Sebastian Andrzej Siewior,
Joonas Lahtinen, dri-devel, open list:BROADCOM NVRAM DRIVER,
Ben Segall, Chris Mason, Huang Rui, Paul Mackerras, Gerd Hoffmann,
Daniel Bristot de Oliveira, sparclinux, Rodrigo Vivi,
Vincent Chen, Christoph Hellwig, Vincent Guittot, Paul McKenney,
Max Filippov, the arch/x86 maintainers, Russell King, linux-csky,
Ingo Molnar, David Airlie, VMware Graphics, Mel Gorman,
ML nouveau, Dave Airlie, open list:SYNOPSYS ARC ARCHITECTURE,
Ben Skeggs, linux-xtensa, Arnd Bergmann, Intel Graphics,
Roland Scheidegger, Josef Bacik, Steven Rostedt, Linus Torvalds,
Alexander Viro, spice-devel, David Sterba, virtualization,
Dietmar Eggemann, Linux ARM, Jani Nikula, Chris Zankel,
Michal Simek, Thomas Bogendoerfer, Nick Hu, Linux-MM,
Vineet Gupta, LKML, Christian Koenig, Benjamin LaHaise,
Daniel Vetter, Linux FS-devel Mailing List, Andrew Morton,
linuxppc-dev, David S. Miller, linux-btrfs, Greentime Hu
In-Reply-To: <20201103095857.078043987@linutronix.de>
On Tue, Nov 3, 2020 at 10:27 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> kmap_types.h is a misnomer because the old atomic MAP based array does not
> exist anymore and the whole indirection of architectures including
> kmap_types.h is inconinstent and does not allow to provide guard page
> debugging for this misfeature.
>
> Add a common header file which defines the mapping stack size for all
> architectures. Will be used when converting architectures over to a
> generic kmap_local/atomic implementation.
>
> The array size is chosen with the following constraints in mind:
>
> - The deepest nest level in one context is 3 according to code
> inspection.
>
> - The worst case nesting for the upcoming reemptible version would be:
>
> 2 maps in task context and a fault inside
> 2 maps in the fault handler
> 3 maps in softirq
> 2 maps in interrupt
>
> So a total of 16 is sufficient and probably overestimated.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
^ permalink raw reply
* Re: [patch V3 03/37] fs: Remove asm/kmap_types.h includes
From: David Sterba @ 2020-11-03 11:12 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Juri Lelli, linux-aio, Peter Zijlstra, Sebastian Andrzej Siewior,
Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, linux-mm,
Huang Rui, Paul Mackerras, Gerd Hoffmann,
Daniel Bristot de Oliveira, sparclinux, Rodrigo Vivi,
Vincent Chen, Christoph Hellwig, Vincent Guittot, Paul McKenney,
Max Filippov, x86, Russell King, linux-csky, Ingo Molnar,
David Airlie, VMware Graphics, Mel Gorman, nouveau, Dave Airlie,
linux-snps-arc, Ben Skeggs, linux-xtensa, Arnd Bergmann,
intel-gfx, Roland Scheidegger, Josef Bacik, Steven Rostedt,
Linus Torvalds, Alexander Viro, spice-devel, David Sterba,
virtualization, Dietmar Eggemann, linux-arm-kernel, Jani Nikula,
Chris Zankel, Michal Simek, Thomas Bogendoerfer, Nick Hu,
Chris Mason, Vineet Gupta, LKML, Christian Koenig,
Benjamin LaHaise, Daniel Vetter, linux-fsdevel, Andrew Morton,
linuxppc-dev, David S. Miller, linux-btrfs, Greentime Hu
In-Reply-To: <20201103095856.870272797@linutronix.de>
On Tue, Nov 03, 2020 at 10:27:15AM +0100, Thomas Gleixner wrote:
> Historical leftovers from the time where kmap() had fixed slots.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Benjamin LaHaise <bcrl@kvack.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-aio@kvack.org
> Cc: Chris Mason <clm@fb.com>
> Cc: Josef Bacik <josef@toxicpanda.com>
> Cc: David Sterba <dsterba@suse.com>
Acked-by: David Sterba <dsterba@suse.com>
For the btrfs bits
> fs/btrfs/ctree.h | 1 -
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -17,7 +17,6 @@
> #include <linux/wait.h>
> #include <linux/slab.h>
> #include <trace/events/btrfs.h>
> -#include <asm/kmap_types.h>
> #include <asm/unaligned.h>
> #include <linux/pagemap.h>
> #include <linux/btrfs.h>
^ permalink raw reply
* [PATCH v13 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v12: Added missing prototype
---
arch/powerpc/include/asm/vdso/gettimeofday.h | 2 ++
arch/powerpc/kernel/vdso32/gettimeofday.S | 9 +++++++++
arch/powerpc/kernel/vdso32/vdso32.lds.S | 1 +
arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++++++
4 files changed, 18 insertions(+)
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
index cdbd297e61b8..f590b88bef1b 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -187,6 +187,8 @@ int __c_kernel_clock_getres(clockid_t clock_id, struct __kernel_timespec *res,
#else
int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
const struct vdso_data *vd);
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+ const struct vdso_data *vd);
int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
const struct vdso_data *vd);
#endif
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index fd7b01c51281..a6e29f880e0e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cvdso_call __c_kernel_clock_gettime
V_FUNCTION_END(__kernel_clock_gettime)
+/*
+ * Exact prototype of clock_gettime64()
+ *
+ * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_clock_gettime64)
+ cvdso_call __c_kernel_clock_gettime64
+V_FUNCTION_END(__kernel_clock_gettime64)
/*
* Exact prototype of clock_getres()
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 51e9b3f3f88a..27a2d03c72d5 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -147,6 +147,7 @@ VERSION
__kernel_get_syscall_map;
__kernel_gettimeofday;
__kernel_clock_gettime;
+ __kernel_clock_gettime64;
__kernel_clock_getres;
__kernel_time;
__kernel_get_tbfreq;
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c b/arch/powerpc/kernel/vdso32/vgettimeofday.c
index 0d4bc217529e..65fb03fb1731 100644
--- a/arch/powerpc/kernel/vdso32/vgettimeofday.c
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -10,6 +10,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
return __cvdso_clock_gettime32_data(vd, clock, ts);
}
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+ const struct vdso_data *vd)
+{
+ return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
const struct vdso_data *vd)
{
--
2.25.0
^ permalink raw reply related
* [PATCH v13 7/8] powerpc/vdso: Switch VDSO to generic C implementation.
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
For VDSO32 on PPC64, we create a fake 32 bits config, on the same
principle as MIPS architecture, in order to get the correct parts of
the different asm header files.
With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.
On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday: vdso: 828 nsec/call
clock-getres-realtime-coarse: vdso: 391 nsec/call
clock-gettime-realtime-coarse: vdso: 614 nsec/call
clock-getres-realtime: vdso: 460 nsec/call
clock-gettime-realtime: vdso: 876 nsec/call
clock-getres-monotonic-coarse: vdso: 399 nsec/call
clock-gettime-monotonic-coarse: vdso: 691 nsec/call
clock-getres-monotonic: vdso: 460 nsec/call
clock-gettime-monotonic: vdso: 1026 nsec/call
On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday: vdso: 955 nsec/call
clock-getres-realtime-coarse: vdso: 545 nsec/call
clock-gettime-realtime-coarse: vdso: 592 nsec/call
clock-getres-realtime: vdso: 545 nsec/call
clock-gettime-realtime: vdso: 941 nsec/call
clock-getres-monotonic-coarse: vdso: 545 nsec/call
clock-gettime-monotonic-coarse: vdso: 591 nsec/call
clock-getres-monotonic: vdso: 545 nsec/call
clock-gettime-monotonic: vdso: 940 nsec/call
It is even better for gettime with monotonic clocks.
Unsupported clocks with ASM VDSO:
clock-gettime-boottime: vdso: 3851 nsec/call
clock-gettime-tai: vdso: 3852 nsec/call
clock-gettime-monotonic-raw: vdso: 3396 nsec/call
Same clocks with C VDSO:
clock-gettime-tai: vdso: 941 nsec/call
clock-gettime-monotonic-raw: vdso: 1001 nsec/call
clock-gettime-monotonic-coarse: vdso: 591 nsec/call
On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday: vdso: 220 nsec/call
clock-getres-realtime-coarse: vdso: 102 nsec/call
clock-gettime-realtime-coarse: vdso: 178 nsec/call
clock-getres-realtime: vdso: 129 nsec/call
clock-gettime-realtime: vdso: 235 nsec/call
clock-getres-monotonic-coarse: vdso: 105 nsec/call
clock-gettime-monotonic-coarse: vdso: 208 nsec/call
clock-getres-monotonic: vdso: 129 nsec/call
clock-gettime-monotonic: vdso: 274 nsec/call
On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday: vdso: 272 nsec/call
clock-getres-realtime-coarse: vdso: 160 nsec/call
clock-gettime-realtime-coarse: vdso: 184 nsec/call
clock-getres-realtime: vdso: 166 nsec/call
clock-gettime-realtime: vdso: 281 nsec/call
clock-getres-monotonic-coarse: vdso: 160 nsec/call
clock-gettime-monotonic-coarse: vdso: 184 nsec/call
clock-getres-monotonic: vdso: 169 nsec/call
clock-gettime-monotonic: vdso: 275 nsec/call
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v13:
- Discard .opd and .got1 section as they are not used
- Removed the config-fake32.h file
v12:
- Rebased (Impact on arch/powerpc/kernel/vdso??/Makefile)
v9:
- Rebased (Impact on arch/powerpc/kernel/vdso??/Makefile
v7:
- Split out preparatory changes in a new preceding patch
- Added -fasynchronous-unwind-tables to CC flags.
v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and more cpu_relax() in it
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/Kconfig | 2 +
arch/powerpc/include/asm/vdso/vsyscall.h | 25 ++
arch/powerpc/include/asm/vdso_datapage.h | 40 +--
arch/powerpc/kernel/asm-offsets.c | 49 +---
arch/powerpc/kernel/time.c | 91 +------
arch/powerpc/kernel/vdso.c | 5 +-
arch/powerpc/kernel/vdso32/Makefile | 26 +-
arch/powerpc/kernel/vdso32/gettimeofday.S | 291 +---------------------
arch/powerpc/kernel/vdso32/vdso32.lds.S | 1 +
arch/powerpc/kernel/vdso64/Makefile | 23 +-
arch/powerpc/kernel/vdso64/gettimeofday.S | 242 +-----------------
arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +-
12 files changed, 106 insertions(+), 691 deletions(-)
create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..d0396e3d5452 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -176,6 +176,7 @@ config PPC
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+ select GENERIC_GETTIMEOFDAY
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
@@ -206,6 +207,7 @@ config PPC
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC
+ select HAVE_GENERIC_VDSO
select HAVE_HW_BREAKPOINT if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx)
select HAVE_IDE
select HAVE_IOREMAP_PROT
diff --git a/arch/powerpc/include/asm/vdso/vsyscall.h b/arch/powerpc/include/asm/vdso/vsyscall.h
new file mode 100644
index 000000000000..c56a030c0623
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/vsyscall.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_VSYSCALL_H
+#define __ASM_VDSO_VSYSCALL_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/timekeeper_internal.h>
+#include <asm/vdso_datapage.h>
+
+/*
+ * Update the vDSO data page to keep in sync with kernel timekeeping.
+ */
+static __always_inline
+struct vdso_data *__arch_get_k_vdso_data(void)
+{
+ return vdso_data->data;
+}
+#define __arch_get_k_vdso_data __arch_get_k_vdso_data
+
+/* The asm-generic header needs to be included after the definitions above */
+#include <asm-generic/vdso/vsyscall.h>
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_VSYSCALL_H */
diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index b9ef6cf50ea5..c4d320504d26 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -36,6 +36,7 @@
#include <linux/unistd.h>
#include <linux/time.h>
+#include <vdso/datapage.h>
#define SYSCALL_MAP_SIZE ((NR_syscalls + 31) / 32)
@@ -45,7 +46,7 @@
#ifdef CONFIG_PPC64
-struct vdso_data {
+struct vdso_arch_data {
__u8 eye_catcher[16]; /* Eyecatcher: SYSTEMCFG:PPC64 0x00 */
struct { /* Systemcfg version numbers */
__u32 major; /* Major number 0x10 */
@@ -59,13 +60,13 @@ struct vdso_data {
__u32 processor; /* Processor type 0x1C */
__u64 processorCount; /* # of physical processors 0x20 */
__u64 physicalMemorySize; /* Size of real memory(B) 0x28 */
- __u64 tb_orig_stamp; /* Timebase at boot 0x30 */
+ __u64 tb_orig_stamp; /* (NU) Timebase at boot 0x30 */
__u64 tb_ticks_per_sec; /* Timebase tics / sec 0x38 */
- __u64 tb_to_xs; /* Inverse of TB to 2^20 0x40 */
- __u64 stamp_xsec; /* 0x48 */
- __u64 tb_update_count; /* Timebase atomicity ctr 0x50 */
- __u32 tz_minuteswest; /* Minutes west of Greenwich 0x58 */
- __u32 tz_dsttime; /* Type of dst correction 0x5C */
+ __u64 tb_to_xs; /* (NU) Inverse of TB to 2^20 0x40 */
+ __u64 stamp_xsec; /* (NU) 0x48 */
+ __u64 tb_update_count; /* (NU) Timebase atomicity ctr 0x50 */
+ __u32 tz_minuteswest; /* (NU) Min. west of Greenwich 0x58 */
+ __u32 tz_dsttime; /* (NU) Type of dst correction 0x5C */
__u32 dcache_size; /* L1 d-cache size 0x60 */
__u32 dcache_line_size; /* L1 d-cache line size 0x64 */
__u32 icache_size; /* L1 i-cache size 0x68 */
@@ -78,14 +79,10 @@ struct vdso_data {
__u32 icache_block_size; /* L1 i-cache block size */
__u32 dcache_log_block_size; /* L1 d-cache log block size */
__u32 icache_log_block_size; /* L1 i-cache log block size */
- __u32 stamp_sec_fraction; /* fractional seconds of stamp_xtime */
- __s32 wtom_clock_nsec; /* Wall to monotonic clock nsec */
- __s64 wtom_clock_sec; /* Wall to monotonic clock sec */
- __s64 stamp_xtime_sec; /* xtime secs as at tb_orig_stamp */
- __s64 stamp_xtime_nsec; /* xtime nsecs as at tb_orig_stamp */
- __u32 hrtimer_res; /* hrtimer resolution */
__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
+
+ struct vdso_data data[CS_BASES];
};
#else /* CONFIG_PPC64 */
@@ -93,26 +90,15 @@ struct vdso_data {
/*
* And here is the simpler 32 bits version
*/
-struct vdso_data {
- __u64 tb_orig_stamp; /* Timebase at boot 0x30 */
+struct vdso_arch_data {
__u64 tb_ticks_per_sec; /* Timebase tics / sec 0x38 */
- __u64 tb_to_xs; /* Inverse of TB to 2^20 0x40 */
- __u64 stamp_xsec; /* 0x48 */
- __u32 tb_update_count; /* Timebase atomicity ctr 0x50 */
- __u32 tz_minuteswest; /* Minutes west of Greenwich 0x58 */
- __u32 tz_dsttime; /* Type of dst correction 0x5C */
- __s32 wtom_clock_sec; /* Wall to monotonic clock */
- __s32 wtom_clock_nsec;
- __s32 stamp_xtime_sec; /* xtime seconds as at tb_orig_stamp */
- __s32 stamp_xtime_nsec; /* xtime nsecs as at tb_orig_stamp */
- __u32 stamp_sec_fraction; /* fractional seconds of stamp_xtime */
- __u32 hrtimer_res; /* hrtimer resolution */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
+ struct vdso_data data[CS_BASES];
};
#endif /* CONFIG_PPC64 */
-extern struct vdso_data *vdso_data;
+extern struct vdso_arch_data *vdso_data;
#else /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index c2722ff36e98..a2dcb8ed79b9 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -398,47 +398,16 @@ int main(void)
#endif /* ! CONFIG_PPC64 */
/* datapage offsets for use by vdso */
- OFFSET(CFG_TB_ORIG_STAMP, vdso_data, tb_orig_stamp);
- OFFSET(CFG_TB_TICKS_PER_SEC, vdso_data, tb_ticks_per_sec);
- OFFSET(CFG_TB_TO_XS, vdso_data, tb_to_xs);
- OFFSET(CFG_TB_UPDATE_COUNT, vdso_data, tb_update_count);
- OFFSET(CFG_TZ_MINUTEWEST, vdso_data, tz_minuteswest);
- OFFSET(CFG_TZ_DSTTIME, vdso_data, tz_dsttime);
- OFFSET(CFG_SYSCALL_MAP32, vdso_data, syscall_map_32);
- OFFSET(WTOM_CLOCK_SEC, vdso_data, wtom_clock_sec);
- OFFSET(WTOM_CLOCK_NSEC, vdso_data, wtom_clock_nsec);
- OFFSET(STAMP_XTIME_SEC, vdso_data, stamp_xtime_sec);
- OFFSET(STAMP_XTIME_NSEC, vdso_data, stamp_xtime_nsec);
- OFFSET(STAMP_SEC_FRAC, vdso_data, stamp_sec_fraction);
- OFFSET(CLOCK_HRTIMER_RES, vdso_data, hrtimer_res);
+ OFFSET(VDSO_DATA_OFFSET, vdso_arch_data, data);
+ OFFSET(CFG_TB_TICKS_PER_SEC, vdso_arch_data, tb_ticks_per_sec);
+ OFFSET(CFG_SYSCALL_MAP32, vdso_arch_data, syscall_map_32);
#ifdef CONFIG_PPC64
- OFFSET(CFG_ICACHE_BLOCKSZ, vdso_data, icache_block_size);
- OFFSET(CFG_DCACHE_BLOCKSZ, vdso_data, dcache_block_size);
- OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_data, icache_log_block_size);
- OFFSET(CFG_DCACHE_LOGBLOCKSZ, vdso_data, dcache_log_block_size);
- OFFSET(CFG_SYSCALL_MAP64, vdso_data, syscall_map_64);
- OFFSET(TVAL64_TV_SEC, __kernel_old_timeval, tv_sec);
- OFFSET(TVAL64_TV_USEC, __kernel_old_timeval, tv_usec);
-#endif
- OFFSET(TSPC64_TV_SEC, __kernel_timespec, tv_sec);
- OFFSET(TSPC64_TV_NSEC, __kernel_timespec, tv_nsec);
- OFFSET(TVAL32_TV_SEC, old_timeval32, tv_sec);
- OFFSET(TVAL32_TV_USEC, old_timeval32, tv_usec);
- OFFSET(TSPC32_TV_SEC, old_timespec32, tv_sec);
- OFFSET(TSPC32_TV_NSEC, old_timespec32, tv_nsec);
- /* timeval/timezone offsets for use by vdso */
- OFFSET(TZONE_TZ_MINWEST, timezone, tz_minuteswest);
- OFFSET(TZONE_TZ_DSTTIME, timezone, tz_dsttime);
-
- /* Other bits used by the vdso */
- DEFINE(CLOCK_REALTIME, CLOCK_REALTIME);
- DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
- DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
- DEFINE(CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
- DEFINE(CLOCK_MAX, CLOCK_TAI);
- DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
- DEFINE(EINVAL, EINVAL);
- DEFINE(KTIME_LOW_RES, KTIME_LOW_RES);
+ OFFSET(CFG_ICACHE_BLOCKSZ, vdso_arch_data, icache_block_size);
+ OFFSET(CFG_DCACHE_BLOCKSZ, vdso_arch_data, dcache_block_size);
+ OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_arch_data, icache_log_block_size);
+ OFFSET(CFG_DCACHE_LOGBLOCKSZ, vdso_arch_data, dcache_log_block_size);
+ OFFSET(CFG_SYSCALL_MAP64, vdso_arch_data, syscall_map_64);
+#endif
#ifdef CONFIG_BUG
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46f5532..92481463f9dc 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -82,6 +82,7 @@ static struct clocksource clocksource_timebase = {
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.mask = CLOCKSOURCE_MASK(64),
.read = timebase_read,
+ .vdso_clock_mode = VDSO_CLOCKMODE_ARCHTIMER,
};
#define DECREMENTER_DEFAULT_MAX 0x7FFFFFFF
@@ -831,95 +832,6 @@ static notrace u64 timebase_read(struct clocksource *cs)
return (u64)get_tb();
}
-
-void update_vsyscall(struct timekeeper *tk)
-{
- struct timespec64 xt;
- struct clocksource *clock = tk->tkr_mono.clock;
- u32 mult = tk->tkr_mono.mult;
- u32 shift = tk->tkr_mono.shift;
- u64 cycle_last = tk->tkr_mono.cycle_last;
- u64 new_tb_to_xs, new_stamp_xsec;
- u64 frac_sec;
-
- if (clock != &clocksource_timebase)
- return;
-
- xt.tv_sec = tk->xtime_sec;
- xt.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift);
-
- /* Make userspace gettimeofday spin until we're done. */
- ++vdso_data->tb_update_count;
- smp_mb();
-
- /*
- * This computes ((2^20 / 1e9) * mult) >> shift as a
- * 0.64 fixed-point fraction.
- * The computation in the else clause below won't overflow
- * (as long as the timebase frequency is >= 1.049 MHz)
- * but loses precision because we lose the low bits of the constant
- * in the shift. Note that 19342813113834067 ~= 2^(20+64) / 1e9.
- * For a shift of 24 the error is about 0.5e-9, or about 0.5ns
- * over a second. (Shift values are usually 22, 23 or 24.)
- * For high frequency clocks such as the 512MHz timebase clock
- * on POWER[6789], the mult value is small (e.g. 32768000)
- * and so we can shift the constant by 16 initially
- * (295147905179 ~= 2^(20+64-16) / 1e9) and then do the
- * remaining shifts after the multiplication, which gives a
- * more accurate result (e.g. with mult = 32768000, shift = 24,
- * the error is only about 1.2e-12, or 0.7ns over 10 minutes).
- */
- if (mult <= 62500000 && clock->shift >= 16)
- new_tb_to_xs = ((u64) mult * 295147905179ULL) >> (clock->shift - 16);
- else
- new_tb_to_xs = (u64) mult * (19342813113834067ULL >> clock->shift);
-
- /*
- * Compute the fractional second in units of 2^-32 seconds.
- * The fractional second is tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift
- * in nanoseconds, so multiplying that by 2^32 / 1e9 gives
- * it in units of 2^-32 seconds.
- * We assume shift <= 32 because clocks_calc_mult_shift()
- * generates shift values in the range 0 - 32.
- */
- frac_sec = tk->tkr_mono.xtime_nsec << (32 - shift);
- do_div(frac_sec, NSEC_PER_SEC);
-
- /*
- * Work out new stamp_xsec value for any legacy users of systemcfg.
- * stamp_xsec is in units of 2^-20 seconds.
- */
- new_stamp_xsec = frac_sec >> 12;
- new_stamp_xsec += tk->xtime_sec * XSEC_PER_SEC;
-
- /*
- * tb_update_count is used to allow the userspace gettimeofday code
- * to assure itself that it sees a consistent view of the tb_to_xs and
- * stamp_xsec variables. It reads the tb_update_count, then reads
- * tb_to_xs and stamp_xsec and then reads tb_update_count again. If
- * the two values of tb_update_count match and are even then the
- * tb_to_xs and stamp_xsec values are consistent. If not, then it
- * loops back and reads them again until this criteria is met.
- */
- vdso_data->tb_orig_stamp = cycle_last;
- vdso_data->stamp_xsec = new_stamp_xsec;
- vdso_data->tb_to_xs = new_tb_to_xs;
- vdso_data->wtom_clock_sec = tk->wall_to_monotonic.tv_sec;
- vdso_data->wtom_clock_nsec = tk->wall_to_monotonic.tv_nsec;
- vdso_data->stamp_xtime_sec = xt.tv_sec;
- vdso_data->stamp_xtime_nsec = xt.tv_nsec;
- vdso_data->stamp_sec_fraction = frac_sec;
- vdso_data->hrtimer_res = hrtimer_resolution;
- smp_wmb();
- ++(vdso_data->tb_update_count);
-}
-
-void update_vsyscall_tz(void)
-{
- vdso_data->tz_minuteswest = sys_tz.tz_minuteswest;
- vdso_data->tz_dsttime = sys_tz.tz_dsttime;
-}
-
static void __init clocksource_init(void)
{
struct clocksource *clock = &clocksource_timebase;
@@ -1079,7 +991,6 @@ void __init time_init(void)
sys_tz.tz_dsttime = 0;
}
- vdso_data->tb_update_count = 0;
vdso_data->tb_ticks_per_sec = tb_ticks_per_sec;
/* initialise and enable the large decrementer (if we have one) */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 8dad44262e75..23208a051af5 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -17,6 +17,7 @@
#include <linux/elf.h>
#include <linux/security.h>
#include <linux/memblock.h>
+#include <vdso/datapage.h>
#include <asm/processor.h>
#include <asm/mmu.h>
@@ -70,10 +71,10 @@ static int vdso_ready;
* with it, it will become dynamically allocated
*/
static union {
- struct vdso_data data;
+ struct vdso_arch_data data;
u8 page[PAGE_SIZE];
} vdso_data_store __page_aligned_data;
-struct vdso_data *vdso_data = &vdso_data_store.data;
+struct vdso_arch_data *vdso_data = &vdso_data_store.data;
/* Format of the patch table */
struct vdso_patch_def
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 73eada6bc8cd..853545a19a1e 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -2,8 +2,20 @@
# List of files in the vdso, has to be asm only for now
+ARCH_REL_TYPE_ABS := R_PPC_JUMP_SLOT|R_PPC_GLOB_DAT|R_PPC_ADDR32|R_PPC_ADDR24|R_PPC_ADDR16|R_PPC_ADDR16_LO|R_PPC_ADDR16_HI|R_PPC_ADDR16_HA|R_PPC_ADDR14|R_PPC_ADDR14_BRTAKEN|R_PPC_ADDR14_BRNTAKEN
+include $(srctree)/lib/vdso/Makefile
+
obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
+ifneq ($(c-gettimeofday-y),)
+ CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
+ CFLAGS_vgettimeofday.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
+ CFLAGS_vgettimeofday.o += $(call cc-option, -fno-stack-protector)
+ CFLAGS_vgettimeofday.o += -DDISABLE_BRANCH_PROFILING
+ CFLAGS_vgettimeofday.o += -ffreestanding -fasynchronous-unwind-tables
+ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE)
+endif
+
# Build rules
ifdef CROSS32_COMPILE
@@ -15,6 +27,7 @@ endif
CC32FLAGS :=
ifdef CONFIG_PPC64
CC32FLAGS += -m32
+KBUILD_CFLAGS := $(filter-out -mcmodel=medium,$(KBUILD_CFLAGS))
endif
targets := $(obj-vdso32) vdso32.so vdso32.so.dbg
@@ -23,6 +36,7 @@ obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
GCOV_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
+KASAN_SANITIZE := n
ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
-Wl,-soname=linux-vdso32.so.1 -Wl,--hash-style=both
@@ -36,8 +50,8 @@ CPPFLAGS_vdso32.lds += -P -C -Upowerpc
$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
# link rule for the .so file, .lds has to be first
-$(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) FORCE
- $(call if_changed,vdso32ld)
+$(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday.o FORCE
+ $(call if_changed,vdso32ld_and_check)
# strip rule for the .so file
$(obj)/%.so: OBJCOPYFLAGS := -S
@@ -47,12 +61,16 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE
# assembly rules for the .S files
$(obj-vdso32): %.o: %.S FORCE
$(call if_changed_dep,vdso32as)
+$(obj)/vgettimeofday.o: %.o: %.c FORCE
+ $(call if_changed_dep,vdso32cc)
# actual build commands
-quiet_cmd_vdso32ld = VDSO32L $@
- cmd_vdso32ld = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^)
+quiet_cmd_vdso32ld_and_check = VDSO32L $@
+ cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) ; $(cmd_vdso_check)
quiet_cmd_vdso32as = VDSO32A $@
cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) -c -o $@ $<
+quiet_cmd_vdso32cc = VDSO32C $@
+ cmd_vdso32cc = $(VDSOCC) $(c_flags) $(CC32FLAGS) -c -o $@ $<
# install commands for the unstripped file
quiet_cmd_vdso_install = INSTALL $@
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index e7f8f9f1b3f4..fd7b01c51281 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -12,13 +12,7 @@
#include <asm/vdso_datapage.h>
#include <asm/asm-offsets.h>
#include <asm/unistd.h>
-
-/* Offset for the low 32-bit part of a field of long type */
-#ifdef CONFIG_PPC64
-#define LOPART 4
-#else
-#define LOPART 0
-#endif
+#include <asm/vdso/gettimeofday.h>
.text
/*
@@ -28,32 +22,7 @@
*
*/
V_FUNCTION_BEGIN(__kernel_gettimeofday)
- .cfi_startproc
- mflr r12
- .cfi_register lr,r12
-
- mr. r10,r3 /* r10 saves tv */
- mr r11,r4 /* r11 saves tz */
- get_datapage r9, r0
- beq 3f
- LOAD_REG_IMMEDIATE(r7, 1000000) /* load up USEC_PER_SEC */
- bl __do_get_tspec@local /* get sec/usec from tb & kernel */
- stw r3,TVAL32_TV_SEC(r10)
- stw r4,TVAL32_TV_USEC(r10)
-
-3: cmplwi r11,0 /* check if tz is NULL */
- mtlr r12
- crclr cr0*4+so
- li r3,0
- beqlr
-
- lwz r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */
- lwz r5,CFG_TZ_DSTTIME(r9)
- stw r4,TZONE_TZ_MINWEST(r11)
- stw r5,TZONE_TZ_DSTTIME(r11)
-
- blr
- .cfi_endproc
+ cvdso_call __c_kernel_gettimeofday
V_FUNCTION_END(__kernel_gettimeofday)
/*
@@ -63,127 +32,7 @@ V_FUNCTION_END(__kernel_gettimeofday)
*
*/
V_FUNCTION_BEGIN(__kernel_clock_gettime)
- .cfi_startproc
- /* Check for supported clock IDs */
- cmpli cr0,r3,CLOCK_REALTIME
- cmpli cr1,r3,CLOCK_MONOTONIC
- cror cr0*4+eq,cr0*4+eq,cr1*4+eq
-
- cmpli cr5,r3,CLOCK_REALTIME_COARSE
- cmpli cr6,r3,CLOCK_MONOTONIC_COARSE
- cror cr5*4+eq,cr5*4+eq,cr6*4+eq
-
- cror cr0*4+eq,cr0*4+eq,cr5*4+eq
- bne cr0, .Lgettime_fallback
-
- mflr r12 /* r12 saves lr */
- .cfi_register lr,r12
- mr r11,r4 /* r11 saves tp */
- get_datapage r9, r0
- LOAD_REG_IMMEDIATE(r7, NSEC_PER_SEC) /* load up NSEC_PER_SEC */
- beq cr5, .Lcoarse_clocks
-.Lprecise_clocks:
- bl __do_get_tspec@local /* get sec/nsec from tb & kernel */
- bne cr1, .Lfinish /* not monotonic -> all done */
-
- /*
- * CLOCK_MONOTONIC
- */
-
- /* now we must fixup using wall to monotonic. We need to snapshot
- * that value and do the counter trick again. Fortunately, we still
- * have the counter value in r8 that was returned by __do_get_xsec.
- * At this point, r3,r4 contain our sec/nsec values, r5 and r6
- * can be used, r7 contains NSEC_PER_SEC.
- */
-
- lwz r5,(WTOM_CLOCK_SEC+LOPART)(r9)
- lwz r6,WTOM_CLOCK_NSEC(r9)
-
- /* We now have our offset in r5,r6. We create a fake dependency
- * on that value and re-check the counter
- */
- or r0,r6,r5
- xor r0,r0,r0
- add r9,r9,r0
- lwz r0,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
- cmpl cr0,r8,r0 /* check if updated */
- bne- .Lprecise_clocks
- b .Lfinish_monotonic
-
- /*
- * For coarse clocks we get data directly from the vdso data page, so
- * we don't need to call __do_get_tspec, but we still need to do the
- * counter trick.
- */
-.Lcoarse_clocks:
- lwz r8,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
- andi. r0,r8,1 /* pending update ? loop */
- bne- .Lcoarse_clocks
- add r9,r9,r0 /* r0 is already 0 */
-
- /*
- * CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
- * too
- */
- lwz r3,STAMP_XTIME_SEC+LOPART(r9)
- lwz r4,STAMP_XTIME_NSEC+LOPART(r9)
- bne cr6,1f
-
- /* CLOCK_MONOTONIC_COARSE */
- lwz r5,(WTOM_CLOCK_SEC+LOPART)(r9)
- lwz r6,WTOM_CLOCK_NSEC(r9)
-
- /* check if counter has updated */
- or r0,r6,r5
-1: or r0,r0,r3
- or r0,r0,r4
- xor r0,r0,r0
- add r3,r3,r0
- lwz r0,CFG_TB_UPDATE_COUNT+LOPART(r9)
- cmpl cr0,r0,r8 /* check if updated */
- bne- .Lcoarse_clocks
-
- /* Counter has not updated, so continue calculating proper values for
- * sec and nsec if monotonic coarse, or just return with the proper
- * values for realtime.
- */
- bne cr6, .Lfinish
-
- /* Calculate and store result. Note that this mimics the C code,
- * which may cause funny results if nsec goes negative... is that
- * possible at all ?
- */
-.Lfinish_monotonic:
- add r3,r3,r5
- add r4,r4,r6
- cmpw cr0,r4,r7
- cmpwi cr1,r4,0
- blt 1f
- subf r4,r7,r4
- addi r3,r3,1
-1: bge cr1, .Lfinish
- addi r3,r3,-1
- add r4,r4,r7
-
-.Lfinish:
- stw r3,TSPC32_TV_SEC(r11)
- stw r4,TSPC32_TV_NSEC(r11)
-
- mtlr r12
- crclr cr0*4+so
- li r3,0
- blr
-
- /*
- * syscall fallback
- */
-.Lgettime_fallback:
- li r0,__NR_clock_gettime
- .cfi_restore lr
- sc
- blr
- .cfi_endproc
+ cvdso_call __c_kernel_clock_gettime
V_FUNCTION_END(__kernel_clock_gettime)
@@ -194,37 +43,7 @@ V_FUNCTION_END(__kernel_clock_gettime)
*
*/
V_FUNCTION_BEGIN(__kernel_clock_getres)
- .cfi_startproc
- /* Check for supported clock IDs */
- cmplwi cr0, r3, CLOCK_MAX
- cmpwi cr1, r3, CLOCK_REALTIME_COARSE
- cmpwi cr7, r3, CLOCK_MONOTONIC_COARSE
- bgt cr0, 99f
- LOAD_REG_IMMEDIATE(r5, KTIME_LOW_RES)
- beq cr1, 1f
- beq cr7, 1f
-
- mflr r12
- .cfi_register lr,r12
- get_datapage r3, r0
- lwz r5, CLOCK_HRTIMER_RES(r3)
- mtlr r12
-1: li r3,0
- cmpli cr0,r4,0
- crclr cr0*4+so
- beqlr
- stw r3,TSPC32_TV_SEC(r4)
- stw r5,TSPC32_TV_NSEC(r4)
- blr
-
- /*
- * syscall fallback
- */
-99:
- li r0,__NR_clock_getres
- sc
- blr
- .cfi_endproc
+ cvdso_call __c_kernel_clock_getres
V_FUNCTION_END(__kernel_clock_getres)
@@ -235,105 +54,5 @@ V_FUNCTION_END(__kernel_clock_getres)
*
*/
V_FUNCTION_BEGIN(__kernel_time)
- .cfi_startproc
- mflr r12
- .cfi_register lr,r12
-
- mr r11,r3 /* r11 holds t */
- get_datapage r9, r0
-
- lwz r3,STAMP_XTIME_SEC+LOPART(r9)
-
- cmplwi r11,0 /* check if t is NULL */
- mtlr r12
- crclr cr0*4+so
- beqlr
- stw r3,0(r11) /* store result at *t */
- blr
- .cfi_endproc
+ cvdso_call_time __c_kernel_time
V_FUNCTION_END(__kernel_time)
-
-/*
- * This is the core of clock_gettime() and gettimeofday(),
- * it returns the current time in r3 (seconds) and r4.
- * On entry, r7 gives the resolution of r4, either USEC_PER_SEC
- * or NSEC_PER_SEC, giving r4 in microseconds or nanoseconds.
- * It expects the datapage ptr in r9 and doesn't clobber it.
- * It clobbers r0, r5 and r6.
- * On return, r8 contains the counter value that can be reused.
- * This clobbers cr0 but not any other cr field.
- */
-__do_get_tspec:
- .cfi_startproc
- /* Check for update count & load values. We use the low
- * order 32 bits of the update count
- */
-1: lwz r8,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
- andi. r0,r8,1 /* pending update ? loop */
- bne- 1b
- xor r0,r8,r8 /* create dependency */
- add r9,r9,r0
-
- /* Load orig stamp (offset to TB) */
- lwz r5,CFG_TB_ORIG_STAMP(r9)
- lwz r6,(CFG_TB_ORIG_STAMP+4)(r9)
-
- /* Get a stable TB value */
-2: MFTBU(r3)
- MFTBL(r4)
- MFTBU(r0)
- cmplw cr0,r3,r0
- bne- 2b
-
- /* Subtract tb orig stamp and shift left 12 bits.
- */
- subfc r4,r6,r4
- subfe r0,r5,r3
- slwi r0,r0,12
- rlwimi. r0,r4,12,20,31
- slwi r4,r4,12
-
- /*
- * Load scale factor & do multiplication.
- * We only use the high 32 bits of the tb_to_xs value.
- * Even with a 1GHz timebase clock, the high 32 bits of
- * tb_to_xs will be at least 4 million, so the error from
- * ignoring the low 32 bits will be no more than 0.25ppm.
- * The error will just make the clock run very very slightly
- * slow until the next time the kernel updates the VDSO data,
- * at which point the clock will catch up to the kernel's value,
- * so there is no long-term error accumulation.
- */
- lwz r5,CFG_TB_TO_XS(r9) /* load values */
- mulhwu r4,r4,r5
- li r3,0
-
- beq+ 4f /* skip high part computation if 0 */
- mulhwu r3,r0,r5
- mullw r5,r0,r5
- addc r4,r4,r5
- addze r3,r3
-4:
- /* At this point, we have seconds since the xtime stamp
- * as a 32.32 fixed-point number in r3 and r4.
- * Load & add the xtime stamp.
- */
- lwz r5,STAMP_XTIME_SEC+LOPART(r9)
- lwz r6,STAMP_SEC_FRAC(r9)
- addc r4,r4,r6
- adde r3,r3,r5
-
- /* We create a fake dependency on the result in r3/r4
- * and re-check the counter
- */
- or r6,r4,r3
- xor r0,r6,r6
- add r9,r9,r0
- lwz r0,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
- cmplw cr0,r8,r0 /* check if updated */
- bne- 1b
-
- mulhwu r4,r4,r7 /* convert to micro or nanoseconds */
-
- blr
- .cfi_endproc
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 7eadac74c7f9..51e9b3f3f88a 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -111,6 +111,7 @@ SECTIONS
*(.note.GNU-stack)
*(.data .data.* .gnu.linkonce.d.* .sdata*)
*(.bss .sbss .dynbss .dynsbss)
+ *(.got1)
}
}
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index dfd34f68bfa1..4a8c5e4d25c0 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -1,8 +1,20 @@
# SPDX-License-Identifier: GPL-2.0
# List of files in the vdso, has to be asm only for now
+ARCH_REL_TYPE_ABS := R_PPC_JUMP_SLOT|R_PPC_GLOB_DAT|R_PPC_ADDR32|R_PPC_ADDR24|R_PPC_ADDR16|R_PPC_ADDR16_LO|R_PPC_ADDR16_HI|R_PPC_ADDR16_HA|R_PPC_ADDR14|R_PPC_ADDR14_BRTAKEN|R_PPC_ADDR14_BRNTAKEN
+include $(srctree)/lib/vdso/Makefile
+
obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
+ifneq ($(c-gettimeofday-y),)
+ CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
+ CFLAGS_vgettimeofday.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
+ CFLAGS_vgettimeofday.o += $(call cc-option, -fno-stack-protector)
+ CFLAGS_vgettimeofday.o += -DDISABLE_BRANCH_PROFILING
+ CFLAGS_vgettimeofday.o += -ffreestanding -fasynchronous-unwind-tables
+ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE)
+endif
+
# Build rules
targets := $(obj-vdso64) vdso64.so vdso64.so.dbg
@@ -11,6 +23,7 @@ obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
GCOV_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
+KASAN_SANITIZE := n
ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
-Wl,-soname=linux-vdso64.so.1 -Wl,--hash-style=both
@@ -20,12 +33,14 @@ obj-y += vdso64_wrapper.o
targets += vdso64.lds
CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
+$(obj)/vgettimeofday.o: %.o: %.c FORCE
+
# Force dependency (incbin is bad)
$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so
# link rule for the .so file, .lds has to be first
-$(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) FORCE
- $(call if_changed,vdso64ld)
+$(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE
+ $(call if_changed,vdso64ld_and_check)
# strip rule for the .so file
$(obj)/%.so: OBJCOPYFLAGS := -S
@@ -33,8 +48,8 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE
$(call if_changed,objcopy)
# actual build commands
-quiet_cmd_vdso64ld = VDSO64L $@
- cmd_vdso64ld = $(CC) $(c_flags) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^)
+quiet_cmd_vdso64ld_and_check = VDSO64L $@
+ cmd_vdso64ld_and_check = $(CC) $(c_flags) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^); $(cmd_vdso_check)
# install commands for the unstripped file
quiet_cmd_vdso_install = INSTALL $@
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S
index 20f8be40c653..d7a7bfb51081 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -12,6 +12,7 @@
#include <asm/vdso_datapage.h>
#include <asm/asm-offsets.h>
#include <asm/unistd.h>
+#include <asm/vdso/gettimeofday.h>
.text
/*
@@ -21,31 +22,7 @@
*
*/
V_FUNCTION_BEGIN(__kernel_gettimeofday)
- .cfi_startproc
- mflr r12
- .cfi_register lr,r12
-
- mr r11,r3 /* r11 holds tv */
- mr r10,r4 /* r10 holds tz */
- get_datapage r3, r0
- cmpldi r11,0 /* check if tv is NULL */
- beq 2f
- lis r7,1000000@ha /* load up USEC_PER_SEC */
- addi r7,r7,1000000@l
- bl V_LOCAL_FUNC(__do_get_tspec) /* get sec/us from tb & kernel */
- std r4,TVAL64_TV_SEC(r11) /* store sec in tv */
- std r5,TVAL64_TV_USEC(r11) /* store usec in tv */
-2: cmpldi r10,0 /* check if tz is NULL */
- beq 1f
- lwz r4,CFG_TZ_MINUTEWEST(r3)/* fill tz */
- lwz r5,CFG_TZ_DSTTIME(r3)
- stw r4,TZONE_TZ_MINWEST(r10)
- stw r5,TZONE_TZ_DSTTIME(r10)
-1: mtlr r12
- crclr cr0*4+so
- li r3,0 /* always success */
- blr
- .cfi_endproc
+ cvdso_call __c_kernel_gettimeofday
V_FUNCTION_END(__kernel_gettimeofday)
@@ -56,120 +33,7 @@ V_FUNCTION_END(__kernel_gettimeofday)
*
*/
V_FUNCTION_BEGIN(__kernel_clock_gettime)
- .cfi_startproc
- /* Check for supported clock IDs */
- cmpwi cr0,r3,CLOCK_REALTIME
- cmpwi cr1,r3,CLOCK_MONOTONIC
- cror cr0*4+eq,cr0*4+eq,cr1*4+eq
-
- cmpwi cr5,r3,CLOCK_REALTIME_COARSE
- cmpwi cr6,r3,CLOCK_MONOTONIC_COARSE
- cror cr5*4+eq,cr5*4+eq,cr6*4+eq
-
- cror cr0*4+eq,cr0*4+eq,cr5*4+eq
- bne cr0,99f
-
- mflr r12 /* r12 saves lr */
- .cfi_register lr,r12
- mr r11,r4 /* r11 saves tp */
- get_datapage r3, r0
- lis r7,NSEC_PER_SEC@h /* want nanoseconds */
- ori r7,r7,NSEC_PER_SEC@l
- beq cr5,70f
-50: bl V_LOCAL_FUNC(__do_get_tspec) /* get time from tb & kernel */
- bne cr1,80f /* if not monotonic, all done */
-
- /*
- * CLOCK_MONOTONIC
- */
-
- /* now we must fixup using wall to monotonic. We need to snapshot
- * that value and do the counter trick again. Fortunately, we still
- * have the counter value in r8 that was returned by __do_get_tspec.
- * At this point, r4,r5 contain our sec/nsec values.
- */
-
- ld r6,WTOM_CLOCK_SEC(r3)
- lwa r9,WTOM_CLOCK_NSEC(r3)
-
- /* We now have our result in r6,r9. We create a fake dependency
- * on that result and re-check the counter
- */
- or r0,r6,r9
- xor r0,r0,r0
- add r3,r3,r0
- ld r0,CFG_TB_UPDATE_COUNT(r3)
- cmpld cr0,r0,r8 /* check if updated */
- bne- 50b
- b 78f
-
- /*
- * For coarse clocks we get data directly from the vdso data page, so
- * we don't need to call __do_get_tspec, but we still need to do the
- * counter trick.
- */
-70: ld r8,CFG_TB_UPDATE_COUNT(r3)
- andi. r0,r8,1 /* pending update ? loop */
- bne- 70b
- add r3,r3,r0 /* r0 is already 0 */
-
- /*
- * CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
- * too
- */
- ld r4,STAMP_XTIME_SEC(r3)
- ld r5,STAMP_XTIME_NSEC(r3)
- bne cr6,75f
-
- /* CLOCK_MONOTONIC_COARSE */
- ld r6,WTOM_CLOCK_SEC(r3)
- lwa r9,WTOM_CLOCK_NSEC(r3)
-
- /* check if counter has updated */
- or r0,r6,r9
-75: or r0,r0,r4
- or r0,r0,r5
- xor r0,r0,r0
- add r3,r3,r0
- ld r0,CFG_TB_UPDATE_COUNT(r3)
- cmpld cr0,r0,r8 /* check if updated */
- bne- 70b
-
- /* Counter has not updated, so continue calculating proper values for
- * sec and nsec if monotonic coarse, or just return with the proper
- * values for realtime.
- */
- bne cr6,80f
-
- /* Add wall->monotonic offset and check for overflow or underflow */
-78: add r4,r4,r6
- add r5,r5,r9
- cmpd cr0,r5,r7
- cmpdi cr1,r5,0
- blt 79f
- subf r5,r7,r5
- addi r4,r4,1
-79: bge cr1,80f
- addi r4,r4,-1
- add r5,r5,r7
-
-80: std r4,TSPC64_TV_SEC(r11)
- std r5,TSPC64_TV_NSEC(r11)
-
- mtlr r12
- crclr cr0*4+so
- li r3,0
- blr
-
- /*
- * syscall fallback
- */
-99:
- li r0,__NR_clock_gettime
- .cfi_restore lr
- sc
- blr
- .cfi_endproc
+ cvdso_call __c_kernel_clock_gettime
V_FUNCTION_END(__kernel_clock_gettime)
@@ -180,34 +44,7 @@ V_FUNCTION_END(__kernel_clock_gettime)
*
*/
V_FUNCTION_BEGIN(__kernel_clock_getres)
- .cfi_startproc
- /* Check for supported clock IDs */
- cmpwi cr0,r3,CLOCK_REALTIME
- cmpwi cr1,r3,CLOCK_MONOTONIC
- cror cr0*4+eq,cr0*4+eq,cr1*4+eq
- bne cr0,99f
-
- mflr r12
- .cfi_register lr,r12
- get_datapage r3, r0
- lwz r5, CLOCK_HRTIMER_RES(r3)
- mtlr r12
- li r3,0
- cmpldi cr0,r4,0
- crclr cr0*4+so
- beqlr
- std r3,TSPC64_TV_SEC(r4)
- std r5,TSPC64_TV_NSEC(r4)
- blr
-
- /*
- * syscall fallback
- */
-99:
- li r0,__NR_clock_getres
- sc
- blr
- .cfi_endproc
+ cvdso_call __c_kernel_clock_getres
V_FUNCTION_END(__kernel_clock_getres)
/*
@@ -217,74 +54,5 @@ V_FUNCTION_END(__kernel_clock_getres)
*
*/
V_FUNCTION_BEGIN(__kernel_time)
- .cfi_startproc
- mflr r12
- .cfi_register lr,r12
-
- mr r11,r3 /* r11 holds t */
- get_datapage r3, r0
-
- ld r4,STAMP_XTIME_SEC(r3)
-
- cmpldi r11,0 /* check if t is NULL */
- beq 2f
- std r4,0(r11) /* store result at *t */
-2: mtlr r12
- crclr cr0*4+so
- mr r3,r4
- blr
- .cfi_endproc
+ cvdso_call_time __c_kernel_time
V_FUNCTION_END(__kernel_time)
-
-
-/*
- * This is the core of clock_gettime() and gettimeofday(),
- * it returns the current time in r4 (seconds) and r5.
- * On entry, r7 gives the resolution of r5, either USEC_PER_SEC
- * or NSEC_PER_SEC, giving r5 in microseconds or nanoseconds.
- * It expects the datapage ptr in r3 and doesn't clobber it.
- * It clobbers r0, r6 and r9.
- * On return, r8 contains the counter value that can be reused.
- * This clobbers cr0 but not any other cr field.
- */
-V_FUNCTION_BEGIN(__do_get_tspec)
- .cfi_startproc
- /* check for update count & load values */
-1: ld r8,CFG_TB_UPDATE_COUNT(r3)
- andi. r0,r8,1 /* pending update ? loop */
- bne- 1b
- xor r0,r8,r8 /* create dependency */
- add r3,r3,r0
-
- /* Get TB & offset it. We use the MFTB macro which will generate
- * workaround code for Cell.
- */
- MFTB(r6)
- ld r9,CFG_TB_ORIG_STAMP(r3)
- subf r6,r9,r6
-
- /* Scale result */
- ld r5,CFG_TB_TO_XS(r3)
- sldi r6,r6,12 /* compute time since stamp_xtime */
- mulhdu r6,r6,r5 /* in units of 2^-32 seconds */
-
- /* Add stamp since epoch */
- ld r4,STAMP_XTIME_SEC(r3)
- lwz r5,STAMP_SEC_FRAC(r3)
- or r0,r4,r5
- or r0,r0,r6
- xor r0,r0,r0
- add r3,r3,r0
- ld r0,CFG_TB_UPDATE_COUNT(r3)
- cmpld r0,r8 /* check if updated */
- bne- 1b /* reload if so */
-
- /* convert to seconds & nanoseconds and add to stamp */
- add r6,r6,r5 /* add on fractional seconds of xtime */
- mulhwu r5,r6,r7 /* compute micro or nanoseconds and */
- srdi r6,r6,32 /* seconds since stamp_xtime */
- clrldi r5,r5,32
- add r4,r4,r6
- blr
- .cfi_endproc
-V_FUNCTION_END(__do_get_tspec)
diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S
index 256fb9720298..71be083b24ed 100644
--- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
@@ -61,7 +61,6 @@ SECTIONS
.gcc_except_table : { *(.gcc_except_table) }
.rela.dyn ALIGN(8) : { *(.rela.dyn) }
- .opd ALIGN(8) : { KEEP (*(.opd)) }
.got ALIGN(8) : { *(.got .toc) }
_end = .;
@@ -111,6 +110,7 @@ SECTIONS
*(.branch_lt)
*(.data .data.* .gnu.linkonce.d.* .sdata*)
*(.bss .sbss .dynbss .dynsbss)
+ *(.opd)
}
}
--
2.25.0
^ permalink raw reply related
* [PATCH v13 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions
powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.
Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.
Implement __arch_vdso_capable() and always return true.
Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:
18: 35 25 ff e0 addic. r9,r5,-32
1c: 41 80 00 10 blt 2c <shift+0x14>
20: 7c 64 4c 30 srw r4,r3,r9
24: 38 60 00 00 li r3,0
...
2c: 54 69 08 3c rlwinm r9,r3,1,0,30
30: 21 45 00 1f subfic r10,r5,31
34: 7c 84 2c 30 srw r4,r4,r5
38: 7d 29 50 30 slw r9,r9,r10
3c: 7c 63 2c 30 srw r3,r3,r5
40: 7d 24 23 78 or r4,r9,r4
In our case the shift is always <= 32. In addition, the upper 32 bits
of the result are likely nul. Lets GCC know it, it also optimises the
following calculations.
With the patch, we get:
0: 21 25 00 20 subfic r9,r5,32
4: 7c 69 48 30 slw r9,r3,r9
8: 7c 84 2c 30 srw r4,r4,r5
c: 7d 24 23 78 or r4,r9,r4
10: 7c 63 2c 30 srw r3,r3,r5
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v13:
- Using PPC_MIN_STKFRM instead of STACK_FRAME_OVERHEAD to avoid including ptrace.h
- Reworked header inclusion to avoid build failure when building VDSO32 on PPC64
- __arch_vdso_capable() now returns always true after the removal of ppc601
- Using DOTSYM() to call the function directly.
v11:
- Changed of __arch_get_hw_counter() to adapt to 4c5a116ada95
("vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()")
v10:
- Added a comment to explain the reason for the two stack frames.
- Moved back the .cfi_register next to mflr
v9:
- No more modification of __get_datapage(). Offset is added after.
- Adding a second stack frame because the PPC VDSO ABI doesn't force
the caller to set one.
v8:
- New, splitted out of last patch of the series
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/clocksource.h | 7 +
arch/powerpc/include/asm/ppc_asm.h | 2 +
arch/powerpc/include/asm/vdso/clocksource.h | 7 +
arch/powerpc/include/asm/vdso/gettimeofday.h | 187 +++++++++++++++++++
arch/powerpc/kernel/vdso32/vgettimeofday.c | 28 +++
arch/powerpc/kernel/vdso64/vgettimeofday.c | 29 +++
6 files changed, 260 insertions(+)
create mode 100644 arch/powerpc/include/asm/clocksource.h
create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c
diff --git a/arch/powerpc/include/asm/clocksource.h b/arch/powerpc/include/asm/clocksource.h
new file mode 100644
index 000000000000..482185566b0c
--- /dev/null
+++ b/arch/powerpc/include/asm/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLOCKSOURCE_H
+#define _ASM_CLOCKSOURCE_H
+
+#include <asm/vdso/clocksource.h>
+
+#endif
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 511786f0e40d..db158accb63e 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -251,6 +251,8 @@ GLUE(.,name):
#define _GLOBAL_TOC(name) _GLOBAL(name)
+#define DOTSYM(a) a
+
#endif
/*
diff --git a/arch/powerpc/include/asm/vdso/clocksource.h b/arch/powerpc/include/asm/vdso/clocksource.h
new file mode 100644
index 000000000000..ec5d672d2569
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSOCLOCKSOURCE_H
+#define __ASM_VDSOCLOCKSOURCE_H
+
+#define VDSO_ARCH_CLOCKMODES VDSO_CLOCKMODE_ARCHTIMER
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index 000000000000..7a8eb9dfd06b
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#ifdef __ASSEMBLY__
+
+#include <asm/ppc_asm.h>
+
+/*
+ * The macros sets two stack frames, one for the caller and one for the callee
+ * because there are no requirement for the caller to set a stack frame when
+ * calling VDSO so it may have omitted to set one, especially on PPC64
+ */
+
+.macro cvdso_call funct
+ .cfi_startproc
+ PPC_STLU r1, -PPC_MIN_STKFRM(r1)
+ mflr r0
+ .cfi_register lr, r0
+ PPC_STLU r1, -PPC_MIN_STKFRM(r1)
+ PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+ get_datapage r5, r0
+ addi r5, r5, VDSO_DATA_OFFSET
+ bl DOTSYM(\funct)
+ PPC_LL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+ cmpwi r3, 0
+ mtlr r0
+ .cfi_restore lr
+ addi r1, r1, 2 * PPC_MIN_STKFRM
+ crclr so
+ beqlr+
+ crset so
+ neg r3, r3
+ blr
+ .cfi_endproc
+.endm
+
+.macro cvdso_call_time funct
+ .cfi_startproc
+ PPC_STLU r1, -PPC_MIN_STKFRM(r1)
+ mflr r0
+ .cfi_register lr, r0
+ PPC_STLU r1, -PPC_MIN_STKFRM(r1)
+ PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+ get_datapage r4, r0
+ addi r4, r4, VDSO_DATA_OFFSET
+ bl DOTSYM(\funct)
+ PPC_LL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+ crclr so
+ mtlr r0
+ .cfi_restore lr
+ addi r1, r1, 2 * PPC_MIN_STKFRM
+ blr
+ .cfi_endproc
+.endm
+
+#else
+
+#include <asm/timebase.h>
+#include <asm/barrier.h>
+#include <asm/unistd.h>
+#include <uapi/linux/time.h>
+
+#define VDSO_HAS_CLOCK_GETRES 1
+
+#define VDSO_HAS_TIME 1
+
+static __always_inline int do_syscall_2(const unsigned long _r0, const unsigned long _r3,
+ const unsigned long _r4)
+{
+ register long r0 asm("r0") = _r0;
+ register unsigned long r3 asm("r3") = _r3;
+ register unsigned long r4 asm("r4") = _r4;
+ register int ret asm ("r3");
+
+ asm volatile(
+ " sc\n"
+ " bns+ 1f\n"
+ " neg %0, %0\n"
+ "1:\n"
+ : "=r" (ret), "+r" (r4), "+r" (r0)
+ : "r" (r3)
+ : "memory", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "cr0", "ctr");
+
+ return ret;
+}
+
+static __always_inline
+int gettimeofday_fallback(struct __kernel_old_timeval *_tv, struct timezone *_tz)
+{
+ return do_syscall_2(__NR_gettimeofday, (unsigned long)_tv, (unsigned long)_tz);
+}
+
+static __always_inline
+int clock_gettime_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
+{
+ return do_syscall_2(__NR_clock_gettime, _clkid, (unsigned long)_ts);
+}
+
+static __always_inline
+int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
+{
+ return do_syscall_2(__NR_clock_getres, _clkid, (unsigned long)_ts);
+}
+
+#ifdef CONFIG_VDSO32
+
+#define BUILD_VDSO32 1
+
+static __always_inline
+int clock_gettime32_fallback(clockid_t _clkid, struct old_timespec32 *_ts)
+{
+ return do_syscall_2(__NR_clock_gettime, _clkid, (unsigned long)_ts);
+}
+
+static __always_inline
+int clock_getres32_fallback(clockid_t _clkid, struct old_timespec32 *_ts)
+{
+ return do_syscall_2(__NR_clock_getres, _clkid, (unsigned long)_ts);
+}
+#endif
+
+static __always_inline u64 __arch_get_hw_counter(s32 clock_mode,
+ const struct vdso_data *vd)
+{
+ return get_tb();
+}
+
+const struct vdso_data *__arch_get_vdso_data(void);
+
+static inline bool vdso_clocksource_ok(const struct vdso_data *vd)
+{
+ return true;
+}
+#define vdso_clocksource_ok vdso_clocksource_ok
+
+/*
+ * powerpc specific delta calculation.
+ *
+ * This variant removes the masking of the subtraction because the
+ * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
+ * which would result in a pointless operation. The compiler cannot
+ * optimize it away as the mask comes from the vdso data and is not compile
+ * time constant.
+ */
+static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
+{
+ return (cycles - last) * mult;
+}
+#define vdso_calc_delta vdso_calc_delta
+
+#ifndef __powerpc64__
+static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
+{
+ u32 hi = ns >> 32;
+ u32 lo = ns;
+
+ lo >>= shift;
+ lo |= hi << (32 - shift);
+ hi >>= shift;
+
+ if (likely(hi == 0))
+ return lo;
+
+ return ((u64)hi << 32) | lo;
+}
+#define vdso_shift_ns vdso_shift_ns
+#endif
+
+#ifdef __powerpc64__
+int __c_kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts,
+ const struct vdso_data *vd);
+int __c_kernel_clock_getres(clockid_t clock_id, struct __kernel_timespec *res,
+ const struct vdso_data *vd);
+#else
+int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
+ const struct vdso_data *vd);
+int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
+ const struct vdso_data *vd);
+#endif
+int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
+ const struct vdso_data *vd);
+__kernel_old_time_t __c_kernel_time(__kernel_old_time_t *time,
+ const struct vdso_data *vd);
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c b/arch/powerpc/kernel/vdso32/vgettimeofday.c
new file mode 100644
index 000000000000..0d4bc217529e
--- /dev/null
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Powerpc userspace implementations of gettimeofday() and similar.
+ */
+#include <linux/types.h>
+
+int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
+ const struct vdso_data *vd)
+{
+ return __cvdso_clock_gettime32_data(vd, clock, ts);
+}
+
+int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
+ const struct vdso_data *vd)
+{
+ return __cvdso_gettimeofday_data(vd, tv, tz);
+}
+
+int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
+ const struct vdso_data *vd)
+{
+ return __cvdso_clock_getres_time32_data(vd, clock_id, res);
+}
+
+__kernel_old_time_t __c_kernel_time(__kernel_old_time_t *time, const struct vdso_data *vd)
+{
+ return __cvdso_time_data(vd, time);
+}
diff --git a/arch/powerpc/kernel/vdso64/vgettimeofday.c b/arch/powerpc/kernel/vdso64/vgettimeofday.c
new file mode 100644
index 000000000000..5b5500058344
--- /dev/null
+++ b/arch/powerpc/kernel/vdso64/vgettimeofday.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Powerpc userspace implementations of gettimeofday() and similar.
+ */
+#include <linux/time.h>
+#include <linux/types.h>
+
+int __c_kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts,
+ const struct vdso_data *vd)
+{
+ return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
+int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz,
+ const struct vdso_data *vd)
+{
+ return __cvdso_gettimeofday_data(vd, tv, tz);
+}
+
+int __c_kernel_clock_getres(clockid_t clock_id, struct __kernel_timespec *res,
+ const struct vdso_data *vd)
+{
+ return __cvdso_clock_getres_data(vd, clock_id, res);
+}
+
+__kernel_old_time_t __c_kernel_time(__kernel_old_time_t *time, const struct vdso_data *vd)
+{
+ return __cvdso_time_data(vd, time);
+}
--
2.25.0
^ permalink raw reply related
* [PATCH v13 6/8] powerpc/vdso: Save and restore TOC pointer on PPC64
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
On PPC64, the TOC pointer needs to be saved and restored.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v13: Using __powerpc64__ instead of CONFIG_PPC64 to exclude it from VDSO32 build on PPC64.
v9: New.
I'm not sure this is really needed, I can't see the VDSO C code doing
anything with r2, at least on ppc64_defconfig.
So I let you decide whether you take it or not.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/vdso/gettimeofday.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 7a8eb9dfd06b..cdbd297e61b8 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -19,10 +19,16 @@
.cfi_register lr, r0
PPC_STLU r1, -PPC_MIN_STKFRM(r1)
PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+#ifdef __powerpc64__
+ PPC_STL r2, PPC_MIN_STKFRM + STK_GOT(r1)
+#endif
get_datapage r5, r0
addi r5, r5, VDSO_DATA_OFFSET
bl DOTSYM(\funct)
PPC_LL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+#ifdef __powerpc64__
+ PPC_LL r2, PPC_MIN_STKFRM + STK_GOT(r1)
+#endif
cmpwi r3, 0
mtlr r0
.cfi_restore lr
@@ -42,10 +48,16 @@
.cfi_register lr, r0
PPC_STLU r1, -PPC_MIN_STKFRM(r1)
PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+#ifdef __powerpc64__
+ PPC_STL r2, PPC_MIN_STKFRM + STK_GOT(r1)
+#endif
get_datapage r4, r0
addi r4, r4, VDSO_DATA_OFFSET
bl DOTSYM(\funct)
PPC_LL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
+#ifdef __powerpc64__
+ PPC_LL r2, PPC_MIN_STKFRM + STK_GOT(r1)
+#endif
crclr so
mtlr r0
.cfi_restore lr
--
2.25.0
^ permalink raw reply related
* [PATCH v13 4/8] powerpc/time: Move timebase functions into new asm/timebase.h
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
In order to easily use get_tb() from C VDSO, move timebase
functions into a new header named asm/timebase.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v13: new
---
arch/powerpc/include/asm/time.h | 30 +--------------------
arch/powerpc/include/asm/timebase.h | 42 +++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 29 deletions(-)
create mode 100644 arch/powerpc/include/asm/timebase.h
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 2f566c1a754c..46d0d65cda0e 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -15,6 +15,7 @@
#include <asm/processor.h>
#include <asm/cpu_has_feature.h>
+#include <asm/timebase.h>
/* time.c */
extern unsigned long tb_ticks_per_jiffy;
@@ -38,12 +39,6 @@ struct div_result {
u64 result_low;
};
-/* For compatibility, get_tbl() is defined as get_tb() on ppc64 */
-static inline unsigned long get_tbl(void)
-{
- return mftb();
-}
-
static inline u64 get_vtb(void)
{
#ifdef CONFIG_PPC_BOOK3S_64
@@ -53,29 +48,6 @@ static inline u64 get_vtb(void)
return 0;
}
-static inline u64 get_tb(void)
-{
- unsigned int tbhi, tblo, tbhi2;
-
- if (IS_ENABLED(CONFIG_PPC64))
- return mftb();
-
- do {
- tbhi = mftbu();
- tblo = mftb();
- tbhi2 = mftbu();
- } while (tbhi != tbhi2);
-
- return ((u64)tbhi << 32) | tblo;
-}
-
-static inline void set_tb(unsigned int upper, unsigned int lower)
-{
- mtspr(SPRN_TBWL, 0);
- mtspr(SPRN_TBWU, upper);
- mtspr(SPRN_TBWL, lower);
-}
-
/* Accessor functions for the decrementer register.
* The 4xx doesn't even have a decrementer. I tried to use the
* generic timer interrupt code, which seems OK, with the 4xx PIT
diff --git a/arch/powerpc/include/asm/timebase.h b/arch/powerpc/include/asm/timebase.h
new file mode 100644
index 000000000000..a8eae3adaa91
--- /dev/null
+++ b/arch/powerpc/include/asm/timebase.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Common timebase prototypes and such for all ppc machines.
+ *
+ * Written by Cort Dougan (cort@cs.nmt.edu) to merge
+ * Paul Mackerras' version and mine for PReP and Pmac.
+ */
+
+#ifndef __POWERPC_TIMEBASE_H
+#define __POWERPC_TIMEBASE_H
+
+#include <asm/reg.h>
+
+/* For compatibility, get_tbl() is defined as get_tb() on ppc64 */
+static inline unsigned long get_tbl(void)
+{
+ return mftb();
+}
+
+static inline u64 get_tb(void)
+{
+ unsigned int tbhi, tblo, tbhi2;
+
+ if (IS_ENABLED(CONFIG_PPC64))
+ return mftb();
+
+ do {
+ tbhi = mftbu();
+ tblo = mftb();
+ tbhi2 = mftbu();
+ } while (tbhi != tbhi2);
+
+ return ((u64)tbhi << 32) | tblo;
+}
+
+static inline void set_tb(unsigned int upper, unsigned int lower)
+{
+ mtspr(SPRN_TBWL, 0);
+ mtspr(SPRN_TBWU, upper);
+ mtspr(SPRN_TBWL, lower);
+}
+#endif /* __POWERPC_TIMEBASE_H */
--
2.25.0
^ permalink raw reply related
* [PATCH v13 0/8] powerpc: switch VDSO to C implementation
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
This is a series to switch powerpc VDSO to generic C implementation.
Changes in v13:
- Reorganised headers to avoid the need for a fake 32 bits config for building VDSO32 on PPC64
- Rebased after the removal of powerpc 601
- Using DOTSYM() macro to call functions directly without using OPD
- Explicitely dropped .opd and .got1 sections which are now unused
Changes in v12:
- Rebased to today's powerpc/merge branch (Conflicts on VDSO Makefiles)
- Added missing prototype for __kernel_clock_gettime64()
Changes in v11:
- Rebased to today's powerpc/merge branch
- Prototype of __arch_get_hw_counter() was modified in mainline (patch 2)
Changes in v10 are:
- Added a comment explaining the reason for the double stack frame
- Moved back .cfi_register lr next to mflr
Main changes in v9 are:
- Dropped the patches which put the VDSO datapage in front of VDSO text in the mapping
- Adds a second stack frame because the caller doesn't set one, at least on PPC64
- Saving the TOC pointer on PPC64 (is that really needed ?)
This series applies on today's powerpc/merge branch.
See the last patches for details on changes and performance.
Christophe Leroy (8):
powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
powerpc/feature: Use CONFIG_PPC64 instead of __powerpc64__ to define
possible features
powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
powerpc/time: Move timebase functions into new asm/timebase.h
powerpc/vdso: Prepare for switching VDSO to generic C implementation.
powerpc/vdso: Save and restore TOC pointer on PPC64
powerpc/vdso: Switch VDSO to generic C implementation.
powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
arch/powerpc/Kconfig | 2 +
arch/powerpc/include/asm/clocksource.h | 7 +
arch/powerpc/include/asm/cputable.h | 9 +-
arch/powerpc/include/asm/ppc_asm.h | 2 +
arch/powerpc/include/asm/processor.h | 13 +-
arch/powerpc/include/asm/time.h | 30 +-
arch/powerpc/include/asm/timebase.h | 42 +++
arch/powerpc/include/asm/vdso/clocksource.h | 7 +
arch/powerpc/include/asm/vdso/gettimeofday.h | 201 +++++++++++++
arch/powerpc/include/asm/vdso/processor.h | 23 ++
arch/powerpc/include/asm/vdso/vsyscall.h | 25 ++
arch/powerpc/include/asm/vdso_datapage.h | 40 +--
arch/powerpc/kernel/asm-offsets.c | 49 +--
arch/powerpc/kernel/time.c | 91 +-----
arch/powerpc/kernel/vdso.c | 5 +-
arch/powerpc/kernel/vdso32/Makefile | 26 +-
arch/powerpc/kernel/vdso32/gettimeofday.S | 300 +------------------
arch/powerpc/kernel/vdso32/vdso32.lds.S | 2 +
arch/powerpc/kernel/vdso32/vgettimeofday.c | 34 +++
arch/powerpc/kernel/vdso64/Makefile | 23 +-
arch/powerpc/kernel/vdso64/gettimeofday.S | 242 +--------------
arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +-
arch/powerpc/kernel/vdso64/vgettimeofday.c | 29 ++
23 files changed, 466 insertions(+), 738 deletions(-)
create mode 100644 arch/powerpc/include/asm/clocksource.h
create mode 100644 arch/powerpc/include/asm/timebase.h
create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
create mode 100644 arch/powerpc/include/asm/vdso/processor.h
create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c
--
2.25.0
^ permalink raw reply
* [PATCH v13 1/8] powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
On 8xx, we get the following features:
[ 0.000000] cpu_features = 0x0000000000000100
[ 0.000000] possible = 0x0000000000000120
[ 0.000000] always = 0x0000000000000000
This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all
other configurations, the three lines should be equal.
The problem is due to CPU_FTRS_GENERIC_32 which is taken when
CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is
pointless because there is no generic configuration supporting
all 32 bits but book3s/32.
Remove this pointless generic features definition to unbreak the
calculation of 'possible' features and 'always' features.
Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/cputable.h | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 3d2f94afc13a..5e31960a56a9 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -409,7 +409,6 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_DBELL | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_DEBUG_LVL_EXC | CPU_FTR_EMB_HV | CPU_FTR_ALTIVEC_COMP | \
CPU_FTR_CELL_TB_BUG | CPU_FTR_SMT)
-#define CPU_FTRS_GENERIC_32 (CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN)
/* 64-bit CPUs */
#define CPU_FTRS_PPC970 (CPU_FTR_LWSYNC | \
@@ -520,8 +519,6 @@ enum {
CPU_FTRS_7447 | CPU_FTRS_7447A | CPU_FTRS_82XX |
CPU_FTRS_G2_LE | CPU_FTRS_E300 | CPU_FTRS_E300C2 |
CPU_FTRS_CLASSIC32 |
-#else
- CPU_FTRS_GENERIC_32 |
#endif
#ifdef CONFIG_PPC_8xx
CPU_FTRS_8XX |
@@ -596,8 +593,6 @@ enum {
CPU_FTRS_7447 & CPU_FTRS_7447A & CPU_FTRS_82XX &
CPU_FTRS_G2_LE & CPU_FTRS_E300 & CPU_FTRS_E300C2 &
CPU_FTRS_CLASSIC32 &
-#else
- CPU_FTRS_GENERIC_32 &
#endif
#ifdef CONFIG_PPC_8xx
CPU_FTRS_8XX &
--
2.25.0
^ permalink raw reply related
* [PATCH v13 3/8] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
cpu_relax() need to be in asm/vdso/processor.h to be used by
the C VDSO generic library.
Move it there.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v9: Forgot to remove cpu_relax() from processor.h in v8
---
arch/powerpc/include/asm/processor.h | 13 ++-----------
arch/powerpc/include/asm/vdso/processor.h | 23 +++++++++++++++++++++++
2 files changed, 25 insertions(+), 11 deletions(-)
create mode 100644 arch/powerpc/include/asm/vdso/processor.h
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index c61c859b51a8..333e3b6c76fb 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -6,6 +6,8 @@
* Copyright (C) 2001 PPC 64 Team, IBM Corp
*/
+#include <vdso/processor.h>
+
#include <asm/reg.h>
#ifdef CONFIG_VSX
@@ -63,14 +65,6 @@ extern int _chrp_type;
#endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
-/* Macros for adjusting thread priority (hardware multi-threading) */
-#define HMT_very_low() asm volatile("or 31,31,31 # very low priority")
-#define HMT_low() asm volatile("or 1,1,1 # low priority")
-#define HMT_medium_low() asm volatile("or 6,6,6 # medium low priority")
-#define HMT_medium() asm volatile("or 2,2,2 # medium priority")
-#define HMT_medium_high() asm volatile("or 5,5,5 # medium high priority")
-#define HMT_high() asm volatile("or 3,3,3 # high priority")
-
#ifdef __KERNEL__
#ifdef CONFIG_PPC64
@@ -344,7 +338,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
}
#ifdef CONFIG_PPC64
-#define cpu_relax() do { HMT_low(); HMT_medium(); barrier(); } while (0)
#define spin_begin() HMT_low()
@@ -363,8 +356,6 @@ do { \
} \
} while (0)
-#else
-#define cpu_relax() barrier()
#endif
/* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/powerpc/include/asm/vdso/processor.h b/arch/powerpc/include/asm/vdso/processor.h
new file mode 100644
index 000000000000..39b9beace9ca
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/processor.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_VDSO_PROCESSOR_H
+#define __ASM_VDSO_PROCESSOR_H
+
+#ifndef __ASSEMBLY__
+
+/* Macros for adjusting thread priority (hardware multi-threading) */
+#define HMT_very_low() asm volatile("or 31, 31, 31 # very low priority")
+#define HMT_low() asm volatile("or 1, 1, 1 # low priority")
+#define HMT_medium_low() asm volatile("or 6, 6, 6 # medium low priority")
+#define HMT_medium() asm volatile("or 2, 2, 2 # medium priority")
+#define HMT_medium_high() asm volatile("or 5, 5, 5 # medium high priority")
+#define HMT_high() asm volatile("or 3, 3, 3 # high priority")
+
+#ifdef CONFIG_PPC64
+#define cpu_relax() do { HMT_low(); HMT_medium(); barrier(); } while (0)
+#else
+#define cpu_relax() barrier()
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_PROCESSOR_H */
--
2.25.0
^ permalink raw reply related
* [PATCH v13 2/8] powerpc/feature: Use CONFIG_PPC64 instead of __powerpc64__ to define possible features
From: Christophe Leroy @ 2020-11-03 18:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, nathanl,
anton
Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>
In order to build VDSO32 for PPC64, we need to have CPU_FTRS_POSSIBLE
and CPU_FTRS_ALWAYS independant of whether we are building the
32 bits VDSO or the 64 bits VDSO.
Use #ifdef CONFIG_PPC64 instead of #ifdef __powerpc64__
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v13: new
---
arch/powerpc/include/asm/cputable.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 5e31960a56a9..e069a2d9f7c1 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -488,7 +488,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_PURR | CPU_FTR_REAL_LE | CPU_FTR_DABRX)
#define CPU_FTRS_COMPATIBLE (CPU_FTR_PPCAS_ARCH_V2)
-#ifdef __powerpc64__
+#ifdef CONFIG_PPC64
#ifdef CONFIG_PPC_BOOK3E
#define CPU_FTRS_POSSIBLE (CPU_FTRS_E6500 | CPU_FTRS_E5500)
#else
@@ -545,7 +545,7 @@ enum {
};
#endif /* __powerpc64__ */
-#ifdef __powerpc64__
+#ifdef CONFIG_PPC64
#ifdef CONFIG_PPC_BOOK3E
#define CPU_FTRS_ALWAYS (CPU_FTRS_E6500 & CPU_FTRS_E5500)
#else
--
2.25.0
^ permalink raw reply related
* [PATCH] powerpc/vdso: Fix VDSO unmap check
From: Laurent Dufour @ 2020-11-03 17:13 UTC (permalink / raw)
To: linuxppc-dev; +Cc: linux-kernel, Paul Mackerras, Thomas Gleixner
The check introduced by the commit 83d3f0e90c6c ("powerpc/mm: tracking vDSO
remap") is wrong and is missing some partial unmaps of the VDSO.
To be complete the check needs the base and end address of the
VDSO. Currently only the base is available in the mm_context of a task, but
the end address can easily be computed because the size of VDSO is
constant. However, there are 2 sizes for 32 or 64 bits task and they are
stored in static variables in arch/powerpc/kernel/vdso.c.
Exporting a new function called vdso_pages() to get the number of pages of
the VDSO based on the static variables from arch/powerpc/kernel/vdso.c.
Fixes: 83d3f0e90c6c ("powerpc/mm: tracking vDSO remap")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/mmu_context.h | 18 ++++++++++++++++--
arch/powerpc/kernel/vdso.c | 14 ++++++++++++++
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index e02aa793420b..ced80897b7a1 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -259,11 +259,25 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
extern void arch_exit_mmap(struct mm_struct *mm);
+extern int vdso_pages(void);
static inline void arch_unmap(struct mm_struct *mm,
unsigned long start, unsigned long end)
{
- if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
- mm->context.vdso_base = 0;
+ unsigned long vdso_end;
+
+ if (mm->context.vdso_base) {
+ /*
+ * case 1 > | VDSO | <
+ * case 2 > | < |
+ * case 3 | > < |
+ * case 4 | > | <
+ */
+ vdso_end = mm->context.vdso_base;
+ vdso_end += vdso_pages() << PAGE_SHIFT;
+
+ if (start < vdso_end && mm->context.vdso_base < end)
+ mm->context.vdso_base = 0;
+ }
}
#ifdef CONFIG_PPC_MEM_KEYS
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 8dad44262e75..9defa35a1eba 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -117,6 +117,20 @@ struct lib64_elfinfo
unsigned long text;
};
+/*
+ * Return the number of pages of the VDSO for the current task.
+ */
+int vdso_pages(void)
+{
+ int vdso_pages = vdso32_pages;
+
+#ifdef CONFIG_PPC64
+ if (!is_32bit_task())
+ vdso_pages = vdso64_pages;
+#endif
+
+ return vdso_pages + 1; /* Add the data page */
+}
/*
* This is called from binfmt_elf, we create the special vma for the
--
2.29.2
^ permalink raw reply related
* Re: [PATCH] x86/mpx: fix recursive munmap() corruption
From: Laurent Dufour @ 2020-11-03 17:11 UTC (permalink / raw)
To: Christophe Leroy, Michael Ellerman
Cc: mhocko, rguenther, linux-mm, Dave Hansen, x86, stable, LKML,
Dave Hansen, Thomas Gleixner, luto, linuxppc-dev, Andrew Morton,
vbabka
In-Reply-To: <12313ba8-75b5-d44d-dbc0-0bf2c87dfb59@csgroup.eu>
Le 23/10/2020 à 14:28, Christophe Leroy a écrit :
> Hi Laurent
>
> Le 07/05/2019 à 18:35, Laurent Dufour a écrit :
>> Le 01/05/2019 à 12:32, Michael Ellerman a écrit :
>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
>>>> Le 23/04/2019 à 18:04, Dave Hansen a écrit :
>>>>> On 4/23/19 4:16 AM, Laurent Dufour wrote:
>>> ...
>>>>>> There are 2 assumptions here:
>>>>>> 1. 'start' and 'end' are page aligned (this is guaranteed by
>>>>>> __do_munmap().
>>>>>> 2. the VDSO is 1 page (this is guaranteed by the union vdso_data_store
>>>>>> on powerpc)
>>>>>
>>>>> Are you sure about #2? The 'vdso64_pages' variable seems rather
>>>>> unnecessary if the VDSO is only 1 page. ;)
>>>>
>>>> Hum, not so sure now ;)
>>>> I got confused, only the header is one page.
>>>> The test is working as a best effort, and don't cover the case where
>>>> only few pages inside the VDSO are unmmapped (start >
>>>> mm->context.vdso_base). This is not what CRIU is doing and so this was
>>>> enough for CRIU support.
>>>>
>>>> Michael, do you think there is a need to manage all the possibility
>>>> here, since the only user is CRIU and unmapping the VDSO is not a so
>>>> good idea for other processes ?
>>>
>>> Couldn't we implement the semantic that if any part of the VDSO is
>>> unmapped then vdso_base is set to zero? That should be fairly easy, eg:
>>>
>>> if (start < vdso_end && end >= mm->context.vdso_base)
>>> mm->context.vdso_base = 0;
>>>
>>>
>>> We might need to add vdso_end to the mm->context, but that should be OK.
>>>
>>> That seems like it would work for CRIU and make sense in general?
>>
>> Sorry for the late answer, yes this would make more sense.
>>
>> Here is a patch doing that.
>>
>
> In your patch, the test seems overkill:
>
> + if ((start <= vdso_base && vdso_end <= end) || /* 1 */
> + (vdso_base <= start && start < vdso_end) || /* 3,4 */
> + (vdso_base < end && end <= vdso_end)) /* 2,3 */
> + mm->context.vdso_base = mm->context.vdso_end = 0;
>
> What about
>
> if (start < vdso_end && vdso_start < end)
> mm->context.vdso_base = mm->context.vdso_end = 0;
>
> This should cover all cases, or am I missing something ?
>
>
> And do we really need to store vdso_end in the context ?
> I think it should be possible to re-calculate it: the size of the VDSO should be
> (&vdso32_end - &vdso32_start) + PAGE_SIZE for 32 bits VDSO, and (&vdso64_end -
> &vdso64_start) + PAGE_SIZE for the 64 bits VDSO.
Thanks Christophe for the advise.
That is covering all the cases, and indeed is similar to the Michael's proposal
I missed last year.
I'll send a patch fixing this issue following your proposal.
Cheers,
Laurent.
^ permalink raw reply
* Re: [PATCH] powerpc: Don't use asm goto for put_user() with GCC 4.9
From: Christophe Leroy @ 2020-11-03 17:04 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: schwab
In-Reply-To: <4fe837f8-ecae-f009-c193-8da386a70705@csgroup.eu>
Quoting Christophe Leroy <christophe.leroy@csgroup.eu>:
> Le 03/11/2020 à 14:29, Michael Ellerman a écrit :
>> Andreas reported that commit ee0a49a6870e ("powerpc/uaccess: Switch
>> __put_user_size_allowed() to __put_user_asm_goto()") broke
>> CLONE_CHILD_SETTID.
>>
>> Further inspection showed that the put_user() in schedule_tail() was
>> missing entirely, the store not emitted by the compiler.
>>
>
>>
>> Notice there are no stores other than to the stack. There should be a
>> stw in there for the store to current->set_child_tid.
>>
>> This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and
>> 4.9.4), and only when CONFIG_PPC_KUAP is disabled.
>>
>> When CONFIG_PPC_KUAP=y, the memory clobber that's part of the isync()
>> and mtspr() inlined via allow_user_access() seems to be enough to
>> avoid the bug.
>>
>> For now though let's just not use asm goto with GCC 4.9, to avoid this
>> bug and any other issues we haven't noticed yet. Possibly in future we
>> can find a smaller workaround.
>
> Is that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 ?
>
> Should we use asm_volatile_goto() defined in include/linux/compiler-gcc.h ?
It seems to be OK with asm_volatile_goto() with GCC 4.9, and it is
already what is used in our asm/jump_label.h
diff --git a/arch/powerpc/include/asm/uaccess.h
b/arch/powerpc/include/asm/uaccess.h
index ef5bbb705c08..501c9a79038c 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -178,7 +178,7 @@ do { \
* are no aliasing issues.
*/
#define __put_user_asm_goto(x, addr, label, op) \
- asm volatile goto( \
+ asm_volatile_goto( \
"1: " op "%U1%X1 %0,%1 # put_user\n" \
EX_TABLE(1b, %l2) \
: \
@@ -191,7 +191,7 @@ do { \
__put_user_asm_goto(x, ptr, label, "std")
#else /* __powerpc64__ */
#define __put_user_asm2_goto(x, addr, label) \
- asm volatile goto( \
+ asm_volatile_goto( \
"1: stw%X1 %0, %1\n" \
"2: stw%X1 %L0, %L1\n" \
EX_TABLE(1b, %l2) \
---
Christophe
^ permalink raw reply related
* [PATCH v4 4/4] arch, mm: make kernel_page_present() always available
From: Mike Rapoport @ 2020-11-03 16:20 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Peter Zijlstra, Dave Hansen, linux-mm,
Paul Mackerras, Pavel Machek, H. Peter Anvin, sparclinux,
Christoph Lameter, Will Deacon, linux-riscv, linux-s390, x86,
Mike Rapoport, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Len Brown, Albert Ou, Vasily Gorbik, linux-pm,
Heiko Carstens, David Rientjes, Borislav Petkov, Andy Lutomirski,
Paul Walmsley, Kirill A. Shutemov, Thomas Gleixner,
linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
Palmer Dabbelt, Kirill A . Shutemov, Joonsoo Kim,
Edgecombe, Rick P, linuxppc-dev, David S. Miller, Mike Rapoport
In-Reply-To: <20201103162057.22916-1-rppt@kernel.org>
From: Mike Rapoport <rppt@linux.ibm.com>
For architectures that enable ARCH_HAS_SET_MEMORY having the ability to
verify that a page is mapped in the kernel direct map can be useful
regardless of hibernation.
Add RISC-V implementation of kernel_page_present(), update its forward
declarations and stubs to be a part of set_memory API and remove ugly
ifdefery in inlcude/linux/mm.h around current declarations of
kernel_page_present().
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/pageattr.c | 4 +---
arch/riscv/include/asm/set_memory.h | 1 +
arch/riscv/mm/pageattr.c | 29 +++++++++++++++++++++++++++++
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 4 +---
include/linux/mm.h | 7 -------
include/linux/set_memory.h | 5 +++++
8 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 9384fd8fc13c..45217f21f1fe 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -140,6 +140,7 @@ int set_memory_valid(unsigned long addr, int numpages, int enable);
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+bool kernel_page_present(struct page *page);
#include <asm-generic/cacheflush.h>
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 439325532be1..92eccaf595c8 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -186,8 +186,8 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
set_memory_valid((unsigned long)page_address(page), numpages, enable);
}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
-#ifdef CONFIG_HIBERNATION
/*
* This function is used to determine if a linear map page has been marked as
* not-valid. Walk the page table and check the PTE_VALID bit. This is based
@@ -234,5 +234,3 @@ bool kernel_page_present(struct page *page)
ptep = pte_offset_kernel(pmdp, addr);
return pte_valid(READ_ONCE(*ptep));
}
-#endif /* CONFIG_HIBERNATION */
-#endif /* CONFIG_DEBUG_PAGEALLOC */
diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h
index 4c5bae7ca01c..d690b08dff2a 100644
--- a/arch/riscv/include/asm/set_memory.h
+++ b/arch/riscv/include/asm/set_memory.h
@@ -24,6 +24,7 @@ static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+bool kernel_page_present(struct page *page);
#endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 321b09d2e2ea..87ba5a68bbb8 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -198,3 +198,32 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
__pgprot(0), __pgprot(_PAGE_PRESENT));
}
#endif
+
+bool kernel_page_present(struct page *page)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgd_t *pgd;
+ pud_t *pud;
+ p4d_t *p4d;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ pgd = pgd_offset_k(addr);
+ if (!pgd_present(*pgd))
+ return false;
+
+ p4d = p4d_offset(pgd, addr);
+ if (!p4d_present(*p4d))
+ return false;
+
+ pud = pud_offset(p4d, addr);
+ if (!pud_present(*pud))
+ return false;
+
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_present(*pmd))
+ return false;
+
+ pte = pte_offset_kernel(pmd, addr);
+ return pte_present(*pte);
+}
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 5948218f35c5..4352f08bfbb5 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -82,6 +82,7 @@ int set_pages_rw(struct page *page, int numpages);
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
+bool kernel_page_present(struct page *page);
extern int kernel_set_to_readonly;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bc9be96b777f..16f878c26667 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2226,8 +2226,8 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
arch_flush_lazy_mmu_mode();
}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
-#ifdef CONFIG_HIBERNATION
bool kernel_page_present(struct page *page)
{
unsigned int level;
@@ -2239,8 +2239,6 @@ bool kernel_page_present(struct page *page)
pte = lookup_address((unsigned long)page_address(page), &level);
return (pte_val(*pte) & _PAGE_PRESENT);
}
-#endif /* CONFIG_HIBERNATION */
-#endif /* CONFIG_DEBUG_PAGEALLOC */
int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
unsigned numpages, unsigned long page_flags)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ab0ef6bd351d..44b82f22e76a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2937,16 +2937,9 @@ static inline void debug_pagealloc_map_pages(struct page *page,
if (debug_pagealloc_enabled_static())
__kernel_map_pages(page, numpages, enable);
}
-
-#ifdef CONFIG_HIBERNATION
-extern bool kernel_page_present(struct page *page);
-#endif /* CONFIG_HIBERNATION */
#else /* CONFIG_DEBUG_PAGEALLOC */
static inline void debug_pagealloc_map_pages(struct page *page,
int numpages, int enable) {}
-#ifdef CONFIG_HIBERNATION
-static inline bool kernel_page_present(struct page *page) { return true; }
-#endif /* CONFIG_HIBERNATION */
#endif /* CONFIG_DEBUG_PAGEALLOC */
#ifdef __HAVE_ARCH_GATE_AREA
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 860e0f843c12..fe1aa4e54680 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -23,6 +23,11 @@ static inline int set_direct_map_default_noflush(struct page *page)
{
return 0;
}
+
+static inline bool kernel_page_present(struct page *page)
+{
+ return true;
+}
#endif
#ifndef set_mce_nospec
--
2.28.0
^ permalink raw reply related
* [PATCH v4 3/4] arch, mm: restore dependency of __kernel_map_pages() of DEBUG_PAGEALLOC
From: Mike Rapoport @ 2020-11-03 16:20 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Peter Zijlstra, Dave Hansen, linux-mm,
Paul Mackerras, Pavel Machek, H. Peter Anvin, sparclinux,
Christoph Lameter, Will Deacon, linux-riscv, linux-s390, x86,
Mike Rapoport, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Len Brown, Albert Ou, Vasily Gorbik, linux-pm,
Heiko Carstens, David Rientjes, Borislav Petkov, Andy Lutomirski,
Paul Walmsley, Kirill A. Shutemov, Thomas Gleixner,
linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
Palmer Dabbelt, Kirill A . Shutemov, Joonsoo Kim,
Edgecombe, Rick P, linuxppc-dev, David S. Miller, Mike Rapoport
In-Reply-To: <20201103162057.22916-1-rppt@kernel.org>
From: Mike Rapoport <rppt@linux.ibm.com>
The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must never
fail. With this assumption is wouldn't be safe to allow general usage of
this function.
Moreover, some architectures that implement __kernel_map_pages() have this
function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
pages when page allocation debugging is disabled at runtime.
As all the users of __kernel_map_pages() were converted to use
debug_pagealloc_map_pages() it is safe to make it available only when
DEBUG_PAGEALLOC is set.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/Kconfig | 3 +++
arch/arm64/Kconfig | 4 +---
arch/arm64/mm/pageattr.c | 8 ++++++--
arch/powerpc/Kconfig | 5 +----
arch/riscv/Kconfig | 4 +---
arch/riscv/include/asm/pgtable.h | 2 --
arch/riscv/mm/pageattr.c | 2 ++
arch/s390/Kconfig | 4 +---
arch/sparc/Kconfig | 4 +---
arch/x86/Kconfig | 4 +---
arch/x86/mm/pat/set_memory.c | 2 ++
include/linux/mm.h | 10 +++++++---
12 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 56b6ccc0e32d..56d4752b6db6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1028,6 +1028,9 @@ config HAVE_STATIC_CALL_INLINE
bool
depends on HAVE_STATIC_CALL
+config ARCH_SUPPORTS_DEBUG_PAGEALLOC
+ bool
+
source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1d466addb078..a932810cfd90 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -71,6 +71,7 @@ config ARM64
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
+ select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
select ARCH_SUPPORTS_ATOMIC_RMW
@@ -1025,9 +1026,6 @@ config HOLES_IN_ZONE
source "kernel/Kconfig.hz"
-config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- def_bool y
-
config ARCH_SPARSEMEM_ENABLE
def_bool y
select SPARSEMEM_VMEMMAP_ENABLE
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 1b94f5b82654..439325532be1 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -155,7 +155,7 @@ int set_direct_map_invalid_noflush(struct page *page)
.clear_mask = __pgprot(PTE_VALID),
};
- if (!rodata_full)
+ if (!debug_pagealloc_enabled() && !rodata_full)
return 0;
return apply_to_page_range(&init_mm,
@@ -170,7 +170,7 @@ int set_direct_map_default_noflush(struct page *page)
.clear_mask = __pgprot(PTE_RDONLY),
};
- if (!rodata_full)
+ if (!debug_pagealloc_enabled() && !rodata_full)
return 0;
return apply_to_page_range(&init_mm,
@@ -178,6 +178,7 @@ int set_direct_map_default_noflush(struct page *page)
PAGE_SIZE, change_page_range, &data);
}
+#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled() && !rodata_full)
@@ -186,6 +187,7 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
set_memory_valid((unsigned long)page_address(page), numpages, enable);
}
+#ifdef CONFIG_HIBERNATION
/*
* This function is used to determine if a linear map page has been marked as
* not-valid. Walk the page table and check the PTE_VALID bit. This is based
@@ -232,3 +234,5 @@ bool kernel_page_present(struct page *page)
ptep = pte_offset_kernel(pmdp, addr);
return pte_valid(READ_ONCE(*ptep));
}
+#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_DEBUG_PAGEALLOC */
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..ad8a83f3ddca 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -146,6 +146,7 @@ config PPC
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_DEBUG_PAGEALLOC if PPC32 || PPC_BOOK3S_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_QUEUED_RWLOCKS if PPC_QUEUED_SPINLOCKS
@@ -355,10 +356,6 @@ config PPC_OF_PLATFORM_PCI
depends on PCI
depends on PPC64 # not supported on 32 bits yet
-config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- depends on PPC32 || PPC_BOOK3S_64
- def_bool y
-
config ARCH_SUPPORTS_UPROBES
def_bool y
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 44377fd7860e..9283c6f9ae2a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -14,6 +14,7 @@ config RISCV
def_bool y
select ARCH_CLOCKSOURCE_INIT
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_DEBUG_PAGEALLOC if MMU
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_VIRTUAL if MMU
@@ -153,9 +154,6 @@ config ARCH_SELECT_MEMORY_MODEL
config ARCH_WANT_GENERAL_HUGETLB
def_bool y
-config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- def_bool y
-
config SYS_SUPPORTS_HUGETLBFS
depends on MMU
def_bool y
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 183f1f4b2ae6..41a72861987c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -461,8 +461,6 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
#define VMALLOC_START 0
#define VMALLOC_END TASK_SIZE
-static inline void __kernel_map_pages(struct page *page, int numpages, int enable) {}
-
#endif /* !CONFIG_MMU */
#define kern_addr_valid(addr) (1) /* FIXME */
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 19fecb362d81..321b09d2e2ea 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -184,6 +184,7 @@ int set_direct_map_default_noflush(struct page *page)
return ret;
}
+#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled())
@@ -196,3 +197,4 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
__set_memory((unsigned long)page_address(page), numpages,
__pgprot(0), __pgprot(_PAGE_PRESENT));
}
+#endif
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 4a2a12be04c9..991a850a6c0b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -35,9 +35,6 @@ config GENERIC_LOCKBREAK
config PGSTE
def_bool y if KVM
-config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- def_bool y
-
config AUDIT_ARCH
def_bool y
@@ -106,6 +103,7 @@ config S390
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a6ca135442f9..2c729b8d097a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -88,6 +88,7 @@ config SPARC64
select HAVE_C_RECORDMCOUNT
select HAVE_ARCH_AUDITSYSCALL
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select HAVE_NMI
select HAVE_REGS_AND_STACK_ACCESS_API
select ARCH_USE_QUEUED_RWLOCKS
@@ -148,9 +149,6 @@ config GENERIC_ISA_DMA
bool
default y if SPARC32
-config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- def_bool y if SPARC64
-
config PGTABLE_LEVELS
default 4 if 64BIT
default 3
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b81f74a..0db3fb1da70c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config X86
select ARCH_STACKWALK
select ARCH_SUPPORTS_ACPI
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
@@ -329,9 +330,6 @@ config ZONE_DMA32
config AUDIT_ARCH
def_bool y if X86_64
-config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- def_bool y
-
config KASAN_SHADOW_OFFSET
hex
depends on KASAN
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 40baa90e74f4..bc9be96b777f 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2194,6 +2194,7 @@ int set_direct_map_default_noflush(struct page *page)
return __set_pages_p(page, 1);
}
+#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (PageHighMem(page))
@@ -2239,6 +2240,7 @@ bool kernel_page_present(struct page *page)
return (pte_val(*pte) & _PAGE_PRESENT);
}
#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_DEBUG_PAGEALLOC */
int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
unsigned numpages, unsigned long page_flags)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 14e397f3752c..ab0ef6bd351d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2924,7 +2924,11 @@ static inline bool debug_pagealloc_enabled_static(void)
return static_branch_unlikely(&_debug_pagealloc_enabled);
}
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP)
+#ifdef CONFIG_DEBUG_PAGEALLOC
+/*
+ * To support DEBUG_PAGEALLOC architecture must ensure that
+ * __kernel_map_pages() never fails
+ */
extern void __kernel_map_pages(struct page *page, int numpages, int enable);
static inline void debug_pagealloc_map_pages(struct page *page,
@@ -2937,13 +2941,13 @@ static inline void debug_pagealloc_map_pages(struct page *page,
#ifdef CONFIG_HIBERNATION
extern bool kernel_page_present(struct page *page);
#endif /* CONFIG_HIBERNATION */
-#else /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
+#else /* CONFIG_DEBUG_PAGEALLOC */
static inline void debug_pagealloc_map_pages(struct page *page,
int numpages, int enable) {}
#ifdef CONFIG_HIBERNATION
static inline bool kernel_page_present(struct page *page) { return true; }
#endif /* CONFIG_HIBERNATION */
-#endif /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
+#endif /* CONFIG_DEBUG_PAGEALLOC */
#ifdef __HAVE_ARCH_GATE_AREA
extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
--
2.28.0
^ permalink raw reply related
* [PATCH v4 2/4] PM: hibernate: make direct map manipulations more explicit
From: Mike Rapoport @ 2020-11-03 16:20 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J . Wysocki, David Hildenbrand, Peter Zijlstra,
Dave Hansen, linux-mm, Paul Mackerras, Pavel Machek,
H. Peter Anvin, sparclinux, Christoph Lameter, Will Deacon,
linux-riscv, linux-s390, x86, Mike Rapoport,
Christian Borntraeger, Ingo Molnar, Catalin Marinas, Len Brown,
Albert Ou, Vasily Gorbik, linux-pm, Heiko Carstens,
David Rientjes, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
Kirill A. Shutemov, Thomas Gleixner, linux-arm-kernel,
Rafael J. Wysocki, linux-kernel, Pekka Enberg, Palmer Dabbelt,
Kirill A . Shutemov, Joonsoo Kim, Edgecombe, Rick P, linuxppc-dev,
David S. Miller, Mike Rapoport
In-Reply-To: <20201103162057.22916-1-rppt@kernel.org>
From: Mike Rapoport <rppt@linux.ibm.com>
When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a page may be
not present in the direct map and has to be explicitly mapped before it
could be copied.
Introduce hibernate_map_page() that will explicitly use
set_direct_map_{default,invalid}_noflush() for ARCH_HAS_SET_DIRECT_MAP case
and debug_pagealloc_map_pages() for DEBUG_PAGEALLOC case.
The remapping of the pages in safe_copy_page() presumes that it only
changes protection bits in an existing PTE and so it is safe to ignore
return value of set_direct_map_{default,invalid}_noflush().
Still, add a pr_warn() so that future changes in set_memory APIs will not
silently break hibernation.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm.h | 12 ------------
kernel/power/snapshot.c | 32 ++++++++++++++++++++++++++++++--
2 files changed, 30 insertions(+), 14 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1fc0609056dc..14e397f3752c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2927,16 +2927,6 @@ static inline bool debug_pagealloc_enabled_static(void)
#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP)
extern void __kernel_map_pages(struct page *page, int numpages, int enable);
-/*
- * When called in DEBUG_PAGEALLOC context, the call should most likely be
- * guarded by debug_pagealloc_enabled() or debug_pagealloc_enabled_static()
- */
-static inline void
-kernel_map_pages(struct page *page, int numpages, int enable)
-{
- __kernel_map_pages(page, numpages, enable);
-}
-
static inline void debug_pagealloc_map_pages(struct page *page,
int numpages, int enable)
{
@@ -2948,8 +2938,6 @@ static inline void debug_pagealloc_map_pages(struct page *page,
extern bool kernel_page_present(struct page *page);
#endif /* CONFIG_HIBERNATION */
#else /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
-static inline void
-kernel_map_pages(struct page *page, int numpages, int enable) {}
static inline void debug_pagealloc_map_pages(struct page *page,
int numpages, int enable) {}
#ifdef CONFIG_HIBERNATION
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 46b1804c1ddf..57d54b9d84bb 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -76,6 +76,34 @@ static inline void hibernate_restore_protect_page(void *page_address) {}
static inline void hibernate_restore_unprotect_page(void *page_address) {}
#endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_ARCH_HAS_SET_MEMORY */
+static inline void hibernate_map_page(struct page *page, int enable)
+{
+ if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
+ unsigned long addr = (unsigned long)page_address(page);
+ int ret;
+
+ /*
+ * This should not fail because remapping a page here means
+ * that we only update protection bits in an existing PTE.
+ * It is still worth to have a warning here if something
+ * changes and this will no longer be the case.
+ */
+ if (enable)
+ ret = set_direct_map_default_noflush(page);
+ else
+ ret = set_direct_map_invalid_noflush(page);
+
+ if (ret) {
+ pr_warn_once("Failed to remap page\n");
+ return;
+ }
+
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+ } else {
+ debug_pagealloc_map_pages(page, 1, enable);
+ }
+}
+
static int swsusp_page_is_free(struct page *);
static void swsusp_set_page_forbidden(struct page *);
static void swsusp_unset_page_forbidden(struct page *);
@@ -1355,9 +1383,9 @@ static void safe_copy_page(void *dst, struct page *s_page)
if (kernel_page_present(s_page)) {
do_copy_page(dst, page_address(s_page));
} else {
- kernel_map_pages(s_page, 1, 1);
+ hibernate_map_page(s_page, 1);
do_copy_page(dst, page_address(s_page));
- kernel_map_pages(s_page, 1, 0);
+ hibernate_map_page(s_page, 0);
}
}
--
2.28.0
^ permalink raw reply related
* [PATCH v4 1/4] mm: introduce debug_pagealloc_map_pages() helper
From: Mike Rapoport @ 2020-11-03 16:20 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Peter Zijlstra, Dave Hansen, linux-mm,
Paul Mackerras, Pavel Machek, H. Peter Anvin, sparclinux,
Christoph Lameter, Will Deacon, linux-riscv, linux-s390, x86,
Mike Rapoport, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Len Brown, Albert Ou, Vasily Gorbik, linux-pm,
Heiko Carstens, David Rientjes, Borislav Petkov, Andy Lutomirski,
Paul Walmsley, Kirill A. Shutemov, Thomas Gleixner,
linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
Palmer Dabbelt, Kirill A . Shutemov, Joonsoo Kim,
Edgecombe, Rick P, linuxppc-dev, David S. Miller, Mike Rapoport
In-Reply-To: <20201103162057.22916-1-rppt@kernel.org>
From: Mike Rapoport <rppt@linux.ibm.com>
When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel
direct mapping after free_pages(). The pages than need to be mapped back
before they could be used. Theese mapping operations use
__kernel_map_pages() guarded with with debug_pagealloc_enabled().
The only place that calls __kernel_map_pages() without checking whether
DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
availability of this function when ARCH_HAS_SET_DIRECT_MAP is set.
Still, on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is
not enabled but set_direct_map_invalid_noflush() may render some pages not
present in the direct map and hibernation code won't be able to save such
pages.
To make page allocation debugging and hibernation interaction more robust,
the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be made
more explicit.
Start with combining the guard condition and the call to
__kernel_map_pages() into a single debug_pagealloc_map_pages() function to
emphasize that __kernel_map_pages() should not be called without
DEBUG_PAGEALLOC and use this new function to map/unmap pages when page
allocation debug is enabled.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm.h | 10 ++++++++++
mm/memory_hotplug.c | 3 +--
mm/page_alloc.c | 6 ++----
mm/slab.c | 8 +++-----
4 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe70aaf..1fc0609056dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2936,12 +2936,22 @@ kernel_map_pages(struct page *page, int numpages, int enable)
{
__kernel_map_pages(page, numpages, enable);
}
+
+static inline void debug_pagealloc_map_pages(struct page *page,
+ int numpages, int enable)
+{
+ if (debug_pagealloc_enabled_static())
+ __kernel_map_pages(page, numpages, enable);
+}
+
#ifdef CONFIG_HIBERNATION
extern bool kernel_page_present(struct page *page);
#endif /* CONFIG_HIBERNATION */
#else /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
static inline void
kernel_map_pages(struct page *page, int numpages, int enable) {}
+static inline void debug_pagealloc_map_pages(struct page *page,
+ int numpages, int enable) {}
#ifdef CONFIG_HIBERNATION
static inline bool kernel_page_present(struct page *page) { return true; }
#endif /* CONFIG_HIBERNATION */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b44d4c7ba73b..e2b6043a4428 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -614,8 +614,7 @@ void generic_online_page(struct page *page, unsigned int order)
* so we should map it first. This is better than introducing a special
* case in page freeing fast path.
*/
- if (debug_pagealloc_enabled_static())
- kernel_map_pages(page, 1 << order, 1);
+ debug_pagealloc_map_pages(page, 1 << order, 1);
__free_pages_core(page, order);
totalram_pages_add(1UL << order);
#ifdef CONFIG_HIGHMEM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 23f5066bd4a5..9a66a1ff9193 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1272,8 +1272,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
*/
arch_free_page(page, order);
- if (debug_pagealloc_enabled_static())
- kernel_map_pages(page, 1 << order, 0);
+ debug_pagealloc_map_pages(page, 1 << order, 0);
kasan_free_nondeferred_pages(page, order);
@@ -2270,8 +2269,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_refcounted(page);
arch_alloc_page(page, order);
- if (debug_pagealloc_enabled_static())
- kernel_map_pages(page, 1 << order, 1);
+ debug_pagealloc_map_pages(page, 1 << order, 1);
kasan_alloc_pages(page, order);
kernel_poison_pages(page, 1 << order, 1);
set_page_owner(page, order, gfp_flags);
diff --git a/mm/slab.c b/mm/slab.c
index b1113561b98b..340db0ce74c4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1431,10 +1431,8 @@ static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
#ifdef CONFIG_DEBUG_PAGEALLOC
static void slab_kernel_map(struct kmem_cache *cachep, void *objp, int map)
{
- if (!is_debug_pagealloc_cache(cachep))
- return;
-
- kernel_map_pages(virt_to_page(objp), cachep->size / PAGE_SIZE, map);
+ debug_pagealloc_map_pages(virt_to_page(objp),
+ cachep->size / PAGE_SIZE, map);
}
#else
@@ -2062,7 +2060,7 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
#if DEBUG
/*
- * If we're going to use the generic kernel_map_pages()
+ * If we're going to use the generic debug_pagealloc_map_pages()
* poisoning, then it's going to smash the contents of
* the redzone and userword anyhow, so switch them off.
*/
--
2.28.0
^ permalink raw reply related
* [PATCH v4 0/4] arch, mm: improve robustness of direct map manipulation
From: Mike Rapoport @ 2020-11-03 16:20 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Peter Zijlstra, Dave Hansen, linux-mm,
Paul Mackerras, Pavel Machek, H. Peter Anvin, sparclinux,
Christoph Lameter, Will Deacon, linux-riscv, linux-s390, x86,
Mike Rapoport, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Len Brown, Albert Ou, Vasily Gorbik, linux-pm,
Heiko Carstens, David Rientjes, Borislav Petkov, Andy Lutomirski,
Paul Walmsley, Kirill A. Shutemov, Thomas Gleixner,
linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
Palmer Dabbelt, Kirill A . Shutemov, Joonsoo Kim,
Edgecombe, Rick P, linuxppc-dev, David S. Miller, Mike Rapoport
From: Mike Rapoport <rppt@linux.ibm.com>
Hi,
During recent discussion about KVM protected memory, David raised a concern
about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC scope [1].
Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
possible that __kernel_map_pages() would fail, but since this function is
void, the failure will go unnoticed.
Moreover, there's lack of consistency of __kernel_map_pages() semantics
across architectures as some guard this function with
#ifdef DEBUG_PAGEALLOC, some refuse to update the direct map if page
allocation debugging is disabled at run time and some allow modifying the
direct map regardless of DEBUG_PAGEALLOC settings.
This set straightens this out by restoring dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.
Since currently the only user of __kernel_map_pages() outside
DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
there more explicit.
[1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com
v4 changes:
* s/WARN_ON/pr_warn_once/ per David and Kirill
* rebase on v5.10-rc2
* add Acked/Reviewed tags
v3 changes:
* update arm64 changes to avoid regression, per Rick's comments
* fix bisectability
https://lore.kernel.org/lkml/20201101170815.9795-1-rppt@kernel.org
v2 changes:
* Rephrase patch 2 changelog to better describe the change intentions and
implications
* Move removal of kernel_map_pages() from patch 1 to patch 2, per David
https://lore.kernel.org/lkml/20201029161902.19272-1-rppt@kernel.org
v1:
https://lore.kernel.org/lkml/20201025101555.3057-1-rppt@kernel.org
Mike Rapoport (4):
mm: introduce debug_pagealloc_map_pages() helper
PM: hibernate: make direct map manipulations more explicit
arch, mm: restore dependency of __kernel_map_pages() of DEBUG_PAGEALLOC
arch, mm: make kernel_page_present() always available
arch/Kconfig | 3 +++
arch/arm64/Kconfig | 4 +---
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/pageattr.c | 6 +++--
arch/powerpc/Kconfig | 5 +----
arch/riscv/Kconfig | 4 +---
arch/riscv/include/asm/pgtable.h | 2 --
arch/riscv/include/asm/set_memory.h | 1 +
arch/riscv/mm/pageattr.c | 31 +++++++++++++++++++++++++
arch/s390/Kconfig | 4 +---
arch/sparc/Kconfig | 4 +---
arch/x86/Kconfig | 4 +---
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 4 ++--
include/linux/mm.h | 35 +++++++++++++----------------
include/linux/set_memory.h | 5 +++++
kernel/power/snapshot.c | 30 +++++++++++++++++++++++--
mm/memory_hotplug.c | 3 +--
mm/page_alloc.c | 6 ++---
mm/slab.c | 8 +++----
20 files changed, 103 insertions(+), 58 deletions(-)
--
2.28.0
*** BLURB HERE ***
Mike Rapoport (4):
mm: introduce debug_pagealloc_map_pages() helper
PM: hibernate: make direct map manipulations more explicit
arch, mm: restore dependency of __kernel_map_pages() of
DEBUG_PAGEALLOC
arch, mm: make kernel_page_present() always available
arch/Kconfig | 3 +++
arch/arm64/Kconfig | 4 +---
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/pageattr.c | 6 +++--
arch/powerpc/Kconfig | 5 +----
arch/riscv/Kconfig | 4 +---
arch/riscv/include/asm/pgtable.h | 2 --
arch/riscv/include/asm/set_memory.h | 1 +
arch/riscv/mm/pageattr.c | 31 +++++++++++++++++++++++++
arch/s390/Kconfig | 4 +---
arch/sparc/Kconfig | 4 +---
arch/x86/Kconfig | 4 +---
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 4 ++--
include/linux/mm.h | 35 +++++++++++++----------------
include/linux/set_memory.h | 5 +++++
kernel/power/snapshot.c | 32 ++++++++++++++++++++++++--
mm/memory_hotplug.c | 3 +--
mm/page_alloc.c | 6 ++---
mm/slab.c | 8 +++----
20 files changed, 105 insertions(+), 58 deletions(-)
base-commit: 3cea11cd5e3b00d91caf0b4730194039b45c5891
--
2.28.0
^ permalink raw reply
* Re: [PATCH 11/11 v2.2] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-03 16:14 UTC (permalink / raw)
To: Petr Mladek
Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201103141043.GO20201@alley>
On Tue, 3 Nov 2020 15:10:43 +0100
Petr Mladek <pmladek@suse.com> wrote:
> BTW: What is actually the purpose of paranoid_test, please?
>
> It prevents nested ftrace_record_recursion() calls on the same CPU
> (recursion, nesting from IRQ, NMI context).
>
> Parallel calls from different CPUs are still possible:
>
> CPU0 CPU1
> if (!atomic_read(¶noid_test)) if (!atomic_read(¶noid_test))
> // passes // passes
> atomic_inc(¶noid_test); atomic_inc(¶noid_test);
>
>
> I do not see how a nested call could cause crash while a parallel
> one would be OK.
>
Yeah, I should make that per cpu, but was lazy. ;-)
It was added at the end.
I'll update that to a per cpu, and local inc operations.
-- Steve
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox