* Re: [PATCH 05/11] test_bitmap: skip user bitmap tests for !CONFIG_SET_FS
From: Christophe Leroy @ 2020-08-17 7:50 UTC (permalink / raw)
To: Christoph Hellwig, Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-6-hch@lst.de>
Le 17/08/2020 à 09:32, Christoph Hellwig a écrit :
> We can't run the tests for userspace bitmap parsing if set_fs() doesn't
> exist.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> lib/test_bitmap.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
> index df903c53952bb9..49b1d25fbaf546 100644
> --- a/lib/test_bitmap.c
> +++ b/lib/test_bitmap.c
> @@ -365,6 +365,7 @@ static void __init __test_bitmap_parselist(int is_user)
> for (i = 0; i < ARRAY_SIZE(parselist_tests); i++) {
> #define ptest parselist_tests[i]
>
> +#ifdef CONFIG_SET_FS
get_fs() and set_fs() have stubs for when an arch doesn't define them,
so I this it would be cleaner if you were using 'if
(IS_ENABLED(CONFIG_SET_FS) && is_user)`instead of an ifdefery in the
middle of the if/else.
Christophe
> if (is_user) {
> mm_segment_t orig_fs = get_fs();
> size_t len = strlen(ptest.in);
> @@ -375,7 +376,9 @@ static void __init __test_bitmap_parselist(int is_user)
> bmap, ptest.nbits);
> time = ktime_get() - time;
> set_fs(orig_fs);
> - } else {
> + } else
> +#endif /* CONFIG_SET_FS */
> + {
> time = ktime_get();
> err = bitmap_parselist(ptest.in, bmap, ptest.nbits);
> time = ktime_get() - time;
> @@ -454,6 +457,7 @@ static void __init __test_bitmap_parse(int is_user)
> for (i = 0; i < ARRAY_SIZE(parse_tests); i++) {
> struct test_bitmap_parselist test = parse_tests[i];
>
> +#ifdef CONFIG_SET_FS
> if (is_user) {
> size_t len = strlen(test.in);
> mm_segment_t orig_fs = get_fs();
> @@ -464,7 +468,9 @@ static void __init __test_bitmap_parse(int is_user)
> bmap, test.nbits);
> time = ktime_get() - time;
> set_fs(orig_fs);
> - } else {
> + } else
> +#endif /* CONFIG_SET_FS */
> + {
> size_t len = test.flags & NO_LEN ?
> UINT_MAX : strlen(test.in);
> time = ktime_get();
>
^ permalink raw reply
* Re: remove the last set_fs() in common code, and remove it for x86 and powerpc
From: Christoph Hellwig @ 2020-08-17 7:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-arch, Kees Cook, x86, linux-kernel, Al Viro, linux-fsdevel,
linuxppc-dev
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Adding Linus as I forgot to add him to the patch bomb, sorry..
On Mon, Aug 17, 2020 at 09:32:01AM +0200, Christoph Hellwig wrote:
> Hi all,
>
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
>
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanced. For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them. The reasons for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it. All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
>
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged. The process to convert
> architectures is roughtly:
>
> - ensure there is no set_fs(KERNEL_DS) left in arch specific code
> - implement __get_kernel_nofault and __put_kernel_nofault
> - remove the arch specific address limitation functionality
>
> Diffstat:
> arch/Kconfig | 3
> arch/alpha/Kconfig | 1
> arch/arc/Kconfig | 1
> arch/arm/Kconfig | 1
> arch/arm64/Kconfig | 1
> arch/c6x/Kconfig | 1
> arch/csky/Kconfig | 1
> arch/h8300/Kconfig | 1
> arch/hexagon/Kconfig | 1
> arch/ia64/Kconfig | 1
> arch/m68k/Kconfig | 1
> arch/microblaze/Kconfig | 1
> arch/mips/Kconfig | 1
> arch/nds32/Kconfig | 1
> arch/nios2/Kconfig | 1
> arch/openrisc/Kconfig | 1
> arch/parisc/Kconfig | 1
> arch/powerpc/include/asm/processor.h | 7 -
> arch/powerpc/include/asm/thread_info.h | 5 -
> arch/powerpc/include/asm/uaccess.h | 78 ++++++++-----------
> arch/powerpc/kernel/signal.c | 3
> arch/powerpc/lib/sstep.c | 6 -
> arch/riscv/Kconfig | 1
> arch/s390/Kconfig | 1
> arch/sh/Kconfig | 1
> arch/sparc/Kconfig | 1
> arch/um/Kconfig | 1
> arch/x86/ia32/ia32_aout.c | 1
> arch/x86/include/asm/page_32_types.h | 11 ++
> arch/x86/include/asm/page_64_types.h | 38 +++++++++
> arch/x86/include/asm/processor.h | 60 ---------------
> arch/x86/include/asm/thread_info.h | 2
> arch/x86/include/asm/uaccess.h | 26 ------
> arch/x86/kernel/asm-offsets.c | 3
> arch/x86/lib/getuser.S | 28 ++++---
> arch/x86/lib/putuser.S | 21 +++--
> arch/xtensa/Kconfig | 1
> drivers/char/mem.c | 16 ----
> drivers/misc/lkdtm/bugs.c | 2
> drivers/misc/lkdtm/core.c | 4 +
> drivers/misc/lkdtm/usercopy.c | 2
> fs/read_write.c | 69 ++++++++++-------
> fs/splice.c | 130 +++------------------------------
> include/linux/fs.h | 2
> include/linux/uaccess.h | 18 ++++
> lib/test_bitmap.c | 10 ++
> 46 files changed, 235 insertions(+), 332 deletions(-)
---end quoted text---
^ permalink raw reply
* [PATCH 11/11] powerpc: remove address space overrides using set_fs()
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/powerpc/Kconfig | 1 -
arch/powerpc/include/asm/processor.h | 7 ---
arch/powerpc/include/asm/thread_info.h | 5 +--
arch/powerpc/include/asm/uaccess.h | 62 ++++++++------------------
arch/powerpc/kernel/signal.c | 3 --
arch/powerpc/lib/sstep.c | 6 +--
6 files changed, 22 insertions(+), 62 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3f09d6fdf89405..1f48bbfb3ce99d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -249,7 +249,6 @@ config PPC
select PCI_SYSCALL if PCI
select PPC_DAWR if PPC64
select RTC_LIB
- select SET_FS
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa42..f01e4d650c520a 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -83,10 +83,6 @@ struct task_struct;
void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
void release_thread(struct task_struct *);
-typedef struct {
- unsigned long seg;
-} mm_segment_t;
-
#define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET]
#define TS_CKFPR(i) ckfp_state.fpr[i][TS_FPROFFSET]
@@ -148,7 +144,6 @@ struct thread_struct {
unsigned long ksp_vsid;
#endif
struct pt_regs *regs; /* Pointer to saved register state */
- mm_segment_t addr_limit; /* for get_fs() validation */
#ifdef CONFIG_BOOKE
/* BookE base exception scratch space; align on cacheline */
unsigned long normsave[8] ____cacheline_aligned;
@@ -295,7 +290,6 @@ struct thread_struct {
#define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
- .addr_limit = KERNEL_DS, \
.pgdir = swapper_pg_dir, \
.fpexc_mode = MSR_FE0 | MSR_FE1, \
SPEFSCR_INIT \
@@ -303,7 +297,6 @@ struct thread_struct {
#else
#define INIT_THREAD { \
.ksp = INIT_SP, \
- .addr_limit = KERNEL_DS, \
.fpexc_mode = 0, \
}
#endif
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index ca6c9702570494..46a210b03d2b80 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -90,7 +90,6 @@ void arch_setup_new_exec(void);
#define TIF_SYSCALL_TRACE 0 /* syscall trace active */
#define TIF_SIGPENDING 1 /* signal pending */
#define TIF_NEED_RESCHED 2 /* rescheduling necessary */
-#define TIF_FSCHECK 3 /* Check FS is USER_DS on return */
#define TIF_SYSCALL_EMU 4 /* syscall emulation active */
#define TIF_RESTORE_TM 5 /* need to restore TM FP/VEC/VSX */
#define TIF_PATCH_PENDING 6 /* pending live patching update */
@@ -130,7 +129,6 @@ void arch_setup_new_exec(void);
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_EMULATE_STACK_STORE (1<<TIF_EMULATE_STACK_STORE)
#define _TIF_NOHZ (1<<TIF_NOHZ)
-#define _TIF_FSCHECK (1<<TIF_FSCHECK)
#define _TIF_SYSCALL_EMU (1<<TIF_SYSCALL_EMU)
#define _TIF_SYSCALL_DOTRACE (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
@@ -138,8 +136,7 @@ void arch_setup_new_exec(void);
#define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
_TIF_NOTIFY_RESUME | _TIF_UPROBE | \
- _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
- _TIF_FSCHECK)
+ _TIF_RESTORE_TM | _TIF_PATCH_PENDING)
#define _TIF_PERSYSCALL_MASK (_TIF_RESTOREALL|_TIF_NOERROR)
/* Bits in local_flags */
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index a31de40ac00b62..38ed408bee6cb2 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -8,62 +8,36 @@
#include <asm/extable.h>
#include <asm/kup.h>
-/*
- * The fs value determines whether argument validity checking should be
- * performed or not. If get_fs() == USER_DS, checking is performed, with
- * get_fs() == KERNEL_DS, checking is bypassed.
- *
- * For historical reasons, these macros are grossly misnamed.
- *
- * The fs/ds values are now the highest legal address in the "segment".
- * This simplifies the checking in the routines below.
- */
-
-#define MAKE_MM_SEG(s) ((mm_segment_t) { (s) })
-
-#define KERNEL_DS MAKE_MM_SEG(~0UL)
#ifdef __powerpc64__
/* We use TASK_SIZE_USER64 as TASK_SIZE is not constant */
-#define USER_DS MAKE_MM_SEG(TASK_SIZE_USER64 - 1)
-#else
-#define USER_DS MAKE_MM_SEG(TASK_SIZE - 1)
-#endif
-
-#define get_fs() (current->thread.addr_limit)
+#define TASK_SIZE_MAX TASK_SIZE_USER64
-static inline void set_fs(mm_segment_t fs)
+static inline bool __access_ok(unsigned long addr, unsigned long size)
{
- current->thread.addr_limit = fs;
- /* On user-mode return check addr_limit (fs) is correct */
- set_thread_flag(TIF_FSCHECK);
+ if (addr >= TASK_SIZE_MAX)
+ return false;
+ /*
+ * This check is sufficient because there is a large enough gap between
+ * user addresses and the kernel addresses.
+ */
+ return size <= TASK_SIZE_MAX;
}
-
-#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
-#define user_addr_max() (get_fs().seg)
-
-#ifdef __powerpc64__
-/*
- * This check is sufficient because there is a large enough
- * gap between user addresses and the kernel addresses
- */
-#define __access_ok(addr, size, segment) \
- (((addr) <= (segment).seg) && ((size) <= (segment).seg))
-
#else
+#define TASK_SIZE_MAX TASK_SIZE
-static inline int __access_ok(unsigned long addr, unsigned long size,
- mm_segment_t seg)
+static inline bool __access_ok(unsigned long addr, unsigned long size)
{
- if (addr > seg.seg)
- return 0;
- return (size == 0 || size - 1 <= seg.seg - addr);
+ if (addr >= TASK_SIZE_MAX)
+ return false;
+ if (size == 0)
+ return false;
+ return size <= TASK_SIZE_MAX - addr;
}
-
-#endif
+#endif /* __powerpc64__ */
#define access_ok(addr, size) \
(__chk_user_ptr(addr), \
- __access_ok((__force unsigned long)(addr), (size), get_fs()))
+ __access_ok((unsigned long)(addr), (size)))
/*
* These are the main single-value transfer routines. They automatically
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index d15a98c758b8b4..df547d8e31e49c 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -312,9 +312,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
{
user_exit();
- /* Check valid addr_limit, TIF check is done there */
- addr_limit_user_check();
-
if (thread_info_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index caee8cc77e1954..8342188ea1acd0 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -108,11 +108,11 @@ static nokprobe_inline long address_ok(struct pt_regs *regs,
{
if (!user_mode(regs))
return 1;
- if (__access_ok(ea, nb, USER_DS))
+ if (__access_ok(ea, nb))
return 1;
- if (__access_ok(ea, 1, USER_DS))
+ if (__access_ok(ea, 1))
/* Access overlaps the end of the user region */
- regs->dar = USER_DS.seg;
+ regs->dar = TASK_SIZE_MAX - 1;
else
regs->dar = ea;
return 0;
--
2.28.0
^ permalink raw reply related
* [PATCH 10/11] powerpc: use non-set_fs based maccess routines
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Provide __get_kernel_nofault and __put_kernel_nofault routines to
implement the maccess routines without messing with set_fs and without
opening up access to user space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/powerpc/include/asm/uaccess.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 00699903f1efca..a31de40ac00b62 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -623,4 +623,20 @@ do { \
__put_user_goto(*(u8*)(_src + _i), (u8 __user *)(_dst + _i), e);\
} while (0)
+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label) \
+do { \
+ int __kr_err; \
+ \
+ __get_user_size(*((type *)(dst)), (__force type __user *)(src), \
+ sizeof(type), __kr_err); \
+ if (unlikely(__kr_err)) \
+ goto err_label; \
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label) \
+ __put_user_size_goto(*((type *)(src)), \
+ (__force type __user *)(dst), sizeof(type), err_label)
+
#endif /* _ARCH_POWERPC_UACCESS_H */
--
2.28.0
^ permalink raw reply related
* [PATCH 09/11] x86: remove address space overrides using set_fs()
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more. To properly
handle the TASK_SIZE_MAX checking for 4 vs 5-level page tables on
x86 a new alternative is introduced, which just like the one in
entry_64.S has to use the hardcoded virtual address bits to escape
the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
page tables are enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/Kconfig | 1 -
arch/x86/ia32/ia32_aout.c | 1 -
arch/x86/include/asm/processor.h | 11 +----------
arch/x86/include/asm/thread_info.h | 2 --
arch/x86/include/asm/uaccess.h | 26 +-------------------------
arch/x86/kernel/asm-offsets.c | 3 ---
arch/x86/lib/getuser.S | 28 ++++++++++++++++++----------
arch/x86/lib/putuser.S | 21 ++++++++++++---------
8 files changed, 32 insertions(+), 61 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f85c13355732fe..7101ac64bb209d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -237,7 +237,6 @@ config X86
select HAVE_ARCH_KCSAN if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
- select SET_FS
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
config INSTRUCTION_DECODER
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index ca8a657edf5977..a09fc37ead9d47 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -239,7 +239,6 @@ static int load_aout_binary(struct linux_binprm *bprm)
(regs)->ss = __USER32_DS;
regs->r8 = regs->r9 = regs->r10 = regs->r11 =
regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
- set_fs(USER_DS);
return 0;
}
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1618eeb08361a9..189573d95c3af6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -482,10 +482,6 @@ extern unsigned int fpu_user_xstate_size;
struct perf_event;
-typedef struct {
- unsigned long seg;
-} mm_segment_t;
-
struct thread_struct {
/* Cached TLS descriptors: */
struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES];
@@ -538,8 +534,6 @@ struct thread_struct {
*/
unsigned long iopl_emul;
- mm_segment_t addr_limit;
-
unsigned int sig_on_uaccess_err:1;
/* Floating point and extended processor state */
@@ -785,15 +779,12 @@ static inline void spin_lock_prefetch(const void *x)
#define INIT_THREAD { \
.sp0 = TOP_OF_INIT_STACK, \
.sysenter_cs = __KERNEL_CS, \
- .addr_limit = KERNEL_DS, \
}
#define KSTK_ESP(task) (task_pt_regs(task)->sp)
#else
-#define INIT_THREAD { \
- .addr_limit = KERNEL_DS, \
-}
+#define INIT_THREAD { }
extern unsigned long KSTK_ESP(struct task_struct *task);
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86dd..44733a4bfc4294 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -102,7 +102,6 @@ struct thread_info {
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define TIF_ADDR32 29 /* 32-bit address space on 64 bits */
#define TIF_X32 30 /* 32-bit native x86-64 binary */
-#define TIF_FSCHECK 31 /* Check FS is USER_DS on return */
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@@ -131,7 +130,6 @@ struct thread_info {
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_ADDR32 (1 << TIF_ADDR32)
#define _TIF_X32 (1 << TIF_X32)
-#define _TIF_FSCHECK (1 << TIF_FSCHECK)
/* flags to check in __switch_to() */
#define _TIF_WORK_CTXSW_BASE \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ecefaffd15d4c8..a4ceda0510ea87 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -12,30 +12,6 @@
#include <asm/smap.h>
#include <asm/extable.h>
-/*
- * The fs value determines whether argument validity checking should be
- * performed or not. If get_fs() == USER_DS, checking is performed, with
- * get_fs() == KERNEL_DS, checking is bypassed.
- *
- * For historical reasons, these macros are grossly misnamed.
- */
-
-#define MAKE_MM_SEG(s) ((mm_segment_t) { (s) })
-
-#define KERNEL_DS MAKE_MM_SEG(-1UL)
-#define USER_DS MAKE_MM_SEG(TASK_SIZE_MAX)
-
-#define get_fs() (current->thread.addr_limit)
-static inline void set_fs(mm_segment_t fs)
-{
- current->thread.addr_limit = fs;
- /* On user-mode return, check fs is correct */
- set_thread_flag(TIF_FSCHECK);
-}
-
-#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
-#define user_addr_max() (current->thread.addr_limit.seg)
-
/*
* Test whether a block of memory is a valid user space address.
* Returns 0 if the range is valid, nonzero otherwise.
@@ -93,7 +69,7 @@ static inline bool pagefault_disabled(void);
#define access_ok(addr, size) \
({ \
WARN_ON_IN_IRQ(); \
- likely(!__range_not_ok(addr, size, user_addr_max())); \
+ likely(!__range_not_ok(addr, size, TASK_SIZE_MAX)); \
})
/*
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 3ca07ad552ae0c..70b7154f4bdd62 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -37,9 +37,6 @@ static void __used common(void)
OFFSET(TASK_stack_canary, task_struct, stack_canary);
#endif
- BLANK();
- OFFSET(TASK_addr_limit, task_struct, thread.addr_limit);
-
BLANK();
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index c8a85b512796e1..ccc9808c66420a 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -35,10 +35,18 @@
#include <asm/smap.h>
#include <asm/export.h>
+#ifdef CONFIG_X86_5LEVEL
+#define LOAD_TASK_SIZE_MAX \
+ ALTERNATIVE "mov $((1 << 47) - 4096),%rdx", \
+ "mov $((1 << 56) - 4096),%rdx", X86_FEATURE_LA57
+#else
+#define LOAD_TASK_SIZE_MAX mov $TASK_SIZE_MAX,%_ASM_DX
+#endif
+
.text
SYM_FUNC_START(__get_user_1)
- mov PER_CPU_VAR(current_task), %_ASM_DX
- cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+ LOAD_TASK_SIZE_MAX
+ cmp %_ASM_DX,%_ASM_AX
jae bad_get_user
sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
and %_ASM_DX, %_ASM_AX
@@ -53,8 +61,8 @@ EXPORT_SYMBOL(__get_user_1)
SYM_FUNC_START(__get_user_2)
add $1,%_ASM_AX
jc bad_get_user
- mov PER_CPU_VAR(current_task), %_ASM_DX
- cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+ LOAD_TASK_SIZE_MAX
+ cmp %_ASM_DX,%_ASM_AX
jae bad_get_user
sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
and %_ASM_DX, %_ASM_AX
@@ -69,8 +77,8 @@ EXPORT_SYMBOL(__get_user_2)
SYM_FUNC_START(__get_user_4)
add $3,%_ASM_AX
jc bad_get_user
- mov PER_CPU_VAR(current_task), %_ASM_DX
- cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+ LOAD_TASK_SIZE_MAX
+ cmp %_ASM_DX,%_ASM_AX
jae bad_get_user
sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
and %_ASM_DX, %_ASM_AX
@@ -86,8 +94,8 @@ SYM_FUNC_START(__get_user_8)
#ifdef CONFIG_X86_64
add $7,%_ASM_AX
jc bad_get_user
- mov PER_CPU_VAR(current_task), %_ASM_DX
- cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+ LOAD_TASK_SIZE_MAX
+ cmp %_ASM_DX,%_ASM_AX
jae bad_get_user
sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
and %_ASM_DX, %_ASM_AX
@@ -99,8 +107,8 @@ SYM_FUNC_START(__get_user_8)
#else
add $7,%_ASM_AX
jc bad_get_user_8
- mov PER_CPU_VAR(current_task), %_ASM_DX
- cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+ LOAD_TASK_SIZE_MAX
+ cmp %_ASM_DX,%_ASM_AX
jae bad_get_user_8
sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
and %_ASM_DX, %_ASM_AX
diff --git a/arch/x86/lib/putuser.S b/arch/x86/lib/putuser.S
index 7c7c92db8497af..f5a56394985875 100644
--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -31,12 +31,18 @@
* as they get called from within inline assembly.
*/
-#define ENTER mov PER_CPU_VAR(current_task), %_ASM_BX
+#ifdef CONFIG_X86_5LEVEL
+#define LOAD_TASK_SIZE_MAX \
+ ALTERNATIVE "mov $((1 << 47) - 4096),%rbx", \
+ "mov $((1 << 56) - 4096),%rbx", X86_FEATURE_LA57
+#else
+#define LOAD_TASK_SIZE_MAX mov $TASK_SIZE_MAX,%_ASM_BX
+#endif
.text
SYM_FUNC_START(__put_user_1)
- ENTER
- cmp TASK_addr_limit(%_ASM_BX),%_ASM_CX
+ LOAD_TASK_SIZE_MAX
+ cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
ASM_STAC
1: movb %al,(%_ASM_CX)
@@ -47,8 +53,7 @@ SYM_FUNC_END(__put_user_1)
EXPORT_SYMBOL(__put_user_1)
SYM_FUNC_START(__put_user_2)
- ENTER
- mov TASK_addr_limit(%_ASM_BX),%_ASM_BX
+ LOAD_TASK_SIZE_MAX
sub $1,%_ASM_BX
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
@@ -61,8 +66,7 @@ SYM_FUNC_END(__put_user_2)
EXPORT_SYMBOL(__put_user_2)
SYM_FUNC_START(__put_user_4)
- ENTER
- mov TASK_addr_limit(%_ASM_BX),%_ASM_BX
+ LOAD_TASK_SIZE_MAX
sub $3,%_ASM_BX
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
@@ -75,8 +79,7 @@ SYM_FUNC_END(__put_user_4)
EXPORT_SYMBOL(__put_user_4)
SYM_FUNC_START(__put_user_8)
- ENTER
- mov TASK_addr_limit(%_ASM_BX),%_ASM_BX
+ LOAD_TASK_SIZE_MAX
sub $7,%_ASM_BX
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
--
2.28.0
^ permalink raw reply related
* [PATCH 08/11] x86: make TASK_SIZE_MAX usable from assembly code
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
For 64-bit the only hing missing was a strategic _AC, and for 32-bit we
need to use __PAGE_OFFSET instead of PAGE_OFFSET in the TASK_SIZE
definition to escape the explicit unsigned long cast. This just works
because __PAGE_OFFSET is defined using _AC itself and thus never needs
the cast anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/include/asm/page_32_types.h | 4 ++--
arch/x86/include/asm/page_64_types.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 26236925fb2c36..f462895a33e452 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -44,8 +44,8 @@
/*
* User space process size: 3GB (default).
*/
-#define IA32_PAGE_OFFSET PAGE_OFFSET
-#define TASK_SIZE PAGE_OFFSET
+#define IA32_PAGE_OFFSET __PAGE_OFFSET
+#define TASK_SIZE __PAGE_OFFSET
#define TASK_SIZE_LOW TASK_SIZE
#define TASK_SIZE_MAX TASK_SIZE
#define DEFAULT_MAP_WINDOW TASK_SIZE
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 996595c9897e0a..838515daf87b36 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -76,7 +76,7 @@
*
* With page table isolation enabled, we map the LDT in ... [stay tuned]
*/
-#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
+#define TASK_SIZE_MAX ((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
--
2.28.0
^ permalink raw reply related
* remove the last set_fs() in common code, and remove it for x86 and powerpc
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
Hi all,
this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.
The file system part has been posted a few times, and the read/write side
has been pretty much unchanced. For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them. The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it. All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.
Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged. The process to convert
architectures is roughtly:
- ensure there is no set_fs(KERNEL_DS) left in arch specific code
- implement __get_kernel_nofault and __put_kernel_nofault
- remove the arch specific address limitation functionality
Diffstat:
arch/Kconfig | 3
arch/alpha/Kconfig | 1
arch/arc/Kconfig | 1
arch/arm/Kconfig | 1
arch/arm64/Kconfig | 1
arch/c6x/Kconfig | 1
arch/csky/Kconfig | 1
arch/h8300/Kconfig | 1
arch/hexagon/Kconfig | 1
arch/ia64/Kconfig | 1
arch/m68k/Kconfig | 1
arch/microblaze/Kconfig | 1
arch/mips/Kconfig | 1
arch/nds32/Kconfig | 1
arch/nios2/Kconfig | 1
arch/openrisc/Kconfig | 1
arch/parisc/Kconfig | 1
arch/powerpc/include/asm/processor.h | 7 -
arch/powerpc/include/asm/thread_info.h | 5 -
arch/powerpc/include/asm/uaccess.h | 78 ++++++++-----------
arch/powerpc/kernel/signal.c | 3
arch/powerpc/lib/sstep.c | 6 -
arch/riscv/Kconfig | 1
arch/s390/Kconfig | 1
arch/sh/Kconfig | 1
arch/sparc/Kconfig | 1
arch/um/Kconfig | 1
arch/x86/ia32/ia32_aout.c | 1
arch/x86/include/asm/page_32_types.h | 11 ++
arch/x86/include/asm/page_64_types.h | 38 +++++++++
arch/x86/include/asm/processor.h | 60 ---------------
arch/x86/include/asm/thread_info.h | 2
arch/x86/include/asm/uaccess.h | 26 ------
arch/x86/kernel/asm-offsets.c | 3
arch/x86/lib/getuser.S | 28 ++++---
arch/x86/lib/putuser.S | 21 +++--
arch/xtensa/Kconfig | 1
drivers/char/mem.c | 16 ----
drivers/misc/lkdtm/bugs.c | 2
drivers/misc/lkdtm/core.c | 4 +
drivers/misc/lkdtm/usercopy.c | 2
fs/read_write.c | 69 ++++++++++-------
fs/splice.c | 130 +++------------------------------
include/linux/fs.h | 2
include/linux/uaccess.h | 18 ++++
lib/test_bitmap.c | 10 ++
46 files changed, 235 insertions(+), 332 deletions(-)
^ permalink raw reply
* [PATCH 07/11] x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32, 64}_types.h
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
At least for 64-bit this moves them closer to some of the defines
they are based on, and it prepares for using the TASK_SIZE_MAX
definition from assembly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/include/asm/page_32_types.h | 11 +++++++
arch/x86/include/asm/page_64_types.h | 38 +++++++++++++++++++++
arch/x86/include/asm/processor.h | 49 ----------------------------
3 files changed, 49 insertions(+), 49 deletions(-)
diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 565ad755c785e2..26236925fb2c36 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -41,6 +41,17 @@
#define __VIRTUAL_MASK_SHIFT 32
#endif /* CONFIG_X86_PAE */
+/*
+ * User space process size: 3GB (default).
+ */
+#define IA32_PAGE_OFFSET PAGE_OFFSET
+#define TASK_SIZE PAGE_OFFSET
+#define TASK_SIZE_LOW TASK_SIZE
+#define TASK_SIZE_MAX TASK_SIZE
+#define DEFAULT_MAP_WINDOW TASK_SIZE
+#define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
+
/*
* Kernel image size is limited to 512 MB (see in arch/x86/kernel/head_32.S)
*/
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 288b065955b729..996595c9897e0a 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -58,6 +58,44 @@
#define __VIRTUAL_MASK_SHIFT 47
#endif
+/*
+ * User space process size. This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen. This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
+ */
+#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
+
+#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
+
+/* This decides where the kernel will search for a free chunk of vm
+ * space during mmap's.
+ */
+#define IA32_PAGE_OFFSET ((current->personality & ADDR_LIMIT_3GB) ? \
+ 0xc0000000 : 0xFFFFe000)
+
+#define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \
+ IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW)
+#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \
+ IA32_PAGE_OFFSET : TASK_SIZE_MAX)
+#define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
+ IA32_PAGE_OFFSET : TASK_SIZE_MAX)
+
+#define STACK_TOP TASK_SIZE_LOW
+#define STACK_TOP_MAX TASK_SIZE_MAX
+
/*
* Maximum kernel image size is limited to 1 GiB, due to the fixmap living
* in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 97143d87994c24..1618eeb08361a9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -782,17 +782,6 @@ static inline void spin_lock_prefetch(const void *x)
})
#ifdef CONFIG_X86_32
-/*
- * User space process size: 3GB (default).
- */
-#define IA32_PAGE_OFFSET PAGE_OFFSET
-#define TASK_SIZE PAGE_OFFSET
-#define TASK_SIZE_LOW TASK_SIZE
-#define TASK_SIZE_MAX TASK_SIZE
-#define DEFAULT_MAP_WINDOW TASK_SIZE
-#define STACK_TOP TASK_SIZE
-#define STACK_TOP_MAX STACK_TOP
-
#define INIT_THREAD { \
.sp0 = TOP_OF_INIT_STACK, \
.sysenter_cs = __KERNEL_CS, \
@@ -802,44 +791,6 @@ static inline void spin_lock_prefetch(const void *x)
#define KSTK_ESP(task) (task_pt_regs(task)->sp)
#else
-/*
- * User space process size. This is the first address outside the user range.
- * There are a few constraints that determine this:
- *
- * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
- * address, then that syscall will enter the kernel with a
- * non-canonical return address, and SYSRET will explode dangerously.
- * We avoid this particular problem by preventing anything executable
- * from being mapped at the maximum canonical address.
- *
- * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
- * CPUs malfunction if they execute code from the highest canonical page.
- * They'll speculate right off the end of the canonical space, and
- * bad things happen. This is worked around in the same way as the
- * Intel problem.
- *
- * With page table isolation enabled, we map the LDT in ... [stay tuned]
- */
-#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
-
-#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
-
-/* This decides where the kernel will search for a free chunk of vm
- * space during mmap's.
- */
-#define IA32_PAGE_OFFSET ((current->personality & ADDR_LIMIT_3GB) ? \
- 0xc0000000 : 0xFFFFe000)
-
-#define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \
- IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW)
-#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \
- IA32_PAGE_OFFSET : TASK_SIZE_MAX)
-#define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
- IA32_PAGE_OFFSET : TASK_SIZE_MAX)
-
-#define STACK_TOP TASK_SIZE_LOW
-#define STACK_TOP_MAX TASK_SIZE_MAX
-
#define INIT_THREAD { \
.addr_limit = KERNEL_DS, \
}
--
2.28.0
^ permalink raw reply related
* [PATCH 01/11] mem: remove duplicate ops for /dev/zero and /dev/null
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
There is no good reason to implement both the traditional ->read and
->write as well as the iter based ops. So implement just the iter
based ones.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/char/mem.c | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 687d4af6945d36..14851f36787372 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -670,18 +670,6 @@ static ssize_t write_port(struct file *file, const char __user *buf,
return tmp-buf;
}
-static ssize_t read_null(struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
-{
- return 0;
-}
-
-static ssize_t write_null(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
-{
- return count;
-}
-
static ssize_t read_iter_null(struct kiocb *iocb, struct iov_iter *to)
{
return 0;
@@ -872,7 +860,6 @@ static int open_port(struct inode *inode, struct file *filp)
#define zero_lseek null_lseek
#define full_lseek null_lseek
-#define write_zero write_null
#define write_iter_zero write_iter_null
#define open_mem open_port
#define open_kmem open_mem
@@ -903,8 +890,6 @@ static const struct file_operations __maybe_unused kmem_fops = {
static const struct file_operations null_fops = {
.llseek = null_lseek,
- .read = read_null,
- .write = write_null,
.read_iter = read_iter_null,
.write_iter = write_iter_null,
.splice_write = splice_write_null,
@@ -919,7 +904,6 @@ static const struct file_operations __maybe_unused port_fops = {
static const struct file_operations zero_fops = {
.llseek = zero_lseek,
- .write = write_zero,
.read_iter = read_iter_zero,
.write_iter = write_iter_zero,
.mmap = mmap_zero,
--
2.28.0
^ permalink raw reply related
* [PATCH 04/11] uaccess: add infrastructure for kernel builds with set_fs()
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Add a CONFIG_SET_FS option that is selected by architecturess that
implement set_fs, which is all of them initially. If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/Kconfig | 3 +++
arch/alpha/Kconfig | 1 +
arch/arc/Kconfig | 1 +
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/c6x/Kconfig | 1 +
arch/csky/Kconfig | 1 +
arch/h8300/Kconfig | 1 +
arch/hexagon/Kconfig | 1 +
arch/ia64/Kconfig | 1 +
arch/m68k/Kconfig | 1 +
arch/microblaze/Kconfig | 1 +
arch/mips/Kconfig | 1 +
arch/nds32/Kconfig | 1 +
arch/nios2/Kconfig | 1 +
arch/openrisc/Kconfig | 1 +
arch/parisc/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/riscv/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sh/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/um/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/xtensa/Kconfig | 1 +
include/linux/uaccess.h | 18 ++++++++++++++++++
26 files changed, 45 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493fc..3fab619a6aa51a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,9 @@ config KEXEC_ELF
config HAVE_IMA_KEXEC
bool
+config SET_FS
+ bool
+
config HOTPLUG_SMT
bool
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 9c5f06e8eb9bc0..d6e9fc7a7b19e2 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -39,6 +39,7 @@ config ALPHA
select OLD_SIGSUSPEND
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
select MMU_GATHER_NO_RANGE
+ select SET_FS
help
The Alpha is a 64-bit general-purpose processor designed and
marketed by the Digital Equipment Corporation of blessed memory,
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index ba00c4e1e1c271..c49f5754a11e40 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -48,6 +48,7 @@ config ARC
select PCI_SYSCALL if PCI
select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
+ select SET_FS
config ARCH_HAS_CACHE_LINE_SIZE
def_bool y
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e00d94b1665876..87e1478a42dc4f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -118,6 +118,7 @@ config ARM
select PCI_SYSCALL if PCI
select PERF_USE_VMALLOC
select RTC_LIB
+ select SET_FS
select SYS_SUPPORTS_APM_EMULATION
# Above selects are sorted alphabetically; please add new ones
# according to that. Thanks.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbeee8..fbd9e35bef096f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -192,6 +192,7 @@ config ARM64
select PCI_SYSCALL if PCI
select POWER_RESET
select POWER_SUPPLY
+ select SET_FS
select SPARSE_IRQ
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index 6444ebfd06a665..48d66bf0465d68 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -22,6 +22,7 @@ config C6X
select GENERIC_CLOCKEVENTS
select MODULES_USE_ELF_RELA
select MMU_GATHER_NO_RANGE if MMU
+ select SET_FS
config MMU
def_bool n
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 3d5afb5f568543..2836f6e76fdb2d 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -78,6 +78,7 @@ config CSKY
select PCI_DOMAINS_GENERIC if PCI
select PCI_SYSCALL if PCI
select PCI_MSI if PCI
+ select SET_FS
config LOCKDEP_SUPPORT
def_bool y
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index d11666d538fea8..7945de067e9fcc 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -25,6 +25,7 @@ config H8300
select HAVE_ARCH_KGDB
select HAVE_ARCH_HASH
select CPU_NO_EFFICIENT_FFS
+ select SET_FS
select UACCESS_MEMCPY
config CPU_BIG_ENDIAN
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 667cfc511cf999..f2afabbadd430e 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -31,6 +31,7 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
+ select SET_FS
help
Qualcomm Hexagon is a processor architecture designed for high
performance and low power across a wide variety of applications.
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 5b4ec80bf5863a..22a6853840e235 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -56,6 +56,7 @@ config IA64
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
select NUMA if !FLATMEM
+ select SET_FS
default y
help
The Itanium Processor Family is Intel's 64-bit successor to
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 6f2f38d05772ab..dcf4ae8c9b215f 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -32,6 +32,7 @@ config M68K
select OLD_SIGSUSPEND3
select OLD_SIGACTION
select MMU_GATHER_NO_RANGE if MMU
+ select SET_FS
config CPU_BIG_ENDIAN
def_bool y
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index d262ac0c8714bd..7e3d4583abf3e6 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -46,6 +46,7 @@ config MICROBLAZE
select CPU_NO_EFFICIENT_FFS
select MMU_GATHER_NO_RANGE if MMU
select SPARSE_IRQ
+ select SET_FS
# Endianness selection
choice
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484cf0..fbc26391b588f8 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -87,6 +87,7 @@ config MIPS
select MODULES_USE_ELF_RELA if MODULES && 64BIT
select PERF_USE_VMALLOC
select RTC_LIB
+ select SET_FS
select SYSCTL_EXCEPTION_TRACE
select VIRT_TO_BUS
diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index e30298e99e1bdf..e8e541fd2267d0 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -48,6 +48,7 @@ config NDS32
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE
+ select SET_FS
help
Andes(nds32) Linux support.
diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index c6645141bb2a88..c7c6ba6bec9dfc 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -27,6 +27,7 @@ config NIOS2
select USB_ARCH_HAS_HCD if USB_SUPPORT
select CPU_NO_EFFICIENT_FFS
select MMU_GATHER_NO_RANGE if MMU
+ select SET_FS
config GENERIC_CSUM
def_bool y
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 7e94fe37cb2fdf..6233c62931803f 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -39,6 +39,7 @@ config OPENRISC
select ARCH_WANT_FRAME_POINTERS
select GENERIC_IRQ_MULTI_HANDLER
select MMU_GATHER_NO_RANGE if MMU
+ select SET_FS
config CPU_BIG_ENDIAN
def_bool y
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 3b0f53dd70bc9b..be70af482b5a9a 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -63,6 +63,7 @@ config PARISC
select HAVE_FTRACE_MCOUNT_RECORD if HAVE_DYNAMIC_FTRACE
select HAVE_KPROBES_ON_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
+ select SET_FS
help
The PA-RISC microprocessor is designed by Hewlett-Packard and used
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce99d..3f09d6fdf89405 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -249,6 +249,7 @@ config PPC
select PCI_SYSCALL if PCI
select PPC_DAWR if PPC64
select RTC_LIB
+ select SET_FS
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7b590552914664..6e05a94f808a5a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -82,6 +82,7 @@ config RISCV
select PCI_MSI if PCI
select RISCV_INTC
select RISCV_TIMER
+ select SET_FS
select SPARSEMEM_STATIC if 32BIT
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3d86e12e8e3c21..fd81385a7787cb 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -185,6 +185,7 @@ config S390
select OLD_SIGSUSPEND3
select PCI_DOMAINS if PCI
select PCI_MSI if PCI
+ select SET_FS
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index d20927128fce05..2bd1653f3b3fea 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -71,6 +71,7 @@ config SUPERH
select PERF_EVENTS
select PERF_USE_VMALLOC
select RTC_LIB
+ select SET_FS
select SPARSE_IRQ
help
The SuperH is a RISC processor targeted for use in embedded systems
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index efeff2c896a544..3e0cf0319a278a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -49,6 +49,7 @@ config SPARC
select LOCKDEP_SMALL if LOCKDEP
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
+ select SET_FS
config SPARC32
def_bool !64BIT
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index eb51fec759484a..3aefcd81566809 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -19,6 +19,7 @@ config UML
select GENERIC_CPU_DEVICES
select GENERIC_CLOCKEVENTS
select HAVE_GCC_PLUGINS
+ select SET_FS
select TTY # Needed for line.c
config MMU
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac64bb209d..f85c13355732fe 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -237,6 +237,7 @@ config X86
select HAVE_ARCH_KCSAN if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
+ select SET_FS
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
config INSTRUCTION_DECODER
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index e997e0119c0251..94bad4d66b4bde 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -41,6 +41,7 @@ config XTENSA
select IRQ_DOMAIN
select MODULES_USE_ELF_RELA
select PERF_USE_VMALLOC
+ select SET_FS
select VIRT_TO_BUS
help
Xtensa processors are 32-bit RISC machines designed by Tensilica
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 94b28541165929..70073c802b48ed 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -8,6 +8,7 @@
#include <asm/uaccess.h>
+#ifdef CONFIG_SET_FS
/*
* Force the uaccess routines to be wired up for actual userspace access,
* overriding any possible set_fs(KERNEL_DS) still lingering around. Undone
@@ -25,6 +26,23 @@ static inline void force_uaccess_end(mm_segment_t oldfs)
{
set_fs(oldfs);
}
+#else /* CONFIG_SET_FS */
+typedef struct {
+ /* empty dummy */
+} mm_segment_t;
+
+#define uaccess_kernel() (false)
+#define user_addr_max() (TASK_SIZE_MAX)
+
+static inline mm_segment_t force_uaccess_begin(void)
+{
+ return (mm_segment_t) { };
+}
+
+static inline void force_uaccess_end(mm_segment_t oldfs)
+{
+}
+#endif /* CONFIG_SET_FS */
/*
* Architectures should provide two primitives (raw_copy_{to,from}_user())
--
2.28.0
^ permalink raw reply related
* [PATCH 06/11] lkdtm: disable set_fs-based tests for !CONFIG_SET_FS
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Once we can't manipulate the address limit, we also can't test what
happens when the manipulation is abused.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/misc/lkdtm/bugs.c | 2 ++
drivers/misc/lkdtm/core.c | 4 ++++
drivers/misc/lkdtm/usercopy.c | 2 ++
3 files changed, 8 insertions(+)
diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 4dfbfd51bdf774..66f1800b1cb82d 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -312,6 +312,7 @@ void lkdtm_CORRUPT_LIST_DEL(void)
pr_err("list_del() corruption not detected!\n");
}
+#ifdef CONFIG_SET_FS
/* Test if unbalanced set_fs(KERNEL_DS)/set_fs(USER_DS) check exists. */
void lkdtm_CORRUPT_USER_DS(void)
{
@@ -321,6 +322,7 @@ void lkdtm_CORRUPT_USER_DS(void)
/* Make sure we do not keep running with a KERNEL_DS! */
force_sig(SIGKILL);
}
+#endif
/* Test that VMAP_STACK is actually allocating with a leading guard page */
void lkdtm_STACK_GUARD_PAGE_LEADING(void)
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df916632..aae08b33a7ee2a 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -112,7 +112,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(CORRUPT_STACK_STRONG),
CRASHTYPE(CORRUPT_LIST_ADD),
CRASHTYPE(CORRUPT_LIST_DEL),
+#ifdef CONFIG_SET_FS
CRASHTYPE(CORRUPT_USER_DS),
+#endif
CRASHTYPE(STACK_GUARD_PAGE_LEADING),
CRASHTYPE(STACK_GUARD_PAGE_TRAILING),
CRASHTYPE(UNSET_SMEP),
@@ -172,7 +174,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(USERCOPY_STACK_FRAME_FROM),
CRASHTYPE(USERCOPY_STACK_BEYOND),
CRASHTYPE(USERCOPY_KERNEL),
+#ifdef CONFIG_SET_FS
CRASHTYPE(USERCOPY_KERNEL_DS),
+#endif
CRASHTYPE(STACKLEAK_ERASING),
CRASHTYPE(CFI_FORWARD_PROTO),
#ifdef CONFIG_X86_32
diff --git a/drivers/misc/lkdtm/usercopy.c b/drivers/misc/lkdtm/usercopy.c
index b833367a45d053..4b632fe79ab6bb 100644
--- a/drivers/misc/lkdtm/usercopy.c
+++ b/drivers/misc/lkdtm/usercopy.c
@@ -325,6 +325,7 @@ void lkdtm_USERCOPY_KERNEL(void)
vm_munmap(user_addr, PAGE_SIZE);
}
+#ifdef CONFIG_SET_FS
void lkdtm_USERCOPY_KERNEL_DS(void)
{
char __user *user_ptr =
@@ -339,6 +340,7 @@ void lkdtm_USERCOPY_KERNEL_DS(void)
pr_err("copy_to_user() to noncanonical address succeeded!?\n");
set_fs(old_fs);
}
+#endif
void __init lkdtm_usercopy_init(void)
{
--
2.28.0
^ permalink raw reply related
* [PATCH 02/11] fs: don't allow kernel reads and writes without iter ops
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
Don't allow calling ->read or ->write with set_fs as a preparation for
killing off set_fs. All the instances that we use kernel_read/write on
are using the iter ops already.
If a file has both the regular ->read/->write methods and the iter
variants those could have different semantics for messed up enough
drivers. Also fails the kernel access to them in that case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/read_write.c | 67 +++++++++++++++++++++++++++++++------------------
1 file changed, 42 insertions(+), 25 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..702c4301d9eb6b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -419,27 +419,41 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
return ret;
}
+static int warn_unsupported(struct file *file, const char *op)
+{
+ pr_warn_ratelimited(
+ "kernel %s not supported for file %pD4 (pid: %d comm: %.20s)\n",
+ op, file, current->pid, current->comm);
+ return -EINVAL;
+}
+
ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
- mm_segment_t old_fs = get_fs();
+ struct kvec iov = {
+ .iov_base = buf,
+ .iov_len = min_t(size_t, count, MAX_RW_COUNT),
+ };
+ struct kiocb kiocb;
+ struct iov_iter iter;
ssize_t ret;
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
return -EINVAL;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
+ /*
+ * Also fail if ->read_iter and ->read are both wired up as that
+ * implies very convoluted semantics.
+ */
+ if (unlikely(!file->f_op->read_iter || file->f_op->read))
+ return warn_unsupported(file, "read");
- if (count > MAX_RW_COUNT)
- count = MAX_RW_COUNT;
- set_fs(KERNEL_DS);
- if (file->f_op->read)
- ret = file->f_op->read(file, (void __user *)buf, count, pos);
- else if (file->f_op->read_iter)
- ret = new_sync_read(file, (void __user *)buf, count, pos);
- else
- ret = -EINVAL;
- set_fs(old_fs);
+ init_sync_kiocb(&kiocb, file);
+ kiocb.ki_pos = *pos;
+ iov_iter_kvec(&iter, READ, &iov, 1, iov.iov_len);
+ ret = file->f_op->read_iter(&kiocb, &iter);
if (ret > 0) {
+ *pos = kiocb.ki_pos;
fsnotify_access(file);
add_rchar(current, ret);
}
@@ -510,28 +524,31 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
/* caller is responsible for file_start_write/file_end_write */
ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
{
- mm_segment_t old_fs;
- const char __user *p;
+ struct kvec iov = {
+ .iov_base = (void *)buf,
+ .iov_len = min_t(size_t, count, MAX_RW_COUNT),
+ };
+ struct kiocb kiocb;
+ struct iov_iter iter;
ssize_t ret;
if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE)))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
+ /*
+ * Also fail if ->write_iter and ->write are both wired up as that
+ * implies very convoluted semantics.
+ */
+ if (unlikely(!file->f_op->write_iter || file->f_op->write))
+ return warn_unsupported(file, "write");
- old_fs = get_fs();
- set_fs(KERNEL_DS);
- p = (__force const char __user *)buf;
- if (count > MAX_RW_COUNT)
- count = MAX_RW_COUNT;
- if (file->f_op->write)
- ret = file->f_op->write(file, p, count, pos);
- else if (file->f_op->write_iter)
- ret = new_sync_write(file, p, count, pos);
- else
- ret = -EINVAL;
- set_fs(old_fs);
+ init_sync_kiocb(&kiocb, file);
+ kiocb.ki_pos = *pos;
+ iov_iter_kvec(&iter, WRITE, &iov, 1, iov.iov_len);
+ ret = file->f_op->write_iter(&kiocb, &iter);
if (ret > 0) {
+ *pos = kiocb.ki_pos;
fsnotify_modify(file);
add_wchar(current, ret);
}
--
2.28.0
^ permalink raw reply related
* [PATCH 03/11] fs: don't allow splice read/write without explicit ops
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
default_file_splice_write is the last piece of generic code that uses
set_fs to make the uaccess routines operate on kernel pointers. It
implements a "fallback loop" for splicing from files that do not actually
provide a proper splice_read method. The usual file systems and other
high bandwith instances all provide a ->splice_read, so this just removes
support for various device drivers and procfs/debugfs files. If splice
support for any of those turns out to be important it can be added back
by switching them to the iter ops and using generic_file_splice_read.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/read_write.c | 2 +-
fs/splice.c | 130 +++++----------------------------------------
include/linux/fs.h | 2 -
3 files changed, 15 insertions(+), 119 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 702c4301d9eb6b..8c61f67453e3d3 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1077,7 +1077,7 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
}
EXPORT_SYMBOL(vfs_iter_write);
-ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
+static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
unsigned long vlen, loff_t *pos, rwf_t flags)
{
struct iovec iovstack[UIO_FASTIOV];
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..412df7b48f9eb7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -342,89 +342,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
};
EXPORT_SYMBOL(nosteal_pipe_buf_ops);
-static ssize_t kernel_readv(struct file *file, const struct kvec *vec,
- unsigned long vlen, loff_t offset)
-{
- mm_segment_t old_fs;
- loff_t pos = offset;
- ssize_t res;
-
- old_fs = get_fs();
- set_fs(KERNEL_DS);
- /* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
- set_fs(old_fs);
-
- return res;
-}
-
-static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
-{
- struct kvec *vec, __vec[PIPE_DEF_BUFFERS];
- struct iov_iter to;
- struct page **pages;
- unsigned int nr_pages;
- unsigned int mask;
- size_t offset, base, copied = 0;
- ssize_t res;
- int i;
-
- if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
- return -EAGAIN;
-
- /*
- * Try to keep page boundaries matching to source pagecache ones -
- * it probably won't be much help, but...
- */
- offset = *ppos & ~PAGE_MASK;
-
- iov_iter_pipe(&to, READ, pipe, len + offset);
-
- res = iov_iter_get_pages_alloc(&to, &pages, len + offset, &base);
- if (res <= 0)
- return -ENOMEM;
-
- nr_pages = DIV_ROUND_UP(res + base, PAGE_SIZE);
-
- vec = __vec;
- if (nr_pages > PIPE_DEF_BUFFERS) {
- vec = kmalloc_array(nr_pages, sizeof(struct kvec), GFP_KERNEL);
- if (unlikely(!vec)) {
- res = -ENOMEM;
- goto out;
- }
- }
-
- mask = pipe->ring_size - 1;
- pipe->bufs[to.head & mask].offset = offset;
- pipe->bufs[to.head & mask].len -= offset;
-
- for (i = 0; i < nr_pages; i++) {
- size_t this_len = min_t(size_t, len, PAGE_SIZE - offset);
- vec[i].iov_base = page_address(pages[i]) + offset;
- vec[i].iov_len = this_len;
- len -= this_len;
- offset = 0;
- }
-
- res = kernel_readv(in, vec, nr_pages, *ppos);
- if (res > 0) {
- copied = res;
- *ppos += res;
- }
-
- if (vec != __vec)
- kfree(vec);
-out:
- for (i = 0; i < nr_pages; i++)
- put_page(pages[i]);
- kvfree(pages);
- iov_iter_advance(&to, copied); /* truncates and discards */
- return res;
-}
-
/*
* Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
* using sendpage(). Return the number of bytes sent.
@@ -788,33 +705,6 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
EXPORT_SYMBOL(iter_file_splice_write);
-static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
- struct splice_desc *sd)
-{
- int ret;
- void *data;
- loff_t tmp = sd->pos;
-
- data = kmap(buf->page);
- ret = __kernel_write(sd->u.file, data + buf->offset, sd->len, &tmp);
- kunmap(buf->page);
-
- return ret;
-}
-
-static ssize_t default_file_splice_write(struct pipe_inode_info *pipe,
- struct file *out, loff_t *ppos,
- size_t len, unsigned int flags)
-{
- ssize_t ret;
-
- ret = splice_from_pipe(pipe, out, ppos, len, flags, write_pipe_buf);
- if (ret > 0)
- *ppos += ret;
-
- return ret;
-}
-
/**
* generic_splice_sendpage - splice data from a pipe to a socket
* @pipe: pipe to splice from
@@ -836,15 +726,23 @@ ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
EXPORT_SYMBOL(generic_splice_sendpage);
+static int warn_unsupported(struct file *file, const char *op)
+{
+ pr_debug_ratelimited(
+ "splice %s not supported for file %pD4 (pid: %d comm: %.20s)\n",
+ op, file, current->pid, current->comm);
+ return -EINVAL;
+}
+
/*
* Attempt to initiate a splice from pipe to file.
*/
static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
loff_t *ppos, size_t len, unsigned int flags)
{
- if (out->f_op->splice_write)
- return out->f_op->splice_write(pipe, out, ppos, len, flags);
- return default_file_splice_write(pipe, out, ppos, len, flags);
+ if (unlikely(!out->f_op->splice_write))
+ return warn_unsupported(out, "write");
+ return out->f_op->splice_write(pipe, out, ppos, len, flags);
}
/*
@@ -866,9 +764,9 @@ static long do_splice_to(struct file *in, loff_t *ppos,
if (unlikely(len > MAX_RW_COUNT))
len = MAX_RW_COUNT;
- if (in->f_op->splice_read)
- return in->f_op->splice_read(in, ppos, pipe, len, flags);
- return default_file_splice_read(in, ppos, pipe, len, flags);
+ if (unlikely(!in->f_op->splice_read))
+ return warn_unsupported(in, "read");
+ return in->f_op->splice_read(in, ppos, pipe, len, flags);
}
/**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e019ea2f1347e6..d33cc3e8ed410b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1894,8 +1894,6 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
-extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
- unsigned long, loff_t *, rwf_t);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
--
2.28.0
^ permalink raw reply related
* [PATCH 05/11] test_bitmap: skip user bitmap tests for !CONFIG_SET_FS
From: Christoph Hellwig @ 2020-08-17 7:32 UTC (permalink / raw)
To: Al Viro, Michael Ellerman, x86
Cc: linux-fsdevel, linux-arch, linuxppc-dev, Kees Cook, linux-kernel
In-Reply-To: <20200817073212.830069-1-hch@lst.de>
We can't run the tests for userspace bitmap parsing if set_fs() doesn't
exist.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
lib/test_bitmap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index df903c53952bb9..49b1d25fbaf546 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -365,6 +365,7 @@ static void __init __test_bitmap_parselist(int is_user)
for (i = 0; i < ARRAY_SIZE(parselist_tests); i++) {
#define ptest parselist_tests[i]
+#ifdef CONFIG_SET_FS
if (is_user) {
mm_segment_t orig_fs = get_fs();
size_t len = strlen(ptest.in);
@@ -375,7 +376,9 @@ static void __init __test_bitmap_parselist(int is_user)
bmap, ptest.nbits);
time = ktime_get() - time;
set_fs(orig_fs);
- } else {
+ } else
+#endif /* CONFIG_SET_FS */
+ {
time = ktime_get();
err = bitmap_parselist(ptest.in, bmap, ptest.nbits);
time = ktime_get() - time;
@@ -454,6 +457,7 @@ static void __init __test_bitmap_parse(int is_user)
for (i = 0; i < ARRAY_SIZE(parse_tests); i++) {
struct test_bitmap_parselist test = parse_tests[i];
+#ifdef CONFIG_SET_FS
if (is_user) {
size_t len = strlen(test.in);
mm_segment_t orig_fs = get_fs();
@@ -464,7 +468,9 @@ static void __init __test_bitmap_parse(int is_user)
bmap, test.nbits);
time = ktime_get() - time;
set_fs(orig_fs);
- } else {
+ } else
+#endif /* CONFIG_SET_FS */
+ {
size_t len = test.flags & NO_LEN ?
UINT_MAX : strlen(test.in);
time = ktime_get();
--
2.28.0
^ permalink raw reply related
* [PATCH] powerpc/fixmap: Fix the size of the early debug area
From: Christophe Leroy @ 2020-08-17 6:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
Commit ("03fd42d458fb powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when
page size is 256k") reworked the setup of the early debug area and
mistakenly replaced 128 * 1024 by SZ_128.
Change to SZ_128K to restore the original 128 kbytes size of the area.
Fixes: 03fd42d458fb ("powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/fixmap.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h
index 925cf89cbf4b..6bfc87915d5d 100644
--- a/arch/powerpc/include/asm/fixmap.h
+++ b/arch/powerpc/include/asm/fixmap.h
@@ -52,7 +52,7 @@ enum fixed_addresses {
FIX_HOLE,
/* reserve the top 128K for early debugging purposes */
FIX_EARLY_DEBUG_TOP = FIX_HOLE,
- FIX_EARLY_DEBUG_BASE = FIX_EARLY_DEBUG_TOP+(ALIGN(SZ_128, PAGE_SIZE)/PAGE_SIZE)-1,
+ FIX_EARLY_DEBUG_BASE = FIX_EARLY_DEBUG_TOP+(ALIGN(SZ_128K, PAGE_SIZE)/PAGE_SIZE)-1,
#ifdef CONFIG_HIGHMEM
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
--
2.25.0
^ permalink raw reply related
* [PATCH v3] powerpc/numa: Restrict possible nodes based on platform
From: Srikar Dronamraju @ 2020-08-17 5:52 UTC (permalink / raw)
To: Michael Ellerman
Cc: Nathan Lynch, Tyrel Datwyler, Srikar Dronamraju, Bharata B Rao,
linuxppc-dev
As per draft LoPAPR (Revision 2.9_pre7), section B.5.3 "Run Time Abstaction
Services (RTAS) Node at
https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf,
there are 2 device tree property ibm,max-associativity-domains (which
defines the maximum number of domains that the firmware i.e PowerVM can
support) and ibm,current-associativity-domains (which defines the maximum
number of domains that the platform can support). Value of
ibm,max-associativity-domains property is always greater than or equal to
ibm,current-associativity-domains property. If the said property is not
available, use ibm,max-associativity-domain as fallback. In this yet to be
released LoPAPR, ibm,current-associativity-domains is mentioned in page 833
/ B.5.3 which is covered under under "Appendix B. System Binding" section
Currently Powerpc uses ibm,max-associativity-domains property while
setting the possible number of nodes. This is currently set at 32.
However the possible number of nodes for a platform may be significantly
less. Hence set the possible number of nodes based on
ibm,current-associativity-domains property.
Nathan Lynch had raised a valid concern that post LPM, a user could DLPAR
add processors and memory after LPM with "new" associativity properties.
https://lore.kernel.org/linuxppc-dev/871rljfet9.fsf@linux.ibm.com/t/#u
He also pointed out that ibm,max-associativity-domains has the same contents
on all currently available PowerVM systems, unlike
ibm,current-associativity-domains and hence may be better able to handle the
new numa associativity properties.
However with the recent commit dbce45628085 ("powerpc/numa: Limit possible
nodes to within num_possible_nodes"), all new numa associativity properties
are capped to initially set nr_node_ids. Hence this commit should be safe
with any new dlpar add post LPM.
$ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
/proc/device-tree/rtas/ibm,current-associativity-domains
00000005 00000001 00000002 00000002 00000002 00000010
/proc/device-tree/rtas/ibm,max-associativity-domains
00000005 00000001 00000008 00000020 00000020 00000100
$ cat /sys/devices/system/node/possible ##Before patch
0-31
$ cat /sys/devices/system/node/possible ##After patch
0-1
Note the maximum nodes this platform can support is only 2 but the
possible nodes is set to 32.
This is important because lot of kernel and user space code allocate
structures for all possible nodes leading to a lot of memory that is
allocated but not used.
I ran a simple experiment to create and destroy 100 memory cgroups on
boot on a 8 node machine (Power8 Alpine).
Before patch
free -k at boot
total used free shared buff/cache available
Mem: 523498176 4106816 518820608 22272 570752 516606720
Swap: 4194240 0 4194240
free -k after creating 100 memory cgroups
total used free shared buff/cache available
Mem: 523498176 4628416 518246464 22336 623296 516058688
Swap: 4194240 0 4194240
free -k after destroying 100 memory cgroups
total used free shared buff/cache available
Mem: 523498176 4697408 518173760 22400 627008 515987904
Swap: 4194240 0 4194240
After patch
free -k at boot
total used free shared buff/cache available
Mem: 523498176 3969472 518933888 22272 594816 516731776
Swap: 4194240 0 4194240
free -k after creating 100 memory cgroups
total used free shared buff/cache available
Mem: 523498176 4181888 518676096 22208 640192 516496448
Swap: 4194240 0 4194240
free -k after destroying 100 memory cgroups
total used free shared buff/cache available
Mem: 523498176 4232320 518619904 22272 645952 516443264
Swap: 4194240 0 4194240
Observations:
Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v2->v3:
v2: https://lore.kernel.org/linuxppc-dev/20200707140644.7241-1-srikar@linux.vnet.ibm.com/t/#u
Updated commit msg to mention LoPAPR reference section and url details
(Michael Ellerman)
Updated commit msg to discuss about possible new numa associativity properties.
(Nathan Lynch)
Changelog v1->v2:
v1: https://lore.kernel.org/linuxppc-dev/20200706064002.14848-1-srikar@linux.vnet.ibm.com/t/#u
Fallback to ibm,max-associativity-domains if ibm,current-associativity
is not available. Suggested by Tyrel Datwyler.
arch/powerpc/mm/numa.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..fc7b0505bdd8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -896,10 +896,19 @@ static void __init find_possible_nodes(void)
if (!rtas)
return;
- if (of_property_read_u32_index(rtas,
- "ibm,max-associativity-domains",
+ if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
+ min_common_depth, &numnodes)) {
+ /*
+ * ibm,current-associativity-domains is a fairly recent
+ * property. If it doesn't exist, then fallback on
+ * ibm,max-associativity-domains. Current denotes what the
+ * platform can support compared to max which denotes what the
+ * Hypervisor can support.
+ */
+ if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
min_common_depth, &numnodes))
- goto out;
+ goto out;
+ }
for (i = 0; i < numnodes; i++) {
if (!node_possible(i))
--
2.18.2
^ permalink raw reply related
* [PATCH v1 3/4] powerpc/process: Remove useless #ifdef CONFIG_SPE
From: Christophe Leroy @ 2020-08-17 5:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0eb61cf0dc66d781d47deb2228498cd61d03a754.1597643221.git.christophe.leroy@csgroup.eu>
cpu_has_feature(CPU_FTR_SPE) returns false when CONFIG_SPE is
not set.
There is no need to enclose the test in an #ifdef CONFIG_SPE.
Remove it.
CPU_FTR_SPE only exists on 32 bits. Define it as 0 on 64 bits.
We have a couple of places like:
#ifdef CONFIG_SPE
if (cpu_has_feature(CPU_FTR_SPE)) {
do_something_that_requires_CONFIG_SPE
} else {
return -EINVAL;
}
#else
return -EINVAL;
#endif
Replace them by a cleaner version:
if (cpu_has_feature(CPU_FTR_SPE)) {
#ifdef CONFIG_SPE
do_something_that_requires_CONFIG_SPE
#endif
} else {
return -EINVAL;
}
When CONFIG_SPE is not set, this resolves to an unconditional
return of -EINVAL
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/cputable.h | 1 +
arch/powerpc/kernel/process.c | 21 +++++++--------------
2 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index fdddb822d564..cc21dbcc6794 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -165,6 +165,7 @@ static inline void cpu_feature_keys_init(void) { }
#else /* CONFIG_PPC32 */
/* Define these to 0 for the sake of tests in common code */
#define CPU_FTR_PPC_LE (0)
+#define CPU_FTR_SPE (0)
#endif
/*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 360415689f8a..7090c99a60d9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -413,10 +413,8 @@ static int __init init_msr_all_available(void)
msr_all_available |= MSR_VEC;
if (cpu_has_feature(CPU_FTR_VSX))
msr_all_available |= MSR_VSX;
-#ifdef CONFIG_SPE
if (cpu_has_feature(CPU_FTR_SPE))
msr_all_available |= MSR_SPE;
-#endif
return 0;
}
@@ -446,10 +444,8 @@ void giveup_all(struct task_struct *tsk)
#endif
if (usermsr & MSR_VEC)
__giveup_altivec(tsk);
-#ifdef CONFIG_SPE
if (usermsr & MSR_SPE)
__giveup_spe(tsk);
-#endif
msr_check_and_clear(msr_all_available);
}
@@ -1843,7 +1839,6 @@ int set_fpexc_mode(struct task_struct *tsk, unsigned int val)
* fpexc_mode. fpexc_mode is also used for setting FP exception
* mode (asyn, precise, disabled) for 'Classic' FP. */
if (val & PR_FP_EXC_SW_ENABLE) {
-#ifdef CONFIG_SPE
if (cpu_has_feature(CPU_FTR_SPE)) {
/*
* When the sticky exception bits are set
@@ -1857,16 +1852,15 @@ int set_fpexc_mode(struct task_struct *tsk, unsigned int val)
* anyway to restore the prctl settings from
* the saved environment.
*/
+#ifdef CONFIG_SPE
tsk->thread.spefscr_last = mfspr(SPRN_SPEFSCR);
tsk->thread.fpexc_mode = val &
(PR_FP_EXC_SW_ENABLE | PR_FP_ALL_EXCEPT);
+#endif
return 0;
} else {
return -EINVAL;
}
-#else
- return -EINVAL;
-#endif
}
/* on a CONFIG_SPE this does not hurt us. The bits that
@@ -1887,8 +1881,7 @@ int get_fpexc_mode(struct task_struct *tsk, unsigned long adr)
{
unsigned int val;
- if (tsk->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
-#ifdef CONFIG_SPE
+ if (tsk->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE) {
if (cpu_has_feature(CPU_FTR_SPE)) {
/*
* When the sticky exception bits are set
@@ -1902,15 +1895,15 @@ int get_fpexc_mode(struct task_struct *tsk, unsigned long adr)
* anyway to restore the prctl settings from
* the saved environment.
*/
+#ifdef CONFIG_SPE
tsk->thread.spefscr_last = mfspr(SPRN_SPEFSCR);
val = tsk->thread.fpexc_mode;
+#endif
} else
return -EINVAL;
-#else
- return -EINVAL;
-#endif
- else
+ } else {
val = __unpack_fe01(tsk->thread.fpexc_mode);
+ }
return put_user(val, (unsigned int __user *) adr);
}
--
2.25.0
^ permalink raw reply related
* [PATCH v1 4/4] powerpc/process: Remove useless #ifdef CONFIG_PPC_FPU
From: Christophe Leroy @ 2020-08-17 5:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0eb61cf0dc66d781d47deb2228498cd61d03a754.1597643221.git.christophe.leroy@csgroup.eu>
Add a stub for __giveup_fpu() when CONFIG_PPC_FPU is
not selected, as done for CONFIG_SPE and CONFIG_ALTIVEC.
This allows to remove some #ifdef CONFIG_PPC_FPU.
Also change one to IS_ENABLED().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 7090c99a60d9..8133a6ce3242 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -229,6 +229,8 @@ void enable_kernel_fp(void)
}
}
EXPORT_SYMBOL(enable_kernel_fp);
+#else
+static inline void __giveup_fpu(struct task_struct *tsk) { }
#endif /* CONFIG_PPC_FPU */
#ifdef CONFIG_ALTIVEC
@@ -406,9 +408,8 @@ static unsigned long msr_all_available;
static int __init init_msr_all_available(void)
{
-#ifdef CONFIG_PPC_FPU
- msr_all_available |= MSR_FP;
-#endif
+ if (IS_ENABLED(CONFIG_PPC_FPU))
+ msr_all_available |= MSR_FP;
if (cpu_has_feature(CPU_FTR_ALTIVEC))
msr_all_available |= MSR_VEC;
if (cpu_has_feature(CPU_FTR_VSX))
@@ -438,10 +439,8 @@ void giveup_all(struct task_struct *tsk)
WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));
-#ifdef CONFIG_PPC_FPU
if (usermsr & MSR_FP)
__giveup_fpu(tsk);
-#endif
if (usermsr & MSR_VEC)
__giveup_altivec(tsk);
if (usermsr & MSR_SPE)
--
2.25.0
^ permalink raw reply related
* [PATCH v1 2/4] powerpc/process: Remove useless #ifdef CONFIG_ALTIVEC
From: Christophe Leroy @ 2020-08-17 5:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0eb61cf0dc66d781d47deb2228498cd61d03a754.1597643221.git.christophe.leroy@csgroup.eu>
cpu_has_feature(CPU_FTR_ALTIVEC) returns false when CONFIG_ALTIVEC is
not set.
There is no need to enclose the test in an #ifdef CONFIG_ALTIVEC.
Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b64d71188715..360415689f8a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -409,10 +409,8 @@ static int __init init_msr_all_available(void)
#ifdef CONFIG_PPC_FPU
msr_all_available |= MSR_FP;
#endif
-#ifdef CONFIG_ALTIVEC
if (cpu_has_feature(CPU_FTR_ALTIVEC))
msr_all_available |= MSR_VEC;
-#endif
if (cpu_has_feature(CPU_FTR_VSX))
msr_all_available |= MSR_VSX;
#ifdef CONFIG_SPE
@@ -446,10 +444,8 @@ void giveup_all(struct task_struct *tsk)
if (usermsr & MSR_FP)
__giveup_fpu(tsk);
#endif
-#ifdef CONFIG_ALTIVEC
if (usermsr & MSR_VEC)
__giveup_altivec(tsk);
-#endif
#ifdef CONFIG_SPE
if (usermsr & MSR_SPE)
__giveup_spe(tsk);
--
2.25.0
^ permalink raw reply related
* [PATCH v1 1/4] powerpc/process: Remove useless #ifdef CONFIG_VSX
From: Christophe Leroy @ 2020-08-17 5:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
cpu_has_feature(CPU_FTR_VSX) returns false when CONFIG_VSX is
not set.
There is no need to enclose the test in an #ifdef CONFIG_VSX.
Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a17d0746d55f..b64d71188715 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -124,10 +124,8 @@ unsigned long notrace msr_check_and_set(unsigned long bits)
newmsr = oldmsr | bits;
-#ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX) && (bits & MSR_FP))
newmsr |= MSR_VSX;
-#endif
if (oldmsr != newmsr)
mtmsr_isync(newmsr);
@@ -144,10 +142,8 @@ void notrace __msr_check_and_clear(unsigned long bits)
newmsr = oldmsr & ~bits;
-#ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX) && (bits & MSR_FP))
newmsr &= ~MSR_VSX;
-#endif
if (oldmsr != newmsr)
mtmsr_isync(newmsr);
@@ -162,10 +158,8 @@ static void __giveup_fpu(struct task_struct *tsk)
save_fpu(tsk);
msr = tsk->thread.regs->msr;
msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
-#ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX))
msr &= ~MSR_VSX;
-#endif
tsk->thread.regs->msr = msr;
}
@@ -245,10 +239,8 @@ static void __giveup_altivec(struct task_struct *tsk)
save_altivec(tsk);
msr = tsk->thread.regs->msr;
msr &= ~MSR_VEC;
-#ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX))
msr &= ~MSR_VSX;
-#endif
tsk->thread.regs->msr = msr;
}
@@ -421,10 +413,8 @@ static int __init init_msr_all_available(void)
if (cpu_has_feature(CPU_FTR_ALTIVEC))
msr_all_available |= MSR_VEC;
#endif
-#ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX))
msr_all_available |= MSR_VSX;
-#endif
#ifdef CONFIG_SPE
if (cpu_has_feature(CPU_FTR_SPE))
msr_all_available |= MSR_SPE;
@@ -509,19 +499,18 @@ static bool should_restore_altivec(void) { return false; }
static void do_restore_altivec(void) { }
#endif /* CONFIG_ALTIVEC */
-#ifdef CONFIG_VSX
static bool should_restore_vsx(void)
{
if (cpu_has_feature(CPU_FTR_VSX))
return true;
return false;
}
+#ifdef CONFIG_VSX
static void do_restore_vsx(void)
{
current->thread.used_vsr = 1;
}
#else
-static bool should_restore_vsx(void) { return false; }
static void do_restore_vsx(void) { }
#endif /* CONFIG_VSX */
--
2.25.0
^ permalink raw reply related
* [PATCH v1] powerpc/process: Tag an #endif to help locate the matching #ifdef.
From: Christophe Leroy @ 2020-08-17 5:46 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
That #endif is more than 100 lines after the matching #ifdef,
and there are several #ifdef/#else/#endif inbetween.
Tag it as /* CONFIG_PPC_BOOK3S_64 */ to help locate the
matching #ifdef.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..ffbe79960c73 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -575,7 +575,7 @@ void notrace restore_math(struct pt_regs *regs)
regs->msr |= new_msr;
}
}
-#endif
+#endif /* CONFIG_PPC_BOOK3S_64 */
static void save_all(struct task_struct *tsk)
{
--
2.25.0
^ permalink raw reply related
* [PATCH v1] powerpc/process: Replace an #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) by IS_ENABLED()
From: Christophe Leroy @ 2020-08-17 5:46 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
The #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) encloses some
printk which can be compiled in all cases.
Replace by IS_ENABLED().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fd3e11663613..087711aa0126 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1446,12 +1446,13 @@ void show_regs(struct pt_regs * regs)
trap = TRAP(regs);
if (!trap_is_syscall(regs) && cpu_has_feature(CPU_FTR_CFAR))
pr_cont("CFAR: "REG" ", regs->orig_gpr3);
- if (trap == 0x200 || trap == 0x300 || trap == 0x600)
-#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
- pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
-#else
- pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
-#endif
+ if (trap == 0x200 || trap == 0x300 || trap == 0x600) {
+ if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
+ pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
+ else
+ pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
+ }
+
#ifdef CONFIG_PPC64
pr_cont("IRQMASK: %lx ", regs->softe);
#endif
--
2.25.0
^ permalink raw reply related
* [PATCH v1] powerpc/process: Replace an #ifdef CONFIG_PPC_BOOK3S_64 by IS_ENABLED()
From: Christophe Leroy @ 2020-08-17 5:46 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
This #ifdef CONFIG_PPC_BOOK3S_64 calls preload_new_slb_context()
when radix is not enabled.
radix_enabled() is always defined, and the prototype for
preload_new_slb_context() is always present, so the #ifdef
is unneeded.
Replace it by IS_ENABLED().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 987bc8e73d5e..a17d0746d55f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1725,10 +1725,8 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
#ifdef CONFIG_PPC64
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
-#ifdef CONFIG_PPC_BOOK3S_64
- if (!radix_enabled())
+ if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
preload_new_slb_context(start, sp);
-#endif
#endif
/*
--
2.25.0
^ permalink raw reply related
* [PATCH v1] powerpc/process: Remove unnecessary #ifdef CONFIG_FUNCTION_GRAPH_TRACER
From: Christophe Leroy @ 2020-08-17 5:46 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
ftrace_graph_ret_addr() is always defined and returns 'ip' when
CONFIG_FUNCTION GRAPH_TRACER is not set.
So the #ifdef is not needed, remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ffbe79960c73..e86e28c28259 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2096,10 +2096,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
unsigned long sp, ip, lr, newsp;
int count = 0;
int firstframe = 1;
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
unsigned long ret_addr;
int ftrace_idx = 0;
-#endif
if (tsk == NULL)
tsk = current;
@@ -2127,12 +2125,10 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
if (!firstframe || ip != lr) {
printk("%s["REG"] ["REG"] %pS",
loglvl, sp, ip, (void *)ip);
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
ret_addr = ftrace_graph_ret_addr(current,
&ftrace_idx, ip, stack);
if (ret_addr != ip)
pr_cont(" (%pS)", (void *)ret_addr);
-#endif
if (firstframe)
pr_cont(" (unreliable)");
pr_cont("\n");
--
2.25.0
^ permalink raw reply related
* [PATCH v1] powerpc/process: Replace #ifdef CONFIG_KALLSYMS by IS_ENABLED()
From: Christophe Leroy @ 2020-08-17 5:46 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
The #ifdef CONFIG_KALLSYMS encloses some printk which can
compile in all cases.
Replace by IS_ENABLED().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/process.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 087711aa0126..987bc8e73d5e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1469,14 +1469,14 @@ void show_regs(struct pt_regs * regs)
break;
}
pr_cont("\n");
-#ifdef CONFIG_KALLSYMS
/*
* Lookup NIP late so we have the best change of getting the
* above info out without failing
*/
- printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
- printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
-#endif
+ if (IS_ENABLED(CONFIG_KALLSYMS)) {
+ printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
+ printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
+ }
show_stack(current, (unsigned long *) regs->gpr[1], KERN_DEFAULT);
if (!user_mode(regs))
show_instructions(regs);
--
2.25.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox