* [git pull] uaccess-related bits of vfs.git [not found] <CA+55aFww=ZG0wPad3ELg+ibScr0eWuSxvhGLFaF+PO9kfkSkdw@mail.gmail.com> @ 2017-05-01 3:45 ` Al Viro 2017-05-13 1:00 ` Linus Torvalds 0 siblings, 1 reply; 28+ messages in thread From: Al Viro @ 2017-05-01 3:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch uaccess unification pile. It's _not_ the end of uaccess work, but the next batch of that will go into the next cycle. This one mostly takes copy_from_user() and friends out of arch/* and gets the zero-padding behaviour in sync for all architectures. Dealing with the nocache/writethrough mess is for the next cycle; fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am sold on access_ok() in there, BTW; just not in this pile), same for reducing __copy_... callsites, strn*... stuff, etc. - there will be a pile about as large as this one in the next merge window. This one sat in -next for weeks. -3KLoC. One trivial conflict in arch/sparc/Kconfig - this series takes HAVE_ARCH_HARDENED_USERCOPY out, and mainline has renaming of PROVE_LOCKING_SMALL to LOCKDEP_SMALL done in the next line. Please, pull. There's other stuff in vfs.git, but this is the most massive one this cycle, so other pull requests will be smaller. I apologize for the topology of this one - it's a common stem + per-architecture branches + merge joining those + short tail after that. Ugly, but I wanted to make it less painful in terms of conflicts with arch trees... The following changes since commit b884a190afcecdbef34ca508ea5ee88bb7c77861: metag/usercopy: Add missing fixups (2017-04-05 15:25:07 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.uaccess for you to fetch changes up to 2fefc97b2180518bac923fba3f79fdca1f41dc15: HAVE_ARCH_HARDENED_USERCOPY is unconditional now (2017-04-26 12:11:06 -0400) ---------------------------------------------------------------- Al Viro (99): uaccess: move VERIFY_{READ,WRITE} definitions to linux/uaccess.h uaccess: drop duplicate includes from asm/uaccess.h uaccess: drop pointless ifdefs add asm-generic/extable.h new helper: uaccess_kernel() asm-generic/uaccess.h: don't mess with __copy_{to,from}_user asm-generic: zero in __get_user(), not __get_user_fn() generic ...copy_..._user primitives alpha: switch __copy_user() and __do_clean_user() to normal calling conventions alpha: add asm/extable.h alpha: get rid of 'segment' argument of __{get,put}_user_check() alpha: don't bother with __access_ok() in traps.c alpha: kill the 'segment' argument of __access_ok() alpha: add a helper for emitting exception table entries alpha: switch to RAW_COPY_USER arc: get rid of unused declaration arm: switch to generic extable.h arm: switch to RAW_COPY_USER arm64: add extable.h avr32: switch to generic extable.h arm64: switch to RAW_COPY_USER avr32: switch to RAW_COPY_USER blackfin: switch to generic extable.h bfin: switch to RAW_COPY_USER c6x: remove duplicate definition of __access_ok c6x: switch to RAW_COPY_USER cris: switch to generic extable.h cris: don't rely upon __copy_user_zeroing() zeroing the tail cris: get rid of zeroing in __asm_copy_from_user_N for N > 4 cris: get rid of zeroing cris: rename __copy_user_zeroing to __copy_user_in cris: switch to RAW_COPY_USER frv: switch to use of fixup_exception() frv: switch to RAW_COPY_USER 8300: switch to RAW_COPY_USER m32r: switch to generic extable.h m32r: get rid of zeroing m68k: switch to generic extable.h m68k: get rid of zeroing m68k: switch to RAW_COPY_USER metag: switch to generic extable.h metag: kill verify_area() microblaze: switch to generic extable.h mn10300: switch to generic extable.h mn10300: get rid of zeroing mn10300: switch to RAW_COPY_USER nios2: switch to generic extable.h nios2: switch to RAW_COPY_USER openrisc: switch to generic extable.h openrisc: switch to RAW_COPY_USER powerpc: switch to extable.h s390: switch to extable.h score: switch to generic extable.h score: it's "VERIFY_WRITE", not "VERFITY_WRITE"... score: switch to RAW_COPY_USER sh: switch to extable.h sh: switch to RAW_COPY_USER sparc32: kill __ret_efault() tile: switch to generic extable.h tile: get rid of zeroing, switch to RAW_COPY_USER um: switch to RAW_COPY_USER amd64: get rid of zeroing unicore32: get rid of zeroing and switch to RAW_COPY_USER kill __copy_from_user_nocache() xtensa: switch to generic extable.h xtensa: get rid of zeroing, use RAW_COPY_USER arc: switch to RAW_COPY_USER x86: don't wank with magical size in __copy_in_user() x86: switch to RAW_COPY_USER s390: get rid of zeroing, switch to RAW_COPY_USER Merge branch 'parisc-4.11-3' of git://git.kernel.org/.../deller/parisc-linux into uaccess.parisc parisc: switch to RAW_COPY_USER sparc: switch to RAW_COPY_USER Merge branch 'fixes' of git://git.kernel.org/.../jhogan/metag into uaccess.metag Merge commit 'fc69910f329d' into uaccess.mips mips: sanitize __access_ok() mips: consolidate __invoke_... wrappers mips: clean and reorder the forest of macros... mips: make copy_from_user() zero tail explicitly mips: get rid of tail-zeroing in primitives mips: switch to RAW_COPY_USER don't open-code kernel_setsockopt() alpha: fix stack smashing in old_adjtimex(2) esas2r: don't open-code memdup_user() Merge commit 'a7d2475af7aedcb9b5c6343989a8bfadbf84429b' into uaccess.powerpc powerpc: get rid of zeroing, switch to RAW_COPY_USER Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64 ia64: add extable.h ia64: get rid of 'segment' argument of __{get,put}_user_check() ia64: get rid of 'segment' argument of __do_{get,put}_user() ia64: sanitize __access_ok() ia64: get rid of copy_in_user() get rid of padding, switch to RAW_COPY_USER microblaze: switch to RAW_COPY_USER hexagon: switch to RAW_COPY_USER m32r: switch to RAW_COPY_USER Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now HAVE_ARCH_HARDENED_USERCOPY is unconditional now James Hogan (1): metag/usercopy: Switch to RAW_COPY_USER Max Filippov (1): xtensa: fix prefetch in the raw_copy_to_user Vineet Gupta (1): ARC: uaccess: enable INLINE_COPY_{TO,FROM}_USER ... arch/alpha/include/asm/extable.h | 55 ++++ arch/alpha/include/asm/futex.h | 16 +- arch/alpha/include/asm/uaccess.h | 305 +++++--------------- arch/alpha/kernel/traps.c | 152 +++------- arch/alpha/lib/clear_user.S | 66 ++--- arch/alpha/lib/copy_user.S | 82 +++--- arch/alpha/lib/csum_partial_copy.c | 10 +- arch/alpha/lib/ev6-clear_user.S | 84 +++--- arch/alpha/lib/ev6-copy_user.S | 104 +++---- arch/arc/include/asm/Kbuild | 1 + arch/arc/include/asm/uaccess.h | 25 +- arch/arc/mm/extable.c | 14 - arch/arm/Kconfig | 1 - arch/arm/include/asm/Kbuild | 1 + arch/arm/include/asm/uaccess.h | 87 ++---- arch/arm/lib/uaccess_with_memcpy.c | 4 +- arch/arm64/Kconfig | 1 - arch/arm64/include/asm/extable.h | 25 ++ arch/arm64/include/asm/uaccess.h | 83 +----- arch/arm64/kernel/arm64ksyms.c | 2 +- arch/arm64/lib/copy_in_user.S | 4 +- arch/avr32/include/asm/Kbuild | 1 + arch/avr32/include/asm/uaccess.h | 39 +-- arch/avr32/kernel/avr32_ksyms.c | 2 - arch/avr32/lib/copy_user.S | 15 - arch/blackfin/include/asm/Kbuild | 1 + arch/blackfin/include/asm/uaccess.h | 47 +--- arch/blackfin/kernel/process.c | 2 +- arch/c6x/include/asm/Kbuild | 1 + arch/c6x/include/asm/uaccess.h | 19 +- arch/c6x/kernel/sys_c6x.c | 2 +- arch/cris/arch-v10/lib/usercopy.c | 31 +-- arch/cris/arch-v32/lib/usercopy.c | 30 +- arch/cris/include/arch-v10/arch/uaccess.h | 46 ++- arch/cris/include/arch-v32/arch/uaccess.h | 54 ++-- arch/cris/include/asm/Kbuild | 1 + arch/cris/include/asm/uaccess.h | 77 +---- arch/frv/include/asm/Kbuild | 1 + arch/frv/include/asm/uaccess.h | 84 ++---- arch/frv/kernel/traps.c | 7 +- arch/frv/mm/extable.c | 27 +- arch/frv/mm/fault.c | 6 +- arch/h8300/include/asm/Kbuild | 2 +- arch/h8300/include/asm/uaccess.h | 54 ++++ arch/hexagon/include/asm/Kbuild | 1 + arch/hexagon/include/asm/uaccess.h | 26 +- arch/hexagon/kernel/hexagon_ksyms.c | 4 +- arch/hexagon/mm/copy_from_user.S | 2 +- arch/hexagon/mm/copy_to_user.S | 2 +- arch/ia64/Kconfig | 1 - arch/ia64/include/asm/extable.h | 11 + arch/ia64/include/asm/uaccess.h | 102 ++----- arch/ia64/lib/memcpy_mck.S | 13 +- arch/ia64/mm/extable.c | 5 +- arch/m32r/include/asm/Kbuild | 1 + arch/m32r/include/asm/uaccess.h | 189 +------------ arch/m32r/kernel/m32r_ksyms.c | 2 - arch/m32r/lib/usercopy.c | 21 -- arch/m68k/include/asm/Kbuild | 1 + arch/m68k/include/asm/processor.h | 10 - arch/m68k/include/asm/uaccess.h | 1 + arch/m68k/include/asm/uaccess_mm.h | 103 ++++--- arch/m68k/include/asm/uaccess_no.h | 43 +-- arch/m68k/kernel/signal.c | 2 +- arch/m68k/kernel/traps.c | 9 +- arch/m68k/lib/uaccess.c | 12 +- arch/m68k/mm/fault.c | 2 +- arch/metag/include/asm/Kbuild | 1 + arch/metag/include/asm/uaccess.h | 60 +--- arch/metag/lib/usercopy.c | 6 +- arch/microblaze/include/asm/Kbuild | 1 + arch/microblaze/include/asm/uaccess.h | 64 +---- arch/mips/Kconfig | 1 - arch/mips/cavium-octeon/octeon-memcpy.S | 31 +-- arch/mips/include/asm/checksum.h | 4 +- arch/mips/include/asm/r4kcache.h | 4 +- arch/mips/include/asm/uaccess.h | 449 ++++-------------------------- arch/mips/kernel/mips-r2-to-r6-emul.c | 24 +- arch/mips/kernel/syscall.c | 2 +- arch/mips/kernel/unaligned.c | 10 +- arch/mips/lib/memcpy.S | 49 ---- arch/mips/oprofile/backtrace.c | 2 +- arch/mn10300/include/asm/Kbuild | 1 + arch/mn10300/include/asm/uaccess.h | 187 +------------ arch/mn10300/kernel/mn10300_ksyms.c | 2 - arch/mn10300/lib/usercopy.c | 18 -- arch/nios2/include/asm/Kbuild | 1 + arch/nios2/include/asm/uaccess.h | 55 +--- arch/nios2/mm/uaccess.c | 16 +- arch/openrisc/include/asm/Kbuild | 1 + arch/openrisc/include/asm/uaccess.h | 53 +--- arch/parisc/Kconfig | 1 - arch/parisc/include/asm/futex.h | 2 +- arch/parisc/include/asm/uaccess.h | 69 +---- arch/parisc/lib/memcpy.c | 16 +- arch/powerpc/Kconfig | 1 - arch/powerpc/include/asm/extable.h | 29 ++ arch/powerpc/include/asm/uaccess.h | 96 +------ arch/powerpc/lib/Makefile | 2 +- arch/powerpc/lib/copy_32.S | 14 - arch/powerpc/lib/copyuser_64.S | 35 +-- arch/powerpc/lib/usercopy_64.c | 41 --- arch/s390/Kconfig | 1 - arch/s390/include/asm/extable.h | 28 ++ arch/s390/include/asm/uaccess.h | 153 +--------- arch/s390/lib/uaccess.c | 68 ++--- arch/score/include/asm/Kbuild | 1 + arch/score/include/asm/extable.h | 11 - arch/score/include/asm/uaccess.h | 59 +--- arch/sh/include/asm/extable.h | 10 + arch/sh/include/asm/uaccess.h | 64 +---- arch/sparc/Kconfig | 1 - arch/sparc/include/asm/uaccess.h | 2 +- arch/sparc/include/asm/uaccess_32.h | 44 +-- arch/sparc/include/asm/uaccess_64.h | 44 +-- arch/sparc/kernel/head_32.S | 7 - arch/sparc/lib/GENcopy_from_user.S | 2 +- arch/sparc/lib/GENcopy_to_user.S | 2 +- arch/sparc/lib/GENpatch.S | 4 +- arch/sparc/lib/NG2copy_from_user.S | 2 +- arch/sparc/lib/NG2copy_to_user.S | 2 +- arch/sparc/lib/NG2patch.S | 4 +- arch/sparc/lib/NG4copy_from_user.S | 2 +- arch/sparc/lib/NG4copy_to_user.S | 2 +- arch/sparc/lib/NG4patch.S | 4 +- arch/sparc/lib/NGcopy_from_user.S | 2 +- arch/sparc/lib/NGcopy_to_user.S | 2 +- arch/sparc/lib/NGpatch.S | 4 +- arch/sparc/lib/U1copy_from_user.S | 4 +- arch/sparc/lib/U1copy_to_user.S | 4 +- arch/sparc/lib/U3copy_to_user.S | 2 +- arch/sparc/lib/U3patch.S | 4 +- arch/sparc/lib/copy_in_user.S | 6 +- arch/sparc/lib/copy_user.S | 16 +- arch/tile/include/asm/Kbuild | 1 + arch/tile/include/asm/uaccess.h | 166 +---------- arch/tile/lib/exports.c | 7 +- arch/tile/lib/memcpy_32.S | 41 +-- arch/tile/lib/memcpy_user_64.c | 15 +- arch/um/include/asm/Kbuild | 1 + arch/um/include/asm/uaccess.h | 13 +- arch/um/kernel/skas/uaccess.c | 18 +- arch/unicore32/include/asm/Kbuild | 1 + arch/unicore32/include/asm/uaccess.h | 15 +- arch/unicore32/kernel/ksyms.c | 4 +- arch/unicore32/kernel/process.c | 2 +- arch/unicore32/lib/copy_from_user.S | 16 +- arch/unicore32/lib/copy_to_user.S | 6 +- arch/x86/Kconfig | 1 - arch/x86/include/asm/uaccess.h | 70 +---- arch/x86/include/asm/uaccess_32.h | 127 +-------- arch/x86/include/asm/uaccess_64.h | 128 +-------- arch/x86/lib/usercopy.c | 54 +--- arch/x86/lib/usercopy_32.c | 288 +------------------ arch/x86/lib/usercopy_64.c | 13 - arch/xtensa/include/asm/Kbuild | 1 + arch/xtensa/include/asm/asm-uaccess.h | 3 - arch/xtensa/include/asm/uaccess.h | 67 +---- arch/xtensa/lib/usercopy.S | 116 ++++---- block/bsg.c | 2 +- drivers/scsi/esas2r/esas2r_ioctl.c | 25 +- drivers/scsi/sg.c | 2 +- fs/ocfs2/cluster/tcp.c | 25 +- include/asm-generic/extable.h | 26 ++ include/asm-generic/uaccess.h | 135 +-------- include/linux/uaccess.h | 197 ++++++++++++- include/rdma/ib.h | 2 +- kernel/trace/bpf_trace.c | 2 +- lib/Makefile | 2 +- lib/iov_iter.c | 6 +- lib/usercopy.c | 26 ++ mm/memory.c | 2 +- net/rds/tcp.c | 5 +- net/rds/tcp_send.c | 8 +- security/Kconfig | 9 - security/tomoyo/network.c | 2 +- 176 files changed, 1458 insertions(+), 4335 deletions(-) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-01 3:45 ` [git pull] uaccess-related bits of vfs.git Al Viro @ 2017-05-13 1:00 ` Linus Torvalds 2017-05-13 6:57 ` Al Viro 0 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2017-05-13 1:00 UTC (permalink / raw) To: Al Viro; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 5157 bytes --] So I should have asked earlier, but I was feeling rather busy during the early merge window.. On Sun, Apr 30, 2017 at 8:45 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > uaccess unification pile. It's _not_ the end of uaccess work, but > the next batch of that will go into the next cycle. This one mostly takes > copy_from_user() and friends out of arch/* and gets the zero-padding behaviour > in sync for all architectures. > Dealing with the nocache/writethrough mess is for the next cycle; > fortunately, that's x86-only. So I'm actually more interested to hear if you have any pending work on cleaning up the __get/put_user() mess we have. Is that on your radar at all? In particular, right now, both __get_user() and __put_user() are nasty and broken interfaces. It *used* to be that they were designed to just generate a single instruction. That's not what they currently do at all, due to the whole SMAP/PAN on x86 and arm. For example, on x86, right now a single __put_user() call ends up generating something like #APP # 58 "./arch/x86/include/asm/smap.h" 1 661: 662: .skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90 663: .pushsection .altinstructions,"a" .long 661b - . .long 6641f - . .word ( 9*32+20) .byte 663b-661b .byte 6651f-6641f .byte 663b-662b .popsection .pushsection .altinstr_replacement, "ax" 6641: .byte 0x0f,0x01,0xcb 6651: .popsection # 0 "" 2 #NO_APP xorl %eax, %eax # __pu_err #APP # 269 "fs/readdir.c" 1 1: movq %rcx,8(%rdi) # offset, MEM[(struct __large_struct *)_16] 2: .section .fixup,"ax" 3: mov $-14,%eax #, __pu_err jmp 2b .previous .pushsection "__ex_table","a" .balign 4 .long (1b) - . .long (3b) - . .long (ex_handler_default) - . .popsection # 0 "" 2 # 52 "./arch/x86/include/asm/smap.h" 1 661: 662: .skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90 663: .pushsection .altinstructions,"a" .long 661b - . .long 6641f - . .word ( 9*32+20) .byte 663b-661b .byte 6651f-6641f .byte 663b-662b .popsection .pushsection .altinstr_replacement, "ax" 6641: .byte 0x0f,0x01,0xca 6651: .popsection # 0 "" 2 #NO_APP and while much of that is out-of-line and in different sections, it basically means that it's insane how we inline these things. Yes, yes, the inlined part of the above horror-story ends up being just clac xor %eax,%eax mov %rcx,0x8(%rdi) stac and everything else is just boiler-plate for the alt-instruction handling and the exception handling. But even that small thing is rather debatable from a performance angle: the actual cost of __put_user() these days is not the function call, but the clac/stac (on modern x86) and the PAN bit games (on arm). So I'm actually inclined to just say "We should make __get_user/__put_user() just be aliases for the normal get_user/put_user() model". The *correct* way to do fast user space accesses when you hoist error checking outside is to use if (!access_ok(..)) goto efault; user_access_begin(); .. ... loop over or otherwise do multiple "raw" accessess ... unsafe_get/put_user(c, ptr, got_fault); unsafe_get/put_user(c, ptr, got_fault); ... user_access_end(); return 0; got_fault: user_access_end(); efault: return -EFAULT; which is more boilerplate, but ends up generating much better code. And for "unsafe_put_user()" in particular, we actually could teach gcc to use "asm goto" to really improve code generation. I have patches for that if you want to look. I'm attaching an example patch for "filldir()" that does that modern thing. It almost had the right form as-is anyway, and under some loads (eg "find") filldir actually shows up in profiles too.\ And just from a code generation standpoint, look at what the attached patch does: [torvalds@i7 linux]$ diff -u fs/readdir.s ~/readdir.s | diffstat readdir.s | 548 ++++++++++++++++++++++---------------------------------------- 1 file changed, 201 insertions(+), 347 deletions(-) just from getting rid of that crazy repeated SMAP overhead. Yeah, yeah, doing diffstat on generated assembly files is all kinds of crazy, but it's an example of what kind of odd garbage we currently generate. But the *first* thing I'd like to do would be to just get rid of __get_user/__put_user as a thing. It really does generate nasty code, and we might as well just make it do that function call. Because what we do now isn't right. If we care about performance, the "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And if you don't care about performance, you should use the regular xyz_user() functions that do an ok job by just calling the right-sized helper function instead of inlining the crud. Hmm? Linus [-- Attachment #2: patch.diff --] [-- Type: text/plain, Size: 1378 bytes --] fs/readdir.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/readdir.c b/fs/readdir.c index 9d0212c374d6..03324f54c0e9 100644 --- a/fs/readdir.c +++ b/fs/readdir.c @@ -184,25 +184,27 @@ static int filldir(struct dir_context *ctx, const char *name, int namlen, if (dirent) { if (signal_pending(current)) return -EINTR; - if (__put_user(offset, &dirent->d_off)) - goto efault; } + + user_access_begin(); + if (dirent) + unsafe_put_user(offset, &dirent->d_off, efault_end); dirent = buf->current_dir; - if (__put_user(d_ino, &dirent->d_ino)) - goto efault; - if (__put_user(reclen, &dirent->d_reclen)) - goto efault; + unsafe_put_user(d_ino, &dirent->d_ino, efault_end); + unsafe_put_user(reclen, &dirent->d_reclen, efault_end); + unsafe_put_user(0, dirent->d_name + namlen, efault_end); + unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end); + user_access_end(); + if (copy_to_user(dirent->d_name, name, namlen)) goto efault; - if (__put_user(0, dirent->d_name + namlen)) - goto efault; - if (__put_user(d_type, (char __user *) dirent + reclen - 1)) - goto efault; buf->previous = dirent; dirent = (void __user *)dirent + reclen; buf->current_dir = dirent; buf->count -= reclen; return 0; +efault_end: + user_access_end(); efault: buf->error = -EFAULT; return -EFAULT; ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 1:00 ` Linus Torvalds @ 2017-05-13 6:57 ` Al Viro 2017-05-13 12:05 ` Adam Borowski 2017-05-13 16:15 ` Linus Torvalds 0 siblings, 2 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 6:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: > So I should have asked earlier, but I was feeling rather busy during > the early merge window.. > So I'm actually more interested to hear if you have any pending work > on cleaning up the __get/put_user() mess we have. Is that on your > radar at all? Yes, it is. > In particular, right now, both __get_user() and __put_user() are nasty > and broken interfaces. > > It *used* to be that they were designed to just generate a single > instruction. That's not what they currently do at all, due to the > whole SMAP/PAN on x86 and arm. > > For example, on x86, right now a single __put_user() call ends up > generating something like [snip] > But even that small thing is rather debatable from a performance > angle: the actual cost of __put_user() these days is not the function > call, but the clac/stac (on modern x86) and the PAN bit games (on > arm). > > So I'm actually inclined to just say "We should make > __get_user/__put_user() just be aliases for the normal > get_user/put_user() model". > which is more boilerplate, but ends up generating much better code. > And for "unsafe_put_user()" in particular, we actually could teach gcc > to use "asm goto" to really improve code generation. I have patches > for that if you want to look. > > I'm attaching an example patch for "filldir()" that does that modern > thing. It almost had the right form as-is anyway, and under some loads > (eg "find") filldir actually shows up in profiles too.\ > But the *first* thing I'd like to do would be to just get rid of > __get_user/__put_user as a thing. It really does generate nasty code, > and we might as well just make it do that function call. > > Because what we do now isn't right. If we care about performance, the > "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And > if you don't care about performance, you should use the regular > xyz_user() functions that do an ok job by just calling the right-sized > helper function instead of inlining the crud. > > Hmm? First, some stats: there's a thousand-odd callers of __get_user(). Out of those, about 70% are in arch/, mostly in sigframe-related code. Take a look at the output of git grep -n -w __get_user|grep -v '^arch' | sed -e 's/:.*//'|uniq -c|sort -nr 55 drivers/gpu/drm/drm_ioc32.c 43 drivers/staging/comedi/comedi_compat32.c 35 kernel/compat.c 27 net/compat.c 27 block/compat_ioctl.c 15 drivers/usb/core/devio.c 13 drivers/char/ipmi/ipmi_devintf.c 11 kernel/signal.c 10 fs/fcntl.c 8 ipc/compat.c 8 drivers/video/fbdev/sbuslib.c 7 net/socket.c 7 drivers/gpu/drm/mga/mga_ioc32.c 6 fs/select.c 6 drivers/tty/vt/vt_ioctl.c 5 drivers/tty/vt/selection.c 5 drivers/spi/spidev.c 5 drivers/pci/proc.c 4 kernel/ptrace.c 4 ipc/compat_mq.c 4 drivers/tty/vt/consolemap.c 3 sound/oss/sys_timer.c 3 drivers/media/usb/uvc/uvc_v4l2.c 3 drivers/macintosh/ans-lcd.c and then we have a smallish bunch of files with one or two callers. For __put_user() we have ~1800 callers total, ~1100 of them in arch/* and the rest goes like this: 73 drivers/gpu/drm/drm_ioc32.c 66 ipc/compat.c 58 drivers/gpu/drm/radeon/radeon_ioc32.c 55 block/compat_ioctl.c 52 kernel/compat.c 49 kernel/signal.c 43 drivers/staging/comedi/comedi_compat32.c 31 drivers/gpu/drm/r128/r128_ioc32.c 30 drivers/gpu/drm/mga/mga_ioc32.c 27 fs/signalfd.c 25 fs/readdir.c 24 fs/statfs.c 19 kernel/sys.c 17 net/compat.c 11 drivers/scsi/sg.c 10 fs/fcntl.c 8 sound/core/pcm_native.c 8 fs/binfmt_flat.c 7 fs/binfmt_elf_fdpic.c 6 net/socket.c 6 drivers/char/ipmi/ipmi_devintf.c 5 sound/oss/sys_timer.c 5 fs/binfmt_elf.c 5 drivers/video/fbdev/sbuslib.c 5 drivers/tty/vt/consolemap.c 5 drivers/spi/spidev.c 5 drivers/pci/proc.c 4 kernel/ptrace.c 4 ipc/compat_mq.c 3 fs/select.c 3 drivers/tty/vt/vt.c ... IOW, we have * most of users in arch/* (heavily dominated by signal-related code, both loads and stores). Those need careful massage; maybe unsafe-based solution, maybe something else, but it's obviously per-architecture work and these paths are sensitive. * few places outside of arch/* where these are abused; absolute majority are in ioctl compat code and they are _bad_. Bad on x86, bad on s390, etc. I'm not sure if switch to unsafe is the right solution for those, actually - depends on how we end up dealing with compat ioctls of that sort. Might be better to do bulk copy to/from userland, combined with conversion from 32bit to native kernel-side and passing pointers to kernel objects to functions doing actual work. Really depends upon the details. * some places in fairly common codepaths where __get_user() and __put_user() are being played with. And I certainly agree that they are not good. But IMO the first step is not to convert __get_user()/__put_user() to aliases of get_user()/put_user() - it's to get rid of their infestations outside of arch/*. They are concentrated in relatively few places, so we can deal with those individually and then just fucking ban their use outside of arch/*. That's easy enough to watch for - simple git grep will do, so anything creeping back will be immediately spotted. In -next, with subsequent explanation of the reasons Not To Do That Or Else(tm) to the people responsible. As for fs/readdir.c... 4 back-to-back stores like if (__put_user(ino, &dirent->d_ino)) goto efault; if (__put_user(0, &dirent->d_off)) goto efault; if (__put_user(reclen, &dirent->d_reclen)) goto efault; if (__put_user(d_type, &dirent->d_type)) goto efault; might be asking for copy_to_user(). Maybe that one is too small for that to be a win (on s390 it almost certainly would be a win, judging by what Martin and Heiko are saying); fs/statfs.c ones almost certainly are better off with 'convert, then copy_to_user()' approach. Another couple of places worth looking into are copy_siginfo_to_user() and signalfd_copyinfo(). Maybe unsafe is the best approach there as well, maybe not, but it's in the 'large enough for copy_to_user() be interesting, not too large for auto variable' range. I certainly want that shit gone from the common paths, no arguments here, but I want to avoid having to crawl through every architecture's sigframe handling first. Fortunately, we have that crap outside of arch/* concentrated in a few places and they are far enough from each other to be dealt with independently. There are interesting interplays with set_fs() stuff, BTW... There will be more than enough crap that _will_ require crawls through arch/* ;-/ I'm plotting that pile right now (basically, what pieces should go into never-rebased stem, so that per-architecture branches had what they need). get_user()/put_user() are part of it, it's just that I don't want to bring arch/*/kernel/signal*.c into the mix ;-/ Re asm-goto - which architectures would be able to use that? IOW, which gcc versions are stable enough in that area? I'd obviously like to see your patches... ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 6:57 ` Al Viro @ 2017-05-13 12:05 ` Adam Borowski 2017-05-13 13:46 ` Brian Gerst 2017-05-13 16:46 ` Al Viro 2017-05-13 16:15 ` Linus Torvalds 1 sibling, 2 replies; 28+ messages in thread From: Adam Borowski @ 2017-05-13 12:05 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: > On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: > > But the *first* thing I'd like to do would be to just get rid of > > __get_user/__put_user as a thing. It really does generate nasty code, > > and we might as well just make it do that function call. > > But IMO the first step is not to convert __get_user()/__put_user() to > aliases of get_user()/put_user() - it's to get rid of their infestations > outside of arch/*. They are concentrated in relatively few places, so > we can deal with those individually and then just fucking ban their use > outside of arch/*. That's easy enough to watch for - simple git grep will do, > so anything creeping back will be immediately spotted. As someone from the peanuts gallery, I took a look for __put_user() in my usual haunt, drivers/tty/vt/ * use 1: con_[gs]et_trans_*(): Copies a linear array of 256 bytes/shorts, one by one. The obvious patch has 9 insertions(+), 22 deletions(-). * use 2: con_[gs]et_unimap(): Ditto, up to 65535*2 shorts, also in a nice linear array. * use 3: tioclinux(): Does a __put into a place that was checked only for read. This got me frightened as it initially looked like something that can allow an user to write where they shouldn't. Fortunately, it turns out the first argument to access_ok() is ignored on every single architecture -- why does it even exist then? I imagined it's there for some odd arch that allows writes when in privileged mode, but unlike what the docs say, VERIFY_WRITE is exactly same as VERIFY_READ. Ie, every use in this sample is wrong. I suspect the rest of the kernel should be similar. Meow! -- Don't be racist. White, amber or black, all beers should be judged based solely on their merits. Heck, even if occasionally a cider applies for a beer's job, why not? On the other hand, corpo lager is not a race. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 12:05 ` Adam Borowski @ 2017-05-13 13:46 ` Brian Gerst 2017-05-13 13:46 ` Brian Gerst 2017-05-13 16:46 ` Al Viro 1 sibling, 1 reply; 28+ messages in thread From: Brian Gerst @ 2017-05-13 13:46 UTC (permalink / raw) To: Adam Borowski Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 8:05 AM, Adam Borowski <kilobyte@angband.pl> wrote: > On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: >> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: >> > But the *first* thing I'd like to do would be to just get rid of >> > __get_user/__put_user as a thing. It really does generate nasty code, >> > and we might as well just make it do that function call. >> >> But IMO the first step is not to convert __get_user()/__put_user() to >> aliases of get_user()/put_user() - it's to get rid of their infestations >> outside of arch/*. They are concentrated in relatively few places, so >> we can deal with those individually and then just fucking ban their use >> outside of arch/*. That's easy enough to watch for - simple git grep will do, >> so anything creeping back will be immediately spotted. > > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. The original i386 (no longer supported) ignored the write-protect bit when in supervisor mode. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 13:46 ` Brian Gerst @ 2017-05-13 13:46 ` Brian Gerst 0 siblings, 0 replies; 28+ messages in thread From: Brian Gerst @ 2017-05-13 13:46 UTC (permalink / raw) To: Adam Borowski Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 8:05 AM, Adam Borowski <kilobyte@angband.pl> wrote: > On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: >> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: >> > But the *first* thing I'd like to do would be to just get rid of >> > __get_user/__put_user as a thing. It really does generate nasty code, >> > and we might as well just make it do that function call. >> >> But IMO the first step is not to convert __get_user()/__put_user() to >> aliases of get_user()/put_user() - it's to get rid of their infestations >> outside of arch/*. They are concentrated in relatively few places, so >> we can deal with those individually and then just fucking ban their use >> outside of arch/*. That's easy enough to watch for - simple git grep will do, >> so anything creeping back will be immediately spotted. > > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. The original i386 (no longer supported) ignored the write-protect bit when in supervisor mode. -- Brian Gerst ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 12:05 ` Adam Borowski 2017-05-13 13:46 ` Brian Gerst @ 2017-05-13 16:46 ` Al Viro 1 sibling, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 16:46 UTC (permalink / raw) To: Adam Borowski Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 02:05:27PM +0200, Adam Borowski wrote: > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. It's a remnant of old kludge that never properly worked in the first place. access_ok() should have been called userland_range() - that's all it checks and that's all it *can* check. As it is, each of those __get_user() can bloody well yield -EFAULT. Despite having "passed" access_ok(). Again, the only thing access_ok() checks is that (on architecture with shared address space for userland and kernel) the addresses given are on the userland side. That's _it_ - they can be unmapped, mmapped to broken floppy, whatever; you'll find out when you try to actually copy bytes from from it. What the kludge used to attempt was "let's check that we are not trying to copy into read-only mmapped area - 80386 MMU is fucked in head and won't fault on such stores in ring 0". It had always been racy. Look: thread A: going to copy something to user-supplied address. Do access_ok(). thread A: looks like it's mapped writable, let's go ahead and copy thread A: do something blocking before actually doing __put_user() or __copy_to_user() or whatever it's going to be. thread B: munmap(), mmap() something read-only here. thread A: get to actual __put_user()/__copy_to_user(). 80386: hey, it's ring 0, no need to look at the write protect bit in page tables. What can go wrong, anyway? You can't move any non-static checks to access_ok(). On any architecture. Anything that could change between access_ok() and actual copying can't be checked in access_ok(). As it is, access_ok() has actively misleading calling conventions: * the name implies that having passed access_ok() you don't have to worry about EFAULT * 'direction' argument of that thing reinforces that impression *and* has to be produced by the caller. Most simply pass a constant, which immediately gets dropped (as an aside, take a look at 4b4554f6d - it's amusing), but in some cases it's calculated elsewhere and carefully passed through several levels of call chain. Only to be discarded by access_ok(), of course... > Ie, every use in this sample is wrong. I suspect the rest of the kernel > should be similar. Looking through vt... * con_set_trans_old(): copy_from_user() + loop for doing or with UNI_DIRECT_BASE. Almost certainly will be faster that way - on *any* architecture. * con_get_trans_old(): copy_to_user() would be an obvious optimization. * con_set_trans_new(): copy_from_user(). * con_get_trans_new(): copy_to_user(). * con_set_unimap(): memdup_user() instead of the entire kmalloc_array + loop copying the sucker member by member. With ushort ct you don't need overflow-related logics of kmalloc_array() anyway... * con_get_unimap(): copy_to_user() + put_user() (for uct) * set_selection(): just copy_from_user() into local struct tiocl_selection. * tioclinux(): use put_user(). Yes, you will repeat the same check twice. Once per ioctl(), that involves enough work to make that "recalculate access_ok() once for no reason" non-issue on any architecture. * vt_ioctl(): turn that ushort ll,cc,vlin,clin,vcol,ccol; into struct vt_consize size; or something like that and use copy_from_user() And AFAICS you can lose each and every access_ok() call in there. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 6:57 ` Al Viro 2017-05-13 12:05 ` Adam Borowski @ 2017-05-13 16:15 ` Linus Torvalds 2017-05-13 16:17 ` Linus Torvalds ` (2 more replies) 1 sibling, 3 replies; 28+ messages in thread From: Linus Torvalds @ 2017-05-13 16:15 UTC (permalink / raw) To: Al Viro; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Fri, May 12, 2017 at 11:57 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > First, some stats: there's a thousand-odd callers of __get_user(). Out of > those, about 70% are in arch/, mostly in sigframe-related code. Sure. And they can be trivially converted, and none of them should care at all. > IOW, we have > * most of users in arch/* (heavily dominated by signal-related code, > both loads and stores). Those need careful massage; maybe unsafe-based > solution, maybe something else, but it's obviously per-architecture work > and these paths are sensitive. Why are they sensitive? Why not just do this: git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 :^arch/x86/include/asm/uaccess.h | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' which converts all the x86 uses in one go. Anybody who *relies* on not checking the address_limit is so broken as to be not even funny. And anything that is so performance-sensitive that anybody can even measure the effect of the above we can convert later. Sure, do it in pieces (eg each architecture separately, then "drivers", then "networking", then whatever, until all done), just so that *if* something actually depends on semantics (and that really shouldn't be the case), we have at least some information from a bisect. But I don't see the excuse for not just doing it. If nobody notices, it's an obvious improvement. And if somebody *does* notice, we know how to do it properly with unsafe_xyz_user(), because "__xyz_user()" most definitely isn't it. An example of something that *should* be converted is "csum_partial_copy_from_user()", but it really does need to use "user_access_begin()" and friends, because right now it's using stac/clac for each 16-bit word. That's *expensive*, and it's expensive whether you use __get_user() or get_user(). Adding x86 people just to see how they react to the above "patch". For me, in my fairly minimal personal config, the above cleanup patch shrinks the text size of the resulting kernel by 1.7kB. Yes, most of it is the out-of-line code, but still.. Interestingly, the signal handling code didn't change at all. It looks like only the 32-bit code uses __put_user() for the frame setup. The regular code uses put_user_try/put_user_catch, which is the x86-specific early try at the unsafe versions, but it would actually be improved by using "unsafe_put_user()" and my patch to make that use "asm goto". Linus PS. That "patch" depends on modern git - with older versions of git you need to do the path negation with ":!pattern", and then you need to quote it too since '!' is special for shell. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 16:15 ` Linus Torvalds @ 2017-05-13 16:17 ` Linus Torvalds 2017-05-13 17:00 ` Al Viro 2017-05-14 18:13 ` Ingo Molnar 2 siblings, 0 replies; 28+ messages in thread From: Linus Torvalds @ 2017-05-13 16:17 UTC (permalink / raw) To: Al Viro, the arch/x86 maintainers Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org Oops. *Really* adding the x86 guys now. Blush. Linus On Sat, May 13, 2017 at 9:15 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, May 12, 2017 at 11:57 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> First, some stats: there's a thousand-odd callers of __get_user(). Out of >> those, about 70% are in arch/, mostly in sigframe-related code. > > Sure. And they can be trivially converted, and none of them should care at all. > >> IOW, we have >> * most of users in arch/* (heavily dominated by signal-related code, >> both loads and stores). Those need careful massage; maybe unsafe-based >> solution, maybe something else, but it's obviously per-architecture work >> and these paths are sensitive. > > Why are they sensitive? > > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. > > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. > > Sure, do it in pieces (eg each architecture separately, then > "drivers", then "networking", then whatever, until all done), just so > that *if* something actually depends on semantics (and that really > shouldn't be the case), we have at least some information from a > bisect. > > But I don't see the excuse for not just doing it. If nobody notices, > it's an obvious improvement. And if somebody *does* notice, we know > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > most definitely isn't it. > > An example of something that *should* be converted is > "csum_partial_copy_from_user()", but it really does need to use > "user_access_begin()" and friends, because right now it's using > stac/clac for each 16-bit word. That's *expensive*, and it's expensive > whether you use __get_user() or get_user(). > > Adding x86 people just to see how they react to the above "patch". > > For me, in my fairly minimal personal config, the above cleanup patch > shrinks the text size of the resulting kernel by 1.7kB. Yes, most of > it is the out-of-line code, but still.. > > Interestingly, the signal handling code didn't change at all. It looks > like only the 32-bit code uses __put_user() for the frame setup. The > regular code uses put_user_try/put_user_catch, which is the > x86-specific early try at the unsafe versions, but it would actually > be improved by using "unsafe_put_user()" and my patch to make that use > "asm goto". > > Linus > > PS. That "patch" depends on modern git - with older versions of git > you need to do the path negation with ":!pattern", and then you need > to quote it too since '!' is special for shell. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 16:15 ` Linus Torvalds 2017-05-13 16:17 ` Linus Torvalds @ 2017-05-13 17:00 ` Al Viro 2017-05-13 17:12 ` Al Viro 2017-05-13 17:18 ` Linus Torvalds 2017-05-14 18:13 ` Ingo Molnar 2 siblings, 2 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 17:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote: > > IOW, we have > > * most of users in arch/* (heavily dominated by signal-related code, > > both loads and stores). Those need careful massage; maybe unsafe-based > > solution, maybe something else, but it's obviously per-architecture work > > and these paths are sensitive. > > Why are they sensitive? Because there we do have tons of back-to-back __get_user() (on sigreturn()) or __put_user() (on signal setup). > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. x86 actually tends to use get_user_ex/put_user_ex instead. > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. Except for open-coded probe_kernel_read() somewhere in arch/*; I have not read through those 700+ callers, so I don't know if there's any such. Sure, those won't be performance-sensitive. > And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. > Sure, do it in pieces (eg each architecture separately, then > "drivers", then "networking", then whatever, until all done), just so > that *if* something actually depends on semantics (and that really > shouldn't be the case), we have at least some information from a > bisect. > > But I don't see the excuse for not just doing it. If nobody notices, > it's an obvious improvement. And if somebody *does* notice, we know > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > most definitely isn't it. I think we ought to actually look through those places - there are few enough of them (outside of arch/, that is) and stac/clac overhead is not the only problem they tend to have. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 17:00 ` Al Viro @ 2017-05-13 17:12 ` Al Viro 2017-05-13 17:18 ` Linus Torvalds 1 sibling, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 17:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 06:00:56PM +0100, Al Viro wrote: > > But I don't see the excuse for not just doing it. If nobody notices, > > it's an obvious improvement. And if somebody *does* notice, we know > > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > > most definitely isn't it. > > I think we ought to actually look through those places - there are few > enough of them (outside of arch/, that is) and stac/clac overhead is > not the only problem they tend to have. PS: just to make it clear - I do _not_ propose to keep that shit around indefinitely; I want __get_user()/__put_user() gone by the end of that work. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 17:00 ` Al Viro 2017-05-13 17:12 ` Al Viro @ 2017-05-13 17:18 ` Linus Torvalds 2017-05-13 18:04 ` Al Viro 1 sibling, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2017-05-13 17:18 UTC (permalink / raw) To: Al Viro; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 10:00 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote: >> > IOW, we have >> > * most of users in arch/* (heavily dominated by signal-related code, >> > both loads and stores). Those need careful massage; maybe unsafe-based >> > solution, maybe something else, but it's obviously per-architecture work >> > and these paths are sensitive. >> >> Why are they sensitive? > > Because there we do have tons of back-to-back __get_user() (on sigreturn()) > or __put_user() (on signal setup). It doesn't actually matter. Regular "put_user()" doesn't generate noticeably worse code. It's not a normal function call, it's still an inline asm - so it basically generates the exact same code as __xyz_user(), except it's a call to a fixed copy of the code. So the only difference tends to be (a) the extra call/ret instructions, possibly frame pointers (b) fixed registers (c) the added addr_limit checks but (a) is cheap, and (b) isn't a big deal since with "asm volatile" you can't re-order those things against each other anyway. So (b) ends up being approximately the cost of "one extra lea instruction" for the address generation that would otherwise be in the load/store instruction. And (c) is actually a reason *for* doing it. We've had bugs due to people not getting it right. So there really is basically no performance downside. Even with consecutive uses. The code that uses a function call might even be smaller (ie the 1.7kB savings isn't necessarily all in the out-of-line exception handling cases: the stac/clac instructions are six bytes for each use, so it more than makes up for the call instruction). > x86 actually tends to use get_user_ex/put_user_ex instead. Yes. Because anybody who *really* cared about performance will have converted already. The real cost has been stac/clac for a few years now. So we pretty much know that none of the remaining users are really all that critical. Certainly not "five cycles" kind of critical. >> Anybody who *relies* on not checking the address_limit is so broken as >> to be not even funny. > > Except for open-coded probe_kernel_read() somewhere in arch/*; I have not > read through those 700+ callers, so I don't know if there's any such. .. and those need to be fixed. But I'm not seeing the point in wasting valuable human time in reading through thousands of cases, when we can just automate it and see if anything breaks. Because it will break in a *safe* direction. You'll get an error from bad uses that shouldn't have worked to begin with. And some of the existing cases are just disgusting. There's no excuse for compat code or for ioctl to use the "__" versions. That seems to be the bulk of those uses too. Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 17:18 ` Linus Torvalds @ 2017-05-13 18:04 ` Al Viro 2017-05-13 18:26 ` Al Viro 2017-05-13 19:00 ` Linus Torvalds 0 siblings, 2 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 18:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 10:18:22AM -0700, Linus Torvalds wrote: > > x86 actually tends to use get_user_ex/put_user_ex instead. > > Yes. Because anybody who *really* cared about performance will have > converted already. The real cost has been stac/clac for a few years > now. On x86. Which has (not counting arch/x86/um, which definitely needs a careful look - there __..._user() is the same as ..._user() and costly as hell) all of 12 callers of __get_user() and 31 callers of __put_user(). More than a half of the latter - in cp_stat64() and I would at least try and see if "convert on stack, then copy_to_user" is better for that one. Other than that, all we have is: arch/x86/entry/common.c:372: __get_user(*(u32 *)®s->bp, arch/x86/ia32/ia32_signal.c:124: if (__get_user(set.sig[0], &frame->sc.oldmask) arch/x86/ia32/ia32_signal.c:275: if (__put_user(sig, &frame->sig)) arch/x86/include/asm/xen/page.h:86: return __put_user(val, (unsigned long __user *)addr); arch/x86/include/asm/xen/page.h:91: return __get_user(*val, (unsigned long __user *)addr); arch/x86/kernel/fpu/signal.c:100: err |= __get_user(xfeatures, (__u32 *)&x->header.xfeatures); arch/x86/kernel/fpu/signal.c:115: err |= __put_user(xfeatures, (__u32 *)&x->header.xfeatures); arch/x86/kernel/fpu/signal.c:46: if (__get_user(magic2, (__u32 __user *)(fpstate + fx_sw->xstate_size)) arch/x86/kernel/fpu/signal.c:66: __put_user(xsave->i387.swd, &fp->status) || arch/x86/kernel/fpu/signal.c:67: __put_user(X86_FXSR_MAGIC, &fp->magic)) arch/x86/kernel/fpu/signal.c:72: if (__get_user(swd, &fp->swd) || __put_user(swd, &fp->status)) arch/x86/kernel/fpu/signal.c:72: if (__get_user(swd, &fp->swd) || __put_user(swd, &fp->status)) arch/x86/kernel/fpu/signal.c:93: err |= __put_user(FP_XSTATE_MAGIC2, arch/x86/kernel/ptrace.c:1043: if (__put_user(word, u++)) arch/x86/kernel/ptrace.c:1070: ret = __get_user(word, u++); arch/x86/kernel/ptrace.c:472: if (__put_user(getreg(target, pos), u++)) arch/x86/kernel/ptrace.c:499: ret = __get_user(word, u++); arch/x86/kernel/signal.c:326: if (__put_user(sig, &frame->sig)) arch/x86/kernel/signal.c:347: err |= __put_user(restorer, &frame->pretcode); arch/x86/kernel/signal.c:356: err |= __put_user(*((u64 *)&retcode), (u64 *)frame->retcode); arch/x86/kernel/signal.c:613: if (__get_user(set.sig[0], &frame->sc.oldmask) || (_NSIG_WORDS > 1 arch/x86/kernel/signal.c:647: if (__get_user(uc_flags, &frame->uc.uc_flags)) arch/x86/kernel/signal.c:870: if (__get_user(uc_flags, &frame->uc.uc_flags)) arch/x86/lib/csum-wrappers_64.c:103: *errp = __put_user(val16, (__u16 __user *)dst); arch/x86/lib/csum-wrappers_64.c:45: if (__get_user(val16, (const __u16 __user *)src)) A bunch of strays in signal.c (extending ..._ex use might be a good idea) xen_safe_{write,read}_ulong() (might very well break from adding access_ok() - or be security holes; I'm not familiar enough with that code to tell) and csum_partial_copy_{to,from}_user() - those would need to extend stac/clac pairs already there and switch to unsafe_... Plus several loops in ptrace - might or might not be sensitive, no idea. Plus do_fast_syscall_32(). Might be sensitive, Andy would be the guy to talk to, AFAICS. > And some of the existing cases are just disgusting. There's no excuse > for compat code or for ioctl to use the "__" versions. That seems to > be the bulk of those uses too. Sure. My point is, this stuff needs looking at. Even this quick look in arch/x86 has shown several fairly different classes of that stuff, probably needing different approaches. And that - on an architecture that had tons of TLC around signal delivery; I'm not saying that result is optimal (asm-goto sounds potentially useful there), but it had a lot of attention given to it... ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 18:04 ` Al Viro @ 2017-05-13 18:26 ` Al Viro 2017-05-13 19:11 ` Al Viro 2017-05-13 19:00 ` Linus Torvalds 1 sibling, 1 reply; 28+ messages in thread From: Al Viro @ 2017-05-13 18:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 07:04:13PM +0100, Al Viro wrote: > My point is, this stuff needs looking at. Even this quick look in arch/x86 > has shown several fairly different classes of that stuff, probably needing > different approaches. And that - on an architecture that had tons of TLC > around signal delivery; I'm not saying that result is optimal (asm-goto sounds > potentially useful there), but it had a lot of attention given to it... BTW, even in arch/* they tend to nest. E.g. arch/alpha has 133 callers total. Distribution by files: 35 arch/alpha/kernel/osf_sys.c 92 arch/alpha/kernel/signal.c 1 arch/alpha/kernel/traps.c 4 arch/alpha/lib/csum_partial_copy.c 1 arch/alpha/mm/fault.c Distribution by functions: 1 osf_getdomainname() [1] 2 osf_sigstack() 2 get_tv32() 2 put_tv32() 4 get_it32() 4 put_it32() 2 osf_select() 18 osf_wait4() [2] 6 osf_sigaction() 34 restore_sigcontext() 1 do_sigreturn() 42 setup_sigcontext() 3 setup_frame() 6 setup_rt_frame() 1 dik_show_code() [3] 2 csum_partial_cfu_aligned() 2 csum_partial_cfu_src_aligned() 1 do_page_fault() [4] [1] insane, BTW - should be strnlen() + copy_to_user(); should report -EFAULT on failure, while we are at it. [2] with fairly disgusting use of set_fs() in the mix. [3] would break with get_user() - it's oopser fetching code to printk. [4] this: /* As of EV6, a load into $31/$f31 is a prefetch, and never faults (or is suppressed by the PALcode). Support that for older CPUs by ignoring such an instruction. */ if (cause == 0) { unsigned int insn; __get_user(insn, (unsigned int __user *)regs->pc); if ((insn >> 21 & 0x1f) == 0x1f && /* ldq ldl ldt lds ldg ldf ldwu ldbu */ (1ul << (insn >> 26) & 0x30f00001400ul)) { regs->pc += 4; return; } } ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 18:26 ` Al Viro @ 2017-05-13 19:11 ` Al Viro 2017-05-13 19:34 ` Al Viro 0 siblings, 1 reply; 28+ messages in thread From: Al Viro @ 2017-05-13 19:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 07:26:56PM +0100, Al Viro wrote: > BTW, even in arch/* they tend to nest. E.g. arch/alpha has 133 callers > total. Distribution by files: > 35 arch/alpha/kernel/osf_sys.c > 92 arch/alpha/kernel/signal.c > 1 arch/alpha/kernel/traps.c > 4 arch/alpha/lib/csum_partial_copy.c > 1 arch/alpha/mm/fault.c Another large pile is on sparc (208 callers). Except that on sparc64 access_ok() is constant 1, which reduces it to 42 callers. 3 arch/sparc/kernel/ptrace_32.c (all in arch_ptrace()) 31 arch/sparc/kernel/signal_32.c (5 in do_sigreturn(), 6 in do_rt_sigreturn(), 8 in setup_frame(), 11 in setup_rt_frame(), 1 in do_sys_sigstack()) 7 arch/sparc/kernel/sigutil_32.c (2 in save_fpu_state(), 2 in restore_fpu_state(), 2 in save_rwin_state(), 1 in restore_rwin_state() 1 arch/sparc/mm/fault_32.c (in compute_si_addr()) The last one ignores -EFAULT, BTW - not sure what should it have done in that case, though. It's calculation of ->si_addr on SIGSEGV and SIGBUS in case of data access SIGSEGV or SIGBUS; it wants to peek into insn to figure out the effective address. arch_ptrace() one is zeroing several fields in userland struct fps for PTRACE_GETFPREGS. Could've been put_user() (or clear_user(), for that matter); we'd just done copy_regset_to_user() on adjacent addresses, so the lack of access_ok() is not a security hole there). Everything else is in sigframe handling... Other large piles are on mips, ppc and itanic. parisc is next, but there access_ok() is constant 1 as well. Same for s390 and m68k. nios2 and unicore32 are a bit under 80 callers each and the next are tile (~60), sh (~50) and then it drops off fast. It's not impossible to review; the problem is in figuring out which codepaths are hot enough to worry about them. And I'm even more convinced that bulk search-and-replace is the right approach here; we probably _can_ get rid of that shit this cycle (or, at least, reduce its use to very few places in arch/*), but that'll require some cooperation from architecture maintainers. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 19:11 ` Al Viro @ 2017-05-13 19:34 ` Al Viro 0 siblings, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 19:34 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 08:11:42PM +0100, Al Viro wrote: > It's not impossible to review; the problem is in figuring out which codepaths > are hot enough to worry about them. And I'm even more convinced that > bulk search-and-replace is the right approach here; we probably _can_ get ^- not apologies for sloppy editing... > rid of that shit this cycle (or, at least, reduce its use to very few places > in arch/*), but that'll require some cooperation from architecture maintainers. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 18:04 ` Al Viro 2017-05-13 18:26 ` Al Viro @ 2017-05-13 19:00 ` Linus Torvalds 2017-05-13 19:17 ` Al Viro ` (2 more replies) 1 sibling, 3 replies; 28+ messages in thread From: Linus Torvalds @ 2017-05-13 19:00 UTC (permalink / raw) To: Al Viro; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 11:04 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > My point is, this stuff needs looking at. No. You don't have a point. I've tried to explain that there's no performance difference, and there is no way in hell that the current "__" versions are better. So what's the point in looking? All you are ever going to come up with is theoretical "this might" cases. The only sane forward is to just get rid of them. At that point, you *may* end up actually noticing something, but if you do, it's not a "you might" issue. There's literally no upside to this "needs looking at". There are only downsides. And one major downside of your "needs looking at". is this one: From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 24 Mar 2015 10:42:18 -0700 > > So I'd suggest we should just do a wholesale replacement of > __copy_to/from_user() with the non-underlined cases. Then, we could > switch insividual ones back - with reasoning of why they matter, and > with pointers to how it does access_ok() two lines before. > > We should probably even consider looking at __get_user/__put_user(). > Few of them are actually performance-critical. Look at that date. It's over two years ago. In the intervening two years, how many of those conversions have happened? Here's a hint: it's a very very round number. Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 19:00 ` Linus Torvalds @ 2017-05-13 19:17 ` Al Viro 2017-05-13 19:56 ` Al Viro 2017-05-13 20:37 ` Al Viro 2 siblings, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 19:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > On Sat, May 13, 2017 at 11:04 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > My point is, this stuff needs looking at. > > No. > > You don't have a point. I've tried to explain that there's no > performance difference, and there is no way in hell that the current > "__" versions are better. > > So what's the point in looking? All you are ever going to come up with > is theoretical "this might" cases. Umm... Just during this subthread - a couple of likely breakages in xen (which would need looking into, right?) and "oopsen on alpha would stop actually printing code", which is better to spot early. The first one might be theoretical, but it's worth giving heads-up to xen folks; I'm fairly sure that this spot will break. The second is not theoretical at all - it *will* break. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 19:00 ` Linus Torvalds 2017-05-13 19:17 ` Al Viro @ 2017-05-13 19:56 ` Al Viro 2017-05-13 20:08 ` Al Viro 2017-05-13 20:37 ` Al Viro 2 siblings, 1 reply; 28+ messages in thread From: Al Viro @ 2017-05-13 19:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > > We should probably even consider looking at __get_user/__put_user(). > > Few of them are actually performance-critical. > > Look at that date. It's over two years ago. In the intervening two > years, how many of those conversions have happened? > > Here's a hint: it's a very very round number. FWIW, just this cycle (this one I remembered off-hand, there might be more): commit edb88cef0570914375d461107759cf0d6d677ed5 Author: Arnd Bergmann <arnd@arndb.de> Date: Sat Apr 22 00:02:31 2017 +0200 scsi: pmcraid: use normal copy_from_user As pointed out by Al Viro for my previous series, the driver has no need to call access_ok() and __copy_from_user()/__copy_to_user(). Changing it to regular copy_from_user()/copy_to_user() simplifies the code without any real downsides, making it less error-prone at best. This patch by itself also addresses the warning about the access_ok() macro on MIPS, but both fixes improve the code, so ideally we apply them both. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 19:56 ` Al Viro @ 2017-05-13 20:08 ` Al Viro 2017-05-13 20:32 ` Geert Uytterhoeven 0 siblings, 1 reply; 28+ messages in thread From: Al Viro @ 2017-05-13 20:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > FWIW, just this cycle (this one I remembered off-hand, there might be > more): And looking through my queue (will be pushed to -next as soon as -rc1 goes out): commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu Apr 20 15:47:34 2017 -0400 spidev: quit messing with access_ok() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c index 9e2e099baf8c..8dd22de5e3b5 100644 --- a/drivers/spi/spidev.c +++ b/drivers/spi/spidev.c @@ -254,10 +254,6 @@ static int spidev_message(struct spidev_data *spidev, goto done; } k_tmp->rx_buf = rx_buf; - if (!access_ok(VERIFY_WRITE, (u8 __user *) - (uintptr_t) u_tmp->rx_buf, - u_tmp->len)) - goto done; rx_buf += k_tmp->len; } if (u_tmp->tx_buf) { @@ -305,7 +301,7 @@ static int spidev_message(struct spidev_data *spidev, rx_buf = spidev->rx_buffer; for (n = n_xfers, u_tmp = u_xfers; n; n--, u_tmp++) { if (u_tmp->rx_buf) { - if (__copy_to_user((u8 __user *) + if (copy_to_user((u8 __user *) (uintptr_t) u_tmp->rx_buf, rx_buf, u_tmp->len)) { status = -EFAULT; @@ -325,8 +321,7 @@ static struct spi_ioc_transfer * spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc, unsigned *n_ioc) { - struct spi_ioc_transfer *ioc; - u32 tmp; + u32 size; /* Check type, command number and direction */ if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC @@ -334,22 +329,15 @@ spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc, || _IOC_DIR(cmd) != _IOC_WRITE) return ERR_PTR(-ENOTTY); - tmp = _IOC_SIZE(cmd); + size = _IOC_SIZE(cmd); if ((tmp % sizeof(struct spi_ioc_transfer)) != 0) return ERR_PTR(-EINVAL); - *n_ioc = tmp / sizeof(struct spi_ioc_transfer); + *n_ioc = size / sizeof(struct spi_ioc_transfer); if (*n_ioc == 0) return NULL; /* copy into scratch area */ - ioc = kmalloc(tmp, GFP_KERNEL); - if (!ioc) - return ERR_PTR(-ENOMEM); - if (__copy_from_user(ioc, u_ioc, tmp)) { - kfree(ioc); - return ERR_PTR(-EFAULT); - } - return ioc; + return memdup_user(u_ioc, size); } static long @@ -367,19 +355,6 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC) return -ENOTTY; - /* Check access direction once here; don't repeat below. - * IOC_DIR is from the user perspective, while access_ok is - * from the kernel perspective; so they look reversed. - */ - if (_IOC_DIR(cmd) & _IOC_READ) - err = !access_ok(VERIFY_WRITE, - (void __user *)arg, _IOC_SIZE(cmd)); - if (err == 0 && _IOC_DIR(cmd) & _IOC_WRITE) - err = !access_ok(VERIFY_READ, - (void __user *)arg, _IOC_SIZE(cmd)); - if (err) - return -EFAULT; - /* guard against device removal before, or while, * we issue this ioctl. */ @@ -402,31 +377,31 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) switch (cmd) { /* read requests */ case SPI_IOC_RD_MODE: - retval = __put_user(spi->mode & SPI_MODE_MASK, + retval = put_user(spi->mode & SPI_MODE_MASK, (__u8 __user *)arg); break; case SPI_IOC_RD_MODE32: - retval = __put_user(spi->mode & SPI_MODE_MASK, + retval = put_user(spi->mode & SPI_MODE_MASK, (__u32 __user *)arg); break; case SPI_IOC_RD_LSB_FIRST: - retval = __put_user((spi->mode & SPI_LSB_FIRST) ? 1 : 0, + retval = put_user((spi->mode & SPI_LSB_FIRST) ? 1 : 0, (__u8 __user *)arg); break; case SPI_IOC_RD_BITS_PER_WORD: - retval = __put_user(spi->bits_per_word, (__u8 __user *)arg); + retval = put_user(spi->bits_per_word, (__u8 __user *)arg); break; case SPI_IOC_RD_MAX_SPEED_HZ: - retval = __put_user(spidev->speed_hz, (__u32 __user *)arg); + retval = put_user(spidev->speed_hz, (__u32 __user *)arg); break; /* write requests */ case SPI_IOC_WR_MODE: case SPI_IOC_WR_MODE32: if (cmd == SPI_IOC_WR_MODE) - retval = __get_user(tmp, (u8 __user *)arg); + retval = get_user(tmp, (u8 __user *)arg); else - retval = __get_user(tmp, (u32 __user *)arg); + retval = get_user(tmp, (u32 __user *)arg); if (retval == 0) { u32 save = spi->mode; @@ -445,7 +420,7 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; case SPI_IOC_WR_LSB_FIRST: - retval = __get_user(tmp, (__u8 __user *)arg); + retval = get_user(tmp, (__u8 __user *)arg); if (retval == 0) { u32 save = spi->mode; @@ -462,7 +437,7 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; case SPI_IOC_WR_BITS_PER_WORD: - retval = __get_user(tmp, (__u8 __user *)arg); + retval = get_user(tmp, (__u8 __user *)arg); if (retval == 0) { u8 save = spi->bits_per_word; @@ -475,7 +450,7 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; case SPI_IOC_WR_MAX_SPEED_HZ: - retval = __get_user(tmp, (__u32 __user *)arg); + retval = get_user(tmp, (__u32 __user *)arg); if (retval == 0) { u32 save = spi->max_speed_hz; @@ -525,8 +500,6 @@ spidev_compat_ioc_message(struct file *filp, unsigned int cmd, struct spi_ioc_transfer *ioc; u_ioc = (struct spi_ioc_transfer __user *) compat_ptr(arg); - if (!access_ok(VERIFY_READ, u_ioc, _IOC_SIZE(cmd))) - return -EFAULT; /* guard against device removal before, or while, * we issue this ioctl. ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 20:08 ` Al Viro @ 2017-05-13 20:32 ` Geert Uytterhoeven 2017-05-13 20:32 ` Geert Uytterhoeven 2017-05-13 20:45 ` Al Viro 0 siblings, 2 replies; 28+ messages in thread From: Geert Uytterhoeven @ 2017-05-13 20:32 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org Hi Al, On Sat, May 13, 2017 at 10:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > >> FWIW, just this cycle (this one I remembered off-hand, there might be >> more): > > And looking through my queue (will be pushed to -next as soon as -rc1 goes > out): > > commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 > Author: Al Viro <viro@zeniv.linux.org.uk> > Date: Thu Apr 20 15:47:34 2017 -0400 > > spidev: quit messing with access_ok() > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > > diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c > index 9e2e099baf8c..8dd22de5e3b5 100644 > --- a/drivers/spi/spidev.c > +++ b/drivers/spi/spidev.c You better run that one through linux-spi, to avoid conflicts, cfr. https://patchwork.kernel.org/patch/9714993/ Gr{oetje,eeting}s, Geert ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 20:32 ` Geert Uytterhoeven @ 2017-05-13 20:32 ` Geert Uytterhoeven 2017-05-13 20:45 ` Al Viro 1 sibling, 0 replies; 28+ messages in thread From: Geert Uytterhoeven @ 2017-05-13 20:32 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org Hi Al, On Sat, May 13, 2017 at 10:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > >> FWIW, just this cycle (this one I remembered off-hand, there might be >> more): > > And looking through my queue (will be pushed to -next as soon as -rc1 goes > out): > > commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 > Author: Al Viro <viro@zeniv.linux.org.uk> > Date: Thu Apr 20 15:47:34 2017 -0400 > > spidev: quit messing with access_ok() > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > > diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c > index 9e2e099baf8c..8dd22de5e3b5 100644 > --- a/drivers/spi/spidev.c > +++ b/drivers/spi/spidev.c You better run that one through linux-spi, to avoid conflicts, cfr. https://patchwork.kernel.org/patch/9714993/ Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 20:32 ` Geert Uytterhoeven 2017-05-13 20:32 ` Geert Uytterhoeven @ 2017-05-13 20:45 ` Al Viro 1 sibling, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 20:45 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 10:32:31PM +0200, Geert Uytterhoeven wrote: > You better run that one through linux-spi, to avoid conflicts, cfr. > https://patchwork.kernel.org/patch/9714993/ What I'm going to do is never-rebased #for-spi (well, never after -rc1) merged into #work.uaccess and proposed for pull to linux-spi. The local tree is a mess right now - bloody lot of branches, huge unsorted pile, etc. This was from #unsorted, actually - should've picked the one from #for-spi (a couple of brainos fixed in the version there)... ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 19:00 ` Linus Torvalds 2017-05-13 19:17 ` Al Viro 2017-05-13 19:56 ` Al Viro @ 2017-05-13 20:37 ` Al Viro 2017-05-13 20:52 ` Linus Torvalds 2 siblings, 1 reply; 28+ messages in thread From: Al Viro @ 2017-05-13 20:37 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > From: Linus Torvalds <torvalds@linux-foundation.org> > Date: Tue, 24 Mar 2015 10:42:18 -0700 > > > > So I'd suggest we should just do a wholesale replacement of > > __copy_to/from_user() with the non-underlined cases. Then, we could > > switch insividual ones back - with reasoning of why they matter, and > > with pointers to how it does access_ok() two lines before. > > > > We should probably even consider looking at __get_user/__put_user(). > > Few of them are actually performance-critical. > > Look at that date. It's over two years ago. In the intervening two > years, how many of those conversions have happened? Speaking of killing that kind of crap off: there was a question left from the last cycle that hadn't been sorted out. SCTP does this in a couple of places: /* Check the user passed a healthy pointer. */ if (unlikely(!access_ok(VERIFY_READ, addrs, addrs_size))) return -EFAULT; /* Alloc space for the address array in kernel memory. */ kaddrs = kmalloc(addrs_size, GFP_USER | __GFP_NOWARN); if (unlikely(!kaddrs)) return -ENOMEM; if (__copy_from_user(kaddrs, addrs, addrs_size)) { kfree(kaddrs); return -EFAULT; } instead of memdup_user(). Part of the rationale is pretty weak (access_ok() as sanity check to prevent user-triggerable attempts to allocate too much - it still can trivially trigger 2G, so it's not worth much), part is more interesting. Namely, that whining into the syslog shouldn't be that easy to trigger. That's a valid point and it might apply to memdup_user() callers out there. Potential variants: * add an explicit upper bound on the size and turn that into memdup_user() (and check that all memdup_user() callers are bounded). * have memdup_user() itself pass __GFP_NOWARN. * add kvmemdup_user() that would use kvmalloc() (with its callers expected to use kvfree()); see who else might benefit from conversion. Preferences? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 20:37 ` Al Viro @ 2017-05-13 20:52 ` Linus Torvalds 2017-05-13 21:25 ` Al Viro 0 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2017-05-13 20:52 UTC (permalink / raw) To: Al Viro; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 1:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > That's a valid point and it might apply to memdup_user() callers out there. > Potential variants: > * add an explicit upper bound on the size and turn that into > memdup_user() (and check that all memdup_user() callers are bounded). > * have memdup_user() itself pass __GFP_NOWARN. > * add kvmemdup_user() that would use kvmalloc() (with its callers > expected to use kvfree()); see who else might benefit from conversion. All of the above sound reasonable. I wouldn't change the existing "memdup_user()" interface itself, but if there really are users that can validly pass in a maxbyte value, why not add a new helper: void *memdup_user_limit(userptr, nmember, nsize, maxsize); and then have #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1) or something. I definitely see a couple of memdup_user() people who do that "num*size" multiplication by hand, and it's very easy to get wrong and have an overflow. And for a kvmalloc/kvfree() interface, you *definitely* want that maxsize thing, since it absolutely needs an upper limit. Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 20:52 ` Linus Torvalds @ 2017-05-13 21:25 ` Al Viro 0 siblings, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-13 21:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sat, May 13, 2017 at 01:52:29PM -0700, Linus Torvalds wrote: > I wouldn't change the existing "memdup_user()" interface itself, but > if there really are users that can validly pass in a maxbyte value, > why not add a new helper: > > void *memdup_user_limit(userptr, nmember, nsize, maxsize); > > and then have > > #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1) > > or something. I definitely see a couple of memdup_user() people who do > that "num*size" multiplication by hand, and it's very easy to get > wrong and have an overflow. > > And for a kvmalloc/kvfree() interface, you *definitely* want that > maxsize thing, since it absolutely needs an upper limit. *nod* Speaking of insanities around open-coded memdup_user()... Enjoy: ias_opt = kmalloc(sizeof(struct irda_ias_set), GFP_ATOMIC); if (ias_opt == NULL) { err = -ENOMEM; goto out; } /* Copy query to the driver. */ if (copy_from_user(ias_opt, optval, optlen)) { Can't have it block, sir, has to be GFP_ATOMIC... Whaddya mean, "what if copy_from_user() blocks?" As far as I can see, that came in circa 2.4.6, when local sturct irda_ias_set got switched to dynamic allocation... ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-13 16:15 ` Linus Torvalds 2017-05-13 16:17 ` Linus Torvalds 2017-05-13 17:00 ` Al Viro @ 2017-05-14 18:13 ` Ingo Molnar 2017-05-14 18:57 ` Al Viro 2 siblings, 1 reply; 28+ messages in thread From: Ingo Molnar @ 2017-05-14 18:13 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Linux Kernel Mailing List, linux-arch@vger.kernel.org * Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, May 12, 2017 at 11:57 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > First, some stats: there's a thousand-odd callers of __get_user(). Out of > > those, about 70% are in arch/, mostly in sigframe-related code. > > Sure. And they can be trivially converted, and none of them should care at all. > > > IOW, we have > > * most of users in arch/* (heavily dominated by signal-related code, > > both loads and stores). Those need careful massage; maybe unsafe-based > > solution, maybe something else, but it's obviously per-architecture work > > and these paths are sensitive. > > Why are they sensitive? > > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. > > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. I'd say that the CLAC/STAC addition pretty much killed any argument in favor of "optimized" __get_user() code, so I'd be very happy to see these interfaces gone altogether. So as far as x86 usage goes: Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [git pull] uaccess-related bits of vfs.git 2017-05-14 18:13 ` Ingo Molnar @ 2017-05-14 18:57 ` Al Viro 0 siblings, 0 replies; 28+ messages in thread From: Al Viro @ 2017-05-14 18:57 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org On Sun, May 14, 2017 at 08:13:56PM +0200, Ingo Molnar wrote: > I'd say that the CLAC/STAC addition pretty much killed any argument in favor of > "optimized" __get_user() code, so I'd be very happy to see these interfaces gone > altogether. You and everybody else - these interfaces suck. If anything, we want paired brackets around a series of accesses instead of a single check in front of it. > So as far as x86 usage goes: > > Acked-by: Ingo Molnar <mingo@kernel.org> Umm... Could you elaborate the situation with xen/page.h stuff? I don't see any obvious reasons that would guaratee that addresses passed to __get_user() and __put_user() there would match the set_fs() state. It might very well be true, but it's not obvious from that code... BTW, does anybody have a suggestion regarding a test load that would hit wait4/waitid as hard as possible? I've turned sys_wait4/sys_waitid into long kernel_wait4(pid_t upid, int *stat_addr, int options, struct rusage *ru) and static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop, int options, struct rusage *ru) (with struct waitid_info { pid_t pid; uid_t uid; int status; int why; };), so that all copying to userland is done in sys_wait4() and friends. It seems to survive testing without any noticable slowdowns, but that's just LTP and xfstests - and a bug in my earlier version of that was _not_ caught by the LTP side; xfstests caught it... So any extra tests (both for correctness and timing) would be very much appreciated... ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2017-05-14 18:57 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CA+55aFww=ZG0wPad3ELg+ibScr0eWuSxvhGLFaF+PO9kfkSkdw@mail.gmail.com>
2017-05-01 3:45 ` [git pull] uaccess-related bits of vfs.git Al Viro
2017-05-13 1:00 ` Linus Torvalds
2017-05-13 6:57 ` Al Viro
2017-05-13 12:05 ` Adam Borowski
2017-05-13 13:46 ` Brian Gerst
2017-05-13 13:46 ` Brian Gerst
2017-05-13 16:46 ` Al Viro
2017-05-13 16:15 ` Linus Torvalds
2017-05-13 16:17 ` Linus Torvalds
2017-05-13 17:00 ` Al Viro
2017-05-13 17:12 ` Al Viro
2017-05-13 17:18 ` Linus Torvalds
2017-05-13 18:04 ` Al Viro
2017-05-13 18:26 ` Al Viro
2017-05-13 19:11 ` Al Viro
2017-05-13 19:34 ` Al Viro
2017-05-13 19:00 ` Linus Torvalds
2017-05-13 19:17 ` Al Viro
2017-05-13 19:56 ` Al Viro
2017-05-13 20:08 ` Al Viro
2017-05-13 20:32 ` Geert Uytterhoeven
2017-05-13 20:32 ` Geert Uytterhoeven
2017-05-13 20:45 ` Al Viro
2017-05-13 20:37 ` Al Viro
2017-05-13 20:52 ` Linus Torvalds
2017-05-13 21:25 ` Al Viro
2017-05-14 18:13 ` Ingo Molnar
2017-05-14 18:57 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).