From: Al Viro <viro@ZenIV.linux.org.uk>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch <linux-arch@vger.kernel.org>,
	Ralf Baechle <ralf@linux-mips.org>,
	James Hogan <jhogan@kernel.org>,
	linux-mips <linux-mips@linux-mips.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	ppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	sparclinux@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Jiri Slaby <jslaby@suse.com>
Subject: Re: [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers
Date: Mon, 26 Mar 2018 04:47:50 +0100
Message-ID: <20180326034750.GN30522@ZenIV.linux.org.uk>
In-Reply-To: <20180326004017.GA2211@ZenIV.linux.org.uk>

On Mon, Mar 26, 2018 at 01:40:17AM +0100, Al Viro wrote:

> Kinda-sorta part:
> 	* asmlinkage_protect is taken out for now, so m68k has problems.
> 	* syscalls that run out of 6 slots barf violently.  For mips it's
> wrong (there we have 8 slots); for stuff like arm and ppc it's right, but
> it means that things like e.g. compat sync_file_range() should not even
> be compiled on those.  __ARCH_WANT_SYS_SYNC_FILE_RANGE, presumably...
> In any case, we *can't* do pt_regs-based wrappers for those syscalls on
> such architectures, so ifdefs around those puppies are probably the right
> thing to do.
> 	* s390 macrology in compat_wrapper.c not even touched; it needs
> a trivial update to keep working (__MAP callbacks take an extra argument,
> unused for those users).
> 	* sys_... and compat_sys_... aliases are unchanged; if we kill
> direct callers, we can trivially rename SyS##name and compat_SyS##name
> to sys##name and compat_sys##name and get rid of aliases.

	* mips n32 and x86 x32 can become an extra source of headache.
That actually applies to any plans of passing struct pt_regs *.  As it
is, e.g. syscall 515 on amd64 is compat_sys_readv().  Dispatched via
this:
        /*
         * NB: Native and x32 syscalls are dispatched from the same
         * table.  The only functional difference is the x32 bit in
         * regs->orig_ax, which changes the behavior of some syscalls.
         */
        if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
                nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
                regs->ax = sys_call_table[nr](
                        regs->di, regs->si, regs->dx,
                        regs->r10, regs->r8, regs->r9);
        }
Now, syscall 145 via 32bit call is *also* compat_sys_readv(), dispatched
via
                nr = array_index_nospec(nr, IA32_NR_syscalls);
                /*
                 * It's possible that a 32-bit syscall implementation
                 * takes a 64-bit parameter but nonetheless assumes that
                 * the high bits are zero.  Make sure we zero-extend all
                 * of the args.
                 */
                regs->ax = ia32_sys_call_table[nr](
                        (unsigned int)regs->bx, (unsigned int)regs->cx,
                        (unsigned int)regs->dx, (unsigned int)regs->si,
                        (unsigned int)regs->di, (unsigned int)regs->bp);
Right now it works - we call the same function, passing it arguments picked
from a different set of registers (di/si/dx in the x32 case, bx/cx/dx in the
i386 one).  But if we switch to passing struct pt_regs * and have the wrapper
fetch regs->{bx,cx,dx}, we have a problem.  It won't work for both entry points.
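
To make the failure concrete, here is a minimal userspace sketch, not kernel
code.  Everything in it (struct fake_pt_regs, fake_readv_body,
fake_readv_wrapper, the demo_* helpers) is a hypothetical stand-in invented for
illustration; the only thing taken from the above is which registers each entry
point uses.  The wrapper is hardwired to bx/cx/dx, so it sees the right fd on
the i386 path and whatever happens to sit in bx on the x32 path.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for the x86-64 struct pt_regs. */
struct fake_pt_regs {
	unsigned long bx, cx, dx, si, di, bp, r8, r9, r10;
};

/*
 * Stand-in for the syscall body.  It returns its first argument so that
 * a caller can see which register the value was taken from.
 */
static long fake_readv_body(unsigned long fd, unsigned long vec,
			    unsigned long vlen)
{
	(void)vec;
	(void)vlen;
	return (long)fd;
}

/* A pt_regs-taking wrapper hardwired to the i386 register convention. */
static long fake_readv_wrapper(const struct fake_pt_regs *regs)
{
	return fake_readv_body((unsigned int)regs->bx,
			       (unsigned int)regs->cx,
			       (unsigned int)regs->dx);
}

/* 32bit entry: userland put fd/vec/vlen into bx/cx/dx. */
static long demo_i386_entry(void)
{
	struct fake_pt_regs regs = { .bx = 3, .cx = 0x1000, .dx = 2 };

	return fake_readv_wrapper(&regs);
}

/* x32 entry: fd/vec/vlen are in di/si/dx; bx holds unrelated state. */
static long demo_x32_entry(void)
{
	struct fake_pt_regs regs = {
		.di = 3, .si = 0x1000, .dx = 2, .bx = 0xdead,
	};

	return fake_readv_wrapper(&regs);
}
```

In this model demo_i386_entry() hands back the fd that was passed, while
demo_x32_entry() hands back the junk left in bx.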

IMO it's a good reason to have dispatcher(s) handle extraction from pt_regs
and let the wrapper deal with the resulting 6 u64 or 6 u32, normalizing
them and arranging them into arguments expected by syscall body.
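
A rough userspace sketch of that split, under the same caveats: all names
below (fake_pt_regs, fake_slot_fn, fake_readv_glue, fake_dispatch_x32,
fake_dispatch_ia32) are made up for illustration and the body is a dummy.
Each dispatcher knows its own register convention and produces six slots;
the single piece of glue only normalizes the slots and calls the body, so
it can sit in both tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the x86-64 struct pt_regs. */
struct fake_pt_regs {
	uint64_t bx, cx, dx, si, di, bp, r8, r9, r10;
};

/* What a table entry looks like in this model: six raw slots in. */
typedef long (*fake_slot_fn)(uint64_t, uint64_t, uint64_t,
			     uint64_t, uint64_t, uint64_t);

/* Dummy syscall body; folds fd and vlen into the result for checking. */
static long fake_readv_body(uint32_t fd, uint32_t vec, uint32_t vlen)
{
	(void)vec;
	return (long)fd * 100 + (long)vlen;
}

/* One piece of glue: normalize the slots, arrange them for the body. */
static long fake_readv_glue(uint64_t a0, uint64_t a1, uint64_t a2,
			    uint64_t a3, uint64_t a4, uint64_t a5)
{
	(void)a3;
	(void)a4;
	(void)a5;
	return fake_readv_body((uint32_t)a0, (uint32_t)a1, (uint32_t)a2);
}

/* Native/x32 style dispatcher: slots come from di/si/dx/r10/r8/r9. */
static long fake_dispatch_x32(const struct fake_pt_regs *regs,
			      fake_slot_fn fn)
{
	return fn(regs->di, regs->si, regs->dx,
		  regs->r10, regs->r8, regs->r9);
}

/* i386 style dispatcher: slots come from bx/cx/dx/si/di/bp, truncated. */
static long fake_dispatch_ia32(const struct fake_pt_regs *regs,
			       fake_slot_fn fn)
{
	return fn((uint32_t)regs->bx, (uint32_t)regs->cx,
		  (uint32_t)regs->dx, (uint32_t)regs->si,
		  (uint32_t)regs->di, (uint32_t)regs->bp);
}
```

Passing fake_readv_glue to either dispatcher gives the same result for the
same logical arguments, whichever registers they arrived in.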

Linus, Dominik - how do you plan to deal with that fun?  Regardless of the
way we generate the glue, the issue remains.  We can't use the same
struct pt_regs *-taking function for both; we either need to produce
a separate chunk of glue for each compat_sys_... involved (either making
COMPAT_SYSCALL_DEFINE generate both, or having duplicate X32_SYSCALL_DEFINE
for each of those COMPAT_SYSCALL_DEFINE - with identical body, at that)
or we need to have the registers-to-slots mapping done in dispatcher...
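
For comparison, a sketch of the first alternative, where one definition
emits two pieces of glue around a single body.  FAKE_COMPAT_DEFINE3 and the
fake_* names are hypothetical and this is a userspace model only; it is not
what COMPAT_SYSCALL_DEFINE does or would have to look like, just an
illustration of "generate both" with integer-only argument types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the x86-64 struct pt_regs. */
struct fake_pt_regs {
	uint64_t bx, cx, dx, si, di, bp, r8, r9, r10;
};

/*
 * Emit two pt_regs-taking entry points per definition, one per register
 * convention, both calling the same body.  Integer argument types only.
 */
#define FAKE_COMPAT_DEFINE3(name, t0, a0, t1, a1, t2, a2)		\
	static long fake_body_##name(t0 a0, t1 a1, t2 a2);		\
	static long fake_ia32_##name(const struct fake_pt_regs *regs)	\
	{								\
		return fake_body_##name((t0)regs->bx, (t1)regs->cx,	\
					(t2)regs->dx);			\
	}								\
	static long fake_x32_##name(const struct fake_pt_regs *regs)	\
	{								\
		return fake_body_##name((t0)regs->di, (t1)regs->si,	\
					(t2)regs->dx);			\
	}								\
	static long fake_body_##name(t0 a0, t1 a1, t2 a2)

/* Dummy body; folds fd and vlen into the result for checking. */
FAKE_COMPAT_DEFINE3(readv, uint32_t, fd, uint32_t, vec, uint32_t, vlen)
{
	(void)vec;
	return (long)fd * 100 + (long)vlen;
}
```

Here fake_ia32_readv would go into the 32bit table and fake_x32_readv into
the native/x32 one; the cost is a second chunk of glue for every such
syscall.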
