public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch <linux-arch@vger.kernel.org>,
	Ralf Baechle <ralf@linux-mips.org>,
	James Hogan <jhogan@kernel.org>,
	linux-mips <linux-mips@linux-mips.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	ppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	sparclinux@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Jiri Slaby <jslaby@suse.com>
Subject: Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
Date: Mon, 19 Mar 2018 10:29:20 +0100	[thread overview]
Message-ID: <20180319092920.tbh2xwkruegshzqe@gmail.com> (raw)
In-Reply-To: <20180319042300.GW30522@ZenIV.linux.org.uk>


* Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Sun, Mar 18, 2018 at 06:18:48PM +0000, Al Viro wrote:
> 
> > I'd done some digging in that area, will find the notes and post.
> 
> OK, found:

Very nice writeup - IMHO this should go into Documentation/!

> OTOH, consider arm.  There we have
> 	* r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence
> of objects used to pass arguments
> 	* 32bit and less - pick the next available slot
> 	* 64bit - skip a slot if we'd already taken an odd number, then use
> the next two slots for lower and upper 32 bits of the argument.
> 
> So our classes take
> simple n-argument:	0 to 6 slots
> WD			4 slots
> DWW			4 slots
> WDW			5 slots
> WWDD			6 slots
> WDWW			5 slots
> WWWD			6 slots
> WWDWW			6 slots
> WDDW			7 slots (!)  Also ****, !!!!, !@#!@#!@#!# and other nice
> and well-deserved comments from arch maintainers, some of them even printable:
> /* It would be nice if people remember that not all the world's an i386
>    when they introduce new system calls */
> SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
>                                  loff_t, offset, loff_t, nbytes)

Such idiosyncratic platform quirks that have an impact on generic code should be 
as self-maintaining as possible: i.e. there should be a build time warning even on 
x86 if someone introduces a new, suboptimally packed system call.

Otherwise we'll have such incidents again and again as new system calls get added.

> [snip the preprocessor horrors - the sketches I've got there are downright obscene]

I still think we should consider creating a generic facility and a tool: which 
would immediately and automatically add new system calls to *every* architecture - 
or which would initially at least check these syscall ABI constraints.

I.e. this would start with a new generic kernel facility that warns about 
suboptimal new system call argument layouts on every architecture, not just on the 
affected ones.

That's a significant undertaking but should be possible to do.

Once such a facility is in place all the existing old mess is still a PITA, but 
should be manageable eventually - as no new mess is added to it.

IMHO that's the only thing that could break the somewhat deadly current dynamic of 
system call mappings mess. Complaining about people not knowing about quirks won't 
help.

One way to implement this would be to put the argument chain types (string) and 
sizes (int) into a special debug section which isn't included in the final kernel 
image but which can be checked at link time.

For example this attempt at creating a new system call:

  SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)

... would translate into something like:

	.name = "moron", .pattern = "WWW", .type = "int",    .size = 4,
	.name = NULL,                      .type = "loff_t", .size = 8,
	.name = NULL,                      .type = "size_t", .size = 4,
	.name = NULL,                      .type = NULL,     .size = 0,     /* end of parameter list */

i.e. "WDW". The build-time constraint checker could then warn about:

  # error: System call "moron" uses invalid 'WWW' argument mapping for a 'WDW' sequence
  #        please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' instead

Each architecture can provide its own syscall parameter checking logic. Both 
'stack boundary' and parameter packing rules would be straightforward to express 
if we had such a data structure.

Also note that this tool could also check for optimum packing, i.e. if the new 
system call is defined as:

  SYSCALL_DEFINE3_WDW(moron, int, fd, loff_t, offset, size_t, count)

... would translate to something like:

	.name = "moron", .pattern = "WDW", .type = "int",    .size = 4,
	.name = NULL,                      .type = "loff_t", .size = 8,
	.name = NULL,                      .type = "size_t", .size = 4,
	.name = NULL,                      .type = NULL,     .size = 0,     /* end of parameter list */

where the tool would print out this error:

  # error: System call "moron" uses suboptimal 'WDW' argument mapping instead of 'WWD'

there would be a whitelist of existing system calls that are already using an 
suboptimal argument order - but the warnings/errors would trigger for all new 
system calls.

But adding non-straight-mapped system calls would be the exception in any case.

Such tooling could also do other things, such as limit the C types used for system 
call defines to a well-chosen set of ABI-safe types, such as:

      3  key_t
      3  uint32_t
      4  aio_context_t
      4  mqd_t
      4  timer_t
     10  clockid_t
     10  gid_t
     10  loff_t
     10  long
     10  old_gid_t
     10  old_uid_t
     10  umode_t
     11  uid_t
     31  pid_t
     34  size_t
     69  unsigned int
    130  unsigned long
    226  int

This would also allow us some cleanups as well, such as dropping the pointless 
'const' from arithmetic types in syscall definitions for example.

etc.

Basically this tool would be a secondary parser of the syscall arguments, with 
most of the parsing and type sizing difficulties solved by the C parser already.

I think this problem could be much more sanely solved via annotations and a bit of 
tooling, than trying to trick CPP into doing this for us (which won't really work 
in any case).

Thanks,

	Ingo

WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@kernel.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch <linux-arch@vger.kernel.org>,
	Ralf Baechle <ralf@linux-mips.org>,
	James Hogan <jhogan@kernel.org>,
	linux-mips <linux-mips@linux-mips.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	ppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	sparclinux@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Jiri Slaby <jslaby@suse.com>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
Date: Mon, 19 Mar 2018 10:29:20 +0100	[thread overview]
Message-ID: <20180319092920.tbh2xwkruegshzqe@gmail.com> (raw)
Message-ID: <20180319092920.uQ8bJt-Qg1SV41EN9c_tEBqaNxRFUPGfepGVpNbQs4s@z> (raw)
In-Reply-To: <20180319042300.GW30522@ZenIV.linux.org.uk>


* Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Sun, Mar 18, 2018 at 06:18:48PM +0000, Al Viro wrote:
> 
> > I'd done some digging in that area, will find the notes and post.
> 
> OK, found:

Very nice writeup - IMHO this should go into Documentation/!

> OTOH, consider arm.  There we have
> 	* r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence
> of objects used to pass arguments
> 	* 32bit and less - pick the next available slot
> 	* 64bit - skip a slot if we'd already taken an odd number, then use
> the next two slots for lower and upper 32 bits of the argument.
> 
> So our classes take
> simple n-argument:	0 to 6 slots
> WD			4 slots
> DWW			4 slots
> WDW			5 slots
> WWDD			6 slots
> WDWW			5 slots
> WWWD			6 slots
> WWDWW			6 slots
> WDDW			7 slots (!)  Also ****, !!!!, !@#!@#!@#!# and other nice
> and well-deserved comments from arch maintainers, some of them even printable:
> /* It would be nice if people remember that not all the world's an i386
>    when they introduce new system calls */
> SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
>                                  loff_t, offset, loff_t, nbytes)

Such idiosyncratic platform quirks that have an impact on generic code should be 
as self-maintaining as possible: i.e. there should be a build time warning even on 
x86 if someone introduces a new, suboptimally packed system call.

Otherwise we'll have such incidents again and again as new system calls get added.

> [snip the preprocessor horrors - the sketches I've got there are downright obscene]

I still think we should consider creating a generic facility and a tool: which 
would immediately and automatically add new system calls to *every* architecture - 
or which would initially at least check these syscall ABI constraints.

I.e. this would start with a new generic kernel facility that warns about 
suboptimal new system call argument layouts on every architecture, not just on the 
affected ones.

That's a significant undertaking but should be possible to do.

Once such a facility is in place all the existing old mess is still a PITA, but 
should be manageable eventually - as no new mess is added to it.

IMHO that's the only thing that could break the somewhat deadly current dynamic of 
system call mappings mess. Complaining about people not knowing about quirks won't 
help.

One way to implement this would be to put the argument chain types (string) and 
sizes (int) into a special debug section which isn't included in the final kernel 
image but which can be checked at link time.

For example this attempt at creating a new system call:

  SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)

... would translate into something like:

	.name = "moron", .pattern = "WWW", .type = "int",    .size = 4,
	.name = NULL,                      .type = "loff_t", .size = 8,
	.name = NULL,                      .type = "size_t", .size = 4,
	.name = NULL,                      .type = NULL,     .size = 0,     /* end of parameter list */

i.e. "WDW". The build-time constraint checker could then warn about:

  # error: System call "moron" uses invalid 'WWW' argument mapping for a 'WDW' sequence
  #        please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' instead

Each architecture can provide its own syscall parameter checking logic. Both 
'stack boundary' and parameter packing rules would be straightforward to express 
if we had such a data structure.

Also note that this tool could also check for optimum packing, i.e. if the new 
system call is defined as:

  SYSCALL_DEFINE3_WDW(moron, int, fd, loff_t, offset, size_t, count)

... would translate to something like:

	.name = "moron", .pattern = "WDW", .type = "int",    .size = 4,
	.name = NULL,                      .type = "loff_t", .size = 8,
	.name = NULL,                      .type = "size_t", .size = 4,
	.name = NULL,                      .type = NULL,     .size = 0,     /* end of parameter list */

where the tool would print out this error:

  # error: System call "moron" uses suboptimal 'WDW' argument mapping instead of 'WWD'

there would be a whitelist of existing system calls that are already using an 
suboptimal argument order - but the warnings/errors would trigger for all new 
system calls.

But adding non-straight-mapped system calls would be the exception in any case.

Such tooling could also do other things, such as limit the C types used for system 
call defines to a well-chosen set of ABI-safe types, such as:

      3  key_t
      3  uint32_t
      4  aio_context_t
      4  mqd_t
      4  timer_t
     10  clockid_t
     10  gid_t
     10  loff_t
     10  long
     10  old_gid_t
     10  old_uid_t
     10  umode_t
     11  uid_t
     31  pid_t
     34  size_t
     69  unsigned int
    130  unsigned long
    226  int

This would also allow us some cleanups as well, such as dropping the pointless 
'const' from arithmetic types in syscall definitions for example.

etc.

Basically this tool would be a secondary parser of the syscall arguments, with 
most of the parsing and type sizing difficulties solved by the C parser already.

I think this problem could be much more sanely solved via annotations and a bit of 
tooling, than trying to trick CPP into doing this for us (which won't really work 
in any case).

Thanks,

	Ingo

  parent reply	other threads:[~2018-03-19  9:29 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-18 16:10 [RFC PATCH 0/6] remove in-kernel syscall invocations (part 3 == compat cruft) Dominik Brodowski
2018-03-18 16:10 ` Dominik Brodowski
2018-03-18 16:10 ` [RFC PATCH 1/6] fs: provide a generic compat_sys_fallocate() implementation Dominik Brodowski
2018-03-18 16:10   ` Dominik Brodowski
2018-03-18 16:10 ` [RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation Dominik Brodowski
2018-03-18 16:10   ` Dominik Brodowski
2018-03-18 17:49   ` Al Viro
2018-03-18 17:49     ` Al Viro
2018-03-18 18:21   ` Linus Torvalds
2018-03-18 18:21     ` Linus Torvalds
2018-03-19  6:29   ` Kevin Easton
2018-03-19  6:29     ` Kevin Easton
2018-03-18 16:10 ` [RFC PATCH 3/6] fs: provide generic compat_sys_p{read,write}64() implementations Dominik Brodowski
2018-03-18 16:10   ` Dominik Brodowski
2018-03-18 17:40   ` Linus Torvalds
2018-03-18 17:40     ` Linus Torvalds
2018-03-18 18:05   ` Al Viro
2018-03-18 18:05     ` Al Viro
2018-03-18 16:10 ` [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation Dominik Brodowski
2018-03-18 16:10   ` Dominik Brodowski
2018-03-18 17:40   ` Al Viro
2018-03-18 17:40     ` Al Viro
2018-03-18 18:06     ` Linus Torvalds
2018-03-18 18:06       ` Linus Torvalds
2018-03-18 18:18       ` Al Viro
2018-03-18 18:18         ` Al Viro
2018-03-19  4:23         ` Al Viro
2018-03-19  4:23           ` Al Viro
2018-03-19  9:29           ` Ingo Molnar [this message]
2018-03-19  9:29             ` Ingo Molnar
2018-03-19 23:23             ` Al Viro
2018-03-19 23:23               ` Al Viro
2018-03-20  8:56               ` Dominik Brodowski
2018-03-20  8:56                 ` Dominik Brodowski
2018-03-20  8:59               ` Ingo Molnar
2018-03-20  8:59                 ` Ingo Molnar
2018-03-22  0:15               ` Al Viro
2018-03-22  0:15                 ` Al Viro
2018-03-26  0:40                 ` [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers Al Viro
2018-03-26  0:40                   ` Al Viro
2018-03-26  3:47                   ` Al Viro
2018-03-26  3:47                     ` Al Viro
2018-03-26  6:15                     ` Linus Torvalds
2018-03-26  6:15                       ` Linus Torvalds
2018-03-26  6:20                       ` Linus Torvalds
2018-03-26  6:20                         ` Linus Torvalds
2018-03-26  6:44                       ` John Paul Adrian Glaubitz
2018-03-26  6:44                         ` John Paul Adrian Glaubitz
2018-03-27  1:03                         ` Linus Torvalds
2018-03-27  1:03                           ` Linus Torvalds
2018-03-27  2:37                           ` John Paul Adrian Glaubitz
2018-03-27  2:37                             ` John Paul Adrian Glaubitz
2018-03-27  3:40                             ` Linus Torvalds
2018-03-27  3:40                               ` Linus Torvalds
2018-03-27  4:58                               ` John Paul Adrian Glaubitz
2018-03-27  4:58                                 ` John Paul Adrian Glaubitz
2018-03-30 10:58                                 ` Ingo Molnar
2018-03-30 10:58                                   ` Ingo Molnar
2018-03-30 15:54                                   ` Adam Borowski
2018-03-30 15:54                                     ` Adam Borowski
2018-03-26  6:24                     ` Dominik Brodowski
2018-03-26  6:24                       ` Dominik Brodowski
2018-03-18 16:10 ` [RFC PATCH 5/6] x86: use _do_fork() in compat_sys_x86_clone() Dominik Brodowski
2018-03-18 16:10   ` Dominik Brodowski
2018-03-18 16:10 ` [RFC PATCH 6/6] x86: remove compat_sys_x86_waitpid() Dominik Brodowski
2018-03-18 16:10   ` Dominik Brodowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180319092920.tbh2xwkruegshzqe@gmail.com \
    --to=mingo@kernel.org \
    --cc=arnd@arndb.de \
    --cc=benh@kernel.crashing.org \
    --cc=davem@davemloft.net \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jhogan@kernel.org \
    --cc=jslaby@suse.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=ralf@linux-mips.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=sparclinux@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox