From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Davide Libenzi <davidel@xmailserver.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ulrich Drepper <drepper@redhat.com>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [patch 7/8] fdmap v2 - implement sys_socket2
Date: Thu, 07 Jun 2007 22:47:54 +0200
Message-ID: <46686EFA.1030302@cosmosbay.com>
In-Reply-To: <alpine.LFD.0.98.0706071240150.4205@woody.linux-foundation.org>

Linus Torvalds wrote:
> 
> On Wed, 6 Jun 2007, Alan Cox wrote:
>> This still all seems really really ugly.
> 
> I do agree that it's ugly. That many new system calls with new prototypes 
> and new glibc support is just nasty.
> 
> So I don't think this is viable.
> 
>> Is there anything wrong with throwing all these extra cases out and 
>> replacing the entire lot with
>>
>> 	prctl(PR_SPARSEFD, 1);
>>
>> to turn on sparse fd allocation for a process ?
> 
> Yes. We really don't want to set global state that affects any random 
> library thing that runs after it.
> 
> HOWEVER.
> 
> I think we could introduce a *single* new system call, which does 
> basically a "run the specified system call with the following flags".
> 
> The flags would literally be local to that *one* system call, and one of 
> the flags could be the semantics for FD allocation.
> 
> [ There are a few other cases where such an indirect system call might be 
>   interesting: temporarily unmasking a signal for just the duration of a 
>   single system call is the reason for things like 'pselect()' and 
>   'sigtimedwait()', and similarly the 'access()' system call is basically 
>   a "temporarily run with my real UID, rather than the effective UID" 
>   thing, and quite frankly, it might be perfectly valid to want to do an 
>   'open()' with that rule too, because "access()+open()" is racy! ]
> 
> So maybe the proper solution to this mess is *not* to add fifteen new 
> system calls, but to add *one*, which takes a "flags" value to set certain 
> things:
> 
>  - FD_NONSEQ: "allocate any new fd's nonsequentially"
>  - FD_CLOEXEC: "allocate any new fd's as close-on-exec"
> 
>    Rationale: allow people to open any fd with the flags set a certain 
>    way, regardless of the system call.
> 
>  - LOOKUP_REALUID/GID: "make the fsuid/fsgid temporarily be my _real_ 
>    uid/gid for this single system call"
> 
>    Rationale: avoid the inevitable races that the fundamentally broken 
>    "access()" system call has! 
> 
>  - LOOKUP_NOFOLLOW: "do not follow any symlink at the end of the path"
>    LOOKUP_NOATIME: "don't update atime"
> 
>    Rationale: "open()" already has O_NOFOLLOW/O_NOATIME, and "stat()" has 
>    "lstat()", but a lot of other path-handling system calls cannot do the 
>    same thing.
> 
>  - LOOKUP_NOSYMLINKS: "do not allow any symlink traversal at *all*"
>    LOOKUP_NODOTDOT: "don't traverse a .. upwards"
>    LOOKUP_NOMOUNT: "don't traverse a mount point"
> 
>    Rationale: for security-conscious things, quite often it's not the 
>    _last_ symlink you want to avoid, it's any symlinks at all, and 
>    sometimes it's things like guaranteeing that you stay in a certain 
>    directory structure - which means not going outside with ".." or some 
>    magic mount-point.
> 
>    People currently literally end up traversing things one path component 
>    at a time, doing a "lstat()" on it, and checking. Even if 99% 
>    of the time you probably don't actually ever hit the problem case. 
>    (Eg Apache at some point used to do something like this if you asked 
>    for security, I'm not sure if it still does).
> 
>  - signal mask for temporarily blocking/unblocking during a single system 
>    call.
> 
>  - something else? The above are things that I know I _personally_ have 
>    occasionally cursed not having had.
> 
> What do people think about that kind of approach? It has the advantage 
> that it does *not* involve multiple kernel entries (just a single entry to 
> a small wrapper that sets some process state temporarily), and that it 
> doesn't have any sticky state that might confuse a library (or a signal 
> handler: even if you end up doing "prctl(ON) ; syscall(); prctl(OFF)", a 
> signal handler that happens in between the prctl's would see unexpected 
> behaviour).
> 
> It has the disadvantage that it would need some per-architecture setup to 
> load the actual real arguments from memory: the system call would probably 
> look something like
> 
> 	syscall_indirect(unsigned long flags, sigset_t *, 
> 			 int syscall, unsigned long args[6]);
> 
> and the rule would be that it would just load the six system call 
> registers from that "args[]" array. Always load the full six registers, to 
> make it simpler and faster, and not having any confusion or ever needing 
> any wrappers that depend on the number of system call arguments.

This is a nice idea, but 32/64 compat code is going to hate it :)
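
To make the compat concern concrete: a 32-bit process running on a 64-bit
kernel would hand over an args[] array of six 32-bit words, while the native
entry point expects six 64-bit words, so something has to widen every slot
before dispatch. A minimal user-space sketch, assuming hypothetical names
(compat_ulong32, widen_compat_args) that do not come from any kernel tree:

```c
#include <stddef.h>
#include <stdint.h>

#define INDIRECT_NARGS 6

/* Hypothetical: a 32-bit task's "unsigned long" as seen by a 64-bit kernel. */
typedef uint32_t compat_ulong32;

/*
 * Widen the six 32-bit argument slots into native 64-bit slots.
 * Plain zero-extension is used here. Deciding per system call which
 * slots are pointers, signed values, or structures needing translation
 * is not shown, and that is the painful part.
 */
void widen_compat_args(const compat_ulong32 in[INDIRECT_NARGS],
                       uint64_t out[INDIRECT_NARGS])
{
    for (size_t i = 0; i < INDIRECT_NARGS; i++)
        out[i] = (uint64_t)in[i];
}
```

The loop itself is trivial; the trouble is that an indirect call hides which
syscall the slots belong to until the wrapper has already read them.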

syscall_indirect() would have to be written in assembly for each arch, since
there is no generic syscall table. That's really a lot of work, especially if
we want to mess with the signal mask, umask ...
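
For comparison, here is roughly what the wrapper could look like if a generic
table with one uniform calling convention did exist. This is only a user-space
toy under that assumption: toy_table, toy_sys_probe, toy_sys_add and
toy_syscall_indirect are made-up names, the FD_NONSEQ value is invented, and
the sigset_t argument from the proposed prototype is left out.

```c
#include <stddef.h>

#define INDIRECT_NARGS 6
#define FD_NONSEQ 0x1UL  /* flag name from the proposal; the value is made up */

typedef long (*sys_fn)(const unsigned long args[INDIRECT_NARGS]);

/* Stand-in for per-task state that is valid only during one call. */
static unsigned long current_call_flags;

/* Toy handler: reports whether FD_NONSEQ is visible during the call. */
long toy_sys_probe(const unsigned long args[INDIRECT_NARGS])
{
    (void)args;
    return (current_call_flags & FD_NONSEQ) ? 1 : 0;
}

/* Toy handler: adds its first two arguments. */
long toy_sys_add(const unsigned long args[INDIRECT_NARGS])
{
    return (long)(args[0] + args[1]);
}

/* Hypothetical generic table; real architectures do not share one. */
static const sys_fn toy_table[] = { toy_sys_probe, toy_sys_add };

long toy_syscall_indirect(unsigned long flags, int nr,
                          const unsigned long args[INDIRECT_NARGS])
{
    long ret;

    if (nr < 0 || (size_t)nr >= sizeof(toy_table) / sizeof(toy_table[0]))
        return -1;                  /* stand-in for -ENOSYS */

    current_call_flags = flags;     /* applies to this one call only */
    ret = toy_table[nr](args);      /* always hand over all six slots */
    current_call_flags = 0;         /* nothing sticky is left behind */
    return ret;
}
```

In the real kernel the equivalent of toy_table[nr](args) is where each arch's
entry code and register conventions come in, which is why this does not reduce
to a few lines of portable C.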


